1
Kucuk E, van der Sanden BPGH, O'Gorman L, Kwint M, Derks R, Wenger AM, Lambert C, Chakraborty S, Baybayan P, Rowell WJ, Brunner HG, Vissers LELM, Hoischen A, Gilissen C. Comprehensive de novo mutation discovery with HiFi long-read sequencing. Genome Med 2023; 15:34. [PMID: 37158973 PMCID: PMC10169305 DOI: 10.1186/s13073-023-01183-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 08/26/2022] [Accepted: 04/19/2023] [Indexed: 05/10/2023] Open
Abstract
BACKGROUND Long-read sequencing (LRS) techniques have been very successful in identifying structural variants (SVs). However, the high error rate of LRS made the detection of small variants (substitutions and short indels < 20 bp) more challenging. The introduction of PacBio HiFi sequencing makes LRS also suited for detecting small variation. Here we evaluate the ability of HiFi reads to detect de novo mutations (DNMs) of all types, which are technically challenging variant types and a major cause of sporadic, severe, early-onset disease.
METHODS We sequenced the genomes of eight parent-child trios using high-coverage PacBio HiFi LRS (~ 30-fold coverage) and Illumina short-read sequencing (SRS) (~ 50-fold coverage). De novo substitutions, small indels, short tandem repeats (STRs) and SVs were called in both datasets and compared to each other to assess the accuracy of HiFi LRS. In addition, we determined the parent-of-origin of the small DNMs using phasing.
RESULTS We identified a total of 672 and 859 de novo substitutions/indels, 28 and 126 de novo STRs, and 24 and 1 de novo SVs in LRS and SRS respectively. For the small variants, there was a 92 and 85% concordance between the platforms. For the STRs and SVs, the concordance was 3.6 and 0.8%, and 4 and 100% respectively. We successfully validated 27/54 LRS-unique small variants, of which 11 (41%) were confirmed as true de novo events. For the SRS-unique small variants, we validated 42/133 DNMs and 8 (19%) were confirmed as true de novo events. Validation of 18 LRS-unique de novo STR calls confirmed none of the repeat expansions as a true DNM. Confirmation of the 23 LRS-unique SVs was possible for 19 candidate SVs, of which 10 (52.6%) were true de novo events. Furthermore, we were able to assign 96% of DNMs to their parental allele with LRS data, as opposed to just 20% with SRS data.
CONCLUSIONS HiFi LRS can now produce the most comprehensive variant dataset obtainable by a single technology in a single laboratory, allowing accurate calling of substitutions, indels, STRs and SVs. The accuracy even allows sensitive calling of DNMs on all variant levels, and also allows for phasing, which helps to distinguish true positive from false positive DNMs.
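The confirmation rates quoted in the RESULTS can be recomputed from the counts reported there. The following is only a minimal sketch of that arithmetic; the dictionary keys and the function name `confirmation_rate` are illustrative labels, not anything taken from the paper.

```python
# Counts as reported in the abstract: (confirmed true de novo, successfully validated).
# The category labels are illustrative names chosen for this sketch.
validated = {
    "LRS-unique small variants": (11, 27),
    "SRS-unique small variants": (8, 42),
    "LRS-unique SVs": (10, 19),
}


def confirmation_rate(confirmed: int, tested: int) -> float:
    """Percentage of successfully validated calls confirmed as true de novo events."""
    return 100.0 * confirmed / tested


rates = {name: confirmation_rate(c, t) for name, (c, t) in validated.items()}

for name, rate in rates.items():
    print(f"{name}: {rate:.1f}%")
```

Rounded to the precision used in the abstract, these reproduce the stated 41%, 19% and 52.6%.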
Affiliation(s)
- Erdi Kucuk
  - Department of Human Genetics, Radboud University Medical Center, PO Box 9101, 6500 HB, Nijmegen, The Netherlands
  - Radboud Institute for Molecular Life Sciences, Radboud University Medical Center, Nijmegen, The Netherlands
- Bart P G H van der Sanden
  - Department of Human Genetics, Radboud University Medical Center, PO Box 9101, 6500 HB, Nijmegen, The Netherlands
  - Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen, The Netherlands
- Luke O'Gorman
  - Department of Human Genetics, Radboud University Medical Center, PO Box 9101, 6500 HB, Nijmegen, The Netherlands
- Michael Kwint
  - Department of Human Genetics, Radboud University Medical Center, PO Box 9101, 6500 HB, Nijmegen, The Netherlands
- Ronny Derks
  - Department of Human Genetics, Radboud University Medical Center, PO Box 9101, 6500 HB, Nijmegen, The Netherlands
- Han G Brunner
  - Department of Human Genetics, Radboud University Medical Center, PO Box 9101, 6500 HB, Nijmegen, The Netherlands
  - Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen, The Netherlands
  - Department of Clinical Genetics, Maastricht University Medical Center, Maastricht, The Netherlands
  - GROW School for Oncology and Developmental Biology, Maastricht University Medical Center, Maastricht, The Netherlands
- Lisenka E L M Vissers
  - Department of Human Genetics, Radboud University Medical Center, PO Box 9101, 6500 HB, Nijmegen, The Netherlands
  - Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen, The Netherlands
- Alexander Hoischen
  - Department of Human Genetics, Radboud University Medical Center, PO Box 9101, 6500 HB, Nijmegen, The Netherlands
  - Radboud Institute for Molecular Life Sciences, Radboud University Medical Center, Nijmegen, The Netherlands
  - Department of Internal Medicine, Radboud University Medical Center for Infectious Diseases (RCI), Radboud University Medical Center, Nijmegen, The Netherlands
- Christian Gilissen
  - Department of Human Genetics, Radboud University Medical Center, PO Box 9101, 6500 HB, Nijmegen, The Netherlands
  - Radboud Institute for Molecular Life Sciences, Radboud University Medical Center, Nijmegen, The Netherlands
2
Huntington's disease age at motor onset is modified by the tandem hexamer repeat in TCERG1. NPJ Genom Med 2022; 7:53. [PMID: 36064847 PMCID: PMC9445028 DOI: 10.1038/s41525-022-00317-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/20/2021] [Accepted: 07/15/2022] [Indexed: 01/29/2023] Open
Abstract
Huntington's disease is caused by an expanded CAG tract in HTT. The length of the CAG tract accounts for over half the variance in age at onset of disease, and is influenced by other genetic factors, mostly implicating the DNA maintenance machinery. We examined a single nucleotide variant, rs79727797, on chromosome 5 in the TCERG1 gene, previously reported to be associated with Huntington's disease, and a quasi-tandem repeat (QTR) hexamer in exon 4 of TCERG1 with a central pure repeat. We developed a method for calling perfect and imperfect repeats from exome-sequencing data, and tested association between the QTR in TCERG1 and residual age at motor onset (after correcting for the effects of CAG length in the HTT gene) in 610 individuals with Huntington's disease via regression analysis. We found a significant association between age at onset and the sum of the repeat lengths from both alleles of the QTR (p = 2.1 × 10⁻⁹), with each added repeat hexamer reducing age at onset by one year (95% confidence interval [0.7, 1.4]). This association explained that previously observed with rs79727797. The association with age at onset in the genome-wide association study is due to a QTR hexamer in TCERG1, translated to a glutamine/alanine tract in the protein. We could not distinguish whether this was due to cis-effects of the hexamer repeat on gene expression or of the encoded glutamine/alanine tract in the protein. These results motivate further study of the mechanisms by which TCERG1 modifies onset of HD.
3
Detection of repeat expansions in large next generation DNA and RNA sequencing data without alignment. Sci Rep 2022; 12:13124. [PMID: 35907931 PMCID: PMC9338934 DOI: 10.1038/s41598-022-17267-z] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 04/25/2022] [Accepted: 07/22/2022] [Indexed: 11/10/2022] Open
Abstract
Bioinformatic methods for detecting short tandem repeat expansions in short-read sequencing have identified new repeat expansions in humans, but require alignment information to identify repetitive motif enrichment at genomic locations. We present superSTR, an ultrafast method that does not require alignment. superSTR is used to process whole-genome and whole-exome sequencing data, and perform the first STR analysis of the UK Biobank, efficiently screening and identifying known and potential disease-associated STRs in the exomes of 49,953 biobank participants. We demonstrate the first bioinformatic screening of RNA sequencing data to detect repeat expansions in humans and mouse models of ataxia and dystrophy.
4
Rajan-Babu IS, Peng JJ, Chiu R, Li C, Mohajeri A, Dolzhenko E, Eberle MA, Birol I, Friedman JM. Genome-wide sequencing as a first-tier screening test for short tandem repeat expansions. Genome Med 2021; 13:126. [PMID: 34372915 PMCID: PMC8351082 DOI: 10.1186/s13073-021-00932-9] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 09/03/2020] [Accepted: 07/05/2021] [Indexed: 02/01/2023] Open
Abstract
Background Screening for short tandem repeat (STR) expansions in next-generation sequencing data can enable diagnosis, optimal clinical management/treatment, and accurate genetic counseling of patients with repeat expansion disorders. We aimed to develop an efficient computational workflow for reliable detection of STR expansions in next-generation sequencing data and demonstrate its clinical utility.
Methods We characterized the performance of eight STR analysis methods (lobSTR, HipSTR, RepeatSeq, ExpansionHunter, TREDPARSE, GangSTR, STRetch, and exSTRa) on next-generation sequencing datasets of samples with known disease-causing full-mutation STR expansions and genomes simulated to harbor repeat expansions at selected loci and optimized their sensitivity. We then used a machine learning decision tree classifier to identify an optimal combination of methods for full-mutation detection. In Burrows-Wheeler Aligner (BWA)-aligned genomes, the ensemble approach of using ExpansionHunter, STRetch, and exSTRa performed the best (precision = 82%, recall = 100%, F1-score = 90%). We applied this pipeline to screen 301 families of children with suspected genetic disorders.
Results We identified 10 individuals with full-mutations in the AR, ATXN1, ATXN8, DMPK, FXN, or HTT disease STR locus in the analyzed families. Additional candidates identified in our analysis include two probands with borderline ATXN2 expansions between the established repeat size range for reduced-penetrance and full-penetrance full-mutation and seven individuals with FMR1 CGG repeats in the intermediate/premutation repeat size range. In 67 probands with a prior negative clinical PCR test for the FMR1, FXN, or DMPK disease STR locus, or the spinocerebellar ataxia disease STR panel, our pipeline did not falsely identify aberrant expansion. We performed clinical PCR tests on seven (out of 10) full-mutation samples identified by our pipeline and confirmed the expansion status in all, showing absolute concordance between our bioinformatics and molecular findings.
Conclusions We have successfully demonstrated the application of a well-optimized bioinformatics pipeline that promotes the utility of genome-wide sequencing as a first-tier screening test to detect expansions of known disease STRs. Interrogating clinical next-generation sequencing data for pathogenic STR expansions using our ensemble pipeline can improve diagnostic yield and enhance clinical outcomes for patients with repeat expansion disorders.
Supplementary Information The online version contains supplementary material available at 10.1186/s13073-021-00932-9.
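The F1-score reported for the ensemble in BWA-aligned genomes follows from the stated precision and recall through the standard harmonic-mean definition. The following is a minimal sketch of that check; the helper name `f1_score` is an illustrative choice, not part of the authors' pipeline.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract for the
# ExpansionHunter + STRetch + exSTRa ensemble.
f1 = f1_score(precision=0.82, recall=1.00)
print(f"F1 = {f1:.1%}")
```

With recall at 100%, F1 is limited only by precision, which is why the ensemble's score sits between the two reported values at about 90%.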
Affiliation(s)
- Indhu-Shree Rajan-Babu
  - Department of Medical Genetics, University of British Columbia and Children's & Women's Hospital, Vancouver, BC, V6H3N1, Canada
  - Department of Medical and Molecular Genetics, King's College London, Strand, London, WC2R 2LS, UK
- Junran J Peng
  - Department of Medical Genetics, University of British Columbia and Children's & Women's Hospital, Vancouver, BC, V6H3N1, Canada
- Readman Chiu
  - Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, Vancouver, BC, V5Z4S6, Canada
- Chenkai Li
  - Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, Vancouver, BC, V5Z4S6, Canada
  - Bioinformatics Graduate Program, University of British Columbia, Vancouver, BC, V6T1Z4, Canada
- Arezoo Mohajeri
  - Department of Medical Genetics, University of British Columbia and Children's & Women's Hospital, Vancouver, BC, V6H3N1, Canada
- Inanc Birol
  - Department of Medical Genetics, University of British Columbia and Children's & Women's Hospital, Vancouver, BC, V6H3N1, Canada
  - Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, Vancouver, BC, V5Z4S6, Canada
- Jan M Friedman
  - Department of Medical Genetics, University of British Columbia and Children's & Women's Hospital, Vancouver, BC, V6H3N1, Canada
5
Midha MK, Wu M, Chiu KP. Long-read sequencing in deciphering human genetics to a greater depth. Hum Genet 2019; 138:1201-1215. [PMID: 31538236 DOI: 10.1007/s00439-019-02064-y] [Citation(s) in RCA: 50] [Impact Index Per Article: 10.0] [Received: 05/22/2019] [Accepted: 09/13/2019] [Indexed: 12/12/2022]
Abstract
Through four decades of development, DNA sequencing has inched into the era of single-molecule sequencing (SMS), or third-generation sequencing (TGS), as represented by two distinct technical approaches developed independently by Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT). Historically, each generation of sequencing technologies was marked by innovative technological achievements and novel applications. Long reads (LRs) are considered the most advantageous feature of SMS, shared by both PacBio and ONT, distinguishing SMS from next-generation sequencing (NGS, or second-generation sequencing) and Sanger sequencing (first-generation sequencing). Long reads overcome the limitations of NGS and drastically improve the quality of genome assembly. In addition, ONT contributes several unique features, including ultra-long reads (ULRs) with read lengths above 300 kb and some close to 1 million bp, direct RNA sequencing, and superior portability made possible by the pocket-sized MinION sequencer. Here, we review the history of DNA sequencing technologies and associated applications, with a special focus on the advantages as well as the limitations of ULR sequencing in genome assembly.
Affiliation(s)
- Mohit K Midha
  - Genomics Research Center, Academia Sinica, 128 Academia Road, Sec. 2, Nankang District, Taipei, 115, Taiwan
  - Institute of Biochemistry and Molecular Biology, National Yang-Ming University, Taipei, Taiwan
- Mengchu Wu
  - Health GeneTech, 22F No. 99, Xin Pu 6th St., Taoyuan, Taiwan
- Kuo-Ping Chiu
  - Genomics Research Center, Academia Sinica, 128 Academia Road, Sec. 2, Nankang District, Taipei, 115, Taiwan
  - Institute of Biochemistry and Molecular Biology, National Yang-Ming University, Taipei, Taiwan
  - Department of Life Sciences, College of Life Sciences, National Taiwan University, Taipei, Taiwan
6
Maroilley T, Tarailo-Graovac M. Uncovering Missing Heritability in Rare Diseases. Genes (Basel) 2019; 10:E275. [PMID: 30987386 PMCID: PMC6523881 DOI: 10.3390/genes10040275] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Received: 03/01/2019] [Revised: 03/29/2019] [Accepted: 04/01/2019] [Indexed: 12/14/2022] Open
Abstract
The problem of 'missing heritability' affects both common and rare diseases, hindering discovery, diagnosis, and patient care. The 'missing heritability' concept has been mainly associated with common and complex diseases where promising modern technological advances, like genome-wide association studies (GWAS), were unable to uncover the complete genetic mechanism of the disease/trait. Although rare diseases (RDs) have low prevalence individually, collectively they are common. Furthermore, multi-level genetic and phenotypic complexity, when combined with the individual rarity of these conditions, poses an important challenge in the quest to identify causative genetic changes in RD patients. In recent years, high throughput sequencing has accelerated discovery and diagnosis in RDs. However, despite the several-fold increase (from ~10% using traditional to ~40% using genome-wide genetic testing) in finding genetic causes of these diseases in RD patients, the majority of RDs, as is the case in common diseases, are also facing the 'missing heritability' problem. This review outlines the key role of high throughput sequencing in uncovering genetics behind RDs, with a particular focus on genome sequencing. We review current advances and challenges of sequencing technologies, bioinformatics approaches, and resources.
Affiliation(s)
- Tatiana Maroilley
  - Departments of Biochemistry, Molecular Biology and Medical Genetics, Cumming School of Medicine, University of Calgary, Calgary, AB T2N 4N1, Canada
  - Alberta Children's Hospital Research Institute, University of Calgary, Calgary, AB T2N 4N1, Canada
- Maja Tarailo-Graovac
  - Departments of Biochemistry, Molecular Biology and Medical Genetics, Cumming School of Medicine, University of Calgary, Calgary, AB T2N 4N1, Canada
  - Alberta Children's Hospital Research Institute, University of Calgary, Calgary, AB T2N 4N1, Canada