1. Lojova I, Kucharik M, Zatkova A, Balaz A, Pös Z, Tarova ET, Kadasi L, Budis J, Szemes T, Radvanszky J. High-resolution repeat structure analysis in myotonic dystrophy type 2 diagnostics using short-read whole genome sequencing. Anal Biochem 2025; 700:115793. [PMID: 39894140 DOI: 10.1016/j.ab.2025.115793] [Received: 10/02/2024] [Revised: 01/17/2025] [Accepted: 01/27/2025] [Indexed: 02/04/2025]
Abstract
BACKGROUND/OBJECTIVES: Diagnostic options for myotonic dystrophy type 2 (DM2) are constantly evolving toward more accurate and faster diagnosis. Whole genome sequencing (WGS), together with specialized tandem repeat (TR) genotyping bioinformatic tools, represents a breakthrough technology in molecular diagnostics. We set out to characterize new opportunities and challenges in WGS-based DM2 molecular diagnostics. METHODS: WGS data were obtained from 50 individuals, including five DM2 patients and one individual carrying a premutation-range allele. TR characterization was performed using a modified version of the Dante tool, with results validated by conventional PCR and repeat-primed PCR. RESULTS: WGS identified all of the expansion-range DM2 alleles, together with the premutation-range allele. Compared to conventional methods, WGS was more efficient for detailed sequence structure characterization of the normal-range alleles and for phasing of the entire CNBP-complex motif. A 97% genotyping concordance rate was achieved between the conventional methods and the WGS-derived results, with discrepancies mainly consisting of single-repeat differences in the genotypes. The stutter effect introduced some uncertainty in both methods. CONCLUSION: Short-read WGS offers significant potential for DM2 diagnostics by enabling precise repeat motif characterization and may also apply to other tandem repeat disorders (TRDs).
Affiliation(s)
- Ingrid Lojova
  - Institute of Clinical and Translational Research, Biomedical Research Center of the Slovak Academy of Sciences, Dúbravská Cesta 9, Karlova Ves, 845 05, Bratislava, Slovakia
  - Comenius University Science Park, Ilkovičova 8, Karlova Ves, 841 04, Bratislava, Slovakia
  - Department of Molecular Biology, Faculty of Natural Sciences, Comenius University, Ilkovičova 6, Karlova Ves, 841 04, Bratislava, Slovakia
- Marcel Kucharik
  - Institute of Clinical and Translational Research, Biomedical Research Center of the Slovak Academy of Sciences, Dúbravská Cesta 9, Karlova Ves, 845 05, Bratislava, Slovakia
  - Comenius University Science Park, Ilkovičova 8, Karlova Ves, 841 04, Bratislava, Slovakia
  - Geneton Ltd., Ilkovičova 8, 841 04, Bratislava, Slovakia
- Andrea Zatkova
  - Institute of Clinical and Translational Research, Biomedical Research Center of the Slovak Academy of Sciences, Dúbravská Cesta 9, Karlova Ves, 845 05, Bratislava, Slovakia
- Andrej Balaz
  - Institute of Clinical and Translational Research, Biomedical Research Center of the Slovak Academy of Sciences, Dúbravská Cesta 9, Karlova Ves, 845 05, Bratislava, Slovakia
  - Geneton Ltd., Ilkovičova 8, 841 04, Bratislava, Slovakia
- Zuzana Pös
  - Institute of Clinical and Translational Research, Biomedical Research Center of the Slovak Academy of Sciences, Dúbravská Cesta 9, Karlova Ves, 845 05, Bratislava, Slovakia
  - Geneton Ltd., Ilkovičova 8, 841 04, Bratislava, Slovakia
- Eva Tothova Tarova
  - Institute of Clinical and Translational Research, Biomedical Research Center of the Slovak Academy of Sciences, Dúbravská Cesta 9, Karlova Ves, 845 05, Bratislava, Slovakia
- Ludevit Kadasi
  - Institute of Clinical and Translational Research, Biomedical Research Center of the Slovak Academy of Sciences, Dúbravská Cesta 9, Karlova Ves, 845 05, Bratislava, Slovakia
  - Department of Molecular Biology, Faculty of Natural Sciences, Comenius University, Ilkovičova 6, Karlova Ves, 841 04, Bratislava, Slovakia
- Jaroslav Budis
  - Comenius University Science Park, Ilkovičova 8, Karlova Ves, 841 04, Bratislava, Slovakia
  - Geneton Ltd., Ilkovičova 8, 841 04, Bratislava, Slovakia
  - Genovisio Ltd., Ilkovičova 8, Karlova Ves, 841 04, Bratislava, Slovakia
- Tomas Szemes
  - Comenius University Science Park, Ilkovičova 8, Karlova Ves, 841 04, Bratislava, Slovakia
  - Department of Molecular Biology, Faculty of Natural Sciences, Comenius University, Ilkovičova 6, Karlova Ves, 841 04, Bratislava, Slovakia
  - Geneton Ltd., Ilkovičova 8, 841 04, Bratislava, Slovakia
- Jan Radvanszky
  - Institute of Clinical and Translational Research, Biomedical Research Center of the Slovak Academy of Sciences, Dúbravská Cesta 9, Karlova Ves, 845 05, Bratislava, Slovakia
  - Comenius University Science Park, Ilkovičova 8, Karlova Ves, 841 04, Bratislava, Slovakia
  - Department of Molecular Biology, Faculty of Natural Sciences, Comenius University, Ilkovičova 6, Karlova Ves, 841 04, Bratislava, Slovakia
  - G2 Consulting Slovakia Ltd., Slnečnicová 559/5, Hviezdoslavov, 930 41, Slovakia
2. van der Sanden B, Neveling K, Shukor S, Gallagher MD, Lee J, Burke SL, Pennings M, van Beek R, Oorsprong M, Kater-Baats E, Kamping E, Tieleman AA, Voermans NC, Scheffer IE, Gecz J, Corbett MA, Vissers LELM, Pang AWC, Hastie A, Kamsteeg EJ, Hoischen A. Optical genome mapping enables accurate testing of large repeat expansions. Genome Res 2025; 35:810-823. [PMID: 40113266 PMCID: PMC12047237 DOI: 10.1101/gr.279491.124] [Received: 04/19/2024] [Accepted: 02/24/2025] [Indexed: 03/22/2025]
Abstract
Short tandem repeats (STRs) are common variations in human genomes that frequently expand or contract, causing genetic disorders, mainly when expanded. Traditional diagnostic methods for identifying these expansions, such as repeat-primed PCR and Southern blotting, are often labor-intensive and locus-specific, and they cannot precisely size long repeat expansions. Sequencing-based methods, although capable of genome-wide detection, are limited by inaccuracy (short-read technologies) and high associated costs (long-read technologies). This study evaluated optical genome mapping (OGM) as an efficient, accurate approach for measuring STR lengths and assessing somatic stability in 85 samples with known pathogenic repeat expansions in DMPK, CNBP, and RFC1, causing myotonic dystrophy types 1 and 2 and cerebellar ataxia, neuropathy, and vestibular areflexia syndrome (CANVAS), respectively. Three workflows were applied to assess the repeat sizes and somatic repeat stability: manual de novo assembly, local guided assembly (local-GA), and a molecule distance script; the latter two were developed as part of this study. OGM successfully identified 84/85 (98.8%) of the pathogenic expansions, distinguishing between wild-type and expanded alleles or between two expanded alleles in recessive cases, with greater accuracy than standard of care (SOC) for long repeats and no apparent upper size limit. Notably, OGM detected somatic instability in a subset of DMPK, CNBP, and RFC1 samples. These findings suggest OGM could advance diagnostic accuracy for large repeat expansions, providing a more comprehensive genome-wide assay for repeat expansion disorders by measuring exact repeat lengths and somatic instability across multiple loci simultaneously.
Affiliation(s)
- Bart van der Sanden
  - Department of Human Genetics, Research Institute for Medical Innovation, Radboud University Medical Center, 6525GA Nijmegen, the Netherlands
- Kornelia Neveling
  - Department of Human Genetics, Research Institute for Medical Innovation, Radboud University Medical Center, 6525GA Nijmegen, the Netherlands
- Syukri Shukor
  - Bionano Genomics Clinical and Scientific Affairs, San Diego, California 92101, USA
- Michael D Gallagher
  - Bionano Genomics Clinical and Scientific Affairs, San Diego, California 92101, USA
- Joyce Lee
  - Bionano Genomics Clinical and Scientific Affairs, San Diego, California 92101, USA
- Stephanie L Burke
  - Bionano Genomics Clinical and Scientific Affairs, San Diego, California 92101, USA
- Maartje Pennings
  - Department of Human Genetics, Research Institute for Medical Innovation, Radboud University Medical Center, 6525GA Nijmegen, the Netherlands
- Ronald van Beek
  - Department of Human Genetics, Research Institute for Medical Innovation, Radboud University Medical Center, 6525GA Nijmegen, the Netherlands
- Michiel Oorsprong
  - Department of Human Genetics, Research Institute for Medical Innovation, Radboud University Medical Center, 6525GA Nijmegen, the Netherlands
- Ellen Kater-Baats
  - Department of Human Genetics, Research Institute for Medical Innovation, Radboud University Medical Center, 6525GA Nijmegen, the Netherlands
- Eveline Kamping
  - Department of Human Genetics, Research Institute for Medical Innovation, Radboud University Medical Center, 6525GA Nijmegen, the Netherlands
- Alide A Tieleman
  - Department of Neurology, Radboud University Medical Center, 6525GA Nijmegen, the Netherlands
- Nicol C Voermans
  - Department of Neurology, Radboud University Medical Center, 6525GA Nijmegen, the Netherlands
- Ingrid E Scheffer
  - Epilepsy Research Centre, Department of Medicine, University of Melbourne, Austin Health, VIC 3084, Australia
  - Department of Pediatrics, University of Melbourne, Royal Children's Hospital, Florey and Murdoch Children's Research Institutes, VIC 3052, Melbourne, Australia
- Jozef Gecz
  - South Australian Health and Medical Research Institute, Adelaide, SA 5000, Australia
  - Genetics and Molecular Pathology, SA Pathology, Adelaide, SA 5000, Australia
  - Robinson Research Institute and Adelaide Medical School, University of Adelaide, Adelaide, SA 5000, Australia
- Mark A Corbett
  - Robinson Research Institute and Adelaide Medical School, University of Adelaide, Adelaide, SA 5000, Australia
- Lisenka E L M Vissers
  - Department of Human Genetics, Research Institute for Medical Innovation, Radboud University Medical Center, 6525GA Nijmegen, the Netherlands
- Andy Wing Chun Pang
  - Bionano Genomics Clinical and Scientific Affairs, San Diego, California 92101, USA
- Alex Hastie
  - Bionano Genomics Clinical and Scientific Affairs, San Diego, California 92101, USA
- Erik-Jan Kamsteeg
  - Department of Human Genetics, Research Institute for Medical Innovation, Radboud University Medical Center, 6525GA Nijmegen, the Netherlands
- Alexander Hoischen
  - Department of Human Genetics, Research Institute for Medical Innovation, Radboud University Medical Center, 6525GA Nijmegen, the Netherlands
  - Department of Internal Medicine, Radboud Expertise Center for Immunodeficiency and Autoinflammation and Radboud Center for Infectious Disease (RCI), Radboud University Medical Center, 6525GA Nijmegen, the Netherlands
3. Li Q, Keskus AG, Wagner J, Izydorczyk MB, Timp W, Sedlazeck FJ, Klein AP, Zook JM, Kolmogorov M, Schatz MC. Unraveling the hidden complexity of cancer through long-read sequencing. Genome Res 2025; 35:599-620. [PMID: 40113261 PMCID: PMC12047254 DOI: 10.1101/gr.280041.124] [Indexed: 03/22/2025]
Abstract
Cancer is fundamentally a disease of the genome, characterized by extensive genomic, transcriptomic, and epigenomic alterations. Most current studies use short-read sequencing, gene panels, or microarrays to explore these alterations; however, these technologies can systematically miss or misrepresent certain types of alterations, especially structural variants, complex rearrangements, and alterations within repetitive regions. Long-read sequencing is rapidly emerging as a transformative technology for cancer research by providing a comprehensive view across the genome, transcriptome, and epigenome, including the ability to detect alterations that previous technologies have overlooked. In this Perspective, we explore the current applications of long-read sequencing for both germline and somatic cancer analysis. We provide an overview of the computational methodologies tailored to long-read data and highlight key discoveries and resources within cancer genomics that were inaccessible with prior technologies. We also address future opportunities and persistent challenges, including the experimental and computational requirements needed to scale to larger sample sizes, the hurdles in sequencing and analyzing complex cancer genomes, and opportunities for leveraging machine learning and artificial intelligence technologies for cancer informatics. We further discuss how the telomere-to-telomere genome and the emerging human pangenome could enhance the resolution of cancer genome analysis, potentially revolutionizing early detection and disease monitoring in patients. Finally, we outline strategies for transitioning long-read sequencing from research applications to routine clinical practice.
Affiliation(s)
- Qiuhui Li
  - Department of Computer Science, Johns Hopkins University, Baltimore, Maryland 21218, USA
- Ayse G Keskus
  - Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, Bethesda, Maryland 20892, USA
- Justin Wagner
  - Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, Maryland 20899, USA
- Michal B Izydorczyk
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
- Winston Timp
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland 21218, USA
- Fritz J Sedlazeck
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
  - Department of Computer Science, Rice University, Houston, Texas 77251, USA
- Alison P Klein
  - Sidney Kimmel Comprehensive Cancer Center, Department of Oncology, Johns Hopkins Medicine, Baltimore, Maryland 21031, USA
- Justin M Zook
  - Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, Maryland 20899, USA
- Mikhail Kolmogorov
  - Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, Bethesda, Maryland 20892, USA
- Michael C Schatz
  - Department of Computer Science, Johns Hopkins University, Baltimore, Maryland 21218, USA
  - Sidney Kimmel Comprehensive Cancer Center, Department of Oncology, Johns Hopkins Medicine, Baltimore, Maryland 21031, USA
4. Mahmoud M, Agustinho DP, Sedlazeck FJ. A Hitchhiker's Guide to long-read genomic analysis. Genome Res 2025; 35:545-558. [PMID: 40228901 PMCID: PMC12047252 DOI: 10.1101/gr.279975.124] [Indexed: 04/16/2025]
Abstract
Over the past decade, long-read sequencing has evolved into a pivotal technology for uncovering the hidden and complex regions of the genome. Significant advances in cost efficiency, scalability, and accuracy have driven this evolution. Concurrently, novel analytical methods have emerged to harness the full potential of long reads. These advancements have enabled milestones such as the first fully completed human genome, enhanced identification and understanding of complex genomic variants, and deeper insights into the interplay between epigenetics and genomic variation. This mini-review provides a comprehensive overview of the latest developments in long-read DNA sequencing analysis, encompassing reference-based and de novo assembly approaches. We explore the entire workflow, from initial data processing to variant calling and annotation, focusing on how these methods improve our ability to interpret a wide array of genomic variants. Additionally, we discuss the current challenges, limitations, and future directions in the field, offering a detailed examination of the state-of-the-art bioinformatics methods for long-read sequencing.
Affiliation(s)
- Medhat Mahmoud
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
- Daniel P Agustinho
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
- Fritz J Sedlazeck
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
  - Department of Computer Science, Rice University, Houston, Texas 77005, USA
5. Rafehi H, Fearnley LG, Read J, Snell P, Davies KC, Scott L, Gillies G, Thompson GC, Field TA, Eldo A, Bodek S, Butler E, Chen L, Drago J, Goel H, Hackett A, Halmagyi GM, Hannaford A, Kotschet K, Kumar KR, Kumble S, Lee-Archer M, Malhotra A, Paine M, Poon M, Pope K, Reardon K, Ring S, Ronan A, Silsby M, Smyth R, Stutterd C, Wallis M, Waterston J, Wellings T, West K, Wools C, Wu KHC, Szmulewicz DJ, Delatycki MB, Bahlo M, Lockhart PJ. A prospective trial comparing programmable targeted long-read sequencing and short-read genome sequencing for genetic diagnosis of cerebellar ataxia. Genome Res 2025; 35:769-785. [PMID: 40015980 PMCID: PMC12047251 DOI: 10.1101/gr.279634.124] [Received: 06/11/2024] [Accepted: 11/21/2024] [Indexed: 03/01/2025]
Abstract
The cerebellar ataxias (CAs) are a heterogeneous group of disorders characterized by progressive incoordination. Seventeen repeat expansion (RE) loci have been identified as the primary genetic cause and account for >80% of genetic diagnoses. Despite this, diagnostic testing is limited and inefficient, often utilizing single-gene assays. This study evaluates the effectiveness of long- and short-read sequencing as diagnostic tools for CA. We recruited 110 individuals (48 females, 62 males) with a clinical diagnosis of CA. Short-read genome sequencing (SR-GS) was performed to identify pathogenic REs as well as non-RE variants in 356 genes associated with CA. Independently, long-read sequencing with adaptive sampling (LR-AS) was performed to identify pathogenic REs. SR-GS provided a genetic diagnosis for 38% of the cohort (40/110), including seven non-RE pathogenic variants. REs caused disease in 33 individuals, with the most common condition being SCA27B (n = 24). In comparison, LR-AS identified pathogenic REs in 29 individuals. RE identification for the two methods was concordant apart from four SCA27B cases not detected by LR-AS due to low read depth. For both technologies, manual review of the RE alignment enhances diagnostic outcomes. Orthogonal testing for SCA27B revealed a 15% and 0% false positive rate for SR-GS and LR-AS, respectively. In conclusion, both technologies are powerful screening tools for CA. SR-GS is a mature technology currently used by diagnostic providers, requiring only minor changes in bioinformatic workflows to enable CA diagnostics. LR-AS offers considerable advantages in the context of RE detection and characterization but requires optimization before clinical implementation.
Affiliation(s)
- Haloom Rafehi
  - Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria 3052, Australia
  - Department of Medical Biology, University of Melbourne, Parkville, Victoria 3052, Australia
- Liam G Fearnley
  - Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria 3052, Australia
  - Department of Medical Biology, University of Melbourne, Parkville, Victoria 3052, Australia
- Justin Read
  - Bruce Lefroy Centre, Murdoch Children's Research Institute, Royal Children's Hospital, Parkville, Victoria 3052, Australia
  - Department of Neuroscience, Central Clinical School, Monash University, The Alfred Centre, Melbourne, Victoria 3004, Australia
- Penny Snell
  - Bruce Lefroy Centre, Murdoch Children's Research Institute, Royal Children's Hospital, Parkville, Victoria 3052, Australia
- Kayli C Davies
  - Bruce Lefroy Centre, Murdoch Children's Research Institute, Royal Children's Hospital, Parkville, Victoria 3052, Australia
  - Department of Paediatrics, University of Melbourne, Royal Children's Hospital, Parkville, Victoria 3052, Australia
- Liam Scott
  - Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria 3052, Australia
- Greta Gillies
  - Bruce Lefroy Centre, Murdoch Children's Research Institute, Royal Children's Hospital, Parkville, Victoria 3052, Australia
- Genevieve C Thompson
  - Bruce Lefroy Centre, Murdoch Children's Research Institute, Royal Children's Hospital, Parkville, Victoria 3052, Australia
  - Department of Paediatrics, University of Melbourne, Royal Children's Hospital, Parkville, Victoria 3052, Australia
- Tess A Field
  - Bruce Lefroy Centre, Murdoch Children's Research Institute, Royal Children's Hospital, Parkville, Victoria 3052, Australia
- Aleena Eldo
  - Bruce Lefroy Centre, Murdoch Children's Research Institute, Royal Children's Hospital, Parkville, Victoria 3052, Australia
- Simon Bodek
  - Austin Health, Heidelberg, Victoria 3084, Australia
- Ernest Butler
  - Monash Medical Centre, Clayton, Victoria 3168, Australia
- Luke Chen
  - Department of Neurology, Alfred Hospital, Melbourne, Victoria 3004, Australia
- John Drago
  - Department of Medicine, St Vincent's Hospital, University of Melbourne, Fitzroy, Victoria 3065, Australia
  - Florey Institute of Neuroscience and Mental Health, Parkville, Victoria 3052, Australia
- Himanshu Goel
  - Hunter Genetics, Hunter New England Health Service, Waratah, New South Wales 2298, Australia
- Anna Hackett
  - Hunter Genetics, Hunter New England Health Service, Waratah, New South Wales 2298, Australia
  - University of Newcastle, Callaghan, New South Wales 2308, Australia
- G Michael Halmagyi
  - Neurology Department, Royal Prince Alfred Hospital, Camperdown, New South Wales 2050, Australia
  - Central Clinical School, University of Sydney, Camperdown, New South Wales 2050, Australia
- Andrew Hannaford
  - Department of Neurology, Westmead Hospital, Hawkesbury Westmead, New South Wales 2145, Australia
  - Brain and Nerve Research Centre, Concord Clinical School, University of Sydney, Camperdown, New South Wales 2050, Australia
  - Department of Neurology, Concord Repatriation General Hospital, Concord, New South Wales 2139, Australia
- Katya Kotschet
  - Department of Clinical Neurosciences, St Vincent's Hospital, University of Melbourne, Fitzroy, Victoria 3065, Australia
- Kishore R Kumar
  - Molecular Medicine Laboratory and Neurology Department, Concord Repatriation General Hospital, Concord, New South Wales 2139, Australia
  - Faculty of Medicine and Health, The University of Sydney, Camperdown, New South Wales 2050, Australia
  - Genomics and Inherited Disease Program, The Garvan Institute of Medical Research, Darlinghurst, New South Wales 2010, Australia
  - School of Medicine, University of New South Wales, Sydney, New South Wales 2052, Australia
- Smitha Kumble
  - Victorian Clinical Genetics Services, Murdoch Children's Research Institute, Royal Children's Hospital, Parkville, Victoria 3052, Australia
  - Department of Clinical Genetics, Austin Health, Viewbank, Victoria 3084, Australia
- Matthew Lee-Archer
  - Department of Neurology, Launceston General Hospital, Launceston, Tasmania 7250, Australia
- Abhishek Malhotra
  - Department of Neuroscience, University Hospital Geelong, Geelong, Victoria 3220, Australia
- Mark Paine
  - Department of Neurology, Royal Brisbane and Women's Hospital, Herston, Queensland 4006, Australia
- Michael Poon
  - Neurology Footscray, Footscray, Victoria 3011, Australia
- Kate Pope
  - Bruce Lefroy Centre, Murdoch Children's Research Institute, Royal Children's Hospital, Parkville, Victoria 3052, Australia
- Katrina Reardon
  - Department of Medicine, St Vincent's Hospital, University of Melbourne, Fitzroy, Victoria 3065, Australia
  - Department of Neurology, St Vincent's Hospital, University of Melbourne, Fitzroy, Victoria 3065, Australia
- Steven Ring
  - Albury Wodonga Health, West Albury, New South Wales 2640, Australia
- Anne Ronan
  - University of Newcastle, Callaghan, New South Wales 2308, Australia
  - Newcastle Medical Genetics, Lambton, New South Wales 2299, Australia
- Matthew Silsby
  - Department of Neurology, Westmead Hospital, Hawkesbury Westmead, New South Wales 2145, Australia
  - Brain and Nerve Research Centre, Concord Clinical School, University of Sydney, Camperdown, New South Wales 2050, Australia
  - Department of Neurology, Concord Repatriation General Hospital, Concord, New South Wales 2139, Australia
- Renee Smyth
  - St Vincent's Clinical Genomics, St Vincent's Hospital, Darlinghurst, New South Wales 2010, Australia
- Chloe Stutterd
  - Victorian Clinical Genetics Services, Murdoch Children's Research Institute, Royal Children's Hospital, Parkville, Victoria 3052, Australia
- Mathew Wallis
  - Tasmanian Clinical Genetics Service, Tasmanian Health Service, Royal Hobart Hospital, Hobart, Tasmania 7001, Australia
  - School of Medicine and Menzies Institute for Medical Research, University of Tasmania, Hobart, Tasmania 7000, Australia
- John Waterston
  - Department of Neuroscience, Central Clinical School, Monash University, The Alfred Centre, Melbourne, Victoria 3004, Australia
- Thomas Wellings
  - Department of Neurology, John Hunter Hospital, New Lambton Heights, New South Wales 2305, Australia
- Kirsty West
  - Genomic Medicine, The Royal Melbourne Hospital, Parkville, Victoria 3052, Australia
- Christine Wools
  - Department of Neurology, Calvary Health Care Bethlehem, Caulfield South, Victoria 3162, Australia
  - Department of Neurology, The Royal Melbourne Hospital, Parkville, Victoria 3052, Australia
- Kathy H C Wu
  - St Vincent's Clinical Genomics, St Vincent's Hospital, Darlinghurst, New South Wales 2010, Australia
  - School of Medicine, University of Notre Dame, Darlinghurst, New South Wales 2010, Australia
  - Discipline of Genomic Medicine, Faculty of Medicine and Health, University of Sydney, Camperdown, New South Wales 2050, Australia
- David J Szmulewicz
  - Royal Victorian Eye and Ear Hospital, East Melbourne, Victoria 3002, Australia
  - Bionics Institute, East Melbourne, Victoria 3002, Australia
- Martin B Delatycki
  - Bruce Lefroy Centre, Murdoch Children's Research Institute, Royal Children's Hospital, Parkville, Victoria 3052, Australia
  - Department of Paediatrics, University of Melbourne, Royal Children's Hospital, Parkville, Victoria 3052, Australia
  - Victorian Clinical Genetics Services, Murdoch Children's Research Institute, Royal Children's Hospital, Parkville, Victoria 3052, Australia
- Melanie Bahlo
  - Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria 3052, Australia
  - Department of Medical Biology, University of Melbourne, Parkville, Victoria 3052, Australia
- Paul J Lockhart
  - Bruce Lefroy Centre, Murdoch Children's Research Institute, Royal Children's Hospital, Parkville, Victoria 3052, Australia
  - Department of Paediatrics, University of Melbourne, Royal Children's Hospital, Parkville, Victoria 3052, Australia
6. Xu IRL, Danzi MC, Raposo J, Züchner S. The continued promise of genomic technologies and software in neurogenetics. J Neuromuscul Dis 2025:22143602251325345. [PMID: 40208247 DOI: 10.1177/22143602251325345] [Indexed: 04/11/2025]
Abstract
The continued evolution of genomic technologies over the past few decades has revolutionized the field of neurogenetics, offering profound insights into the genetic underpinnings of neurological disorders. Identification of causal genes for numerous monogenic neurological conditions has informed key aspects of disease mechanisms and facilitated research into critical proteins and molecular pathways, laying the groundwork for therapeutic interventions. However, the question remains: has this transformative trend reached its zenith? In this review, we suggest that despite significant strides in genome sequencing and advanced computational analyses, there is still ample room for methodological refinement. We anticipate further major genetic breakthroughs corresponding with the increased use of long-read genomes, variant calling software, AI tools, and data aggregation databases. Genetic progress has historically been driven by technological advancements from the commercial sector, which are developed in response to academic research needs, creating a continuous cycle of innovation and discovery. This review explores the potential of genomic technologies to address the challenges of neurogenetic disorders. By outlining both established and modern resources, we aim to emphasize the importance of genetic technologies as we enter an era poised for discoveries.
Affiliation(s)
- Isaac R L Xu
  - Dr. John T. Macdonald Foundation Department of Human Genetics and John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, USA
- Matt C Danzi
  - Dr. John T. Macdonald Foundation Department of Human Genetics and John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, USA
- Jacquelyn Raposo
  - Dr. John T. Macdonald Foundation Department of Human Genetics and John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, USA
- Stephan Züchner
  - Dr. John T. Macdonald Foundation Department of Human Genetics and John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, USA
7. Adam CL, Rocha J, Sudmant P, Rohlfs R. TRACKing tandem repeats: a customizable pipeline for identification and cross-species comparison. Bioinformatics Advances 2025; 5:vbaf066. [PMID: 40351869 PMCID: PMC12064168 DOI: 10.1093/bioadv/vbaf066] [Received: 12/12/2024] [Revised: 03/14/2025] [Accepted: 04/07/2025] [Indexed: 05/14/2025]
Abstract
Summary: TRACK is a user-friendly Snakemake workflow designed to streamline the discovery and comparison of tandem repeats (TRs) across species. TRACK facilitates the cataloging and filtering of TRs based on reference genomes or T2T transcripts, and applies reciprocal LiftOver and sequence alignment methods to identify putative homologous TRs between species. For further analyses, TRACK can be used to genotype TRs and subsequently estimate and plot basic population genetic statistics. By incorporating key functionalities within an integrated workflow, TRACK enhances TR analysis accessibility and reproducibility, while offering flexibility for the user. Availability and implementation: The TRACK toolkit with step-by-step tutorial is freely available at https://github.com/caroladam/track.
Affiliation(s)
- Carolina L Adam
  - Institute of Ecology and Evolution, University of Oregon, Eugene, Oregon 97403, United States
- Joana Rocha
  - Department of Integrative Biology, University of California, Berkeley, Berkeley, CA 94720, United States
- Peter Sudmant
  - Department of Integrative Biology, University of California, Berkeley, Berkeley, CA 94720, United States
- Rori Rohlfs
  - Institute of Ecology and Evolution, University of Oregon, Eugene, Oregon 97403, United States
  - School of Computer and Data Sciences, University of Oregon, Eugene, OR 97403, United States
| |
Collapse
|
8
Liu Y, Xia K. Aberrant Short Tandem Repeats: Pathogenicity, Mechanisms, Detection, and Roles in Neuropsychiatric Disorders. Genes (Basel) 2025; 16:406. [PMID: 40282366 PMCID: PMC12026680 DOI: 10.3390/genes16040406] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2025] [Revised: 03/17/2025] [Accepted: 03/19/2025] [Indexed: 04/29/2025] Open
Abstract
Short tandem repeat (STR) sequences are highly variable DNA segments that contribute substantially to human neurodegenerative disorders and play an important role in neuropsychiatric conditions. This article examines the pathogenicity of abnormal STRs and classifies tandem repeat expansion disorders (TREDs), emphasizing their genetic characteristics, mechanisms of action, detection methods, and associated animal models. STR expansions exhibit complex genetic patterns that affect the age of onset and symptom severity. These expansions disrupt gene function through mechanisms such as gene silencing, toxic gain-of-function mutations leading to RNA and protein toxicity, and the generation of toxic peptides via repeat-associated non-AUG (RAN) translation. Advances in detection technologies, from traditional PCR and Southern blotting to next-generation and long-read sequencing, have enhanced the accuracy of STR variation detection. Research utilizing these technologies has linked STR expansions to a range of neuropsychiatric disorders, including autism spectrum disorders and schizophrenia, highlighting their contribution to disease risk and phenotypic expression through effects on genes involved in neurodevelopment, synaptic function, and neuronal signaling. Therefore, further investigation is essential to elucidate the intricate interplay between STRs and neuropsychiatric diseases, paving the way for improved diagnostic and therapeutic strategies.
Affiliation(s)
- Yuzhong Liu
  - Institute of Cytology and Genetics, School of Basic Medical Sciences, Hengyang Medical School, University of South China, Hengyang 421001, China
  - MOE Key Lab of Rare Pediatric Diseases, School of Basic Medicine, Hengyang Medical College, University of South China, Hengyang 421001, China
- Kun Xia
  - Institute of Cytology and Genetics, School of Basic Medical Sciences, Hengyang Medical School, University of South China, Hengyang 421001, China
  - MOE Key Lab of Rare Pediatric Diseases, School of Basic Medicine, Hengyang Medical College, University of South China, Hengyang 421001, China
9
de Boer EN, Scheper AJ, Hendriksen D, Charbon B, van der Vries G, ten Berge AM, Grootscholten PM, Lemmink HH, Jongbloed JDH, Bosscher L, Knoers NVAM, Swertz MA, Sikkema-Raddatz B, Dijkstra DJ, Johansson LF, van Diemen CC. Nanopore Long-Read Sequencing as a First-Tier Diagnostic Test to Detect Repeat Expansions in Neurological Disorders. Int J Mol Sci 2025; 26:2850. [PMID: 40243408 PMCID: PMC11988536 DOI: 10.3390/ijms26072850] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2025] [Revised: 03/14/2025] [Accepted: 03/18/2025] [Indexed: 04/18/2025] Open
Abstract
Inherited neurological disorders, such as spinocerebellar ataxia (SCA) and fragile X (FraX), are frequently caused by short tandem repeat (STR) expansions. The detection and assessment of STRs is important for diagnostics and prognosis. We tested the abilities of nanopore long-read sequencing (LRS) using a custom panel including the nine most common SCA-related genes and FraX, and created a raw-data-to-report workflow. Using known STR lengths for 23 loci in 12 patients, a pipeline was validated to detect and report STR lengths. In addition, we assessed the capability to detect SNVs, indels, and the methylation status in the same test. For the 23 loci, 22 were concordant with known STR lengths, while for the last, one of three replicates differed, indicating an artefact. All positive control STRs were detected as likely pathogenic, with no additional findings after a visual assessment of repeat motifs. Out of 226 SNV and indel variants, two were false positive and one false negative (accuracy 98.7%). In all FMR1 controls, a methylation status could be determined. In conclusion, LRS is suitable as a diagnostic workflow for STR analysis in neurological disorders and can be generalized to other diseases. The addition of SNV/indel and methylation detection promises to allow for a one-test-fits-all workflow.
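The accuracy figure quoted in this abstract can be checked with simple arithmetic. The sketch below assumes that accuracy means the share of the 226 variants that were neither false positive nor false negative; the abstract does not state the formula, and the function name is hypothetical.

```python
def accuracy_from_errors(total: int, false_pos: int, false_neg: int) -> float:
    """Share of calls that were correct, assuming every variant not
    counted as an error was called correctly (an assumption, see above)."""
    if total <= 0:
        raise ValueError("total must be positive")
    errors = false_pos + false_neg
    return (total - errors) / total


# Counts taken from the abstract: 226 variants, 2 false positives, 1 false negative.
acc = accuracy_from_errors(226, 2, 1)
print(f"{acc:.1%}")  # 98.7%
```

Under that assumption, 223 of 226 calls are correct, which rounds to the reported 98.7%.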
10
An Z, Jiang A, Chen J. Toward understanding the role of genomic repeat elements in neurodegenerative diseases. Neural Regen Res 2025; 20:646-659. [PMID: 38886931 PMCID: PMC11433896 DOI: 10.4103/nrr.nrr-d-23-01568] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/18/2023] [Revised: 12/21/2023] [Accepted: 03/02/2024] [Indexed: 06/20/2024] Open
Abstract
Neurodegenerative diseases cause great medical and economic burdens for both patients and society; however, the complex molecular mechanisms thereof are not yet well understood. With the development of high-coverage sequencing technology, researchers have started to notice that genomic repeat regions, previously neglected in search of disease culprits, are active contributors to multiple neurodegenerative diseases. In this review, we describe the association between repeat element variants and multiple degenerative diseases through genome-wide association studies and targeted sequencing. We discuss the identification of disease-relevant repeat element variants, further powered by the advancement of long-read sequencing technologies and their related tools, and summarize recent findings in the molecular mechanisms of repeat element variants in brain degeneration, such as those causing transcriptional silencing or RNA-mediated gain of toxic function. Furthermore, we describe how in silico predictions using innovative computational models, such as deep learning language models, could enhance and accelerate our understanding of the functional impact of repeat element variants. Finally, we discuss future directions to advance current findings for a better understanding of neurodegenerative diseases and the clinical applications of genomic repeat elements.
Affiliation(s)
- Zhengyu An
  - Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China
- Aidi Jiang
  - Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China
- Jingqi Chen
  - Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China
  - MOE Frontiers Center for Brain Science, Fudan University, Shanghai, China
  - MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, Fudan University, Shanghai, China
  - Zhangjiang Fudan International Innovation Center, Shanghai, China
11
English AC, Dolzhenko E, Ziaei Jam H, McKenzie SK, Olson ND, De Coster W, Park J, Gu B, Wagner J, Eberle MA, Gymrek M, Chaisson MJP, Zook JM, Sedlazeck FJ. Analysis and benchmarking of small and large genomic variants across tandem repeats. Nat Biotechnol 2025; 43:431-442. [PMID: 38671154 PMCID: PMC11952744 DOI: 10.1038/s41587-024-02225-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2023] [Accepted: 03/28/2024] [Indexed: 04/28/2024]
Abstract
Tandem repeats (TRs) are highly polymorphic in the human genome, have thousands of associated molecular traits and are linked to over 60 disease phenotypes. However, they are often excluded from at-scale studies because of challenges with variant calling and representation, as well as a lack of a genome-wide standard. Here, to promote the development of TR methods, we created a catalog of TR regions and explored TR properties across 86 haplotype-resolved long-read human assemblies. We curated variants from the Genome in a Bottle (GIAB) HG002 individual to create a TR dataset to benchmark existing and future TR analysis methods. We also present an improved variant comparison method that handles variants greater than 4 bp in length and varying allelic representation. The 8.1% of the genome covered by the TR catalog holds ~24.9% of variants per individual, including 124,728 small and 17,988 large variants for the GIAB HG002 'truth-set' TR benchmark. We demonstrate the utility of this pipeline across short-read and long-read technologies.
Affiliation(s)
- Adam C English
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA.
- Helyaneh Ziaei Jam
  - Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA, USA
- Nathan D Olson
  - Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Wouter De Coster
  - Applied and Translational Neurogenomics Group, VIB Center for Molecular Neurology, VIB, Antwerp, Belgium
  - Applied and Translational Neurogenomics Group, Department of Biomedical Sciences, University of Antwerp, Antwerp, Belgium
- Jonghun Park
  - Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA, USA
- Bida Gu
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Justin Wagner
  - Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Melissa Gymrek
  - Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA, USA
  - Department of Medicine, University of California, San Diego, La Jolla, CA, USA
- Mark J P Chaisson
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Justin M Zook
  - Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Fritz J Sedlazeck
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA.
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA.
  - Department of Computer Science, Rice University, Houston, TX, USA.
12
Théberge ET, Durbano K, Demailly D, Huby S, Mitina A, Yin Y, Mohajeri A, van Karnebeek C, Horvath GA, Yuen RKC, Usdin K, Lehman A, Cif L, Richmond PA. Disco-Interacting Protein 2 Homolog B CGG Repeat Expansion in Siblings with Neurodevelopmental Disability and Progressive Movement Disorder. Mov Disord 2025; 40:567-578. [PMID: 39854091 DOI: 10.1002/mds.30101] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2024] [Revised: 11/18/2024] [Accepted: 12/13/2024] [Indexed: 01/26/2025] Open
Abstract
BACKGROUND Trinucleotide repeat expansions are an emerging class of genetic variants associated with various movement disorders. Unbiased genome-wide analyses can reveal novel genotype-phenotype associations and provide a diagnosis for patients and families. OBJECTIVE The aim was to identify the genetic cause of a severe progressive movement disorder phenotype in 2 affected brothers. METHODS A family of 2 affected brothers and unaffected parents had extensive phenotyping since birth. Whole-genome and long-read sequencing methods characterized genetic variants and methylation status. RESULTS Two male siblings with a CGG repeat expansion in the 5'-untranslated region (UTR) of disco-interacting protein 2 homolog B (DIP2B) presented with a novel DIP2B phenotype, including neurodevelopmental disability, dysmorphic traits, and a severe progressive movement disorder (chorea, dystonia, and ataxia). CONCLUSIONS This is the first report of a severe progressive movement disorder phenotype associated with a CGG repeat expansion in the DIP2B 5'-UTR. © 2025 International Parkinson and Movement Disorder Society. This article has been contributed to by U.S. Government employees and their work is in the public domain in the USA.
Affiliation(s)
- Emilie T Théberge
  - Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia, Canada
- Kate Durbano
  - Department of Neurology, CHU Montpellier, Montpellier, France
- Diane Demailly
  - Department of Neurology, Clinique Beau Soleil, Institut Mutualiste Montpelliérain, Montpellier, France
- Sophie Huby
  - Department of Neurology, CHU Montpellier, Montpellier, France
- Aleksandra Mitina
  - Genetics and Genome Biology, The Hospital for Sick Children, Toronto, Ontario, Canada
- Yue Yin
  - Genetics and Genome Biology, The Hospital for Sick Children, Toronto, Ontario, Canada
- Arezoo Mohajeri
  - Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia, Canada
- Clara van Karnebeek
  - Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia, Canada
  - Emma Center for Personalized Medicine, Departments of Pediatrics and Human Genetics, Amsterdam Gastroenterology Endocrinology Metabolism, Amsterdam UMC, Amsterdam, The Netherlands
- Gabriella A Horvath
  - Department of Pediatrics, British Columbia Children's Hospital, Vancouver, British Columbia, Canada
- Ryan K C Yuen
  - Genetics and Genome Biology, The Hospital for Sick Children, Toronto, Ontario, Canada
  - The Centre for Applied Genomics, The Hospital for Sick Children, Toronto, Ontario, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
- Karen Usdin
  - Section on Gene Structure and Disease, Laboratory of Cell and Molecular Biology, National Institute of Diabetes, Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland, USA
- Anna Lehman
  - Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia, Canada
- Laura Cif
  - Department of Neurosurgery, CHU Montpellier, Montpellier, France
  - Service of Neurology, Department of Clinical Neurosciences, Lausanne University Hospital (CHUV) and University of Lausanne (UNIL), Lausanne, Switzerland
- Phillip A Richmond
  - British Columbia Children's Hospital Research Institute, Vancouver, British Columbia, Canada
13
Jeanjean S, Shen Y, Hardy L, Daunay A, Delépine M, Gerber Z, Alberdi A, Tubacher E, Deleuze JF, How-Kit A. A detailed analysis of second and third-generation sequencing approaches for accurate length determination of short tandem repeats and homopolymers. Nucleic Acids Res 2025; 53:gkaf131. [PMID: 40036507 PMCID: PMC11878640 DOI: 10.1093/nar/gkaf131] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2024] [Revised: 01/13/2025] [Accepted: 02/11/2025] [Indexed: 03/06/2025] Open
Abstract
Microsatellites are short tandem repeats (STRs) of a motif of 1-6 nucleotides that are ubiquitous in almost all genomes and widely used in many biomedical applications. However, despite the development of next-generation sequencing (NGS) over the past two decades with new technologies coming to the market, accurately sequencing and genotyping STRs, particularly homopolymers, remains very challenging today due to several technical limitations. This leads in many cases to erroneous allele calls and difficulty in correctly identifying the genuine allele distribution in a sample. Here, we assessed several second and third-generation sequencing approaches in their capability to correctly determine the length of microsatellites using plasmids containing A/T homopolymers, AC/TG or AT/TA dinucleotide STRs of variable length. Standard polymerase chain reaction (PCR)-free and PCR-containing, single Unique Molecular Identifier (UMI) and dual UMI 'duplex sequencing' protocols were evaluated using Illumina short-read sequencing, and two PCR-free protocols using PacBio and Oxford Nanopore Technologies long-read sequencing. Several bioinformatics algorithms were developed to correctly identify microsatellite alleles from sequencing data, including four and two modes for generating standard and combined consensus alleles, respectively. We provided a detailed analysis and comparison of these approaches and made several recommendations for the accurate determination of microsatellite allele length.
Affiliation(s)
- Sophie I Jeanjean
  - Laboratory for Genomics, Foundation Jean Dausset – CEPH, 75010 Paris, France
- Yimin Shen
  - Laboratory for Bioinformatics, Foundation Jean Dausset – CEPH, 75010 Paris, France
- Lise M Hardy
  - Laboratory for Genomics, Foundation Jean Dausset – CEPH, 75010 Paris, France
- Antoine Daunay
  - Laboratory for Genomics, Foundation Jean Dausset – CEPH, 75010 Paris, France
- Marc Delépine
  - Centre National de Recherche en Génomique Humaine (CNRGH), CEA, Institut François Jacob, 91000 Evry, France
- Zuzana Gerber
  - Centre National de Recherche en Génomique Humaine (CNRGH), CEA, Institut François Jacob, 91000 Evry, France
- Antonio Alberdi
  - Technological Platform of Saint-Louis Research Institute (IRSL), Saint-Louis Hospital, University of Paris, 75010 Paris, France
- Emmanuel Tubacher
  - Laboratory for Bioinformatics, Foundation Jean Dausset – CEPH, 75010 Paris, France
- Jean-François Deleuze
  - Laboratory for Genomics, Foundation Jean Dausset – CEPH, 75010 Paris, France
  - Laboratory for Bioinformatics, Foundation Jean Dausset – CEPH, 75010 Paris, France
  - Centre National de Recherche en Génomique Humaine (CNRGH), CEA, Institut François Jacob, 91000 Evry, France
- Alexandre How-Kit
  - Laboratory for Genomics, Foundation Jean Dausset – CEPH, 75010 Paris, France
14
Doss RM, Lopez-Ignacio S, Dischler A, Hiatt L, Dashnow H, Breuss MW, Dias CM. Mosaicism in Short Tandem Repeat Disorders: A Clinical Perspective. Genes (Basel) 2025; 16:216. [PMID: 40004546 PMCID: PMC11855715 DOI: 10.3390/genes16020216] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2024] [Revised: 02/06/2025] [Accepted: 02/10/2025] [Indexed: 02/27/2025] Open
Abstract
Fragile X, Huntington disease, and myotonic dystrophy type 1 are prototypical examples of human disorders caused by short tandem repeat variation, repetitive nucleotide stretches that are highly mutable both in the germline and somatic tissue. As short tandem repeats are unstable, they can expand, contract, and acquire and lose epigenetic marks in somatic tissue. This means within an individual, the genotype and epigenetic state at these loci can vary considerably from cell to cell. This somatic mosaicism may play a key role in clinical pathogenesis, and yet, our understanding of mosaicism in driving clinical phenotypes in short tandem repeat disorders is only just emerging. This review focuses on these three relatively well-studied examples where, given the advent of new technologies and bioinformatic approaches, a critical role for mosaicism is coming into focus both with respect to cellular physiology and clinical phenotypes.
Affiliation(s)
- Rose M. Doss
  - Section of Genetics and Metabolism, Department of Pediatrics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Susana Lopez-Ignacio
  - Section of Genetics and Metabolism, Department of Pediatrics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Anna Dischler
  - Section of Genetics and Metabolism, Department of Pediatrics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Laurel Hiatt
  - Department of Human Genetics, University of Utah School of Medicine, Salt Lake City, UT 84132, USA
- Harriet Dashnow
  - Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Martin W. Breuss
  - Section of Genetics and Metabolism, Department of Pediatrics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Caroline M. Dias
  - Section of Genetics and Metabolism, Department of Pediatrics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
  - Section of Developmental Pediatrics, Department of Pediatrics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
15
Lojova I, Kucharik M, Pös Z, Balaz A, Zatkova A, Tothova Tarova E, Budis J, Kadasi L, Szemes T, Radvanszky J. Advancing molecular diagnostics of myotonic dystrophy type 1 using short-read whole genome sequencing. Mol Cell Probes 2025; 79:102005. [PMID: 39710066 DOI: 10.1016/j.mcp.2024.102005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2024] [Revised: 12/20/2024] [Accepted: 12/20/2024] [Indexed: 12/24/2024]
Abstract
Myotonic dystrophy type 1 (DM1) is a serious multisystem disorder caused by CTG repeat expansions in the DMPK gene. Early and accurate diagnosis, often requiring reliable DNA-diagnostic techniques, is critical for preventing life-threatening cardiac complications. Clinically, two main diagnostic challenges exist. Firstly, because of overlapping symptomatology with other conditions, conventional DNA-testing methods focusing on DM1 expansion detection ensure diagnostic results only in a small subset of patients, and frequently, further DNA-testing in remaining cases is necessary. Secondly, because of variable symptomatology and age of onset, not all DM1 patients are referred for DM1 genetic testing, leading to unrecognized but at-risk cases. When using conventional methods, the main technical problems are expanded-allele sizing and sensitivity to the presence of sequence interruptions. On a set of 50 individual genomes, including ten DM1 patients, we tested the performance of short-read whole-genome sequencing (WGS), one of the most up-to-date molecular testing methods. We identified all expansion-range DM1 alleles and characterized sequence interruptions in seven expansion-range/premutation-range alleles. Although neither the tested conventional methods nor WGS allowed expanded-allele sizing, conventional methods provided higher sizing limits for normal-range alleles. Genotyping concordance rate was found to be 95-99 %. WGS was found to be superior in elucidating the sequence structure of the motifs, even if they fall outside the sizing limit (from partial reads). In addition, WGS enables the identification of genetic modifiers in other genes and the detection of alternative diagnoses in DM1-negative patients by extension of the bioinformatic evaluation of the generated data.
Affiliation(s)
- Ingrid Lojova
  - Institute of Clinical and Translational Research, Biomedical Research Center of the Slovak Academy of Sciences, Bratislava, Slovakia; Comenius University Science Park, Bratislava, Slovakia; Department of Molecular Biology, Faculty of Natural Sciences, Comenius University, Bratislava, Slovakia
- Marcel Kucharik
  - Institute of Clinical and Translational Research, Biomedical Research Center of the Slovak Academy of Sciences, Bratislava, Slovakia; Comenius University Science Park, Bratislava, Slovakia; Geneton Ltd., Bratislava, Slovakia
- Zuzana Pös
  - Institute of Clinical and Translational Research, Biomedical Research Center of the Slovak Academy of Sciences, Bratislava, Slovakia; Geneton Ltd., Bratislava, Slovakia
- Andrej Balaz
  - Institute of Clinical and Translational Research, Biomedical Research Center of the Slovak Academy of Sciences, Bratislava, Slovakia; Geneton Ltd., Bratislava, Slovakia; Faculty of Mathematics, Physics and Informatics, Comenius University, Bratislava, Slovakia
- Andrea Zatkova
  - Institute of Clinical and Translational Research, Biomedical Research Center of the Slovak Academy of Sciences, Bratislava, Slovakia
- Eva Tothova Tarova
  - Institute of Clinical and Translational Research, Biomedical Research Center of the Slovak Academy of Sciences, Bratislava, Slovakia; Department of Biology, Faculty of Education, J. Selye University, Komárno, Slovakia
- Jaroslav Budis
  - Comenius University Science Park, Bratislava, Slovakia; Geneton Ltd., Bratislava, Slovakia; Genovisio Ltd., Bratislava, Slovakia
- Ludevit Kadasi
  - Institute of Clinical and Translational Research, Biomedical Research Center of the Slovak Academy of Sciences, Bratislava, Slovakia; Department of Molecular Biology, Faculty of Natural Sciences, Comenius University, Bratislava, Slovakia
- Tomas Szemes
  - Comenius University Science Park, Bratislava, Slovakia; Department of Molecular Biology, Faculty of Natural Sciences, Comenius University, Bratislava, Slovakia; Geneton Ltd., Bratislava, Slovakia
- Jan Radvanszky
  - Institute of Clinical and Translational Research, Biomedical Research Center of the Slovak Academy of Sciences, Bratislava, Slovakia; Comenius University Science Park, Bratislava, Slovakia; Department of Molecular Biology, Faculty of Natural Sciences, Comenius University, Bratislava, Slovakia; G2 Consulting Slovakia Ltd., Slovakia.
16
Li R, Chu H, Gao K, Luo H, Jiang Y. SUMMER: an integrated nanopore sequencing pipeline for variants detection and clinical annotation on the human genome. Funct Integr Genomics 2025; 25:21. [PMID: 39836277 PMCID: PMC11750885 DOI: 10.1007/s10142-025-01534-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2024] [Revised: 01/09/2025] [Accepted: 01/11/2025] [Indexed: 01/22/2025]
Abstract
Long-read sequencing has emerged as a transformative technology in recent years, offering significant potential for the molecular diagnosis of unresolved genetic disorders. Despite its promise, the comprehensive detection and clinical annotation of genomic variants remain intricate and technically demanding. We present SUMMER, an integrated and structured workflow specifically designed to process raw Nanopore sequencing reads. SUMMER facilitates an in-depth analysis of multiple variant types, including SNV, SV, short tandem repeat and mobile element insertion. For clinical applications, SUMMER employs SvAnna to prioritize SV candidates based on phenotype relevance and utilizes Straglr to provide reference distributions of non-pathogenic unit counts for 55 known pathogenic short tandem repeats. By addressing critical challenges in variant detection and annotation, SUMMER seeks to advance the clinical utility of long-read sequencing in diagnostic genomics. SUMMER is available on the web at https://github.com/carolhuaxia/summer.
Affiliation(s)
- Renqiuguo Li
  - Children's Medical Center, Peking University First Hospital, No.5 Le Yuan Road, Daxing District, 100034, Beijing, China
- Hongyuan Chu
  - Children's Medical Center, Peking University First Hospital, No.5 Le Yuan Road, Daxing District, 100034, Beijing, China
- Kai Gao
  - Children's Medical Center, Peking University First Hospital, No.5 Le Yuan Road, Daxing District, 100034, Beijing, China
- Huaxia Luo
  - Children's Medical Center, Peking University First Hospital, No.5 Le Yuan Road, Daxing District, 100034, Beijing, China.
- Yuwu Jiang
  - Children's Medical Center, Peking University First Hospital, No.5 Le Yuan Road, Daxing District, 100034, Beijing, China.
17
Van Deynze K, Mumm C, Maltby CJ, Switzenberg JA, Todd P, Boyle AP. Enhanced detection and genotyping of disease-associated tandem repeats using HMMSTR and targeted long-read sequencing. Nucleic Acids Res 2025; 53:gkae1202. [PMID: 39676678 PMCID: PMC11754662 DOI: 10.1093/nar/gkae1202] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/01/2024] [Revised: 10/16/2024] [Accepted: 11/19/2024] [Indexed: 12/17/2024] Open
Abstract
Tandem repeat sequences comprise approximately 8% of the human genome and are linked to more than 50 neurodegenerative disorders. Accurate characterization of disease-associated repeat loci remains resource intensive and often lacks high-resolution genotype calls. We introduce a multiplexed, targeted nanopore sequencing panel and HMMSTR, a sequence-based tandem repeat copy number caller that outperforms current signal- and sequence-based callers relative to two assemblies; we show that it performs with high accuracy in heterozygous regions and at low read coverage. The flexible panel allows us to capture disease-associated regions at an average coverage of >150x. Using these tools, we successfully characterize known or suspected repeat expansions in patient-derived samples. In these samples, we also identify unexpected expanded alleles at tandem repeat loci not previously associated with the underlying diagnosis. This genotyping approach for tandem repeat expansions is scalable, simple, flexible and accurate, offering significant potential for diagnostic applications and investigation of expansion co-occurrence in neurodegenerative disorders.
Affiliation(s)
- Kinsey Van Deynze
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Camille Mumm
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
  - Department of Human Genetics, University of Michigan, Ann Arbor, MI 48109, USA
- Connor J Maltby
  - Department of Neurology, University of Michigan, Ann Arbor, MI 48109, USA
- Jessica A Switzenberg
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Peter K Todd
  - Department of Neurology, University of Michigan, Ann Arbor, MI 48109, USA
  - Ann Arbor Veterans Administration Healthcare, Ann Arbor, MI 48105, USA
- Alan P Boyle
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
  - Department of Human Genetics, University of Michigan, Ann Arbor, MI 48109, USA
18
Maestri S, Scalzo D, Damaggio G, Zobel M, Besusso D, Cattaneo E. Navigating triplet repeats sequencing: concepts, methodological challenges and perspective for Huntington's disease. Nucleic Acids Res 2025; 53:gkae1155. [PMID: 39676657 PMCID: PMC11724279 DOI: 10.1093/nar/gkae1155] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/24/2024] [Revised: 10/16/2024] [Accepted: 12/02/2024] [Indexed: 12/17/2024] Open
Abstract
The accurate characterization of triplet repeats, especially the overrepresented CAG repeats, is increasingly relevant for several reasons. First, germline expansion of CAG repeats above a gene-specific threshold causes multiple neurodegenerative disorders; for instance, Huntington's disease (HD) is triggered by >36 CAG repeats in the huntingtin (HTT) gene. Second, extreme expansions up to 800 CAG repeats have been found in specific cell types affected by the disease. Third, synonymous single nucleotide variants within the CAG repeat stretch influence the age of disease onset. Thus, new sequencing-based protocols that profile both the length and the exact nucleotide sequence of triplet repeats are crucial. Various strategies to enrich the target gene over the background, along with sequencing platforms and bioinformatic pipelines, are under development. This review discusses the concepts, challenges, and methodological opportunities for analyzing triplet repeats, using HD as a case study. Starting with traditional approaches, we will explore how sequencing-based methods have evolved to meet increasing scientific demands. We will also highlight experimental and bioinformatic challenges, aiming to provide a guide for accurate triplet repeat characterization for diagnostic and therapeutic purposes.
Affiliation(s)
- Simone Maestri
- Department of Biosciences, University of Milan, Street Giovanni Celoria, 26, 20133, Milan, Italy
- INGM, Istituto Nazionale Genetica Molecolare ‘Romeo ed Enrica Invernizzi’, Street Francesco Sforza, 35, 20122, Milan, Italy
| | - Davide Scalzo
- Department of Biosciences, University of Milan, Street Giovanni Celoria, 26, 20133, Milan, Italy
- INGM, Istituto Nazionale Genetica Molecolare ‘Romeo ed Enrica Invernizzi’, Street Francesco Sforza, 35, 20122, Milan, Italy
| | - Gianluca Damaggio
- Department of Biosciences, University of Milan, Street Giovanni Celoria, 26, 20133, Milan, Italy
- INGM, Istituto Nazionale Genetica Molecolare ‘Romeo ed Enrica Invernizzi’, Street Francesco Sforza, 35, 20122, Milan, Italy
| | - Martina Zobel
- Department of Biosciences, University of Milan, Street Giovanni Celoria, 26, 20133, Milan, Italy
- INGM, Istituto Nazionale Genetica Molecolare ‘Romeo ed Enrica Invernizzi’, Street Francesco Sforza, 35, 20122, Milan, Italy
| | - Dario Besusso
- Department of Biosciences, University of Milan, Street Giovanni Celoria, 26, 20133, Milan, Italy
- INGM, Istituto Nazionale Genetica Molecolare ‘Romeo ed Enrica Invernizzi’, Street Francesco Sforza, 35, 20122, Milan, Italy
| | - Elena Cattaneo
- Department of Biosciences, University of Milan, Street Giovanni Celoria, 26, 20133, Milan, Italy
- INGM, Istituto Nazionale Genetica Molecolare ‘Romeo ed Enrica Invernizzi’, Street Francesco Sforza, 35, 20122, Milan, Italy
| |
|
19
|
Park G, An H, Luo H, Park J. NanoMnT: an STR analysis tool for Oxford Nanopore sequencing data driven by a comprehensive analysis of error profile in STR regions. Gigascience 2025; 14:giaf013. [PMID: 40094553 PMCID: PMC11912559 DOI: 10.1093/gigascience/giaf013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2024] [Revised: 12/29/2024] [Accepted: 02/02/2025] [Indexed: 03/19/2025] Open
Abstract
Oxford Nanopore Technology (ONT) sequencing is a third-generation sequencing technology that enables cost-effective long-read sequencing, with broad applications in biological research. However, its high sequencing error rate in low-complexity regions hampers its applications in short tandem repeat (STR)-related research. To address this, we generated a comprehensive STR error profile of ONT by analyzing publicly available Nanopore sequencing datasets. We show that the sequencing error rate is influenced not only by STR length but also by the repeat unit and the flanking sequences of STR regions. Interestingly, certain flanking sequences were associated with higher sequencing accuracy, suggesting that certain STR loci are more suitable for Nanopore sequencing compared to other loci. While base quality scores of substitution errors within the STR regions were lower than those of correctly sequenced bases, such patterns were not observed for indel errors. Furthermore, choosing the most recent basecaller version and using the super accuracy model significantly improved STR sequencing accuracy. Finally, we present NanoMnT, a lightweight Python tool that corrects STR sequencing errors in sequencing data and estimates STR allele sizes. NanoMnT leverages the characteristics of ONT when estimating STR allele size and exhibits superior results for 1-bp- and 2-bp repeat STR compared to existing tools. By integrating our findings, we improved STR allele estimation accuracy for Ax10 repeats from 55% to 78% and up to 85% when excluding loci with unfavorable flanking sequences. Using NanoMnT, we present the utility of our findings by identifying microsatellite instability status in cancer sequencing data. NanoMnT is publicly available at https://github.com/18parkky/NanoMnT.
Affiliation(s)
- Gyumin Park
- School of Life Sciences, Gwangju Institute of Science and Technology (GIST), Gwangju 61005, Republic of Korea
| | - Hyunsu An
- School of Life Sciences, Gwangju Institute of Science and Technology (GIST), Gwangju 61005, Republic of Korea
| | - Han Luo
- Department of Thyroid and Parathyroid Surgery, Laboratory of thyroid and parathyroid disease, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu, Sichuan 61005, China
| | - Jihwan Park
- School of Life Sciences, Gwangju Institute of Science and Technology (GIST), Gwangju 61005, Republic of Korea
| |
|
20
|
Berthold N, Gaudieri S, Hood S, Tschochner M, Miller AL, Jordan J, Thornton LM, Bulik CM, Akkari PA, Kennedy MA. Nanopore sequencing as a novel method of characterising anorexia nervosa risk loci. BMC Genomics 2024; 25:1262. [PMID: 39741260 DOI: 10.1186/s12864-024-11172-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/21/2024] [Accepted: 12/19/2024] [Indexed: 01/02/2025] Open
Abstract
BACKGROUND Anorexia nervosa (AN) is a polygenic, severe metabopsychiatric disorder with poorly understood aetiology. Eight significant loci have been identified by genome-wide association studies (GWAS) and single nucleotide polymorphism (SNP)-based heritability was estimated to be ~11-17%, yet causal variants remain elusive. It is therefore important to define the full spectrum of genetic variants in the wider regions surrounding these significantly associated loci. The hypothesis we evaluate here is that unrecognised or relatively unexplored variants in these regions exist and are promising targets for future functional analyses. To test this hypothesis, we implemented a novel approach with targeted nanopore sequencing (Oxford Nanopore Technologies) for 200 kb regions centred on each of the eight AN-associated loci in 10 AN case samples. Our bioinformatics pipeline entailed base-calling and alignment with Dorado and minimap2 software, followed by variant calling with four separate tools, Sniffles2, Clair3, Straglr, and NanoVar. We then leveraged publicly available databases to characterise these loci in putative functional context and prioritise a subset of potentially relevant variants. RESULTS Targeted nanopore sequencing effectively enriched the target regions (average coverage 14.64x). To test our hypothesis, we curated a list of 20 prioritised variants in non-coding regions, poorly represented in the current human reference genome but that may have functional consequences in AN pathology. Notably, we identified a polymorphic SINE-VNTR-Alu like sub-family D element (SVA-D), intergenic with IP6K2 and PRKAR2A, and a poly-T short tandem repeat (STR) in the 3'UTR of FOXP1. CONCLUSIONS Our results highlight the potential of targeted nanopore sequencing for characterising poorly resolved or complex variation, which may be initially obscured in risk-associated regions detected by GWAS. Some of the variants identified in this way, such as the polymorphic SVA-D and poly-T STR, could contribute to mechanisms of phenotypic risk, through regulation of several neighbouring genes implicated in AN biology, and affect post-transcriptional processing of FOXP1, respectively. This exploratory investigation was not powered to detect functional effects; however, the variants we observed using this method are poorly represented in the current human reference genome and accompanying databases, and further examination of these may provide new opportunities for improved understanding of genetic risk mechanisms of AN.
Affiliation(s)
- Natasha Berthold
- University of Western Australia, Crawley, WA, Australia
- Perron Research Institute, Nedlands, WA, Australia
- Pathology and Biomedical Science Department, University of Otago Christchurch, Christchurch, New Zealand
| | - Silvana Gaudieri
- University of Western Australia, Crawley, WA, Australia
- Murdoch University, Murdoch, WA, Australia
- Vanderbilt University Medical Centre, Nashville, TN, USA
| | - Sean Hood
- University of Western Australia, Crawley, WA, Australia
| | - Allison L Miller
- Pathology and Biomedical Science Department, University of Otago Christchurch, Christchurch, New Zealand
| | - Jennifer Jordan
- Department of Psychological Medicine, University of Otago Christchurch, Christchurch, New Zealand
| | - Laura M Thornton
- Department of Psychiatry, University of North Carolina at Chapel Hill, Chapel Hill, USA
| | - Cynthia M Bulik
- Department of Psychiatry, University of North Carolina at Chapel Hill, Chapel Hill, USA
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
- Department of Nutrition, University of North Carolina at Chapel Hill, Chapel Hill, USA
| | - Patrick Anthony Akkari
- University of Western Australia, Crawley, WA, Australia
- Perron Research Institute, Nedlands, WA, Australia
- Murdoch University, Murdoch, WA, Australia
- Duke University, Durham, NC, USA
| | - Martin A Kennedy
- Pathology and Biomedical Science Department, University of Otago Christchurch, Christchurch, New Zealand
| |
|
21
|
Lamba J, Marchi F, Landwehr M, Schade AK, Shastri V, Ghavami M, Sckaff F, Marrero R, Nguyen N, Mansinghka V, Cao X, Slayton W, Starostik P, Ribeiro R, Rubnitz J, Klco J, Gamis A, Triche T, Ries R, Kolb EA, Aplenc R, Alonzo T, Pounds S, Meshinchi S, Cogle C, Elsayed A. Long-read epigenomic diagnosis and prognosis of Acute Myeloid Leukemia. RESEARCH SQUARE 2024:rs.3.rs-5450972. [PMID: 39711573 PMCID: PMC11661290 DOI: 10.21203/rs.3.rs-5450972/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/24/2024]
Abstract
Acute Myeloid Leukemia (AML) is an aggressive cancer with dismal outcomes, vast subtype heterogeneity, and suboptimal risk stratification. In this study, we harmonized DNA methylation data from 3,314 patients across 11 cohorts to develop the Acute Leukemia Methylome Atlas (ALMA) of diagnostic relevance that predicted 27 WHO 2022 acute leukemia subtypes with an overall accuracy of 96.3% in discovery and 90.1% in validation cohorts. Specifically, for AML, we also developed AML Epigenomic Risk, a prognostic classifier of overall survival (OS) (HR=4.40; 95% CI=3.45-5.61; P<0.0001), and a targeted 38CpG AML signature using a stepwise EWAS-CoxPH-LASSO model predictive of OS (HR=3.84; 95% CI=3.01-4.91; P<0.0001). Finally, we developed a specimen-to-result protocol for simultaneous whole-genome and epigenome sequencing that accurately predicted diagnoses and prognoses from twelve prospectively collected patient samples using long-read sequencing. Our study unveils a new paradigm in acute leukemia management by leveraging DNA methylation for diagnostic and prognostic applications.
Affiliation(s)
- Xueyuan Cao
- University of Tennessee Health Science Center
| |
|
22
|
Ahn JH, Yoon JG, Cho J, Lee S, Kim S, Kim MJ, Kim SY, Lee ST, Chu K, Lee SK, Kim HJ, Youn J, Jang JH, Chae JH, Moon J, Cho JW. Implementing genomic medicine in clinical practice for adults with undiagnosed rare diseases. NPJ Genom Med 2024; 9:63. [PMID: 39609445 PMCID: PMC11604660 DOI: 10.1038/s41525-024-00449-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2024] [Accepted: 11/18/2024] [Indexed: 11/30/2024] Open
Abstract
The global burden of undiagnosed diseases, particularly in adults, is rising due to their significant socioeconomic impact. To address this, we enrolled 232 adult probands with undiagnosed conditions, utilizing bioinformatics tools for genetic analysis. Alongside exome and genome sequencing, repeat-primed PCR and Cas9-mediated nanopore sequencing were applied to suspected short tandem repeat disorders. Probands were classified into probable genetic (n = 128) or uncertain (n = 104) origins. The study found genetic causes in 66 individuals (28.4%) and non-genetic causes in 12 (5.2%), with a longer diagnostic journey for those in the probable genetic group or with pediatric symptom onset, emphasizing the need for increased efforts in these populations. Genetic diagnoses facilitated effective surveillance, cascade screening, drug repurposing, and pregnancy planning. This study demonstrates that integrating sequencing technologies improves diagnostic accuracy, may shorten the time to diagnosis, and enhances personalized management for adults with undiagnosed diseases.
Affiliation(s)
- Jong Hyeon Ahn
- Department of Neurology, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
- Neuroscience Center, Samsung Medical Center, Seoul, Republic of Korea
| | - Jihoon G Yoon
- Department of Genomic Medicine, Seoul National University Hospital, Seoul, Republic of Korea
- Department of Laboratory Medicine, Gangnam Severance Hospital, Yonsei University College of Medicine, Seoul, Republic of Korea
| | - Jaeso Cho
- Department of Genomic Medicine, Seoul National University Hospital, Seoul, Republic of Korea
- Department of Pediatrics, Seoul National University Bundang Hospital, Seongnam-si, Republic of Korea
| | - Seungbok Lee
- Department of Genomic Medicine, Seoul National University Hospital, Seoul, Republic of Korea
- Department of Pediatrics, Seoul National University Children's Hospital, Seoul, Republic of Korea
| | - Sheehyun Kim
- Department of Genomic Medicine, Seoul National University Hospital, Seoul, Republic of Korea
| | - Man Jin Kim
- Department of Genomic Medicine, Seoul National University Hospital, Seoul, Republic of Korea
| | - Soo Yeon Kim
- Department of Genomic Medicine, Seoul National University Hospital, Seoul, Republic of Korea
- Department of Pediatrics, Seoul National University Children's Hospital, Seoul, Republic of Korea
| | - Soon-Tae Lee
- Department of Neurology, Seoul National University Hospital, Seoul, Republic of Korea
| | - Kon Chu
- Department of Neurology, Seoul National University Hospital, Seoul, Republic of Korea
| | - Sang Kun Lee
- Department of Neurology, Seoul National University Hospital, Seoul, Republic of Korea
| | - Han-Joon Kim
- Department of Neurology, Seoul National University Hospital, Seoul, Republic of Korea
| | - Jinyoung Youn
- Department of Neurology, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
- Neuroscience Center, Samsung Medical Center, Seoul, Republic of Korea
| | - Ja-Hyun Jang
- Department of Laboratory Medicine and Genetics, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
| | - Jong-Hee Chae
- Department of Genomic Medicine, Seoul National University Hospital, Seoul, Republic of Korea
- Department of Pediatrics, Seoul National University Children's Hospital, Seoul, Republic of Korea
| | - Jangsup Moon
- Department of Genomic Medicine, Seoul National University Hospital, Seoul, Republic of Korea
- Department of Neurology, Seoul National University Hospital, Seoul, Republic of Korea
| | - Jin Whan Cho
- Department of Neurology, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
- Neuroscience Center, Samsung Medical Center, Seoul, Republic of Korea
| |
|
23
|
Tesi N, Salazar A, Zhang Y, van der Lee S, Hulsman M, Knoop L, Wijesekera S, Krizova J, Schneider AF, Pennings M, Sleegers K, Kamsteeg EJ, Reinders M, Holstege H. Characterizing tandem repeat complexities across long-read sequencing platforms with TREAT and otter. Genome Res 2024; 34:1942-1953. [PMID: 39406499 DOI: 10.1101/gr.279351.124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2024] [Accepted: 10/03/2024] [Indexed: 11/09/2024]
Abstract
Tandem repeats (TRs) play important roles in genomic variation and disease risk in humans. Long-read sequencing allows for the accurate characterization of TRs; however, the underlying bioinformatics perspectives remain challenging. We present otter and TREAT: otter is a fast targeted local assembler, cross-compatible across different sequencing platforms. It is integrated in TREAT, an end-to-end workflow for TR characterization, visualization, and analysis across multiple genomes. In a comparison with existing tools based on long-read sequencing data from both Oxford Nanopore Technology (ONT, Simplex and Duplex) and Pacific Biosciences (PacBio, Sequel II and Revio), otter and TREAT achieve state-of-the-art genotyping and motif characterization accuracy. Applied to clinically relevant TRs, TREAT/otter significantly identify individuals with pathogenic TR expansions. When applied to a case-control setting, we replicate previously reported associations of TRs with Alzheimer's disease, including those near or within APOC1 (P = 2.63 × 10⁻⁹), SPI1 (P = 6.5 × 10⁻³), and ABCA7 (P = 0.04) genes. Finally, we use TREAT/otter to systematically evaluate potential biases when genotyping TRs using diverse ONT and PacBio long-read sequencing data sets. We show that, in rare cases (0.06%), long-read sequencing suffers from coverage drops in TRs, including the disease-associated TRs in ABCA7 and RFC1 genes. Such coverage drops can lead to TR misgenotyping, hampering the accurate characterization of TR alleles. Taken together, our tools can accurately genotype TRs across different sequencing technologies and with minimal requirements, allowing end-to-end analysis and comparisons of TRs in human genomes, with broad applications in research and clinical fields.
Affiliation(s)
- Niccoló Tesi
- Section Genomics of Neurodegenerative Diseases and Aging, Department of Clinical Genetics, Vrije Universiteit Amsterdam, Amsterdam UMC, 1081HV Amsterdam, The Netherlands
- Department of Neurology, Alzheimer Center Amsterdam, Amsterdam Neuroscience, Vrije Universiteit Amsterdam, Amsterdam UMC, 1081HV Amsterdam, The Netherlands
- Delft Bioinformatics Lab, Delft University of Technology, 2628CD Delft, The Netherlands
| | - Alex Salazar
- Section Genomics of Neurodegenerative Diseases and Aging, Department of Clinical Genetics, Vrije Universiteit Amsterdam, Amsterdam UMC, 1081HV Amsterdam, The Netherlands
| | - Yaran Zhang
- Section Genomics of Neurodegenerative Diseases and Aging, Department of Clinical Genetics, Vrije Universiteit Amsterdam, Amsterdam UMC, 1081HV Amsterdam, The Netherlands
| | - Sven van der Lee
- Section Genomics of Neurodegenerative Diseases and Aging, Department of Clinical Genetics, Vrije Universiteit Amsterdam, Amsterdam UMC, 1081HV Amsterdam, The Netherlands
- Department of Neurology, Alzheimer Center Amsterdam, Amsterdam Neuroscience, Vrije Universiteit Amsterdam, Amsterdam UMC, 1081HV Amsterdam, The Netherlands
| | - Marc Hulsman
- Section Genomics of Neurodegenerative Diseases and Aging, Department of Clinical Genetics, Vrije Universiteit Amsterdam, Amsterdam UMC, 1081HV Amsterdam, The Netherlands
- Department of Neurology, Alzheimer Center Amsterdam, Amsterdam Neuroscience, Vrije Universiteit Amsterdam, Amsterdam UMC, 1081HV Amsterdam, The Netherlands
- Delft Bioinformatics Lab, Delft University of Technology, 2628CD Delft, The Netherlands
| | - Lydian Knoop
- Section Genomics of Neurodegenerative Diseases and Aging, Department of Clinical Genetics, Vrije Universiteit Amsterdam, Amsterdam UMC, 1081HV Amsterdam, The Netherlands
| | - Sanduni Wijesekera
- Section Genomics of Neurodegenerative Diseases and Aging, Department of Clinical Genetics, Vrije Universiteit Amsterdam, Amsterdam UMC, 1081HV Amsterdam, The Netherlands
| | - Jana Krizova
- Section Genomics of Neurodegenerative Diseases and Aging, Department of Clinical Genetics, Vrije Universiteit Amsterdam, Amsterdam UMC, 1081HV Amsterdam, The Netherlands
| | - Anne-Fleur Schneider
- Section Genomics of Neurodegenerative Diseases and Aging, Department of Clinical Genetics, Vrije Universiteit Amsterdam, Amsterdam UMC, 1081HV Amsterdam, The Netherlands
| | - Maartje Pennings
- Department of Genome Diagnostics, Radboud University Medical Center, 6525GA Nijmegen, The Netherlands
| | - Kristel Sleegers
- Complex Genetics of Alzheimer's Disease Group, Antwerp Center for Molecular Neurology, VIB, Antwerp B-2650, Belgium
| | - Erik-Jan Kamsteeg
- Department of Genome Diagnostics, Radboud University Medical Center, 6525GA Nijmegen, The Netherlands
| | - Marcel Reinders
- Delft Bioinformatics Lab, Delft University of Technology, 2628CD Delft, The Netherlands
| | - Henne Holstege
- Section Genomics of Neurodegenerative Diseases and Aging, Department of Clinical Genetics, Vrije Universiteit Amsterdam, Amsterdam UMC, 1081HV Amsterdam, The Netherlands
- Department of Neurology, Alzheimer Center Amsterdam, Amsterdam Neuroscience, Vrije Universiteit Amsterdam, Amsterdam UMC, 1081HV Amsterdam, The Netherlands
- Delft Bioinformatics Lab, Delft University of Technology, 2628CD Delft, The Netherlands
| |
|
24
|
De Coster W, Höijer I, Bruggeman I, D'Hert S, Melin M, Ameur A, Rademakers R. Visualization and analysis of medically relevant tandem repeats in nanopore sequencing of control cohorts with pathSTR. Genome Res 2024; 34:2074-2080. [PMID: 39147583 DOI: 10.1101/gr.279265.124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2024] [Accepted: 08/02/2024] [Indexed: 08/17/2024]
Abstract
The lack of population-scale databases hampers research and diagnostics for medically relevant tandem repeats and repeat expansions. We attempt to fill this gap using our pathSTR web tool, which leverages long-read sequencing of large cohorts to determine repeat length and sequence composition in a healthy population. The current version includes 1040 individuals of The 1000 Genomes Project cohort sequenced on the Oxford Nanopore Technologies PromethION. A comprehensive set of medically relevant tandem repeats has been genotyped using STRdust and LongTR to determine the tandem repeat length and sequence composition. PathSTR provides rich visualizations of this data set and a feature for uploading one's own data for comparison alongside the control cohort. We demonstrate the implementation of this application using data from targeted nanopore sequencing of a patient with myotonic dystrophy type 1. This resource will empower the genetics community to get a more complete overview of normal variation in tandem repeat length and sequence composition and, as such, enable a better assessment of rare tandem repeat alleles observed in patients.
Affiliation(s)
- Wouter De Coster
- Applied and Translational Neurogenomics Group, VIB Center for Molecular Neurology, VIB, 2610 Antwerp, Belgium
- Department of Biomedical Sciences, University of Antwerp, 2610 Antwerp, Belgium
| | - Ida Höijer
- Department of Immunology, Genetics and Pathology, SciLifeLab, Uppsala University, 751 85 Uppsala, Sweden
| | - Inge Bruggeman
- Department of Biomedical Sciences, University of Antwerp, 2610 Antwerp, Belgium
| | - Svenn D'Hert
- Department of Biomedical Sciences, University of Antwerp, 2610 Antwerp, Belgium
- Neuromics Support Facility, VIB Center for Molecular Neurology, VIB, 2610 Antwerp, Belgium
| | - Malin Melin
- Department of Immunology, Genetics and Pathology, SciLifeLab, Uppsala University, 751 85 Uppsala, Sweden
| | - Adam Ameur
- Department of Immunology, Genetics and Pathology, SciLifeLab, Uppsala University, 751 85 Uppsala, Sweden
| | - Rosa Rademakers
- Applied and Translational Neurogenomics Group, VIB Center for Molecular Neurology, VIB, 2610 Antwerp, Belgium
- Department of Biomedical Sciences, University of Antwerp, 2610 Antwerp, Belgium
| |
|
25
|
Nyaga DM, Tsai P, Gebbie C, Phua HH, Yap P, Le Quesne Stabej P, Farrow S, Rong J, Toldi G, Thorstensen E, Stark Z, Lunke S, Gamet K, Van Dyk J, Greenslade M, O'Sullivan JM. Benchmarking nanopore sequencing and rapid genomics feasibility: validation at a quaternary hospital in New Zealand. NPJ Genom Med 2024; 9:57. [PMID: 39516456 PMCID: PMC11549486 DOI: 10.1038/s41525-024-00445-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/17/2024] [Accepted: 10/29/2024] [Indexed: 11/16/2024] Open
Abstract
Approximately 200 critically ill infants and children in New Zealand are in high-dependency care, many suspected of having genetic conditions, requiring scalable genomic testing. We adopted an acute care genomics protocol from an accredited laboratory and established a clinical pipeline using Oxford Nanopore Technologies PromethION 2 solo system and Fabric GEM™ software. Benchmarking of the pipeline was performed using Global Alliance for Genomics and Health benchmarking tools and Genome in a Bottle samples (HG002-HG007). Evaluation of single nucleotide variants resulted in a precision and recall of 0.997 and 0.992, respectively. Small indel identification approached a precision of 0.922 and recall of 0.838. Large genomic variations from Coriell Copy Number Variation Reference Panel 1 were reliably detected with ~2 M long reads. Finally, we present results obtained from fourteen trio samples, ten of which were processed in parallel with a clinically accredited short-read rapid genomic testing pipeline (Newborn Genomics Programme; NCT06081075; 2023-10-12).
Affiliation(s)
- Denis M Nyaga
- Liggins Institute, The University of Auckland, Auckland, New Zealand
| | - Peter Tsai
- Liggins Institute, The University of Auckland, Auckland, New Zealand
- Molecular Medicine and Pathology, The University of Auckland, Auckland, New Zealand
| | - Clare Gebbie
- Liggins Institute, The University of Auckland, Auckland, New Zealand
| | - Hui Hui Phua
- Liggins Institute, The University of Auckland, Auckland, New Zealand
| | - Patrick Yap
- Genetic Health Service New Zealand-Northern Hub, Te Toka Tumai, Auckland, New Zealand
| | - Polona Le Quesne Stabej
- Liggins Institute, The University of Auckland, Auckland, New Zealand
- Molecular Medicine and Pathology, The University of Auckland, Auckland, New Zealand
| | - Sophie Farrow
- Liggins Institute, The University of Auckland, Auckland, New Zealand
| | - Jing Rong
- Liggins Institute, The University of Auckland, Auckland, New Zealand
| | - Gergely Toldi
- Liggins Institute, The University of Auckland, Auckland, New Zealand
- Starship Child Health, Te Whatu Ora Te Toka Tumai, Auckland, New Zealand
| | - Eric Thorstensen
- Liggins Institute, The University of Auckland, Auckland, New Zealand
| | - Zornitza Stark
- Victorian Clinical Genetics Services, Murdoch Children's Research Institute, Parkville, Melbourne, Australia
- Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, Melbourne, Australia
| | - Sebastian Lunke
- Victorian Clinical Genetics Services, Murdoch Children's Research Institute, Parkville, Melbourne, Australia
- Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, Melbourne, Australia
| | - Kimberley Gamet
- Genetic Health Service New Zealand-Northern Hub, Te Toka Tumai, Auckland, New Zealand
| | - Jodi Van Dyk
- Liggins Institute, The University of Auckland, Auckland, New Zealand
| | - Mark Greenslade
- Diagnostic Genetics, Department of Pathology and Laboratory Medicine, Te Toka Tumai, Auckland, New Zealand
| |
|
26
|
Bonfiglio F, Legati A, Lasorsa VA, Palombo F, De Riso G, Isidori F, Russo S, Furini S, Merla G, Coppedè F, Tartaglia M, Bruselles A, Pippucci T, Ciolfi A, Pinelli M, Capasso M. Best practices for germline variant and DNA methylation analysis of second- and third-generation sequencing data. Hum Genomics 2024; 18:120. [PMID: 39501379 PMCID: PMC11536923 DOI: 10.1186/s40246-024-00684-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2024] [Accepted: 10/11/2024] [Indexed: 11/09/2024] Open
Abstract
This comprehensive review provides insights and suggested strategies for the analysis of germline variants using second- and third-generation sequencing technologies (SGS and TGS). It addresses the critical stages of data processing, starting from alignment and preprocessing to quality control, variant calling, and the removal of artifacts. The document emphasizes the importance of meticulous data handling, highlighting advanced methodologies for annotating variants and identifying structural variations and methylated DNA sites. Special attention is given to the inspection of problematic variants, a step that is crucial for ensuring the accuracy of the analysis, particularly in clinical settings where genetic diagnostics can inform patient care. Additionally, the document covers the use of various bioinformatics tools and software that enhance the precision and reliability of these analyses. It outlines best practices for the annotation of variants, including considerations for problematic genetic alterations such as those in the human leukocyte antigen region, runs of homozygosity, and mitochondrial DNA alterations. The document also explores the complexities associated with identifying structural variants and copy number variations, underscoring the challenges posed by these large-scale genomic alterations. The objective is to offer a comprehensive framework for researchers and clinicians, ensuring that genetic analyses conducted with SGS and TGS are both accurate and reproducible. By following these best practices, the document aims to increase the diagnostic accuracy for hereditary diseases, facilitating early diagnosis, prevention, and personalized treatment strategies. This review serves as a valuable resource for both novices and experts in the field, providing insights into the latest advancements and methodologies in genetic analysis. It also aims to encourage the adoption of these practices in diverse research and clinical contexts, promoting consistency and reliability across studies.
Affiliation(s)
- Ferdinando Bonfiglio
- Department of Molecular Medicine and Medical Biotechnology, University of Naples Federico II, Naples, Italy
- CEINGE Advanced Biotechnology Franco Salvatore, Naples, Italy
- Andrea Legati
- Fondazione IRCCS Istituto Neurologico Carlo Besta, Milan, Italy
- Flavia Palombo
- Programma Di Neurogenetica, IRCCS Istituto Delle Scienze Neurologiche Di Bologna, Bologna, Italy
- Giulia De Riso
- Department of Molecular Medicine and Medical Biotechnology, University of Naples Federico II, Naples, Italy
- CEINGE Advanced Biotechnology Franco Salvatore, Naples, Italy
- Federica Isidori
- IRCCS Azienda Ospedaliero-Universitaria Di Bologna, Bologna, Italy
- Silvia Russo
- Research Laboratory of Medical Cytogenetics and Molecular Genetics, IRCCS Istituto Auxologico Italiano, Milan, Italy
- Laboratorio di Ricerca di Citogenetica Medica e Genetica Molecolare, Istituto Auxologico Italiano, IRCCS, 20145, Milano, Italy
- Simone Furini
- Department of Electrical, Electronic and Information Engineering "Guglielmo Marconi", University of Bologna, Bologna, Italy
- Giuseppe Merla
- Department of Molecular Medicine and Medical Biotechnology, University of Naples Federico II, Naples, Italy
- Fabio Coppedè
- Department of Translational Research and of New Surgical and Medical Technologies, University of Pisa, Pisa, Italy
- Marco Tartaglia
- Molecular Genetics and Functional Genomics, Bambino Gesù Children's Hospital, IRCCS, Rome, Italy
- Alessandro Bruselles
- Department of Oncology and Molecular Medicine, Istituto Superiore Di Sanità, Rome, Italy
- Tommaso Pippucci
- IRCCS Azienda Ospedaliero-Universitaria Di Bologna, Bologna, Italy
- Andrea Ciolfi
- Molecular Genetics and Functional Genomics, Bambino Gesù Children's Hospital, IRCCS, Rome, Italy
- Michele Pinelli
- Department of Molecular Medicine and Medical Biotechnology, University of Naples Federico II, Naples, Italy
- CEINGE Advanced Biotechnology Franco Salvatore, Naples, Italy
- Mario Capasso
- Department of Molecular Medicine and Medical Biotechnology, University of Naples Federico II, Naples, Italy.
- CEINGE Advanced Biotechnology Franco Salvatore, Naples, Italy.
27
Dolzhenko E, English A, Dashnow H, De Sena Brandine G, Mokveld T, Rowell WJ, Karniski C, Kronenberg Z, Danzi MC, Cheung WA, Bi C, Farrow E, Wenger A, Chua KP, Martínez-Cerdeño V, Bartley TD, Jin P, Nelson DL, Zuchner S, Pastinen T, Quinlan AR, Sedlazeck FJ, Eberle MA. Characterization and visualization of tandem repeats at genome scale. Nat Biotechnol 2024; 42:1606-1614. [PMID: 38168995 PMCID: PMC11921810 DOI: 10.1038/s41587-023-02057-3] [Citation(s) in RCA: 19] [Impact Index Per Article: 19.0] [Received: 05/11/2023] [Accepted: 11/06/2023] [Indexed: 01/05/2024]
Abstract
Tandem repeat (TR) variation is associated with gene expression changes and numerous rare monogenic diseases. Although long-read sequencing provides accurate full-length sequences and methylation of TRs, there is still a need for computational methods to profile TRs across the genome. Here we introduce the Tandem Repeat Genotyping Tool (TRGT) and an accompanying TR database. TRGT determines the consensus sequences and methylation levels of specified TRs from PacBio HiFi sequencing data. It also reports reads that support each repeat allele. These reads can be subsequently visualized with a companion TR visualization tool. Assessing 937,122 TRs, TRGT showed a Mendelian concordance of 98.38%, allowing a single repeat unit difference. In six samples with known repeat expansions, TRGT detected all expansions while also identifying methylation signals and mosaicism and providing finer repeat length resolution than existing methods. Additionally, we released a database with allele sequences and methylation levels for 937,122 TRs across 100 genomes.
Affiliation(s)
- Adam English
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Harriet Dashnow
- Departments of Human Genetics and Biomedical Informatics, University of Utah, Salt Lake City, UT, USA
- Tom Mokveld
- Pacific Biosciences of California, Menlo Park, CA, USA
- Matt C Danzi
- Dr. John T. Macdonald Foundation Department of Human Genetics and John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, USA
- Warren A Cheung
- Genomic Medicine Center, Children's Mercy Kansas City, Kansas City, MO, USA
- Chengpeng Bi
- Genomic Medicine Center, Children's Mercy Kansas City, Kansas City, MO, USA
- Emily Farrow
- Genomic Medicine Center, Children's Mercy Kansas City, Kansas City, MO, USA
- Aaron Wenger
- Pacific Biosciences of California, Menlo Park, CA, USA
- Khi Pin Chua
- Pacific Biosciences of California, Menlo Park, CA, USA
- Verónica Martínez-Cerdeño
- Institute for Pediatric Regenerative Medicine, Shriner's Hospital for Children and UC Davis School of Medicine, Sacramento, CA, USA
- Department of Pathology & Laboratory Medicine, UC Davis School of Medicine, Sacramento, CA, USA
- MIND Institute, UC Davis School of Medicine, Sacramento, CA, USA
- Trevor D Bartley
- Institute for Pediatric Regenerative Medicine, Shriner's Hospital for Children and UC Davis School of Medicine, Sacramento, CA, USA
- Department of Pathology & Laboratory Medicine, UC Davis School of Medicine, Sacramento, CA, USA
- Peng Jin
- Department of Human Genetics, Emory University School of Medicine, Atlanta, GA, USA
- David L Nelson
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Stephan Zuchner
- Dr. John T. Macdonald Foundation Department of Human Genetics and John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, USA
- Tomi Pastinen
- Genomic Medicine Center, Children's Mercy Kansas City, Kansas City, MO, USA
- Aaron R Quinlan
- Departments of Human Genetics and Biomedical Informatics, University of Utah, Salt Lake City, UT, USA
- Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Department of Computer Science, Rice University, Houston, TX, USA
28
Zheng Y, Shang X. FindCSV: a long-read based method for detecting complex structural variations. BMC Bioinformatics 2024; 25:315. [PMID: 39342151 PMCID: PMC11439270 DOI: 10.1186/s12859-024-05937-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2024] [Accepted: 09/18/2024] [Indexed: 10/01/2024]
Abstract
BACKGROUND Structural variations play a significant role in genetic diseases and evolutionary mechanisms. Extensive research has been conducted over the past decade to detect simple structural variations, leading to the development of well-established detection methods. However, recent studies have highlighted the potentially greater impact of complex structural variations on individuals compared to simple structural variations. Despite this, the field still lacks precise detection methods specifically designed for complex structural variations. Therefore, the development of a highly efficient and accurate detection method is of utmost importance. RESULT In response to this need, we propose a novel method called FindCSV, which leverages deep learning techniques and consensus sequences to enhance the detection of SVs using long-read sequencing data. Compared to current methods, FindCSV performs better in detecting complex and simple structural variations. CONCLUSIONS FindCSV is a new method to detect complex and simple structural variations with reasonable accuracy in real and simulated data. The source code for the program is available at https://github.com/nwpuzhengyan/FindCSV .
Affiliation(s)
- Yan Zheng
- School of Computer Science, Northwestern Polytechnical University, West Youyi Road 127, Xi'an, 710072, China.
- Xuequn Shang
- School of Computer Science, Northwestern Polytechnical University, West Youyi Road 127, Xi'an, 710072, China.
29
Zheng ZH, Cao CY, Cheng B, Yuan RY, Zeng YH, Guo ZB, Qiu YS, Lv WQ, Liang H, Li JL, Zhang WX, Fang MK, Sun YH, Lin W, Hong JM, Gan SR, Wang N, Chen WJ, Du GQ, Fang L. Characteristics of tandem repeat inheritance and sympathetic nerve involvement in GAA-FGF14 ataxia. J Hum Genet 2024; 69:433-440. [PMID: 38866925 DOI: 10.1038/s10038-024-01262-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2024] [Revised: 05/12/2024] [Accepted: 05/28/2024] [Indexed: 06/14/2024]
Abstract
BACKGROUND Intronic GAA repeat expansion ([GAA] ≥250) in FGF14 is associated with the late-onset neurodegenerative disorder, spinocerebellar ataxia 27B (SCA27B, GAA-FGF14 ataxia). We aim to determine the prevalence of the GAA repeat expansion in FGF14 in Chinese populations presenting late-onset cerebellar ataxia (LOCA) and evaluate the characteristics of tandem repeat inheritance, radiological features and sympathetic nerve involvement. METHODS GAA-FGF14 repeat expansion was screened in an undiagnosed LOCA cohort (n = 664) and variations in repeat-length were analyzed in families of confirmed GAA-FGF14 ataxia patients. Brain magnetic resonance imaging (MRI) was used to evaluate the radiological features in GAA-FGF14 ataxia patients. Clinical examinations and sympathetic skin response (SSR) recordings in GAA-FGF14 patients (n = 16) were used to quantify sympathetic nerve involvement. RESULTS Two unrelated probands (2/664) were identified. Genetic screening for GAA-FGF14 repeat expansion was performed in 39 family members, 16 of whom were genetically diagnosed with GAA-FGF14 ataxia. Familial screening revealed expansion of GAA repeats in maternal transmissions, but contraction upon paternal transmission. Brain MRI showed slight to moderate cerebellar atrophy. SSR amplitude was lower in GAA-FGF14 patients in the pre-symptomatic stage compared to healthy controls, and further decreased in the symptomatic stage. CONCLUSIONS GAA-FGF14 ataxia was rare among Chinese LOCA cases. Parental gender appears to affect variability in GAA repeat number between generations. Reduced SSR amplitude is a prominent feature in GAA-FGF14 patients, even in the pre-symptomatic stage.
Affiliation(s)
- Ze-Hong Zheng
- Department of Neurology and Institute of Neurology of First Affiliated Hospital, Institute of Neuroscience, and Fujian Key Laboratory of Molecular Neurology, Fujian Medical University, Fuzhou, 350005, China
- Chun-Yan Cao
- The First Affiliated Hospital, College of Clinical Medicine of Henan University of Science and Technology, Luoyang, 471003, China
- Bi Cheng
- Department of Neurology and Institute of Neurology of First Affiliated Hospital, Institute of Neuroscience, and Fujian Key Laboratory of Molecular Neurology, Fujian Medical University, Fuzhou, 350005, China
- Ru-Ying Yuan
- Department of Neurology and Institute of Neurology of First Affiliated Hospital, Institute of Neuroscience, and Fujian Key Laboratory of Molecular Neurology, Fujian Medical University, Fuzhou, 350005, China
- Yi-Heng Zeng
- Department of Neurology and Institute of Neurology of First Affiliated Hospital, Institute of Neuroscience, and Fujian Key Laboratory of Molecular Neurology, Fujian Medical University, Fuzhou, 350005, China
- Zhang-Bao Guo
- Department of Neurology and Institute of Neurology of First Affiliated Hospital, Institute of Neuroscience, and Fujian Key Laboratory of Molecular Neurology, Fujian Medical University, Fuzhou, 350005, China
- Yu-Sen Qiu
- Department of Neurology and Institute of Neurology of First Affiliated Hospital, Institute of Neuroscience, and Fujian Key Laboratory of Molecular Neurology, Fujian Medical University, Fuzhou, 350005, China
- Wen-Qi Lv
- Department of Neurology and Institute of Neurology of First Affiliated Hospital, Institute of Neuroscience, and Fujian Key Laboratory of Molecular Neurology, Fujian Medical University, Fuzhou, 350005, China
- Hui Liang
- Department of Neurology and Institute of Neurology of First Affiliated Hospital, Institute of Neuroscience, and Fujian Key Laboratory of Molecular Neurology, Fujian Medical University, Fuzhou, 350005, China
- Jin-Lan Li
- Department of Neurology and Institute of Neurology of First Affiliated Hospital, Institute of Neuroscience, and Fujian Key Laboratory of Molecular Neurology, Fujian Medical University, Fuzhou, 350005, China
- Wei-Xiong Zhang
- The First Affiliated Hospital, College of Clinical Medicine of Henan University of Science and Technology, Luoyang, 471003, China
- Min-Kun Fang
- Department of Neurology and Institute of Neurology of First Affiliated Hospital, Institute of Neuroscience, and Fujian Key Laboratory of Molecular Neurology, Fujian Medical University, Fuzhou, 350005, China
- Yu-Hao Sun
- Department of Neurology and Institute of Neurology of First Affiliated Hospital, Institute of Neuroscience, and Fujian Key Laboratory of Molecular Neurology, Fujian Medical University, Fuzhou, 350005, China
- Wei Lin
- Department of Neurology and Institute of Neurology of First Affiliated Hospital, Institute of Neuroscience, and Fujian Key Laboratory of Molecular Neurology, Fujian Medical University, Fuzhou, 350005, China
- Jing-Mei Hong
- Department of Neurology and Institute of Neurology of First Affiliated Hospital, Institute of Neuroscience, and Fujian Key Laboratory of Molecular Neurology, Fujian Medical University, Fuzhou, 350005, China
- Shi-Rui Gan
- Department of Neurology and Institute of Neurology of First Affiliated Hospital, Institute of Neuroscience, and Fujian Key Laboratory of Molecular Neurology, Fujian Medical University, Fuzhou, 350005, China
- Ning Wang
- Department of Neurology and Institute of Neurology of First Affiliated Hospital, Institute of Neuroscience, and Fujian Key Laboratory of Molecular Neurology, Fujian Medical University, Fuzhou, 350005, China
- Wan-Jin Chen
- Department of Neurology and Institute of Neurology of First Affiliated Hospital, Institute of Neuroscience, and Fujian Key Laboratory of Molecular Neurology, Fujian Medical University, Fuzhou, 350005, China
- Gan-Qin Du
- The First Affiliated Hospital, College of Clinical Medicine of Henan University of Science and Technology, Luoyang, 471003, China.
- Ling Fang
- Department of Neurology and Institute of Neurology of First Affiliated Hospital, Institute of Neuroscience, and Fujian Key Laboratory of Molecular Neurology, Fujian Medical University, Fuzhou, 350005, China.
30
Lewis SA, Ruttenberg A, Iyiyol T, Kong N, Jin SC, Kruer MC. Potential clinical applications of advanced genomic analysis in cerebral palsy. EBioMedicine 2024; 106:105229. [PMID: 38970919 PMCID: PMC11282942 DOI: 10.1016/j.ebiom.2024.105229] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2024] [Revised: 04/26/2024] [Accepted: 06/20/2024] [Indexed: 07/08/2024]
Abstract
Cerebral palsy (CP) has historically been attributed to acquired insults, but emerging research suggests that genetic variations are also important causes of CP. While microarray and whole-exome sequencing based studies have been the primary methods for establishing new CP-gene relationships and providing a genetic etiology for individual patients, the cause of their condition remains unknown for many patients with CP. Recent advancements in genomic technologies offer additional opportunities to uncover variations in human genomes, transcriptomes, and epigenomes that have previously escaped detection. In this review, we outline the use of these state-of-the-art technologies to address the molecular diagnostic challenges experienced by individuals with CP. We also explore the importance of identifying a molecular etiology whenever possible, given the potential for genomic medicine to provide opportunities to treat patients with CP in new and more precise ways.
Affiliation(s)
- Sara A Lewis
- Pediatric Movement Disorders Program, Barrow Neurological Institute, Phoenix Children's Hospital, Phoenix, AZ, United States; Departments of Child Health, Neurology, and Cellular & Molecular Medicine and Program in Genetics, University of Arizona College of Medicine, Phoenix, AZ, United States
- Andrew Ruttenberg
- Department of Genetics, Washington University School of Medicine, St. Louis, MO, United States
- Tuğçe Iyiyol
- Department of Genetics, Washington University School of Medicine, St. Louis, MO, United States
- Nahyun Kong
- Department of Genetics, Washington University School of Medicine, St. Louis, MO, United States
- Sheng Chih Jin
- Department of Genetics, Washington University School of Medicine, St. Louis, MO, United States; Department of Pediatrics, Washington University School of Medicine, St. Louis, MO, United States.
- Michael C Kruer
- Pediatric Movement Disorders Program, Barrow Neurological Institute, Phoenix Children's Hospital, Phoenix, AZ, United States; Departments of Child Health, Neurology, and Cellular & Molecular Medicine and Program in Genetics, University of Arizona College of Medicine, Phoenix, AZ, United States; Programs in Neuroscience and Molecular & Cellular Biology, School of Life Sciences, Arizona State University, Tempe, AZ, United States.
31
Ziaei Jam H, Zook JM, Javadzadeh S, Park J, Sehgal A, Gymrek M. LongTR: genome-wide profiling of genetic variation at tandem repeats from long reads. Genome Biol 2024; 25:176. [PMID: 38965568 PMCID: PMC11229021 DOI: 10.1186/s13059-024-03319-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2024] [Accepted: 06/21/2024] [Indexed: 07/06/2024]
Abstract
Tandem repeats are frequent across the human genome, and variation in repeat length has been linked to a variety of traits. Recent improvements in long read sequencing technologies have the potential to greatly improve tandem repeat analysis, especially for long or complex repeats. Here, we introduce LongTR, which accurately genotypes tandem repeats from high-fidelity long reads available from both PacBio and Oxford Nanopore Technologies. LongTR is freely available at https://github.com/gymrek-lab/longtr and https://zenodo.org/doi/10.5281/zenodo.11403979 .
Affiliation(s)
- Helyaneh Ziaei Jam
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
- Justin M Zook
- Material Measurement Laboratory, National Institute of Standards and Technology, 100 Bureau Dr, Gaithersburg, MD, USA
- Sara Javadzadeh
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
- Jonghun Park
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
- Aarushi Sehgal
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
- Melissa Gymrek
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA.
- Department of Medicine, University of California San Diego, La Jolla, CA, USA.
32
Tanudisastro HA, Deveson IW, Dashnow H, MacArthur DG. Sequencing and characterizing short tandem repeats in the human genome. Nat Rev Genet 2024; 25:460-475. [PMID: 38366034 DOI: 10.1038/s41576-024-00692-3] [Citation(s) in RCA: 34] [Impact Index Per Article: 34.0] [Accepted: 12/06/2023] [Indexed: 02/18/2024]
Abstract
Short tandem repeats (STRs) are highly polymorphic sequences throughout the human genome that are composed of repeated copies of a 1-6-bp motif. Over 1 million variable STR loci are known, some of which regulate gene expression and influence complex traits, such as height. Moreover, variants in at least 60 STR loci cause genetic disorders, including Huntington disease and fragile X syndrome. Accurately identifying and genotyping STR variants is challenging, in particular mapping short reads to repetitive regions and inferring expanded repeat lengths. Recent advances in sequencing technology and computational tools for STR genotyping from sequencing data promise to help overcome this challenge and solve genetically unresolved cases and the 'missing heritability' of polygenic traits. Here, we compare STR genotyping methods, analytical tools and their applications to understand the effect of STR variation on health and disease. We identify emergent opportunities to refine genotyping and quality-control approaches as well as to integrate STRs into variant-calling workflows and large cohort analyses.
Affiliation(s)
- Hope A Tanudisastro
- Centre for Population Genomics, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
- Centre for Population Genomics, Murdoch Children's Research Institute, Melbourne, Victoria, Australia
- Faculty of Medicine and Health, University of New South Wales, Sydney, New South Wales, Australia
- Faculty of Medicine and Health, University of Sydney, Sydney, New South Wales, Australia
- Ira W Deveson
- Faculty of Medicine and Health, University of New South Wales, Sydney, New South Wales, Australia
- Genomics and Inherited Disease Program, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
- Harriet Dashnow
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA.
- Daniel G MacArthur
- Centre for Population Genomics, Garvan Institute of Medical Research, Sydney, New South Wales, Australia.
- Centre for Population Genomics, Murdoch Children's Research Institute, Melbourne, Victoria, Australia.
- Faculty of Medicine and Health, University of New South Wales, Sydney, New South Wales, Australia.
33
Rajan-Babu IS, Dolzhenko E, Eberle MA, Friedman JM. Sequence composition changes in short tandem repeats: heterogeneity, detection, mechanisms and clinical implications. Nat Rev Genet 2024; 25:476-499. [PMID: 38467784 DOI: 10.1038/s41576-024-00696-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 01/19/2024] [Indexed: 03/13/2024]
Abstract
Short tandem repeats (STRs) are a class of repetitive elements, composed of tandem arrays of 1-6 base pair sequence motifs, that comprise a substantial fraction of the human genome. STR expansions can cause a wide range of neurological and neuromuscular conditions, known as repeat expansion disorders, whose age of onset, severity, penetrance and/or clinical phenotype are influenced by the length of the repeats and their sequence composition. The presence of non-canonical motifs, depending on the type, frequency and position within the repeat tract, can alter clinical outcomes by modifying somatic and intergenerational repeat stability, gene expression and mutant transcript-mediated and/or protein-mediated toxicities. Here, we review the diverse structural conformations of repeat expansions, technological advances for the characterization of changes in sequence composition, their clinical correlations and the impact on disease mechanisms.
Affiliation(s)
- Indhu-Shree Rajan-Babu
- Department of Medical Genetics, The University of British Columbia, and Children's & Women's Hospital, Vancouver, British Columbia, Canada.
- Jan M Friedman
- Department of Medical Genetics, The University of British Columbia, and Children's & Women's Hospital, Vancouver, British Columbia, Canada
- BC Children's Hospital Research Institute, Vancouver, British Columbia, Canada
34
Chiu R, Rajan-Babu IS, Friedman JM, Birol I. A comprehensive tandem repeat catalog of the human genome. medRxiv 2024:2024.06.19.24309173. [PMID: 38947075 PMCID: PMC11213036 DOI: 10.1101/2024.06.19.24309173] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/02/2024]
Abstract
With the increasing availability of long-read sequencing data, high-quality human genome assemblies, and software for fully characterizing tandem repeats, genome-wide genotyping of tandem repeat loci on a population scale becomes more feasible. Such efforts not only expand our knowledge of the tandem repeat landscape in the human genome but also enhance our ability to differentiate pathogenic tandem repeat mutations from benign polymorphisms. To this end, we analyzed 272 genomes assembled using datasets from three public initiatives that employed different long-read sequencing technologies. Here, we report a catalog of over 18 million tandem repeat loci, many of which were previously unannotated. Some of these loci are highly polymorphic, and many of them reside within coding sequences.
Affiliation(s)
- Readman Chiu
- Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC V5Z 4S6, Canada
- Indhu-Shree Rajan-Babu
- Department of Medical Genetics, University of British Columbia, Vancouver, BC V5Z 4H4, Canada
- Jan M Friedman
- Department of Medical Genetics, University of British Columbia, Vancouver, BC V5Z 4H4, Canada
- BC Children's Hospital Research Institute, Vancouver, BC V5Z 4H4, Canada
- Inanc Birol
- Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC V5Z 4S6, Canada
- Department of Medical Genetics, University of British Columbia, Vancouver, BC V5Z 4H4, Canada
35
Lee S, Yoon JG, Hong J, Kim T, Kim N, Vandrovcova J, Yau WY, Cho J, Kim S, Kim MJ, Kim SY, Lee ST, Chu K, Lee SK, Kim HJ, Choi J, Moon J, Chae JH. Prevalence and Characterization of NOTCH2NLC GGC Repeat Expansions in Koreans: From a Hospital Cohort Analysis to a Population-Wide Study. Neurol Genet 2024; 10:e200147. [PMID: 38779172 PMCID: PMC11110025 DOI: 10.1212/nxg.0000000000200147] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2023] [Accepted: 02/16/2024] [Indexed: 05/25/2024]
Abstract
Background and Objectives GGC repeat expansions in the NOTCH2NLC gene are associated with a broad spectrum of progressive neurologic disorders, notably, neuronal intranuclear inclusion disease (NIID). We aimed to investigate the population-wide prevalence and clinical manifestations of NOTCH2NLC-related disorders in Koreans. Methods We conducted a study using 2 different cohorts from the Korean population. Patients with available brain MRI scans from Seoul National University Hospital (SNUH) were thoroughly reviewed, and NIID-suspected patients presenting the zigzag edging signs underwent genetic evaluation for NOTCH2NLC repeats by Cas9-mediated nanopore sequencing. In addition, we analyzed whole-genome sequencing data from 3,887 individuals in the Korea Biobank cohort to estimate the distribution of the repeat counts in Koreans and to identify putative patients with expanded alleles and neurologic phenotypes. Results In the SNUH cohort, among 90 adult-onset leukoencephalopathy patients with unknown etiologies, we found 20 patients with zigzag edging signs. Except for 2 diagnosed with fragile X-associated tremor/ataxia syndrome and 2 with unavailable samples, all 16 patients (17.8%) were diagnosed with NIID (repeat range: 87-217). By analyzing the Korea Biobank cohort, we estimated the distribution of repeat counts and threshold (>64) for Koreans, identifying 6 potential patients with NIID. Furthermore, long-read sequencing enabled the elucidation of transmission and epigenetic patterns of NOTCH2NLC repeats within a family affected by pediatric-onset NIID. Discussion This study presents the population-wide distribution of NOTCH2NLC repeats and the estimated prevalence of NIID in Koreans, providing valuable insights into the association between repeat counts and disease manifestations in diverse neurologic disorders.
Affiliation(s)
- Taekeun Kim
- From the Department of Genomic Medicine (S.L., J.G.Y., Jaeso Cho, S.K., M.J.K., S.Y.K., J.M., J.-H.C.), Seoul National University Hospital; Department of Pediatrics (S.L., Jaeso Cho, S.Y.K., J.-H.C.), Seoul National University College of Medicine, Seoul National University Children's Hospital; Department of Biomedical Sciences (J.H., T.K., Jungmin Choi), Korea University College of Medicine; Department of Neurology (N.K., S.-T.L., K.C., S.K.L., H.-J.K., J.M.), Seoul National University Hospital, Korea; Department of Neuromuscular Diseases (J.V.), Institute of Neurology, University College London, United Kingdom; Perron Institute for Neurological and Translational Science (W.Y.Y.), the University of Western Australia, Nedlands, Australia; and Department of Laboratory Medicine (M.J.K.), Seoul National University Hospital, Korea
- Narae Kim
- From the Department of Genomic Medicine (S.L., J.G.Y., Jaeso Cho, S.K., M.J.K., S.Y.K., J.M., J.-H.C.), Seoul National University Hospital; Department of Pediatrics (S.L., Jaeso Cho, S.Y.K., J.-H.C.), Seoul National University College of Medicine, Seoul National University Children's Hospital; Department of Biomedical Sciences (J.H., T.K., Jungmin Choi), Korea University College of Medicine; Department of Neurology (N.K., S.-T.L., K.C., S.K.L., H.-J.K., J.M.), Seoul National University Hospital, Korea; Department of Neuromuscular Diseases (J.V.), Institute of Neurology, University College London, United Kingdom; Perron Institute for Neurological and Translational Science (W.Y.Y.), the University of Western Australia, Nedlands, Australia; and Department of Laboratory Medicine (M.J.K.), Seoul National University Hospital, Korea
- Jana Vandrovcova
- From the Department of Genomic Medicine (S.L., J.G.Y., Jaeso Cho, S.K., M.J.K., S.Y.K., J.M., J.-H.C.), Seoul National University Hospital; Department of Pediatrics (S.L., Jaeso Cho, S.Y.K., J.-H.C.), Seoul National University College of Medicine, Seoul National University Children's Hospital; Department of Biomedical Sciences (J.H., T.K., Jungmin Choi), Korea University College of Medicine; Department of Neurology (N.K., S.-T.L., K.C., S.K.L., H.-J.K., J.M.), Seoul National University Hospital, Korea; Department of Neuromuscular Diseases (J.V.), Institute of Neurology, University College London, United Kingdom; Perron Institute for Neurological and Translational Science (W.Y.Y.), the University of Western Australia, Nedlands, Australia; and Department of Laboratory Medicine (M.J.K.), Seoul National University Hospital, Korea
- Wai Yan Yau
- From the Department of Genomic Medicine (S.L., J.G.Y., Jaeso Cho, S.K., M.J.K., S.Y.K., J.M., J.-H.C.), Seoul National University Hospital; Department of Pediatrics (S.L., Jaeso Cho, S.Y.K., J.-H.C.), Seoul National University College of Medicine, Seoul National University Children's Hospital; Department of Biomedical Sciences (J.H., T.K., Jungmin Choi), Korea University College of Medicine; Department of Neurology (N.K., S.-T.L., K.C., S.K.L., H.-J.K., J.M.), Seoul National University Hospital, Korea; Department of Neuromuscular Diseases (J.V.), Institute of Neurology, University College London, United Kingdom; Perron Institute for Neurological and Translational Science (W.Y.Y.), the University of Western Australia, Nedlands, Australia; and Department of Laboratory Medicine (M.J.K.), Seoul National University Hospital, Korea
| | - Jaeso Cho
| | - Sheehyun Kim
| | - Man Jin Kim
| | - Soo Yeon Kim
| | - Soon-Tae Lee
| | - Kon Chu
| | - Sang Kun Lee
| | - Han-Joon Kim
|
36
|
Van Deynze K, Mumm C, Maltby CJ, Switzenberg JA, Todd PK, Boyle AP. Enhanced Detection and Genotyping of Disease-Associated Tandem Repeats Using HMMSTR and Targeted Long-Read Sequencing. medRxiv 2024:2024.05.01.24306681. [PMID: 38746091 PMCID: PMC11092683 DOI: 10.1101/2024.05.01.24306681] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/16/2024]
Abstract
Tandem repeat sequences comprise approximately 8% of the human genome and are linked to more than 50 neurodegenerative disorders. Accurate characterization of disease-associated repeat loci remains resource intensive and often lacks high resolution genotype calls. We introduce a multiplexed, targeted nanopore sequencing panel and HMMSTR, a sequence-based tandem repeat copy number caller. HMMSTR outperforms current signal- and sequence-based callers relative to two assemblies and we show it performs with high accuracy in heterozygous regions and at low read coverage. The flexible panel allows us to capture disease associated regions at an average coverage of >150x. Using these tools, we successfully characterize known or suspected repeat expansions in patient derived samples. In these samples we also identify unexpected expanded alleles at tandem repeat loci not previously associated with the underlying diagnosis. This genotyping approach for tandem repeat expansions is scalable, simple, flexible, and accurate, offering significant potential for diagnostic applications and investigation of expansion co-occurrence in neurodegenerative disorders.
|
37
|
Jam HZ, Zook JM, Javadzadeh S, Park J, Sehgal A, Gymrek M. Genome-wide profiling of genetic variation at tandem repeat from long reads. bioRxiv 2024:2024.01.20.576266. [PMID: 38328152 PMCID: PMC10849534 DOI: 10.1101/2024.01.20.576266] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 02/09/2024]
Abstract
Tandem repeats are frequent across the human genome, and variation in repeat length has been linked to a variety of traits. Recent improvements in long read sequencing technologies have the potential to greatly improve TR analysis, especially for long or complex repeats. Here we introduce LongTR, which accurately genotypes tandem repeats from high fidelity long reads available from both PacBio and Oxford Nanopore Technologies. LongTR is freely available at https://github.com/gymrek-lab/longtr.
Affiliation(s)
- Helyaneh Ziaei Jam
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
| | - Justin M. Zook
- Material Measurement Laboratory, National Institute of Standards and Technology, 100 Bureau Dr., Gaithersburg, MD, USA
| | - Sara Javadzadeh
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
| | - Jonghun Park
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
| | - Aarushi Sehgal
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
| | - Melissa Gymrek
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
- Department of Medicine, University of California San Diego, La Jolla, CA, USA
|
38
|
Panoyan MA, Wendt FR. The role of tandem repeat expansions in brain disorders. Emerg Top Life Sci 2023; 7:249-263. [PMID: 37401564 DOI: 10.1042/etls20230022] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 04/14/2023] [Revised: 06/05/2023] [Accepted: 06/19/2023] [Indexed: 07/05/2023]
Abstract
The human genome contains numerous genetic polymorphisms contributing to different health and disease outcomes. Tandem repeat (TR) loci are highly polymorphic yet under-investigated in large genomic studies, which has prompted research efforts to identify novel variations and gain a deeper understanding of their role in human biology and disease outcomes. We summarize the current understanding of TRs and their implications for human health and disease, including an overview of the challenges encountered when conducting TR analyses and potential solutions to overcome these challenges. By shedding light on these issues, this article aims to contribute to a better understanding of the impact of TRs on the development of new disease treatments.
Affiliation(s)
- Mary Anne Panoyan
- Department of Anthropology, University of Toronto, Mississauga, ON, Canada
| | - Frank R Wendt
- Department of Anthropology, University of Toronto, Mississauga, ON, Canada
- Biostatistics Division, Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada
- Forensic Science Program, University of Toronto, Mississauga, ON, Canada
|
39
|
Hannan AJ. Expanding horizons of tandem repeats in biology and medicine: Why 'genomic dark matter' matters. Emerg Top Life Sci 2023; 7:ETLS20230075. [PMID: 38088823 PMCID: PMC10754335 DOI: 10.1042/etls20230075] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2023] [Revised: 11/27/2023] [Accepted: 11/27/2023] [Indexed: 12/30/2023]
Abstract
Approximately half of the human genome includes repetitive sequences, and these DNA sequences (as well as their transcribed repetitive RNA and translated amino-acid repeat sequences) are known as the repeatome. Within this repeatome there are a couple of million tandem repeats, dispersed throughout the genome. These tandem repeats have been estimated to constitute ∼8% of the entire human genome. These tandem repeats can be located throughout exons, introns and intergenic regions, thus potentially affecting the structure and function of tandemly repetitive DNA, RNA and protein sequences. Over more than three decades, more than 60 monogenic human disorders have been found to be caused by tandem-repeat mutations. These monogenic tandem-repeat disorders include Huntington's disease, a variety of ataxias, amyotrophic lateral sclerosis and frontotemporal dementia, as well as many other neurodegenerative diseases. Furthermore, tandem-repeat disorders can include fragile X syndrome, related fragile X disorders, as well as other neurological and psychiatric disorders. However, these monogenic tandem-repeat disorders, which were discovered via their dominant or recessive modes of inheritance, may represent the 'tip of the iceberg' with respect to tandem-repeat contributions to human disorders. A previous proposal that tandem repeats may contribute to the 'missing heritability' of various common polygenic human disorders has recently been supported by a variety of new evidence. This includes genome-wide studies that associate tandem-repeat mutations with autism, schizophrenia, Parkinson's disease and various types of cancers. In this article, I will discuss how tandem-repeat mutations and polymorphisms could contribute to a wide range of common disorders, along with some of the many major challenges of tandem-repeat biology and medicine. 
Finally, I will discuss the potential of tandem repeats to be therapeutically targeted, so as to prevent and treat an expanding range of human disorders.
Affiliation(s)
- Anthony J Hannan
- Florey Institute of Neuroscience and Mental Health, University of Melbourne, Parkville, Victoria 3010, Australia
- Department of Anatomy and Physiology, University of Melbourne, Parkville, Victoria 3010, Australia
|
40
|
English A, Dolzhenko E, Jam HZ, Mckenzie S, Olson ND, De Coster W, Park J, Gu B, Wagner J, Eberle MA, Gymrek M, Chaisson MJP, Zook JM, Sedlazeck FJ. Benchmarking of small and large variants across tandem repeats. bioRxiv 2023:2023.10.29.564632. [PMID: 37961319 PMCID: PMC10634962 DOI: 10.1101/2023.10.29.564632] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Indexed: 11/15/2023]
Abstract
Tandem repeats (TRs) are highly polymorphic in the human genome, have thousands of associated molecular traits, and are linked to over 60 disease phenotypes. However, their complexity often excludes them from at-scale studies due to challenges with variant calling, representation, and lack of a genome-wide standard. To promote TR methods development, we create a comprehensive catalog of TR regions and explore its properties across 86 samples. We then curate variants from the GIAB HG002 individual to create a tandem repeat benchmark. We also present a variant comparison method that handles small and large alleles and varying allelic representation. The 8.1% of the genome covered by the TR catalog holds ∼24.9% of variants per individual, including 124,728 small and 17,988 large variants for the GIAB HG002 TR benchmark. We work with the GIAB community to demonstrate the utility of this benchmark across short and long read technologies.
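The figures quoted in this abstract can be related to one another with simple arithmetic. The sketch below is illustrative only: the variable names are hypothetical, and "fold enrichment" (share of variants divided by share of genome) is an assumed framing, not a statistic reported by the authors.

```python
# Figures taken from the abstract above.
genome_fraction_in_catalog = 0.081    # 8.1% of the genome is covered by the TR catalog
variant_fraction_in_catalog = 0.249   # ~24.9% of variants per individual fall in it
small_variants = 124_728              # small variants in the GIAB HG002 TR benchmark
large_variants = 17_988               # large variants in the same benchmark

# Assumed summary: how over-represented variants are inside the catalog.
fold_enrichment = variant_fraction_in_catalog / genome_fraction_in_catalog

# Total benchmark size and the share contributed by large variants.
total_benchmark_variants = small_variants + large_variants
large_share = large_variants / total_benchmark_variants

print(f"fold enrichment: {fold_enrichment:.2f}")
print(f"benchmark variants: {total_benchmark_variants}")
print(f"large-variant share: {large_share:.3f}")
```

Under these assumptions, variants are roughly three times denser inside the catalog than its share of the genome alone would suggest.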
|
41
|
Liao X, Zhu W, Zhou J, Li H, Xu X, Zhang B, Gao X. Repetitive DNA sequence detection and its role in the human genome. Commun Biol 2023; 6:954. [PMID: 37726397 PMCID: PMC10509279 DOI: 10.1038/s42003-023-05322-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/29/2023] [Accepted: 09/04/2023] [Indexed: 09/21/2023] Open
Abstract
Repetitive DNA sequences play critical roles in driving evolution, inducing variation, and regulating gene expression. In this review, we summarized the definition, arrangement, and structural characteristics of repeats. We also introduced diverse biological functions of repeats and reviewed existing methods for automatic repeat detection, classification, and masking. Finally, we analyzed the type, structure, and regulation of repeats in the human genome and their role in the induction of complex diseases. We believe that this review will facilitate a comprehensive understanding of repeats and provide guidance for repeat annotation and in-depth exploration of their association with human diseases.
Affiliation(s)
- Xingyu Liao
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955, Saudi Arabia
| | - Wufei Zhu
- Department of Endocrinology, Yichang Central People's Hospital, The First College of Clinical Medical Science, China Three Gorges University, 443000, Yichang, P.R. China
| | - Juexiao Zhou
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955, Saudi Arabia
| | - Haoyang Li
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955, Saudi Arabia
| | - Xiaopeng Xu
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955, Saudi Arabia
| | - Bin Zhang
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955, Saudi Arabia
| | - Xin Gao
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955, Saudi Arabia.
|
42
|
Ahsan MU, Liu Q, Perdomo JE, Fang L, Wang K. A survey of algorithms for the detection of genomic structural variants from long-read sequencing data. Nat Methods 2023; 20:1143-1158. [PMID: 37386186 PMCID: PMC11208083 DOI: 10.1038/s41592-023-01932-w] [Citation(s) in RCA: 27] [Impact Index Per Article: 13.5] [Received: 07/16/2021] [Accepted: 05/31/2023] [Indexed: 07/01/2023]
Abstract
As long-read sequencing technologies are becoming increasingly popular, a number of methods have been developed for the discovery and analysis of structural variants (SVs) from long reads. Long reads enable detection of SVs that could not be previously detected from short-read sequencing, but computational methods must adapt to the unique challenges and opportunities presented by long-read sequencing. Here, we summarize over 50 long-read-based methods for SV detection, genotyping and visualization, and discuss how new telomere-to-telomere genome assemblies and pangenome efforts can improve the accuracy and drive the development of SV callers in the future.
Affiliation(s)
- Mian Umair Ahsan
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
| | - Qian Liu
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
| | - Jonathan Elliot Perdomo
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- School of Biomedical Engineering, Drexel University, Philadelphia, PA, USA
| | - Li Fang
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Department of Genetics and Biomedical Informatics, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
| | - Kai Wang
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, USA.
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
|
43
|
Wang X, Huang M, Budowle B, Ge J. TRcaller: a novel tool for precise and ultrafast tandem repeat variant genotyping in massively parallel sequencing reads. Front Genet 2023; 14:1227176. [PMID: 37533432 PMCID: PMC10390829 DOI: 10.3389/fgene.2023.1227176] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 05/22/2023] [Accepted: 06/13/2023] [Indexed: 08/04/2023] Open
Abstract
Calling tandem repeat (TR) variants from DNA sequences is of both theoretical and practical significance. Some bioinformatics tools have been developed for detecting or genotyping TRs. However, little work has been done on genotyping TR alleles from long-read sequencing data, and the accuracy of genotyping TR alleles from next-generation sequencing data still needs to be improved. Herein, a novel algorithm is described to retrieve TR regions from sequence alignments, and a software program, TRcaller, has been developed and integrated into a web portal to call TR alleles from both short- and long-read sequences, and from both whole genome and targeted sequences generated on multiple sequencing platforms. All TR alleles are genotyped as haplotypes and robust alleles are reported, including multiple alleles in a DNA mixture. TRcaller could provide substantially higher accuracy (>99% in 289 human individuals) in detecting TR alleles while running orders of magnitude faster (e.g., ∼2 s for 300x human sequence data) than the mainstream software tools. The web portal offers 119 preselected TR loci relevant to forensics, genealogy, and disease. TRcaller is validated to be scalable in various applications, such as DNA forensics and disease diagnosis, and can be extended to other fields such as breeding programs. Availability: TRcaller is available at https://www.trcaller.com/SignIn.aspx.
Affiliation(s)
- Xuewen Wang
- Center for Human Identification, University of North Texas Health Science Center, Fort Worth, TX, United States
| | - Meng Huang
- Center for Human Identification, University of North Texas Health Science Center, Fort Worth, TX, United States
| | - Bruce Budowle
- Center for Human Identification, University of North Texas Health Science Center, Fort Worth, TX, United States
- Department of Microbiology, Immunology, and Genetics, University of North Texas Health Science Center, Fort Worth, TX, United States
| | - Jianye Ge
- Center for Human Identification, University of North Texas Health Science Center, Fort Worth, TX, United States
- Department of Microbiology, Immunology, and Genetics, University of North Texas Health Science Center, Fort Worth, TX, United States
|
44
|
Wong J, Coombe L, Nikolić V, Zhang E, Nip KM, Sidhu P, Warren RL, Birol I. Linear time complexity de novo long read genome assembly with GoldRush. Nat Commun 2023; 14:2906. [PMID: 37217507 PMCID: PMC10202940 DOI: 10.1038/s41467-023-38716-x] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 11/10/2022] [Accepted: 05/11/2023] [Indexed: 05/24/2023] Open
Abstract
Current state-of-the-art de novo long read genome assemblers follow the Overlap-Layout-Consensus paradigm. While read-to-read overlap - its most costly step - was improved in modern long read genome assemblers, these tools still often require excessive RAM when assembling a typical human dataset. Our work departs from this paradigm, foregoing all-vs-all sequence alignments in favor of a dynamic data structure implemented in GoldRush, a de novo long read genome assembly algorithm with linear time complexity. We tested GoldRush on Oxford Nanopore Technologies long sequencing read datasets with different base error profiles sourced from three human cell lines, rice, and tomato. Here, we show that GoldRush achieves assembly scaffold NGA50 lengths of 18.3-22.2, 0.3 and 2.6 Mbp, for the genomes of human, rice, and tomato, respectively, and assembles each genome within a day, using at most 54.5 GB of random-access memory, demonstrating the scalability of our genome assembly paradigm and its implementation.
Affiliation(s)
- Johnathan Wong
- Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC, V5Z 4S6, Canada.
| | - Lauren Coombe
- Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC, V5Z 4S6, Canada
| | - Vladimir Nikolić
- Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC, V5Z 4S6, Canada
| | - Emily Zhang
- Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC, V5Z 4S6, Canada
| | - Ka Ming Nip
- Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC, V5Z 4S6, Canada
| | - Puneet Sidhu
- Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC, V5Z 4S6, Canada
| | - René L Warren
- Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC, V5Z 4S6, Canada
| | - Inanç Birol
- Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC, V5Z 4S6, Canada.
|
45
|
Kaplun L, Krautz-Peterson G, Neerman N, Stanley C, Hussey S, Folwick M, McGarry A, Weiss S, Kaplun A. ONT long-read WGS for variant discovery and orthogonal confirmation of short read WGS derived genetic variants in clinical genetic testing. Front Genet 2023; 14:1145285. [PMID: 37152986 PMCID: PMC10160624 DOI: 10.3389/fgene.2023.1145285] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 01/16/2023] [Accepted: 04/05/2023] [Indexed: 05/09/2023] Open
Abstract
Technological advances in Next-Generation Sequencing dramatically increased clinical efficiency of genetic testing, allowing detection of a wide variety of variants, from single nucleotide events to large structural aberrations. Whole Genome Sequencing (WGS) has allowed exploration of areas of the genome that might not have been targeted by other approaches, such as intergenic regions. A single technique detecting all genetic variants at once is intended to expedite the diagnostic process while making it more comprehensive and efficient. Nevertheless, there are still several shortcomings that cannot be effectively addressed by short read sequencing, such as determination of the precise size of short tandem repeat (STR) expansions, phasing of potentially compound recessive variants, resolution of some structural variants and exact determination of their boundaries, etc. Therefore, in some cases variants can only be tentatively detected by short read sequencing and require orthogonal confirmation, particularly for clinical reporting purposes. Moreover, certain regulatory authorities, for example, New York state CLIA, require orthogonal confirmation of every reportable variant. Such orthogonal confirmations often involve numerous different techniques, not necessarily available in the same laboratory and not always performed in an expedited manner, thus negating the advantages of a "one-technique-for-all" approach, and making the process lengthy, prone to logistical and analytical faults, and financially inefficient. Fortunately, those weak spots of short read sequencing can be compensated by long read technologies that have comparable or better detection of some types of variants while lacking the above-mentioned limitations of short read sequencing. 
At Variantyx we have developed an integrated clinical genetic testing approach, augmenting short read WGS-based variant detection with Oxford Nanopore Technologies (ONT) long read sequencing, providing simultaneous orthogonal confirmation of all types of variants with the additional benefit of improved identification of exact size and position of the detected aberrations. The validation study of this augmented test has demonstrated that Oxford Nanopore Technologies sequencing can efficiently verify multiple types of reportable variants, thus ensuring highly reliable detection and a quick turnaround time for WGS-based clinical genetic testing.
|
46
|
Taylor A, Barros D, Gobet N, Schuepbach T, McAllister B, Aeschbach L, Randall E, Trofimenko E, Heuchan E, Barszcz P, Ciosi M, Morgan J, Hafford-Tear N, Davidson A, Massey T, Monckton D, Jones L, network REGISTRYH, Xenarios I, Dion V. Repeat Detector: versatile sizing of expanded tandem repeats and identification of interrupted alleles from targeted DNA sequencing. NAR Genom Bioinform 2022; 4:lqac089. [PMID: 36478959 PMCID: PMC9719798 DOI: 10.1093/nargab/lqac089] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 08/02/2022] [Revised: 10/25/2022] [Accepted: 11/08/2022] [Indexed: 12/07/2022] Open
Abstract
Targeted DNA sequencing approaches will improve how the size of short tandem repeats is measured for diagnostic tests and preclinical studies. The expansion of these sequences causes dozens of disorders, with longer tracts generally leading to a more severe disease. Interrupted alleles are sometimes present within repeats and can alter disease manifestation. Determining repeat size mosaicism and identifying interruptions in targeted sequencing datasets remains a major challenge. This is in part because standard alignment tools are ill-suited for repetitive and unstable sequences. To address this, we have developed Repeat Detector (RD), a deterministic profile weighting algorithm for counting repeats in targeted sequencing data. We tested RD using blood-derived DNA samples from Huntington's disease and Fuchs endothelial corneal dystrophy patients sequenced using either Illumina MiSeq or Pacific Biosciences single-molecule, real-time sequencing platforms. RD was highly accurate in determining repeat sizes of 609 blood-derived samples from Huntington's disease individuals and did not require prior knowledge of the flanking sequences. Furthermore, RD can be used to identify alleles with interruptions and provide a measure of repeat instability within an individual. RD is therefore highly versatile and may find applications in the diagnosis of expanded repeat disorders and in the development of novel therapies.
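To make the idea of counting repeats and flagging interrupted alleles concrete, here is a minimal sketch. It is not the Repeat Detector algorithm (which the abstract describes as a profile weighting method); it is a much simpler regular-expression approach, and the function names and the example read are hypothetical.

```python
import re


def longest_pure_run(read: str, motif: str) -> int:
    """Number of motif copies in the longest uninterrupted tandem run."""
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    run_lengths = [m.end() - m.start() for m in pattern.finditer(read)]
    return max(run_lengths, default=0) // len(motif)


def find_interruptions(read: str, motif: str) -> list[str]:
    """Motif-sized units that differ from the motif but sit between two motif copies."""
    unit = re.escape(motif)
    # A copy of the motif directly before, a non-motif unit here, a copy directly after.
    pattern = re.compile(
        f"(?<={unit})(?!{unit})([ACGT]{{{len(motif)}}})(?={unit})"
    )
    return [m.group(1) for m in pattern.finditer(read)]


# Hypothetical read: flank, five CAG copies, one CAA unit, three CAG copies, flank.
read = "TT" + "CAG" * 5 + "CAA" + "CAG" * 3 + "GG"
print(longest_pure_run(read, "CAG"))    # 5
print(find_interruptions(read, "CAG"))  # ['CAA']
```

Exact matching like this tolerates no sequencing errors and only detects single-unit interruptions, which is one reason dedicated tools use weighted or probabilistic models instead.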
Affiliation(s)
- Alysha S Taylor
  - UK Dementia Research Institute, Cardiff University, Hadyn Ellis Building, Maindy Road, Cardiff, CF24 4HQ, UK
- Dinis Barros
  - Centre for Integrative Genomics, University of Lausanne, Bâtiment Génopode, 1015 Lausanne, Switzerland
- Nastassia Gobet
  - Centre for Integrative Genomics, University of Lausanne, Bâtiment Génopode, 1015 Lausanne, Switzerland
- Thierry Schuepbach
  - Vital-IT Group, Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
  - Newbiologix, Ch. De la corniche 6-8, 1066 Epalinges, Switzerland
- Branduff McAllister
  - MRC Centre for Neuropsychiatric Genetics and Genomics, Cardiff University, Hadyn Ellis Building, Maindy Road, Cardiff CF24 4HQ, UK
  - Molecular Neurogenetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
- Lorene Aeschbach
  - Centre for Integrative Genomics, University of Lausanne, Bâtiment Génopode, 1015 Lausanne, Switzerland
- Emma L Randall
  - UK Dementia Research Institute, Cardiff University, Hadyn Ellis Building, Maindy Road, Cardiff, CF24 4HQ, UK
- Evgeniya Trofimenko
  - Centre for Integrative Genomics, University of Lausanne, Bâtiment Génopode, 1015 Lausanne, Switzerland
  - Sorbonne Université, École normale supérieure, PSL University, CNRS, Laboratoire des biomolécules, LBM, 75005 Paris, France
- Eleanor R Heuchan
  - UK Dementia Research Institute, Cardiff University, Hadyn Ellis Building, Maindy Road, Cardiff, CF24 4HQ, UK
- Paula Barszcz
  - Centre for Integrative Genomics, University of Lausanne, Bâtiment Génopode, 1015 Lausanne, Switzerland
- Marc Ciosi
  - School of Molecular Biosciences, College of Medical, Veterinary and Life Sciences, Davidson Building, University of Glasgow, Glasgow, G12 8QQ, UK
- Joanne Morgan
  - MRC Centre for Neuropsychiatric Genetics and Genomics, Cardiff University, Hadyn Ellis Building, Maindy Road, Cardiff CF24 4HQ, UK
- Alice E Davidson
  - UCL Institute of Ophthalmology, 11-43 Bath Street, London, EC1V 9EL UK
- Thomas H Massey
  - MRC Centre for Neuropsychiatric Genetics and Genomics, Cardiff University, Hadyn Ellis Building, Maindy Road, Cardiff CF24 4HQ, UK
- Darren G Monckton
  - School of Molecular Biosciences, College of Medical, Veterinary and Life Sciences, Davidson Building, University of Glasgow, Glasgow, G12 8QQ, UK
- Lesley Jones
  - MRC Centre for Neuropsychiatric Genetics and Genomics, Cardiff University, Hadyn Ellis Building, Maindy Road, Cardiff CF24 4HQ, UK
- Ioannis Xenarios
  - Centre for Integrative Genomics, University of Lausanne, Bâtiment Génopode, 1015 Lausanne, Switzerland
  - Health2030 Genome Center, Ch des Mines 14, 1202 Genève, Switzerland
- Vincent Dion
  - UK Dementia Research Institute, Cardiff University, Hadyn Ellis Building, Maindy Road, Cardiff, CF24 4HQ, UK
47
|
Wang X, Budowle B, Ge J. USAT: a bioinformatic toolkit to facilitate interpretation and comparative visualization of tandem repeat sequences. BMC Bioinformatics 2022; 23:497. [PMID: 36402991 PMCID: PMC9675219 DOI: 10.1186/s12859-022-05021-1] [Received: 05/10/2022] [Accepted: 10/29/2022] [Indexed: 11/21/2022] Open
Abstract
Background: Tandem repeats (TRs), highly variable genomic variants, are widely used in individual identification, disease diagnostics, and evolutionary studies. Recent advances in sequencing technologies and bioinformatic tools facilitate genome-wide calling of TR haplotypes. Both length-based and sequence-based TR alleles are used in different applications, but sequence-based TR alleles could provide the highest precision in characterizing TR haplotypes. An easy-to-use bioinformatic tool that identifies differences at the single-nucleotide level between or among TR haplotypes is therefore needed. Results: In this study, we developed a Universal STR Allele Toolkit (USAT) for TR haplotype analysis, which takes TR haplotype output from existing tools to perform allele size conversion, sequence comparison of haplotypes, figure plotting, comparison of allele distributions, and interactive visualization. An exemplary application of USAT to the CODIS core STR loci used in DNA forensics, with benchmarking human individuals, demonstrated the capabilities of USAT. USAT has user-friendly graphical interfaces and runs fast on major operating systems with parallel computing enabled. Conclusion: USAT is user-friendly bioinformatics software for the interpretation, visualization, and comparison of TRs.
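To make "allele size conversion" and "sequence comparison of haplotypes" concrete, here is a minimal sketch of both ideas. It does not reproduce USAT's code or interface: the function names `length_to_allele` and `point_differences` are hypothetical, the conversion assumes the common forensic convention of writing full repeat units followed by the number of leftover bases (for example 9.3), and the comparison only handles haplotypes that are already the same length.

```python
def length_to_allele(sequence: str, period: int) -> str:
    """Convert a repeat-region sequence to a length-based allele label.

    Assumes the 'full units.extra bases' convention, e.g. 39 bp of a
    tetranucleotide repeat becomes '9.3'.
    """
    full_units, extra_bases = divmod(len(sequence), period)
    return f"{full_units}.{extra_bases}" if extra_bases else str(full_units)


def point_differences(hap_a: str, hap_b: str):
    """List (position, base_a, base_b) where two equal-length haplotypes differ.

    Positions are 0-based. Haplotypes of unequal length would need an
    alignment step first, which this sketch does not provide.
    """
    if len(hap_a) != len(hap_b):
        raise ValueError("haplotypes must have equal length; align them first")
    return [(pos, a, b) for pos, (a, b) in enumerate(zip(hap_a, hap_b)) if a != b]


# Illustrative sequences only: a tetranucleotide tract with a 3-base partial unit.
partial = "AATG" * 6 + "ATG" + "AATG" * 3
print(length_to_allele(partial, 4))       # 39 bases, period 4
print(length_to_allele("AATG" * 10, 4))   # 40 bases, period 4

# Two haplotypes with the same length-based allele but a different sequence.
print(point_differences("AATG" * 3, "AATG" + "AATA" + "AATG"))
```

The last call shows why sequence-based alleles carry more information than length-based ones: both haplotypes would be labelled "3" by length, yet they differ at one base.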
Affiliation(s)
- Xuewen Wang
  - Center for Human Identification, Health Science Center, University of North Texas, Fort Worth, TX, USA
- Bruce Budowle
  - Center for Human Identification, Health Science Center, University of North Texas, Fort Worth, TX, USA
  - Department of Microbiology, Immunology, and Genetics, University of North Texas Health Science Center, Fort Worth, TX, USA
- Jianye Ge
  - Center for Human Identification, Health Science Center, University of North Texas, Fort Worth, TX, USA
  - Department of Microbiology, Immunology, and Genetics, University of North Texas Health Science Center, Fort Worth, TX, USA
48
|
Masnovo C, Lobo AF, Mirkin SM. Replication dependent and independent mechanisms of GAA repeat instability. DNA Repair (Amst) 2022; 118:103385. [PMID: 35952488 PMCID: PMC9675320 DOI: 10.1016/j.dnarep.2022.103385] [Received: 06/02/2022] [Revised: 07/28/2022] [Accepted: 07/30/2022] [Indexed: 11/20/2022]
Abstract
Trinucleotide repeat instability is a driver of human disease. Large expansions of (GAA)n repeats in the first intron of the FXN gene are the cause of Friedreich's ataxia (FRDA), a progressive degenerative disorder which cannot yet be prevented or treated. (GAA)n repeat instability arises both during replication-dependent processes, such as cell division and intergenerational transmission, and in terminally differentiated somatic tissues. Here, we provide a brief historical overview of the discovery of (GAA)n repeat expansions and their association with FRDA, followed by recent advances in the identification of triplex H-DNA formation and replication fork stalling. The main body of this review focuses on the last decade of progress in understanding the mechanism of (GAA)n repeat instability during DNA replication and/or DNA repair. We propose that the discovery of additional mechanisms of (GAA)n repeat instability can be achieved via both comparative approaches to other repeat expansion diseases and genome-wide association studies. Finally, we discuss the advances towards FRDA prevention or amelioration that specifically target (GAA)n repeat expansions.
Affiliation(s)
- Chiara Masnovo
  - Department of Biology, Tufts University, Medford, MA 02155, USA
- Ayesha F Lobo
  - Department of Biology, Tufts University, Medford, MA 02155, USA
- Sergei M Mirkin
  - Department of Biology, Tufts University, Medford, MA 02155, USA
49
|
Chiu R, Rajan-Babu IS, Birol I, Friedman JM. Linked-read sequencing for detecting short tandem repeat expansions. Sci Rep 2022; 12:9352. [PMID: 35672336 PMCID: PMC9174224 DOI: 10.1038/s41598-022-13024-4] [Received: 11/09/2021] [Accepted: 05/19/2022] [Indexed: 11/09/2022] Open
Abstract
Detection of short tandem repeat (STR) expansions with standard short-read sequencing is challenging due to the difficulty in mapping multicopy repeat sequences. In this study, we explored how the long-range sequence information of barcode linked-read sequencing (BLRS) can be leveraged to improve repeat-read detection. We also devised a novel algorithm using BLRS barcodes for distance estimation and evaluated its application for STR genotyping. Both approaches were designed for genotyping large expansions (> 1 kb) that cannot be sized accurately by existing methods. Using simulated and experimental data of genomes with STR expansions from multiple BLRS platforms, we validated the utility of barcode and phasing information in attaining better STR genotypes compared to standard short-read sequencing. Although the coverage bias of extremely GC-rich STRs is an important limitation of BLRS, BLRS is an effective strategy for genotyping many other STR loci.
Affiliation(s)
- Readman Chiu
  - Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC, V5Z 4S6, Canada
- Indhu-Shree Rajan-Babu
  - Department of Medical Genetics, University of British Columbia, Vancouver, BC, V5Z 4H4, Canada
  - Department of Medical and Molecular Genetics, King's College London, Strand, London, WC2R 2LS, UK
- Inanc Birol
  - Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC, V5Z 4S6, Canada
  - Department of Medical Genetics, University of British Columbia, Vancouver, BC, V5Z 4H4, Canada
- Jan M Friedman
  - Department of Medical Genetics, University of British Columbia, Vancouver, BC, V5Z 4H4, Canada
  - BC Children's Hospital Research Institute, Vancouver, BC, V5Z 4H4, Canada
50
|
Walker K, Kalra D, Lowdon R, Chen G, Molik D, Soto DC, Dabbaghie F, Khleifat AA, Mahmoud M, Paulin LF, Raza MS, Pfeifer SP, Agustinho DP, Aliyev E, Avdeyev P, Barrozo ER, Behera S, Billingsley K, Chong LC, Choubey D, De Coster W, Fu Y, Gener AR, Hefferon T, Henke DM, Höps W, Illarionova A, Jochum MD, Jose M, Kesharwani RK, Kolora SRR, Kubica J, Lakra P, Lattimer D, Liew CS, Lo BW, Lo C, Lötter A, Majidian S, Mendem SK, Mondal R, Ohmiya H, Parvin N, Peralta C, Poon CL, Prabhakaran R, Saitou M, Sammi A, Sanio P, Sapoval N, Syed N, Treangen T, Wang G, Xu T, Yang J, Zhang S, Zhou W, Sedlazeck FJ, Busby B. The third international hackathon for applying insights into large-scale genomic composition to use cases in a wide range of organisms. F1000Res 2022; 11:530. [PMID: 36262335 PMCID: PMC9557141 DOI: 10.12688/f1000research.110194.1] [Accepted: 05/04/2022] [Indexed: 01/25/2023] Open
Abstract
In October 2021, 59 scientists from 14 countries and 13 U.S. states collaborated virtually in the Third Annual Baylor College of Medicine & DNANexus Structural Variation hackathon. The goal of the hackathon was to advance research on structural variants (SVs) by prototyping and iterating on open-source software. This led to nine hackathon projects focused on diverse genomics research interests, including various SV discovery and genotyping methods, SV sequence reconstruction, and clinically relevant structural variation, including SARS-CoV-2 variants. Repositories for the projects that participated in the hackathon are available at https://github.com/collaborativebioinformatics.
Affiliation(s)
- Kimberly Walker
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA
- Divya Kalra
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA
- Guangyi Chen
  - Drug Bioinformatics, Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Saarbrücken, Germany
  - Center for Bioinformatics, Saarland University, Saarbrücken, Germany
- David Molik
  - Tropical Crop and Commodity Protection Research Unit, Pacific Basin Agricultural Research Center, Hilo, HI, 96720, USA
- Daniela C. Soto
  - Biochemistry & Molecular Medicine, Genome Center, MIND Institute, University of California, Davis, Davis, CA, 95616, USA
- Fawaz Dabbaghie
  - Drug Bioinformatics, Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Saarbrücken, Germany
  - Institute for Medical Biometry and Bioinformatics, University Hospital Düsseldorf, Düsseldorf, Germany
- Ahmad Al Khleifat
  - Institute of Psychiatry, Psychology & Neuroscience, King's College London, London, UK
- Medhat Mahmoud
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA
- Luis F Paulin
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA
- Muhammad Sohail Raza
  - CAS Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Beijing, China
- Susanne P. Pfeifer
  - Center for Evolution and Medicine, Arizona State University, Tempe, AZ, USA
- Daniel Paiva Agustinho
  - Department of Molecular Microbiology, Washington University in St. Louis School of Medicine, St. Louis, MO, 63110, USA
- Elbay Aliyev
  - Research Department, Sidra Medicine, Doha, Qatar
- Pavel Avdeyev
  - Computational Biology Institute, The George Washington University, Washington, DC, 20052, USA
- Enrico R. Barrozo
  - Department of Obstetrics & Gynecology, Baylor College of Medicine, Houston, TX, 77030, USA
- Sairam Behera
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA
- Kimberley Billingsley
  - Molecular Genetics Section, Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA
- Li Chuin Chong
  - Beykoz Institute of Life Sciences and Biotechnology, Bezmialem Vakif University, Beykoz, Istanbul, Turkey
- Deepak Choubey
  - Department of Technology, Savitribai Phule Pune University, Pune, Maharashtra, India
- Wouter De Coster
  - Applied and Translational Neurogenomics Group, VIB Center for Molecular Neurology, Antwerp, Belgium
  - Applied and Translational Neurogenomics Group, Department of Biomedical Sciences, University of Antwerp, Antwerp, Belgium
- Yilei Fu
  - Department of Computer Science, Rice University, Houston, TX, USA
- Alejandro R. Gener
  - Association of Public Health Labs, Centers for Disease Control and Prevention, Downey, CA, USA
- Timothy Hefferon
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20892, USA
- David Morgan Henke
  - Department of Molecular Virology and Microbiology, Baylor College of Medicine, Houston, TX, 77030, USA
- Wolfram Höps
  - EMBL Heidelberg, Genome Biology Unit, Heidelberg, Germany
- Michael D. Jochum
  - Department of Obstetrics & Gynecology, Baylor College of Medicine, Houston, TX, 77030, USA
- Maria Jose
  - Centre for Bioinformatics, Pondicherry University, Pondicherry, India
- Rupesh K. Kesharwani
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA
- Priya Lakra
  - Department of Zoology, University of Delhi, Delhi, India
- Damaris Lattimer
  - University of Applied Sciences Upper Austria - FH Hagenberg, Mühlkreis, Austria
- Chia-Sin Liew
  - Center for Biotechnology, University of Nebraska-Lincoln, Lincoln, Nebraska, 68588, USA
- Bai-Wei Lo
  - Department of Biology, University of Konstanz, Konstanz, Germany
- Chunhsuan Lo
  - Human Genetics Laboratory, National Institute of Genetics, Japan, Mishima City, Japan
- Anneri Lötter
  - Department of Biochemistry, University of Pretoria, Pretoria, South Africa
- Sina Majidian
  - Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Rajarshi Mondal
  - Department of Biotechnology, The University of Burdwan, West Bengal, India
- Hiroko Ohmiya
  - Genetic Reagent Development Unit, Medical & Biological Laboratories Co., Ltd., Tokyo, Japan
- Nasrin Parvin
  - Department of Biotechnology, The University of Burdwan, West Bengal, India
- Marie Saitou
  - Center of Integrative Genetics (CIGENE), Faculty of Biosciences, Norwegian University of Life Sciences, Ås, Norway
- Aditi Sammi
  - School of Biochemical Engineering, Indian Institute of Technology (BHU), Varanasi, Uttar Pradesh, India
- Philippe Sanio
  - University of Applied Sciences Upper Austria - FH Hagenberg, Hagenberg im Mühlkreis, Austria
- Nicolae Sapoval
  - Department of Computer Science, Rice University, Houston, TX, USA
- Najeeb Syed
  - Research Department, Sidra Medicine, Doha, Qatar
- Todd Treangen
  - Department of Computer Science, Rice University, Houston, TX, USA
- Tiancheng Xu
  - Department of Computer Science, Rice University, Houston, TX, USA
- Jianzhi Yang
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Shangzhe Zhang
  - School of Biology, University of St Andrews, St Andrews, UK
- Weiyu Zhou
  - Department of Statistical Science, George Mason University, Fairfax, Virginia, USA
- Fritz J Sedlazeck
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA