1
Liu Q, Tian W. Association of human-specific expanded short tandem repeats with neuron-specific regulatory features. Science Advances 2025; 11:eadp9707. [PMID: 40446031 PMCID: PMC12124357 DOI: 10.1126/sciadv.adp9707] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/21/2024] [Accepted: 04/24/2025] [Indexed: 06/02/2025]
Abstract
Short tandem repeats (STRs), characterized by high-copy number mutations, represent one of the fastest-evolving genomic elements. However, human-specific expanded STRs (heSTRs) have lacked comprehensive genome-wide characterization. Leveraging 148 human and 26 nonhuman primate haploid genomes, we identified 8813 heSTRs with robust expansions in copy number distributions. Our analysis revealed notable associations between heSTRs and brain- and neuron-specific distal regulatory signals. Potential target genes regulated by heSTRs, identified by incorporating distal regulations, are enriched with neuronal development-related functions and disorders, displaying neuron-specific expression enhancement in humans. Moreover, heSTRs are associated with enhanced chromatin accessibility specifically in human neurons. In addition, heSTRs show substantial association with pathogenic STR loci exhibiting abnormal copy number variations, as reported by cohort studies on schizophrenia and autism. This study underscores the role of heSTRs in both human evolution and disorders, offering valuable insights for future research on STRs from an evolutionary perspective.
Affiliation(s)
- Qiming Liu
  - State Key Laboratory of Genetics and Development of Complex Phenotypes, Department of Computational Biology, School of Life Sciences, Fudan University, Shanghai, China
- Weidong Tian
  - State Key Laboratory of Genetics and Development of Complex Phenotypes, Department of Computational Biology, School of Life Sciences, Fudan University, Shanghai, China
  - Children’s Hospital of Fudan University, Shanghai, China
  - Children’s Hospital of Shandong University, Jinan, China
2
Spargo TP, Iacoangeli A, Ryten M, Forzano F, Pearce N, Al-Chalabi A. Modelling Population Genetic Screening in Rare Neurodegenerative Diseases. Biomedicines 2025; 13:1018. [PMID: 40426848 PMCID: PMC12108917 DOI: 10.3390/biomedicines13051018] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/25/2025] [Revised: 04/16/2025] [Accepted: 04/17/2025] [Indexed: 05/29/2025] Open
Abstract
Importance: Genomic sequencing enables the rapid identification of a breadth of genetic variants. For clinical purposes, sequencing for small genetic variations is considered a solved problem, while challenges remain for structural variants, given the lower sensitivity and specificity. Interest has recently risen among governing bodies in developing protocols for population-wide genetic screening. However, usefulness is constrained when the probability of being affected by a rare disease remains low, despite a positive genetic test. This is a common scenario in neurodegenerative disorders. The problem is recognised among statisticians and statistical geneticists but is less well-understood by clinicians and researchers who will act on these results, and by the general public who might access screening services directly without the appropriate support for interpretation.
Observations: We explore the probability of subsequent disease following genetic screening of several variants, both single nucleotide variants (SNVs) and larger repeat expansions, for two neurological conditions, Huntington's disease (HD) and amyotrophic lateral sclerosis (ALS), comparing these results with screening for phenylketonuria, which is well-established. The risk following a positive screening test was 0.5% for C9orf72 in ALS and 0.4% for HTT in HD when testing repeat expansions, for which the test had sub-optimal performance (sensitivity = 99% and specificity = 90%), and 12.7% for phenylketonuria and 10.9% for ALS SOD1 when testing pathogenic SNVs (sensitivity = 99.96% and specificity = 99.95%). Subsequent screening confirmation via PCR for C9orf72 led to a 2% risk of developing ALS as a result of the reduced penetrance (44%).
Conclusions and Relevance: We show that risk following a positive screening test result can be strikingly low for rare neurological diseases, even for fully penetrant variants such as HTT, if the test has sub-optimal performance. Accordingly, to maximise the utility of screening, it is vital to prioritise protocols with very high sensitivity and specificity, and a careful selection of markers for screening, giving regard to clinical interpretability, actionability, high penetrance, and secondary testing to confirm positive findings.
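The dependence of post-test risk on test performance described in this abstract can be illustrated with Bayes' theorem. The following is a minimal sketch, not the authors' model: the sensitivity and specificity pairs are the ones quoted in the abstract, while the carrier frequency `HYPOTHETICAL_CARRIER_FREQ` and the function names are assumptions introduced only for illustration. Disease risk after a positive result is approximated here as the positive predictive value multiplied by penetrance.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Probability of truly carrying the variant given a positive test (Bayes' theorem)."""
    true_positives = sensitivity * prevalence
    false_positives = (1.0 - specificity) * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)


def disease_risk_given_positive(prevalence, sensitivity, specificity, penetrance=1.0):
    """Simplified risk of disease after a positive test: PPV scaled by penetrance."""
    return positive_predictive_value(prevalence, sensitivity, specificity) * penetrance


# Assumption: an illustrative carrier frequency of 1 in 10,000 (not taken from the paper).
HYPOTHETICAL_CARRIER_FREQ = 1e-4

# Test performance figures quoted in the abstract.
risk_repeat_test = disease_risk_given_positive(HYPOTHETICAL_CARRIER_FREQ, 0.99, 0.90)
risk_snv_test = disease_risk_given_positive(HYPOTHETICAL_CARRIER_FREQ, 0.9996, 0.9995)

print(f"Sensitivity 99%, specificity 90%: {risk_repeat_test:.4%}")
print(f"Sensitivity 99.96%, specificity 99.95%: {risk_snv_test:.4%}")
```

With a rare variant, the false-positive term dominates the denominator, so even a small loss of specificity sharply lowers the post-test risk. This is the qualitative effect the abstract reports; the exact percentages depend on the variant frequencies used in the paper, which are not reproduced here.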
Affiliation(s)
- Thomas P. Spargo
  - Department of Basic and Clinical Neuroscience, Maurice Wohl Clinical Neuroscience Institute, King’s College London, London WC2R 2LS, UK
- Alfredo Iacoangeli
  - Department of Basic and Clinical Neuroscience, Maurice Wohl Clinical Neuroscience Institute, King’s College London, London WC2R 2LS, UK
  - Department of Biostatistics and Health Informatics, King’s College London, London WC2R 2LS, UK
  - NIHR Maudsley Biomedical Research Centre (BRC) at South London and Maudsley NHS Foundation Trust, King’s College London, London WC2R 2LS, UK
- Mina Ryten
  - Genetics and Genomic Medicine, Great Ormond Street Institute of Child Health, University College London, London WC1E 6BT, UK
  - Biomedical Research Centre, NIHR Great Ormond Street Hospital, University College London, London WC1E 6BT, UK
  - Department of Clinical Genetics, Great Ormond Street Hospital, London WC1N 3JH, UK
- Francesca Forzano
  - Department of Clinical Genetics, Guy’s and St Thomas NHS Foundation Trust, London SE1 7EH, UK
- Neil Pearce
  - Department of Medical Statistics, London School of Hygiene and Tropical Medicine, London WC1E 7HT, UK
- Ammar Al-Chalabi
  - Department of Basic and Clinical Neuroscience, Maurice Wohl Clinical Neuroscience Institute, King’s College London, London WC2R 2LS, UK
  - King’s College Hospital, Bessemer Road, London SE5 9RS, UK
3
van der Sanden B, Neveling K, Shukor S, Gallagher MD, Lee J, Burke SL, Pennings M, van Beek R, Oorsprong M, Kater-Baats E, Kamping E, Tieleman AA, Voermans NC, Scheffer IE, Gecz J, Corbett MA, Vissers LELM, Pang AWC, Hastie A, Kamsteeg EJ, Hoischen A. Optical genome mapping enables accurate testing of large repeat expansions. Genome Res 2025; 35:810-823. [PMID: 40113266 PMCID: PMC12047237 DOI: 10.1101/gr.279491.124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2024] [Accepted: 02/24/2025] [Indexed: 03/22/2025]
Abstract
Short tandem repeats (STRs) are common variations in human genomes that frequently expand or contract, causing genetic disorders, mainly when expanded. Traditional diagnostic methods for identifying these expansions, such as repeat-primed PCR and Southern blotting, are often labor-intensive and locus-specific, and are unable to precisely determine long repeat expansions. Sequencing-based methods, although capable of genome-wide detection, are limited by inaccuracy (short-read technologies) and high associated costs (long-read technologies). This study evaluated optical genome mapping (OGM) as an efficient, accurate approach for measuring STR lengths and assessing somatic stability in 85 samples with known pathogenic repeat expansions in DMPK, CNBP, and RFC1, causing myotonic dystrophy types 1 and 2 and cerebellar ataxia, neuropathy, and vestibular areflexia syndrome (CANVAS), respectively. Three workflows were applied: manual de novo assembly, local guided assembly (local-GA), and a molecule distance script; the latter two were developed as part of this study to assess the repeat sizes and somatic repeat stability. OGM successfully identified 84/85 (98.8%) of the pathogenic expansions, distinguishing between wild-type and expanded alleles or between two expanded alleles in recessive cases, with greater accuracy than standard of care (SOC) for long repeats and no apparent upper size limit. Notably, OGM detected somatic instability in a subset of DMPK, CNBP, and RFC1 samples. These findings suggest OGM could advance diagnostic accuracy for large repeat expansions, providing a more comprehensive genome-wide assay for repeat expansion disorders by measuring exact repeat lengths and somatic instability across multiple loci simultaneously.
Affiliation(s)
- Bart van der Sanden
  - Department of Human Genetics, Research Institute for Medical Innovation, Radboud University Medical Center, 6525GA Nijmegen, the Netherlands
- Kornelia Neveling
  - Department of Human Genetics, Research Institute for Medical Innovation, Radboud University Medical Center, 6525GA Nijmegen, the Netherlands
- Syukri Shukor
  - Bionano Genomics Clinical and Scientific Affairs, San Diego, California 92101, USA
- Michael D Gallagher
  - Bionano Genomics Clinical and Scientific Affairs, San Diego, California 92101, USA
- Joyce Lee
  - Bionano Genomics Clinical and Scientific Affairs, San Diego, California 92101, USA
- Stephanie L Burke
  - Bionano Genomics Clinical and Scientific Affairs, San Diego, California 92101, USA
- Maartje Pennings
  - Department of Human Genetics, Research Institute for Medical Innovation, Radboud University Medical Center, 6525GA Nijmegen, the Netherlands
- Ronald van Beek
  - Department of Human Genetics, Research Institute for Medical Innovation, Radboud University Medical Center, 6525GA Nijmegen, the Netherlands
- Michiel Oorsprong
  - Department of Human Genetics, Research Institute for Medical Innovation, Radboud University Medical Center, 6525GA Nijmegen, the Netherlands
- Ellen Kater-Baats
  - Department of Human Genetics, Research Institute for Medical Innovation, Radboud University Medical Center, 6525GA Nijmegen, the Netherlands
- Eveline Kamping
  - Department of Human Genetics, Research Institute for Medical Innovation, Radboud University Medical Center, 6525GA Nijmegen, the Netherlands
- Alide A Tieleman
  - Department of Neurology, Radboud University Medical Center, 6525GA Nijmegen, the Netherlands
- Nicol C Voermans
  - Department of Neurology, Radboud University Medical Center, 6525GA Nijmegen, the Netherlands
- Ingrid E Scheffer
  - Epilepsy Research Centre, Department of Medicine, University of Melbourne, Austin Health, VIC 3084, Australia
  - Department of Pediatrics, University of Melbourne, Royal Children's Hospital, Florey and Murdoch Children's Research Institutes, VIC 3052, Melbourne, Australia
- Jozef Gecz
  - South Australian Health and Medical Research Institute, Adelaide, SA 5000, Australia
  - Genetics and Molecular Pathology, SA Pathology, Adelaide, SA 5000, Australia
  - Robinson Research Institute and Adelaide Medical School, University of Adelaide, Adelaide, SA 5000, Australia
- Mark A Corbett
  - Robinson Research Institute and Adelaide Medical School, University of Adelaide, Adelaide, SA 5000, Australia
- Lisenka E L M Vissers
  - Department of Human Genetics, Research Institute for Medical Innovation, Radboud University Medical Center, 6525GA Nijmegen, the Netherlands
- Andy Wing Chun Pang
  - Bionano Genomics Clinical and Scientific Affairs, San Diego, California 92101, USA
- Alex Hastie
  - Bionano Genomics Clinical and Scientific Affairs, San Diego, California 92101, USA
- Erik-Jan Kamsteeg
  - Department of Human Genetics, Research Institute for Medical Innovation, Radboud University Medical Center, 6525GA Nijmegen, the Netherlands
- Alexander Hoischen
  - Department of Human Genetics, Research Institute for Medical Innovation, Radboud University Medical Center, 6525GA Nijmegen, the Netherlands
  - Department of Internal Medicine, Radboud Expertise Center for Immunodeficiency and Autoinflammation and Radboud Center for Infectious Disease (RCI), Radboud University Medical Center, 6525GA Nijmegen, the Netherlands
4
Xu IRL, Danzi MC, Raposo J, Züchner S. The continued promise of genomic technologies and software in neurogenetics. J Neuromuscul Dis 2025:22143602251325345. [PMID: 40208247 DOI: 10.1177/22143602251325345] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/11/2025]
Abstract
The continued evolution of genomic technologies over the past few decades has revolutionized the field of neurogenetics, offering profound insights into the genetic underpinnings of neurological disorders. Identification of causal genes for numerous monogenic neurological conditions has informed key aspects of disease mechanisms and facilitated research into critical proteins and molecular pathways, laying the groundwork for therapeutic interventions. However, the question remains: has this transformative trend reached its zenith? In this review, we suggest that despite significant strides in genome sequencing and advanced computational analyses, there is still ample room for methodological refinement. We anticipate further major genetic breakthroughs corresponding with the increased use of long-read genomes, variant calling software, AI tools, and data aggregation databases. Genetic progress has historically been driven by technological advancements from the commercial sector, which are developed in response to academic research needs, creating a continuous cycle of innovation and discovery. This review explores the potential of genomic technologies to address the challenges of neurogenetic disorders. By outlining both established and modern resources, we aim to emphasize the importance of genetic technologies as we enter an era poised for discoveries.
Affiliation(s)
- Isaac R L Xu
  - Dr. John T. Macdonald Foundation Department of Human Genetics and John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, USA
- Matt C Danzi
  - Dr. John T. Macdonald Foundation Department of Human Genetics and John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, USA
- Jacquelyn Raposo
  - Dr. John T. Macdonald Foundation Department of Human Genetics and John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, USA
- Stephan Züchner
  - Dr. John T. Macdonald Foundation Department of Human Genetics and John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, USA
5
Adam CL, Rocha J, Sudmant P, Rohlfs R. TRACKing tandem repeats: a customizable pipeline for identification and cross-species comparison. Bioinformatics Advances 2025; 5:vbaf066. [PMID: 40351869 PMCID: PMC12064168 DOI: 10.1093/bioadv/vbaf066] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2024] [Revised: 03/14/2025] [Accepted: 04/07/2025] [Indexed: 05/14/2025]
Abstract
Summary: TRACK is a user-friendly Snakemake workflow designed to streamline the discovery and comparison of tandem repeats (TRs) across species. TRACK facilitates the cataloging and filtering of TRs based on reference genomes or T2T transcripts, and applies reciprocal LiftOver and sequence alignment methods to identify putative homologous TRs between species. For further analyses, TRACK can be used to genotype TRs and subsequently estimate and plot basic population genetic statistics. By incorporating key functionalities within an integrated workflow, TRACK enhances TR analysis accessibility and reproducibility, while offering flexibility for the user.
Availability and implementation: The TRACK toolkit with step-by-step tutorial is freely available at https://github.com/caroladam/track.
Affiliation(s)
- Carolina L Adam
  - Institute of Ecology and Evolution, University of Oregon, Eugene, Oregon 97403, United States
- Joana Rocha
  - Department of Integrative Biology, University of California, Berkeley, Berkeley, CA 94720, United States
- Peter Sudmant
  - Department of Integrative Biology, University of California, Berkeley, Berkeley, CA 94720, United States
- Rori Rohlfs
  - Institute of Ecology and Evolution, University of Oregon, Eugene, Oregon 97403, United States
  - School of Computer and Data Sciences, University of Oregon, Eugene, OR 97403, United States
6
Liu Y, Xia K. Aberrant Short Tandem Repeats: Pathogenicity, Mechanisms, Detection, and Roles in Neuropsychiatric Disorders. Genes (Basel) 2025; 16:406. [PMID: 40282366 PMCID: PMC12026680 DOI: 10.3390/genes16040406] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2025] [Revised: 03/17/2025] [Accepted: 03/19/2025] [Indexed: 04/29/2025] Open
Abstract
Short tandem repeat (STR) sequences are highly variable DNA segments that significantly contribute to human neurodegenerative disorders, highlighting their crucial role in neuropsychiatric conditions. This article examines the pathogenicity of abnormal STRs and classifies tandem repeat expansion disorders (TREDs), emphasizing their genetic characteristics, mechanisms of action, detection methods, and associated animal models. STR expansions exhibit complex genetic patterns that affect the age of onset and symptom severity. These expansions disrupt gene function through mechanisms such as gene silencing, toxic gain-of-function mutations leading to RNA and protein toxicity, and the generation of toxic peptides via repeat-associated non-AUG (RAN) translation. Advances in sequencing technologies, from traditional PCR and Southern blotting to next-generation and long-read sequencing, have enhanced the accuracy of STR variation detection. Research utilizing these technologies has linked STR expansions to a range of neuropsychiatric disorders, including autism spectrum disorders and schizophrenia, highlighting their contribution to disease risk and phenotypic expression through effects on genes involved in neurodevelopment, synaptic function, and neuronal signaling. Therefore, further investigation is essential to elucidate the intricate interplay between STRs and neuropsychiatric diseases, paving the way for improved diagnostic and therapeutic strategies.
Affiliation(s)
- Yuzhong Liu
  - Institute of Cytology and Genetics, School of Basic Medical Sciences, Hengyang Medical School, University of South China, Hengyang 421001, China
  - MOE Key Lab of Rare Pediatric Diseases, School of Basic Medicine, Hengyang Medical College, University of South China, Hengyang 421001, China
- Kun Xia
  - Institute of Cytology and Genetics, School of Basic Medical Sciences, Hengyang Medical School, University of South China, Hengyang 421001, China
  - MOE Key Lab of Rare Pediatric Diseases, School of Basic Medicine, Hengyang Medical College, University of South China, Hengyang 421001, China
7
Yoon JG, Lee S, Park S, Jang SS, Cho J, Kim MJ, Kim SY, Kim WJ, Lee JS, Chae JH. Identification of a novel non-coding deletion in Allan-Herndon-Dudley syndrome by long-read HiFi genome sequencing. BMC Med Genomics 2025; 18:41. [PMID: 40033291 PMCID: PMC11877835 DOI: 10.1186/s12920-024-02058-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/26/2024] [Accepted: 11/27/2024] [Indexed: 03/05/2025] Open
Abstract
BACKGROUND: Allan-Herndon-Dudley syndrome (AHDS) is an X-linked disorder caused by pathogenic variants in the SLC16A2 gene. Although most reported variants are found in protein-coding regions or adjacent junctions, structural variations (SVs) within non-coding regions have not been previously reported.
METHODS: We investigated two male siblings with severe neurodevelopmental disorders and spasticity, who had remained undiagnosed for over a decade and were negative from exome sequencing, utilizing long-read HiFi genome sequencing. We conducted a comprehensive analysis including short-tandem repeats (STRs) and SVs to identify the genetic cause in this familial case.
RESULTS: While coding variant and STR analyses yielded negative results, SV analysis revealed a novel hemizygous deletion in intron 1 of the SLC16A2 gene (chrX:74,460,691-74,463,566; 2,876 bp), inherited from their carrier mother and shared by the siblings. Determination of the breakpoints indicates that the deletion probably resulted from Alu/Alu-mediated rearrangements between homologous AluY pairs. The deleted region is predicted to include multiple transcription factor binding sites, such as Stat2, Zic1, Zic2, and FOXD3, which are crucial for the neurodevelopmental process, as well as a regulatory element including an eQTL (rs1263181) that is implicated in the tissue-specific regulation of SLC16A2 expression, notably in skeletal muscle and thyroid tissues.
CONCLUSIONS: This report, to our knowledge, is the first to describe a non-coding deletion associated with AHDS, demonstrating the potential utility of long-read sequencing for undiagnosed patients. Although interpreting variants in non-coding regions remains challenging, our study highlights this region as a high priority for future investigation and functional studies.
Affiliation(s)
- Jihoon G Yoon
  - Department of Genomic Medicine, Seoul National University Hospital, Seoul, Republic of Korea
  - Department of Laboratory Medicine, Gangnam Severance Hospital, Yonsei University College of Medicine, Seoul, Republic of Korea
- Seungbok Lee
  - Department of Genomic Medicine, Seoul National University Hospital, Seoul, Republic of Korea
  - Department of Pediatrics, Seoul National University College of Medicine, Seoul, Republic of Korea
- Soojin Park
  - Department of Pediatrics, Seoul National University College of Medicine, Seoul, Republic of Korea
- Se Song Jang
  - Department of Pediatrics, Seoul National University College of Medicine, Seoul, Republic of Korea
- Jaeso Cho
  - Department of Genomic Medicine, Seoul National University Hospital, Seoul, Republic of Korea
  - Department of Pediatrics, Seoul National University Bundang Hospital, Seongnam-si, Gyeonggi-do, Republic of Korea
- Man Jin Kim
  - Department of Genomic Medicine, Seoul National University Hospital, Seoul, Republic of Korea
  - Department of Laboratory Medicine, Seoul National University Hospital, Seoul, Republic of Korea
- Soo Yeon Kim
  - Department of Genomic Medicine, Seoul National University Hospital, Seoul, Republic of Korea
  - Department of Pediatrics, Seoul National University College of Medicine, Seoul, Republic of Korea
- Woo Joong Kim
  - Department of Pediatrics, Seoul National University College of Medicine, Seoul, Republic of Korea
- Jin Sook Lee
  - Department of Pediatrics, Seoul National University College of Medicine, Seoul, Republic of Korea
- Jong-Hee Chae
  - Department of Genomic Medicine, Seoul National University Hospital, Seoul, Republic of Korea
  - Department of Pediatrics, Seoul National University College of Medicine, Seoul, Republic of Korea
8
Jeanjean S, Shen Y, Hardy L, Daunay A, Delépine M, Gerber Z, Alberdi A, Tubacher E, Deleuze JF, How-Kit A. A detailed analysis of second and third-generation sequencing approaches for accurate length determination of short tandem repeats and homopolymers. Nucleic Acids Res 2025; 53:gkaf131. [PMID: 40036507 PMCID: PMC11878640 DOI: 10.1093/nar/gkaf131] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2024] [Revised: 01/13/2025] [Accepted: 02/11/2025] [Indexed: 03/06/2025] Open
Abstract
Microsatellites are short tandem repeats (STRs) of a motif of 1-6 nucleotides that are ubiquitous in almost all genomes and widely used in many biomedical applications. However, despite the development of next-generation sequencing (NGS) over the past two decades with new technologies coming to the market, accurately sequencing and genotyping STRs, particularly homopolymers, remains very challenging due to several technical limitations. This leads in many cases to erroneous allele calls and difficulty in correctly identifying the genuine allele distribution in a sample. Here, we assessed several second- and third-generation sequencing approaches for their ability to correctly determine the length of microsatellites using plasmids containing A/T homopolymers, AC/TG or AT/TA dinucleotide STRs of variable length. Standard polymerase chain reaction (PCR)-free and PCR-containing, single Unique Molecular Identifier (UMI) and dual UMI 'duplex sequencing' protocols were evaluated using Illumina short-read sequencing, and two PCR-free protocols using PacBio and Oxford Nanopore Technologies long-read sequencing. Several bioinformatics algorithms were developed to correctly identify microsatellite alleles from sequencing data, including four and two modes for generating standard and combined consensus alleles, respectively. We provided a detailed analysis and comparison of these approaches and made several recommendations for the accurate determination of microsatellite allele length.
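To make the notion of microsatellite allele length concrete, the sketch below counts consecutive exact copies of a 1-6 nucleotide motif in a read and takes the most frequent count across reads as a naive allele call. It is an illustration under stated assumptions, not the algorithms developed in the paper: the function names `longest_repeat_run` and `modal_allele` are hypothetical, matching is exact (no tolerance for sequencing errors), and the modal call is a simple stand-in for the consensus modes the abstract mentions.

```python
import re
from collections import Counter


def longest_repeat_run(sequence, motif):
    """Return the largest number of consecutive exact copies of `motif` in `sequence`."""
    if not 1 <= len(motif) <= 6:
        raise ValueError("microsatellite motifs are 1-6 nucleotides long")
    # One or more back-to-back copies of the motif, matched case-insensitively via upper().
    pattern = re.compile(f"(?:{re.escape(motif.upper())})+")
    runs = [len(m.group(0)) // len(motif) for m in pattern.finditer(sequence.upper())]
    return max(runs, default=0)


def modal_allele(reads, motif):
    """Naive allele call: the most frequent repeat count among reads (reads must be non-empty)."""
    counts = Counter(longest_repeat_run(read, motif) for read in reads)
    return counts.most_common(1)[0][0]


# Made-up reads for illustration: a dinucleotide STR and a homopolymer.
dinucleotide_read = "GGC" + "AC" * 12 + "TTG"
homopolymer_reads = ["TT" + "A" * 15 + "GG"] * 3 + ["TT" + "A" * 14 + "GG"]

print(longest_repeat_run(dinucleotide_read, "AC"))
print(modal_allele(homopolymer_reads, "A"))
```

The last example mimics the situation the abstract describes for homopolymers: individual reads can disagree on length, so an allele call has to be derived from the distribution of lengths rather than from any single read.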
Affiliation(s)
- Sophie I Jeanjean
  - Laboratory for Genomics, Foundation Jean Dausset – CEPH, 75010 Paris, France
- Yimin Shen
  - Laboratory for Bioinformatics, Foundation Jean Dausset – CEPH, 75010 Paris, France
- Lise M Hardy
  - Laboratory for Genomics, Foundation Jean Dausset – CEPH, 75010 Paris, France
- Antoine Daunay
  - Laboratory for Genomics, Foundation Jean Dausset – CEPH, 75010 Paris, France
- Marc Delépine
  - Centre National de Recherche en Génomique Humaine (CNRGH), CEA, Institut François Jacob, 91000 Evry, France
- Zuzana Gerber
  - Centre National de Recherche en Génomique Humaine (CNRGH), CEA, Institut François Jacob, 91000 Evry, France
- Antonio Alberdi
  - Technological Platform of Saint-Louis Research Institute (IRSL), Saint-Louis Hospital, University of Paris, 75010 Paris, France
- Emmanuel Tubacher
  - Laboratory for Bioinformatics, Foundation Jean Dausset – CEPH, 75010 Paris, France
- Jean-François Deleuze
  - Laboratory for Genomics, Foundation Jean Dausset – CEPH, 75010 Paris, France
  - Laboratory for Bioinformatics, Foundation Jean Dausset – CEPH, 75010 Paris, France
  - Centre National de Recherche en Génomique Humaine (CNRGH), CEA, Institut François Jacob, 91000 Evry, France
- Alexandre How-Kit
  - Laboratory for Genomics, Foundation Jean Dausset – CEPH, 75010 Paris, France
9
Hobara T, Ando M, Higuchi Y, Yuan JH, Yoshimura A, Kojima F, Noguchi Y, Takei J, Hiramatsu Y, Nozuma S, Nakamura T, Adachi T, Toyooka K, Yamashita T, Sakiyama Y, Hashiguchi A, Matsuura E, Okamoto Y, Takashima H. Linking LRP12 CGG repeat expansion to inherited peripheral neuropathy. J Neurol Neurosurg Psychiatry 2025; 96:140-149. [PMID: 39013564 PMCID: PMC11877035 DOI: 10.1136/jnnp-2024-333403] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2024] [Accepted: 06/12/2024] [Indexed: 07/18/2024]
Abstract
BACKGROUND: The causative genes for over 60% of inherited peripheral neuropathy (IPN) remain unidentified. This study endeavours to enhance the genetic diagnostic rate in IPN cases by conducting screenings focused on non-coding repeat expansions.
METHODS: We gathered data from 2424 unrelated Japanese patients diagnosed with IPN, among whom 1555 cases with unidentified genetic causes, as determined through comprehensive prescreening analyses, were selected for the study. Screening for CGG non-coding repeat expansions in LRP12, GIPC1 and RILPL1 genes was conducted using PCR and long-read sequencing technologies.
RESULTS: We identified CGG repeat expansions in LRP12 from 44 cases, establishing it as the fourth most common aetiology in Japanese IPN. Most cases (29/37) exhibited distal limb weakness, without ptosis, ophthalmoplegia, facial muscle weakness or bulbar palsy. Neurogenic changes were frequently observed in both needle electromyography (97%) and skeletal muscle tissue (100%). In nerve conduction studies, 28 cases primarily showed impairment in motor nerves without concurrent involvement of sensory nerves, consistent with the phenotype of hereditary motor neuropathy. In seven cases, both motor and sensory nerves were affected, resembling the Charcot-Marie-Tooth (CMT) phenotype. Importantly, the mean CGG repeat number detected in the present patients was significantly shorter than that of patients with LRP12-oculopharyngodistal myopathy (p<0.0001). Additionally, GIPC1 and RILPL1 repeat expansions were absent in our IPN cases.
CONCLUSION: We initially elucidate LRP12 repeat expansions as a prevalent cause of CMT, highlighting the necessity for an adapted screening strategy in clinical practice, particularly when addressing patients with IPN.
Collapse
Affiliation(s)
- Takahiro Hobara
- Department of Neurology and Geriatrics, Kagoshima University Graduate School of Medical and Dental Sciences, Kagoshima, Japan
| | - Masahiro Ando
- Department of Neurology and Geriatrics, Kagoshima University Graduate School of Medical and Dental Sciences, Kagoshima, Japan
| | - Yujiro Higuchi
- Department of Neurology and Geriatrics, Kagoshima University Graduate School of Medical and Dental Sciences, Kagoshima, Japan
| | - Jun-Hui Yuan
- Department of Neurology and Geriatrics, Kagoshima University Graduate School of Medical and Dental Sciences, Kagoshima, Japan
| | - Akiko Yoshimura
- Department of Neurology and Geriatrics, Kagoshima University Graduate School of Medical and Dental Sciences, Kagoshima, Japan
| | - Fumikazu Kojima
- Department of Neurology and Geriatrics, Kagoshima University Graduate School of Medical and Dental Sciences, Kagoshima, Japan
| | - Yutaka Noguchi
- Department of Neurology and Geriatrics, Kagoshima University Graduate School of Medical and Dental Sciences, Kagoshima, Japan
| | - Jun Takei
- Department of Neurology and Geriatrics, Kagoshima University Graduate School of Medical and Dental Sciences, Kagoshima, Japan
| | - Yu Hiramatsu
- Department of Neurology and Geriatrics, Kagoshima University Graduate School of Medical and Dental Sciences, Kagoshima, Japan
| | - Satoshi Nozuma
- Department of Neurology and Geriatrics, Kagoshima University Graduate School of Medical and Dental Sciences, Kagoshima, Japan
| | - Tomonori Nakamura
- Department of Neurology and Geriatrics, Kagoshima University Graduate School of Medical and Dental Sciences, Kagoshima, Japan
| | - Tadashi Adachi
- Division of Neuropathology, Department of Brain and Neurosciences, Tottori University Faculty of Medicine, Tottori, Japan
| | - Keiko Toyooka
- Department of Neurology, National Hospital Organization Osaka Toneyama Medical Center, Osaka, Japan
| | - Toru Yamashita
- Department of Neurology, Okayama University Graduate School of Medicine, Dentistry and Pharmaceutical Sciences, Okayama, Japan
| | - Yusuke Sakiyama
- Department of Neurology and Geriatrics, Kagoshima University Graduate School of Medical and Dental Sciences, Kagoshima, Japan
| | - Akihiro Hashiguchi
- Department of Neurology and Geriatrics, Kagoshima University Graduate School of Medical and Dental Sciences, Kagoshima, Japan
| | - Eiji Matsuura
- Department of Neurology and Geriatrics, Kagoshima University Graduate School of Medical and Dental Sciences, Kagoshima, Japan
| | - Yuji Okamoto
- Department of Neurology and Geriatrics, Kagoshima University Graduate School of Medical and Dental Sciences, Kagoshima, Japan
- Department of Physical Therapy, Kagoshima University Faculty of Medicine School of Health Sciences, Kagoshima, Japan
| | - Hiroshi Takashima
- Department of Neurology and Geriatrics, Kagoshima University Graduate School of Medical and Dental Sciences, Kagoshima, Japan
| |
|
10
|
Van Deynze K, Mumm C, Maltby CJ, Switzenberg JA, Todd P, Boyle AP. Enhanced detection and genotyping of disease-associated tandem repeats using HMMSTR and targeted long-read sequencing. Nucleic Acids Res 2025; 53:gkae1202. [PMID: 39676678 PMCID: PMC11754662 DOI: 10.1093/nar/gkae1202] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/01/2024] [Revised: 10/16/2024] [Accepted: 11/19/2024] [Indexed: 12/17/2024]
Abstract
Tandem repeat sequences comprise approximately 8% of the human genome and are linked to more than 50 neurodegenerative disorders. Accurate characterization of disease-associated repeat loci remains resource intensive and often lacks high-resolution genotype calls. We introduce a multiplexed, targeted nanopore sequencing panel and HMMSTR, a sequence-based tandem repeat copy number caller. HMMSTR outperforms current signal- and sequence-based callers relative to two assemblies, and we show that it performs with high accuracy in heterozygous regions and at low read coverage. The flexible panel allows us to capture disease-associated regions at an average coverage of >150x. Using these tools, we successfully characterize known or suspected repeat expansions in patient-derived samples. In these samples, we also identify unexpected expanded alleles at tandem repeat loci not previously associated with the underlying diagnosis. This genotyping approach for tandem repeat expansions is scalable, simple, flexible and accurate, offering significant potential for diagnostic applications and investigation of expansion co-occurrence in neurodegenerative disorders.
Affiliation(s)
- Kinsey Van Deynze
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
| | - Camille Mumm
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Department of Human Genetics, University of Michigan, Ann Arbor, MI 48109, USA
| | - Connor J Maltby
- Department of Neurology, University of Michigan, Ann Arbor, MI 48109, USA
| | - Jessica A Switzenberg
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
| | - Peter K Todd
- Department of Neurology, University of Michigan, Ann Arbor, MI 48109, USA
- Ann Arbor Veterans Administration Healthcare, Ann Arbor, MI 48105, USA
| | - Alan P Boyle
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Department of Human Genetics, University of Michigan, Ann Arbor, MI 48109, USA
| |
|
11
|
Maestri S, Scalzo D, Damaggio G, Zobel M, Besusso D, Cattaneo E. Navigating triplet repeats sequencing: concepts, methodological challenges and perspective for Huntington's disease. Nucleic Acids Res 2025; 53:gkae1155. [PMID: 39676657 PMCID: PMC11724279 DOI: 10.1093/nar/gkae1155] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/24/2024] [Revised: 10/16/2024] [Accepted: 12/02/2024] [Indexed: 12/17/2024]
Abstract
The accurate characterization of triplet repeats, especially the overrepresented CAG repeats, is increasingly relevant for several reasons. First, germline expansion of CAG repeats above a gene-specific threshold causes multiple neurodegenerative disorders; for instance, Huntington's disease (HD) is triggered by >36 CAG repeats in the huntingtin (HTT) gene. Second, extreme expansions up to 800 CAG repeats have been found in specific cell types affected by the disease. Third, synonymous single nucleotide variants within the CAG repeat stretch influence the age of disease onset. Thus, new sequencing-based protocols that profile both the length and the exact nucleotide sequence of triplet repeats are crucial. Various strategies to enrich the target gene over the background, along with sequencing platforms and bioinformatic pipelines, are under development. This review discusses the concepts, challenges, and methodological opportunities for analyzing triplet repeats, using HD as a case study. Starting with traditional approaches, we will explore how sequencing-based methods have evolved to meet increasing scientific demands. We will also highlight experimental and bioinformatic challenges, aiming to provide a guide for accurate triplet repeat characterization for diagnostic and therapeutic purposes.
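The length criterion mentioned in this abstract can be illustrated with a minimal Python sketch. It finds the longest uninterrupted CAG run in a sequence and compares it with the >36 figure quoted above. The function name and the toy sequences are hypothetical, and the sketch ignores interruptions, sequencing errors and allele phasing, which the review identifies as the hard part of real assays.

```python
import re

# Threshold quoted in the abstract: HD is triggered by >36 CAG repeats in HTT.
HD_THRESHOLD = 36


def longest_cag_run(seq: str) -> int:
    """Return the number of units in the longest uninterrupted CAG run."""
    runs = re.findall(r"(?:CAG)+", seq.upper())
    return max((len(run) // 3 for run in runs), default=0)


# Toy sequences (hypothetical, not real HTT alleles).
toy_normal = "CCTTC" + "CAG" * 20 + "CAACAGCCG"
toy_expanded = "CCTTC" + "CAG" * 42 + "CAACAGCCG"

for name, seq in [("normal", toy_normal), ("expanded", toy_expanded)]:
    n = longest_cag_run(seq)
    print(name, n, "above threshold" if n > HD_THRESHOLD else "within range")
```

A pure-length count like this cannot see the synonymous variants within the repeat stretch that the abstract says influence age of onset, which is why sequence-resolved protocols are needed.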
Affiliation(s)
- Simone Maestri
- Department of Biosciences, University of Milan, Street Giovanni Celoria, 26, 20133, Milan, Italy
- INGM, Istituto Nazionale Genetica Molecolare ‘Romeo ed Enrica Invernizzi’, Street Francesco Sforza, 35, 20122, Milan, Italy
| | - Davide Scalzo
- Department of Biosciences, University of Milan, Street Giovanni Celoria, 26, 20133, Milan, Italy
- INGM, Istituto Nazionale Genetica Molecolare ‘Romeo ed Enrica Invernizzi’, Street Francesco Sforza, 35, 20122, Milan, Italy
| | - Gianluca Damaggio
- Department of Biosciences, University of Milan, Street Giovanni Celoria, 26, 20133, Milan, Italy
- INGM, Istituto Nazionale Genetica Molecolare ‘Romeo ed Enrica Invernizzi’, Street Francesco Sforza, 35, 20122, Milan, Italy
| | - Martina Zobel
- Department of Biosciences, University of Milan, Street Giovanni Celoria, 26, 20133, Milan, Italy
- INGM, Istituto Nazionale Genetica Molecolare ‘Romeo ed Enrica Invernizzi’, Street Francesco Sforza, 35, 20122, Milan, Italy
| | - Dario Besusso
- Department of Biosciences, University of Milan, Street Giovanni Celoria, 26, 20133, Milan, Italy
- INGM, Istituto Nazionale Genetica Molecolare ‘Romeo ed Enrica Invernizzi’, Street Francesco Sforza, 35, 20122, Milan, Italy
| | - Elena Cattaneo
- Department of Biosciences, University of Milan, Street Giovanni Celoria, 26, 20133, Milan, Italy
- INGM, Istituto Nazionale Genetica Molecolare ‘Romeo ed Enrica Invernizzi’, Street Francesco Sforza, 35, 20122, Milan, Italy
| |
|
12
|
Park G, An H, Luo H, Park J. NanoMnT: an STR analysis tool for Oxford Nanopore sequencing data driven by a comprehensive analysis of error profile in STR regions. Gigascience 2025; 14:giaf013. [PMID: 40094553 PMCID: PMC11912559 DOI: 10.1093/gigascience/giaf013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2024] [Revised: 12/29/2024] [Accepted: 02/02/2025] [Indexed: 03/19/2025]
Abstract
Oxford Nanopore Technology (ONT) sequencing is a third-generation sequencing technology that enables cost-effective long-read sequencing, with broad applications in biological research. However, its high sequencing error rate in low-complexity regions hampers its applications in short tandem repeat (STR)-related research. To address this, we generated a comprehensive STR error profile of ONT by analyzing publicly available Nanopore sequencing datasets. We show that the sequencing error rate is influenced not only by STR length but also by the repeat unit and the flanking sequences of STR regions. Interestingly, certain flanking sequences were associated with higher sequencing accuracy, suggesting that certain STR loci are more suitable for Nanopore sequencing compared to other loci. While base quality scores of substitution errors within the STR regions were lower than those of correctly sequenced bases, such patterns were not observed for indel errors. Furthermore, choosing the most recent basecaller version and using the super accuracy model significantly improved STR sequencing accuracy. Finally, we present NanoMnT, a lightweight Python tool that corrects STR sequencing errors in sequencing data and estimates STR allele sizes. NanoMnT leverages the characteristics of ONT when estimating STR allele size and exhibits superior results for 1-bp- and 2-bp repeat STR compared to existing tools. By integrating our findings, we improved STR allele estimation accuracy for Ax10 repeats from 55% to 78% and up to 85% when excluding loci with unfavorable flanking sequences. Using NanoMnT, we present the utility of our findings by identifying microsatellite instability status in cancer sequencing data. NanoMnT is publicly available at https://github.com/18parkky/NanoMnT.
Affiliation(s)
- Gyumin Park
- School of Life Sciences, Gwangju Institute of Science and Technology (GIST), Gwangju 61005, Republic of Korea
| | - Hyunsu An
- School of Life Sciences, Gwangju Institute of Science and Technology (GIST), Gwangju 61005, Republic of Korea
| | - Han Luo
- Department of Thyroid and Parathyroid Surgery, Laboratory of thyroid and parathyroid disease, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu, Sichuan 61005, China
| | - Jihwan Park
- School of Life Sciences, Gwangju Institute of Science and Technology (GIST), Gwangju 61005, Republic of Korea
| |
|
13
|
Ferreira MR, Carratto TMT, Frontanilla TS, Bonadio RS, Jain M, de Oliveira SF, Castelli EC, Mendes-Junior CT. Advances in forensic genetics: Exploring the potential of long read sequencing. Forensic Sci Int Genet 2025; 74:103156. [PMID: 39427416 DOI: 10.1016/j.fsigen.2024.103156] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/03/2024] [Revised: 10/04/2024] [Accepted: 10/06/2024] [Indexed: 10/22/2024]
Abstract
DNA-based technologies have been used in forensic practice since the mid-1980s. While PCR-based STR genotyping using Capillary Electrophoresis remains the gold standard for generating DNA profiles in routine casework worldwide, the research community is continually seeking alternative methods capable of providing additional information to enhance discrimination power or contribute with new investigative leads. Oxford Nanopore Technologies (ONT) and PacBio third-generation sequencing have revolutionized the field, offering real-time capabilities, single-molecule resolution, and long-read sequencing (LRS). ONT, the pioneer of nanopore sequencing, uses biological nanopores to analyze nucleic acids in real-time. Its devices have revolutionized sequencing and may represent an interesting alternative for forensic research and routine casework, given that it offers unparalleled flexibility in a portable size: it enables sequencing approaches that range widely from PCR-amplified short target regions (e.g., CODIS STRs) to PCR-free whole transcriptome or even ultra-long whole genome sequencing. Despite its higher error rate compared to Illumina sequencing, it can significantly improve accuracy in read alignment against a reference genome or de novo genome assembly. This is achieved by generating long contiguous sequences that correctly assemble repetitive sections and regions with structural variation. Moreover, it allows real-time determination of DNA methylation status from native DNA without the need for bisulfite conversion. LRS enables the analysis of thousands of markers at once, providing phasing information and eliminating the need for multiple assays. This maximizes the information retrieved from a single invaluable sample. In this review, we explore the potential use of LRS in different forensic genetics approaches.
Affiliation(s)
- Marcel Rodrigues Ferreira
- Molecular Genetics and Bioinformatics Laboratory, Experimental Research Unit - Unipex, School of Medicine, São Paulo State University - Unesp, Botucatu, São Paulo, Brazil
| | - Thássia Mayra Telles Carratto
- Departamento de Química, Laboratório de Pesquisas Forenses e Genômicas, Faculdade de Filosofia, Ciências e Letras de Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto, SP 14040-901, Brazil
| | - Tamara Soledad Frontanilla
- Departamento de Genética, Faculdade de Medicina de Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto, SP 14049-900, Brazil
| | - Raphael Severino Bonadio
- Depto Genética e Morfologia, Instituto de Ciências Biológicas, Universidade de Brasília, Brasília, DF, Brazil
| | - Miten Jain
- Department of Bioengineering, Department of Physics, Khoury College of Computer Sciences, Northeastern University, Boston, MA, United States
| | | | - Erick C Castelli
- Molecular Genetics and Bioinformatics Laboratory, Experimental Research Unit - Unipex, School of Medicine, São Paulo State University - Unesp, Botucatu, São Paulo, Brazil; Pathology Department, School of Medicine, São Paulo State University - Unesp, Botucatu, São Paulo, Brazil
| | - Celso Teixeira Mendes-Junior
- Departamento de Química, Laboratório de Pesquisas Forenses e Genômicas, Faculdade de Filosofia, Ciências e Letras de Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto, SP 14040-901, Brazil.
| |
|
14
|
Asano K, Yoshimi K, Takeshita K, Mitsuhashi S, Kochi Y, Hirano R, Tingyu Z, Ishida S, Mashimo T. CRISPR Diagnostics for Quantification and Rapid Diagnosis of Myotonic Dystrophy Type 1 Repeat Expansion Disorders. ACS Synth Biol 2024; 13:3926-3935. [PMID: 39565688 DOI: 10.1021/acssynbio.4c00265] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2024]
Abstract
Repeat expansion disorders, exemplified by myotonic dystrophy type 1 (DM1), present challenges in diagnostic quantification because of the variability and complexity of repeat lengths. Traditional diagnostic methods, including PCR and Southern blotting, exhibit limitations in sensitivity and specificity, necessitating the development of innovative approaches for precise and rapid diagnosis. Here, we introduce a CRISPR-based diagnostic method, REPLICA (repeat-primed locating of inherited disease by Cas3), for the quantification and rapid diagnosis of DM1. This method, using in vitro-assembled CRISPR-Cas3, demonstrates superior sensitivity and specificity in quantifying CTG repeat expansion lengths, correlated with disease severity. We also validate the robustness and accuracy of CRISPR diagnostics in quantitatively diagnosing DM1 using patient genomes. Furthermore, we optimize a REPLICA-based assay for point-of-care-testing using lateral flow test strips, facilitating rapid screening and detection. In summary, REPLICA-based CRISPR diagnostics offer precise and rapid detection of repeat expansion disorders, promising personalized treatment strategies.
Affiliation(s)
- Koji Asano
- Division of Animal Genetics, Laboratory Animal Research Center, Institute of Medical Science, The University of Tokyo, Tokyo 108-8639, Japan
| | - Kazuto Yoshimi
- Division of Animal Genetics, Laboratory Animal Research Center, Institute of Medical Science, The University of Tokyo, Tokyo 108-8639, Japan
- Division of Genome Engineering, Center for Experimental Medicine and Systems Biology, Institute of Medical Science, University of Tokyo, Tokyo 108-8639, Japan
| | - Kohei Takeshita
- Life Science Research Infrastructure Group, Advanced Photon Technology Division, RIKEN Spring-8 Center, Hyogo 679-5148, Japan
| | - Satomi Mitsuhashi
- Department of Neurology, St. Marianna University School of Medicine, Kawasaki 216-8511, Japan
| | - Yuta Kochi
- Department of Genomic Function and Diversity, Medical Research Laboratory, Institute of Integrated Research, Institute of Science Tokyo, Tokyo 113-8510, Japan
| | - Rika Hirano
- Division of Animal Genetics, Laboratory Animal Research Center, Institute of Medical Science, The University of Tokyo, Tokyo 108-8639, Japan
| | - Zong Tingyu
- Division of Animal Genetics, Laboratory Animal Research Center, Institute of Medical Science, The University of Tokyo, Tokyo 108-8639, Japan
| | - Saeko Ishida
- Division of Animal Genetics, Laboratory Animal Research Center, Institute of Medical Science, The University of Tokyo, Tokyo 108-8639, Japan
| | - Tomoji Mashimo
- Division of Animal Genetics, Laboratory Animal Research Center, Institute of Medical Science, The University of Tokyo, Tokyo 108-8639, Japan
- Division of Genome Engineering, Center for Experimental Medicine and Systems Biology, Institute of Medical Science, University of Tokyo, Tokyo 108-8639, Japan
| |
|
15
|
Zhang Y, Liu X, Li Z, Li H, Miao Z, Wan B, Xu X. Advances on the Mechanisms and Therapeutic Strategies in Non-coding CGG Repeat Expansion Diseases. Mol Neurobiol 2024; 61:10722-10735. [PMID: 38780719 DOI: 10.1007/s12035-024-04239-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/20/2023] [Accepted: 05/02/2024] [Indexed: 05/25/2024]
Abstract
Non-coding CGG repeat expansions within the 5' untranslated region are implicated in a range of neurological disorders, including fragile X-associated tremor/ataxia syndrome, oculopharyngeal myopathy with leukodystrophy, and oculopharyngodistal myopathy. This review outlined the general characteristics of diseases associated with non-coding CGG repeat expansions, detailing their clinical manifestations and neuroimaging patterns, which often overlap and indicate shared pathophysiological traits. We summarized the underlying molecular mechanisms of these disorders, providing new insights into the roles that DNA, RNA, and toxic proteins play. Understanding these mechanisms is crucial for the development of targeted therapeutic strategies. These strategies include a range of approaches, such as antisense oligonucleotides, RNA interference, genomic DNA editing, small molecule interventions, and other treatments aimed at correcting the dysregulated processes inherent in these disorders. A deeper understanding of the shared mechanisms among non-coding CGG repeat expansion disorders may hold the potential to catalyze the development of innovative therapies, ultimately offering relief to individuals grappling with these debilitating neurological conditions.
Affiliation(s)
- Yutong Zhang
- Departments of Neurology, The First Affiliated Hospital of Soochow University, Suzhou City, China
| | - Xuan Liu
- Departments of Neurology, The First Affiliated Hospital of Soochow University, Suzhou City, China
| | - Zeheng Li
- Departments of Neurology, The First Affiliated Hospital of Soochow University, Suzhou City, China
| | - Hao Li
- Departments of Neurology, The First Affiliated Hospital of Soochow University, Suzhou City, China
- Department of Neurology, The Fourth Affiliated Hospital of Soochow University, Suzhou, 215124, China
| | - Zhigang Miao
- The Institute of Neuroscience, Soochow University, Suzhou City, China
| | - Bo Wan
- The Institute of Neuroscience, Soochow University, Suzhou City, China
| | - Xingshun Xu
- Departments of Neurology, The First Affiliated Hospital of Soochow University, Suzhou City, China.
- The Institute of Neuroscience, Soochow University, Suzhou City, China.
- Department of Neurology, The First Affiliated Hospital of Soochow University, Suzhou, 215000, China.
| |
|
16
|
Song Z, Zahin T, Li X, Shao M. Accurate Detection of Tandem Repeats from Error-Prone Sequences with EquiRep. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.11.05.621953. [PMID: 39574759 PMCID: PMC11580891 DOI: 10.1101/2024.11.05.621953] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/02/2024]
Abstract
A tandem repeat is a sequence of nucleotides that occurs as multiple contiguous and near-identical copies positioned next to each other. These repeats play critical roles in genetic diversity and gene regulation, and are strongly linked to various neurological and developmental disorders. While several methods exist for detecting tandem repeats, they often exhibit low accuracy when the repeat unit length increases or the number of copies is low. Furthermore, methods capable of handling highly mutated sequences remain scarce, highlighting a significant opportunity for improvement. We introduce EquiRep, a tool for accurate detection of tandem repeats from erroneous sequences. EquiRep estimates the likelihood that positions originate from the same position in the unit by self-alignment, followed by a novel approach that refines the estimation. The resulting equivalence classes and the consecutive-position information are then used to build a weighted graph, and the cycle in this graph with maximum bottleneck weight that covers most nucleotide positions is identified to reconstruct the repeat unit. We test EquiRep on simulated and real HOR and RCA datasets, where it consistently outperforms or is comparable to state-of-the-art methods. EquiRep is robust to sequencing errors and makes better predictions for long units and low frequencies, which underscores its broad usability for studying tandem repeats.
Affiliation(s)
- Zhezheng Song
- Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA 16802, USA
| | - Tasfia Zahin
- Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA 16802, USA
| | - Xiang Li
- Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA 16802, USA
| | - Mingfu Shao
- Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA 16802, USA
- Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA 16802, USA
| |
|
17
|
Dolzhenko E, English A, Dashnow H, De Sena Brandine G, Mokveld T, Rowell WJ, Karniski C, Kronenberg Z, Danzi MC, Cheung WA, Bi C, Farrow E, Wenger A, Chua KP, Martínez-Cerdeño V, Bartley TD, Jin P, Nelson DL, Zuchner S, Pastinen T, Quinlan AR, Sedlazeck FJ, Eberle MA. Characterization and visualization of tandem repeats at genome scale. Nat Biotechnol 2024; 42:1606-1614. [PMID: 38168995 PMCID: PMC11921810 DOI: 10.1038/s41587-023-02057-3] [Citation(s) in RCA: 43] [Impact Index Per Article: 43.0] [Received: 05/11/2023] [Accepted: 11/06/2023] [Indexed: 01/05/2024]
Abstract
Tandem repeat (TR) variation is associated with gene expression changes and numerous rare monogenic diseases. Although long-read sequencing provides accurate full-length sequences and methylation of TRs, there is still a need for computational methods to profile TRs across the genome. Here we introduce the Tandem Repeat Genotyping Tool (TRGT) and an accompanying TR database. TRGT determines the consensus sequences and methylation levels of specified TRs from PacBio HiFi sequencing data. It also reports reads that support each repeat allele. These reads can be subsequently visualized with a companion TR visualization tool. Assessing 937,122 TRs, TRGT showed a Mendelian concordance of 98.38%, allowing a single repeat unit difference. In six samples with known repeat expansions, TRGT detected all expansions while also identifying methylation signals and mosaicism and providing finer repeat length resolution than existing methods. Additionally, we released a database with allele sequences and methylation levels for 937,122 TRs across 100 genomes.
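The phrase "Mendelian concordance ... allowing a single repeat unit difference" can be made concrete with a short Python sketch. This is only one plausible reading of the metric, not the authors' evaluation code: a trio counts as concordant if one child allele lies within `tol` repeat units of some maternal allele and the other lies within `tol` units of some paternal allele. The function names and the toy trios are hypothetical.

```python
from itertools import permutations


def is_concordant(child, mother, father, tol=1):
    """Check one trio at one locus.

    Each argument is a pair of allele lengths in repeat units.
    Both assignments of child alleles to parents are tried.
    """
    for from_m, from_f in permutations(child, 2):
        ok_mother = any(abs(from_m - a) <= tol for a in mother)
        ok_father = any(abs(from_f - a) <= tol for a in father)
        if ok_mother and ok_father:
            return True
    return False


def concordance_rate(trios, tol=1):
    """Fraction of (child, mother, father) genotype triples that are concordant."""
    return sum(is_concordant(c, m, f, tol) for c, m, f in trios) / len(trios)


# Toy genotypes (hypothetical): exact match, off-by-one, and a violation.
trios = [
    ((10, 12), (10, 11), (12, 15)),
    ((10, 13), (10, 11), (12, 15)),
    ((20, 30), (10, 11), (12, 15)),
]
print(concordance_rate(trios, tol=1))
print(concordance_rate(trios, tol=0))
```

The tolerance matters: the second toy trio is counted as concordant only when a one-unit difference is allowed, which is the kind of case the relaxed criterion is meant to absorb.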
Affiliation(s)
| | - Adam English
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
| | - Harriet Dashnow
- Departments of Human Genetics and Biomedical Informatics, University of Utah, Salt Lake City, UT, USA
| | | | - Tom Mokveld
- Pacific Biosciences of California, Menlo Park, CA, USA
| | | | | | | | - Matt C Danzi
- Dr. John T. Macdonald Foundation Department of Human Genetics and John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, USA
| | - Warren A Cheung
- Genomic Medicine Center, Children's Mercy Kansas City, Kansas City, MO, USA
| | - Chengpeng Bi
- Genomic Medicine Center, Children's Mercy Kansas City, Kansas City, MO, USA
| | - Emily Farrow
- Genomic Medicine Center, Children's Mercy Kansas City, Kansas City, MO, USA
| | - Aaron Wenger
- Pacific Biosciences of California, Menlo Park, CA, USA
| | - Khi Pin Chua
- Pacific Biosciences of California, Menlo Park, CA, USA
| | - Verónica Martínez-Cerdeño
- Institute for Pediatric Regenerative Medicine, Shriner's Hospital for Children and UC Davis School of Medicine, Sacramento, CA, USA
- Department of Pathology & Laboratory Medicine, UC Davis School of Medicine, Sacramento, CA, USA
- MIND Institute, UC Davis School of Medicine, Sacramento, CA, USA
| | - Trevor D Bartley
- Institute for Pediatric Regenerative Medicine, Shriner's Hospital for Children and UC Davis School of Medicine, Sacramento, CA, USA
- Department of Pathology & Laboratory Medicine, UC Davis School of Medicine, Sacramento, CA, USA
| | - Peng Jin
- Department of Human Genetics, Emory University School of Medicine, Atlanta, GA, USA
| | - David L Nelson
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
| | - Stephan Zuchner
- Dr. John T. Macdonald Foundation Department of Human Genetics and John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, USA
| | - Tomi Pastinen
- Genomic Medicine Center, Children's Mercy Kansas City, Kansas City, MO, USA
| | - Aaron R Quinlan
- Departments of Human Genetics and Biomedical Informatics, University of Utah, Salt Lake City, UT, USA
| | - Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Department of Computer Science, Rice University, Houston, TX, USA
| | | |
|
18
|
Ziaei Jam H, Zook JM, Javadzadeh S, Park J, Sehgal A, Gymrek M. LongTR: genome-wide profiling of genetic variation at tandem repeats from long reads. Genome Biol 2024; 25:176. [PMID: 38965568 PMCID: PMC11229021 DOI: 10.1186/s13059-024-03319-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2024] [Accepted: 06/21/2024] [Indexed: 07/06/2024]
Abstract
Tandem repeats are frequent across the human genome, and variation in repeat length has been linked to a variety of traits. Recent improvements in long read sequencing technologies have the potential to greatly improve tandem repeat analysis, especially for long or complex repeats. Here, we introduce LongTR, which accurately genotypes tandem repeats from high-fidelity long reads available from both PacBio and Oxford Nanopore Technologies. LongTR is freely available at https://github.com/gymrek-lab/longtr and https://zenodo.org/doi/10.5281/zenodo.11403979.
Affiliation(s)
- Helyaneh Ziaei Jam
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
| | - Justin M Zook
- Material Measurement Laboratory, National Institute of Standards and Technology, 100 Bureau Dr, Gaithersburg, MD, USA
| | - Sara Javadzadeh
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
| | - Jonghun Park
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
| | - Aarushi Sehgal
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
| | - Melissa Gymrek
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA.
- Department of Medicine, University of California San Diego, La Jolla, CA, USA.
| |
|
19
|
Tanudisastro HA, Deveson IW, Dashnow H, MacArthur DG. Sequencing and characterizing short tandem repeats in the human genome. Nat Rev Genet 2024; 25:460-475. [PMID: 38366034 DOI: 10.1038/s41576-024-00692-3] [Citation(s) in RCA: 37] [Impact Index Per Article: 37.0] [Accepted: 12/06/2023] [Indexed: 02/18/2024]
Abstract
Short tandem repeats (STRs) are highly polymorphic sequences throughout the human genome that are composed of repeated copies of a 1-6-bp motif. Over 1 million variable STR loci are known, some of which regulate gene expression and influence complex traits, such as height. Moreover, variants in at least 60 STR loci cause genetic disorders, including Huntington disease and fragile X syndrome. Accurately identifying and genotyping STR variants is challenging, in particular mapping short reads to repetitive regions and inferring expanded repeat lengths. Recent advances in sequencing technology and computational tools for STR genotyping from sequencing data promise to help overcome this challenge and solve genetically unresolved cases and the 'missing heritability' of polygenic traits. Here, we compare STR genotyping methods, analytical tools and their applications to understand the effect of STR variation on health and disease. We identify emergent opportunities to refine genotyping and quality-control approaches as well as to integrate STRs into variant-calling workflows and large cohort analyses.
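The definition used here (repeated copies of a 1-6-bp motif) can be sketched in a few lines of Python. The example below scans a sequence for perfect tandem runs with a backreference regex; the function name, the `min_copies` cutoff of 3 and the toy sequence are assumptions made for illustration. Real STR callers must also handle imperfect repeats and read-mapping ambiguity, which is the challenge the abstract describes.

```python
import re

# A lazily matched 1-6 bp motif followed by at least two more exact copies.
# The lazy quantifier makes the shortest motif win at each start position.
STR_PATTERN = re.compile(r"([ACGT]{1,6}?)\1{2,}")


def find_perfect_strs(seq, min_copies=3):
    """Return (start, motif, copies) for perfect tandem runs in seq."""
    hits = []
    for m in STR_PATTERN.finditer(seq.upper()):
        motif = m.group(1)
        copies = len(m.group(0)) // len(motif)
        if copies >= min_copies:
            hits.append((m.start(), motif, copies))
    return hits


print(find_perfect_strs("TTGACAGCAGCAGCAGTT"))
```

On the toy input this reports a single CAG run of four copies starting at offset 4.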
Affiliation(s)
- Hope A Tanudisastro
- Centre for Population Genomics, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
- Centre for Population Genomics, Murdoch Children's Research Institute, Melbourne, Victoria, Australia
- Faculty of Medicine and Health, University of New South Wales, Sydney, New South Wales, Australia
- Faculty of Medicine and Health, University of Sydney, Sydney, New South Wales, Australia
| | - Ira W Deveson
- Faculty of Medicine and Health, University of New South Wales, Sydney, New South Wales, Australia
- Genomics and Inherited Disease Program, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
| | - Harriet Dashnow
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA.
| | - Daniel G MacArthur
- Centre for Population Genomics, Garvan Institute of Medical Research, Sydney, New South Wales, Australia.
- Centre for Population Genomics, Murdoch Children's Research Institute, Melbourne, Victoria, Australia.
- Faculty of Medicine and Health, University of New South Wales, Sydney, New South Wales, Australia.
| |
|
20
|
Rajan-Babu IS, Dolzhenko E, Eberle MA, Friedman JM. Sequence composition changes in short tandem repeats: heterogeneity, detection, mechanisms and clinical implications. Nat Rev Genet 2024; 25:476-499. [PMID: 38467784 DOI: 10.1038/s41576-024-00696-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 01/19/2024] [Indexed: 03/13/2024]
Abstract
Short tandem repeats (STRs) are a class of repetitive elements, composed of tandem arrays of 1-6 base pair sequence motifs, that comprise a substantial fraction of the human genome. STR expansions can cause a wide range of neurological and neuromuscular conditions, known as repeat expansion disorders, whose age of onset, severity, penetrance and/or clinical phenotype are influenced by the length of the repeats and their sequence composition. The presence of non-canonical motifs, depending on the type, frequency and position within the repeat tract, can alter clinical outcomes by modifying somatic and intergenerational repeat stability, gene expression and mutant transcript-mediated and/or protein-mediated toxicities. Here, we review the diverse structural conformations of repeat expansions, technological advances for the characterization of changes in sequence composition, their clinical correlations and the impact on disease mechanisms.
Affiliation(s)
- Indhu-Shree Rajan-Babu
- Department of Medical Genetics, The University of British Columbia, and Children's & Women's Hospital, Vancouver, British Columbia, Canada
- Jan M Friedman
- Department of Medical Genetics, The University of British Columbia, and Children's & Women's Hospital, Vancouver, British Columbia, Canada
- BC Children's Hospital Research Institute, Vancouver, British Columbia, Canada
21
Zhang M. STRAS: a snakemake pipeline for genome-wide short tandem repeats annotation and score. Hum Genet 2024; 143:735-738. [PMID: 38507015 DOI: 10.1007/s00439-024-02662-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/16/2022] [Accepted: 02/13/2024] [Indexed: 03/22/2024]
Abstract
High-throughput whole genome sequencing (WGS) is used clinically to find single nucleotide variants and small indels. Several bioinformatics tools have been developed to call short tandem repeat (STR) copy numbers from WGS data, such as ExpansionHunter Denovo, GangSTR and HipSTR. However, expansion disorders are rare, and it is hard to find candidate expansions in single-patient sequencing data with ~800,000 STR calls. In this paper I describe a snakemake pipeline for genome-wide STRs Annotation and Score (STRAS) using a Random Forest (RF) model to predict pathogenicity. The predictor was validated by benchmark data from ClinVar and PubMed. True positive rate was 93.8%. True negative rate was 98.0%. Precision was 98.6% and recall rate was 93.8%. F1-score was 0.961. Sensitivity was 93.8% and specificity was 99.6%. These results showed STRAS could be a useful tool for clinical researchers to find STR loci of interest and filter out neutral STRs. STRAS is freely available at https://github.com/fancheyu5/STRAS.
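The reported F1-score can be cross-checked against the reported precision and recall, since F1 is their harmonic mean. The following is a minimal sketch of that arithmetic only. The function name `f1_score` is illustrative and is not part of STRAS.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Values as reported in the abstract: precision 98.6%, recall 93.8%.
f1 = f1_score(0.986, 0.938)
print(round(f1, 3))
```

Rounded to three decimals, this reproduces the 0.961 stated in the abstract.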
Affiliation(s)
- Mengna Zhang
- Molecular Diagnosis Center, The Affiliated Hospital of Chengde Medical University, Chengde, 067000, China
22
Alvarez Jerez P, Daida K, Miano-Burkhardt A, Iwaki H, Malik L, Cogan G, Makarious MB, Sullivan R, Vandrovcova J, Ding J, Gibbs JR, Markham A, Nalls MA, Kesharwani RK, Sedlazeck FJ, Casey B, Hardy J, Houlden H, Blauwendraat C, Singleton AB, Billingsley KJ. Profiling complex repeat expansions in RFC1 in Parkinson's disease. NPJ Parkinsons Dis 2024; 10:108. [PMID: 38789445 PMCID: PMC11126591 DOI: 10.1038/s41531-024-00723-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2023] [Accepted: 05/10/2024] [Indexed: 05/26/2024] Open
Abstract
A biallelic (AAGGG) expansion in the poly(A) tail of an AluSx3 transposable element within the gene RFC1 is a frequent cause of cerebellar ataxia, neuropathy, vestibular areflexia syndrome (CANVAS), and more recently, has been reported as a rare cause of Parkinson's disease (PD) in the Finnish population. Here, we investigate the prevalence of RFC1 (AAGGG) expansions in PD patients of non-Finnish European ancestry in 1609 individuals from the Parkinson's Progression Markers Initiative study. We identified four PD patients carrying the biallelic RFC1 (AAGGG) expansion and did not identify any carriers in controls.
Affiliation(s)
- Pilar Alvarez Jerez
- Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD, USA
- Center for Alzheimer's and Related Dementias, National Institute on Aging, Bethesda, MD, USA
- Department of Neurodegenerative Disease, UCL Queen Square Institute of Neurology, University College London, London, UK
- Kensuke Daida
- Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD, USA
- Center for Alzheimer's and Related Dementias, National Institute on Aging, Bethesda, MD, USA
- Abigail Miano-Burkhardt
- Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD, USA
- Center for Alzheimer's and Related Dementias, National Institute on Aging, Bethesda, MD, USA
- Hirotaka Iwaki
- Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD, USA
- Center for Alzheimer's and Related Dementias, National Institute on Aging, Bethesda, MD, USA
- DataTecnica LLC, Washington, DC, USA
- Laksh Malik
- Center for Alzheimer's and Related Dementias, National Institute on Aging, Bethesda, MD, USA
- Guillaume Cogan
- Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD, USA
- Sorbonne Université, Institut du Cerveau-Paris Brain Institute-ICM, Institut National de la Recherche Médicale-U1127, Centre National de la Recherche Scientifique, Paris, France
- Mary B Makarious
- Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD, USA
- UCL Movement Disorders Centre, University College London, London, UK
- Roisin Sullivan
- Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, London, UK
- Jana Vandrovcova
- Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, London, UK
- Jinhui Ding
- Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD, USA
- J Raphael Gibbs
- Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD, USA
- Mike A Nalls
- Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD, USA
- Center for Alzheimer's and Related Dementias, National Institute on Aging, Bethesda, MD, USA
- DataTecnica LLC, Washington, DC, USA
- Rupesh K Kesharwani
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Department of Computer Science, Rice University, Houston, TX, USA
- Bradford Casey
- The Michael J. Fox Foundation for Parkinson's Research, New York, NY, USA
- John Hardy
- Department of Neurodegenerative Disease, UCL Queen Square Institute of Neurology, University College London, London, UK
- Henry Houlden
- UCL Movement Disorders Centre, University College London, London, UK
- Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, London, UK
- Cornelis Blauwendraat
- Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD, USA
- Center for Alzheimer's and Related Dementias, National Institute on Aging, Bethesda, MD, USA
- Andrew B Singleton
- Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD, USA
- Center for Alzheimer's and Related Dementias, National Institute on Aging, Bethesda, MD, USA
- Kimberley J Billingsley
- Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD, USA
- Center for Alzheimer's and Related Dementias, National Institute on Aging, Bethesda, MD, USA
23
Van Deynze K, Mumm C, Maltby CJ, Switzenberg JA, Todd PK, Boyle AP. Enhanced Detection and Genotyping of Disease-Associated Tandem Repeats Using HMMSTR and Targeted Long-Read Sequencing. medRxiv 2024:2024.05.01.24306681. [PMID: 38746091 PMCID: PMC11092683 DOI: 10.1101/2024.05.01.24306681] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/16/2024]
Abstract
Tandem repeat sequences comprise approximately 8% of the human genome and are linked to more than 50 neurodegenerative disorders. Accurate characterization of disease-associated repeat loci remains resource intensive and often lacks high-resolution genotype calls. We introduce a multiplexed, targeted nanopore sequencing panel and HMMSTR, a sequence-based tandem repeat copy number caller. HMMSTR outperforms current signal- and sequence-based callers relative to two assemblies, and we show it performs with high accuracy in heterozygous regions and at low read coverage. The flexible panel allows us to capture disease-associated regions at an average coverage of >150x. Using these tools, we successfully characterize known or suspected repeat expansions in patient-derived samples. In these samples we also identify unexpected expanded alleles at tandem repeat loci not previously associated with the underlying diagnosis. This genotyping approach for tandem repeat expansions is scalable, simple, flexible, and accurate, offering significant potential for diagnostic applications and investigation of expansion co-occurrence in neurodegenerative disorders.
24
Su C, Chandradoss KR, Malachowski T, Boya R, Ryu HS, Brennand KJ, Phillips-Cremins JE. MASTR-seq: Multiplexed Analysis of Short Tandem Repeats with sequencing. bioRxiv 2024:2024.04.29.591790. [PMID: 38746155 PMCID: PMC11092654 DOI: 10.1101/2024.04.29.591790] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/16/2024]
Abstract
More than 60 human disorders have been linked to unstable expansion of short tandem repeat (STR) tracts. STR length and the extent of DNA methylation are linked to disease pathology and can be mosaic in a cell type-specific manner in several repeat expansion disorders. Mosaic phenomena have been difficult to study to date due to technical bias intrinsic to repeat sequences and the need for multi-modal measurements at single-allele resolution. Nanopore long-read sequencing accurately measures STR length and DNA methylation in the same single molecule but is cost prohibitive for studies assessing a target locus across multiple experimental conditions or patient samples. Here, we describe MASTR-seq, Multiplexed Analysis of Short Tandem Repeats, for cost-effective, high-throughput, accurate, multi-modal measurements of DNA methylation and STR genotype at single-allele resolution. MASTR-seq couples long-read sequencing, Cas9-mediated target enrichment, and PCR-free multiplexed barcoding to achieve a >10-fold increase in on-target read mapping for 8-12 pooled samples in a single MinION flow cell. We provide a detailed experimental protocol and computational tools and present evidence that MASTR-seq quantifies tract length and DNA methylation status for CGG and CAG STR loci in normal-length and mutation-length human cell lines. The MASTR-seq protocol takes approximately eight days for experiments and one additional day for data processing and analyses.
Key points
- We provide a protocol for MASTR-seq: Multiplexed Analysis of Short Tandem Repeats using Cas9-mediated target enrichment and PCR-free, multiplexed nanopore sequencing.
- MASTR-seq achieves a >10-fold increase in on-target read proportion for highly repetitive, technically inaccessible regions of the genome relevant for human health and disease.
- MASTR-seq allows for high-throughput, efficient, accurate, and cost-effective measurement of STR length and DNA methylation in the same single allele for up to 8-12 samples in parallel in one Nanopore MinION flow cell.
25
Tachikawa K, Shimizu T, Imai T, Ko R, Kawai Y, Omae Y, Tokunaga K, Frith MC, Yamano Y, Mitsuhashi S. Cost-Effective Cas9-Mediated Targeted Sequencing of Spinocerebellar Ataxia Repeat Expansions. J Mol Diagn 2024; 26:85-95. [PMID: 38008286 DOI: 10.1016/j.jmoldx.2023.10.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2023] [Revised: 10/09/2023] [Accepted: 10/23/2023] [Indexed: 11/28/2023] Open
Abstract
Hereditary repeat diseases are caused by an abnormal expansion of short tandem repeats in the genome. Among them, spinocerebellar ataxia (SCA) is a heterogeneous disease, and currently, 16 responsible repeats are known. Genetic diagnosis is obtained by analyzing the number of repeats through separate testing of each repeat. Although simultaneous detection of candidate repeats using current massively parallel sequencing technologies has been developed to avoid complicated multiple experiments, these methods are generally expensive. This study developed a cost-effective SCA repeat panel [Flongle SCA repeat panel sequencing (FLO-SCAp)] using Cas9-mediated targeted long-read sequencing and the smallest long-read sequencing apparatus, Flongle. This panel enabled the detection of repeat copy number changes, internal repeat sequences, and DNA methylation in seven patients with different repeat expansion diseases. The median (interquartile range) values of coverage and on-target rate were 39.5 (12 to 72) and 11.6% (7.5% to 16.5%), respectively. This approach was validated by comparing repeat copy number changes measured by FLO-SCAp and short-read whole-genome sequencing. A high correlation was observed between FLO-SCAp and short-read whole-genome sequencing when the repeat length was ≤250 bp (r = 0.98; P < 0.001). Thus, FLO-SCAp represents the most cost-effective method for conducting multiplex testing of repeats and can serve as the first-line diagnostic tool for SCA.
Affiliation(s)
- Keiji Tachikawa
- Department of Neurology, St. Marianna University School of Medicine, Kawasaki, Japan
- Takahiro Shimizu
- Department of Neurology, St. Marianna University School of Medicine, Kawasaki, Japan
- Takeshi Imai
- Department of Neurology, St. Marianna University School of Medicine, Kawasaki, Japan
- Riyoko Ko
- Department of Neurology, St. Marianna University School of Medicine, Kawasaki, Japan
- Yosuke Kawai
- Genome Medical Science Project, National Center for Global Health and Medicine, Tokyo, Japan
- Yosuke Omae
- Genome Medical Science Project, National Center for Global Health and Medicine, Tokyo, Japan
- Katsushi Tokunaga
- Genome Medical Science Project, National Center for Global Health and Medicine, Tokyo, Japan
- Martin C Frith
- Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan; Graduate School of Frontier Sciences, University of Tokyo, Kashiwa, Japan; Computational Bio Big-Data Open Innovation Laboratory, National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan
- Yoshihisa Yamano
- Department of Neurology, St. Marianna University School of Medicine, Kawasaki, Japan; Department of Rare Diseases Research, Institute of Medical Science, St. Marianna University School of Medicine, Kawasaki, Japan
- Satomi Mitsuhashi
- Department of Neurology, St. Marianna University School of Medicine, Kawasaki, Japan
26
Jam HZ, Zook JM, Javadzadeh S, Park J, Sehgal A, Gymrek M. Genome-wide profiling of genetic variation at tandem repeat from long reads. bioRxiv 2024:2024.01.20.576266. [PMID: 38328152 PMCID: PMC10849534 DOI: 10.1101/2024.01.20.576266] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 02/09/2024]
Abstract
Tandem repeats are frequent across the human genome, and variation in repeat length has been linked to a variety of traits. Recent improvements in long read sequencing technologies have the potential to greatly improve TR analysis, especially for long or complex repeats. Here we introduce LongTR, which accurately genotypes tandem repeats from high fidelity long reads available from both PacBio and Oxford Nanopore Technologies. LongTR is freely available at https://github.com/gymrek-lab/longtr.
Affiliation(s)
- Helyaneh Ziaei Jam
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
- Justin M. Zook
- Material Measurement Laboratory, National Institute of Standards and Technology, 100 Bureau Dr., Gaithersburg, MD, USA
- Sara Javadzadeh
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
- Jonghun Park
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
- Aarushi Sehgal
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
- Melissa Gymrek
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
- Department of Medicine, University of California San Diego, La Jolla, CA, USA
27
Audet S, Triassi V, Gelinas M, Legault-Cadieux N, Ferraro V, Duquette A, Tetreault M. Integration of multi-omics technologies for molecular diagnosis in ataxia patients. Front Genet 2024; 14:1304711. [PMID: 38239855 PMCID: PMC10794629 DOI: 10.3389/fgene.2023.1304711] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 09/29/2023] [Accepted: 11/27/2023] [Indexed: 01/22/2024] Open
Abstract
Background: Episodic ataxias are rare neurological disorders characterized by recurring episodes of imbalance and coordination difficulties. Obtaining definitive molecular diagnoses poses challenges, as clinical presentation is highly heterogeneous, and literature on the underlying genetics is limited. While the advent of high-throughput sequencing technologies has significantly contributed to the genetics of Mendelian disorders, interpretation of variants of uncertain significance and other limitations inherent to individual methods still leave many patients undiagnosed. This study aimed to investigate the utility of multi-omics for the identification and validation of molecular candidates in a cohort of complex cases of ataxia with episodic presentation. Methods: Eight patients lacking molecular diagnosis despite extensive clinical examination were recruited following standard genetic testing. Whole genome and RNA sequencing were performed on samples isolated from peripheral blood mononuclear cells. Integration of expression and splicing data facilitated genomic variant prioritization. Subsequently, long-read sequencing played a crucial role in the validation of those candidate variants. Results: Whole genome sequencing uncovered pathogenic variants in four genes (SPG7, ATXN2, ELOVL4, PMPCB). A missense and a nonsense variant, both previously reported as likely pathogenic, were configured in trans in individual #1 (SPG7: c.2228T>C/p.I743T, c.1861C>T/p.Q621*). An ATXN2 microsatellite expansion (CAG32) was found in another late-onset case. In two separate individuals, intronic variants near splice sites (ELOVL4: c.541+5G>A; PMPCB: c.1154+5G>C) were predicted to induce loss-of-function splicing, but had never been reported as disease-causing. Long-read sequencing confirmed the compound heterozygous variant configuration, repeat expansion length, as well as splicing landscape for those pathogenic variants. A potential genetic modifier of the ATXN2 expansion was discovered in ZFYVE26 (c.3022C>T/p.R1008*). Conclusion: Despite failure to identify pathogenic variants through clinical genetic testing, the multi-omics approach enabled the molecular diagnosis in 50% of patients, also giving valuable insights for variant prioritization in remaining cases. The findings demonstrate the value of long-read sequencing for the validation of candidate variants in various scenarios. Our study demonstrates the effectiveness of leveraging complementary omics technologies to unravel the underlying genetics in patients with unresolved rare diseases such as ataxia. Molecular diagnoses not only hold significant promise in improving patient care management, but also alleviate the burden of diagnostic odysseys, more broadly enhancing quality of life.
Affiliation(s)
- Sebastien Audet
- University of Montreal Hospital Research Center (CRCHUM), Montreal, QC, Canada
- Department of Neurosciences, University of Montreal, Montreal, QC, Canada
- Valerie Triassi
- University of Montreal Hospital Research Center (CRCHUM), Montreal, QC, Canada
- Myriam Gelinas
- Department of Medicine, University of Montreal Hospital Centre (CHUM), Montreal, QC, Canada
- Nab Legault-Cadieux
- University of Montreal Hospital Research Center (CRCHUM), Montreal, QC, Canada
- Department of Neurosciences, University of Montreal, Montreal, QC, Canada
- Vincent Ferraro
- Department of Medicine, University of Montreal Hospital Centre (CHUM), Montreal, QC, Canada
- Antoine Duquette
- University of Montreal Hospital Research Center (CRCHUM), Montreal, QC, Canada
- Department of Neurosciences, University of Montreal, Montreal, QC, Canada
- Neurology Service, Department of Medicine, André-Barbeau Movement Disorders Unit, University of Montreal Hospital (CHUM), Montreal, QC, Canada
- Genetic Service, Department of Medicine, University of Montreal Hospital (CHUM), Montreal, QC, Canada
- Martine Tetreault
- University of Montreal Hospital Research Center (CRCHUM), Montreal, QC, Canada
- Department of Neurosciences, University of Montreal, Montreal, QC, Canada
28
Souza-Borges CH, Utsunomia R, Varani AM, Uliano-Silva M, Lira LVG, Butzge AJ, Gomez Agudelo JF, Manso S, Freitas MV, Ariede RB, Mastrochirico-Filho VA, Penaloza C, Barria A, Porto-Foresti F, Foresti F, Hattori R, Guiguen Y, Houston RD, Hashimoto DT. De novo assembly and characterization of a highly degenerated ZW sex chromosome in the fish Megaleporinus macrocephalus. Gigascience 2024; 13:giae085. [PMID: 39589439 PMCID: PMC11590113 DOI: 10.1093/gigascience/giae085] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/15/2024] [Revised: 07/31/2024] [Accepted: 10/14/2024] [Indexed: 11/27/2024] Open
Abstract
BACKGROUND Megaleporinus macrocephalus (piauçu) is a Neotropical fish within Characoidei that presents a well-established heteromorphic ZZ/ZW sex determination system and thus constitutes a good model for studying W and Z chromosomes in fishes. We used PacBio reads and Hi-C to assemble a chromosome-level reference genome for M. macrocephalus. We generated family segregation information to construct a genetic map, pool sequencing of males and females to characterize its sex system, and RNA sequencing to highlight candidate genes of M. macrocephalus sex determination. RESULTS The reference genome of M. macrocephalus is 1,282,030,339 bp in length and has a contig and scaffold N50 of 5.0 Mb and 45.03 Mb, respectively. In the sex chromosome, based on patterns of recombination suppression, coverage, FST, and sex-specific SNPs, we distinguished a putative W-specific region that is highly differentiated, a region where Z and W still share some similarities and is undergoing degeneration, and the PAR. The sex chromosome gene repertoire includes genes from the TGF-β family (amhr2, bmp7) and the Wnt/β-catenin pathway (wnt4, wnt7a), some of which are differentially expressed. CONCLUSIONS The chromosome-level genome of piauçu exhibits high quality, establishing a valuable resource for advancing research within the group. Our discoveries offer insights into the evolutionary dynamics of Z and W sex chromosomes in fish, emphasizing ongoing degenerative processes and indicating complex interactions between Z and W sequences in specific genomic regions. Notably, amhr2 and bmp7 are potential candidate genes for sex determination in M. macrocephalus.
Affiliation(s)
- Ricardo Utsunomia
- School of Sciences, São Paulo State University (Unesp), Bauru, SP, 17033-360, Brazil
- Alessandro M Varani
- School of Agricultural and Veterinary Sciences, São Paulo State University (Unesp), Jaboticabal, SP, 14884-900, Brazil
- Lieschen Valeria G Lira
- Aquaculture Center of Unesp, São Paulo State University (Unesp), Jaboticabal, SP, 14884-900, Brazil
- Arno J Butzge
- Aquaculture Center of Unesp, São Paulo State University (Unesp), Jaboticabal, SP, 14884-900, Brazil
- John F Gomez Agudelo
- Aquaculture Center of Unesp, São Paulo State University (Unesp), Jaboticabal, SP, 14884-900, Brazil
- Shisley Manso
- Aquaculture Center of Unesp, São Paulo State University (Unesp), Jaboticabal, SP, 14884-900, Brazil
- Milena V Freitas
- Aquaculture Center of Unesp, São Paulo State University (Unesp), Jaboticabal, SP, 14884-900, Brazil
- Raquel B Ariede
- Aquaculture Center of Unesp, São Paulo State University (Unesp), Jaboticabal, SP, 14884-900, Brazil
- Carolina Penaloza
- The Roslin Institute, University of Edinburgh, Easter Bush, Midlothian EH25 9RG, United Kingdom
- Agustín Barria
- The Roslin Institute, University of Edinburgh, Easter Bush, Midlothian EH25 9RG, United Kingdom
- Fábio Porto-Foresti
- School of Sciences, São Paulo State University (Unesp), Bauru, SP, 17033-360, Brazil
- Fausto Foresti
- Institute of Biosciences, São Paulo State University (Unesp), Botucatu, SP, 18618-689, Brazil
- Ricardo Hattori
- São Paulo Agency of Agribusiness and Technology (APTA), São Paulo, SP, 01037-010, Brazil
- Ross D Houston
- The Roslin Institute, University of Edinburgh, Easter Bush, Midlothian EH25 9RG, United Kingdom
- Diogo Teruo Hashimoto
- Aquaculture Center of Unesp, São Paulo State University (Unesp), Jaboticabal, SP, 14884-900, Brazil
29
Yeetong P, Dembélé ME, Pongpanich M, Cissé L, Srichomthong C, Maiga AB, Dembélé K, Assawapitaksakul A, Bamba S, Yalcouyé A, Diarra S, Mefoung SE, Rakwongkhachon S, Traoré O, Tongkobpetch S, Fischbeck KH, Gahl WA, Guinto CO, Shotelersuk V, Landouré G. Pentanucleotide Repeat Insertions in RAI1 Cause Benign Adult Familial Myoclonic Epilepsy Type 8. Mov Disord 2024; 39:164-172. [PMID: 37994247 PMCID: PMC10872918 DOI: 10.1002/mds.29654] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 07/11/2023] [Revised: 10/04/2023] [Accepted: 10/24/2023] [Indexed: 11/24/2023] Open
Abstract
BACKGROUND Benign adult familial myoclonic epilepsy (BAFME) is an autosomal dominant disorder characterized by cortical tremors and seizures. Six types of BAFME, all caused by pentanucleotide repeat expansions in different genes, have been reported. However, several other BAFME cases remain with no molecular diagnosis. OBJECTIVES We aim to characterize clinical features and identify the mutation causing BAFME in a large Malian family with 10 affected members. METHODS Long-read whole genome sequencing, repeat-primed polymerase chain reaction and RNA studies were performed. RESULTS We identified TTTTA repeat expansions and TTTCA repeat insertions in intron 4 of the RAI1 gene that co-segregated with disease status in this family. TTTCA repeats were absent in 200 Malian controls. In the affected individuals, we found a read with only nine TTTCA repeat units and somatic instability. The RAI1 repeat expansions cause the only BAFME type in which the disease-causing repeats are in a gene associated with a monogenic disorder in the haploinsufficiency state (ie, Smith-Magenis syndrome [SMS]). Nevertheless, none of the Malian patients exhibited symptoms related to SMS. Moreover, leukocyte RNA levels of RAI1 in six Malian BAFME patients were no different from controls. CONCLUSIONS These findings establish a new type of BAFME, BAFME8, in an African family and suggest that haploinsufficiency is unlikely to be the main pathomechanism of BAFME. © 2023 International Parkinson and Movement Disorder Society. This article has been contributed to by U.S. Government employees and their work is in the public domain in the USA.
Collapse
Affiliation(s)
- Patra Yeetong
- Division of Human Genetics, Department of Botany, Faculty of Science, Chulalongkorn University, Bangkok 10330, Thailand
| | | | - Monnat Pongpanich
- Department of Mathematics and Computer Science, Faculty of Science, Chulalongkorn University, Bangkok 10330, Thailand
- Omics Sciences and Bioinformatics Center, Faculty of Science, Chulalongkorn University, Bangkok, 10330, Thailand
| | - Lassana Cissé
- Service de Neurologie, Centre Hospitalier Universitaire du Point G, Bamako, Mali
| | - Chalurmpon Srichomthong
- Center of Excellence for Medical Genomics, Department of Pediatrics, Faculty of Medicine, Chulalongkorn University, Bangkok 10330, Thailand
- Excellence Center for Genomics and Precision Medicine, King Chulalongkorn Memorial Hospital, the Thai Red Cross Society, Bangkok 10330, Thailand
| | | | | | - Adjima Assawapitaksakul
- Center of Excellence for Medical Genomics, Department of Pediatrics, Faculty of Medicine, Chulalongkorn University, Bangkok 10330, Thailand
- Excellence Center for Genomics and Precision Medicine, King Chulalongkorn Memorial Hospital, the Thai Red Cross Society, Bangkok 10330, Thailand
| | - Salia Bamba
- Faculté de Médecine et d’Odontostomatologie, USTTB, Bamako, Mali
| | | | - Salimata Diarra
- Faculté de Médecine et d’Odontostomatologie, USTTB, Bamako, Mali
- Yale University, Pediatric Genomics Discovery Program, Department of Pediatrics, New Haven, CT, United States
- Neurogenetics Branch, NINDS, NIH, Bethesda, MD, United States
| | | | - Supphakorn Rakwongkhachon
- Center of Excellence for Medical Genomics, Department of Pediatrics, Faculty of Medicine, Chulalongkorn University, Bangkok 10330, Thailand
- Excellence Center for Genomics and Precision Medicine, King Chulalongkorn Memorial Hospital, the Thai Red Cross Society, Bangkok 10330, Thailand
| | - Oumou Traoré
- Faculté de Médecine et d’Odontostomatologie, USTTB, Bamako, Mali
| | - Siraprapa Tongkobpetch
- Center of Excellence for Medical Genomics, Department of Pediatrics, Faculty of Medicine, Chulalongkorn University, Bangkok 10330, Thailand
- Excellence Center for Genomics and Precision Medicine, King Chulalongkorn Memorial Hospital, the Thai Red Cross Society, Bangkok 10330, Thailand
| | | | - William A Gahl
- Medical Genetics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland, USA
| | - Cheick O Guinto
- Faculté de Médecine et d’Odontostomatologie, USTTB, Bamako, Mali
- Service de Neurologie, Centre Hospitalier Universitaire du Point G, Bamako, Mali
| | - Vorasuk Shotelersuk
- Center of Excellence for Medical Genomics, Department of Pediatrics, Faculty of Medicine, Chulalongkorn University, Bangkok 10330, Thailand
- Excellence Center for Genomics and Precision Medicine, King Chulalongkorn Memorial Hospital, the Thai Red Cross Society, Bangkok 10330, Thailand
| | - Guida Landouré
- Faculté de Médecine et d’Odontostomatologie, USTTB, Bamako, Mali
- Service de Neurologie, Centre Hospitalier Universitaire du Point G, Bamako, Mali
| |
|
30
|
Ando M, Higuchi Y, Yuan J, Yoshimura A, Kojima F, Yamanishi Y, Aso Y, Izumi K, Imada M, Maki Y, Nakagawa H, Hobara T, Noguchi Y, Takei J, Hiramatsu Y, Nozuma S, Sakiyama Y, Hashiguchi A, Matsuura E, Okamoto Y, Takashima H. Clinical variability associated with intronic FGF14 GAA repeat expansion in Japan. Ann Clin Transl Neurol 2024; 11:96-104. [PMID: 37916889 PMCID: PMC10791012 DOI: 10.1002/acn3.51936] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 08/09/2023] [Revised: 10/18/2023] [Accepted: 10/19/2023] [Indexed: 11/03/2023] Open
Abstract
BACKGROUND AND OBJECTIVES The GAA repeat expansion within the fibroblast growth factor 14 (FGF14) gene has been found to be associated with late-onset cerebellar ataxia. This study aimed to investigate the genetic causes of cerebellar ataxia in patients in Japan. METHODS We collected a case series of 940 index patients who presented with chronic cerebellar ataxia and remained genetically undiagnosed after our preliminary genetic screening. To investigate the FGF14 repeat locus, we employed an integrated diagnostic strategy that involved fluorescence amplicon length analysis polymerase chain reaction (PCR), repeat-primed PCR, and long-read sequencing. RESULTS Pathogenic FGF14 GAA repeat expansions were detected in 12 patients from 11 unrelated families. The median size of the pathogenic GAA repeat was 309 repeats (range: 270-316 repeats). In these patients, the mean age of onset was 66.9 ± 9.6 years, with episodic symptoms observed in 56% of patients and parkinsonism in 30% of patients. We also detected FGF14 repeat expansions in a patient with a phenotype of multiple system atrophy, including cerebellar ataxia, parkinsonism, autonomic ataxia, and bilateral vocal cord paralysis. Brain magnetic resonance imaging (MRI) showed normal to mild cerebellar atrophy, and a follow-up study conducted after a mean period of 6 years did not reveal any significant progression. DISCUSSION This study highlights the importance of FGF14 GAA repeat analysis in patients with late-onset cerebellar ataxia, particularly when they exhibit episodic symptoms, or their brain MRI shows no apparent cerebellar atrophy. Our findings contribute to a better understanding of the clinical variability of GAA-FGF14-related diseases.
Affiliation(s)
- Masahiro Ando
- Department of Neurology and Geriatrics, Kagoshima University Graduate School of Medical and Dental Sciences, Kagoshima, Japan
| | - Yujiro Higuchi
- Department of Neurology and Geriatrics, Kagoshima University Graduate School of Medical and Dental Sciences, Kagoshima, Japan
| | - Junhui Yuan
- Department of Neurology and Geriatrics, Kagoshima University Graduate School of Medical and Dental Sciences, Kagoshima, Japan
| | - Akiko Yoshimura
- Department of Neurology and Geriatrics, Kagoshima University Graduate School of Medical and Dental Sciences, Kagoshima, Japan
| | - Fumikazu Kojima
- Department of Neurology and Geriatrics, Kagoshima University Graduate School of Medical and Dental Sciences, Kagoshima, Japan
| | - Yuki Yamanishi
- Department of Neurology and Clinical Pharmacology, Ehime University Hospital, Toon, Ehime, Japan
| | - Yasuhiro Aso
- Department of Neurology, Oita Prefecture Hospital, Oita, Japan
| | - Kotaro Izumi
- Department of Neurology, Ohashi Go Neurosurgical Neurology Clinic, Fukuoka, Japan
| | - Minako Imada
- Department of Neurology, National Hospital Organization Minamikyushu Hospital, Kagoshima, Japan
| | - Yoshimitsu Maki
- Department of Neurology, Kagoshima City Hospital, Kagoshima, Japan
| | - Hiroto Nakagawa
- Department of Neurology, Kagoshima Medical Association Hospital, Kagoshima, Japan
| | - Takahiro Hobara
- Department of Neurology and Geriatrics, Kagoshima University Graduate School of Medical and Dental Sciences, Kagoshima, Japan
| | - Yutaka Noguchi
- Department of Neurology and Geriatrics, Kagoshima University Graduate School of Medical and Dental Sciences, Kagoshima, Japan
| | - Jun Takei
- Department of Neurology and Geriatrics, Kagoshima University Graduate School of Medical and Dental Sciences, Kagoshima, Japan
| | - Yu Hiramatsu
- Department of Neurology and Geriatrics, Kagoshima University Graduate School of Medical and Dental Sciences, Kagoshima, Japan
| | - Satoshi Nozuma
- Department of Neurology and Geriatrics, Kagoshima University Graduate School of Medical and Dental Sciences, Kagoshima, Japan
| | - Yusuke Sakiyama
- Department of Neurology and Geriatrics, Kagoshima University Graduate School of Medical and Dental Sciences, Kagoshima, Japan
| | - Akihiro Hashiguchi
- Department of Neurology and Geriatrics, Kagoshima University Graduate School of Medical and Dental Sciences, Kagoshima, Japan
| | - Eiji Matsuura
- Department of Neurology and Geriatrics, Kagoshima University Graduate School of Medical and Dental Sciences, Kagoshima, Japan
| | - Yuji Okamoto
- Department of Neurology and Geriatrics, Kagoshima University Graduate School of Medical and Dental Sciences, Kagoshima, Japan
- Department of Physical Therapy, Faculty of Medicine, School of Health Sciences, Kagoshima University, Kagoshima, Japan
| | - Hiroshi Takashima
- Department of Neurology and Geriatrics, Kagoshima University Graduate School of Medical and Dental Sciences, Kagoshima, Japan
| |
|
31
|
LoTempio J, Delot E, Vilain E. Benchmarking long-read genome sequence alignment tools for human genomics applications. PeerJ 2023; 11:e16515. [PMID: 38130927 PMCID: PMC10734412 DOI: 10.7717/peerj.16515] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2023] [Accepted: 11/02/2023] [Indexed: 12/23/2023] Open
Abstract
Background The utility of long-read genome sequencing platforms has been shown in many fields including whole genome assembly, metagenomics, and amplicon sequencing. Less clear is the applicability of long reads to reference-guided human genomics, which is the foundation of genomic medicine. Here, we benchmark available platform-agnostic alignment tools on datasets from nanopore and single-molecule real-time platforms to understand their suitability in producing a genome representation. Results For this study, we leveraged publicly-available data from sample NA12878 generated on Oxford Nanopore and sample NA24385 on Pacific Biosciences platforms. We employed state of the art sequence alignment tools including GraphMap2, long-read aligner (LRA), Minimap2, CoNvex Gap-cost alignMents for Long Reads (NGMLR), and Winnowmap2. Minimap2 and Winnowmap2 were computationally lightweight enough for use at scale, while GraphMap2 was not. NGMLR took a long time and required many resources, but produced alignments each time. LRA was fast, but only worked on Pacific Biosciences data. Each tool widely disagreed on which reads to leave unaligned, affecting the end genome coverage and the number of discoverable breakpoints. No alignment tool independently resolved all large structural variants (1,001-100,000 base pairs) present in the Database of Genome Variants (DGV) for sample NA12878 or the truthset for NA24385. Conclusions These results suggest a combined approach is needed for LRS alignments for human genomics. Specifically, leveraging alignments from three tools will be more effective in generating a complete picture of genomic variability. It should be best practice to use an analysis pipeline that generates alignments with both Minimap2 and Winnowmap2 as they are lightweight and yield different views of the genome. Depending on the question at hand, the data available, and the time constraints, NGMLR and LRA are good options for a third tool. 
If computational resources and time are not a factor for a given case or experiment, NGMLR will provide another view, and another chance to resolve a case. LRA, while fast, did not work on the nanopore data for our cluster, but PacBio results were promising in that those computations completed faster than Minimap2. Due to its significant burden on computational resources and slow run time, Graphmap2 is not an ideal tool for exploration of a whole human genome generated on a long-read sequencing platform.
Affiliation(s)
- Jonathan LoTempio
- Institute for Clinical and Translational Science, University of California, Irvine, CA, United States of America
- International Research Laboratory (IRL2006) “Epigenetics, Data, Politics (EpiDaPo)”, Centre National de la Recherche Scientifique, Washington, DC, United States of America
| | - Emmanuele Delot
- Center for Genetic Medicine Research, Children’s National Hospital, Washington, DC, United States of America
- Department of Genomics and Precision Medicine, George Washington University, Washington, DC, United States of America
| | - Eric Vilain
- Institute for Clinical and Translational Science, University of California, Irvine, CA, United States of America
- International Research Laboratory (IRL2006) “Epigenetics, Data, Politics (EpiDaPo)”, Centre National de la Recherche Scientifique, Washington, DC, United States of America
| |
|
32
|
Panoyan MA, Wendt FR. The role of tandem repeat expansions in brain disorders. Emerg Top Life Sci 2023; 7:249-263. [PMID: 37401564 DOI: 10.1042/etls20230022] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 04/14/2023] [Revised: 06/05/2023] [Accepted: 06/19/2023] [Indexed: 07/05/2023]
Abstract
The human genome contains numerous genetic polymorphisms contributing to different health and disease outcomes. Tandem repeat (TR) loci are highly polymorphic yet under-investigated in large genomic studies, which has prompted research efforts to identify novel variations and gain a deeper understanding of their role in human biology and disease outcomes. We summarize the current understanding of TRs and their implications for human health and disease, including an overview of the challenges encountered when conducting TR analyses and potential solutions to overcome these challenges. By shedding light on these issues, this article aims to contribute to a better understanding of the impact of TRs on the development of new disease treatments.
Affiliation(s)
- Mary Anne Panoyan
- Department of Anthropology, University of Toronto, Mississauga, ON, Canada
| | - Frank R Wendt
- Department of Anthropology, University of Toronto, Mississauga, ON, Canada
- Biostatistics Division, Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada
- Forensic Science Program, University of Toronto, Mississauga, ON, Canada
| |
|
33
|
Hannan AJ. Expanding horizons of tandem repeats in biology and medicine: Why 'genomic dark matter' matters. Emerg Top Life Sci 2023; 7:ETLS20230075. [PMID: 38088823 PMCID: PMC10754335 DOI: 10.1042/etls20230075] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2023] [Revised: 11/27/2023] [Accepted: 11/27/2023] [Indexed: 12/30/2023]
Abstract
Approximately half of the human genome includes repetitive sequences, and these DNA sequences (as well as their transcribed repetitive RNA and translated amino-acid repeat sequences) are known as the repeatome. Within this repeatome there are a couple of million tandem repeats, dispersed throughout the genome. These tandem repeats have been estimated to constitute ∼8% of the entire human genome. These tandem repeats can be located throughout exons, introns and intergenic regions, thus potentially affecting the structure and function of tandemly repetitive DNA, RNA and protein sequences. Over more than three decades, more than 60 monogenic human disorders have been found to be caused by tandem-repeat mutations. These monogenic tandem-repeat disorders include Huntington's disease, a variety of ataxias, amyotrophic lateral sclerosis and frontotemporal dementia, as well as many other neurodegenerative diseases. Furthermore, tandem-repeat disorders can include fragile X syndrome, related fragile X disorders, as well as other neurological and psychiatric disorders. However, these monogenic tandem-repeat disorders, which were discovered via their dominant or recessive modes of inheritance, may represent the 'tip of the iceberg' with respect to tandem-repeat contributions to human disorders. A previous proposal that tandem repeats may contribute to the 'missing heritability' of various common polygenic human disorders has recently been supported by a variety of new evidence. This includes genome-wide studies that associate tandem-repeat mutations with autism, schizophrenia, Parkinson's disease and various types of cancers. In this article, I will discuss how tandem-repeat mutations and polymorphisms could contribute to a wide range of common disorders, along with some of the many major challenges of tandem-repeat biology and medicine. 
Finally, I will discuss the potential of tandem repeats to be therapeutically targeted, so as to prevent and treat an expanding range of human disorders.
Affiliation(s)
- Anthony J Hannan
- Florey Institute of Neuroscience and Mental Health, University of Melbourne, Parkville, Victoria 3010, Australia
- Department of Anatomy and Physiology, University of Melbourne, Parkville, Victoria 3010, Australia
| |
|
34
|
Kang X, Xu J, Luo X, Schönhuth A. Hybrid-hybrid correction of errors in long reads with HERO. Genome Biol 2023; 24:275. [PMID: 38041098 PMCID: PMC10690975 DOI: 10.1186/s13059-023-03112-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2023] [Accepted: 11/16/2023] [Indexed: 12/03/2023] Open
Abstract
Although generally superior, hybrid approaches for correcting errors in third-generation sequencing (TGS) reads, using next-generation sequencing (NGS) reads, mistake haplotype-specific variants for errors in polyploid and mixed samples. We suggest HERO, as the first "hybrid-hybrid" approach, to make use of both de Bruijn graphs and overlap graphs for optimal catering to the particular strengths of NGS and TGS reads. Extensive benchmarking experiments demonstrate that HERO improves indel and mismatch error rates by on average 65% (27–95%) and 20% (4–61%). Using HERO prior to genome assembly significantly improves the assemblies in the majority of the relevant categories.
Affiliation(s)
- Xiongbin Kang
- College of Biology, Hunan University, Changsha, China
- Genome Data Science, Faculty of Technology, Bielefeld University, Bielefeld, Germany
| | - Jialu Xu
- College of Biology, Hunan University, Changsha, China
| | - Xiao Luo
- College of Biology, Hunan University, Changsha, China.
| | - Alexander Schönhuth
- Genome Data Science, Faculty of Technology, Bielefeld University, Bielefeld, Germany.
| |
|
35
|
Abstract
DNA sequencing has revolutionized medicine over recent decades. However, analysis of large structural variation and repetitive DNA, a hallmark of human genomes, has been limited by short-read technology, with read lengths of 100-300 bp. Long-read sequencing (LRS) permits routine sequencing of human DNA fragments tens to hundreds of kilobase pairs in size, using both real-time sequencing by synthesis and nanopore-based direct electronic sequencing. LRS permits analysis of large structural variation and haplotypic phasing in human genomes and has enabled the discovery and characterization of rare pathogenic structural variants and repeat expansions. It has also recently enabled the assembly of a complete, gapless human genome that includes previously intractable regions, such as highly repetitive centromeres and homologous acrocentric short arms. With the addition of protocols for targeted enrichment, direct epigenetic DNA modification detection, and long-range chromatin profiling, LRS promises to launch a new era of understanding of genetic diversity and pathogenic mutations in human populations.
Affiliation(s)
- Peter E Warburton
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Center for Advanced Genomics Technology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Robert P Sebra
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Center for Advanced Genomics Technology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Black Family Stem Cell Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| |
|
36
|
Hård J, Mold JE, Eisfeldt J, Tellgren-Roth C, Häggqvist S, Bunikis I, Contreras-Lopez O, Chin CS, Nordlund J, Rubin CJ, Feuk L, Michaëlsson J, Ameur A. Long-read whole-genome analysis of human single cells. Nat Commun 2023; 14:5164. [PMID: 37620373 PMCID: PMC10449900 DOI: 10.1038/s41467-023-40898-3] [Citation(s) in RCA: 22] [Impact Index Per Article: 11.0] [Received: 02/06/2023] [Accepted: 08/07/2023] [Indexed: 08/26/2023] Open
Abstract
Long-read sequencing has dramatically increased our understanding of human genome variation. Here, we demonstrate that long-read technology can give new insights into the genomic architecture of individual cells. Clonally expanded CD8+ T-cells from a human donor were subjected to droplet-based multiple displacement amplification (dMDA) to generate long molecules with reduced bias. PacBio sequencing generated up to 40% genome coverage per single-cell, enabling detection of single nucleotide variants (SNVs), structural variants (SVs), and tandem repeats, also in regions inaccessible by short reads. 28 somatic SNVs were detected, including one case of mitochondrial heteroplasmy. 5473 high-confidence SVs/cell were discovered, a sixteen-fold increase compared to Illumina-based results from clonally related cells. Single-cell de novo assembly generated a genome size of up to 598 Mb and 1762 (12.8%) complete gene models. In summary, our work shows the promise of long-read sequencing toward characterization of the full spectrum of genetic variation in single cells.
Affiliation(s)
- Joanna Hård
- Department of Cell and Molecular Biology, Karolinska Institutet, Stockholm, Sweden
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
- ETH AI Center, ETH Zurich, Zurich, Switzerland
| | - Jeff E Mold
- Department of Cell and Molecular Biology, Karolinska Institutet, Stockholm, Sweden
| | - Jesper Eisfeldt
- Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
- Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden
| | - Christian Tellgren-Roth
- Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden
| | - Susana Häggqvist
- Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden
| | - Ignas Bunikis
- Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden
| | | | | | - Jessica Nordlund
- Science for Life Laboratory, Department of Medical Sciences, Uppsala University, Uppsala, Sweden
| | - Carl-Johan Rubin
- Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
| | - Lars Feuk
- Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden
| | - Jakob Michaëlsson
- Center for Infectious Medicine, Department of Medicine, Karolinska Institutet, Stockholm, Sweden
| | - Adam Ameur
- Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden.
| |
|
37
|
Ahsan MU, Liu Q, Perdomo JE, Fang L, Wang K. A survey of algorithms for the detection of genomic structural variants from long-read sequencing data. Nat Methods 2023; 20:1143-1158. [PMID: 37386186 PMCID: PMC11208083 DOI: 10.1038/s41592-023-01932-w] [Citation(s) in RCA: 28] [Impact Index Per Article: 14.0] [Received: 07/16/2021] [Accepted: 05/31/2023] [Indexed: 07/01/2023]
Abstract
As long-read sequencing technologies are becoming increasingly popular, a number of methods have been developed for the discovery and analysis of structural variants (SVs) from long reads. Long reads enable detection of SVs that could not be previously detected from short-read sequencing, but computational methods must adapt to the unique challenges and opportunities presented by long-read sequencing. Here, we summarize over 50 long-read-based methods for SV detection, genotyping and visualization, and discuss how new telomere-to-telomere genome assemblies and pangenome efforts can improve the accuracy and drive the development of SV callers in the future.
Affiliation(s)
- Mian Umair Ahsan
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
| | - Qian Liu
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
| | - Jonathan Elliot Perdomo
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- School of Biomedical Engineering, Drexel University, Philadelphia, PA, USA
| | - Li Fang
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Department of Genetics and Biomedical Informatics, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
| | - Kai Wang
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| |
|
38
|
Kume K, Kurashige T, Muguruma K, Morino H, Tada Y, Kikumoto M, Miyamoto T, Akutsu SN, Matsuda Y, Matsuura S, Nakamori M, Nishiyama A, Izumi R, Niihori T, Ogasawara M, Eura N, Kato T, Yokomura M, Nakayama Y, Ito H, Nakamura M, Saito K, Riku Y, Iwasaki Y, Maruyama H, Aoki Y, Nishino I, Izumi Y, Aoki M, Kawakami H. CGG repeat expansion in LRP12 in amyotrophic lateral sclerosis. Am J Hum Genet 2023; 110:1086-1097. [PMID: 37339631 PMCID: PMC10357476 DOI: 10.1016/j.ajhg.2023.05.014] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/25/2023] [Revised: 05/25/2023] [Accepted: 05/25/2023] [Indexed: 06/22/2023] Open
Abstract
Amyotrophic lateral sclerosis (ALS) is a neurodegenerative disorder characterized by the degeneration of motor neurons. Although repeat expansion in C9orf72 is its most common cause, the pathogenesis of ALS isn't fully clear. In this study, we show that repeat expansion in LRP12, a causative variant of oculopharyngodistal myopathy type 1 (OPDM1), is a cause of ALS. We identify CGG repeat expansion in LRP12 in five families and two simplex individuals. These ALS individuals (LRP12-ALS) have 61-100 repeats, which contrasts with most OPDM individuals with repeat expansion in LRP12 (LRP12-OPDM), who have 100-200 repeats. Phosphorylated TDP-43 is present in the cytoplasm of iPS cell-derived motor neurons (iPSMNs) in LRP12-ALS, a finding that reproduces the pathological hallmark of ALS. RNA foci are more prominent in muscle and iPSMNs in LRP12-ALS than in LRP12-OPDM. Muscleblind-like 1 aggregates are observed only in OPDM muscle. In conclusion, CGG repeat expansions in LRP12 cause ALS and OPDM, depending on the length of the repeat. Our findings provide insight into the repeat length-dependent switching of phenotypes.
Affiliation(s)
- Kodai Kume
- Department of Molecular Epidemiology, Research Institute for Radiation Biology and Medicine, Hiroshima University, Hiroshima, Japan
| | - Takashi Kurashige
- Department of Neurology, National Hospital Organization Kure Medical Center and Chugoku Cancer Center, Hiroshima, Japan
| | - Keiko Muguruma
- Department of iPS Cell Applied Medicine, Graduate School of Medicine, Kansai Medical University, Osaka, Japan
| | - Hiroyuki Morino
- Department of Molecular Epidemiology, Research Institute for Radiation Biology and Medicine, Hiroshima University, Hiroshima, Japan
| | - Yui Tada
- Department of Molecular Epidemiology, Research Institute for Radiation Biology and Medicine, Hiroshima University, Hiroshima, Japan
| | - Mai Kikumoto
- Department of Molecular Epidemiology, Research Institute for Radiation Biology and Medicine, Hiroshima University, Hiroshima, Japan; Department of Clinical Neuroscience and Therapeutics, Hiroshima University Graduate School of Biomedical and Health Sciences, Hiroshima, Japan
| | - Tatsuo Miyamoto
- Department of Genetics and Cell Biology, Research Institute for Radiation Biology and Medicine, Hiroshima University, Hiroshima, Japan
| | - Silvia Natsuko Akutsu
- Department of Genetics and Cell Biology, Research Institute for Radiation Biology and Medicine, Hiroshima University, Hiroshima, Japan
| | - Yukiko Matsuda
- Department of Molecular Epidemiology, Research Institute for Radiation Biology and Medicine, Hiroshima University, Hiroshima, Japan
| | - Shinya Matsuura
- Department of Genetics and Cell Biology, Research Institute for Radiation Biology and Medicine, Hiroshima University, Hiroshima, Japan
| | - Masahiro Nakamori
- Department of Clinical Neuroscience and Therapeutics, Hiroshima University Graduate School of Biomedical and Health Sciences, Hiroshima, Japan
| | - Ayumi Nishiyama
- Department of Neurology, Tohoku University Graduate School of Medicine, Miyagi, Japan
| | - Rumiko Izumi
- Department of Neurology, Tohoku University Graduate School of Medicine, Miyagi, Japan
| | - Tetsuya Niihori
- Department of Medical Genetics, Tohoku University Graduate School of Medicine, Miyagi, Japan
| | - Masashi Ogasawara
- Department of Neuromuscular Research, National Institute of Neuroscience, National Centre of Neurology and Psychiatry, National Centre Hospital, Tokyo, Japan
| | - Nobuyuki Eura
- Department of Neuromuscular Research, National Institute of Neuroscience, National Centre of Neurology and Psychiatry, National Centre Hospital, Tokyo, Japan
| | - Tamaki Kato
- Institute of Medical Genetics, Tokyo Women's Medical University, Tokyo, Japan
| | - Mamoru Yokomura
- Institute of Medical Genetics, Tokyo Women's Medical University, Tokyo, Japan
| | - Yoshiaki Nakayama
- Department of Neurology, Wakayama Medical University, Wakayama, Japan
| | - Hidefumi Ito
- Department of Neurology, Wakayama Medical University, Wakayama, Japan
| | | | - Kayoko Saito
- Institute of Medical Genetics, Tokyo Women's Medical University, Tokyo, Japan
| | - Yuichi Riku
- Department of Neuropathology, Institute for Medical Science of Aging, Aichi Medical University, Nagakute, Japan
| | - Yasushi Iwasaki
- Department of Neuropathology, Institute for Medical Science of Aging, Aichi Medical University, Nagakute, Japan
| | - Hirofumi Maruyama
- Department of Clinical Neuroscience and Therapeutics, Hiroshima University Graduate School of Biomedical and Health Sciences, Hiroshima, Japan
| | - Yoko Aoki
- Department of Medical Genetics, Tohoku University Graduate School of Medicine, Miyagi, Japan
| | - Ichizo Nishino
- Department of Neuromuscular Research, National Institute of Neuroscience, National Centre of Neurology and Psychiatry, National Centre Hospital, Tokyo, Japan
| | - Yuishin Izumi
- Department of Neurology, Tokushima University Graduate School of Biomedical Sciences, Tokushima, Japan
| | - Masashi Aoki
- Department of Neurology, Tohoku University Graduate School of Medicine, Miyagi, Japan
| | - Hideshi Kawakami
- Department of Molecular Epidemiology, Research Institute for Radiation Biology and Medicine, Hiroshima University, Hiroshima, Japan.
| |
|
39
|
Ikemoto K, Fujimoto H, Fujimoto A. Localized assembly for long reads enables genome-wide analysis of repetitive regions at single-base resolution in human genomes. Hum Genomics 2023; 17:21. [PMID: 36895025 PMCID: PMC9996862 DOI: 10.1186/s40246-023-00467-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2022] [Accepted: 03/01/2023] [Indexed: 03/11/2023] Open
Abstract
BACKGROUND Long-read sequencing technologies have the potential to overcome the limitations of short reads and provide a comprehensive picture of the human genome. However, the characterization of repetitive sequences by reconstructing genomic structures at high resolution solely from long reads remains difficult. Here, we developed a localized assembly method (LoMA) that constructs highly accurate consensus sequences (CSs) from long reads. METHODS We developed LoMA by combining minimap2, MAFFT, and our algorithm, which classifies diploid haplotypes based on structural variants and CSs. Using this tool, we analyzed two human samples (NA18943 and NA19240) sequenced with the Oxford Nanopore sequencer. We defined target regions in each genome based on mapping patterns and then constructed a high-quality catalog of the human insertion solely from the long-read data. RESULTS The assessment of LoMA showed a high accuracy of CSs (error rate < 0.3%) compared with raw data (error rate > 8%) and superiority to a previous study. The genome-wide analysis of NA18943 and NA19240 identified 5516 and 6542 insertions (≥ 100 bp), respectively. Most insertions (~80%) were derived from tandem repeats and transposable elements. We also detected processed pseudogenes, insertions in transposable elements, and long insertions (> 10 kbp). Finally, our analysis suggested that short tandem duplications are associated with gene expression and transposons. CONCLUSIONS Our analysis showed that LoMA constructs high-quality sequences from long reads with substantial errors. This study revealed the true structures of the insertions with high accuracy and inferred the mechanisms for the insertions, thus contributing to future human genome studies. LoMA is available at our GitHub page: https://github.com/kolikem/loma.
Affiliation(s)
- Ko Ikemoto
- Department of Human Genetics, Graduate School of Medicine, The University of Tokyo, Hongo 7-3-1, Bunkyo, Tokyo, Japan
| | - Hinano Fujimoto
- Department of Human Genetics, Graduate School of Medicine, The University of Tokyo, Hongo 7-3-1, Bunkyo, Tokyo, Japan
| | - Akihiro Fujimoto
- Department of Human Genetics, Graduate School of Medicine, The University of Tokyo, Hongo 7-3-1, Bunkyo, Tokyo, Japan.
| |
|
40
|
MSINGB: A Novel Computational Method Based on NGBoost for Identifying Microsatellite Instability Status from Tumor Mutation Annotation Data. Interdiscip Sci 2023; 15:100-110. [PMID: 36350503 DOI: 10.1007/s12539-022-00544-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/05/2022] [Revised: 10/19/2022] [Accepted: 10/22/2022] [Indexed: 11/11/2022]
Abstract
Microsatellite instability (MSI), a vital mutator phenotype caused by DNA mismatch repair deficiency, is frequently observed in several tumors. MSI is recognized as a critical molecular biomarker for diagnosis, prognosis, and therapeutic selection in several cancers. Identifying MSI status for current gold standard methods based on experimental analysis is laborious, time-consuming, and costly. Although several computational methods based on machine learning have been proposed to identify MSI status, we need to further understand which machine learning model would favor identification for MSI and which feature subset is strongly related to MSI. On this basis, more effective machine learning-based methods can be developed to improve the performance of MSI status identification. In this work, we present MSINGB, an NGBoost-based method for identifying MSI status from tumor somatic mutation annotation data. MSINGB first evaluates the prediction performance of 11 popular machine learning algorithms and 9 deep learning models to identify MSI. Among 20 models, NGBoost, a novel natural gradient boosting method, achieves the overall best performance. MSINGB then introduces two feature selection strategies to find the compact feature subset, which is strongly related to MSI, and employs the SHAP approach to interpreting how selected features impact the model prediction. MSINGB achieves a better prediction performance on both the tenfold cross-validation test and independent test compared with state-of-the-art methods.
|
41
|
Wang P, Wang F. A proposed metric set for evaluation of genome assembly quality. Trends Genet 2023; 39:175-186. [PMID: 36402623 DOI: 10.1016/j.tig.2022.10.005] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 07/05/2022] [Revised: 10/24/2022] [Accepted: 10/26/2022] [Indexed: 11/18/2022]
Abstract
Quality control is essential for genome assemblies; however, a consensus has yet to be reached on what metrics should be adopted for the evaluation of assembly quality. N50 is widely used for contiguity measurement, but its effectiveness is constantly in question. Prevailing metrics for the completeness evaluation focus on gene space, yet challenging areas such as tandem repeats are commonly overlooked. Achieving correctness has become an indispensable dimension for quality control, while prevailing assembly releases lack scores reflecting this aspect. We propose a metric set with a set of statistic indexes for effective, comprehensive evaluation of assemblies and provide a score of a finished assembly for each metric, which can be utilized as a benchmark for achieving high-quality genome assemblies.
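As a minimal illustration of the contiguity measure named in this abstract (not the metric set the authors propose), N50 can be read as the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length. The function name `n50` and the example contig lengths are assumptions made for this sketch.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    if not lengths:
        raise ValueError("need at least one contig length")
    # Walk the contigs from longest to shortest.
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that brings the running sum to half
        # of the assembly length defines N50.
        if running >= half_total:
            return length


# Hypothetical assembly: eight contigs totalling 32 units.
example_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
example_n50 = n50(example_lengths)
```

Because N50 depends only on the length distribution, two assemblies with the same N50 can differ greatly in correctness, which is one reason a single contiguity number is questioned as a quality measure.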
Affiliation(s)
- Peng Wang
- Key Laboratory of Crop Gene Resources and Germplasm Enhancement in Southern China, Ministry of Agriculture and Rural Affairs, Institute of Tropical Crop Genetic Resources, Chinese Academy of Tropical Agricultural Sciences, No. 4 Xueyuan Rd, Haikou City, Hainan 571101, China.
| | - Fei Wang
- School of Electrical and Electronic Engineering, Shanghai Institute of Technology, No. 100 Haiquan Rd, Shanghai 201416, China.
|
42
|
Abstract
Abnormal expansion or shortening of tandem repeats can cause a variety of genetic diseases. The use of long DNA reads has facilitated the analysis of disease-causing repeats in the human genome. Long read sequencers enable us to directly analyze repeat length and sequence content by covering whole repeats; they are therefore considered suitable for the analysis of long tandem repeats. Here, we describe an expanded repeat analysis using target sequencing data produced by the Oxford Nanopore Technologies (hereafter referred to as ONT) nanopore sequencer.
Affiliation(s)
- Satomi Mitsuhashi
- Department of Genomic Function and Diversity, Tokyo Medical and Dental University, Tokyo, Japan.
- Division of Neurology, Department of Internal Medicine, St. Marianna University School of Medicine, Kawasaki, Kanagawa, Japan.
| | - Martin C Frith
- Artificial Intelligence Research Center, AIST, Tokyo, Japan
- Graduate School of Frontier Sciences, University of Tokyo, Chiba, Japan
- Computational Bio Big-Data Open Innovation Laboratory, AIST, Tokyo, Japan
|
43
|
Chen P, Sun Z, Wang J, Liu X, Bai Y, Chen J, Liu A, Qiao F, Chen Y, Yuan C, Sha J, Zhang J, Xu LQ, Li J. Portable nanopore-sequencing technology: Trends in development and applications. Front Microbiol 2023; 14:1043967. [PMID: 36819021 PMCID: PMC9929578 DOI: 10.3389/fmicb.2023.1043967] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 09/14/2022] [Accepted: 01/03/2023] [Indexed: 02/04/2023] Open
Abstract
Sequencing technology is the most commonly used technology in molecular biology research and an essential pillar for the development and applications of molecular biology. Since 1977, when the first generation of sequencing technology opened the door to interpreting the genetic code, sequencing technology has been developing for three generations. It has applications in all aspects of life and scientific research, such as disease diagnosis, drug target discovery, pathological research, species protection, and SARS-CoV-2 detection. However, the first- and second-generation sequencing technology relied on fluorescence detection systems and DNA polymerization enzyme systems, which increased the cost of sequencing technology and limited its scope of applications. The third-generation sequencing technology performs PCR-free and single-molecule sequencing, but it still depends on the fluorescence detection device. To break through these limitations, researchers have made arduous efforts to develop a new advanced portable sequencing technology represented by nanopore sequencing. Nanopore technology has the advantages of small size and convenient portability, independent of biochemical reagents, and direct reading using physical methods. This paper reviews the research and development process of nanopore sequencing technology (NST) from the laboratory to commercially viable tools; discusses the main types of nanopore sequencing technologies and their various applications in solving a wide range of real-world problems. In addition, the paper collates the analysis tools necessary for performing different processing tasks in nanopore sequencing. Finally, we highlight the challenges of NST and its future research and application directions.
Affiliation(s)
- Pin Chen
- Key Laboratory of DGHD, MOE, School of Life Science and Technology, Southeast University, Nanjing, China
| | - Zepeng Sun
- China Mobile (Chengdu) Industrial Research Institute, Chengdu, China
| | - Jiawei Wang
- School of Computer Science and Technology, Southeast University, Nanjing, China
| | - Xinlong Liu
- China Mobile (Chengdu) Industrial Research Institute, Chengdu, China
| | - Yun Bai
- Key Laboratory of DGHD, MOE, School of Life Science and Technology, Southeast University, Nanjing, China
| | - Jiang Chen
- Key Laboratory of DGHD, MOE, School of Life Science and Technology, Southeast University, Nanjing, China
| | - Anna Liu
- Key Laboratory of DGHD, MOE, School of Life Science and Technology, Southeast University, Nanjing, China
| | - Feng Qiao
- China Mobile (Chengdu) Industrial Research Institute, Chengdu, China
| | - Yang Chen
- Key Laboratory of DGHD, MOE, School of Life Science and Technology, Southeast University, Nanjing, China
| | - Chenyan Yuan
- Clinical Laboratory, Southeast University Zhongda Hospital, Nanjing, China
| | - Jingjie Sha
- School of Mechanical Engineering, Southeast University, Nanjing, China
| | - Jinghui Zhang
- School of Computer Science and Technology, Southeast University, Nanjing, China
| | - Li-Qun Xu
- China Mobile (Chengdu) Industrial Research Institute, Chengdu, China (corresponding author)
| | - Jian Li
- Key Laboratory of DGHD, MOE, School of Life Science and Technology, Southeast University, Nanjing, China (corresponding author)
|
44
|
Fan C, Chen K, Wang Y, Ball EV, Stenson PD, Mort M, Bacolla A, Kehrer-Sawatzki H, Tainer JA, Cooper DN, Zhao H. Profiling human pathogenic repeat expansion regions by synergistic and multi-level impacts on molecular connections. Hum Genet 2023; 142:245-274. [PMID: 36344696 PMCID: PMC10290229 DOI: 10.1007/s00439-022-02500-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2022] [Accepted: 10/24/2022] [Indexed: 11/09/2022]
Abstract
Whilst DNA repeat expansions cause numerous heritable human disorders, their origins and underlying pathological mechanisms are often unclear. We collated a dataset comprising 224 human repeat expansions encompassing 203 different genes, and performed a systematic analysis with respect to key topological features at the DNA, RNA and protein levels. Comparison with controls without known pathogenicity and genomic regions lacking repeats, allowed the construction of the first tool to discriminate repeat regions harboring pathogenic repeat expansions (DPREx). At the DNA level, pathogenic repeat expansions exhibited stronger signals for DNA regulatory factors (e.g. H3K4me3, transcription factor-binding sites) in exons, promoters, 5'UTRs and 5'genes but were not significantly different from controls in introns, 3'UTRs and 3'genes. Additionally, pathogenic repeat expansions were also found to be enriched in non-B DNA structures. At the RNA level, pathogenic repeat expansions were characterized by lower free energy for forming RNA secondary structure and were closer to splice sites in introns, exons, promoters and 5'genes than controls. At the protein level, pathogenic repeat expansions exhibited a preference to form coil rather than other types of secondary structure, and tended to encode surface-located protein domains. Guided by these features, DPREx ( http://biomed.nscc-gz.cn/zhaolab/geneprediction/# ) achieved an Area Under the Curve (AUC) value of 0.88 in a test on an independent dataset. Pathogenic repeat expansions are thus located such that they exert a synergistic influence on the gene expression pathway involving inter-molecular connections at the DNA, RNA and protein levels.
Affiliation(s)
- Cong Fan
- Department of Medical Research Center, Sun Yat-Sen Memorial Hospital, Sun Yat-Sen University, 107 Yan Jiang West Road, Guangzhou, 500001, People's Republic of China
| | - Ken Chen
- School of Computer Science and Engineering, Sun Yat-Sen University, Guangzhou, 500001, China
| | - Yukai Wang
- School of Life Science, Sun Yat-Sen University, Guangzhou, 500001, China
| | - Edward V Ball
- Institute of Medical Genetics, School of Medicine, Cardiff University, Heath Park, Cardiff, CF14 4XN, UK
| | - Peter D Stenson
- Institute of Medical Genetics, School of Medicine, Cardiff University, Heath Park, Cardiff, CF14 4XN, UK
| | - Matthew Mort
- Institute of Medical Genetics, School of Medicine, Cardiff University, Heath Park, Cardiff, CF14 4XN, UK
| | - Albino Bacolla
- Department of Molecular and Cellular Oncology, The University of Texas MD Anderson Cancer Center, 6767 Bertner Avenue, Houston, TX, 77030, USA
| | | | - John A Tainer
- Department of Molecular and Cellular Oncology, The University of Texas MD Anderson Cancer Center, 6767 Bertner Avenue, Houston, TX, 77030, USA
| | - David N Cooper
- Institute of Medical Genetics, School of Medicine, Cardiff University, Heath Park, Cardiff, CF14 4XN, UK
| | - Huiying Zhao
- Department of Medical Research Center, Sun Yat-Sen Memorial Hospital, Sun Yat-Sen University, 107 Yan Jiang West Road, Guangzhou, 500001, People's Republic of China.
|
45
|
Lang J, Xu Z, Wang Y, Sun J, Yang Z. NanoSTR: A method for detection of target short tandem repeats based on nanopore sequencing data. Front Mol Biosci 2023; 10:1093519. [PMID: 36743210 PMCID: PMC9889824 DOI: 10.3389/fmolb.2023.1093519] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2022] [Accepted: 01/06/2023] [Indexed: 01/19/2023] Open
Abstract
Short tandem repeats (STRs) are widely present in the human genome. Studies have confirmed that STRs are associated with more than 30 diseases, and they have also been used in forensic identification and paternity testing. However, there are few methods for STR detection based on nanopore sequencing due to the challenges posed by the sequencing principles and the data characteristics of nanopore sequencing. We developed NanoSTR for detection of target STR loci based on the length-number-rank (LNR) information of reads. NanoSTR can be used for STR detection and genotyping based on long-read data from nanopore sequencing with improved accuracy and efficiency compared with other existing methods, such as Tandem-Genotypes and TRiCoLOR. NanoSTR showed 100% concordance with the expected genotypes on error-free simulated data and achieved >85% concordance on standard samples (containing autosomal and Y-chromosomal loci) sequenced on the MinION platform. NanoSTR showed high performance for detection of target STR markers. Although NanoSTR needs further optimization and development, it is useful as an analytical method for the detection of STR loci by nanopore sequencing. This method adds to the toolbox for nanopore-based STR analysis and expands the applications of nanopore sequencing in scientific research and clinical scenarios. The main code and the data are available at https://github.com/langjidong/NanoSTR.
|
46
|
Rafehi H, Read J, Szmulewicz DJ, Davies KC, Snell P, Fearnley LG, Scott L, Thomsen M, Gillies G, Pope K, Bennett MF, Munro JE, Ngo KJ, Chen L, Wallis MJ, Butler EG, Kumar KR, Wu KHC, Tomlinson SE, Tisch S, Malhotra A, Lee-Archer M, Dolzhenko E, Eberle MA, Roberts LJ, Fogel BL, Brüggemann N, Lohmann K, Delatycki MB, Bahlo M, Lockhart PJ. An intronic GAA repeat expansion in FGF14 causes the autosomal-dominant adult-onset ataxia SCA50/ATX-FGF14. Am J Hum Genet 2023; 110:105-119. [PMID: 36493768 PMCID: PMC9892775 DOI: 10.1016/j.ajhg.2022.11.015] [Citation(s) in RCA: 86] [Impact Index Per Article: 43.0] [Received: 10/12/2022] [Accepted: 11/19/2022] [Indexed: 12/13/2022] Open
Abstract
Adult-onset cerebellar ataxias are a group of neurodegenerative conditions that challenge both genetic discovery and molecular diagnosis. In this study, we identified an intronic (GAA) repeat expansion in fibroblast growth factor 14 (FGF14). Genetic analysis of 95 Australian individuals with adult-onset ataxia identified four (4.2%) with (GAA)>300 and a further nine individuals with (GAA)>250. PCR and long-read sequence analysis revealed these were pure (GAA) repeats. In comparison, no control subjects had (GAA)>300 and only 2/311 control individuals (0.6%) had a pure (GAA)>250. In a German validation cohort, 9/104 (8.7%) of affected individuals had (GAA)>335 and a further six had (GAA)>250, whereas 10/190 (5.3%) control subjects had (GAA)>250 but none were (GAA)>335. The combined data suggest (GAA)>335 are disease causing and fully penetrant (p = 6.0 × 10⁻⁸, OR = 72 [95% CI = 4.3-1,227]), while (GAA)>250 is likely pathogenic with reduced penetrance. Affected individuals had an adult-onset, slowly progressive cerebellar ataxia with variable features including vestibular impairment, hyper-reflexia, and autonomic dysfunction. A negative correlation between age at onset and repeat length was observed (R² = 0.44, p = 0.00045, slope = -0.12) and identification of a shared haplotype in a minority of individuals suggests that the expansion can be inherited or generated de novo during meiotic division. This study demonstrates the power of genome sequencing and advanced bioinformatic tools to identify novel repeat expansions via model-free, genome-wide analysis and identifies SCA50/ATX-FGF14 as a frequent cause of adult-onset ataxia.
Affiliation(s)
- Haloom Rafehi
- Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC 3052, Australia; Department of Medical Biology, University of Melbourne, Parkville, VIC, Australia
| | - Justin Read
- Bruce Lefroy Centre, Murdoch Children’s Research Institute, Parkville, VIC 3052, Australia; Department of Paediatrics, University of Melbourne, Royal Children’s Hospital, Parkville, VIC, Australia
| | - David J. Szmulewicz
- Cerebellar Ataxia Clinic, Eye and Ear Hospital, Melbourne, VIC, Australia; The Florey Institute of Neuroscience and Mental Health, University of Melbourne, Melbourne, VIC, Australia
| | - Kayli C. Davies
- Bruce Lefroy Centre, Murdoch Children’s Research Institute, Parkville, VIC 3052, Australia; Department of Paediatrics, University of Melbourne, Royal Children’s Hospital, Parkville, VIC, Australia
| | - Penny Snell
- Bruce Lefroy Centre, Murdoch Children’s Research Institute, Parkville, VIC 3052, Australia
| | - Liam G. Fearnley
- Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC 3052, Australia; Department of Medical Biology, University of Melbourne, Parkville, VIC, Australia; Bruce Lefroy Centre, Murdoch Children’s Research Institute, Parkville, VIC 3052, Australia
| | - Liam Scott
- Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC 3052, Australia
| | - Mirja Thomsen
- Institute of Neurogenetics, University of Lübeck, Lübeck, Germany
| | - Greta Gillies
- Bruce Lefroy Centre, Murdoch Children’s Research Institute, Parkville, VIC 3052, Australia
| | - Kate Pope
- Bruce Lefroy Centre, Murdoch Children’s Research Institute, Parkville, VIC 3052, Australia
| | - Mark F. Bennett
- Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC 3052, Australia; Department of Medical Biology, University of Melbourne, Parkville, VIC, Australia; Epilepsy Research Centre, Department of Medicine, University of Melbourne, Austin Health, Heidelberg, VIC, Australia
| | - Jacob E. Munro
- Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC 3052, Australia; Department of Medical Biology, University of Melbourne, Parkville, VIC, Australia
| | - Kathie J. Ngo
- Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles (UCLA), Los Angeles, CA, USA
| | - Luke Chen
- Alfred Hospital, Department of Neurology, Melbourne, VIC, Australia
| | - Mathew J. Wallis
- Clinical Genetics Service, Austin Health, Melbourne, VIC, Australia; Department of Medicine, University of Melbourne, Austin Health, Melbourne, VIC, Australia; School of Medicine and Menzies Institute for Medical Research, University of Tasmania, Hobart, TAS, Australia
| | | | - Kishore R. Kumar
- Faculty of Medicine and Health, The University of Sydney, Sydney, NSW, Australia; Molecular Medicine Laboratory and Department of Neurology, Concord Repatriation General Hospital, Concord, NSW, Australia; Garvan Institute of Medical Research, Sydney, NSW, Australia
| | - Kathy HC. Wu
- School of Medicine, University of New South Wales, Sydney, NSW, Australia; Clinical Genomics, St Vincent’s Hospital, Darlinghurst, NSW, Australia; Discipline of Genomic Medicine, Faculty of Medicine and Health, University of Sydney, Sydney, NSW, Australia; School of Medicine, University of Notre Dame, Sydney, NSW, Australia
| | - Susan E. Tomlinson
- School of Medicine, University of Notre Dame, Sydney, NSW, Australia; Department of Neurology, St Vincent’s Hospital, Darlinghurst, NSW, Australia
| | - Stephen Tisch
- School of Medicine, University of New South Wales, Sydney, NSW, Australia; Department of Neurology, St Vincent’s Hospital, Darlinghurst, NSW, Australia
| | - Abhishek Malhotra
- Department of Neuroscience, University Hospital Geelong, Geelong, VIC, Australia
| | - Matthew Lee-Archer
- Launceston General Hospital, Tasmanian Health Service, Launceston, TAS, Australia
| | | | | | - Leslie J. Roberts
- Department of Neurology and Neurological Research, St. Vincent’s Hospital, Melbourne, VIC, Australia
| | - Brent L. Fogel
- Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles (UCLA), Los Angeles, CA, USA; Departments of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles (UCLA), Los Angeles, CA, USA
| | - Norbert Brüggemann
- Institute of Neurogenetics, University of Lübeck, Lübeck, Germany; Department of Neurology, University Medical Center Schleswig-Holstein, Campus Lübeck, Germany
| | - Katja Lohmann
- Institute of Neurogenetics, University of Lübeck, Lübeck, Germany
| | - Martin B. Delatycki
- Bruce Lefroy Centre, Murdoch Children’s Research Institute, Parkville, VIC 3052, Australia; Department of Paediatrics, University of Melbourne, Royal Children’s Hospital, Parkville, VIC, Australia; Victorian Clinical Genetics Services, Melbourne, VIC, Australia
| | - Melanie Bahlo
- Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC 3052, Australia; Department of Medical Biology, University of Melbourne, Parkville, VIC, Australia.
| | - Paul J. Lockhart
- Bruce Lefroy Centre, Murdoch Children’s Research Institute, Parkville, VIC 3052, Australia; Department of Paediatrics, University of Melbourne, Royal Children’s Hospital, Parkville, VIC, Australia (corresponding author)
|
47
|
Frith MC, Mitsuhashi S. Finding Rearrangements in Nanopore DNA Reads with LAST and dnarrange. Methods Mol Biol 2023; 2632:161-175. [PMID: 36781728 DOI: 10.1007/978-1-0716-2996-3_12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/27/2023]
Abstract
Long-read DNA sequencing techniques such as nanopore are especially useful for characterizing complex sequence rearrangements, which occur in some genetic diseases and also during evolution. Analyzing the sequence data to understand such rearrangements is not trivial, due to sequencing error, rearrangement intricacy, and abundance of repeated similar sequences in genomes. The LAST and dnarrange software packages can resolve complex relationships between DNA sequences and characterize changes such as gene conversion, processed pseudogene insertion, and chromosome shattering. They can filter out numerous rearrangements shared by controls, e.g., healthy humans versus a patient, to focus on rearrangements unique to the patient. One useful ingredient is last-train, which learns the rates (probabilities) of deletions, insertions, and each kind of base match and mismatch. These probabilities are then used to find the most likely sequence relationships/alignments, which is especially useful for DNA with unusual rates, such as DNA from Plasmodium falciparum (malaria) with ~80% a+t. This is also useful for less-studied species that lack reference genomes, so the DNA reads are compared to a different species' genome. We also point out that a reference genome with ancestral alleles would be ideal.
Affiliation(s)
- Martin C Frith
- Artificial Intelligence Research Center, AIST, Tokyo, Japan.
- Graduate School of Frontier Sciences, University of Tokyo, Kashiwa, Japan.
- Computational Bio Big-Data Open Innovation Laboratory, AIST, Tokyo, Japan.
| | - Satomi Mitsuhashi
- Department of Genomic Function and Diversity, Tokyo Medical and Dental University, Tokyo, Japan
- Division of Neurology, Department of Internal Medicine, St. Marianna University School of Medicine, Kawasaki, Japan
|
48
|
Taylor A, Barros D, Gobet N, Schuepbach T, McAllister B, Aeschbach L, Randall E, Trofimenko E, Heuchan E, Barszcz P, Ciosi M, Morgan J, Hafford-Tear N, Davidson A, Massey T, Monckton D, Jones L, network REGISTRYH, Xenarios I, Dion V. Repeat Detector: versatile sizing of expanded tandem repeats and identification of interrupted alleles from targeted DNA sequencing. NAR Genom Bioinform 2022; 4:lqac089. [PMID: 36478959 PMCID: PMC9719798 DOI: 10.1093/nargab/lqac089] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 08/02/2022] [Revised: 10/25/2022] [Accepted: 11/08/2022] [Indexed: 12/07/2022] Open
Abstract
Targeted DNA sequencing approaches will improve how the size of short tandem repeats is measured for diagnostic tests and preclinical studies. The expansion of these sequences causes dozens of disorders, with longer tracts generally leading to a more severe disease. Interrupted alleles are sometimes present within repeats and can alter disease manifestation. Determining repeat size mosaicism and identifying interruptions in targeted sequencing datasets remains a major challenge. This is in part because standard alignment tools are ill-suited for repetitive and unstable sequences. To address this, we have developed Repeat Detector (RD), a deterministic profile weighting algorithm for counting repeats in targeted sequencing data. We tested RD using blood-derived DNA samples from Huntington's disease and Fuchs endothelial corneal dystrophy patients sequenced using either Illumina MiSeq or Pacific Biosciences single-molecule, real-time sequencing platforms. RD was highly accurate in determining repeat sizes of 609 blood-derived samples from Huntington's disease individuals and did not require prior knowledge of the flanking sequences. Furthermore, RD can be used to identify alleles with interruptions and provide a measure of repeat instability within an individual. RD is therefore highly versatile and may find applications in the diagnosis of expanded repeat disorders and in the development of novel therapies.
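To make the idea of sizing a repeat tract directly from a read concrete, the following is a naive sketch that finds the longest uninterrupted run of a motif in a sequence. It is not the profile weighting algorithm used by Repeat Detector; the function name `longest_pure_run` and the toy read are assumptions made for illustration, and the approach ignores sequencing errors entirely.

```python
import re


def longest_pure_run(read, motif):
    """Return the largest number of back-to-back copies of motif in read."""
    if not motif:
        raise ValueError("motif must be a non-empty string")
    # A lookahead lets runs be tested at every start position, so a run
    # in a different frame is not hidden by an earlier, shorter match.
    pattern = re.compile(f"(?=((?:{re.escape(motif)})+))")
    best = 0
    for match in pattern.finditer(read):
        copies = len(match.group(1)) // len(motif)
        best = max(best, copies)
    return best


# Hypothetical read: three CAG copies, a CAA interruption, two more CAG copies.
toy_read = "TTCAGCAGCAGCAACAGCAGTT"
toy_copies = longest_pure_run(toy_read, "CAG")
```

In the toy read the CAA interruption splits the tract into runs of three and two copies, so a pure-run count reports three. This is the kind of case in which interrupted alleles need separate handling.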
Affiliation(s)
- Alysha S Taylor
- UK Dementia Research Institute, Cardiff University, Hadyn Ellis Building, Maindy Road, Cardiff, CF24 4HQ, UK
| | - Dinis Barros
- Centre for Integrative Genomics, University of Lausanne, Bâtiment Génopode, 1015 Lausanne, Switzerland
| | - Nastassia Gobet
- Centre for Integrative Genomics, University of Lausanne, Bâtiment Génopode, 1015 Lausanne, Switzerland
| | - Thierry Schuepbach
- Vital-IT Group, Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Newbiologix, Ch. De la corniche 6-8, 1066 Epalinges, Switzerland
| | - Branduff McAllister
- MRC Centre for Neuropsychiatric Genetics and Genomics, Cardiff University, Hadyn Ellis Building, Maindy Road, Cardiff CF24 4HQ, UK
- Molecular Neurogenetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Lorene Aeschbach
- Centre for Integrative Genomics, University of Lausanne, Bâtiment Génopode, 1015 Lausanne, Switzerland
| | - Emma L Randall
- UK Dementia Research Institute, Cardiff University, Hadyn Ellis Building, Maindy Road, Cardiff, CF24 4HQ, UK
| | - Evgeniya Trofimenko
- Centre for Integrative Genomics, University of Lausanne, Bâtiment Génopode, 1015 Lausanne, Switzerland
- Sorbonne Université, École normale supérieure, PSL University, CNRS, Laboratoire des biomolécules, LBM, 75005 Paris, France
| | - Eleanor R Heuchan
- UK Dementia Research Institute, Cardiff University, Hadyn Ellis Building, Maindy Road, Cardiff, CF24 4HQ, UK
| | - Paula Barszcz
- Centre for Integrative Genomics, University of Lausanne, Bâtiment Génopode, 1015 Lausanne, Switzerland
| | - Marc Ciosi
- School of Molecular Biosciences, College of Medical, Veterinary and Life Sciences, Davidson Building, University of Glasgow, Glasgow, G12 8QQ, UK
| | - Joanne Morgan
- MRC Centre for Neuropsychiatric Genetics and Genomics, Cardiff University, Hadyn Ellis Building, Maindy Road, Cardiff CF24 4HQ, UK
| | | | - Alice E Davidson
- UCL Institute of Ophthalmology, 11-43 Bath Street, London, EC1V 9EL UK
| | - Thomas H Massey
- MRC Centre for Neuropsychiatric Genetics and Genomics, Cardiff University, Hadyn Ellis Building, Maindy Road, Cardiff CF24 4HQ, UK
| | - Darren G Monckton
- School of Molecular Biosciences, College of Medical, Veterinary and Life Sciences, Davidson Building, University of Glasgow, Glasgow, G12 8QQ, UK
| | - Lesley Jones
- MRC Centre for Neuropsychiatric Genetics and Genomics, Cardiff University, Hadyn Ellis Building, Maindy Road, Cardiff CF24 4HQ, UK
| | | | - Ioannis Xenarios
- Centre for Integrative Genomics, University of Lausanne, Bâtiment Génopode, 1015 Lausanne, Switzerland
- Health2030 Genome Center, Ch des Mines 14, 1202 Genève, Switzerland
| | - Vincent Dion
- UK Dementia Research Institute, Cardiff University, Hadyn Ellis Building, Maindy Road, Cardiff, CF24 4HQ, UK
|
49
|
Xylogiannopoulos KF. Multiple genome analytics framework: The case of all SARS-CoV-2 complete variants. J Biotechnol 2022; 359:130-141. [PMID: 36195206 PMCID: PMC9527188 DOI: 10.1016/j.jbiotec.2022.09.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2022] [Revised: 06/26/2022] [Accepted: 09/26/2022] [Indexed: 11/05/2022]
Abstract
Pattern detection and string matching are fundamental problems in computer science, and the accelerated expansion of bioinformatics and computational biology has made them a core topic for both disciplines. The requirement for computational tools for genomic analyses, such as sequence alignment, is very important, although in most cases the resources and computational power required are enormous. The presented Multiple Genome Analytics Framework combines data structures and algorithms, specifically built for text mining and (repeated) pattern detection, that can help to efficiently address several computational biology and bioinformatics problems, concurrently, with minimal resources. A single execution of advanced algorithms, with space and time complexity O(n log n), is enough to acquire knowledge on all repeated patterns that exist in multiple genome sequences, and this information can be used as input by meta-algorithms for further meta-analyses. For the proof of concept and technology of the proposed Framework's scalability, agility and efficiency, a publicly available dataset of more than 300,000 SARS-CoV-2 genome sequences from the National Center for Biotechnology Information has been used for the detection of all repeated patterns. These results have been used by newly introduced algorithms to provide answers to questions such as common patterns among all variants, sequence alignment, palindromes and tandem repeats detection, different organism genome comparisons, polymerase chain reaction primers detection, etc.
|
50
|
Arslan A. Systematic Inspection of Genomic Tandem Repeats and Rearrangements in Autism Model. BRAIN DISORDERS 2022. [DOI: 10.1016/j.dscb.2022.100059] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2022] Open
|