1. van der Sanden B, Neveling K, Shukor S, Gallagher MD, Lee J, Burke SL, Pennings M, van Beek R, Oorsprong M, Kater-Baats E, Kamping E, Tieleman AA, Voermans NC, Scheffer IE, Gecz J, Corbett MA, Vissers LELM, Pang AWC, Hastie A, Kamsteeg EJ, Hoischen A. Optical genome mapping enables accurate testing of large repeat expansions. Genome Res 2025; 35:810-823. [PMID: 40113266 PMCID: PMC12047237 DOI: 10.1101/gr.279491.124] [Received: 04/19/2024] [Accepted: 02/24/2025] [Indexed: 03/22/2025]
Abstract
Short tandem repeats (STRs) are common variations in human genomes that frequently expand or contract, causing genetic disorders, mainly when expanded. Traditional diagnostic methods for identifying these expansions, such as repeat-primed PCR and Southern blotting, are often labor-intensive and locus-specific, and are unable to precisely determine long repeat expansions. Sequencing-based methods, although capable of genome-wide detection, are limited by inaccuracy (short-read technologies) and high associated costs (long-read technologies). This study evaluated optical genome mapping (OGM) as an efficient, accurate approach for measuring STR lengths and assessing somatic stability in 85 samples with known pathogenic repeat expansions in DMPK, CNBP, and RFC1, causing myotonic dystrophy types 1 and 2 and cerebellar ataxia, neuropathy, and vestibular areflexia syndrome (CANVAS), respectively. Three workflows were applied to assess the repeat sizes and somatic repeat stability: manual de novo assembly, local guided assembly (local-GA), and a molecule distance script, of which the latter two were developed as part of this study. OGM successfully identified 84/85 (98.8%) of the pathogenic expansions, distinguishing between wild-type and expanded alleles or between two expanded alleles in recessive cases, with greater accuracy than standard of care (SOC) for long repeats and no apparent upper size limit. Notably, OGM detected somatic instability in a subset of DMPK, CNBP, and RFC1 samples. These findings suggest OGM could advance diagnostic accuracy for large repeat expansions, providing a more comprehensive genome-wide assay for repeat expansion disorders by measuring exact repeat lengths and somatic instability across multiple loci simultaneously.
Affiliation(s)
- Bart van der Sanden
  - Department of Human Genetics, Research Institute for Medical Innovation, Radboud University Medical Center, 6525GA Nijmegen, the Netherlands
- Kornelia Neveling
  - Department of Human Genetics, Research Institute for Medical Innovation, Radboud University Medical Center, 6525GA Nijmegen, the Netherlands
- Syukri Shukor
  - Bionano Genomics Clinical and Scientific Affairs, San Diego, California 92101, USA
- Michael D Gallagher
  - Bionano Genomics Clinical and Scientific Affairs, San Diego, California 92101, USA
- Joyce Lee
  - Bionano Genomics Clinical and Scientific Affairs, San Diego, California 92101, USA
- Stephanie L Burke
  - Bionano Genomics Clinical and Scientific Affairs, San Diego, California 92101, USA
- Maartje Pennings
  - Department of Human Genetics, Research Institute for Medical Innovation, Radboud University Medical Center, 6525GA Nijmegen, the Netherlands
- Ronald van Beek
  - Department of Human Genetics, Research Institute for Medical Innovation, Radboud University Medical Center, 6525GA Nijmegen, the Netherlands
- Michiel Oorsprong
  - Department of Human Genetics, Research Institute for Medical Innovation, Radboud University Medical Center, 6525GA Nijmegen, the Netherlands
- Ellen Kater-Baats
  - Department of Human Genetics, Research Institute for Medical Innovation, Radboud University Medical Center, 6525GA Nijmegen, the Netherlands
- Eveline Kamping
  - Department of Human Genetics, Research Institute for Medical Innovation, Radboud University Medical Center, 6525GA Nijmegen, the Netherlands
- Alide A Tieleman
  - Department of Neurology, Radboud University Medical Center, 6525GA Nijmegen, the Netherlands
- Nicol C Voermans
  - Department of Neurology, Radboud University Medical Center, 6525GA Nijmegen, the Netherlands
- Ingrid E Scheffer
  - Epilepsy Research Centre, Department of Medicine, University of Melbourne, Austin Health, VIC 3084, Australia
  - Department of Pediatrics, University of Melbourne, Royal Children's Hospital, Florey and Murdoch Children's Research Institutes, VIC 3052, Melbourne, Australia
- Jozef Gecz
  - South Australian Health and Medical Research Institute, Adelaide, SA 5000, Australia
  - Genetics and Molecular Pathology, SA Pathology, Adelaide, SA 5000, Australia
  - Robinson Research Institute and Adelaide Medical School, University of Adelaide, Adelaide, SA 5000, Australia
- Mark A Corbett
  - Robinson Research Institute and Adelaide Medical School, University of Adelaide, Adelaide, SA 5000, Australia
- Lisenka E L M Vissers
  - Department of Human Genetics, Research Institute for Medical Innovation, Radboud University Medical Center, 6525GA Nijmegen, the Netherlands
- Andy Wing Chun Pang
  - Bionano Genomics Clinical and Scientific Affairs, San Diego, California 92101, USA
- Alex Hastie
  - Bionano Genomics Clinical and Scientific Affairs, San Diego, California 92101, USA
- Erik-Jan Kamsteeg
  - Department of Human Genetics, Research Institute for Medical Innovation, Radboud University Medical Center, 6525GA Nijmegen, the Netherlands
- Alexander Hoischen
  - Department of Human Genetics, Research Institute for Medical Innovation, Radboud University Medical Center, 6525GA Nijmegen, the Netherlands
  - Department of Internal Medicine, Radboud Expertise Center for Immunodeficiency and Autoinflammation and Radboud Center for Infectious Disease (RCI), Radboud University Medical Center, 6525GA Nijmegen, the Netherlands
2. Xu IRL, Danzi MC, Raposo J, Züchner S. The continued promise of genomic technologies and software in neurogenetics. J Neuromuscul Dis 2025:22143602251325345. [PMID: 40208247 DOI: 10.1177/22143602251325345] [Indexed: 04/11/2025]
Abstract
The continued evolution of genomic technologies over the past few decades has revolutionized the field of neurogenetics, offering profound insights into the genetic underpinnings of neurological disorders. Identification of causal genes for numerous monogenic neurological conditions has informed key aspects of disease mechanisms and facilitated research into critical proteins and molecular pathways, laying the groundwork for therapeutic interventions. However, the question remains: has this transformative trend reached its zenith? In this review, we suggest that despite significant strides in genome sequencing and advanced computational analyses, there is still ample room for methodological refinement. We anticipate further major genetic breakthroughs corresponding with the increased use of long-read genomes, variant calling software, AI tools, and data aggregation databases. Genetic progress has historically been driven by technological advancements from the commercial sector, which are developed in response to academic research needs, creating a continuous cycle of innovation and discovery. This review explores the potential of genomic technologies to address the challenges of neurogenetic disorders. By outlining both established and modern resources, we aim to emphasize the importance of genetic technologies as we enter an era poised for discoveries.
Affiliation(s)
- Isaac R L Xu
  - Dr. John T. Macdonald Foundation Department of Human Genetics and John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, USA
- Matt C Danzi
  - Dr. John T. Macdonald Foundation Department of Human Genetics and John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, USA
- Jacquelyn Raposo
  - Dr. John T. Macdonald Foundation Department of Human Genetics and John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, USA
- Stephan Züchner
  - Dr. John T. Macdonald Foundation Department of Human Genetics and John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, USA
3. Liu Y, Xia K. Aberrant Short Tandem Repeats: Pathogenicity, Mechanisms, Detection, and Roles in Neuropsychiatric Disorders. Genes (Basel) 2025; 16:406. [PMID: 40282366 PMCID: PMC12026680 DOI: 10.3390/genes16040406] [Received: 03/03/2025] [Revised: 03/17/2025] [Accepted: 03/19/2025] [Indexed: 04/29/2025] [Open access]
Abstract
Short tandem repeat (STR) sequences are highly variable DNA segments that significantly contribute to human neurodegenerative disorders, highlighting their crucial role in neuropsychiatric conditions. This article examines the pathogenicity of abnormal STRs and classifies tandem repeat expansion disorders (TREDs), emphasizing their genetic characteristics, mechanisms of action, detection methods, and associated animal models. STR expansions exhibit complex genetic patterns that affect the age of onset and symptom severity. These expansions disrupt gene function through mechanisms such as gene silencing, toxic gain-of-function mutations leading to RNA and protein toxicity, and the generation of toxic peptides via repeat-associated non-AUG (RAN) translation. Advances in sequencing technologies, from traditional PCR and Southern blotting to next-generation and long-read sequencing, have enhanced the accuracy of STR variation detection. Research utilizing these technologies has linked STR expansions to a range of neuropsychiatric disorders, including autism spectrum disorders and schizophrenia, highlighting their contribution to disease risk and phenotypic expression through effects on genes involved in neurodevelopment, synaptic function, and neuronal signaling. Therefore, further investigation is essential to elucidate the intricate interplay between STRs and neuropsychiatric diseases, paving the way for improved diagnostic and therapeutic strategies.
Affiliation(s)
- Yuzhong Liu
  - Institute of Cytology and Genetics, School of Basic Medical Sciences, Hengyang Medical School, University of South China, Hengyang 421001, China
  - MOE Key Lab of Rare Pediatric Diseases, School of Basic Medicine, Hengyang Medical College, University of South China, Hengyang 421001, China
- Kun Xia
  - Institute of Cytology and Genetics, School of Basic Medical Sciences, Hengyang Medical School, University of South China, Hengyang 421001, China
  - MOE Key Lab of Rare Pediatric Diseases, School of Basic Medicine, Hengyang Medical College, University of South China, Hengyang 421001, China
4. Tan X, Zeng W, Yang Y, Lin Z, Li F, Liu J, Chen S, Liu YG, Xie W, Xie X. Genome-wide profiling of polymorphic short tandem repeats and their influence on gene expression and trait variation in diverse rice populations. J Genet Genomics 2025:S1673-8527(25)00078-5. [PMID: 40089018 DOI: 10.1016/j.jgg.2025.03.005] [Received: 03/03/2025] [Revised: 03/10/2025] [Accepted: 03/10/2025] [Indexed: 03/17/2025]
Abstract
Short tandem repeats (STRs) modulate gene expression and contribute to trait variation. However, a systematic evaluation of the genomic characteristics of STRs has not been conducted, and their influence on gene expression in rice remains unclear. Here, we construct a map of 137,629 polymorphic STRs in the rice (Oryza sativa L.) genome using a population-scale resequencing dataset. A genome-wide survey encompassing 4726 accessions shows that the occurrence frequency, mutational patterns, chromosomal distribution, and functional properties of STRs are correlated with the sequences and lengths of repeat motifs. Leveraging a transcriptome dataset from 127 rice accessions, we identify 44,672 expression STRs (eSTRs) by modeling gene expression in response to the length variation of STRs. These eSTRs are notably enriched in the regulatory regions of genes with active transcriptional signatures. Population analysis identifies numerous STRs that have undergone genetic divergence among different rice groups and 1726 tagged STRs that may be associated with agronomic traits. By editing the (ACT)7 STR in the OsFD1 promoter, we further experimentally validate its role in regulating gene expression and phenotype. Our study highlights the contribution of STRs to transcriptional regulation in plants and establishes the foundation for their potential use as alternative targets for genetic improvement.
Affiliation(s)
- Xiyu Tan
  - Guangdong Basic Research Center of Excellence for Precise Breeding of Future Crops, State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, College of Agriculture, South China Agricultural University, Guangzhou, Guangdong 510642, China
- Wanyong Zeng
  - Guangdong Basic Research Center of Excellence for Precise Breeding of Future Crops, State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, College of Agriculture, South China Agricultural University, Guangzhou, Guangdong 510642, China
- Yujian Yang
  - National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan, Hubei 430070, China
- Zhansheng Lin
  - Guangdong Basic Research Center of Excellence for Precise Breeding of Future Crops, State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, College of Agriculture, South China Agricultural University, Guangzhou, Guangdong 510642, China
- Fuquan Li
  - Guangdong Basic Research Center of Excellence for Precise Breeding of Future Crops, State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, College of Agriculture, South China Agricultural University, Guangzhou, Guangdong 510642, China
- Jianhong Liu
  - Guangdong Basic Research Center of Excellence for Precise Breeding of Future Crops, State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, College of Agriculture, South China Agricultural University, Guangzhou, Guangdong 510642, China
- Shaotong Chen
  - Guangdong Basic Research Center of Excellence for Precise Breeding of Future Crops, State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, College of Agriculture, South China Agricultural University, Guangzhou, Guangdong 510642, China
- Yao-Guang Liu
  - Guangdong Basic Research Center of Excellence for Precise Breeding of Future Crops, State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, College of Agriculture, South China Agricultural University, Guangzhou, Guangdong 510642, China
- Weibo Xie
  - National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan, Hubei 430070, China
- Xianrong Xie
  - Guangdong Basic Research Center of Excellence for Precise Breeding of Future Crops, State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, College of Agriculture, South China Agricultural University, Guangzhou, Guangdong 510642, China
5. Romsos EL, Kiesler KM, Steffen CR, Borsuk LA, Riman S, Mullen LE, Irwin JA, Vallone PM, Gettings KB. Development of Publicly Available Forensic DNA Sequence Mixture Data. Genes (Basel) 2025; 16:333. [PMID: 40149484 PMCID: PMC11941798 DOI: 10.3390/genes16030333] [Received: 02/14/2025] [Revised: 03/07/2025] [Accepted: 03/11/2025] [Indexed: 03/29/2025] [Open access]
Abstract
Background: In 2018, the Next-Generation Sequencing Committee of SWGDAM queried bioinformatic and statistical interpretation method developers regarding data needs for the development of sequence-based probabilistic genotyping software. Methods: Based on this engagement, a set of 74 mixture samples was conceived and created using 11 single-source samples. The allelic overlap among these samples was evaluated and sample combinations of varying complexity were selected, aiming to represent the variability observed in forensic casework. Results: The samples were distributed into a 96-well plate design containing several features: (1) three-person mixtures of 1% to 5% minor components in triplicate with varying levels of input DNA to provide information on sensitivity and reproducibility, (2) three-person mixtures containing degraded DNA of either only the major contributor or all three contributors, (3) four- and five-person mixtures with varying ratios and donors, (4) a single-source dilution series. Conclusions: Mixture samples were prepared and have been sequenced thus far with three commercially available kits targeting forensic short tandem repeat (STR) and single nucleotide polymorphism (SNP) markers, with FASTQ data files and metadata publicly available at doi.org/10.18434/M32157.
Affiliation(s)
- Erica L. Romsos
  - National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, MD 20899, USA
- Kevin M. Kiesler
  - National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, MD 20899, USA
- Carolyn R. Steffen
  - National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, MD 20899, USA
- Lisa A. Borsuk
  - National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, MD 20899, USA
- Sarah Riman
  - National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, MD 20899, USA
- Lauren E. Mullen
  - National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, MD 20899, USA
- Jodi A. Irwin
  - Federal Bureau of Investigation Laboratory, 2501 Investigation Parkway, Quantico, VA 22135, USA
- Peter M. Vallone
  - National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, MD 20899, USA
- Katherine B. Gettings
  - National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, MD 20899, USA
6. Jeanjean S, Shen Y, Hardy L, Daunay A, Delépine M, Gerber Z, Alberdi A, Tubacher E, Deleuze JF, How-Kit A. A detailed analysis of second and third-generation sequencing approaches for accurate length determination of short tandem repeats and homopolymers. Nucleic Acids Res 2025; 53:gkaf131. [PMID: 40036507 PMCID: PMC11878640 DOI: 10.1093/nar/gkaf131] [Received: 07/12/2024] [Revised: 01/13/2025] [Accepted: 02/11/2025] [Indexed: 03/06/2025] [Open access]
Abstract
Microsatellites are short tandem repeats (STRs) of a motif of 1-6 nucleotides that are ubiquitous in almost all genomes and widely used in many biomedical applications. However, despite the development of next-generation sequencing (NGS) over the past two decades with new technologies coming to the market, accurately sequencing and genotyping STRs, particularly homopolymers, remain very challenging today due to several technical limitations. This leads in many cases to erroneous allele calls and difficulty in correctly identifying the genuine allele distribution in a sample. Here, we assessed several second- and third-generation sequencing approaches in their capability to correctly determine the length of microsatellites using plasmids containing A/T homopolymers, AC/TG or AT/TA dinucleotide STRs of variable length. Standard polymerase chain reaction (PCR)-free and PCR-containing, single Unique Molecular Identifier (UMI) and dual UMI 'duplex sequencing' protocols were evaluated using Illumina short-read sequencing, and two PCR-free protocols using PacBio and Oxford Nanopore Technologies long-read sequencing. Several bioinformatics algorithms were developed to correctly identify microsatellite alleles from sequencing data, including four and two modes for generating standard and combined consensus alleles, respectively. We provided a detailed analysis and comparison of these approaches and made several recommendations for the accurate determination of microsatellite allele length.
Affiliation(s)
- Sophie I Jeanjean
  - Laboratory for Genomics, Foundation Jean Dausset – CEPH, 75010 Paris, France
- Yimin Shen
  - Laboratory for Bioinformatics, Foundation Jean Dausset – CEPH, 75010 Paris, France
- Lise M Hardy
  - Laboratory for Genomics, Foundation Jean Dausset – CEPH, 75010 Paris, France
- Antoine Daunay
  - Laboratory for Genomics, Foundation Jean Dausset – CEPH, 75010 Paris, France
- Marc Delépine
  - Centre National de Recherche en Génomique Humaine (CNRGH), CEA, Institut François Jacob, 91000 Evry, France
- Zuzana Gerber
  - Centre National de Recherche en Génomique Humaine (CNRGH), CEA, Institut François Jacob, 91000 Evry, France
- Antonio Alberdi
  - Technological Platform of Saint-Louis Research Institute (IRSL), Saint-Louis Hospital, University of Paris, 75010 Paris, France
- Emmanuel Tubacher
  - Laboratory for Bioinformatics, Foundation Jean Dausset – CEPH, 75010 Paris, France
- Jean-François Deleuze
  - Laboratory for Genomics, Foundation Jean Dausset – CEPH, 75010 Paris, France
  - Laboratory for Bioinformatics, Foundation Jean Dausset – CEPH, 75010 Paris, France
  - Centre National de Recherche en Génomique Humaine (CNRGH), CEA, Institut François Jacob, 91000 Evry, France
- Alexandre How-Kit
  - Laboratory for Genomics, Foundation Jean Dausset – CEPH, 75010 Paris, France
7. Al-Abri R, Gürsoy G. ScatTR: Estimating the Size of Long Tandem Repeat Expansions from Short-Reads. bioRxiv 2025:2025.02.15.638440. [PMID: 40027646 PMCID: PMC11870476 DOI: 10.1101/2025.02.15.638440] [Indexed: 03/05/2025]
Abstract
Tandem repeats (TRs) are sequences of DNA where two or more base pairs are repeated back-to-back at specific locations in the genome. The expansions of TRs are implicated in over 50 conditions, including Friedreich's ataxia, autism, and cancer. However, accurately measuring the copy number of TRs is challenging, especially when their expansions are larger than the fragment sizes used in standard short-read genome sequencing. Here we introduce ScatTR, a novel computational method that leverages a maximum likelihood framework to estimate the copy number of large TR expansions from short-read sequencing data. ScatTR calculates the likelihood of different alignments between sequencing reads and reference sequences that represent various TR lengths and employs a Monte Carlo technique to find the best match. In simulated data, ScatTR outperforms state-of-the-art methods, particularly for TRs with longer motifs and those with lengths that greatly exceed typical sequencing fragment sizes. When applied to data from the 1000 Genomes Project, ScatTR detected potential large TR expansions that other methods missed, highlighting its ability to better characterize genome-wide TR variation. ScatTR can be accessed via: https://github.com/g2lab/scattr.
8. Doss RM, Lopez-Ignacio S, Dischler A, Hiatt L, Dashnow H, Breuss MW, Dias CM. Mosaicism in Short Tandem Repeat Disorders: A Clinical Perspective. Genes (Basel) 2025; 16:216. [PMID: 40004546 PMCID: PMC11855715 DOI: 10.3390/genes16020216] [Received: 12/23/2024] [Revised: 02/06/2025] [Accepted: 02/10/2025] [Indexed: 02/27/2025] [Open access]
Abstract
Fragile X, Huntington disease, and myotonic dystrophy type 1 are prototypical examples of human disorders caused by short tandem repeat variation, repetitive nucleotide stretches that are highly mutable both in the germline and somatic tissue. As short tandem repeats are unstable, they can expand, contract, and acquire and lose epigenetic marks in somatic tissue. This means within an individual, the genotype and epigenetic state at these loci can vary considerably from cell to cell. This somatic mosaicism may play a key role in clinical pathogenesis, and yet, our understanding of mosaicism in driving clinical phenotypes in short tandem repeat disorders is only just emerging. This review focuses on these three relatively well-studied examples where, given the advent of new technologies and bioinformatic approaches, a critical role for mosaicism is coming into focus both with respect to cellular physiology and clinical phenotypes.
Affiliation(s)
- Rose M. Doss
  - Section of Genetics and Metabolism, Department of Pediatrics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Susana Lopez-Ignacio
  - Section of Genetics and Metabolism, Department of Pediatrics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Anna Dischler
  - Section of Genetics and Metabolism, Department of Pediatrics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Laurel Hiatt
  - Department of Human Genetics, University of Utah School of Medicine, Salt Lake City, UT 84132, USA
- Harriet Dashnow
  - Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Martin W. Breuss
  - Section of Genetics and Metabolism, Department of Pediatrics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Caroline M. Dias
  - Section of Genetics and Metabolism, Department of Pediatrics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
  - Section of Developmental Pediatrics, Department of Pediatrics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
9. Wang P, Sheng X, Xia X, Wang F, Li R, Ahmed Z, Chen N, Lei C, Ma Z. The genomic landscape of short tandem repeats in cattle. Anim Genet 2025; 56:e13498. [PMID: 39692037 DOI: 10.1111/age.13498] [Received: 06/05/2024] [Revised: 12/04/2024] [Accepted: 12/05/2024] [Indexed: 12/19/2024]
Abstract
Short tandem repeats (STRs) are abundant and have high mutation rates across cattle genomes; however, comprehensive exploration of cattle STRs is needed. Here, we constructed a comprehensive map of 467 553 polymorphic STRs (pSTRs) from 423 cattle genomes representing 59 breeds worldwide. We observed that pSTRs in coding sequences and 5' untranslated regions (5' UTRs) were under strong selective constraints and exhibited a relatively low level of diversity. Furthermore, we found that these pSTRs underwent more contraction than expansion. Population analysis showed a strong positive correlation (R = 1) between pSTR diversity and single nucleotide polymorphic heterozygosity. We also investigated STR differences between taurine and indicine cattle and detected 2301 highly divergent STRs, which might relate to immune, endocrine and neurodevelopmental pathways. In summary, our large-scale study characterizes the spectrum of STRs in cattle, expands the scale of known cattle STR variation and provides novel insights into differences among various cattle subspecies.
Affiliation(s)
- Pengfei Wang
  - State Key Laboratory of Plateau Ecology and Agriculture, Qinghai University, Xining, China
  - Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, China
- Xin Sheng
  - State Key Laboratory of Plateau Ecology and Agriculture, Qinghai University, Xining, China
  - Academy of Animal Science and Veterinary Medicine, Qinghai University, Xining, China
- Xiaoting Xia
  - Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, China
- Fuwen Wang
  - State Key Laboratory of Plateau Ecology and Agriculture, Qinghai University, Xining, China
  - Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, China
- Ruizhe Li
  - State Key Laboratory of Plateau Ecology and Agriculture, Qinghai University, Xining, China
  - Academy of Animal Science and Veterinary Medicine, Qinghai University, Xining, China
- Zulfiqar Ahmed
  - Department of Livestock and Poultry Production, Faculty of Veterinary and Animal Sciences, University of Poonch Rawalakot, Azad Jammu and Kashmir, Pakistan
- Ningbo Chen
  - Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, China
- Chuzhao Lei
  - Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, China
- Zhijie Ma
  - State Key Laboratory of Plateau Ecology and Agriculture, Qinghai University, Xining, China
  - Academy of Animal Science and Veterinary Medicine, Qinghai University, Xining, China
10. Lojova I, Kucharik M, Pös Z, Balaz A, Zatkova A, Tothova Tarova E, Budis J, Kadasi L, Szemes T, Radvanszky J. Advancing molecular diagnostics of myotonic dystrophy type 1 using short-read whole genome sequencing. Mol Cell Probes 2025; 79:102005. [PMID: 39710066 DOI: 10.1016/j.mcp.2024.102005] [Received: 12/05/2024] [Revised: 12/20/2024] [Accepted: 12/20/2024] [Indexed: 12/24/2024]
Abstract
Myotonic dystrophy type 1 (DM1) is a serious multisystem disorder caused by CTG repeat expansions in the DMPK gene. Early and accurate diagnosis, often requiring reliable DNA-diagnostic techniques, is critical for preventing life-threatening cardiac complications. Clinically, two main diagnostic challenges exist. Firstly, because of overlapping symptomatology with other conditions, conventional DNA-testing methods focusing on DM1 expansion detection ensure diagnostic results only in a small subset of patients, and frequently, further DNA-testing in remaining cases is necessary. Secondly, because of variable symptomatology and age of onset, not all DM1 patients are referred for DM1 genetic testing, leading to unrecognized but at-risk cases. When using conventional methods, the main technical problems are expanded-allele sizing and sensitivity to the presence of sequence interruptions. On a set of 50 individual genomes, including ten DM1 patients, we tested the performance of short-read whole-genome sequencing (WGS), one of the most up-to-date molecular testing methods. We identified all expansion-range DM1 alleles and characterized sequence interruptions in seven expansion-range/premutation-range alleles. Although neither the tested conventional methods nor WGS allowed expanded-allele sizing, conventional methods provided higher sizing limits for normal-range alleles. Genotyping concordance rate was found to be 95-99%. WGS was found to be superior in elucidating the sequence structure of the motifs, even if they fall outside the sizing limit (from partial reads). In addition, WGS enables the identification of genetic modifiers in other genes and the detection of alternative diagnoses in DM1-negative patients by extension of the bioinformatic evaluation of the generated data.
Affiliation(s)
- Ingrid Lojova
- Institute of Clinical and Translational Research, Biomedical Research Center of the Slovak Academy of Sciences, Bratislava, Slovakia; Comenius University Science Park, Bratislava, Slovakia; Department of Molecular Biology, Faculty of Natural Sciences, Comenius University, Bratislava, Slovakia
| | - Marcel Kucharik
- Institute of Clinical and Translational Research, Biomedical Research Center of the Slovak Academy of Sciences, Bratislava, Slovakia; Comenius University Science Park, Bratislava, Slovakia; Geneton Ltd., Bratislava, Slovakia
| | - Zuzana Pös
- Institute of Clinical and Translational Research, Biomedical Research Center of the Slovak Academy of Sciences, Bratislava, Slovakia; Geneton Ltd., Bratislava, Slovakia
| | - Andrej Balaz
- Institute of Clinical and Translational Research, Biomedical Research Center of the Slovak Academy of Sciences, Bratislava, Slovakia; Geneton Ltd., Bratislava, Slovakia; Faculty of Mathematics, Physics and Informatics, Comenius University, Bratislava, Slovakia
| | - Andrea Zatkova
- Institute of Clinical and Translational Research, Biomedical Research Center of the Slovak Academy of Sciences, Bratislava, Slovakia
| | - Eva Tothova Tarova
- Institute of Clinical and Translational Research, Biomedical Research Center of the Slovak Academy of Sciences, Bratislava, Slovakia; Department of Biology, Faculty of Education, J. Selye University, Komárno, Slovakia
| | - Jaroslav Budis
- Comenius University Science Park, Bratislava, Slovakia; Geneton Ltd., Bratislava, Slovakia; Genovisio Ltd., Bratislava, Slovakia
| | - Ludevit Kadasi
- Institute of Clinical and Translational Research, Biomedical Research Center of the Slovak Academy of Sciences, Bratislava, Slovakia; Department of Molecular Biology, Faculty of Natural Sciences, Comenius University, Bratislava, Slovakia
| | - Tomas Szemes
- Comenius University Science Park, Bratislava, Slovakia; Department of Molecular Biology, Faculty of Natural Sciences, Comenius University, Bratislava, Slovakia; Geneton Ltd., Bratislava, Slovakia
| | - Jan Radvanszky
- Institute of Clinical and Translational Research, Biomedical Research Center of the Slovak Academy of Sciences, Bratislava, Slovakia; Comenius University Science Park, Bratislava, Slovakia; Department of Molecular Biology, Faculty of Natural Sciences, Comenius University, Bratislava, Slovakia; G2 Consulting Slovakia Ltd., Slovakia.
|
11
|
Maestri S, Scalzo D, Damaggio G, Zobel M, Besusso D, Cattaneo E. Navigating triplet repeats sequencing: concepts, methodological challenges and perspective for Huntington's disease. Nucleic Acids Res 2025; 53:gkae1155. [PMID: 39676657 PMCID: PMC11724279 DOI: 10.1093/nar/gkae1155] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/24/2024] [Revised: 10/16/2024] [Accepted: 12/02/2024] [Indexed: 12/17/2024] Open
Abstract
The accurate characterization of triplet repeats, especially the overrepresented CAG repeats, is increasingly relevant for several reasons. First, germline expansion of CAG repeats above a gene-specific threshold causes multiple neurodegenerative disorders; for instance, Huntington's disease (HD) is triggered by >36 CAG repeats in the huntingtin (HTT) gene. Second, extreme expansions up to 800 CAG repeats have been found in specific cell types affected by the disease. Third, synonymous single nucleotide variants within the CAG repeat stretch influence the age of disease onset. Thus, new sequencing-based protocols that profile both the length and the exact nucleotide sequence of triplet repeats are crucial. Various strategies to enrich the target gene over the background, along with sequencing platforms and bioinformatic pipelines, are under development. This review discusses the concepts, challenges, and methodological opportunities for analyzing triplet repeats, using HD as a case study. Starting with traditional approaches, we will explore how sequencing-based methods have evolved to meet increasing scientific demands. We will also highlight experimental and bioinformatic challenges, aiming to provide a guide for accurate triplet repeat characterization for diagnostic and therapeutic purposes.
Affiliation(s)
- Simone Maestri
- Department of Biosciences, University of Milan, Street Giovanni Celoria, 26, 20133, Milan, Italy
- INGM, Istituto Nazionale Genetica Molecolare ‘Romeo ed Enrica Invernizzi’, Street Francesco Sforza, 35, 20122, Milan, Italy
| | - Davide Scalzo
- Department of Biosciences, University of Milan, Street Giovanni Celoria, 26, 20133, Milan, Italy
- INGM, Istituto Nazionale Genetica Molecolare ‘Romeo ed Enrica Invernizzi’, Street Francesco Sforza, 35, 20122, Milan, Italy
| | - Gianluca Damaggio
- Department of Biosciences, University of Milan, Street Giovanni Celoria, 26, 20133, Milan, Italy
- INGM, Istituto Nazionale Genetica Molecolare ‘Romeo ed Enrica Invernizzi’, Street Francesco Sforza, 35, 20122, Milan, Italy
| | - Martina Zobel
- Department of Biosciences, University of Milan, Street Giovanni Celoria, 26, 20133, Milan, Italy
- INGM, Istituto Nazionale Genetica Molecolare ‘Romeo ed Enrica Invernizzi’, Street Francesco Sforza, 35, 20122, Milan, Italy
| | - Dario Besusso
- Department of Biosciences, University of Milan, Street Giovanni Celoria, 26, 20133, Milan, Italy
- INGM, Istituto Nazionale Genetica Molecolare ‘Romeo ed Enrica Invernizzi’, Street Francesco Sforza, 35, 20122, Milan, Italy
| | - Elena Cattaneo
- Department of Biosciences, University of Milan, Street Giovanni Celoria, 26, 20133, Milan, Italy
- INGM, Istituto Nazionale Genetica Molecolare ‘Romeo ed Enrica Invernizzi’, Street Francesco Sforza, 35, 20122, Milan, Italy
|
12
|
Hou Q, Ji W, An K, Tan Y, Liu P, Su J. Genomic microsatellite characterization and development of polymorphic microsatellites in Eospalax baileyi. Sci Rep 2025; 15:524. [PMID: 39747356 PMCID: PMC11696105 DOI: 10.1038/s41598-024-84631-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2024] [Accepted: 12/25/2024] [Indexed: 01/04/2025] Open
Abstract
Microsatellite markers are cost-effective, rapid, and efficient, and they show great advantages in large-sample kinship analysis and population structure studies. However, microsatellite loci are seriously underdeveloped in non-model organisms. The plateau zokor (Eospalax baileyi) is a key species living underground on the Tibetan Plateau, the effective management of which has long been challenging. In this study, we analyzed the distribution characteristics and functions of microsatellites in the genome of plateau zokors, and their polymorphic sites. Mononucleotide and dinucleotide types were the most abundant in the genome. The number and abundance of microsatellites were highest in the intergenic region and lowest in the coding region. The coding sequences containing microsatellites were annotated to 52 major functional genes and assigned 19,358 Gene Ontology entries. In the Kyoto Encyclopedia of Genes and Genomes pathway analysis, the signal transduction pathway was the most enriched. Thirteen pairs of polymorphic loci were successfully amplified, with the number of alleles ranging from 3 to 8, observed heterozygosity ranging from 0.059 to 0.810, and expected heterozygosity ranging from 0.469 to 0.854. These microsatellite markers provide a cornerstone for studies on the identification of parentage and population genetics of plateau zokors.
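The observed and expected heterozygosity figures quoted in this abstract can be illustrated with a minimal sketch. It assumes diploid genotypes recorded as pairs of allele sizes and uses the simple (uncorrected) gene-diversity formula 1 - sum(p_i^2); the genotype values are hypothetical, and the study itself may have used a different estimator or software.

```python
from collections import Counter


def observed_heterozygosity(genotypes):
    """Fraction of individuals whose two alleles differ at the locus."""
    return sum(a != b for a, b in genotypes) / len(genotypes)


def expected_heterozygosity(genotypes):
    """Gene diversity 1 - sum(p_i^2), with p_i from pooled allele counts."""
    counts = Counter(allele for pair in genotypes for allele in pair)
    total = sum(counts.values())
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


# Hypothetical genotypes at one locus (allele sizes in bp), for illustration only.
genotypes = [(150, 154), (150, 150), (154, 158), (150, 158)]

print(observed_heterozygosity(genotypes))  # 3 of 4 individuals are heterozygous
print(expected_heterozygosity(genotypes))  # allele frequencies 0.5, 0.25, 0.25
```

An observed value well below the expected one at a locus can point to null alleles or inbreeding, which is one reason both figures are usually reported together.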
Affiliation(s)
- Qiqi Hou
- Key Laboratory of Grassland Ecosystem (Ministry of Education), Pratacultural College, Gansu Agricultural University, Lanzhou, 730070, China
- Gansu Agricultural University-Massey University Research Centre for Grassland Biodiversity, Gansu Agricultural University, Lanzhou, 730070, China
| | - Weihong Ji
- Faculty of Science, University of Auckland, Auckland, New Zealand
| | - Kang An
- Key Laboratory of Grassland Ecosystem (Ministry of Education), Pratacultural College, Gansu Agricultural University, Lanzhou, 730070, China
- Gansu Agricultural University-Massey University Research Centre for Grassland Biodiversity, Gansu Agricultural University, Lanzhou, 730070, China
| | - Yuchen Tan
- Key Laboratory of Grassland Ecosystem (Ministry of Education), Pratacultural College, Gansu Agricultural University, Lanzhou, 730070, China
- Gansu Agricultural University-Massey University Research Centre for Grassland Biodiversity, Gansu Agricultural University, Lanzhou, 730070, China
| | - Penghui Liu
- Key Laboratory of Grassland Ecosystem (Ministry of Education), Pratacultural College, Gansu Agricultural University, Lanzhou, 730070, China
- Gansu Agricultural University-Massey University Research Centre for Grassland Biodiversity, Gansu Agricultural University, Lanzhou, 730070, China
| | - Junhu Su
- Key Laboratory of Grassland Ecosystem (Ministry of Education), Pratacultural College, Gansu Agricultural University, Lanzhou, 730070, China.
- Gansu Agricultural University-Massey University Research Centre for Grassland Biodiversity, Gansu Agricultural University, Lanzhou, 730070, China.
- Gansu Qilianshan Grassland Ecosystem Observation and Research Station, Wuwei, 733200, China.
|
13
|
Haasl RJ, Payseur BA. Fitness landscapes of human microsatellites. PLoS Genet 2024; 20:e1011524. [PMID: 39775235 PMCID: PMC11734926 DOI: 10.1371/journal.pgen.1011524] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2024] [Revised: 01/15/2025] [Accepted: 12/03/2024] [Indexed: 01/11/2025] Open
Abstract
Advances in DNA sequencing technology and computation now enable genome-wide scans for natural selection to be conducted on unprecedented scales. By examining patterns of sequence variation among individuals, biologists are identifying genes and variants that affect fitness. Despite this progress, most population genetic methods for characterizing selection assume that variants mutate in a simple manner and at a low rate. Because these assumptions are violated by repetitive sequences, selection remains uncharacterized for an appreciable percentage of the genome. To meet this challenge, we focus on microsatellites, repetitive variants that mutate orders of magnitude faster than single nucleotide variants, can harbor substantial variation, and are known to influence biological function in some cases. We introduce four general models of natural selection that are each characterized by just two parameters, are easily simulated, and are specifically designed for microsatellites. Using a random forests approach to approximate Bayesian computation, we fit these models to carefully chosen microsatellites genotyped in 200 humans from a diverse collection of eight populations. Altogether, we reconstruct detailed fitness landscapes for 43 microsatellites we classify as targets of selection. Microsatellite fitness surfaces are diverse, including a range of selection strengths, contributions from dominance, and variation in the number and size of optimal alleles. Microsatellites that are subject to selection include loci known to cause trinucleotide expansion disorders and modulate gene expression, as well as intergenic loci with no obvious function. The heterogeneity in fitness landscapes we report suggests that genome-scale analyses like those used to assess selection targeting single nucleotide variants run the risk of oversimplifying the evolutionary dynamics of microsatellites. 
Moreover, our fitness landscapes provide a valuable visualization of the selective dynamics navigated by microsatellites.
Affiliation(s)
- Ryan J. Haasl
- Department of Biology, University of Wisconsin-Platteville, Platteville, Wisconsin, United States of America
- Laboratory of Genetics, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
| | - Bret A. Payseur
- Laboratory of Genetics, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
|
14
|
Zhang X, Ji X, Wang L, Chi L, Li C, Wen S, Chen H. STRsensor: a computationally efficient method for STR allele-typing from massively parallel sequencing data. Brief Bioinform 2024; 26:bbae637. [PMID: 39665493 PMCID: PMC11635639 DOI: 10.1093/bib/bbae637] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2024] [Revised: 08/16/2024] [Accepted: 11/22/2024] [Indexed: 12/13/2024] Open
Abstract
Short tandem repeats (STRs) represent one of the most polymorphic variations in the human genome, finding extensive applications in forensics, population genetics and medical genetics. In contrast to the traditional capillary electrophoresis (CE) method, genotyping STRs using massively parallel sequencing technology offers enhanced sensitivity and accuracy. However, current methods are mainly designed for target sequencing with higher coverage for a specific STR locus, thereby constraining the utility of STRs in low- and medium-coverage whole genome sequencing (WGS) data. Here, we introduce STRsensor, a method designed to type STR alleles in low-coverage WGS data and target sequencing data, achieving a significantly high detection ratio and accuracy. STRsensor employs two methods for STR allele-typing: the Kmers-based method and the CIGAR-based method. Furthermore, by incorporating a model for PCR stutters, STRsensor greatly enhances the accuracy of STR allele typing. With simulation data, we demonstrate that STRsensor achieves a detection ratio of 100% and an accuracy of 99.37% for 30× WGS data, outperforming the existing methods, such as STRait Razor, STRinNGS, and HipSTR. When applied to real target sequencing data from 687 individuals, STRsensor achieves a detection ratio of 99.64% and an accuracy of 99.99%. Moreover, STRsensor is a computationally efficient method that runs 79 times faster than HipSTR and 10 000 times faster than STRinNGS. STRsensor is freely available on GitHub: https://github.com/ChenHuaLab/STRsensor.
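To make the idea of typing an STR allele from a read concrete, the sketch below counts the longest uninterrupted run of a repeat motif in a single read. This is a deliberately naive illustration and not STRsensor's Kmers-based or CIGAR-based method: it assumes a known motif, ignores PCR stutter, sequencing errors, and interrupted repeats, and the function name and example read are hypothetical.

```python
import re


def longest_repeat_run(read, motif):
    """Return the largest number of back-to-back copies of `motif` in `read`."""
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    runs = [len(m.group(0)) // len(motif) for m in pattern.finditer(read)]
    return max(runs, default=0)


# Hypothetical read: flank, 7 AGAT units, an interruption, 2 more units, flank.
read = "GGTC" + "AGAT" * 7 + "CCTA" + "AGAT" * 2 + "TTG"

print(longest_repeat_run(read, "AGAT"))  # longest uninterrupted AGAT run
```

In practice a caller would aggregate such per-read counts across all reads spanning the locus and then model stutter before reporting a genotype, which is where dedicated tools differ from this sketch.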
Affiliation(s)
- Xiaolong Zhang
- Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- School of Future Technology, University of Chinese Academy of Sciences, Beijing 100049, China
| | - Xianchao Ji
- Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- School of Future Technology, University of Chinese Academy of Sciences, Beijing 100049, China
| | - Lingxiang Wang
- Institute of Archaeological Science, Fudan University, Shanghai 200032, China
| | - Lianjiang Chi
- Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
| | - Chengtao Li
- Shanghai Medical College, Fudan University, Shanghai 200032, China
| | - Shaoqing Wen
- Institute of Archaeological Science, Fudan University, Shanghai 200032, China
| | - Hua Chen
- Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- School of Future Technology, University of Chinese Academy of Sciences, Beijing 100049, China
- CAS Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming 650023, China
|
15
|
Tesi N, Salazar A, Zhang Y, van der Lee S, Hulsman M, Knoop L, Wijesekera S, Krizova J, Schneider AF, Pennings M, Sleegers K, Kamsteeg EJ, Reinders M, Holstege H. Characterizing tandem repeat complexities across long-read sequencing platforms with TREAT and otter. Genome Res 2024; 34:1942-1953. [PMID: 39406499 DOI: 10.1101/gr.279351.124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2024] [Accepted: 10/03/2024] [Indexed: 11/09/2024]
Abstract
Tandem repeats (TRs) play important roles in genomic variation and disease risk in humans. Long-read sequencing allows for the accurate characterization of TRs; however, the underlying bioinformatics perspectives remain challenging. We present otter and TREAT: otter is a fast targeted local assembler, cross-compatible across different sequencing platforms. It is integrated in TREAT, an end-to-end workflow for TR characterization, visualization, and analysis across multiple genomes. In a comparison with existing tools based on long-read sequencing data from both Oxford Nanopore Technologies (ONT, Simplex and Duplex) and Pacific Biosciences (PacBio, Sequel II and Revio), otter and TREAT achieve state-of-the-art genotyping and motif characterization accuracy. Applied to clinically relevant TRs, TREAT/otter significantly identify individuals with pathogenic TR expansions. When applied to a case-control setting, we replicate previously reported associations of TRs with Alzheimer's disease, including those near or within APOC1 (P = 2.63 × 10⁻⁹), SPI1 (P = 6.5 × 10⁻³), and ABCA7 (P = 0.04) genes. Finally, we use TREAT/otter to systematically evaluate potential biases when genotyping TRs using diverse ONT and PacBio long-read sequencing data sets. We show that, in rare cases (0.06%), long-read sequencing suffers from coverage drops in TRs, including the disease-associated TRs in ABCA7 and RFC1 genes. Such coverage drops can lead to TR misgenotyping, hampering the accurate characterization of TR alleles. Taken together, our tools can accurately genotype TRs across different sequencing technologies and with minimal requirements, allowing end-to-end analysis and comparisons of TRs in human genomes, with broad applications in research and clinical fields.
Affiliation(s)
- Niccoló Tesi
- Section Genomics of Neurodegenerative Diseases and Aging, Department of Clinical Genetics, Vrije Universiteit Amsterdam, Amsterdam UMC, 1081HV Amsterdam, The Netherlands
- Department of Neurology, Alzheimer Center Amsterdam, Amsterdam Neuroscience, Vrije Universiteit Amsterdam, Amsterdam UMC, 1081HV Amsterdam, The Netherlands
- Delft Bioinformatics Lab, Delft University of Technology, 2628CD Delft, The Netherlands
| | - Alex Salazar
- Section Genomics of Neurodegenerative Diseases and Aging, Department of Clinical Genetics, Vrije Universiteit Amsterdam, Amsterdam UMC, 1081HV Amsterdam, The Netherlands
| | - Yaran Zhang
- Section Genomics of Neurodegenerative Diseases and Aging, Department of Clinical Genetics, Vrije Universiteit Amsterdam, Amsterdam UMC, 1081HV Amsterdam, The Netherlands
| | - Sven van der Lee
- Section Genomics of Neurodegenerative Diseases and Aging, Department of Clinical Genetics, Vrije Universiteit Amsterdam, Amsterdam UMC, 1081HV Amsterdam, The Netherlands
- Department of Neurology, Alzheimer Center Amsterdam, Amsterdam Neuroscience, Vrije Universiteit Amsterdam, Amsterdam UMC, 1081HV Amsterdam, The Netherlands
| | - Marc Hulsman
- Section Genomics of Neurodegenerative Diseases and Aging, Department of Clinical Genetics, Vrije Universiteit Amsterdam, Amsterdam UMC, 1081HV Amsterdam, The Netherlands
- Department of Neurology, Alzheimer Center Amsterdam, Amsterdam Neuroscience, Vrije Universiteit Amsterdam, Amsterdam UMC, 1081HV Amsterdam, The Netherlands
- Delft Bioinformatics Lab, Delft University of Technology, 2628CD Delft, The Netherlands
| | - Lydian Knoop
- Section Genomics of Neurodegenerative Diseases and Aging, Department of Clinical Genetics, Vrije Universiteit Amsterdam, Amsterdam UMC, 1081HV Amsterdam, The Netherlands
| | - Sanduni Wijesekera
- Section Genomics of Neurodegenerative Diseases and Aging, Department of Clinical Genetics, Vrije Universiteit Amsterdam, Amsterdam UMC, 1081HV Amsterdam, The Netherlands
| | - Jana Krizova
- Section Genomics of Neurodegenerative Diseases and Aging, Department of Clinical Genetics, Vrije Universiteit Amsterdam, Amsterdam UMC, 1081HV Amsterdam, The Netherlands
| | - Anne-Fleur Schneider
- Section Genomics of Neurodegenerative Diseases and Aging, Department of Clinical Genetics, Vrije Universiteit Amsterdam, Amsterdam UMC, 1081HV Amsterdam, The Netherlands
| | - Maartje Pennings
- Department of Genome Diagnostics, Radboud University Medical Center, 6525GA Nijmegen, The Netherlands
| | - Kristel Sleegers
- Complex Genetics of Alzheimer's Disease Group, Antwerp Center for Molecular Neurology, VIB, Antwerp B-2650, Belgium
| | - Erik-Jan Kamsteeg
- Department of Genome Diagnostics, Radboud University Medical Center, 6525GA Nijmegen, The Netherlands
| | - Marcel Reinders
- Delft Bioinformatics Lab, Delft University of Technology, 2628CD Delft, The Netherlands
| | - Henne Holstege
- Section Genomics of Neurodegenerative Diseases and Aging, Department of Clinical Genetics, Vrije Universiteit Amsterdam, Amsterdam UMC, 1081HV Amsterdam, The Netherlands
- Department of Neurology, Alzheimer Center Amsterdam, Amsterdam Neuroscience, Vrije Universiteit Amsterdam, Amsterdam UMC, 1081HV Amsterdam, The Netherlands
- Delft Bioinformatics Lab, Delft University of Technology, 2628CD Delft, The Netherlands
|
16
|
Fedele E, Wetton JH, Jobling MA. Sequencing the orthologs of human autosomal forensic short tandem repeats provides individual- and species-level identification in African great apes. BMC Ecol Evol 2024; 24:134. [PMID: 39482599 PMCID: PMC11526555 DOI: 10.1186/s12862-024-02324-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2023] [Accepted: 10/17/2024] [Indexed: 11/03/2024] Open
Abstract
BACKGROUND: Great apes are a global conservation concern, with anthropogenic pressures threatening their survival. Genetic analysis can be used to assess the effects of reduced population sizes and the effectiveness of conservation measures. In humans, autosomal short tandem repeats (aSTRs) are widely used in population genetics and for forensic individual identification and kinship testing. Traditionally, genotyping is length-based via capillary electrophoresis (CE), but there is an increasing move to direct analysis by massively parallel sequencing (MPS). An example is the ForenSeq DNA Signature Prep Kit, which amplifies multiple loci including 27 aSTRs, prior to sequencing via Illumina technology. Here we assess the applicability of this human-based kit in African great apes. We ask whether cross-species genotyping of the orthologs of these loci can provide both individual and (sub)species identification.
RESULTS: The ForenSeq kit was used to amplify and sequence aSTRs in 52 individuals (14 chimpanzees; 4 bonobos; 16 western lowland, 6 eastern lowland, and 12 mountain gorillas). The orthologs of 24/27 human aSTRs amplified across species, and a core set of thirteen loci could be genotyped in all individuals. Genotypes were individually and (sub)species identifying. Both allelic diversity and the power to discriminate (sub)species were greater when considering STR sequences rather than allele lengths. Comparing human and African great-ape STR sequences with an orangutan outgroup showed general conservation of repeat types and allele size ranges. Variation in repeat array structures and a weak relationship with the known phylogeny suggest stochastic origins of mutations giving rise to diverse imperfect repeat arrays. Interruptions within long repeat arrays in African great apes do not appear to reduce allelic diversity.
CONCLUSIONS: Orthologs of most human aSTRs in the ForenSeq DNA Signature Prep Kit can be analysed in African great apes. Primer redesign would reduce observed variability in amplification across some loci. MPS of the orthologs of human loci provides better resolution for both individual and (sub)species identification in great apes than standard CE-based approaches, and has the further advantage that there is no need to limit the number and size ranges of analysed loci.
Affiliation(s)
- Ettore Fedele
- Department of Genetics, Genomics & Cancer Sciences, University of Leicester, University Road, Leicester, LE1 7RH, UK
- Current address: Faculty of Science & Engineering, Swansea University, Swansea, UK
| | - Jon H Wetton
- Department of Genetics, Genomics & Cancer Sciences, University of Leicester, University Road, Leicester, LE1 7RH, UK.
| | - Mark A Jobling
- Department of Genetics, Genomics & Cancer Sciences, University of Leicester, University Road, Leicester, LE1 7RH, UK.
|
17
|
Dolzhenko E, English A, Dashnow H, De Sena Brandine G, Mokveld T, Rowell WJ, Karniski C, Kronenberg Z, Danzi MC, Cheung WA, Bi C, Farrow E, Wenger A, Chua KP, Martínez-Cerdeño V, Bartley TD, Jin P, Nelson DL, Zuchner S, Pastinen T, Quinlan AR, Sedlazeck FJ, Eberle MA. Characterization and visualization of tandem repeats at genome scale. Nat Biotechnol 2024; 42:1606-1614. [PMID: 38168995 PMCID: PMC11921810 DOI: 10.1038/s41587-023-02057-3] [Citation(s) in RCA: 43] [Impact Index Per Article: 43.0] [Received: 05/11/2023] [Accepted: 11/06/2023] [Indexed: 01/05/2024]
Abstract
Tandem repeat (TR) variation is associated with gene expression changes and numerous rare monogenic diseases. Although long-read sequencing provides accurate full-length sequences and methylation of TRs, there is still a need for computational methods to profile TRs across the genome. Here we introduce the Tandem Repeat Genotyping Tool (TRGT) and an accompanying TR database. TRGT determines the consensus sequences and methylation levels of specified TRs from PacBio HiFi sequencing data. It also reports reads that support each repeat allele. These reads can be subsequently visualized with a companion TR visualization tool. Assessing 937,122 TRs, TRGT showed a Mendelian concordance of 98.38%, allowing a single repeat unit difference. In six samples with known repeat expansions, TRGT detected all expansions while also identifying methylation signals and mosaicism and providing finer repeat length resolution than existing methods. Additionally, we released a database with allele sequences and methylation levels for 937,122 TRs across 100 genomes.
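The Mendelian concordance figure above allows a single repeat unit of difference between parent and child alleles. A minimal sketch of such a check for one trio follows, assuming each genotype is a pair of repeat-unit counts; the function name, the `tolerance` parameter, and the example genotypes are hypothetical and do not reflect TRGT's actual implementation.

```python
from itertools import permutations


def mendelian_consistent(child, mother, father, tolerance=1):
    """True if one child allele matches a maternal allele and the other a
    paternal allele, each within `tolerance` repeat units."""
    for from_m, from_f in permutations(child, 2):
        maternal_ok = any(abs(from_m - a) <= tolerance for a in mother)
        paternal_ok = any(abs(from_f - a) <= tolerance for a in father)
        if maternal_ok and paternal_ok:
            return True
    return False


def concordance(trios, tolerance=1):
    """Fraction of (child, mother, father) trios that pass the check."""
    passed = sum(mendelian_consistent(c, m, f, tolerance) for c, m, f in trios)
    return passed / len(trios)


# Hypothetical genotypes (repeat-unit counts) at one locus, for illustration only.
trios = [
    ((12, 15), (12, 20), (14, 30)),  # 15 vs paternal 14: off by one unit
    ((25, 25), (12, 20), (14, 30)),  # no parental allele near 25
]
print(concordance(trios))
```

Setting `tolerance=0` turns this into a strict check, which shows how much of a reported concordance depends on the one-unit allowance.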
Affiliation(s)
| | - Adam English
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
| | - Harriet Dashnow
- Departments of Human Genetics and Biomedical Informatics, University of Utah, Salt Lake City, UT, USA
| | - Tom Mokveld
- Pacific Biosciences of California, Menlo Park, CA, USA
| | - Matt C Danzi
- Dr. John T. Macdonald Foundation Department of Human Genetics and John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, USA
| | - Warren A Cheung
- Genomic Medicine Center, Children's Mercy Kansas City, Kansas City, MO, USA
| | - Chengpeng Bi
- Genomic Medicine Center, Children's Mercy Kansas City, Kansas City, MO, USA
| | - Emily Farrow
- Genomic Medicine Center, Children's Mercy Kansas City, Kansas City, MO, USA
| | - Aaron Wenger
- Pacific Biosciences of California, Menlo Park, CA, USA
| | - Khi Pin Chua
- Pacific Biosciences of California, Menlo Park, CA, USA
| | - Verónica Martínez-Cerdeño
- Institute for Pediatric Regenerative Medicine, Shriner's Hospital for Children and UC Davis School of Medicine, Sacramento, CA, USA
- Department of Pathology & Laboratory Medicine, UC Davis School of Medicine, Sacramento, CA, USA
- MIND Institute, UC Davis School of Medicine, Sacramento, CA, USA
| | - Trevor D Bartley
- Institute for Pediatric Regenerative Medicine, Shriner's Hospital for Children and UC Davis School of Medicine, Sacramento, CA, USA
- Department of Pathology & Laboratory Medicine, UC Davis School of Medicine, Sacramento, CA, USA
| | - Peng Jin
- Department of Human Genetics, Emory University School of Medicine, Atlanta, GA, USA
| | - David L Nelson
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
| | - Stephan Zuchner
- Dr. John T. Macdonald Foundation Department of Human Genetics and John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, USA
| | - Tomi Pastinen
- Genomic Medicine Center, Children's Mercy Kansas City, Kansas City, MO, USA
| | - Aaron R Quinlan
- Departments of Human Genetics and Biomedical Informatics, University of Utah, Salt Lake City, UT, USA
| | - Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Department of Computer Science, Rice University, Houston, TX, USA
|
18
|
He H, Leng Y, Cao X, Zhu Y, Li X, Yuan Q, Zhang B, He W, Wei H, Liu X, Xu Q, Guo M, Zhang H, Yang L, Lv Y, Wang X, Shi C, Zhang Z, Chen W, Zhang B, Wang T, Yu X, Qian H, Zhang Q, Dai X, Liu C, Cui Y, Wang Y, Zheng X, Xiong G, Zhou Y, Qian Q, Shang L. The pan-tandem repeat map highlights multiallelic variants underlying gene expression and agronomic traits in rice. Nat Commun 2024; 15:7291. [PMID: 39181885 PMCID: PMC11344853 DOI: 10.1038/s41467-024-51854-0] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 06/23/2023] [Accepted: 08/20/2024] [Indexed: 08/27/2024] Open
Abstract
Tandem repeats (TRs) are genomic regions that vary in the number of tandemly repeated units and are often multiallelic. Their characteristics and contributions to gene expression and quantitative traits in rice are largely unknown. Here, we survey rice TR variations based on 231 genome assemblies and the rice pan-genome graph. We identify 227,391 multiallelic TR loci, including 54,416 TR variations that are absent from the Nipponbare reference genome. Only one-third of TR variations show strong linkage with nearby bi-allelic variants (SNPs, Indels and PAVs). Using 193 panicle and 202 leaf transcriptomic data sets, we find that 485 and 511 TRs, respectively, act as QTLs for nearby gene expression independently of other bi-allelic variations. Using plant height and grain width as examples, we identify and validate TR contributions to rice agronomic trait variations. These findings would enhance our understanding of the functions of multiallelic variants and facilitate rice molecular breeding.
Affiliation(s)
- Huiying He
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
| | - Yue Leng
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
| | - Xinglan Cao
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- State Key Laboratory of Crop Stress Adaptation and Improvement, School of Life Sciences, Henan University, Kaifeng, 475004, China
- Shenzhen Research Institute of Henan University, Shenzhen, 518000, China
- Yiwang Zhu
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Institute of Biotechnology, Fujian Academy of Agricultural Sciences/Fujian Provincial Key Laboratory of Genetic Engineering for Agriculture, Fuzhou, 350003, China
- Xiaoxia Li
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Qiaoling Yuan
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Bin Zhang
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Yazhouwan National Laboratory, Sanya, 572024, China
- Wenchuang He
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Hua Wei
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Xiangpei Liu
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Qiang Xu
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Mingliang Guo
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Hong Zhang
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Longbo Yang
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Yang Lv
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Xianmeng Wang
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Chuanlin Shi
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Zhipeng Zhang
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Wu Chen
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Bintao Zhang
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Tianyi Wang
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Xiaoman Yu
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Hongge Qian
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Qianqian Zhang
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Xiaofan Dai
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Congcong Liu
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Yan Cui
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Yuexing Wang
- State Key Laboratory of Rice Biology, China National Rice Research Institute, Hangzhou, 310006, China
- Xiaoming Zheng
- National Key Facility for Crop Gene Resources and Genetic Improvement, Institute of Crop Science, Chinese Academy of Agricultural Sciences, 100081, Beijing, China
- Guosheng Xiong
- Academy for Advanced Interdisciplinary Studies, Plant Phenomics Research Center, Nanjing Agricultural University, Nanjing, 210095, Jiangsu, China
- Yongfeng Zhou
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Qian Qian
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China.
- Yazhouwan National Laboratory, Sanya, 572024, China.
- State Key Laboratory of Rice Biology, China National Rice Research Institute, Hangzhou, 310006, China.
- Lianguang Shang
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China.
- Yazhouwan National Laboratory, Sanya, 572024, China.
19
Loh CA, Shields DA, Schwing A, Evrony GD. High-fidelity, large-scale targeted profiling of microsatellites. Genome Res 2024; 34:1008-1026. [PMID: 39013593 PMCID: PMC11368184 DOI: 10.1101/gr.278785.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2023] [Accepted: 07/11/2024] [Indexed: 07/18/2024]
Abstract
Microsatellites are highly mutable sequences that can serve as markers for relationships among individuals or cells within a population. The accuracy and resolution of reconstructing these relationships depends on the fidelity of microsatellite profiling and the number of microsatellites profiled. However, current methods for targeted profiling of microsatellites incur significant "stutter" artifacts that interfere with accurate genotyping, and sequencing costs preclude whole-genome microsatellite profiling of a large number of samples. We developed a novel method for accurate and cost-effective targeted profiling of a panel of more than 150,000 microsatellites per sample, along with a computational tool for designing large-scale microsatellite panels. Our method addresses the greatest challenge for microsatellite profiling-"stutter" artifacts-with a low-temperature hybridization capture that significantly reduces these artifacts. We also developed a computational tool for accurate genotyping of the resulting microsatellite sequencing data that uses an ensemble approach integrating three microsatellite genotyping tools, which we optimize by analysis of de novo microsatellite mutations in human trios. Altogether, our suite of experimental and computational tools enables high-fidelity, large-scale profiling of microsatellites, which may find utility in diverse applications such as lineage tracing, population genetics, ecology, and forensics.
Affiliation(s)
- Caitlin A Loh
- Center for Human Genetics and Genomics, New York University Grossman School of Medicine, New York, New York 10016, USA
- Department of Pediatrics, Department of Neuroscience & Physiology, Institute for Systems Genetics, Perlmutter Cancer Center, and Neuroscience Institute, New York University Grossman School of Medicine, New York, New York 10016, USA
- Danielle A Shields
- Center for Human Genetics and Genomics, New York University Grossman School of Medicine, New York, New York 10016, USA
- Department of Pediatrics, Department of Neuroscience & Physiology, Institute for Systems Genetics, Perlmutter Cancer Center, and Neuroscience Institute, New York University Grossman School of Medicine, New York, New York 10016, USA
- Adam Schwing
- Center for Human Genetics and Genomics, New York University Grossman School of Medicine, New York, New York 10016, USA
- Department of Pediatrics, Department of Neuroscience & Physiology, Institute for Systems Genetics, Perlmutter Cancer Center, and Neuroscience Institute, New York University Grossman School of Medicine, New York, New York 10016, USA
- Gilad D Evrony
- Center for Human Genetics and Genomics, New York University Grossman School of Medicine, New York, New York 10016, USA
- Department of Pediatrics, Department of Neuroscience & Physiology, Institute for Systems Genetics, Perlmutter Cancer Center, and Neuroscience Institute, New York University Grossman School of Medicine, New York, New York 10016, USA
20
Tanudisastro HA, Deveson IW, Dashnow H, MacArthur DG. Sequencing and characterizing short tandem repeats in the human genome. Nat Rev Genet 2024; 25:460-475. [PMID: 38366034 DOI: 10.1038/s41576-024-00692-3] [Citation(s) in RCA: 37] [Impact Index Per Article: 37.0] [Accepted: 12/06/2023] [Indexed: 02/18/2024]
Abstract
Short tandem repeats (STRs) are highly polymorphic sequences throughout the human genome that are composed of repeated copies of a 1-6-bp motif. Over 1 million variable STR loci are known, some of which regulate gene expression and influence complex traits, such as height. Moreover, variants in at least 60 STR loci cause genetic disorders, including Huntington disease and fragile X syndrome. Accurately identifying and genotyping STR variants is challenging, in particular mapping short reads to repetitive regions and inferring expanded repeat lengths. Recent advances in sequencing technology and computational tools for STR genotyping from sequencing data promise to help overcome this challenge and solve genetically unresolved cases and the 'missing heritability' of polygenic traits. Here, we compare STR genotyping methods, analytical tools and their applications to understand the effect of STR variation on health and disease. We identify emergent opportunities to refine genotyping and quality-control approaches as well as to integrate STRs into variant-calling workflows and large cohort analyses.
Affiliation(s)
- Hope A Tanudisastro
- Centre for Population Genomics, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
- Centre for Population Genomics, Murdoch Children's Research Institute, Melbourne, Victoria, Australia
- Faculty of Medicine and Health, University of New South Wales, Sydney, New South Wales, Australia
- Faculty of Medicine and Health, University of Sydney, Sydney, New South Wales, Australia
- Ira W Deveson
- Faculty of Medicine and Health, University of New South Wales, Sydney, New South Wales, Australia
- Genomics and Inherited Disease Program, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
- Harriet Dashnow
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA.
- Daniel G MacArthur
- Centre for Population Genomics, Garvan Institute of Medical Research, Sydney, New South Wales, Australia.
- Centre for Population Genomics, Murdoch Children's Research Institute, Melbourne, Victoria, Australia.
- Faculty of Medicine and Health, University of New South Wales, Sydney, New South Wales, Australia.
21
Perry A, Eddelbuettel D, Rosenthal G, Blackmon H. Polly: An R package for genotyping microsatellites and detecting highly polymorphic DNA markers from short-read data. Mol Ecol Resour 2024; 24:e13933. [PMID: 38299378 PMCID: PMC10994724 DOI: 10.1111/1755-0998.13933] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2022] [Revised: 01/10/2024] [Accepted: 01/23/2024] [Indexed: 02/02/2024]
Abstract
Highly polymorphic markers, such as microsatellites, are invaluable for the study of natural populations. However, contemporary methods for genotyping highly polymorphic variants have serious drawbacks that impede their efficiency. We created Polly, an R package with C++ source code that uses Illumina short-read data to genotype microsatellites, detect highly polymorphic variants and identify clusters of highly polymorphic SNPs, indels and microsatellites. We tested Polly on short-read data from Xiphophorus birchmanni (Teleostei: Poeciliidae) and Arabidopsis thaliana, finding it to be efficient and accurate both for microsatellite genotyping and polymorphic marker detection. This program can be applied to any diploid population for which there exists short-read data and at least one scaffolded reference genome.
Affiliation(s)
- Annabel Perry
- Harvard University, Department of Human Evolutionary Biology
- Texas A&M University, Department of Biology
- Gil Rosenthal
- Texas A&M University, Department of Biology
- Università degli Studi di Padova, Dipartimento di Biologia
22
Oketch JW, Wain LV, Hollox EJ. A comparison of software for analysis of rare and common short tandem repeat (STR) variation using human genome sequences from clinical and population-based samples. PLoS One 2024; 19:e0300545. [PMID: 38558075 PMCID: PMC10984476 DOI: 10.1371/journal.pone.0300545] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/02/2024] [Accepted: 02/27/2024] [Indexed: 04/04/2024] Open
Abstract
Short tandem repeat (STR) variation is an often overlooked source of variation between genomes. STRs comprise about 3% of the human genome and are highly polymorphic. Some cause Mendelian disease, and others affect gene expression. Their contribution to common disease is not well-understood, but recent software tools designed to genotype STRs using short read sequencing data will help address this. Here, we compare software that genotypes common STRs and rarer STR expansions genome-wide, with the aim of applying them to population-scale genomes. By using the Genome-In-A-Bottle (GIAB) consortium and 1000 Genomes Project short-read sequencing data, we compare performance in terms of sequence length, depth, computing resources needed, genotyping accuracy and number of STRs genotyped. To ensure broad applicability of our findings, we also measure genotyping performance against a set of genomes from clinical samples with known STR expansions, and a set of STRs commonly used for forensic identification. We find that HipSTR, ExpansionHunter and GangSTR perform well in genotyping common STRs, including the CODIS 13 core STRs used for forensic analysis. GangSTR and ExpansionHunter outperform HipSTR for genotyping call rate and memory usage. ExpansionHunter denovo (EHdn), STRling and GangSTR outperformed STRetch for detecting expanded STRs, and EHdn and STRling used considerably less processor time compared to GangSTR. Analysis on shared genomic sequence data provided by the GIAB consortium allows future performance comparisons of new software approaches on a common set of data, facilitating comparisons and allowing researchers to choose the best software that fulfils their needs.
Affiliation(s)
- John W. Oketch
- Department of Genetics and Genome Biology, University of Leicester, Leicester, United Kingdom
- Louise V. Wain
- Department of Population Health Sciences, University of Leicester, Leicester, United Kingdom
- National Institute for Health Research, Leicester Respiratory Biomedical Research Centre, Glenfield Hospital, Leicester, United Kingdom
- Edward J. Hollox
- Department of Genetics and Genome Biology, University of Leicester, Leicester, United Kingdom
23
Lu J, Toro C, Adams DR, Moreno CAM, Lee WP, Leung YY, Harms MB, Vardarajan B, Heinzen EL. LUSTR: a new customizable tool for calling genome-wide germline and somatic short tandem repeat variants. BMC Genomics 2024; 25:115. [PMID: 38279154 PMCID: PMC10811831 DOI: 10.1186/s12864-023-09935-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2023] [Accepted: 12/21/2023] [Indexed: 01/28/2024] Open
Abstract
BACKGROUND: Short tandem repeats (STRs) are widely distributed across the human genome and are associated with numerous neurological disorders. However, the extent that STRs contribute to disease is likely under-estimated because of the challenges calling these variants in short read next generation sequencing data. Several computational tools have been developed for STR variant calling, but none fully address all of the complexities associated with this variant class. RESULTS: Here we introduce LUSTR which is designed to address some of the challenges associated with STR variant calling by enabling more flexibility in defining STR loci, allowing for customizable modules to tailor analyses, and expanding the capability to call somatic and multiallelic STR variants. LUSTR is a user-friendly and easily customizable tool for targeted or unbiased genome-wide STR variant screening that can use either predefined or novel genome builds. Using both simulated and real data sets, we demonstrated that LUSTR accurately infers germline and somatic STR expansions in individuals with and without diseases. CONCLUSIONS: LUSTR offers a powerful and user-friendly approach that allows for the identification of STR variants and can facilitate more comprehensive studies evaluating the role of pathogenic STR variants across human diseases.
Affiliation(s)
- Jinfeng Lu
- Division of Pharmacotherapy and Experimental Therapeutics, Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA.
- The Taub Institute for Research On Alzheimer's Disease and the Aging Brain, Gertrude H. Sergievsky Center, Department of Neurology, College of Physicians and Surgeons, Columbia University, The New York Presbyterian Hospital, New York, NY, 10032, USA.
- Camilo Toro
- NIH Undiagnosed Diseases Program, National Human Genome Research Institute (NHGRI), National Institutes of Health, Bethesda, MD, 20892, USA
- David R Adams
- NIH Undiagnosed Diseases Program, National Human Genome Research Institute (NHGRI), National Institutes of Health, Bethesda, MD, 20892, USA
- Wan-Ping Lee
- Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Yuk Yee Leung
- Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Mathew B Harms
- Department of Neurology, Division of Neuromuscular Medicine, Columbia University Irving Medical Center, New York, NY, 10032, USA
- Badri Vardarajan
- The Taub Institute for Research On Alzheimer's Disease and the Aging Brain, Gertrude H. Sergievsky Center, Department of Neurology, College of Physicians and Surgeons, Columbia University, The New York Presbyterian Hospital, New York, NY, 10032, USA
- Erin L Heinzen
- Division of Pharmacotherapy and Experimental Therapeutics, Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA.
- Department of Genetics, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA.
24
Birnbaum R. Rediscovering tandem repeat variation in schizophrenia: challenges and opportunities. Transl Psychiatry 2023; 13:402. [PMID: 38123544 PMCID: PMC10733427 DOI: 10.1038/s41398-023-02689-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2023] [Revised: 11/23/2023] [Accepted: 11/27/2023] [Indexed: 12/23/2023] Open
Abstract
Tandem repeats (TRs) are prevalent throughout the genome, constituting at least 3% of the genome, and often highly polymorphic. The high mutation rate of TRs, which can be orders of magnitude higher than single-nucleotide polymorphisms and indels, indicates that they are likely to make significant contributions to phenotypic variation, yet their contribution to schizophrenia has been largely ignored by recent genome-wide association studies (GWAS). Tandem repeat expansions are already known causative factors for over 50 disorders, while common tandem repeat variation is increasingly being identified as significantly associated with complex disease and gene regulation. The current review summarizes key background concepts of tandem repeat variation as pertains to disease risk, elucidating their potential for schizophrenia association. An overview of next-generation sequencing-based methods that may be applied for TR genome-wide identification is provided, and some key methodological challenges in TR analyses are delineated.
Affiliation(s)
- Rebecca Birnbaum
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
- Department of Genetics and Genomics Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
25
Panoyan MA, Wendt FR. The role of tandem repeat expansions in brain disorders. Emerg Top Life Sci 2023; 7:249-263. [PMID: 37401564 DOI: 10.1042/etls20230022] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 04/14/2023] [Revised: 06/05/2023] [Accepted: 06/19/2023] [Indexed: 07/05/2023]
Abstract
The human genome contains numerous genetic polymorphisms contributing to different health and disease outcomes. Tandem repeat (TR) loci are highly polymorphic yet under-investigated in large genomic studies, which has prompted research efforts to identify novel variations and gain a deeper understanding of their role in human biology and disease outcomes. We summarize the current understanding of TRs and their implications for human health and disease, including an overview of the challenges encountered when conducting TR analyses and potential solutions to overcome these challenges. By shedding light on these issues, this article aims to contribute to a better understanding of the impact of TRs on the development of new disease treatments.
Affiliation(s)
- Mary Anne Panoyan
- Department of Anthropology, University of Toronto, Mississauga, ON, Canada
- Frank R Wendt
- Department of Anthropology, University of Toronto, Mississauga, ON, Canada
- Biostatistics Division, Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada
- Forensic Science Program, University of Toronto, Mississauga, ON, Canada
26
Chaisson MJP, Sulovari A, Valdmanis PN, Miller DE, Eichler EE. Advances in the discovery and analyses of human tandem repeats. Emerg Top Life Sci 2023; 7:361-381. [PMID: 37905568 PMCID: PMC10806765 DOI: 10.1042/etls20230074] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 06/12/2023] [Revised: 10/18/2023] [Accepted: 10/18/2023] [Indexed: 11/02/2023]
Abstract
Long-read sequencing platforms provide unparalleled access to the structure and composition of all classes of tandemly repeated DNA from STRs to satellite arrays. This review summarizes our current understanding of their organization within the human genome, their importance with respect to disease, as well as the advances and challenges in understanding their genetic diversity and functional effects. Novel computational methods are being developed to visualize and associate these complex patterns of human variation with disease, expression, and epigenetic differences. We predict accurate characterization of this repeat-rich form of human variation will become increasingly relevant to both basic and clinical human genetics.
Affiliation(s)
- Mark J P Chaisson
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, U.S.A
- The Genomic and Epigenomic Regulation Program, USC Norris Cancer Center, University of Southern California, Los Angeles, CA 90089, U.S.A
- Arvis Sulovari
- Computational Biology, Cajal Neuroscience Inc, Seattle, WA 98102, U.S.A
- Paul N Valdmanis
- Division of Medical Genetics, Department of Medicine, University of Washington School of Medicine, Seattle, WA 98195, U.S.A
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, U.S.A
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA 98195, U.S.A
- Danny E Miller
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA 98195, U.S.A
- Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA 98195, U.S.A
- Department of Pediatrics, University of Washington, Seattle, WA 98195, U.S.A
- Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, U.S.A
- Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, U.S.A
27
Hannan AJ. Expanding horizons of tandem repeats in biology and medicine: Why 'genomic dark matter' matters. Emerg Top Life Sci 2023; 7:ETLS20230075. [PMID: 38088823 PMCID: PMC10754335 DOI: 10.1042/etls20230075] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2023] [Revised: 11/27/2023] [Accepted: 11/27/2023] [Indexed: 12/30/2023]
Abstract
Approximately half of the human genome includes repetitive sequences, and these DNA sequences (as well as their transcribed repetitive RNA and translated amino-acid repeat sequences) are known as the repeatome. Within this repeatome there are a couple of million tandem repeats, dispersed throughout the genome. These tandem repeats have been estimated to constitute ∼8% of the entire human genome. These tandem repeats can be located throughout exons, introns and intergenic regions, thus potentially affecting the structure and function of tandemly repetitive DNA, RNA and protein sequences. Over more than three decades, more than 60 monogenic human disorders have been found to be caused by tandem-repeat mutations. These monogenic tandem-repeat disorders include Huntington's disease, a variety of ataxias, amyotrophic lateral sclerosis and frontotemporal dementia, as well as many other neurodegenerative diseases. Furthermore, tandem-repeat disorders can include fragile X syndrome, related fragile X disorders, as well as other neurological and psychiatric disorders. However, these monogenic tandem-repeat disorders, which were discovered via their dominant or recessive modes of inheritance, may represent the 'tip of the iceberg' with respect to tandem-repeat contributions to human disorders. A previous proposal that tandem repeats may contribute to the 'missing heritability' of various common polygenic human disorders has recently been supported by a variety of new evidence. This includes genome-wide studies that associate tandem-repeat mutations with autism, schizophrenia, Parkinson's disease and various types of cancers. In this article, I will discuss how tandem-repeat mutations and polymorphisms could contribute to a wide range of common disorders, along with some of the many major challenges of tandem-repeat biology and medicine. Finally, I will discuss the potential of tandem repeats to be therapeutically targeted, so as to prevent and treat an expanding range of human disorders.
Affiliation(s)
- Anthony J Hannan
- Florey Institute of Neuroscience and Mental Health, University of Melbourne, Parkville, Victoria 3010, Australia
- Department of Anatomy and Physiology, University of Melbourne, Parkville, Victoria 3010, Australia
28
Mutti G, Oteo-Garcia G, Caldon M, da Silva MJF, Minhós T, Cowlishaw G, Gottelli D, Huchard E, Carter A, Martinez FI, Raveane A, Capelli C. Assessing the recovery of Y chromosome microsatellites with population genomic data using Papio and Theropithecus genomes. Sci Rep 2023; 13:13839. [PMID: 37620368 PMCID: PMC10449864 DOI: 10.1038/s41598-023-40931-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/03/2023] [Accepted: 08/18/2023] [Indexed: 08/26/2023] Open
Abstract
Y chromosome markers can shed light on male-specific population dynamics, but for many species no such markers have yet been discovered or made available, despite the potential for recovering Y-linked loci from available genome sequences. Here, we investigated how effective available bioinformatic tools are in recovering informative Y chromosome microsatellites from whole genome sequence data. To do so, we initially explored a large dataset of whole genome sequences comprising individuals at various coverages belonging to different species of baboons (genus: Papio), using Y chromosome references belonging to the same genus and to a more distantly related species (Macaca mulatta). We then further tested this approach by recovering Y-STRs from available Theropithecus gelada genomes using Papio and Macaca Y chromosomes as reference sequences. Identified loci were validated in silico by a) comparing within-species relationships of Y chromosome lineages and b) genotyping male individuals in available pedigrees. Each STR was selected not to extend in its variable region beyond 100 base pairs, so that loci can be developed for PCR-based genotyping of non-invasive DNA samples. In addition to assembling a first set of Papio and Theropithecus Y-specific microsatellite markers, we released TYpeSTeR, an easy-to-use script to identify and genotype Y chromosome STRs using population genomic data, which can be modulated according to available male reference genomes and genomic data, making it widely applicable across taxa.
Affiliation(s)
- Giacomo Mutti
- Department of Chemistry, Life Sciences and Environmental Sustainability, University of Parma, Parco Area Delle Scienze, 11/a, 43124, Parma, Italy
- Barcelona Supercomputing Centre (BSC-CNS), Plaça Eusebi Güell, 1-3, 08034, Barcelona, Spain
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Baldiri Reixac, 10, 08028, Barcelona, Spain
| | - Gonzalo Oteo-Garcia
- Department of Chemistry, Life Sciences and Environmental Sustainability, University of Parma, Parco Area Delle Scienze, 11/a, 43124, Parma, Italy
| | - Matteo Caldon
- Department of Chemistry, Life Sciences and Environmental Sustainability, University of Parma, Parco Area Delle Scienze, 11/a, 43124, Parma, Italy
| | - Maria Joana Ferreira da Silva
- BIOPOLIS Program in Genomics, Biodiversity and Land Planning, CIBIO, Campus de Vairão, Vairão, Portugal
- Centro de Investigação Em Biodiversidade E Recursos Genéticos, CIBIOInBIO Laboratório AssociadoUniversidade Do Porto, Campus de Vairão, Vairão, Portugal
- ONE ‑ Organisms and Environment Group, School of Biosciences, Cardiff University, Sir Martin Evans Building, Cardiff, UK
| | - Tânia Minhós
- Centre for Research in Anthropology (CRIA-NOVA FCSH), Av. Forças Armadas, Edifício ISCTE, Sala 2w2, 1649-026, Lisboa, Portugal
- Anthropology Department, School of Social Sciences and Humanities, Universidade Nova de Lisboa (NOVA FCSH), Av. de Berna, 26-C, 1069-061, Lisboa, Portugal
| | - Guy Cowlishaw
- Institute of Zoology, Zoological Society of London, Regent's Park, London, NW1 4RY, UK
| | - Dada Gottelli
- Institute of Zoology, Zoological Society of London, Regent's Park, London, NW1 4RY, UK
| | - Elise Huchard
- Institut Des Sciences de L'Evolution, CNRS, Universite de Montpellier, CC 065, 34095, Montpellier 05, France
| | - Alecia Carter
- Department of Anthropology, University College London, 14 Taviton Street, London, WC1H 0BW, UK
| | - Felipe I Martinez
- Escuela de Antropología, Facultad de Ciencias Sociales, Pontificia Universidad Católica de Chile, Santiago, Chile
| | | | - Cristian Capelli
- Department of Chemistry, Life Sciences and Environmental Sustainability, University of Parma, Parco Area Delle Scienze, 11/a, 43124, Parma, Italy.
- Department of Biology, University of Oxford, 11a Mansfield Road, Oxford, OX1 3SZ, UK.
|
29
|
Wang X, Huang M, Budowle B, Ge J. TRcaller: a novel tool for precise and ultrafast tandem repeat variant genotyping in massively parallel sequencing reads. Front Genet 2023; 14:1227176. [PMID: 37533432 PMCID: PMC10390829 DOI: 10.3389/fgene.2023.1227176] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 05/22/2023] [Accepted: 06/13/2023] [Indexed: 08/04/2023]
Abstract
Calling tandem repeat (TR) variants from DNA sequences is of both theoretical and practical significance. Some bioinformatics tools have been developed for detecting or genotyping TRs. However, little work has been done on genotyping TR alleles from long-read sequencing data, and the accuracy of genotyping TR alleles from next-generation sequencing data still needs to be improved. Herein, a novel algorithm is described to retrieve TR regions from sequence alignments, and a software program, TRcaller, has been developed and integrated into a web portal to call TR alleles from both short- and long-read sequences, and from both whole-genome and targeted sequences generated on multiple sequencing platforms. All TR alleles are genotyped as haplotypes and robust alleles are reported, including multiple alleles in a DNA mixture. TRcaller could provide substantially higher accuracy (>99% in 289 human individuals) in detecting TR alleles and run orders of magnitude faster (e.g., ∼2 s for 300x human sequence data) than mainstream software tools. The web portal offers 119 preselected TR loci drawn from forensic, genealogical, and disease-related applications. TRcaller is validated to be scalable in various applications, such as DNA forensics and disease diagnosis, and can be extended to other fields such as breeding programs. Availability: TRcaller is available at https://www.trcaller.com/SignIn.aspx.
Affiliation(s)
- Xuewen Wang
- Center for Human Identification, University of North Texas Health Science Center, Fort Worth, TX, United States
| | - Meng Huang
- Center for Human Identification, University of North Texas Health Science Center, Fort Worth, TX, United States
| | - Bruce Budowle
- Center for Human Identification, University of North Texas Health Science Center, Fort Worth, TX, United States
- Department of Microbiology, Immunology, and Genetics, University of North Texas Health Science Center, Fort Worth, TX, United States
| | - Jianye Ge
- Center for Human Identification, University of North Texas Health Science Center, Fort Worth, TX, United States
- Department of Microbiology, Immunology, and Genetics, University of North Texas Health Science Center, Fort Worth, TX, United States
|
30
|
Kulthammanit N, Sathirapatya T, Sukawutthiya P, Noh H, Vongpaisarnsin K, Wichadakul D. STRategy: A support system for collecting and analyzing next-generation sequencing data of short tandem repeats for forensic science. PLoS One 2023; 18:e0282551. [PMID: 37459339 PMCID: PMC10351723 DOI: 10.1371/journal.pone.0282551] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/17/2023] [Accepted: 05/30/2023] [Indexed: 07/20/2023]
Abstract
Short tandem repeats (STRs) are short repeated sequences commonly found in the human genome and valuable in forensic science, used for human identity and relatedness markers. Next-generation sequencing (NGS) technologies, e.g., ForenSeq Signature Prep, can sequence STRs, inferring length-based alleles and single nucleotide polymorphisms (SNPs) and providing valuable insights into population and sub-population structures. Despite the potential benefits of NGS for STRs, no open-source software platform integrates the collection, management, and analysis of STR data from NGS into one place. Users must use multiple programs to process their STR data and then collect the results into a separate database or a file system folder. Moreover, analyzing repeat structures (STR repeat motifs) may require learning multiple software tools, making the process inefficient and cumbersome. To address this gap, we introduce the STRategy, a standalone web-based application supporting essential STR data management and analysis capabilities. The STRategy allows users to collect their data into its database, automatically calculates forensic parameters, and visualizes the analyzed data in various forms. Users can search the database using different options, such as by profile, loci, and genotypes, with and without a specific test kit. Moreover, users can also find the nucleotide variants of a locus among the samples. We designed the STRategy for internal use in a laboratory or an organization. Hence, our system includes role-based access control that allows users to search for or access specific data based on their responsibilities. The administrator role can customize the system, for example, configure maps according to the samples' geographic data, and manage reference STR repeat motifs. A laboratory or an organization can download and install a copy of STRategy on their local system using Docker, as described in https://github.com/cucpbioinfo/STRategy. 
In summary, the STRategy is an end-to-end system that provides users with a database to collect the analyzed STR data from NGS, the dynamic analyses of forensic parameters, and the variants of STR patterns according to the newly added samples, which are then explorable via various search options and visualizations. The system is helpful for both forensic investigations and forensic genetics.
Affiliation(s)
- Nuttachai Kulthammanit
- Department of Computer Engineering, Faculty of Engineering, Chulalongkorn University, Bangkok, Thailand
| | - Tikumphorn Sathirapatya
- Department of Forensic Medicine, Faculty of Medicine, Chulalongkorn University, Bangkok, Thailand
- Forensic Serology and DNA, King Chulalongkorn Memorial Hospital and Thai Red Cross Society, Bangkok, Thailand
| | - Poonyapat Sukawutthiya
- Department of Forensic Medicine, Faculty of Medicine, Chulalongkorn University, Bangkok, Thailand
- Forensic Serology and DNA, King Chulalongkorn Memorial Hospital and Thai Red Cross Society, Bangkok, Thailand
| | - Hasnee Noh
- Department of Forensic Medicine, Faculty of Medicine, Chulalongkorn University, Bangkok, Thailand
- Forensic Serology and DNA, King Chulalongkorn Memorial Hospital and Thai Red Cross Society, Bangkok, Thailand
| | - Kornkiat Vongpaisarnsin
- Department of Forensic Medicine, Faculty of Medicine, Chulalongkorn University, Bangkok, Thailand
- Forensic Serology and DNA, King Chulalongkorn Memorial Hospital and Thai Red Cross Society, Bangkok, Thailand
- Forensic Genetics Research Unit, Ratchadapiseksompotch Fund, Faculty of Medicine, Chulalongkorn University, Bangkok, Thailand
| | - Duangdao Wichadakul
- Department of Computer Engineering, Faculty of Engineering, Chulalongkorn University, Bangkok, Thailand
- Center of Excellence in Systems Biology, Faculty of Medicine, Chulalongkorn University, Bangkok, Thailand
|
31
|
Weisburd B, Tiao G, Rehm HL. Insights from a genome-wide truth set of tandem repeat variation. bioRxiv 2023:2023.05.05.539588. [PMID: 37214979 PMCID: PMC10197592 DOI: 10.1101/2023.05.05.539588] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 05/24/2023]
Abstract
Tools for genotyping tandem repeats (TRs) from short read sequencing data have improved significantly over the past decade. Extensive comparisons of these tools to gold standard diagnostic methods like RP-PCR have confirmed their accuracy for tens to hundreds of well-studied loci. However, a scarcity of high-quality orthogonal truth data limited our ability to measure tool accuracy for the millions of other loci throughout the genome. To address this, we developed a TR truth set based on the Synthetic Diploid Benchmark (SynDip). By identifying the subset of insertions and deletions that represent TR expansions or contractions with motifs between 2 and 50 base pairs, we obtained accurate genotypes for 139,795 pure and 6,845 interrupted repeats in a single diploid sample. Our approach did not require running existing genotyping tools on short read or long read sequencing data and provided an alternative, more accurate view of tandem repeat variation. We applied this truth set to compare the strengths and weaknesses of widely-used tools for genotyping TRs, evaluated the completeness of existing genome-wide TR catalogs, and explored the properties of tandem repeat variation throughout the genome. We found that, without filtering, ExpansionHunter had higher accuracy than GangSTR and HipSTR over a wide range of motifs and allele sizes. Also, when errors in allele size occurred, ExpansionHunter tended to overestimate expansion sizes, while GangSTR tended to underestimate them. Additionally, we saw that widely-used TR catalogs miss between 16% and 41% of variant loci in the truth set. These results suggest that genome-wide analyses would benefit from genotyping a larger set of loci as well as further tool development that builds on the strengths of current algorithms. 
To that end, we developed a new catalog of 2.8 million loci that captures 95% of variant loci in the truth set, and created a modified version of ExpansionHunter that runs 2 to 3x faster than the original while producing the same output.
Affiliation(s)
- Ben Weisburd
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
| | - Grace Tiao
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
| | - Heidi L. Rehm
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
|
32
|
Plavskin Y, de Biase MS, Schwarz RF, Siegal ML. The rate of spontaneous mutations in yeast deficient for MutSβ function. G3 (Bethesda) 2023; 13:6931805. [PMID: 36529906 PMCID: PMC9997558 DOI: 10.1093/g3journal/jkac330] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2022] [Revised: 08/25/2022] [Accepted: 11/30/2022] [Indexed: 12/23/2022]
Abstract
Mutations in simple sequence repeat loci underlie many inherited disorders in humans, and are increasingly recognized as important determinants of natural phenotypic variation. In eukaryotes, mutations in these sequences are primarily repaired by the MutSβ mismatch repair complex. To better understand the role of this complex in mismatch repair and the determinants of simple sequence repeat mutation predisposition, we performed mutation accumulation in yeast strains with abrogated MutSβ function. We demonstrate that mutations in simple sequence repeat loci in the absence of mismatch repair are primarily deletions. We also show that mutations accumulate at drastically different rates in short (<8 bp) and longer repeat loci. These data lend support to a model in which the mismatch repair complex is responsible for repair primarily in longer simple sequence repeats.
Affiliation(s)
- Yevgeniy Plavskin
- Center for Genomics and Systems Biology, New York University, New York 10003, USA
- Department of Biology, New York University, New York 10003, USA
| | - Maria Stella de Biase
- Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, Berlin 10115, Germany
- Department of Biology, Humboldt-Universität zu Berlin, Berlin 10099, Germany
| | - Roland F Schwarz
- Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, Berlin 10115, Germany
- Institute for Computational Cancer Biology, Center for Integrated Oncology (CIO), Cancer Research Center Cologne Essen (CCCE), Faculty of Medicine and University Hospital Cologne, University of Cologne, Cologne 50937, Germany
- Berlin Institute for the Foundations of Learning and Data (BIFOLD), Berlin 10623, Germany
| | - Mark L Siegal
- Center for Genomics and Systems Biology, New York University, New York 10003, USA
- Department of Biology, New York University, New York 10003, USA
|
33
|
Microsatellite Genome-Wide Database Development for the Commercial Blackhead Seabream (Acanthopagrus schlegelii). Genes (Basel) 2023; 14:620. [PMID: 36980892 PMCID: PMC10048070 DOI: 10.3390/genes14030620] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2023] [Revised: 02/26/2023] [Accepted: 02/27/2023] [Indexed: 03/05/2023]
Abstract
Simple sequence repeats (SSRs), the markers with the highest polymorphism and co-dominance degrees, offer a crucial genetic research resource. Limited SSR markers in blackhead seabream have been reported. The availability of the blackhead seabream genome assembly provided the opportunity to carry out genome-wide identification for all microsatellite markers, and bioinformatic analyses open the way for developing a microsatellite genome-wide database in blackhead seabream. In this study, a total of 412,381 SSRs were identified in the 688.08 Mb genome by Krait software. Whole-genome sequences (10×) of 42 samples were aligned against the reference genome and genotyped using the HipSTR tools by comparing and counting repeat number variation across the SSR loci. A total of 156,086 SSRs with a 2–4 bp repeat were genotyped by HipSTR tools, which accounted for 55.78% of the 2–4 bp SSRs in the reference genome. High accuracy of genotyping was observed by comparing HipSTR tools and PCR amplification. A set of 109,131 loci with a number of alleles ≥ 3 and with a number of genotyped individuals ≥ 6 were reserved to constitute the polymorphic SSR database. Fifty-one polymorphic SSR loci were identified through PCR amplification. This strategy to develop polymorphic SSR markers not only obtained a large set of polymorphic SSRs but also eliminated the need for laborious experimental screening. SSR markers developed in this study may facilitate blackhead seabream research, which lays a certain foundation for further gene tagging and genetic linkage analysis, such as marker-assisted selection, genetic mapping, as well as comparative genomic analysis.
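The counts quoted in this abstract imply two figures it does not state directly: the SSR density of the assembly and the approximate number of 2–4 bp SSRs in the reference genome. The following is a minimal sketch of that arithmetic, using only numbers from the abstract; the variable names are illustrative and the derived values are back-of-envelope estimates, not results reported by the authors.

```python
# Figures quoted in the abstract.
total_ssrs = 412_381         # SSRs identified by Krait in the assembly
genome_size_mb = 688.08      # assembly size in Mb
genotyped_2_4bp = 156_086    # 2-4 bp SSRs genotyped with HipSTR
genotyped_fraction = 0.5578  # share of the reference 2-4 bp SSRs that were genotyped

# Derived quantities (computed here for orientation only).
ssr_density_per_mb = total_ssrs / genome_size_mb
implied_2_4bp_in_reference = genotyped_2_4bp / genotyped_fraction

print(f"SSR density: {ssr_density_per_mb:.1f} per Mb")
print(f"Implied 2-4 bp SSRs in reference: {implied_2_4bp_in_reference:,.0f}")
```

On these inputs the density comes out at just under 600 SSRs per Mb, and the reference would contain roughly 280,000 SSRs with a 2–4 bp motif.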
|
34
|
Dashnow H, Pedersen BS, Hiatt L, Brown J, Beecroft SJ, Ravenscroft G, LaCroix AJ, Lamont P, Roxburgh RH, Rodrigues MJ, Davis M, Mefford HC, Laing NG, Quinlan AR. STRling: a k-mer counting approach that detects short tandem repeat expansions at known and novel loci. Genome Biol 2022; 23:257. [PMID: 36517892 PMCID: PMC9753380 DOI: 10.1186/s13059-022-02826-4] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Received: 11/29/2021] [Accepted: 11/30/2022] [Indexed: 12/23/2022]
Abstract
Expansions of short tandem repeats (STRs) cause many rare diseases. Expansion detection is challenging with short-read DNA sequencing data since supporting reads are often mapped incorrectly. Detection is particularly difficult for "novel" STRs, which include new motifs at known loci or STRs absent from the reference genome. We developed STRling to efficiently count k-mers to recover informative reads and call expansions at known and novel STR loci. STRling is sensitive to known STR disease loci, has a low false discovery rate, and resolves novel STR expansions to base-pair position accuracy. It is fast, scalable, open-source, and available at: github.com/quinlan-lab/STRling .
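The general idea of using k-mer counts to flag reads that are dominated by a tandem repeat can be illustrated with a toy example. The sketch below is not STRling's implementation; it is a simplified illustration under stated assumptions: the function names `canonical_rotation` and `dominant_motif` are hypothetical, reverse complements are ignored, and no alignment or locus assignment is attempted.

```python
from collections import Counter


def canonical_rotation(kmer: str) -> str:
    """Collapse rotations of a motif (CAG, AGC, GCA) onto one representative."""
    return min(kmer[i:] + kmer[:i] for i in range(len(kmer)))


def dominant_motif(read: str, k: int) -> tuple[str, float]:
    """Return the most frequent rotation-collapsed k-mer in the read and the
    fraction of the read's k-mer windows that it accounts for."""
    windows = len(read) - k + 1
    if k <= 0 or windows <= 0:
        return "", 0.0
    counts = Counter(canonical_rotation(read[i:i + k]) for i in range(windows))
    motif, count = counts.most_common(1)[0]
    return motif, count / windows


# A pure CAG repeat is dominated by a single rotation-collapsed motif ...
print(dominant_motif("CAG" * 10, 3))
# ... whereas a non-repetitive read spreads its windows over many motifs.
print(dominant_motif("ACGTTGCAAGCT", 3))
```

A read whose dominant-motif fraction is high is a candidate repeat-containing read even if it mapped poorly, which is the kind of signal a k-mer approach can recover without relying on the aligner. Any threshold on that fraction would be a tuning choice and is not taken from the paper.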
Affiliation(s)
- Harriet Dashnow
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
| | - Brent S Pedersen
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
- Utrecht University Medical Center, Utrecht, The Netherlands
| | - Laurel Hiatt
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
| | - Joe Brown
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
| | - Sarah J Beecroft
- Pawsey Supercomputing Research Centre, Kensington, WA, Australia
- Harry Perkins Institute of Medical Research and Centre for Medical Research, University of Western Australia, Perth, WA, Australia
| | - Gianina Ravenscroft
- Harry Perkins Institute of Medical Research and Centre for Medical Research, University of Western Australia, Perth, WA, Australia
| | - Amy J LaCroix
- Department of Pediatrics, Division of Genetic Medicine, University of Washington, Seattle, WA, 98195, USA
| | - Phillipa Lamont
- Neurogenetic Unit, Royal Perth Hospital, Perth, WA, Australia
| | | | - Miriam J Rodrigues
- Neurology, Auckland City Hospital, Auckland, New Zealand
- Centre for Brain Research, University of Auckland, Auckland, New Zealand
| | - Mark Davis
- Neurogenetics Unit, Department of Diagnostic Genomics, PathWest Laboratory Medicine, Western Australian Department of Health, Nedlands, Australia
| | - Heather C Mefford
- Department of Pediatrics, Division of Genetic Medicine, University of Washington, Seattle, WA, 98195, USA
| | - Nigel G Laing
- Harry Perkins Institute of Medical Research and Centre for Medical Research, University of Western Australia, Perth, WA, Australia
- Neurogenetics Unit, Department of Diagnostic Genomics, PathWest Laboratory Medicine, Western Australian Department of Health, Nedlands, Australia
| | - Aaron R Quinlan
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA.
|
35
|
Steely CJ, Watkins WS, Baird L, Jorde LB. The mutational dynamics of short tandem repeats in large, multigenerational families. Genome Biol 2022; 23:253. [PMID: 36510265 PMCID: PMC9743774 DOI: 10.1186/s13059-022-02818-4] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 03/30/2022] [Accepted: 11/17/2022] [Indexed: 12/14/2022]
Abstract
BACKGROUND Short tandem repeats (STRs) compose approximately 3% of the genome, and mutations at STR loci have been linked to dozens of human diseases including amyotrophic lateral sclerosis, Friedreich ataxia, Huntington disease, and fragile X syndrome. Improving our understanding of these mutations would increase our knowledge of the mutational dynamics of the genome and may uncover additional loci that contribute to disease. To estimate the genome-wide pattern of mutations at STR loci, we analyze blood-derived whole-genome sequencing data for 544 individuals from 29 three-generation CEPH pedigrees. These pedigrees contain both sets of grandparents, the parents, and an average of 9 grandchildren per family. RESULTS We use HipSTR to identify de novo STR mutations in the 2nd generation of these pedigrees and require transmission to the third generation for validation. Analyzing approximately 1.6 million STR loci, we estimate the empirical de novo STR mutation rate to be 5.24 × 10⁻⁵ mutations per locus per generation. Perfect repeats mutate about 2× more often than imperfect repeats. De novo STRs are significantly enriched in Alu elements. CONCLUSIONS Approximately 30% of new STR mutations occur within Alu elements, which compose only 11% of the genome, but only 10% are found in LINE-1 insertions, which compose 17% of the genome. Phasing these mutations to the parent of origin shows that parental transmission biases vary among families. We estimate the average number of de novo genome-wide STR mutations per individual to be approximately 85, which is similar to the average number of observed de novo single nucleotide variants.
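The headline numbers in this abstract can be cross-checked with simple arithmetic. The sketch below multiplies the number of loci by the per-locus rate and computes the fold enrichment of new mutations in Alu and LINE-1 sequence relative to each element's share of the genome. It assumes the rate applies per locus per individual per generation; the variable names are illustrative, and the inputs are the rounded values quoted in the abstract, so small discrepancies from the reported figures are expected.

```python
# Rounded inputs quoted in the abstract.
loci_analyzed = 1.6e6     # "approximately 1.6 million STR loci"
rate_per_locus = 5.24e-5  # de novo mutations per locus per generation

# Expected de novo STR mutations per individual across the analyzed loci.
expected_per_individual = loci_analyzed * rate_per_locus

# Fold enrichment = share of new mutations / share of the genome.
alu_enrichment = 0.30 / 0.11    # 30% of mutations in 11% of the genome
line1_enrichment = 0.10 / 0.17  # 10% of mutations in 17% of the genome

print(f"Expected de novo STR mutations: {expected_per_individual:.1f}")
print(f"Alu fold enrichment: {alu_enrichment:.2f}")
print(f"LINE-1 fold enrichment: {line1_enrichment:.2f}")
```

The product comes to about 84, in line with the reported estimate of approximately 85 given the rounding of the inputs. New mutations are roughly 2.7-fold over-represented in Alu elements and under-represented (about 0.6-fold) in LINE-1 insertions.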
Affiliation(s)
- Cody J. Steely
- Department of Human Genetics, University of Utah School of Medicine, Salt Lake City, UT 84112 USA
| | - W. Scott Watkins
- Department of Human Genetics, University of Utah School of Medicine, Salt Lake City, UT 84112 USA
| | - Lisa Baird
- Department of Human Genetics, University of Utah School of Medicine, Salt Lake City, UT 84112 USA
| | - Lynn B. Jorde
- Department of Human Genetics, University of Utah School of Medicine, Salt Lake City, UT 84112 USA
|
36
|
Frontanilla TS, Valle-Silva G, Ayala J, Mendes-Junior CT. Open-Access Worldwide Population STR Database Constructed Using High-Coverage Massively Parallel Sequencing Data Obtained from the 1000 Genomes Project. Genes (Basel) 2022; 13:2205. [PMID: 36553472 PMCID: PMC9778533 DOI: 10.3390/genes13122205] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/15/2022] [Revised: 11/13/2022] [Accepted: 11/21/2022] [Indexed: 11/27/2022]
Abstract
Achieving accurate STR genotyping by using next-generation sequencing data has been challenging. To provide the forensic genetics community with a reliable open-access STR database, we conducted a comprehensive genotyping analysis of a set of STRs of broad forensic interest obtained from 1000 Genomes populations. We analyzed 22 STR markers using files of the high-coverage dataset of Phase 3 of the 1000 Genomes Project. We used HipSTR to call genotypes from 2504 samples obtained from 26 populations. We were not able to detect the D21S11 marker. The Hardy-Weinberg equilibrium analysis coupled with a comprehensive analysis of allele frequencies revealed that HipSTR was not able to identify longer alleles, which resulted in heterozygote deficiency. Nevertheless, AMOVA, a clustering analysis that uses STRUCTURE, and a Principal Coordinates Analysis showed a clear-cut separation between the four major ancestries sampled by the 1000 Genomes Consortium. Except for larger Penta D and Penta E alleles, and two very small Penta D alleles (2.2 and 3.2) usually observed in African populations, our analyses revealed that allele frequencies and genotypes offered as an open-access database are consistent and reliable.
Affiliation(s)
- Tamara Soledad Frontanilla
- Departamento de Genética, Faculdade de Medicina de Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto 14049-900, SP, Brazil
| | - Guilherme Valle-Silva
- Departamento de Química, Laboratório de Pesquisas Forenses e Genômicas, Faculdade de Filosofia, Ciências e Letras de Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto 14040-901, SP, Brazil
| | - Jesus Ayala
- Facultad de Ingeniería Informática, Universidad de la Integración de las Americas, Asunción 00120-6, Paraguay
| | - Celso Teixeira Mendes-Junior
- Departamento de Química, Laboratório de Pesquisas Forenses e Genômicas, Faculdade de Filosofia, Ciências e Letras de Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto 14040-901, SP, Brazil
|
37
|
Assessing Sequence Variation and Genetic Diversity of Currently Untapped Y-STR Loci. Forensic Science International: Reports 2022. [DOI: 10.1016/j.fsir.2022.100298] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
|
38
|
Comparative analysis of microsatellites in coding regions provides insights into the adaptability of the giant panda, polar bear and brown bear. Genetica 2022; 150:355-366. [DOI: 10.1007/s10709-022-00173-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2021] [Accepted: 09/13/2022] [Indexed: 11/27/2022]
|
39
|
Wang Z, Moffitt AB, Andrews P, Wigler M, Levy D. Accurate measurement of microsatellite length by disrupting its tandem repeat structure. Nucleic Acids Res 2022; 50:e116. [PMID: 36095132 PMCID: PMC9723644 DOI: 10.1093/nar/gkac723] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/26/2022] [Revised: 08/03/2022] [Accepted: 08/15/2022] [Indexed: 12/24/2022]
Abstract
Tandem repeats of simple sequence motifs, also known as microsatellites, are abundant in the genome. Because their repeat structure makes replication error-prone, variant microsatellite lengths are often generated during germline and other somatic expansions. As such, microsatellite length variations can serve as markers for cancer. However, accurate error-free measurement of microsatellite lengths is difficult with current methods precisely because of this high error rate during amplification. We have solved this problem by using partial mutagenesis to disrupt enough of the repeat structure of initial templates so that their sequence lengths replicate faithfully. In this work, we use bisulfite mutagenesis to convert a C to a U, later read as T. Compared to untreated templates, we achieve three orders of magnitude reduction in the error rate per round of replication. By requiring agreement from two independent first copies of an initial template, we reach error rates below one in a million. We apply this method to a thousand microsatellite loci from the human genome, revealing microsatellite length distributions not observable without mutagenesis.
Affiliation(s)
| | | | - Peter Andrews
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
| | | | - Dan Levy
- To whom correspondence should be addressed. Tel: +1 516 367 5039; Fax: +1 516 367 8381;
|
40
|
Quinones-Valdez G, Fu T, Chan TW, Xiao X. scAllele: A versatile tool for the detection and analysis of variants in scRNA-seq. Sci Adv 2022; 8:eabn6398. [PMID: 36054357 PMCID: PMC11636672 DOI: 10.1126/sciadv.abn6398] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/10/2021] [Accepted: 07/19/2022] [Indexed: 05/12/2023]
Abstract
Single-cell RNA sequencing (scRNA-seq) data contain rich information at the gene, transcript, and nucleotide levels. Most analyses of scRNA-seq have focused on gene expression profiles, and it remains challenging to extract nucleotide variants and isoform-specific information. Here, we present scAllele, an integrative approach that detects single-nucleotide variants, insertions, deletions, and their allelic linkage with splicing patterns in scRNA-seq. We demonstrate that scAllele achieves better performance in identifying nucleotide variants than other commonly used tools. In addition, the read-specific variant calls made by scAllele enable allele-specific splicing analysis, a unique feature not afforded by other methods. Applied to a lung cancer scRNA-seq dataset, scAllele identified variants with strong allelic linkage to alternative splicing, some of which are cancer specific and enriched in cancer-relevant pathways. scAllele represents a versatile tool to uncover multilayer information and previously unidentified biological insights from scRNA-seq data.
Affiliation(s)
| | - Ting Fu
- Molecular, Cellular, and Integrative Physiology Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Tracey W. Chan
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Xinshu Xiao
- Department of Bioengineering, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Molecular, Cellular, and Integrative Physiology Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Integrative Biology and Physiology, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Molecular Biology Institute, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Institute for Quantitative and Computational Biosciences, University of California, Los Angeles, Los Angeles, CA 90095, USA
| |
Collapse
|
41
|
Fang L, Liu Q, Monteys AM, Gonzalez-Alegre P, Davidson BL, Wang K. DeepRepeat: direct quantification of short tandem repeats on signal data from nanopore sequencing. Genome Biol 2022; 23:108. [PMID: 35484600 PMCID: PMC9052667 DOI: 10.1186/s13059-022-02670-6] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 10/04/2021] [Accepted: 04/08/2022] [Indexed: 12/12/2022]
Abstract
Despite recent improvements in basecalling accuracy, nanopore sequencing still has higher error rates on short tandem repeats (STRs). Instead of using basecalled reads, we developed DeepRepeat, which converts ionic current signals into red-green-blue channels, thus transforming the repeat detection problem into an image recognition problem. DeepRepeat identifies and accurately quantifies telomeric repeats in the CHM13 cell line and achieves higher accuracy in quantifying repeats in long STRs than competing methods. We also evaluate DeepRepeat on genome-wide or candidate region datasets from seven different sources. In summary, DeepRepeat enables accurate quantification of long STRs and complements existing methods relying on basecalled reads.
Affiliation(s)
- Li Fang
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
- Qian Liu
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
  - School of Life Sciences, College of Science, University of Nevada, Las Vegas, 4505 S Maryland Pkwy, Las Vegas, NV, 89154, USA
  - Nevada Institute of Personalized Medicine, College of Science, University of Nevada, Las Vegas, 4505 S Maryland Pkwy, Las Vegas, NV, 89154, USA
- Alex Mas Monteys
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
- Pedro Gonzalez-Alegre
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
- Beverly L Davidson
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
  - Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Kai Wang
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
  - Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA

42
Wang H, Gao S, Liu Y, Wang P, Zhang Z, Chen D. A pipeline for effectively developing highly polymorphic simple sequence repeats markers based on multi-sample genomic data. Ecol Evol 2022; 12:e8705. [PMID: 35342577 PMCID: PMC8928897 DOI: 10.1002/ece3.8705] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/09/2021] [Revised: 01/25/2022] [Accepted: 02/15/2022] [Indexed: 01/24/2023]
Abstract
Simple sequence repeats (SSRs) are widely used genetic markers in ecology, evolution, and conservation even in the genomics era, but a general limitation to their application is the difficulty of developing polymorphic SSR markers. Next-generation sequencing (NGS) offers the opportunity for the rapid development of SSRs; however, previous studies that developed SSRs using genomic data from only one individual needed redundant experiments to test the polymorphisms of SSRs. In this study, we designed a pipeline for the rapid development of polymorphic SSR markers from multi-sample genomic data. We used bioinformatic software to genotype multiple individuals from resequencing data and detected highly polymorphic SSRs prior to experimental validation, which significantly improved efficiency and reduced experimental effort. The pipeline was successfully applied to a globally threatened species, the brown eared-pheasant (Crossoptilon mantchuricum), which showed very low genomic diversity. The 20 newly developed SSR markers were highly polymorphic, with an average number of alleles much higher than the genomic average. We also evaluated the effect of the number of individuals and sequencing depth on the SSR mining results, and we found that 10 individuals and ~10X sequencing data were enough to obtain a sufficient number of polymorphic SSRs, even for species with low genetic diversity. Furthermore, the genome assembly of NGS data from the optimal number of individuals and sequencing depth can be used as an alternative reference genome if a high-quality genome is not available. Our pipeline provides a paradigm for the application of NGS technology to mining and developing molecular markers for ecological and evolutionary studies.
Affiliation(s)
- Hui Wang
  - MOE Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, Beijing, China
- Shenghan Gao
  - State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, Beijing, China
- Yu Liu
  - MOE Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, Beijing, China
- Pengcheng Wang
  - Jiangsu Key Laboratory for Biodiversity and Biotechnology, College of Life Sciences, Nanjing Normal University, Nanjing, China
- Zhengwang Zhang
  - MOE Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, Beijing, China
- De Chen
  - MOE Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, Beijing, China

43
Analysis and comparison of the STR genotypes called with HipSTR, STRait Razor and toaSTR by using next generation sequencing data in a Brazilian population sample. Forensic Sci Int Genet 2022; 58:102676. [DOI: 10.1016/j.fsigen.2022.102676] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 04/25/2021] [Revised: 01/18/2022] [Accepted: 02/02/2022] [Indexed: 12/31/2022]

44
Chen J, Li F, Wang M, Li J, Marquez-Lago TT, Leier A, Revote J, Li S, Liu Q, Song J. BigFiRSt: A Software Program Using Big Data Technique for Mining Simple Sequence Repeats From Large-Scale Sequencing Data. Front Big Data 2022; 4:727216. [PMID: 35118375 PMCID: PMC8805145 DOI: 10.3389/fdata.2021.727216] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/18/2021] [Accepted: 12/13/2021] [Indexed: 11/22/2022]
Abstract
Background: Simple sequence repeats (SSRs) are short tandem repeats of nucleotide sequences. SSRs are associated with human diseases and are of medical relevance. Accordingly, a variety of computational methods have been proposed to mine SSRs from genomes. Conventional methods rely on a high-quality complete genome to identify SSRs. However, sequenced genomes often miss several highly repetitive regions, and many non-model species lack a complete genome. With recent advances in next-generation sequencing (NGS), large-scale sequence reads can be rapidly generated for any species. In this context, a number of methods have been proposed to identify thousands of SSR loci within large amounts of reads for non-model species. Because the most commonly used NGS platforms (e.g., Illumina) generally provide short paired-end reads, merging overlapping paired-end reads has become a common step prior to the identification of SSR loci. Merging short read pairs and identifying SSRs from large-scale data poses a big data analysis challenge for traditional stand-alone tools.
Results: In this study, we present a new Hadoop-based software program, termed BigFiRSt, to address this problem using big data technology. BigFiRSt consists of two major modules, BigFLASH and BigPERF, implemented based on two state-of-the-art stand-alone tools, FLASH and PERF, respectively. BigFLASH merges short read pairs and BigPERF mines SSRs, each in a big data manner. Comprehensive benchmarking experiments show that BigFiRSt can dramatically reduce the execution times of read pair merging and SSR mining from very large-scale DNA sequence data.
Conclusions: The performance of BigFiRSt stems mainly from using Hadoop to merge read pairs and mine SSRs in parallel through distributed computing on clusters. We anticipate BigFiRSt will be a valuable tool in the coming biological Big Data era.
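To make "mining SSRs" from a read concrete, here is a minimal single-sequence sketch using a regular expression with a backreference. It is not the algorithm used by PERF or BigPERF and has none of the Hadoop parallelism described above. The function name, the thresholds, and the example read are hypothetical.

```python
import re


def find_perfect_ssrs(seq, max_motif_len=6, min_repeats=3):
    """Return (start, motif, copies) for perfect tandem repeats in seq.

    Illustrative only: the lazy quantifier makes the regex try the shortest
    motif first, so (AC)4 is reported with motif "AC" rather than "ACAC".
    """
    pattern = re.compile(
        r"([ACGT]{1,%d}?)\1{%d,}" % (max_motif_len, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq):
        motif = match.group(1)
        copies = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, copies))
    return hits


# Hypothetical merged read containing one dinucleotide repeat.
read = "TTGACACACACGTA"
for start, motif, copies in find_perfect_ssrs(read):
    print(f"{motif} x {copies} at position {start}")
```

A scan like this is independent for each read, which is the property that lets a framework such as Hadoop distribute the work across a cluster.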
Affiliation(s)
- Jinxiang Chen
  - Department of Software Engineering, College of Information Engineering, Northwest A&F University, Yangling, China
- Fuyi Li
  - Department of Biochemistry and Molecular Biology, Biomedicine Discovery Institute, Monash University, Melbourne, VIC, Australia
  - Monash Centre for Data Science, Monash University, Melbourne, VIC, Australia
  - Department of Microbiology and Immunity, The Peter Doherty Institute for Infection and Immunity, The University of Melbourne, Melbourne, VIC, Australia
- Miao Wang
  - Department of Software Engineering, College of Information Engineering, Northwest A&F University, Yangling, China
- Junlong Li
  - Department of Software Engineering, College of Information Engineering, Northwest A&F University, Yangling, China
- Tatiana T. Marquez-Lago
  - Department of Genetics, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, United States
  - Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, United States
- André Leier
  - Department of Genetics, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, United States
  - Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, United States
- Jerico Revote
  - Department of Biochemistry and Molecular Biology, Biomedicine Discovery Institute, Monash University, Melbourne, VIC, Australia
- Shuqin Li
  - Department of Software Engineering, College of Information Engineering, Northwest A&F University, Yangling, China
- Quanzhong Liu
  - Department of Software Engineering, College of Information Engineering, Northwest A&F University, Yangling, China
- Jiangning Song
  - Department of Biochemistry and Molecular Biology, Biomedicine Discovery Institute, Monash University, Melbourne, VIC, Australia
  - Monash Centre for Data Science, Monash University, Melbourne, VIC, Australia
  - *Correspondence: Jiangning Song

45
Han J, Munro JE, Kocoski A, Barry AE, Bahlo M. Population-level genome-wide STR discovery and validation for population structure and genetic diversity assessment of Plasmodium species. PLoS Genet 2022; 18:e1009604. [PMID: 35007277 PMCID: PMC8782505 DOI: 10.1371/journal.pgen.1009604] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 05/20/2021] [Revised: 01/21/2022] [Accepted: 12/14/2021] [Indexed: 11/18/2022]
Abstract
Short tandem repeats (STRs) are highly informative genetic markers that have been used extensively in population genetics analysis. They are an important source of genetic diversity and can also have functional impact. Despite the availability of bioinformatic methods that permit large-scale genome-wide genotyping of STRs from whole genome sequencing data, they have not previously been applied to sequencing data from large collections of malaria parasite field samples. Here, we have genotyped STRs using HipSTR in published whole-genome sequence data from more than 3,000 Plasmodium falciparum and 174 Plasmodium vivax samples collected across the globe. High levels of noise and variability in the resultant callset necessitated the development of a novel method for quality control of STR genotype calls. A set of high-quality STR loci (6,768 from P. falciparum and 3,496 from P. vivax) was used to study Plasmodium genetic diversity, population structures and genomic signatures of selection, and these were compared to genome-wide single nucleotide polymorphism (SNP) genotyping data. In addition, genome-wide information about genetic variation and other characteristics of STRs in P. falciparum and P. vivax has been made available in an interactive web-based R Shiny application, PlasmoSTR (https://github.com/bahlolab/PlasmoSTR).
Affiliation(s)
- Jiru Han
  - Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
  - Department of Medical Biology, The University of Melbourne, Melbourne, Australia
- Jacob E. Munro
  - Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
  - Department of Medical Biology, The University of Melbourne, Melbourne, Australia
- Anthony Kocoski
  - Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
  - Department of Mathematics and Statistics, The University of Melbourne, Melbourne, Australia
- Alyssa E. Barry
  - Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
  - Department of Medical Biology, The University of Melbourne, Melbourne, Australia
  - Disease Elimination Program, Burnet Institute, Melbourne, Australia
  - IMPACT Institute for Innovation in Mental and Physical Health and Clinical Translation, Deakin University, Geelong, Australia
- Melanie Bahlo
  - Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
  - Department of Medical Biology, The University of Melbourne, Melbourne, Australia

46
Gall-Duncan T, Sato N, Yuen RKC, Pearson CE. Advancing genomic technologies and clinical awareness accelerates discovery of disease-associated tandem repeat sequences. Genome Res 2022; 32:1-27. [PMID: 34965938 PMCID: PMC8744678 DOI: 10.1101/gr.269530.120] [Citation(s) in RCA: 52] [Impact Index Per Article: 17.3] [Received: 07/29/2020] [Accepted: 11/29/2021] [Indexed: 11/25/2022]
Abstract
Expansions of gene-specific DNA tandem repeats (TRs), first described in 1991 as a disease-causing mutation in humans, are now known to cause >60 phenotypes, not just disease, and not only in humans. TRs are a common form of genetic variation with biological consequences, observed, so far, in humans, dogs, plants, oysters, and yeast. Repeat diseases show atypical clinical features, genetic anticipation, and multiple and partially penetrant phenotypes among family members. Discovery of disease-causing repeat expansion loci accelerated through technological advances in DNA sequencing and computational analyses. Between 2019 and 2021, 17 new disease-causing TR expansions were reported, totaling 63 TR loci (>69 diseases), with a likelihood of more discoveries, and in more organisms. Recent and historical lessons reveal that properly assessed clinical presentations, coupled with genetic and biological awareness, can guide discovery of disease-causing unstable TRs. We highlight critical but underrecognized aspects of TR mutations. Repeat motifs may not be present in current reference genomes but will be in forthcoming gapless long-read references. Repeat motif size can be a single nucleotide to kilobases/unit. At a given locus, repeat motif sequence purity can vary with consequence. Pathogenic repeats can be "insertions" within nonpathogenic TRs. Expansions, contractions, and somatic length variations of TRs can have clinical/biological consequences. TR instabilities occur in humans and other organisms. TRs can be epigenetically modified and/or chromosomal fragile sites. We discuss the expanding field of disease-associated TR instabilities, highlighting prospects, clinical and genetic clues, tools, and challenges for further discoveries of disease-causing TR instabilities and understanding their biological and pathological impacts-a vista that is about to expand.
Affiliation(s)
- Terence Gall-Duncan
  - Program of Genetics and Genome Biology, The Hospital for Sick Children, Toronto, Ontario M5G 1L7, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 1A8, Canada
- Nozomu Sato
  - Program of Genetics and Genome Biology, The Hospital for Sick Children, Toronto, Ontario M5G 1L7, Canada
- Ryan K C Yuen
  - Program of Genetics and Genome Biology, The Hospital for Sick Children, Toronto, Ontario M5G 1L7, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 1A8, Canada
- Christopher E Pearson
  - Program of Genetics and Genome Biology, The Hospital for Sick Children, Toronto, Ontario M5G 1L7, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 1A8, Canada

47
Rajabi F, Jabalameli N, Rezaei N. The Concept of Immunogenetics. Adv Exp Med Biol 2022; 1367:1-17. [DOI: 10.1007/978-3-030-92616-8_1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]

48
Schröder C, Horsthemke B, Depienne C. GC-rich repeat expansions: associated disorders and mechanisms. Med Genet-Berlin 2021; 33:325-335. [PMID: 38835438 PMCID: PMC11006399 DOI: 10.1515/medgen-2021-2099] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/29/2021] [Accepted: 11/12/2021] [Indexed: 06/06/2024]
Abstract
Noncoding repeat expansions are a well-known cause of genetic disorders mainly affecting the central nervous system. Missed by most standard technologies used in routine diagnosis, pathogenic noncoding repeat expansions have to be searched for using specific techniques such as repeat-primed PCR or specific bioinformatics tools applied to genome data, such as ExpansionHunter. In this review, we focus on GC-rich repeat expansions, which represent at least one third of all noncoding repeat expansions described so far. GC-rich expansions are mainly located in regulatory regions (promoter, 5' untranslated region, first intron) of genes and can lead to either a toxic gain-of-function mediated by RNA toxicity and/or repeat-associated non-AUG (RAN) translation, or a loss-of-function of the associated gene, depending on their size and their methylation status. We herein review the clinical and molecular characteristics of disorders associated with these difficult-to-detect expansions.
Affiliation(s)
- Christopher Schröder
  - Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen, Essen, Germany
- Bernhard Horsthemke
  - Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen, Essen, Germany
- Christel Depienne
  - Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen, Essen, Germany

49
Yang Q, Qian J, Shao C, Yao Y, Zhou Z, Xu H, Tang Q, Qian X, Xie J. Identification and Characterization of Nine Novel X-Chromosomal Short Tandem Repeats on Xp21.1, Xq21.31, and Xq23 Regions. Front Genet 2021; 12:784605. [PMID: 34868274 PMCID: PMC8635773 DOI: 10.3389/fgene.2021.784605] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/28/2021] [Accepted: 10/20/2021] [Indexed: 11/13/2022]
Abstract
The application of X-chromosomal short tandem repeats (X-STRs) has been recognized as a powerful tool in complex kinship testing. To support further development of X-STR analysis in forensic use, we identified nine novel X-STRs, which could be clustered into three linkage groups (LGs) on Xp21.1, Xq21.31, and Xq23. A multiplex PCR system was built based on the electrophoresis. A total of 198 unrelated Shanghai Han samples along with 168 samples from 43 families were collected to investigate the genetic polymorphism and forensic parameters of the nine loci. Allele numbers ranged from 5 to 12, and amplicon sizes ranged from 146 to 477 bp. The multiplex showed high values for the combined power of discrimination (0.99997977 in males and 0.99999999 in females) and combined mean exclusion chances (0.99997918 and 0.99997821 in trios, 0.99984939 in duos, and 0.99984200 in deficiency cases). The linkage between all pairs of loci was estimated via the Kosambi mapping function and a linkage disequilibrium test, and further investigated through the family study. The data from 43 families strongly demonstrated independent transmission between LGs and tight linkage among loci within the same LG. All these results support that the newly described X-STRs and the multiplex system are highly promising for further forensic use.
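Two of the quantities named in this abstract have standard textbook forms, sketched below under stated assumptions. The Kosambi mapping function converts a recombination fraction r into a map distance, d = 25 ln((1 + 2r) / (1 - 2r)) centimorgans. The combined power of discrimination is computed here as 1 minus the product of (1 - PD) over loci, which assumes independent loci. Because the abstract reports tight linkage within each linkage group, the authors may have combined values per linkage group or per haplotype instead. The function names and all input values are hypothetical.

```python
import math


def kosambi_distance_cm(r):
    """Map distance in centimorgans for a recombination fraction r (0 <= r < 0.5)."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must satisfy 0 <= r < 0.5")
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


def combined_power_of_discrimination(pds):
    """Combine per-locus PD values, assuming the loci are independent."""
    prob_no_discrimination = 1.0
    for pd in pds:
        prob_no_discrimination *= 1.0 - pd
    return 1.0 - prob_no_discrimination


# Hypothetical inputs, not values from the study.
print(kosambi_distance_cm(0.10))
print(combined_power_of_discrimination([0.80, 0.75, 0.90]))
```

For small r the Kosambi distance is close to 100r cM, and it grows without bound as r approaches 0.5, which is why unlinked loci cannot be assigned a finite distance this way.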
Affiliation(s)
- Qinrui Yang
  - Department of Forensic Medicine, School of Basic Medical Sciences, Fudan University, Shanghai, China
- Jinglei Qian
  - Department of Forensic Medicine, School of Basic Medical Sciences, Fudan University, Shanghai, China
- Chengchen Shao
  - Department of Forensic Medicine, School of Basic Medical Sciences, Fudan University, Shanghai, China
- Yining Yao
  - Department of Forensic Medicine, School of Basic Medical Sciences, Fudan University, Shanghai, China
- Zhihan Zhou
  - Department of Forensic Medicine, School of Basic Medical Sciences, Fudan University, Shanghai, China
- Hongmei Xu
  - Department of Forensic Medicine, School of Basic Medical Sciences, Fudan University, Shanghai, China
- Qiqun Tang
  - Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Fudan University, Shanghai, China
- Xiaoqin Qian
  - Department of Forensic Medicine, School of Basic Medical Sciences, Fudan University, Shanghai, China
- Jianhui Xie
  - Department of Forensic Medicine, School of Basic Medical Sciences, Fudan University, Shanghai, China

50
An Introductory Overview of Open-Source and Commercial Software Options for the Analysis of Forensic Sequencing Data. Genes (Basel) 2021; 12:genes12111739. [PMID: 34828345 PMCID: PMC8618049 DOI: 10.3390/genes12111739] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 10/15/2021] [Revised: 10/27/2021] [Accepted: 10/27/2021] [Indexed: 12/30/2022]
Abstract
The top challenges of adopting new methods for forensic DNA analysis in routine laboratories are often the capital investment and the expertise required to implement and validate such methods locally. In the case of next-generation sequencing, several specifically forensic commercial options have become available in the last decade, offering reliable and validated solutions. Despite this, the readily available expertise to analyze, interpret and understand such data is still perceived to be lagging behind. This review gives an introductory overview for forensic scientists who are at the beginning of their journey with implementing next-generation sequencing locally and who, because most in the field do not have a bioinformatics background, may find it difficult to navigate the new terms and analysis options available. The currently available open-source and commercial software for forensic sequencing data analysis is summarized here to provide an accessible starting point for those fairly new to the forensic application of massively parallel sequencing.