1
|
English AC, Dolzhenko E, Ziaei Jam H, McKenzie SK, Olson ND, De Coster W, Park J, Gu B, Wagner J, Eberle MA, Gymrek M, Chaisson MJP, Zook JM, Sedlazeck FJ. Analysis and benchmarking of small and large genomic variants across tandem repeats. Nat Biotechnol 2024:10.1038/s41587-024-02225-z. [PMID: 38671154 DOI: 10.1038/s41587-024-02225-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/30/2023] [Accepted: 03/28/2024] [Indexed: 04/28/2024]
Abstract
Tandem repeats (TRs) are highly polymorphic in the human genome, have thousands of associated molecular traits and are linked to over 60 disease phenotypes. However, they are often excluded from at-scale studies because of challenges with variant calling and representation, as well as a lack of a genome-wide standard. Here, to promote the development of TR methods, we created a catalog of TR regions and explored TR properties across 86 haplotype-resolved long-read human assemblies. We curated variants from the Genome in a Bottle (GIAB) HG002 individual to create a TR dataset to benchmark existing and future TR analysis methods. We also present an improved variant comparison method that handles variants greater than 4 bp in length and varying allelic representation. The 8.1% of the genome covered by the TR catalog holds ~24.9% of variants per individual, including 124,728 small and 17,988 large variants for the GIAB HG002 'truth-set' TR benchmark. We demonstrate the utility of this pipeline across short-read and long-read technologies.
Collapse
Affiliation(s)
- Adam C English
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA.
| | | | - Helyaneh Ziaei Jam
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA, USA
| | | | - Nathan D Olson
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Wouter De Coster
- Applied and Translational Neurogenomics Group, VIB Center for Molecular Neurology, VIB, Antwerp, Belgium
- Applied and Translational Neurogenomics Group, Department of Biomedical Sciences, University of Antwerp, Antwerp, Belgium
| | - Jonghun Park
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA, USA
| | - Bida Gu
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
| | - Justin Wagner
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | | | - Melissa Gymrek
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA, USA
- Department of Medicine, University of California, San Diego, La Jolla, CA, USA
| | - Mark J P Chaisson
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
| | - Justin M Zook
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA.
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA.
- Department of Computer Science, Rice University, Houston, TX, USA.
| |
Collapse
|
2
|
Oketch JW, Wain LV, Hollox EJ. A comparison of software for analysis of rare and common short tandem repeat (STR) variation using human genome sequences from clinical and population-based samples. PLoS One 2024; 19:e0300545. [PMID: 38558075 PMCID: PMC10984476 DOI: 10.1371/journal.pone.0300545] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/02/2024] [Accepted: 02/27/2024] [Indexed: 04/04/2024] Open
Abstract
Short tandem repeat (STR) variation is an often overlooked source of variation between genomes. STRs comprise about 3% of the human genome and are highly polymorphic. Some cause Mendelian disease, and others affect gene expression. Their contribution to common disease is not well-understood, but recent software tools designed to genotype STRs using short read sequencing data will help address this. Here, we compare software that genotypes common STRs and rarer STR expansions genome-wide, with the aim of applying them to population-scale genomes. By using the Genome-In-A-Bottle (GIAB) consortium and 1000 Genomes Project short-read sequencing data, we compare performance in terms of sequence length, depth, computing resources needed, genotyping accuracy and number of STRs genotyped. To ensure broad applicability of our findings, we also measure genotyping performance against a set of genomes from clinical samples with known STR expansions, and a set of STRs commonly used for forensic identification. We find that HipSTR, ExpansionHunter and GangSTR perform well in genotyping common STRs, including the CODIS 13 core STRs used for forensic analysis. GangSTR and ExpansionHunter outperform HipSTR for genotyping call rate and memory usage. ExpansionHunter denovo (EHdn), STRling and GangSTR outperformed STRetch for detecting expanded STRs, and EHdn and STRling used considerably less processor time compared to GangSTR. Analysis on shared genomic sequence data provided by the GIAB consortium allows future performance comparisons of new software approaches on a common set of data, facilitating comparisons and allowing researchers to choose the best software that fulfils their needs.
Collapse
Affiliation(s)
- John W. Oketch
- Department of Genetics and Genome Biology, University of Leicester, Leicester, United Kingdom
| | - Louise V. Wain
- Department of Population Health Sciences, University of Leicester, Leicester, United Kingdom
- National Institute for Health Research, Leicester Respiratory Biomedical Research Centre, Glenfield Hospital, Leicester, United Kingdom
| | - Edward J. Hollox
- Department of Genetics and Genome Biology, University of Leicester, Leicester, United Kingdom
| |
Collapse
|
3
|
Rajan-Babu IS, Dolzhenko E, Eberle MA, Friedman JM. Sequence composition changes in short tandem repeats: heterogeneity, detection, mechanisms and clinical implications. Nat Rev Genet 2024:10.1038/s41576-024-00696-z. [PMID: 38467784 DOI: 10.1038/s41576-024-00696-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 01/19/2024] [Indexed: 03/13/2024]
Abstract
Short tandem repeats (STRs) are a class of repetitive elements, composed of tandem arrays of 1-6 base pair sequence motifs, that comprise a substantial fraction of the human genome. STR expansions can cause a wide range of neurological and neuromuscular conditions, known as repeat expansion disorders, whose age of onset, severity, penetrance and/or clinical phenotype are influenced by the length of the repeats and their sequence composition. The presence of non-canonical motifs, depending on the type, frequency and position within the repeat tract, can alter clinical outcomes by modifying somatic and intergenerational repeat stability, gene expression and mutant transcript-mediated and/or protein-mediated toxicities. Here, we review the diverse structural conformations of repeat expansions, technological advances for the characterization of changes in sequence composition, their clinical correlations and the impact on disease mechanisms.
Collapse
Affiliation(s)
- Indhu-Shree Rajan-Babu
- Department of Medical Genetics, The University of British Columbia, and Children's & Women's Hospital, Vancouver, British Columbia, Canada.
| | | | | | - Jan M Friedman
- Department of Medical Genetics, The University of British Columbia, and Children's & Women's Hospital, Vancouver, British Columbia, Canada
- BC Children's Hospital Research Institute, Vancouver, British Columbia, Canada
| |
Collapse
|
4
|
McComish BJ, Charleston MA, Parks M, Baroni C, Salvatore MC, Li R, Zhang G, Millar CD, Holland BR, Lambert DM. Ancient and Modern Genomes Reveal Microsatellites Maintain a Dynamic Equilibrium Through Deep Time. Genome Biol Evol 2024; 16:evae017. [PMID: 38412309 PMCID: PMC10972684 DOI: 10.1093/gbe/evae017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/05/2023] [Revised: 12/22/2023] [Accepted: 01/23/2024] [Indexed: 02/29/2024] Open
Abstract
Microsatellites are widely used in population genetics, but their evolutionary dynamics remain poorly understood. It is unclear whether microsatellite loci drift in length over time. This is important because the mutation processes that underlie these important genetic markers are central to the evolutionary models that employ microsatellites. We identify more than 27 million microsatellites using a novel and unique dataset of modern and ancient Adélie penguin genomes along with data from 63 published chordate genomes. We investigate microsatellite evolutionary dynamics over 2 timescales: one based on Adélie penguin samples dating to ∼46.5 ka and the other dating to the diversification of chordates aged more than 500 Ma. We show that the process of microsatellite allele length evolution is at dynamic equilibrium; while there is length polymorphism among individuals, the length distribution for a given locus remains stable. Many microsatellites persist over very long timescales, particularly in exons and regulatory sequences. These often retain length variability, suggesting that they may play a role in maintaining phenotypic variation within populations.
Collapse
Affiliation(s)
- Bennet J McComish
- School of Natural Sciences, University of Tasmania, Hobart, TAS 7001, Australia
- Menzies Institute for Medical Research, University of Tasmania, Hobart, TAS 7001, Australia
| | | | - Matthew Parks
- Australian Research Centre for Human Evolution, Griffith University, Nathan, QLD 4111, Australia
- Department of Biology, University of Central Oklahoma, Edmond, OK 73034, USA
| | - Carlo Baroni
- Dipartimento di Scienze della Terra, University of Pisa, Pisa, Italy
- CNR-IGG, Institute of Geosciences and Earth Resources, Pisa, Italy
| | - Maria Cristina Salvatore
- Dipartimento di Scienze della Terra, University of Pisa, Pisa, Italy
- CNR-IGG, Institute of Geosciences and Earth Resources, Pisa, Italy
| | - Ruiqiang Li
- Novogene Bioinformatics Technology Co. Ltd., Beijing 100083, China
| | - Guojie Zhang
- China National GeneBank, BGI-Shenzhen, Shenzhen 518083, China
- Department of Biology, Centre for Social Evolution, University of Copenhagen, Copenhagen DK-2100, Denmark
| | - Craig D Millar
- School of Biological Sciences, University of Auckland, Auckland, New Zealand
| | - Barbara R Holland
- School of Natural Sciences, University of Tasmania, Hobart, TAS 7001, Australia
| | - David M Lambert
- Australian Research Centre for Human Evolution, Griffith University, Nathan, QLD 4111, Australia
| |
Collapse
|
5
|
Tanudisastro HA, Deveson IW, Dashnow H, MacArthur DG. Sequencing and characterizing short tandem repeats in the human genome. Nat Rev Genet 2024:10.1038/s41576-024-00692-3. [PMID: 38366034 DOI: 10.1038/s41576-024-00692-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 12/06/2023] [Indexed: 02/18/2024]
Abstract
Short tandem repeats (STRs) are highly polymorphic sequences throughout the human genome that are composed of repeated copies of a 1-6-bp motif. Over 1 million variable STR loci are known, some of which regulate gene expression and influence complex traits, such as height. Moreover, variants in at least 60 STR loci cause genetic disorders, including Huntington disease and fragile X syndrome. Accurately identifying and genotyping STR variants is challenging, in particular mapping short reads to repetitive regions and inferring expanded repeat lengths. Recent advances in sequencing technology and computational tools for STR genotyping from sequencing data promise to help overcome this challenge and solve genetically unresolved cases and the 'missing heritability' of polygenic traits. Here, we compare STR genotyping methods, analytical tools and their applications to understand the effect of STR variation on health and disease. We identify emergent opportunities to refine genotyping and quality-control approaches as well as to integrate STRs into variant-calling workflows and large cohort analyses.
Collapse
Affiliation(s)
- Hope A Tanudisastro
- Centre for Population Genomics, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
- Centre for Population Genomics, Murdoch Children's Research Institute, Melbourne, Victoria, Australia
- Faculty of Medicine and Health, University of New South Wales, Sydney, New South Wales, Australia
- Faculty of Medicine and Health, University of Sydney, Sydney, New South Wales, Australia
| | - Ira W Deveson
- Faculty of Medicine and Health, University of New South Wales, Sydney, New South Wales, Australia
- Genomics and Inherited Disease Program, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
| | - Harriet Dashnow
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA.
| | - Daniel G MacArthur
- Centre for Population Genomics, Garvan Institute of Medical Research, Sydney, New South Wales, Australia.
- Centre for Population Genomics, Murdoch Children's Research Institute, Melbourne, Victoria, Australia.
- Faculty of Medicine and Health, University of New South Wales, Sydney, New South Wales, Australia.
| |
Collapse
|
6
|
Verbiest MA, Lundström O, Xia F, Baudis M, Bilgin Sonay T, Anisimova M. Short tandem repeat mutations regulate gene expression in colorectal cancer. Sci Rep 2024; 14:3331. [PMID: 38336885 PMCID: PMC10858039 DOI: 10.1038/s41598-024-53739-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/15/2023] [Accepted: 02/04/2024] [Indexed: 02/12/2024] Open
Abstract
Short tandem repeat (STR) mutations are prevalent in colorectal cancer (CRC), especially in tumours with the microsatellite instability (MSI) phenotype. While STR length variations are known to regulate gene expression under physiological conditions, the functional impact of STR mutations in CRC remains unclear. Here, we integrate STR mutation data with clinical information and gene expression data to study the gene regulatory effects of STR mutations in CRC. We confirm that STR mutability in CRC highly depends on the MSI status, repeat unit size, and repeat length. Furthermore, we present a set of 1244 putative expression STRs (eSTRs) for which the STR length is associated with gene expression levels in CRC tumours. The length of 73 eSTRs is associated with expression levels of cancer-related genes, nine of which are CRC-specific genes. We show that linear models describing eSTR-gene expression relationships allow for predictions of gene expression changes in response to eSTR mutations. Moreover, we found an increased mutability of eSTRs in MSI tumours. Our evidence of gene regulatory roles for eSTRs in CRC highlights a mostly overlooked way through which tumours may modulate their phenotypes. Future extensions of these findings could uncover new STR-based targets in the treatment of cancer.
Collapse
Affiliation(s)
- Max A Verbiest
- Institute of Computational Life Sciences, Zurich University of Applied Sciences, Wädenswil, Switzerland.
- Department of Molecular Life Sciences, University of Zurich, Zurich, Switzerland.
- Swiss Institute of Bioinformatics, Lausanne, Switzerland.
| | - Oxana Lundström
- Institute of Computational Life Sciences, Zurich University of Applied Sciences, Wädenswil, Switzerland
- Department of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden
| | - Feifei Xia
- Institute of Computational Life Sciences, Zurich University of Applied Sciences, Wädenswil, Switzerland
- Department of Molecular Life Sciences, University of Zurich, Zurich, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Michael Baudis
- Department of Molecular Life Sciences, University of Zurich, Zurich, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Tugce Bilgin Sonay
- Institute of Computational Life Sciences, Zurich University of Applied Sciences, Wädenswil, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Institute of Ecology, Evolution and Environmental Biology, Columbia University, New York, USA
| | - Maria Anisimova
- Institute of Computational Life Sciences, Zurich University of Applied Sciences, Wädenswil, Switzerland.
- Swiss Institute of Bioinformatics, Lausanne, Switzerland.
| |
Collapse
|
7
|
Alinaghi S, Mohseni M, Fattahi Z, Beheshtian M, Ghodratpour F, Zare Ashrafi F, Arzhangi S, Jalalvand K, Najafipour R, Khorram Khorshid HR, Kahrizi K, Najmabadi H. Genetic Analysis of 27 Y-STR Haplotypes in 11 Iranian Ethnic Groups. ARCHIVES OF IRANIAN MEDICINE 2024; 27:79-88. [PMID: 38619031 PMCID: PMC11017261 DOI: 10.34172/aim.2024.13] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/21/2023] [Accepted: 01/23/2024] [Indexed: 04/16/2024]
Abstract
BACKGROUND The study of Y-chromosomal variations provides valuable insights into male susceptibility in certain diseases like cardiovascular disease (CVD). In this study, we analyzed paternal lineage in different Iranian ethnic groups, not only to identify developing medical etiology, but also to pave the way for gender-specific targeted strategies and personalized medicine in medical genetic research studies. METHODS The diversity of eleven Iranian ethnic groups was studied using 27 Y-chromosomal short tandem repeat (Y-STR) haplotypes from Y-filer® Plus kit. Analysis of molecular variance (AMOVA) based on pair-wise RST along with multidimensional scaling (MDS) calculation and Network phylogenic analysis was employed to quantify the differences between 503 unrelated individuals from each ethnicity. RESULTS Results from AMOVA calculation confirmed that Gilaks and Azeris showed the largest genetic distance (RST=0.35434); however, Sistanis and Lurs had the smallest considerable genetic distance (RST=0.00483) compared to other ethnicities. Although Azeris had a considerable distance from other ethnicities, they were still close to Turkmens. MDS analysis of ethnic groups gave the indication of lack of similarity between different ethnicities. Besides, network phylogenic analysis demonstrated insignificant clustering between samples. CONCLUSION The AMOVA analysis results explain that the close distance of Azeris and Turkmens may be the effect of male-dominant expansions across Central Asia that contributed to historical and demographics of populations in the region. Insignificant differences in network analysis could be the consequence of high mutation events that happened in the Y-STR regions over the years. Considering the ethnic group affiliations in medical research, our results provided an understanding and characterization of Iranian male population for future medical and population genetics studies.
Collapse
Affiliation(s)
- Somayeh Alinaghi
- Genetics Research Center, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
| | - Marzieh Mohseni
- Genetics Research Center, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
| | - Zohreh Fattahi
- Genetics Research Center, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
| | - Maryam Beheshtian
- Genetics Research Center, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
| | - Fatemeh Ghodratpour
- Genetics Research Center, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
| | - Farzane Zare Ashrafi
- Genetics Research Center, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
| | - Sanaz Arzhangi
- Genetics Research Center, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
| | - Khadijeh Jalalvand
- Genetics Research Center, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
| | - Reza Najafipour
- Genetics Research Center, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
| | | | - Kimia Kahrizi
- Genetics Research Center, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
| | - Hossein Najmabadi
- Genetics Research Center, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
| |
Collapse
|
8
|
Parikh K, Quintero Reis A, Wendt FR. Association between suicidal ideation and tandem repeats in contactins. Front Psychiatry 2024; 14:1236540. [PMID: 38239902 PMCID: PMC10794671 DOI: 10.3389/fpsyt.2023.1236540] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 06/07/2023] [Accepted: 12/13/2023] [Indexed: 01/22/2024] Open
Abstract
Background Death by suicide is one of the leading causes of death among adolescents. Genome-wide association studies (GWAS) have identified loci that associate with suicidal ideation and related behaviours. One such group of loci are the six contactin genes (CNTN1-6) that are critical to neurodevelopment through regulating neurite structure. Because single nucleotide polymorphisms (SNPs) detected by GWAS often map to non-coding intergenic regions, we investigated whether repetitive variants in CNTNs associated with suicidality in a young cohort aged 8 to 21. Understanding the genetic liability of suicidal thought and behavior in this age group will promote early intervention and treatment. Methods Genotypic and phenotypic data were obtained from the Philadelphia Neurodevelopment Cohort (PNC). Across six CNTNs, 232 short tandem repeats (STRs) were analyzed in up to 4,595 individuals of European ancestry who expressed current, previous, or no suicidal ideation. STRs were imputed into SNP arrays using a phased SNP-STR haplotype reference panel from the 1000 Genomes Project. We tested several additive and interactive models of locus-level burden (i.e., sum of STR alleles) with respect to suicidal ideation. Additive models included sex, birth year, developmental stage ("DevStage"), and the first 10 principal components of ancestry as covariates; interactive models assessed the effect of STR-by-DevStage considering all other covariates. Results CNTN1-[T]N interacted with DevStage to increase risk for current suicidal ideation (CNTN1-[T]N-by-DevStage; p = 0.00035). Compared to the youngest age group, the middle (OR = 1.80, p = 0.0514) and oldest (OR = 3.82, p = 0.0002) participant groups had significantly higher odds of suicidal ideation as their STR length expanded; this result was independent of polygenic scores for suicidal ideation. Discussion These findings highlight diversity in the genetic effects (i.e., SNP and STR) acting on suicidal thoughts and behavior and advance our understanding of suicidal ideation across childhood and adolescence.
Collapse
Affiliation(s)
- Kairavi Parikh
- Forensic Science Program, University of Toronto, Mississauga, ON, Canada
| | - Andrea Quintero Reis
- Forensic Science Program, University of Toronto, Mississauga, ON, Canada
- Biostatistics Division, Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada
- Department of Anthropology, University of Toronto, Mississauga, ON, Canada
| | - Frank R. Wendt
- Forensic Science Program, University of Toronto, Mississauga, ON, Canada
- Biostatistics Division, Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada
- Department of Anthropology, University of Toronto, Mississauga, ON, Canada
| |
Collapse
|
9
|
Harvey WT, Ebert P, Ebler J, Audano PA, Munson KM, Hoekzema K, Porubsky D, Beck CR, Marschall T, Garimella K, Eichler EE. Whole-genome long-read sequencing downsampling and its effect on variant-calling precision and recall. Genome Res 2023; 33:2029-2040. [PMID: 38190646 PMCID: PMC10760522 DOI: 10.1101/gr.278070.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/04/2023] [Accepted: 11/03/2023] [Indexed: 01/10/2024]
Abstract
Advances in long-read sequencing (LRS) technologies continue to make whole-genome sequencing more complete, affordable, and accurate. LRS provides significant advantages over short-read sequencing approaches, including phased de novo genome assembly, access to previously excluded genomic regions, and discovery of more complex structural variants (SVs) associated with disease. Limitations remain with respect to cost, scalability, and platform-dependent read accuracy and the tradeoffs between sequence coverage and sensitivity of variant discovery are important experimental considerations for the application of LRS. We compare the genetic variant-calling precision and recall of Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PacBio) HiFi platforms over a range of sequence coverages. For read-based applications, LRS sensitivity begins to plateau around 12-fold coverage with a majority of variants called with reasonable accuracy (F1 score above 0.5), and both platforms perform well for SV detection. Genome assembly increases variant-calling precision and recall of SVs and indels in HiFi data sets with HiFi outperforming ONT in quality as measured by the F1 score of assembly-based variant call sets. While both technologies continue to evolve, our work offers guidance to design cost-effective experimental strategies that do not compromise on discovering novel biology.
Collapse
Affiliation(s)
- William T Harvey
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195-5065, USA
| | - Peter Ebert
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, 40225 Düsseldorf, Germany
- Core Unit Bioinformatics, Medical Faculty, Heinrich Heine University, 40225 Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, 40225 Düsseldorf, Germany
| | - Jana Ebler
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, 40225 Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, 40225 Düsseldorf, Germany
| | - Peter A Audano
- The Jackson Laboratory for Genomic Medicine, Farmington, Connecticut 06032, USA
| | - Katherine M Munson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195-5065, USA
| | - Kendra Hoekzema
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195-5065, USA
| | - David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195-5065, USA
| | - Christine R Beck
- The Jackson Laboratory for Genomic Medicine, Farmington, Connecticut 06032, USA
- Department of Genetics and Genome Sciences, University of Connecticut Health Center, Farmington, Connecticut 06030-6403, USA
| | - Tobias Marschall
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, 40225 Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, 40225 Düsseldorf, Germany
| | - Kiran Garimella
- Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195-5065, USA;
- Howard Hughes Medical Institute, University of Washington, Seattle, Washington 98195, USA
| |
Collapse
|
10
|
Birnbaum R. Rediscovering tandem repeat variation in schizophrenia: challenges and opportunities. Transl Psychiatry 2023; 13:402. [PMID: 38123544 PMCID: PMC10733427 DOI: 10.1038/s41398-023-02689-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 05/04/2023] [Revised: 11/23/2023] [Accepted: 11/27/2023] [Indexed: 12/23/2023] Open
Abstract
Tandem repeats (TRs) are prevalent throughout the genome, constituting at least 3% of the genome, and often highly polymorphic. The high mutation rate of TRs, which can be orders of magnitude higher than single-nucleotide polymorphisms and indels, indicates that they are likely to make significant contributions to phenotypic variation, yet their contribution to schizophrenia has been largely ignored by recent genome-wide association studies (GWAS). Tandem repeat expansions are already known causative factors for over 50 disorders, while common tandem repeat variation is increasingly being identified as significantly associated with complex disease and gene regulation. The current review summarizes key background concepts of tandem repeat variation as pertains to disease risk, elucidating their potential for schizophrenia association. An overview of next-generation sequencing-based methods that may be applied for TR genome-wide identification is provided, and some key methodological challenges in TR analyses are delineated.
Collapse
Affiliation(s)
- Rebecca Birnbaum
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
- Department of Genetics and Genomics Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
| |
Collapse
|
11
|
Guo MH, Lee WP, Vardarajan B, Schellenberg GD, Phillips-Cremins J. Polygenic burden of short tandem repeat expansions promote risk for Alzheimer's disease. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2023:2023.11.16.23298623. [PMID: 38014121 PMCID: PMC10680900 DOI: 10.1101/2023.11.16.23298623] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/29/2023]
Abstract
Studies of the genetics of Alzheimer's disease (AD) have largely focused on single nucleotide variants and short insertions/deletions. However, most of the disease heritability has yet to be uncovered, suggesting that there is substantial genetic risk conferred by other forms of genetic variation. There are over one million short tandem repeats (STRs) in the genome, and their link to AD risk has not been assessed. As pathogenic expansions of STR cause over 30 neurologic diseases, it is important to ascertain whether STRs may also be implicated in AD risk. Here, we genotyped 321,742 polymorphic STR tracts genome-wide using PCR-free whole genome sequencing data from 2,981 individuals (1,489 AD case and 1,492 control individuals). We implemented an approach to identify STR expansions as STRs with tract lengths that are outliers from the population. We then tested for differences in aggregate burden of expansions in case versus control individuals. AD patients had a 1.19-fold increase of STR expansions compared to healthy elderly controls (p=8.27×10-3, two-sided Mann Whitney test). Individuals carrying > 30 STR expansions had 3.62-fold higher odds of having AD and had more severe AD neuropathology. AD STR expansions were highly enriched within active promoters in post-mortem hippocampal brain tissues and particularly within SINE-VNTR-Alu (SVA) retrotransposons. Together, these results demonstrate that expanded STRs within active promoter regions of the genome promote risk of AD.
Collapse
Affiliation(s)
- Michael H Guo
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Department of Neurology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Wan-Ping Lee
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA
| | - Badri Vardarajan
- Department of Neurology, College of Physicians and Surgeons, Columbia University, New York, NY
| | - Gerard D Schellenberg
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA
| | - Jennifer Phillips-Cremins
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Department of Bioengineering, University of Pennsylvania, Philadelphia, PA, USA
- Epigenetics Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| |
Collapse
|
12
|
Zambrano AK, Cadena-Ullauri S, Guevara-Ramírez P, Paz-Cruz E, Tamayo-Trujillo R, Ruiz-Pozo VA, Doménech N, Ibarra-Rodríguez AA, Gaviria A. The Autosomal Short Tandem Repeat Polymorphisms Are Potentially Associated with Cardiovascular Disease Predisposition in the Latin American Population: A Mini Review. BIOMED RESEARCH INTERNATIONAL 2023; 2023:6152905. [PMID: 38027043 PMCID: PMC10651335 DOI: 10.1155/2023/6152905] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 06/21/2023] [Revised: 10/02/2023] [Accepted: 10/20/2023] [Indexed: 12/01/2023]
Abstract
According to the World Health Organization, cardiovascular diseases (CVDs) are the leading cause of death worldwide across nearly all ethnic groups. Inherited cardiac conditions comprise a wide spectrum of diseases that affect the heart, including abnormal structural features and functional impairments. In Latin America, CVDs are the leading cause of death within the region. Factors such as population aging, unhealthy diet, obesity, smoking, and a sedentary lifestyle have increased the risk of CVD. The Latin American population is characterized by its diverse ethnic composition with varying percentages of each ancestral component (African, European, and Native American ancestry). Short tandem repeats (STRs) are DNA sequences with 2-6 base pair repetitions and constitute ~3% of the human genome. Importantly, significant allele frequency variations exist between different populations. While studies have described that STRs are in noncoding regions of the DNA, increasing evidence suggests that simple sequence repeat variations may be critical for proper gene activity and regulation. Furthermore, several STRs have been identified as potential disease predisposition markers. The present review is aimed at comparing and describing the frequencies of autosomal STR polymorphisms potentially associated with cardiovascular disease predisposition in Latin America compared with other populations.
Collapse
Affiliation(s)
- Ana Karina Zambrano
- Centro de Investigación Genética y Genómica, Facultad de Ciencias de la Salud Eugenio Espejo, Universidad UTE, Quito, Ecuador
| | - Santiago Cadena-Ullauri
- Centro de Investigación Genética y Genómica, Facultad de Ciencias de la Salud Eugenio Espejo, Universidad UTE, Quito, Ecuador
| | - Patricia Guevara-Ramírez
- Centro de Investigación Genética y Genómica, Facultad de Ciencias de la Salud Eugenio Espejo, Universidad UTE, Quito, Ecuador
| | - Elius Paz-Cruz
- Centro de Investigación Genética y Genómica, Facultad de Ciencias de la Salud Eugenio Espejo, Universidad UTE, Quito, Ecuador
| | - Rafael Tamayo-Trujillo
- Centro de Investigación Genética y Genómica, Facultad de Ciencias de la Salud Eugenio Espejo, Universidad UTE, Quito, Ecuador
| | - Viviana A. Ruiz-Pozo
- Centro de Investigación Genética y Genómica, Facultad de Ciencias de la Salud Eugenio Espejo, Universidad UTE, Quito, Ecuador
| | - Nieves Doménech
- Instituto de Investigación Biomédica de A Coruña (INIBIC)-CIBERCV, Complexo Hospitalario Universitario de A Coruña (CHUAC), Sergas, Universidad da Coruña (UDC), La Coruña, Spain
| | | | - Aníbal Gaviria
- Laboratorio de Genética Molecular, Centros Médicos Especializados Cruz Roja Ecuatoriana, Quito, Ecuador
- Hemocentro Nacional, Cruz Roja Ecuatoriana, Quito, Ecuador
| |
Collapse
|
13
|
Zhang X, Zhang Y, Wang L, Wu G, Pan C. Three novel simple sequence repeats (SSRs) identified by MALDI-TOF-MS method were associated with backfat in pig. Anim Biotechnol 2023; 34:1014-1021. [PMID: 35048796 DOI: 10.1080/10495398.2021.2009845] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/01/2022]
Abstract
Backfat trait is an important economic trait and highly heritable, but difficult to evaluate. Thus, it is of great significance to explore optimal backfat thickness of pigs by using marker-assisted selection (MAS) to speed up its breeding process and improve economic efficiency. This study aimed to investigate the relationship between genetic variations (e.g., SSRs) and backfat of Qinghai Bamei pigs using MALDI-TOF Mass Spectrometry (MALDI-TOF-MS). Herein, five alternative SSR loci (namely V1, V2, V3, V4 and V5) were selected for subsequent detection. The results suggested that 3 (141-, 143- and 145-), 3 (128-, 130- and 132-), 2 (160- and 162-), 2 (136- and 139-) and 3 (170-, 184- and 192-) alleles of V1, V2, V3, V4 and V5 were found, respectively. Subsequent analysis showed that there was linkage equilibrium among five SSRs and Hap19 (13.1%) (141-/132-/160-/139-/192-) had the highest haplotype frequency. Among these five SSR loci, V1, V2 and V3 loci were significantly associated to the backfat of Qinghai Bamei sows. These findings enriched the study of SSRs in Qinghai Bamei pigs, and (AC)n (Chr15:85485851-85485995), (AC)n (Chr10:52724583-52724713) and (TG)n (Chr4:90732644-90732802) could be utilized as the candidate locus for MAS in pig industry.HIGHLIGHTSFive novel SSR loci was identified in pigs through MALDI-TOF MS.V1, V2 and V3 loci was were significantly associated to the backfat of pigs.
Collapse
Affiliation(s)
- Xuelian Zhang
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, China
| | - Yanghai Zhang
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, China
- Meat Science and Muscle Biology Laboratory, Department of Animal Sciences, University of Wisconsin-Madison, Madison, WI, USA
| | - Lei Wang
- College of Animal Science and Veterinary Medicine, Qinghai University, Xining, China
| | - Guofang Wu
- College of Animal Science and Veterinary Medicine, Qinghai University, Xining, China
| | - Chuanying Pan
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, China
| |
Collapse
|
14
|
English A, Dolzhenko E, Jam HZ, Mckenzie S, Olson ND, De Coster W, Park J, Gu B, Wagner J, Eberle MA, Gymrek M, Chaisson MJP, Zook JM, Sedlazeck FJ. Benchmarking of small and large variants across tandem repeats. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.10.29.564632. [PMID: 37961319 PMCID: PMC10634962 DOI: 10.1101/2023.10.29.564632] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/15/2023]
Abstract
Tandem repeats (TRs) are highly polymorphic in the human genome, have thousands of associated molecular traits, and are linked to over 60 disease phenotypes. However, their complexity often excludes them from at-scale studies due to challenges with variant calling, representation, and lack of a genome-wide standard. To promote TR methods development, we create a comprehensive catalog of TR regions and explore its properties across 86 samples. We then curate variants from the GIAB HG002 individual to create a tandem repeat benchmark. We also present a variant comparison method that handles small and large alleles and varying allelic representation. The 8.1% of the genome covered by the TR catalog holds ∼24.9% of variants per individual, including 124,728 small and 17,988 large variants for the GIAB HG002 TR benchmark. We work with the GIAB community to demonstrate the utility of this benchmark across short and long read technologies.
Collapse
|
15
|
Bhati M, Mapel XM, Lloret-Villas A, Pausch H. Structural variants and short tandem repeats impact gene expression and splicing in bovine testis tissue. Genetics 2023; 225:iyad161. [PMID: 37655920 PMCID: PMC10627265 DOI: 10.1093/genetics/iyad161] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/05/2023] [Revised: 06/05/2023] [Accepted: 08/24/2023] [Indexed: 09/02/2023] Open
Abstract
Structural variants (SVs) and short tandem repeats (STRs) are significant sources of genetic variation. However, the impacts of these variants on gene regulation have not been investigated in cattle. Here, we genotyped and characterized 19,408 SVs and 374,821 STRs in 183 bovine genomes and investigated their impact on molecular phenotypes derived from testis transcriptomes. We found that 71% STRs were multiallelic. The vast majority (95%) of STRs and SVs were in intergenic and intronic regions. Only 37% SVs and 40% STRs were in high linkage disequilibrium (LD) (R2 > 0.8) with surrounding SNPs/insertions and deletions (Indels), indicating that SNP-based association testing and genomic prediction are blind to a nonnegligible portion of genetic variation. We showed that both SVs and STRs were more than 2-fold enriched among expression and splicing QTL (e/sQTL) relative to SNPs/Indels and were often associated with differential expression and splicing of multiple genes. Deletions and duplications had larger impacts on splicing and expression than any other type of SV. Exonic duplications predominantly increased gene expression either through alternative splicing or other mechanisms, whereas expression- and splicing-associated STRs primarily resided in intronic regions and exhibited bimodal effects on the molecular phenotypes investigated. Most e/sQTL resided within 100 kb of the affected genes or splicing junctions. We pinpoint candidate causal STRs and SVs associated with the expression of SLC13A4 and TTC7B and alternative splicing of a lncRNA and CAPP1. We provide a catalog of STRs and SVs for taurine cattle and show that these variants contribute substantially to gene expression and splicing variation.
Collapse
Affiliation(s)
- Meenu Bhati
- Animal Genomics, ETH Zurich, Universitaetstrasse 2, 8092, Zurich, Switzerland
| | - Xena Marie Mapel
- Animal Genomics, ETH Zurich, Universitaetstrasse 2, 8092, Zurich, Switzerland
| | | | - Hubert Pausch
- Animal Genomics, ETH Zurich, Universitaetstrasse 2, 8092, Zurich, Switzerland
| |
Collapse
|
16
|
Ziaei Jam H, Li Y, DeVito R, Mousavi N, Ma N, Lujumba I, Adam Y, Maksimov M, Huang B, Dolzhenko E, Qiu Y, Kakembo FE, Joseph H, Onyido B, Adeyemi J, Bakhtiari M, Park J, Javadzadeh S, Jjingo D, Adebiyi E, Bafna V, Gymrek M. A deep population reference panel of tandem repeat variation. Nat Commun 2023; 14:6711. [PMID: 37872149 PMCID: PMC10593948 DOI: 10.1038/s41467-023-42278-3] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/10/2023] [Accepted: 10/05/2023] [Indexed: 10/25/2023] Open
Abstract
Tandem repeats (TRs) represent one of the largest sources of genetic variation in humans and are implicated in a range of phenotypes. Here we present a deep characterization of TR variation based on high coverage whole genome sequencing from 3550 diverse individuals from the 1000 Genomes Project and H3Africa cohorts. We develop a method, EnsembleTR, to integrate genotypes from four separate methods resulting in high-quality genotypes at more than 1.7 million TR loci. Our catalog reveals novel sequence features influencing TR heterozygosity, identifies population-specific trinucleotide expansions, and finds hundreds of novel eQTL signals. Finally, we generate a phased haplotype panel which can be used to impute most TRs from nearby single nucleotide polymorphisms (SNPs) with high accuracy. Overall, the TR genotypes and reference haplotype panel generated here will serve as valuable resources for future genome-wide and population-wide studies of TRs and their role in human phenotypes.
Collapse
Affiliation(s)
- Helyaneh Ziaei Jam
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
| | - Yang Li
- Department of Medicine, University of California San Diego, La Jolla, CA, USA
| | - Ross DeVito
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
| | - Nima Mousavi
- Department of Electrical and Computer Engineering, University of California San Diego, La Jolla, CA, USA
| | - Nichole Ma
- Department of Medicine, University of California San Diego, La Jolla, CA, USA
| | - Ibra Lujumba
- The African Center of Excellence in Bioinformatics and Data Intensive Sciences, the Infectious Diseases Institute, Makerere University, Kampala, Uganda
| | - Yagoub Adam
- Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun, 112233, Nigeria
| | - Mikhail Maksimov
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
| | - Bonnie Huang
- Department of Bioengineering, University of California San Diego, La Jolla, CA, USA
| | | | - Yunjiang Qiu
- Illumina Incorporated, San Diego, CA, 92122, USA
| | - Fredrick Elishama Kakembo
- The African Center of Excellence in Bioinformatics and Data Intensive Sciences, the Infectious Diseases Institute, Makerere University, Kampala, Uganda
| | - Habi Joseph
- The African Center of Excellence in Bioinformatics and Data Intensive Sciences, the Infectious Diseases Institute, Makerere University, Kampala, Uganda
| | - Blessing Onyido
- Department of Computer & Information Sciences, Covenant University, Ota, Ogun, 112233, Nigeria
- Covenant Applied Informatics and Communication Africa Centre of Excellence (CApIC-ACE), Covenant University, Ota, Ogun, 112233, Nigeria
| | - Jumoke Adeyemi
- Department of Computer & Information Sciences, Covenant University, Ota, Ogun, 112233, Nigeria
- Covenant Applied Informatics and Communication Africa Centre of Excellence (CApIC-ACE), Covenant University, Ota, Ogun, 112233, Nigeria
| | - Mehrdad Bakhtiari
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
| | - Jonghun Park
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
| | - Sara Javadzadeh
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
| | - Daudi Jjingo
- The African Center of Excellence in Bioinformatics and Data Intensive Sciences, the Infectious Diseases Institute, Makerere University, Kampala, Uganda
- Department of Computer Science, Makerere University, Kampala, Uganda
| | - Ezekiel Adebiyi
- Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun, 112233, Nigeria
- Department of Computer & Information Sciences, Covenant University, Ota, Ogun, 112233, Nigeria
- Covenant Applied Informatics and Communication Africa Centre of Excellence (CApIC-ACE), Covenant University, Ota, Ogun, 112233, Nigeria
- Applied Bioinformatics Division, German Cancer Research Center (DKFZ), Heidelberg, Baden-Württemberg, 69120, Germany
| | - Vineet Bafna
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
| | - Melissa Gymrek
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA.
- Department of Medicine, University of California San Diego, La Jolla, CA, USA.
| |
Collapse
|
17
|
Link V, Zavaleta YJA, Reyes RJ, Ding L, Wang J, Rohlfs RV, Edge MD. Microsatellites used in forensics are in regions enriched for trait-associated variants. iScience 2023; 26:107992. [PMID: 37841589 PMCID: PMC10570123 DOI: 10.1016/j.isci.2023.107992] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/08/2023] [Revised: 08/10/2023] [Accepted: 09/18/2023] [Indexed: 10/17/2023] Open
Abstract
The 20 short tandem repeat (STR) loci of the combined DNA index system (CODIS) are the basis of the vast majority of forensic genetics in the United States. One argument for permissive rules about the collection of CODIS genotypes is that the CODIS loci are thought to contain little information about ancestry or traits. However, in the past 20 years, a growing field has identified hundreds of thousands of genotype-trait associations. Here, we conduct a survey of the landscape of such associations surrounding the CODIS loci as compared with non-CODIS STRs. Although this study cannot establish or quantify associations between CODIS genotypes and phenotypes, we find that the regions around the CODIS loci are enriched for both known pathogenic variants (> 90th percentile) and for trait-associated SNPs identified in genome-wide association studies (GWAS) (≥ 95th percentile in 10kb and 100kb flanking regions), compared with other random sets of autosomal tetranucleotide-repeat STRs.
Collapse
Affiliation(s)
- Vivian Link
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
| | | | - Rochelle-Jan Reyes
- Department of Biology, San Francisco State University, San Francisco, CA, USA
| | - Linda Ding
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
| | - Judy Wang
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
| | - Rori V Rohlfs
- Department of Biology, San Francisco State University, San Francisco, CA, USA
- Department of Data Science and Institute of Ecology and Evolution, University of Oregon, Eugene, OR, USA
| | - Michael D Edge
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
| |
Collapse
|
18
|
Lundström OS, Adriaan Verbiest M, Xia F, Jam HZ, Zlobec I, Anisimova M, Gymrek M. WebSTR: A Population-wide Database of Short Tandem Repeat Variation in Humans. J Mol Biol 2023; 435:168260. [PMID: 37678708 DOI: 10.1016/j.jmb.2023.168260] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/27/2023] [Revised: 08/29/2023] [Accepted: 08/29/2023] [Indexed: 09/09/2023]
Abstract
Short tandem repeats (STRs) are consecutive repetitions of one to six nucleotide motifs. They are hypervariable due to the high prevalence of repeat unit insertions or deletions primarily caused by polymerase slippage during replication. Genetic variation at STRs has been shown to influence a range of traits in humans, including gene expression, cancer risk, and autism. Until recently STRs have been poorly studied since they pose significant challenges to bioinformatics analyses. Moreover, genome-wide analysis of STR variation in population-scale cohorts requires large amounts of data and computational resources. However, the recent advent of genome-wide analysis tools has resulted in multiple large genome-wide datasets of STR variation spanning nearly two million genomic loci in thousands of individuals from diverse populations. Here we present WebSTR, a database of genetic variation and other characteristics of genome-wide STRs across human populations. WebSTR is based on reference panels of more than 1.7 million human STRs created with state of the art repeat annotation methods and can easily be extended to include additional cohorts or species. It currently contains data based on STR genotypes for individuals from the 1000 Genomes Project, H3Africa, the Genotype-Tissue Expression (GTEx) Project and colorectal cancer patients from the TCGA dataset. WebSTR is implemented as a relational database with programmatic access available through an API and a web portal for browsing data. The web portal is publicly available at https://webstr.ucsd.edu.
Collapse
Affiliation(s)
- Oxana Sachenkova Lundström
- Department of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden; Vildly AB, Kalmar, Sweden; Institute of Computational Life Sciences, School of Life Sciences and Facility Management, Zürich University of Applied Sciences (ZHAW), Waedenswil, Switzerland. https://twitter.com/merenlin
| | - Max Adriaan Verbiest
- Institute of Computational Life Sciences, School of Life Sciences and Facility Management, Zürich University of Applied Sciences (ZHAW), Waedenswil, Switzerland; Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland; Department of Molecular Life Sciences, University of Zurich, Zurich, Switzerland
| | - Feifei Xia
- Institute of Computational Life Sciences, School of Life Sciences and Facility Management, Zürich University of Applied Sciences (ZHAW), Waedenswil, Switzerland; Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland; Department of Molecular Life Sciences, University of Zurich, Zurich, Switzerland. https://twitter.com/Feifeix97
| | - Helyaneh Ziaei Jam
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
| | - Inti Zlobec
- Institute of Tissue Medicine and Pathology, University of Bern, Switzerland
| | - Maria Anisimova
- Institute of Computational Life Sciences, School of Life Sciences and Facility Management, Zürich University of Applied Sciences (ZHAW), Waedenswil, Switzerland; Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland.
| | - Melissa Gymrek
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA; Department of Medicine, University of California San Diego, La Jolla, CA, USA.
| |
Collapse
|
19
|
Reinar WB, Tørresen OK, Nederbragt AJ, Matschiner M, Jentoft S, Jakobsen KS. Teleost genomic repeat landscapes in light of diversification rates and ecology. Mob DNA 2023; 14:14. [PMID: 37789366 PMCID: PMC10546739 DOI: 10.1186/s13100-023-00302-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/28/2023] [Accepted: 09/20/2023] [Indexed: 10/05/2023] Open
Abstract
Repetitive DNA make up a considerable fraction of most eukaryotic genomes. In fish, transposable element (TE) activity has coincided with rapid species diversification. Here, we annotated the repetitive content in 100 genome assemblies, covering the major branches of the diverse lineage of teleost fish. We investigated if TE content correlates with family level net diversification rates and found support for a weak negative correlation. Further, we demonstrated that TE proportion correlates with genome size, but not to the proportion of short tandem repeats (STRs), which implies independent evolutionary paths. Marine and freshwater fish had large differences in STR content, with the most extreme propagation detected in the genomes of codfish species and Atlantic herring. Such a high density of STRs is likely to increase the mutational load, which we propose could be counterbalanced by high fecundity as seen in codfishes and herring.
Collapse
Affiliation(s)
| | - Ole K Tørresen
- Department of Biosciences, University of Oslo, Oslo, Norway
| | - Alexander J Nederbragt
- Department of Biosciences, University of Oslo, Oslo, Norway
- Department of Informatics, University of Oslo, Oslo, Norway
| | - Michael Matschiner
- Department of Biosciences, University of Oslo, Oslo, Norway
- University of Oslo, Natural History Museum, Oslo, Norway
| | - Sissel Jentoft
- Department of Biosciences, University of Oslo, Oslo, Norway
| | | |
Collapse
|
20
|
De Wolf J, Robin E, Vallee A, Cohen J, Hamid A, Roux A, Leguen M, Beaurepere R, Bieche I, Masliah-Planchon J, Glorion M, Allory Y, Sage E. Donor/recipient origin of lung cancer after lung transplantation by DNA short tandem repeat analysis. Front Oncol 2023; 13:1225538. [PMID: 37841427 PMCID: PMC10568626 DOI: 10.3389/fonc.2023.1225538] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/19/2023] [Accepted: 09/06/2023] [Indexed: 10/17/2023] Open
Abstract
Background Lung cancer is more common in posttransplant recipients than in the general population. The objective of this study was to examine the chimerism donor/recipient cell origin of graft cancer in recipients of lung transplant. Methods A retrospective chart review was conducted at Foch Hospital for all lung transplantations from 1989 to 2020. Short tandem repeat PCR (STR-PCR) analysis, the gold standard technique for chimerism quantification, was used to determine the donor/recipient cell origin of lung cancers in transplant patients. Results Fourteen (1.4%) of the 1,026 patients were found to have graft lung cancer after lung transplantation, and one developed two different lung tumors in the same lobe. Among the 15 lung tumors, 10 (67%) presented with adenocarcinoma, four (27%) with squamous cell carcinoma and one with small cell lung cancer. STR analysis showed that the origin of the cancer was the donor in 10 patients (71%), the recipient in three patients (21%), and was undetermined in one patient. Median time to diagnosis was 62 months. Conclusion The prevalence of lung cancer in lung transplant recipients is very low. However, the results of our study showed heterogeneity of genetic alterations, with 21% being of recipient origin. Our results highlight the importance of donor selection and medical supervision after lung transplantation.
Collapse
Affiliation(s)
- Julien De Wolf
- Department of Thoracic Surgery and Lung Transplantation, Foch Hospital, Suresnes, France
| | - Edouard Robin
- Department of Thoracic Surgery and Lung Transplantation, Foch Hospital, Suresnes, France
| | - Alexandre Vallee
- Department of Clinical Research and Innovation Foch Hospital, Suresnes, France
| | - Justine Cohen
- Department of Anatomopathology, Foch Hospital, Suresnes, France
| | - Abdul Hamid
- Department of Pneumology, Foch Hospital, Suresnes, France
| | - Antoine Roux
- Department of Pneumology, Foch Hospital, Suresnes, France
| | - Morgan Leguen
- Department of Anesthesiology, Foch Hospital, Suresnes, France
- Université Paris-Saclay, INRAE, UVSQ, VIM, Jouy-en-Josas, France
| | | | - Ivan Bieche
- Genetics Department, Curie Institut, Paris, France
| | | | - Matthieu Glorion
- Department of Thoracic Surgery and Lung Transplantation, Foch Hospital, Suresnes, France
| | - Yves Allory
- Department of Anatomopathology, Foch Hospital, Suresnes, France
- Department of Anatomopathology, Curie Institut, Paris, France
| | - Edouard Sage
- Department of Thoracic Surgery and Lung Transplantation, Foch Hospital, Suresnes, France
- Université Paris-Saclay, INRAE, UVSQ, VIM, Jouy-en-Josas, France
| | | |
Collapse
|
21
|
Lutz MW, Chiba-Falek O. Bioinformatics pipeline to guide post-GWAS studies in Alzheimer's: A new catalogue of disease candidate short structural variants. Alzheimers Dement 2023; 19:4094-4109. [PMID: 37253165 PMCID: PMC10524333 DOI: 10.1002/alz.13168] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/22/2022] [Revised: 04/27/2023] [Accepted: 05/08/2023] [Indexed: 06/01/2023]
Abstract
BACKGROUND Short structural variants (SSVs), including insertions/deletions (indels), are common in the human genome and impact disease risk. The role of SSVs in late-onset Alzheimer's disease (LOAD) has been understudied. In this study, we developed a bioinformatics pipeline of SSVs within LOAD-genome-wide association study (GWAS) regions to prioritize regulatory SSVs based on the strength of their predicted effect on transcription factor (TF) binding sites. METHODS The pipeline utilized publicly available functional genomics data sources including candidate cis-regulatory elements (cCREs) from ENCODE and single-nucleus (sn)RNA-seq data from LOAD patient samples. RESULTS We catalogued 1581 SSVs in candidate cCREs in LOAD GWAS regions that disrupted 737 TF sites. That included SSVs that disrupted the binding of RUNX3, SPI1, and SMAD3, within the APOE-TOMM40, SPI1, and MS4A6A LOAD regions. CONCLUSIONS The pipeline developed here prioritized non-coding SSVs in cCREs and characterized their putative effects on TF binding. The approach integrates multiomics datasets for validation experiments using disease models.
Collapse
Affiliation(s)
- Michael W. Lutz
- Division of Translational Brain Sciences, Department of Neurology, Duke University Medical Center, Durham, NC 27710, USA
| | - Ornit Chiba-Falek
- Division of Translational Brain Sciences, Department of Neurology, Duke University Medical Center, Durham, NC 27710, USA
- Center for Genomic and Computational Biology, Duke University Medical Center, Durham, NC 27710, USA
| |
Collapse
|
22
|
Harvey WT, Ebert P, Ebler J, Audano PA, Munson KM, Hoekzema K, Porubsky D, Beck CR, Marschall T, Garimella K, Eichler EE. Whole-genome long-read sequencing downsampling and its effect on variant calling precision and recall. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.05.04.539448. [PMID: 37205567 PMCID: PMC10187267 DOI: 10.1101/2023.05.04.539448] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Indexed: 05/21/2023]
Abstract
Advances in long-read sequencing (LRS) technology continue to make whole-genome sequencing more complete, affordable, and accurate. LRS provides significant advantages over short-read sequencing approaches, including phased de novo genome assembly, access to previously excluded genomic regions, and discovery of more complex structural variants (SVs) associated with disease. Limitations remain with respect to cost, scalability, and platform-dependent read accuracy and the tradeoffs between sequence coverage and sensitivity of variant discovery are important experimental considerations for the application of LRS. We compare the genetic variant calling precision and recall of Oxford Nanopore Technologies (ONT) and PacBio HiFi platforms over a range of sequence coverages. For read-based applications, LRS sensitivity begins to plateau around 12-fold coverage with a majority of variants called with reasonable accuracy (F1 score above 0.5), and both platforms perform well for SV detection. Genome assembly increases variant calling precision and recall of SVs and indels in HiFi datasets with HiFi outperforming ONT in quality as measured by the F1 score of assembly-based variant callsets. While both technologies continue to evolve, our work offers guidance to design cost-effective experimental strategies that do not compromise on discovering novel biology.
Collapse
Affiliation(s)
- William T. Harvey
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Peter Ebert
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
- Core Unit Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
| | - Jana Ebler
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
| | - Peter A. Audano
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
| | - Katherine M. Munson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Kendra Hoekzema
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Christine R. Beck
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Department of Genetics and Genome Sciences, University of Connecticut Health Center, Farmington, CT 06032 USA
| | - Tobias Marschall
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
| | - Kiran Garimella
- Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Evan E. Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| |
Collapse
|
23
|
Shi Y, Niu Y, Zhang P, Luo H, Liu S, Zhang S, Wang J, Li Y, Liu X, Song T, Xu T, He S. Characterization of genome-wide STR variation in 6487 human genomes. Nat Commun 2023; 14:2092. [PMID: 37045857 PMCID: PMC10097659 DOI: 10.1038/s41467-023-37690-8] [Citation(s) in RCA: 16] [Impact Index Per Article: 16.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/04/2022] [Accepted: 03/27/2023] [Indexed: 04/14/2023] Open
Abstract
Short tandem repeats (STRs) are abundant and highly mutagenic in the human genome. Many STR loci have been associated with a range of human genetic disorders. However, most population-scale studies on STR variation in humans have focused on European ancestry cohorts or are limited by sequencing depth. Here, we depicted a comprehensive map of 366,013 polymorphic STRs (pSTRs) constructed from 6487 deeply sequenced genomes, comprising 3983 Chinese samples (~31.5x, NyuWa) and 2504 samples from the 1000 Genomes Project (~33.3x, 1KGP). We found that STR mutations were affected by motif length, chromosome context and epigenetic features. We identified 3273 and 1117 pSTRs whose repeat numbers were associated with gene expression and 3'UTR alternative polyadenylation, respectively. We also implemented population analysis, investigated population differentiated signatures, and genotyped 60 known disease-causing STRs. Overall, this study further extends the scale of STR variation in humans and propels our understanding of the semantics of STRs.
Collapse
Affiliation(s)
- Yirong Shi
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Yiwei Niu
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Peng Zhang
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China
| | - Huaxia Luo
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China
| | - Shuai Liu
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Sijia Zhang
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Jiajia Wang
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China
| | - Yanyan Li
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China
| | - Xinyue Liu
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Tingrui Song
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China
| | - Tao Xu
- National Laboratory of Biomacromolecules, CAS Center for Excellence in Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China.
- Shandong First Medical University & Shandong Academy of Medical Sciences, Jinan, 250117, Shandong, China.
| | - Shunmin He
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China.
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, 100049, China.
| |
Collapse
|
24
|
Mbewe RB, Keven JB, Mangani C, Wilson ML, Mzilahowa T, Mathanga DP, Valim C, Laufer MK, Walker ED, Cohee LM. Genotyping of Anopheles mosquito blood meals reveals nonrandom human host selection: implications for human-to-mosquito Plasmodium falciparum transmission. Malar J 2023; 22:115. [PMID: 37029433 PMCID: PMC10080529 DOI: 10.1186/s12936-023-04541-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/16/2022] [Accepted: 03/22/2023] [Indexed: 04/09/2023] Open
Abstract
BACKGROUND Control of malaria parasite transmission can be enhanced by understanding which human demographic groups serve as the infectious reservoirs. Because vector biting can be heterogeneous, some infected individuals may contribute more to human-to-mosquito transmission than others. Infection prevalence peaks in school-age children, but it is not known how often they are fed upon. Genotypic profiling of human blood permits identification of individual humans who were bitten. The present investigation used this method to estimate which human demographic groups were most responsible for transmitting malaria parasites to Anopheles mosquitoes. It was hypothesized that school-age children contribute more than other demographic groups to human-to-mosquito malaria transmission. METHODS In a region of moderate-to-high malaria incidence in southeastern Malawi, randomly selected households were surveyed to collect human demographic information and blood samples. Blood-fed, female Anopheles mosquitoes were sampled indoors from the same houses. Genomic DNA from human blood samples and mosquito blood meals of human origin was genotyped using 24 microsatellite loci. The resultant genotypes were matched to identify which individual humans were sources of blood meals. In addition, Plasmodium falciparum DNA in mosquito abdomens was detected with polymerase chain reaction. The combined results were used to identify which humans were most frequently bitten, and the P. falciparum infection prevalence in mosquitoes that resulted from these blood meals. RESULTS Anopheles females selected human hosts non-randomly and fed on more than one human in 9% of the blood meals. Few humans contributed most of the blood meals to the Anopheles vector population. Children ≤ 5 years old were under-represented in mosquito blood meals while older males (31-75 years old) were over-represented. However, the largest number of malaria-infected blood meals was from school age children (6-15 years old). CONCLUSIONS The results support the hypothesis that humans aged 6-15 years are the most important demographic group contributing to the transmission of P. falciparum to the Anopheles mosquito vectors. This conclusion suggests that malaria control and prevention programmes should enhance efforts targeting school-age children and males.
Collapse
Affiliation(s)
- Rex B Mbewe
- Department of Entomology, Michigan State University, East Lansing, MI, USA.
- Department of Physics and Biochemical Sciences, Malawi University of Business and Applied Sciences, Blantyre, Malawi.
| | - John B Keven
- Department of Entomology, Michigan State University, East Lansing, MI, USA
- Department of Public Health, College of Health Sciences, University of California-Irvine, Irvine, CA, USA
| | - Charles Mangani
- Malaria Alert Center, Kamuzu University of Health Sciences, Blantyre, Malawi
| | - Mark L Wilson
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, USA
| | - Themba Mzilahowa
- Malaria Alert Center, Kamuzu University of Health Sciences, Blantyre, Malawi
| | - Don P Mathanga
- Malaria Alert Center, Kamuzu University of Health Sciences, Blantyre, Malawi
| | - Clarissa Valim
- Department of Global Health, Boston University School of Public Health, Boston, MA, USA
| | - Miriam K Laufer
- Malaria Research Program, Center for Vaccine Development and Global Health, University of Maryland School of Medicine, Baltimore, MD, USA
| | - Edward D Walker
- Department of Entomology, Michigan State University, East Lansing, MI, USA
- Department of Microbiology and Molecular Genetics, Michigan State University, East Lansing, MI, USA
| | - Lauren M Cohee
- Malaria Research Program, Center for Vaccine Development and Global Health, University of Maryland School of Medicine, Baltimore, MD, USA
| |
Collapse
|
25
|
Wright SE, Todd PK. Native functions of short tandem repeats. eLife 2023; 12:e84043. [PMID: 36940239 PMCID: PMC10027321 DOI: 10.7554/elife.84043] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/11/2022] [Accepted: 03/08/2023] [Indexed: 03/21/2023] Open
Abstract
Over a third of the human genome is comprised of repetitive sequences, including more than a million short tandem repeats (STRs). While studies of the pathologic consequences of repeat expansions that cause syndromic human diseases are extensive, the potential native functions of STRs are often ignored. Here, we summarize a growing body of research into the normal biological functions for repetitive elements across the genome, with a particular focus on the roles of STRs in regulating gene expression. We propose reconceptualizing the pathogenic consequences of repeat expansions as aberrancies in normal gene regulation. From this altered viewpoint, we predict that future work will reveal broader roles for STRs in neuronal function and as risk alleles for more common human neurological diseases.
Collapse
Affiliation(s)
- Shannon E Wright
- Department of Neurology, University of Michigan–Ann ArborAnn ArborUnited States
- Neuroscience Graduate Program, University of Michigan–Ann ArborAnn ArborUnited States
- Department of Neuroscience, Picower InstituteCambridgeUnited States
| | - Peter K Todd
- Department of Neurology, University of Michigan–Ann ArborAnn ArborUnited States
- VA Ann Arbor Healthcare SystemAnn ArborUnited States
| |
Collapse
|
26
|
Jam HZ, Li Y, DeVito R, Mousavi N, Ma N, Lujumba I, Adam Y, Maksimov M, Huang B, Dolzhenko E, Qiu Y, Kakembo FE, Joseph H, Onyido B, Adeyemi J, Bakhtiari M, Park J, Javadzadeh S, Jjingo D, Adebiyi E, Bafna V, Gymrek M. A deep population reference panel of tandem repeat variation. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.03.09.531600. [PMID: 36945429 PMCID: PMC10028971 DOI: 10.1101/2023.03.09.531600] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 03/14/2023]
Abstract
Tandem repeats (TRs) represent one of the largest sources of genetic variation in humans and are implicated in a range of phenotypes. Here we present a deep characterization of TR variation based on high coverage whole genome sequencing from 3,550 diverse individuals from the 1000 Genomes Project and H3Africa cohorts. We develop a method, EnsembleTR, to integrate genotypes from four separate methods resulting in high-quality genotypes at more than 1.7 million TR loci. Our catalog reveals novel sequence features influencing TR heterozygosity, identifies population-specific trinucleotide expansions, and finds hundreds of novel eQTL signals. Finally, we generate a phased haplotype panel which can be used to impute most TRs from nearby single nucleotide polymorphisms (SNPs) with high accuracy. Overall, the TR genotypes and reference haplotype panel generated here will serve as valuable resources for future genome-wide and population-wide studies of TRs and their role in human phenotypes.
Collapse
Affiliation(s)
- Helyaneh Ziaei Jam
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA
| | - Yang Li
- Department of Medicine, University of California San Diego, La Jolla, CA
| | - Ross DeVito
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA
| | - Nima Mousavi
- Department of Electrical and Computer Engineering, University of California San Diego, La Jolla, CA
| | - Nichole Ma
- Department of Medicine, University of California San Diego, La Jolla, CA
| | - Ibra Lujumba
- The African Center of Excellence in Bioinformatics and Data Intensive Sciences, the Infectious Diseases Institute, Makerere University, Kampala-Uganda
| | - Yagoub Adam
- Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun, 112233, Nigeria
| | - Mikhail Maksimov
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA
| | - Bonnie Huang
- Department of Bioengineering, University of California San Diego, La Jolla, CA
| | | | - Yunjiang Qiu
- Illumina Incorporated, San Diego, California 92122, USA
| | - Fredrick Elishama Kakembo
- The African Center of Excellence in Bioinformatics and Data Intensive Sciences, the Infectious Diseases Institute, Makerere University, Kampala-Uganda
| | - Habi Joseph
- The African Center of Excellence in Bioinformatics and Data Intensive Sciences, the Infectious Diseases Institute, Makerere University, Kampala-Uganda
| | - Blessing Onyido
- Department of Computer & Information Sciences, Covenant University, Ota, Ogun, 112233, Nigeria
- Covenant Applied Informatics and Communication Africa Centre of Excellence (CApIC-ACE), Covenant University, Ota, Ogun, 112233, Nigeria
| | - Jumoke Adeyemi
- Department of Computer & Information Sciences, Covenant University, Ota, Ogun, 112233, Nigeria
- Covenant Applied Informatics and Communication Africa Centre of Excellence (CApIC-ACE), Covenant University, Ota, Ogun, 112233, Nigeria
| | - Mehrdad Bakhtiari
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA
| | - Jonghun Park
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA
| | - Sara Javadzadeh
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA
| | - Daudi Jjingo
- The African Center of Excellence in Bioinformatics and Data Intensive Sciences, the Infectious Diseases Institute, Makerere University, Kampala-Uganda
- Department of Computer Science, Makerere University, Kampala, Uganda
| | - Ezekiel Adebiyi
- Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun, 112233, Nigeria
- Department of Computer & Information Sciences, Covenant University, Ota, Ogun, 112233, Nigeria
- Covenant Applied Informatics and Communication Africa Centre of Excellence (CApIC-ACE), Covenant University, Ota, Ogun, 112233, Nigeria
- Applied Bioinformatics Division, German Cancer Research Center (DKFZ), Heidelberg, Baden-Württemberg, 69120, Germany
| | - Vineet Bafna
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA
| | - Melissa Gymrek
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA
- Department of Medicine, University of California San Diego, La Jolla, CA
| |
Collapse
|
27
|
Link V, Zavaleta YJA, Reyes RJ, Ding L, Wang J, Rohlfs RV, Edge MD. Microsatellites used in forensics are located in regions unusually rich in trait-associated variants. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.03.07.531629. [PMID: 36945578 PMCID: PMC10028909 DOI: 10.1101/2023.03.07.531629] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 03/12/2023]
Abstract
The 20 short tandem repeat (STR) markers of the combined DNA index system (CODIS) are the basis of the vast majority of forensic genetics in the United States. One argument for permissive rules about the collection of CODIS genotypes is that the CODIS markers are thought to contain information relevant to identification only (such as a human fingerprint would), with little information about ancestry or traits. However, in the past 20 years, a quickly growing field has identified hundreds of thousands of genotype-trait associations. Here we conduct a survey of the landscape of such associations surrounding the CODIS loci as compared with non-CODIS STRs. We find that the regions around the CODIS markers are enriched for both known pathogenic variants (>90th percentile) and for SNPs identified as trait-associated in genome-wide association studies (GWAS) (≥95th percentile in 10kb and 100kb flanking regions), compared with other random sets of autosomal tetranucleotide-repeat STRs. Although it is not obvious how much phenotypic information CODIS would need to convey to strain the "DNA fingerprint" analogy, the CODIS markers, considered as a set, are in regions unusually dense with variants with known phenotypic associations.
Collapse
Affiliation(s)
- Vivian Link
- Department of Quantitative and Computational Biology, University of Southern California
| | | | | | - Linda Ding
- Department of Quantitative and Computational Biology, University of Southern California
| | - Judy Wang
- Department of Quantitative and Computational Biology, University of Southern California
| | - Rori V. Rohlfs
- Department of Biology, San Francisco State University
- Department of Computer Science and Institute of Ecology and Evolution, University of Oregon
| | - Michael D. Edge
- Department of Quantitative and Computational Biology, University of Southern California
| |
Collapse
|
28
|
Verbiest M, Maksimov M, Jin Y, Anisimova M, Gymrek M, Bilgin Sonay T. Mutation and selection processes regulating short tandem repeats give rise to genetic and phenotypic diversity across species. J Evol Biol 2023; 36:321-336. [PMID: 36289560 PMCID: PMC9990875 DOI: 10.1111/jeb.14106] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/08/2022] [Revised: 06/29/2022] [Accepted: 08/01/2022] [Indexed: 02/03/2023]
Abstract
Short tandem repeats (STRs) are units of 1-6 bp that repeat in a tandem fashion in DNA. Along with single nucleotide polymorphisms and large structural variations, they are among the major genomic variants underlying genetic, and likely phenotypic, divergence. STRs experience mutation rates that are orders of magnitude higher than other well-studied genotypic variants. Frequent copy number changes result in a wide range of alleles, and provide unique opportunities for modulating complex phenotypes through variation in repeat length. While classical studies have identified key roles of individual STR loci, the advent of improved sequencing technology, high-quality genome assemblies for diverse species, and bioinformatics methods for genome-wide STR analysis now enable more systematic study of STR variation across wide evolutionary ranges. In this review, we explore mutation and selection processes that affect STR copy number evolution, and how these processes give rise to varying STR patterns both within and across species. Finally, we review recent examples of functional and adaptive changes linked to STRs.
Collapse
Affiliation(s)
- Max Verbiest
- Institute of Computational Life Sciences, School of Life Sciences and Facility ManagementZürich University of Applied SciencesWädenswilSwitzerland
- Department of Molecular Life SciencesUniversity of ZurichZurichSwitzerland
- Swiss Institute of BioinformaticsLausanneSwitzerland
| | - Mikhail Maksimov
- Department of Computer Science & EngineeringUniversity of California San DiegoLa JollaCaliforniaUSA
- Department of MedicineUniversity of California San DiegoLa JollaCaliforniaUSA
| | - Ye Jin
- Department of MedicineUniversity of California San DiegoLa JollaCaliforniaUSA
- Department of BioengineeringUniversity of California San DiegoLa JollaCaliforniaUSA
| | - Maria Anisimova
- Institute of Computational Life Sciences, School of Life Sciences and Facility ManagementZürich University of Applied SciencesWädenswilSwitzerland
- Swiss Institute of BioinformaticsLausanneSwitzerland
| | - Melissa Gymrek
- Department of Computer Science & EngineeringUniversity of California San DiegoLa JollaCaliforniaUSA
- Department of MedicineUniversity of California San DiegoLa JollaCaliforniaUSA
| | - Tugce Bilgin Sonay
- Institute of Ecology, Evolution and Environmental BiologyColumbia UniversityNew YorkNew YorkUSA
| |
Collapse
|
29
|
Martin-Trujillo A, Garg P, Patel N, Jadhav B, Sharp AJ. Genome-wide evaluation of the effect of short tandem repeat variation on local DNA methylation. Genome Res 2023; 33:184-196. [PMID: 36577521 PMCID: PMC10069470 DOI: 10.1101/gr.277057.122] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/23/2022] [Accepted: 12/19/2022] [Indexed: 12/30/2022]
Abstract
Short tandem repeats (STRs) contribute significantly to genetic diversity in humans, including disease-causing variation. Although the effect of STR variation on gene expression has been extensively assessed, their impact on epigenetics has been poorly studied and limited to specific genomic regions. Here, we investigated the hypothesis that some STRs act as independent regulators of local DNA methylation in the human genome and modify risk of common human traits. To address these questions, we first analyzed two independent data sets comprising PCR-free whole-genome sequencing (WGS) and genome-wide DNA methylation levels derived from whole-blood samples in 245 (discovery cohort) and 484 individuals (replication cohort). Using genotypes for 131,635 polymorphic STRs derived from WGS using HipSTR, we identified 11,870 STRs that associated with DNA methylation levels (mSTRs) of 11,774 CpGs (Bonferroni P < 0.001) in our discovery cohort, with 90% successfully replicating in our second cohort. Subsequently, through fine-mapping using CAVIAR we defined 585 of these mSTRs as the likely causal variants underlying the observed associations (fm-mSTRs) and linked a fraction of these to previously reported genome-wide association study signals, providing insights into the mechanisms underlying complex human traits. Furthermore, by integrating gene expression data, we observed that 12.5% of the tested fm-mSTRs also modulate expression levels of nearby genes, reinforcing their regulatory potential. Overall, our findings expand the catalog of functional sequence variants that affect genome regulation, highlighting the importance of incorporating STRs in future genetic association analysis and epigenetics data for the interpretation of trait-associated variants.
Collapse
Affiliation(s)
- Alejandro Martin-Trujillo
- Department of Genetics and Genomic Sciences and Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, Hess Center for Science and Medicine, New York, New York 10029, USA
| | - Paras Garg
- Department of Genetics and Genomic Sciences and Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, Hess Center for Science and Medicine, New York, New York 10029, USA
| | - Nihir Patel
- Department of Genetics and Genomic Sciences and Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, Hess Center for Science and Medicine, New York, New York 10029, USA
| | - Bharati Jadhav
- Department of Genetics and Genomic Sciences and Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, Hess Center for Science and Medicine, New York, New York 10029, USA
| | - Andrew J Sharp
- Department of Genetics and Genomic Sciences and Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, Hess Center for Science and Medicine, New York, New York 10029, USA
| |
Collapse
|
30
|
Spruijtenburg B, van Haren MHI, Chowdhary A, Meis JF, de Groot T. Development and Application of a Short Tandem Repeat Multiplex Typing Assay for Candida tropicalis. Microbiol Spectr 2023; 11:e0461822. [PMID: 36715547 PMCID: PMC10100945 DOI: 10.1128/spectrum.04618-22] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/14/2022] [Accepted: 01/11/2023] [Indexed: 01/31/2023] Open
Abstract
Candida tropicalis is a clinically important yeast that causes candidemia in humans with a high mortality rate. The yeast primarily infects immunocompromised patients, and causes outbreaks in health care facilities. Antifungal resistant isolates have been reported. We developed a short tandem repeat (STR) typing scheme for C. tropicalis to enable fast, cost-effective, and high-resolution genotyping. For the development of the typing scheme, 6 novel STR markers were selected, combined into 2 multiplex PCRs. In total, 117 C. tropicalis isolates were typed, resulting in the identification of 104 different genotypes. Subsequently, the outcome of STR typing of 10 isolates was compared to single nucleotide polymorphism (SNP) calling from whole-genome sequencing (WGS). Isolates with more than 111 SNPs were differentiated by the typing assay. Two isolates, which were identical according to SNP analysis, were separated by STR typing in 1 marker. To test specificity, the STR typing was applied to 15 related yeast species, and we found no amplification of these targets. For reproducibility testing, 2 isolates were independently typed five times, which showed identical results in each experiment. In summary, we developed a reliable and multiplex STR genotyping for C. tropicalis, which was found to correlate well to SNP calling by WGS. WGS analysis from and extensive collection of isolates is required to establish the precise resolution of this STR assay. IMPORTANCE Candida tropicalis frequently causes candidemia in immunocompromised patients. C. tropicalis infections have a high mortality rate, and the yeast is able to cause outbreaks in health care facilities. Further, antifungal resistant isolates are on the rise. Genotyping is necessary to investigate potential outbreaks. Here, we developed and applied a STR genotyping scheme in order to rapidly genotype isolates with a high-resolution. WGS SNP outcomes were highly comparable with STR typing results. Altogether, we developed a rapid, high-resolution, and specific STR genotyping scheme for C. tropicalis.
Collapse
Affiliation(s)
- Bram Spruijtenburg
- Department of Medical Microbiology and Infectious Diseases, Canisius Wilhelmina Hospital, Nijmegen, The Netherlands
- Centre of Expertise in Mycology, Radboud University Medical Center/Canisius Wilhelmina Hospital, Nijmegen, The Netherlands
| | - Merlijn H. I. van Haren
- Department of Medical Microbiology and Infectious Diseases, Canisius Wilhelmina Hospital, Nijmegen, The Netherlands
| | - Anuradha Chowdhary
- Medical Mycology Unit, Department of Medical Microbiology, Vallabhbhai Patel Chest Institute, University of Delhi, Delhi, India
| | - Jacques F. Meis
- Department of Medical Microbiology and Infectious Diseases, Canisius Wilhelmina Hospital, Nijmegen, The Netherlands
- Centre of Expertise in Mycology, Radboud University Medical Center/Canisius Wilhelmina Hospital, Nijmegen, The Netherlands
- Department I of Internal Medicine, University of Cologne, Excellence Center for Medical Mycology, Cologne, Germany
| | - Theun de Groot
- Department of Medical Microbiology and Infectious Diseases, Canisius Wilhelmina Hospital, Nijmegen, The Netherlands
- Centre of Expertise in Mycology, Radboud University Medical Center/Canisius Wilhelmina Hospital, Nijmegen, The Netherlands
| |
Collapse
|
31
|
Gochi L, Kawai Y, Fujimoto A. Comprehensive analysis of microsatellite polymorphisms in human populations. Hum Genet 2023; 142:45-57. [PMID: 36048238 DOI: 10.1007/s00439-022-02484-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/16/2022] [Accepted: 08/24/2022] [Indexed: 01/18/2023]
Abstract
Microsatellites (MS) are tandem repeats of short units, and have been used for population genetics, individual identification, and medical genetics. However, studies of MS on a whole-genome level are limited, and genotyping methods for MS have yet to be established. Here, we analyzed approximately 8.5 million MS regions using a previously developed MS caller for short reads (MIVcall method) for three large publicly available human genome sequencing data sets: the Korean Personal Genome Project, Simons Genome Diversity Project, and Human Genome Diversity Project. Our analysis identified 253,114 polymorphic MS. A comparison among different populations suggests that MS in the coding region evolved by random genetic drift and natural selection. In an analysis of genetic structures, MS clearly revealed population structures as SNPs and detected clusters that were not found by SNPs in African and Oceanian populations. Based on the MS polymorphisms, we selected MS marker candidates for individual identification. Finally, we applied our method to a deep sequenced ancient DNA sample. This study provides a comprehensive picture of MS polymorphisms and application to human population studies.
Collapse
Affiliation(s)
- Leo Gochi
- Department of Human Genetics, Graduate School of Medicine, The University of Tokyo, Tokyo, 113-0003, Japan
| | - Yosuke Kawai
- Genome Medical Science Project, National Center for Global Health and Medicine, Tokyo, Japan
| | - Akihiro Fujimoto
- Department of Human Genetics, Graduate School of Medicine, The University of Tokyo, Tokyo, 113-0003, Japan.
| |
Collapse
|
32
|
Rare tandem repeat expansions associate with genes involved in synaptic and neuronal signaling functions in schizophrenia. Mol Psychiatry 2023; 28:475-482. [PMID: 36380236 PMCID: PMC9812781 DOI: 10.1038/s41380-022-01857-4] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 02/22/2022] [Revised: 10/14/2022] [Accepted: 10/24/2022] [Indexed: 11/17/2022]
Abstract
Tandem repeat expansions (TREs) are associated with over 60 monogenic disorders and have recently been implicated in complex disorders such as cancer and autism spectrum disorder. The role of TREs in schizophrenia is now emerging. In this study, we have performed a genome-wide investigation of TREs in schizophrenia. Using genome sequence data from 1154 Swedish schizophrenia cases and 934 ancestry-matched population controls, we have detected genome-wide rare (<0.1% population frequency) TREs that have motifs with a length of 2-20 base pairs. We find that the proportion of individuals carrying rare TREs is significantly higher in the schizophrenia group. There is a significantly higher burden of rare TREs in schizophrenia cases than in controls in genic regions, particularly in postsynaptic genes, in genes overlapping brain expression quantitative trait loci, and in brain-expressed genes that are differentially expressed between schizophrenia cases and controls. We demonstrate that TRE-associated genes are more constrained and primarily impact synaptic and neuronal signaling functions. These results have been replicated in an independent Canadian sample that consisted of 252 schizophrenia cases of European ancestry and 222 ancestry-matched controls. Our results support the involvement of rare TREs in schizophrenia etiology.
Collapse
|
33
|
Kaplun L, Krautz-Peterson G, Neerman N, Stanley C, Hussey S, Folwick M, McGarry A, Weiss S, Kaplun A. ONT long-read WGS for variant discovery and orthogonal confirmation of short read WGS derived genetic variants in clinical genetic testing. Front Genet 2023; 14:1145285. [PMID: 37152986 PMCID: PMC10160624 DOI: 10.3389/fgene.2023.1145285] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/16/2023] [Accepted: 04/05/2023] [Indexed: 05/09/2023] Open
Abstract
Technological advances in Next-Generation Sequencing dramatically increased clinical efficiency of genetic testing, allowing detection of a wide variety of variants, from single nucleotide events to large structural aberrations. Whole Genome Sequencing (WGS) has allowed exploration of areas of the genome that might not have been targeted by other approaches, such as intergenic regions. A single technique detecting all genetic variants at once is intended to expedite the diagnostic process while making it more comprehensive and efficient. Nevertheless, there are still several shortcomings that cannot be effectively addressed by short read sequencing, such as determination of the precise size of short tandem repeat (STR) expansions, phasing of potentially compound recessive variants, resolution of some structural variants and exact determination of their boundaries, etc. Therefore, in some cases variants can only be tentatively detected by short reads sequencing and require orthogonal confirmation, particularly for clinical reporting purposes. Moreover, certain regulatory authorities, for example, New York state CLIA, require orthogonal confirmation of every reportable variant. Such orthogonal confirmations often involve numerous different techniques, not necessarily available in the same laboratory and not always performed in an expedited manner, thus negating the advantages of "one-technique-for-all" approach, and making the process lengthy, prone to logistical and analytical faults, and financially inefficient. Fortunately, those weak spots of short read sequencing can be compensated by long read technology that have comparable or better detection of some types of variants while lacking the mentioned above limitations of short read sequencing. At Variantyx we have developed an integrated clinical genetic testing approach, augmenting short read WGS-based variant detection with Oxford Nanopore Technologies (ONT) long read sequencing, providing simultaneous orthogonal confirmation of all types of variants with the additional benefit of improved identification of exact size and position of the detected aberrations. The validation study of this augmented test has demonstrated that Oxford Nanopore Technologies sequencing can efficiently verify multiple types of reportable variants, thus ensuring highly reliable detection and a quick turnaround time for WGS-based clinical genetic testing.
Collapse
|
34
|
Frontanilla TS, Valle-Silva G, Ayala J, Mendes-Junior CT. Open-Access Worldwide Population STR Database Constructed Using High-Coverage Massively Parallel Sequencing Data Obtained from the 1000 Genomes Project. Genes (Basel) 2022; 13:genes13122205. [PMID: 36553472 PMCID: PMC9778533 DOI: 10.3390/genes13122205] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/15/2022] [Revised: 11/13/2022] [Accepted: 11/21/2022] [Indexed: 11/27/2022] Open
Abstract
Achieving accurate STR genotyping by using next-generation sequencing data has been challenging. To provide the forensic genetics community with a reliable open-access STR database, we conducted a comprehensive genotyping analysis of a set of STRs of broad forensic interest obtained from 1000 Genome populations. We analyzed 22 STR markers using files of the high-coverage dataset of Phase 3 of the 1000 Genomes Project. We used HipSTR to call genotypes from 2504 samples obtained from 26 populations. We were not able to detect the D21S11 marker. The Hardy-Weinberg equilibrium analysis coupled with a comprehensive analysis of allele frequencies revealed that HipSTR was not able to identify longer alleles, which resulted in heterozygote deficiency. Nevertheless, AMOVA, a clustering analysis that uses STRUCTURE, and a Principal Coordinates Analysis showed a clear-cut separation between the four major ancestries sampled by the 1000 Genomes Consortium. Except for larger Penta D and Penta E alleles, and two very small Penta D alleles (2.2 and 3.2) usually observed in African populations, our analyses revealed that allele frequencies and genotypes offered as an open-access database are consistent and reliable.
Collapse
Affiliation(s)
- Tamara Soledad Frontanilla
- Departamento de Genética, Faculdade de Medicina de Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto 14049-900, SP, Brazil
| | - Guilherme Valle-Silva
- Departamento de Química, Laboratório de Pesquisas Forenses e Genômicas, Faculdade de Filosofia, Ciências e Letras de Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto 14040-901, SP, Brazil
| | - Jesus Ayala
- Facultad de Ingeniería Informática, Universidad de la Integración de las Americas, Asunción 00120-6, Paraguay
| | - Celso Teixeira Mendes-Junior
- Departamento de Química, Laboratório de Pesquisas Forenses e Genômicas, Faculdade de Filosofia, Ciências e Letras de Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto 14040-901, SP, Brazil
- Correspondence:
| |
Collapse
|
35
|
Assessing Sequence Variation and Genetic Diversity of Currently Untapped Y-STR Loci. FORENSIC SCIENCE INTERNATIONAL: REPORTS 2022. [DOI: 10.1016/j.fsir.2022.100298] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022] Open
|
36
|
de Groot T, Spruijtenburg B, Parnell LA, Chow NA, Meis JF. Optimization and Validation of Candida auris Short Tandem Repeat Analysis. Microbiol Spectr 2022; 10:e0264522. [PMID: 36190407 PMCID: PMC9603409 DOI: 10.1128/spectrum.02645-22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/12/2022] [Accepted: 09/06/2022] [Indexed: 01/04/2023] Open
Abstract
Candida auris is an easily transmissible yeast with resistance to different antifungal compounds. Outbreaks of C. auris are mostly observed in intensive care units. To take adequate measures during an outbreak, it is essential to understand the transmission route, which requires isolate genotyping. In 2019, a short tandem repeat (STR) genotyping analysis was developed for C. auris. To determine the discriminatory power of this method, we performed STR analysis of 171 isolates with known whole-genome sequencing (WGS) data using Illumina reads, and we compared their resolutions. We found that STR analysis separated the 171 isolates into four clades (clades I to IV), as was also seen with WGS analysis. Then, to improve the separation of isolates in clade IV, the STR assay was optimized by the addition of 2 STR markers. With this improved STR assay, a total of 32 different genotypes were identified, while all isolates with differences of >50 single-nucleotide polymorphisms (SNPs) were separated by at least 1 STR marker. Altogether, we optimized and validated the C. auris STR panel for clades I to IV and established its discriminatory power, compared to WGS SNP analysis using Illumina reads. IMPORTANCE The emerging fungal pathogen Candida auris poses a threat to public health, mainly causing outbreaks in intensive care units. Genotyping is essential for investigating potential outbreaks and preventing further spread. Previously, we developed a STR genotyping scheme for rapid and high-resolution genotyping, and WGS SNP outcomes for some isolates were compared to STR data. Here, we compared WGS SNP and STR outcomes for a larger sample cohort. Also, we optimized the resolution of this typing scheme with the addition of 2 STR markers. Altogether, we validated and optimized this rapid, reliable, and high-resolution typing scheme for C. auris.
Collapse
Affiliation(s)
- Theun de Groot
- Department of Medical Microbiology and Infectious Diseases, Canisius Wilhelmina Hospital, Nijmegen, The Netherlands
- Centre of Expertise in Mycology, Radboud University Medical Center/Canisius Wilhelmina Hospital, Nijmegen, The Netherlands
| | - Bram Spruijtenburg
- Department of Medical Microbiology and Infectious Diseases, Canisius Wilhelmina Hospital, Nijmegen, The Netherlands
- Centre of Expertise in Mycology, Radboud University Medical Center/Canisius Wilhelmina Hospital, Nijmegen, The Netherlands
| | - Lindsay A. Parnell
- Mycotic Diseases Branch, Centers for Disease Control and Prevention, Atlanta, Georgia, USA
| | - Nancy A. Chow
- Mycotic Diseases Branch, Centers for Disease Control and Prevention, Atlanta, Georgia, USA
| | - Jacques F. Meis
- Department of Medical Microbiology and Infectious Diseases, Canisius Wilhelmina Hospital, Nijmegen, The Netherlands
- Centre of Expertise in Mycology, Radboud University Medical Center/Canisius Wilhelmina Hospital, Nijmegen, The Netherlands
| |
Collapse
|
37
|
Bañuelos MM, Zavaleta YJA, Roldan A, Reyes RJ, Guardado M, Chavez Rojas B, Nyein T, Rodriguez Vega A, Santos M, Huerta-Sanchez E, Rohlfs RV. Associations between forensic loci and expression levels of neighboring genes may compromise medical privacy. Proc Natl Acad Sci U S A 2022; 119:e2121024119. [PMID: 36166477 PMCID: PMC9546536 DOI: 10.1073/pnas.2121024119] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/18/2021] [Accepted: 08/29/2022] [Indexed: 11/18/2022] Open
Abstract
A set of 20 short tandem repeats (STRs) is used by the US criminal justice system to identify suspects and to maintain a database of genetic profiles for individuals who have been previously convicted or arrested. Some of these STRs were identified in the 1990s, with a preference for markers in putative gene deserts to avoid forensic profiles revealing protected medical information. We revisit that assumption, investigating whether forensic genetic profiles reveal information about gene-expression variation or potential medical information. We find six significant correlations (false discovery rate = 0.23) between the forensic STRs and the expression levels of neighboring genes in lymphoblastoid cell lines. We explore possible mechanisms for these associations, showing evidence compatible with forensic STRs causing expression variation or being in linkage disequilibrium with a causal locus in three cases and weaker or potentially spurious associations in the other three cases. Together, these results suggest that forensic genetic loci may reveal expression levels and, perhaps, medical information.
Collapse
Affiliation(s)
- Mayra M. Bañuelos
- Department of Mathematics, San Francisco State University, San Francisco, CA 94132
- Ecology, Evolution and Organismal Biology, Brown University, Providence, RI 02912
- Center for Computational and Molecular Biology, Brown University, Providence, RI 02912
| | | | - Alennie Roldan
- Department of Biology, San Francisco State University, San Francisco, CA 94132
| | - Rochelle-Jan Reyes
- Department of Biology, San Francisco State University, San Francisco, CA 94132
| | - Miguel Guardado
- Department of Mathematics, San Francisco State University, San Francisco, CA 94132
| | | | - Thet Nyein
- Department of Mathematics, San Francisco State University, San Francisco, CA 94132
| | - Ana Rodriguez Vega
- Department of Biology, San Francisco State University, San Francisco, CA 94132
| | - Maribel Santos
- Department of Biology, San Francisco State University, San Francisco, CA 94132
| | - Emilia Huerta-Sanchez
- Ecology, Evolution and Organismal Biology, Brown University, Providence, RI 02912
- Center for Computational and Molecular Biology, Brown University, Providence, RI 02912
| | - Rori V. Rohlfs
- Department of Biology, San Francisco State University, San Francisco, CA 94132
| |
Collapse
|
38
|
Crysup B, Woerner AE. A genotype likelihood function for DNA mixtures. Forensic Sci Int Genet 2022; 61:102776. [PMID: 36152508 DOI: 10.1016/j.fsigen.2022.102776] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/12/2022] [Revised: 08/10/2022] [Accepted: 09/14/2022] [Indexed: 11/04/2022]
Abstract
The recent advent of genetic genealogy has brought about a renewed interest in genome-scale forensic analyses, of which kinship estimation is a critical component. Most genomic kinship estimators consider SNPs (single nucleotide polymorphisms), often leveraging the co-inheritance of shared alleles to inform their analyses. While current estimators cannot directly evaluate mixed samples, there exist well-established SNP-based kinship estimators tailored to considering challenged samples, including low-pass whole genome sequencing. As an example, several studies have shown remarkable success in imputing genotype posterior probabilities in low template samples when linked sites are considered. Critical to these approaches is the ability to account for genotype uncertainty; the lack of an expression for a genotype likelihood in imbalanced mixtures has prevented direct application. This work develops such an expression. The formulation is fully compatible with genotype imputation software, suggesting a genomic pipeline that estimates genotype likelihoods, performs imputation, and then estimates kinship when the sample is a mixture. Further, when framed as an imbalanced mixture, the problem of mixture deconvolution is reducible to the problem of genotyping mixed samples. Herein, the ability to genotype two-person mixtures is assessed through example and in silico settings. While certain mixture scenarios and classes of sites are inherently inseparable, simulations of read depths between 60 and 190 appear to produce likelihoods of sufficient magnitude to deconvolve two-person mixtures whenever the mixture fraction is moderately imbalanced. The described approach and results suggest a path forward for estimating the kinship coefficient (and similar inferences on relatedness) when the sample is a mixture.
Collapse
Affiliation(s)
- Benjamin Crysup
- Center for Human Identification, University of North Texas Health Science Center, Fort Worth, TX, USA
| | - August E Woerner
- Center for Human Identification, University of North Texas Health Science Center, Fort Worth, TX, USA; Department of Microbiology, Immunology and Genetics, University of North Texas Health Science, USA.
| |
Collapse
|
39
|
Detection of repeat expansions in large next generation DNA and RNA sequencing data without alignment. Sci Rep 2022; 12:13124. [PMID: 35907931 PMCID: PMC9338934 DOI: 10.1038/s41598-022-17267-z] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/25/2022] [Accepted: 07/22/2022] [Indexed: 11/10/2022] Open
Abstract
Bioinformatic methods for detecting short tandem repeat expansions in short-read sequencing have identified new repeat expansions in humans, but require alignment information to identify repetitive motif enrichment at genomic locations. We present superSTR, an ultrafast method that does not require alignment. superSTR is used to process whole-genome and whole-exome sequencing data, and perform the first STR analysis of the UK Biobank, efficiently screening and identifying known and potential disease-associated STRs in the exomes of 49,953 biobank participants. We demonstrate the first bioinformatic screening of RNA sequencing data to detect repeat expansions in humans and mouse models of ataxia and dystrophy.
Collapse
|
40
|
Zupanič Pajnič I, Previderè C, Zupanc T, Zanon M, Fattorini P. Isometric artifacts from polymerase chain reaction‐massively parallel sequencing analysis of short tandem repeat loci: An emerging issue from a new technology? Electrophoresis 2022; 43:1521-1530. [PMID: 35358339 PMCID: PMC9543752 DOI: 10.1002/elps.202100143] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/14/2021] [Revised: 01/28/2022] [Accepted: 03/26/2022] [Indexed: 11/19/2022]
Abstract
The recent introduction of polymerase chain reaction (PCR)‐massively parallel sequencing (MPS) technologies in forensics has changed the approach to allelic short tandem repeat (STR) typing because sequencing cloned PCR fragments enables alleles with identical molecular weights to be distinguished based on their nucleotide sequences. Therefore, because PCR fidelity mainly depends on template integrity, new technical issues could arise in the interpretation of the results obtained from the degraded samples. In this work, a set of DNA samples degraded in vitro was used to investigate whether PCR‐MPS could generate “isometric drop‐ins” (IDIs; i.e., molecular products having the same length as the original allele but with a different nucleotide sequence within the repeated units). The Precision ID GlobalFiler NGS STR panel kit was used to analyze 0.5 and 1 ng of mock samples in duplicate tests (for a total of 16 PCR‐MPS analyses). As expected, several well‐known PCR artifacts (such as allelic dropout, stutters above the threshold) were scored; 95 IDIs with an average occurrence of 5.9 IDIs per test (min: 1, max: 11) were scored as well. In total, IDIs represented one of the most frequent artifacts. The coverage of these IDIs reached up to 981 reads (median: 239 reads), and the ratios with the coverage of the original allele ranged from 0.069 to 7.285 (median: 0.221). In addition, approximately 5.2% of the IDIs showed coverage higher than that of the original allele. Molecular analysis of these artifacts showed that they were generated in 96.8% of cases through a single nucleotide change event, with the C > T transition being the most frequent (85.7%). Thus, in a forensic evaluation of evidence, IDIs may represent an actual issue, particularly when DNA mixtures need to be interpreted because they could mislead the operator regarding the number of contributors. Overall, the molecular features of the IDIs described in this work, as well as the performance of duplicate tests, may be useful tools for managing this new class of artifacts otherwise not detected by capillary electrophoresis technology.
Collapse
Affiliation(s)
- Irena Zupanič Pajnič
- Institute of Forensic Medicine Faculty of Medicine University of Ljubljana Ljubljana Slovenia
| | - Carlo Previderè
- Department of Public Health Experimental and Forensic Medicine Section of Legal Medicine and Forensic Sciences University of Pavia Pavia Italy
| | - Tomaž Zupanc
- Institute of Forensic Medicine Faculty of Medicine University of Ljubljana Ljubljana Slovenia
| | - Martina Zanon
- Department of Medicine, Surgery and Health University of Trieste Trieste Italy
| | - Paolo Fattorini
- Department of Medicine, Surgery and Health University of Trieste Trieste Italy
| |
Collapse
|
41
|
Boldyreva LV, Andreyeva EN, Pindyurin AV. Position Effect Variegation: Role of the Local Chromatin Context in Gene Expression Regulation. Mol Biol 2022. [DOI: 10.1134/s0026893322030049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/23/2022]
|
42
|
Liu Z, Zhao G, Xiao Y, Zeng S, Yuan Y, Zhou X, Fang Z, He R, Li B, Zhao Y, Pan H, Wang Y, Yu G, Peng IF, Wang D, Meng Q, Xu Q, Sun Q, Yan X, Shen L, Jiang H, Xia K, Wang J, Guo J, Liang F, Li J, Tang B. Profiling the Genome-Wide Landscape of Short Tandem Repeats by Long-Read Sequencing. Front Genet 2022; 13:810595. [PMID: 35601492 PMCID: PMC9117641 DOI: 10.3389/fgene.2022.810595] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/07/2021] [Accepted: 03/30/2022] [Indexed: 11/23/2022] Open
Abstract
Background: Short tandem repeats (STRs) are highly variable elements that play a pivotal role in multiple genetic diseases and the regulation of gene expression. Long-read sequencing (LRS) offers a potential solution to genome-wide STR analysis. However, characterizing STRs in human genomes using LRS on a large population scale has not been reported. Methods: We conducted the large LRS-based STR analysis in 193 unrelated samples of the Chinese population and performed genome-wide profiling of STR variation in the human genome. The repeat dynamic index (RDI) was introduced to evaluate the variability of STR. We sourced the expression data from the Genotype-Tissue Expression to explore the tissue specificity of highly variable STRs related genes across tissues. Enrichment analyses were also conducted to identify potential functional roles of the high variable STRs. Results: This study reports the large-scale analysis of human STR variation by LRS and offers a reference STR database based on the LRS dataset. We found that the disease-associated STRs (dSTRs) and STRs associated with the expression of nearby genes (eSTRs) were highly variable in the general population. Moreover, tissue-specific expression analysis showed that those highly variable STRs related genes presented the highest expression level in brain tissues, and enrichment pathways analysis found those STRs are involved in synaptic function-related pathways. Conclusion: Our study profiled the genome-wide landscape of STR using LRS and highlighted the highly variable STRs in the human genome, which provide a valuable resource for studying the role of STRs in human disease and complex traits.
Collapse
Affiliation(s)
- Zhenhua Liu
- Department of Neurology, Xiangya Hospital, Central South University, Changsha, China
| | - Guihu Zhao
- National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, China
| | | | - Sheng Zeng
- Department of Geriatrics, The Second Xiangya Hospital, Central South University, Changsha, China
| | - Yanchun Yuan
- Department of Neurology, Xiangya Hospital, Central South University, Changsha, China
| | - Xun Zhou
- Department of Neurology, Xiangya Hospital, Central South University, Changsha, China
| | - Zhenghuan Fang
- National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, China
| | - Runcheng He
- Department of Neurology, Xiangya Hospital, Central South University, Changsha, China
| | - Bin Li
- National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, China
| | - Yuwen Zhao
- Department of Neurology, Xiangya Hospital, Central South University, Changsha, China
| | - Hongxu Pan
- Department of Neurology, Xiangya Hospital, Central South University, Changsha, China
| | - Yige Wang
- Department of Neurology, Xiangya Hospital, Central South University, Changsha, China
| | | | | | | | - Qingtuan Meng
- Multi-Omics Research Center for Brain Disorders, The First Affiliated Hospital of University of South China, Hengyang, China
| | - Qian Xu
- Department of Neurology, Xiangya Hospital, Central South University, Changsha, China
| | - Qiying Sun
- Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, China
| | - Xinxiang Yan
- Department of Neurology, Xiangya Hospital, Central South University, Changsha, China
| | - Lu Shen
- Department of Neurology, Xiangya Hospital, Central South University, Changsha, China
- National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, China
| | - Hong Jiang
- Department of Neurology, Xiangya Hospital, Central South University, Changsha, China
- Key Laboratory of Hunan Province in Neurodegenerative Disorders, Central South University, Changsha, China
| | - Kun Xia
- Centre for Medical Genetics and Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha, China
| | - Junling Wang
- Department of Neurology, Xiangya Hospital, Central South University, Changsha, China
| | - Jifeng Guo
- Department of Neurology, Xiangya Hospital, Central South University, Changsha, China
| | - Fan Liang
- GrandOmics Biosciences, Beijing, China
- *Correspondence: Beisha Tang, ; Jinchen Li, ; Fan Liang,
| | - Jinchen Li
- National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, China
- Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, China
- Centre for Medical Genetics and Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha, China
- *Correspondence: Beisha Tang, ; Jinchen Li, ; Fan Liang,
| | - Beisha Tang
- Department of Neurology, Xiangya Hospital, Central South University, Changsha, China
- National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, China
- Multi-Omics Research Center for Brain Disorders, The First Affiliated Hospital of University of South China, Hengyang, China
- Key Laboratory of Hunan Province in Neurodegenerative Disorders, Central South University, Changsha, China
- *Correspondence: Beisha Tang, ; Jinchen Li, ; Fan Liang,
| |
Collapse
|
43
|
Kasai S, Nishizawa D, Hasegawa J, Fukuda KI, Ichinohe T, Nagashima M, Hayashida M, Ikeda K. Short Tandem Repeat Variation in the CNR1 Gene Associated With Analgesic Requirements of Opioids in Postoperative Pain Management. Front Genet 2022; 13:815089. [PMID: 35360861 PMCID: PMC8963810 DOI: 10.3389/fgene.2022.815089] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/15/2021] [Accepted: 02/02/2022] [Indexed: 11/25/2022] Open
Abstract
Short tandem repeats (STRs) and variable number of tandem repeats (VNTRs) that have been identified at approximately 0.7 and 0.5 million loci in the human genome, respectively, are highly multi-allelic variations rather than single-nucleotide polymorphisms. The number of repeats of more than a few thousand STRs was associated with the expression of nearby genes, indicating that STRs are influential genetic variations in human traits. Analgesics act on the central nervous system via their intrinsic receptors to produce analgesic effects. In the present study, we focused on STRs and VNTRs in the CNR1, GRIN2A, PENK, and PDYN genes and analyzed two peripheral pain sensation-related traits and seven analgesia-related traits in postoperative pain management. A total of 192 volunteers who underwent the peripheral pain sensation tests and 139 and 252 patients who underwent open abdominal and orthognathic cosmetic surgeries, respectively, were included in the study. None of the four STRs or VNTRs were associated with peripheral pain sensation. Short tandem repeats in the CNR1, GRIN2A, and PENK genes were associated with the frequency of fentanyl use, fentanyl dose, and visual analog scale pain scores 3 h after orthognathic cosmetic surgery (Spearman’s rank correlation coefficient ρ = 0.199, p = 0.002, ρ = 0.174, p = 0.006, and ρ = 0.135, p = 0.033, respectively), analgesic dose, including epidural analgesics after open abdominal surgery (ρ = −0.200, p = 0.018), and visual analog scale pain scores 24 h after orthognathic cosmetic surgery (ρ = 0.143, p = 0.023), respectively. The associations between STRs in the CNR1 gene and the frequency of fentanyl use and fentanyl dose after orthognathic cosmetic surgery were confirmed by Holm’s multiple-testing correction. These findings indicate that STRs in the CNR1 gene influence analgesia in the orofacial region.
Collapse
Affiliation(s)
- Shinya Kasai
- Addictive Substance Project, Department of Psychiatry and Behavioral Sciences, Tokyo Metropolitan Institute of Medical Science, Tokyo, Japan
| | - Daisuke Nishizawa
- Addictive Substance Project, Department of Psychiatry and Behavioral Sciences, Tokyo Metropolitan Institute of Medical Science, Tokyo, Japan
| | - Junko Hasegawa
- Addictive Substance Project, Department of Psychiatry and Behavioral Sciences, Tokyo Metropolitan Institute of Medical Science, Tokyo, Japan
| | - Ken-ichi Fukuda
- Department of Oral Health Science, Tokyo Dental College, Tokyo, Japan
| | - Tatsuya Ichinohe
- Department of Dental Anesthesiology, Tokyo Dental College, Tokyo, Japan
| | - Makoto Nagashima
- Department of Surgery, Toho University Sakura Medical Center, Sakura, Japan
| | - Masakazu Hayashida
- Department of Anesthesiology and Pain Medicine, Juntendo University School of Medicine, Tokyo, Japan
| | - Kazutaka Ikeda
- Addictive Substance Project, Department of Psychiatry and Behavioral Sciences, Tokyo Metropolitan Institute of Medical Science, Tokyo, Japan
- *Correspondence: Kazutaka Ikeda,
| |
Collapse
|
44
|
Analysis and comparison of the STR genotypes called with HipSTR, STRait Razor and toaSTR by using next generation sequencing data in a Brazilian population sample. Forensic Sci Int Genet 2022; 58:102676. [DOI: 10.1016/j.fsigen.2022.102676] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/25/2021] [Revised: 01/18/2022] [Accepted: 02/02/2022] [Indexed: 12/31/2022]
|
45
|
Chen J, Li F, Wang M, Li J, Marquez-Lago TT, Leier A, Revote J, Li S, Liu Q, Song J. BigFiRSt: A Software Program Using Big Data Technique for Mining Simple Sequence Repeats From Large-Scale Sequencing Data. Front Big Data 2022; 4:727216. [PMID: 35118375 PMCID: PMC8805145 DOI: 10.3389/fdata.2021.727216] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/18/2021] [Accepted: 12/13/2021] [Indexed: 11/22/2022] Open
Abstract
Background Simple Sequence Repeats (SSRs) are short tandem repeats of nucleotide sequences. It has been shown that SSRs are associated with human diseases and are of medical relevance. Accordingly, a variety of computational methods have been proposed to mine SSRs from genomes. Conventional methods rely on a high-quality complete genome to identify SSRs. However, the sequenced genome often misses several highly repetitive regions. Moreover, many non-model species have no entire genomes. With the recent advances of next-generation sequencing (NGS) techniques, large-scale sequence reads for any species can be rapidly generated using NGS. In this context, a number of methods have been proposed to identify thousands of SSR loci within large amounts of reads for non-model species. While the most commonly used NGS platforms (e.g., Illumina platform) on the market generally provide short paired-end reads, merging overlapping paired-end reads has become a common way prior to the identification of SSR loci. This has posed a big data analysis challenge for traditional stand-alone tools to merge short read pairs and identify SSRs from large-scale data. Results In this study, we present a new Hadoop-based software program, termed BigFiRSt, to address this problem using cutting-edge big data technology. BigFiRSt consists of two major modules, BigFLASH and BigPERF, implemented based on two state-of-the-art stand-alone tools, FLASH and PERF, respectively. BigFLASH and BigPERF address the problem of merging short read pairs and mining SSRs in the big data manner, respectively. Comprehensive benchmarking experiments show that BigFiRSt can dramatically reduce the execution times of fast read pairs merging and SSRs mining from very large-scale DNA sequence data. Conclusions The excellent performance of BigFiRSt mainly resorts to the Big Data Hadoop technology to merge read pairs and mine SSRs in parallel and distributed computing on clusters. We anticipate BigFiRSt will be a valuable tool in the coming biological Big Data era.
Collapse
Affiliation(s)
- Jinxiang Chen
- Department of Software Engineering, College of Information Engineering, Northwest A&F University, Yangling, China
| | - Fuyi Li
- Department of Biochemistry and Molecular Biology, Biomedicine Discovery Institute, Monash University, Melbourne, VIC, Australia
- Monash Centre for Data Science, Monash University, Melbourne, VIC, Australia
- Department of Microbiology and Immunity, The Peter Doherty Institute for Infection and Immunity, The University of Melbourne, Melbourne, VIC, Australia
| | - Miao Wang
- Department of Software Engineering, College of Information Engineering, Northwest A&F University, Yangling, China
| | - Junlong Li
- Department of Software Engineering, College of Information Engineering, Northwest A&F University, Yangling, China
| | - Tatiana T. Marquez-Lago
- Department of Genetics, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, United States
- Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, United States
| | - André Leier
- Department of Genetics, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, United States
- Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, United States
| | - Jerico Revote
- Department of Biochemistry and Molecular Biology, Biomedicine Discovery Institute, Monash University, Melbourne, VIC, Australia
| | - Shuqin Li
- Department of Software Engineering, College of Information Engineering, Northwest A&F University, Yangling, China
| | - Quanzhong Liu
- Department of Software Engineering, College of Information Engineering, Northwest A&F University, Yangling, China
- Quanzhong Liu
| | - Jiangning Song
- Department of Biochemistry and Molecular Biology, Biomedicine Discovery Institute, Monash University, Melbourne, VIC, Australia
- Monash Centre for Data Science, Monash University, Melbourne, VIC, Australia
- *Correspondence: Jiangning Song
| |
Collapse
|
46
|
Han J, Munro JE, Kocoski A, Barry AE, Bahlo M. Population-level genome-wide STR discovery and validation for population structure and genetic diversity assessment of Plasmodium species. PLoS Genet 2022; 18:e1009604. [PMID: 35007277 PMCID: PMC8782505 DOI: 10.1371/journal.pgen.1009604] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/20/2021] [Revised: 01/21/2022] [Accepted: 12/14/2021] [Indexed: 11/18/2022] Open
Abstract
Short tandem repeats (STRs) are highly informative genetic markers that have been used extensively in population genetics analysis. They are an important source of genetic diversity and can also have functional impact. Despite the availability of bioinformatic methods that permit large-scale genome-wide genotyping of STRs from whole genome sequencing data, they have not previously been applied to sequencing data from large collections of malaria parasite field samples. Here, we have genotyped STRs using HipSTR in more than 3,000 Plasmodium falciparum and 174 Plasmodium vivax published whole-genome sequence data from samples collected across the globe. High levels of noise and variability in the resultant callset necessitated the development of a novel method for quality control of STR genotype calls. A set of high-quality STR loci (6,768 from P. falciparum and 3,496 from P. vivax) were used to study Plasmodium genetic diversity, population structures and genomic signatures of selection and these were compared to genome-wide single nucleotide polymorphism (SNP) genotyping data. In addition, the genome-wide information about genetic variation and other characteristics of STRs in P. falciparum and P. vivax have been available in an interactive web-based R Shiny application PlasmoSTR (https://github.com/bahlolab/PlasmoSTR).
Collapse
Affiliation(s)
- Jiru Han
- Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
- Department of Medical Biology, The University of Melbourne, Melbourne, Australia
| | - Jacob E. Munro
- Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
- Department of Medical Biology, The University of Melbourne, Melbourne, Australia
| | - Anthony Kocoski
- Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
- Department of Mathematics and Statistics, The University of Melbourne, Melbourne, Australia
| | - Alyssa E. Barry
- Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
- Department of Medical Biology, The University of Melbourne, Melbourne, Australia
- Disease Elimination Program, Burnet Institute, Melbourne, Australia
- IMPACT Institute for Innovation in Mental and Physical Health and Clinical Translation, Deakin University, Geelong, Australia
| | - Melanie Bahlo
- Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
- Department of Medical Biology, The University of Melbourne, Melbourne, Australia
- * E-mail:
| |
Collapse
|
47
|
Xiao X, Zhang CY, Zhang Z, Hu Z, Li M, Li T. Revisiting tandem repeats in psychiatric disorders from perspectives of genetics, physiology, and brain evolution. Mol Psychiatry 2022; 27:466-475. [PMID: 34650204 DOI: 10.1038/s41380-021-01329-1] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 05/22/2021] [Revised: 09/16/2021] [Accepted: 09/28/2021] [Indexed: 01/28/2023]
Abstract
Genome-wide association studies (GWASs) have revealed substantial genetic components comprised of single nucleotide polymorphisms (SNPs) in the heritable risk of psychiatric disorders. However, genetic risk factors not covered by GWAS also play pivotal roles in these illnesses. Tandem repeats, which are likely functional but frequently overlooked by GWAS, may account for an important proportion in the "missing heritability" of psychiatric disorders. Despite difficulties in characterizing and quantifying tandem repeats in the genome, studies have been carried out in an attempt to describe impact of tandem repeats on gene regulation and human phenotypes. In this review, we have introduced recent research progress regarding the genomic distribution and regulatory mechanisms of tandem repeats. We have also summarized the current knowledge of the genetic architecture and biological underpinnings of psychiatric disorders brought by studies of tandem repeats. These findings suggest that tandem repeats, in candidate psychiatric risk genes or in different levels of linkage disequilibrium (LD) with psychiatric GWAS SNPs and haplotypes, may modulate biological phenotypes related to psychiatric disorders (e.g., cognitive function and brain physiology) through regulating alternative splicing, promoter activity, enhancer activity and so on. In addition, many tandem repeats undergo tight natural selection in the human lineage, and likely exert crucial roles in human brain evolution. Taken together, the putative roles of tandem repeats in the pathogenesis of psychiatric disorders is strongly implicated, and using examples from previous literatures, we wish to call for further attention to tandem repeats in the post-GWAS era of psychiatric disorders.
Collapse
Affiliation(s)
- Xiao Xiao
- Key Laboratory of Animal Models and Human Disease Mechanisms of the Chinese Academy of Sciences and Yunnan Province, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China
| | - Chu-Yi Zhang
- Key Laboratory of Animal Models and Human Disease Mechanisms of the Chinese Academy of Sciences and Yunnan Province, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China.,Kunming College of Life Science, University of Chinese Academy of Sciences, Kunming, Yunnan, China
| | - Zhuohua Zhang
- Institute of Molecular Precision Medicine and Hunan Key Laboratory of Molecular Precision Medicine, Xiangya Hospital, Central South University, Changsha, Hunan, China.,Center for Medical Genetics and Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha, Hunan, China
| | - Zhonghua Hu
- Institute of Molecular Precision Medicine and Hunan Key Laboratory of Molecular Precision Medicine, Xiangya Hospital, Central South University, Changsha, Hunan, China. .,Center for Medical Genetics and Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha, Hunan, China. .,Department of Critical Care Medicine, Xiangya Hospital, Central South University, Changsha, Hunan, China. .,National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, Hunan, China. .,Hunan Key Laboratory of Animal Models for Human Diseases, School of Life Sciences, Central South University, Changsha, Hunan, China. .,Eye Center of Xiangya Hospital and Hunan Key Laboratory of Ophthalmology, Central South University, Changsha, Hunan, China. .,National Clinical Research Center on Mental Disorders, Changsha, Hunan, China.
| | - Ming Li
- Key Laboratory of Animal Models and Human Disease Mechanisms of the Chinese Academy of Sciences and Yunnan Province, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China. .,CAS Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai, China. .,KIZ-CUHK Joint Laboratory of Bioresources and Molecular Research in Common Diseases, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China.
| | - Tao Li
- Affiliated Mental Health Center & Hangzhou Seventh People's Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China. .,Guangdong-Hong Kong-Macao Greater Bay Area Center for Brain Science and Brain-Inspired Intelligence, Guangzhou, China.
| |
Collapse
|
48
|
Maceda I, Lao O. Analysis of the Batch Effect Due to Sequencing Center in Population Statistics Quantifying Rare Events in the 1000 Genomes Project. Genes (Basel) 2021; 13:genes13010044. [PMID: 35052384 PMCID: PMC8775088 DOI: 10.3390/genes13010044] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/30/2021] [Revised: 12/19/2021] [Accepted: 12/21/2021] [Indexed: 12/01/2022] Open
Abstract
The 1000 Genomes Project (1000G) is one of the most popular whole genome sequencing datasets used in different genomics fields and has boosting our knowledge in medical and population genomics, among other fields. Recent studies have reported the presence of ghost mutation signals in the 1000G. Furthermore, studies have shown that these mutations can influence the outcomes of follow-up studies based on the genetic variation of 1000G, such as single nucleotide variants (SNV) imputation. While the overall effect of these ghost mutations can be considered negligible for common genetic variants in many populations, the potential bias remains unclear when studying low frequency genetic variants in the population. In this study, we analyze the effect of the sequencing center in predicted loss of function (LoF) alleles, the number of singletons, and the patterns of archaic introgression in the 1000G. Our results support previous studies showing that the sequencing center is associated with LoF and singletons independent of the population that is considered. Furthermore, we observed that patterns of archaic introgression were distorted for some populations depending on the sequencing center. When analyzing the frequency of SNPs showing extreme patterns of genotype differentiation among centers for CEU, YRI, CHB, and JPT, we observed that the magnitude of the sequencing batch effect was stronger at MAF < 0.2 and showed different profiles between CHB and the other populations. All these results suggest that data from 1000G must be interpreted with caution when considering statistics using variants at low frequency.
Collapse
Affiliation(s)
- Iago Maceda
- Population Genomics, CNAG-CRG, Centre for Genomic Regulation, 08028 Barcelona, Spain;
- Barcelona Institute of Science and Technology (BIST), 08036 Barcelona, Spain
- Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain
| | - Oscar Lao
- Population Genomics, CNAG-CRG, Centre for Genomic Regulation, 08028 Barcelona, Spain;
- Barcelona Institute of Science and Technology (BIST), 08036 Barcelona, Spain
- Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain
- Correspondence:
| |
Collapse
|
49
|
The molecular pathogenesis of repeat expansion diseases. Biochem Soc Trans 2021; 50:119-134. [PMID: 34940797 DOI: 10.1042/bst20200143] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/24/2021] [Revised: 11/30/2021] [Accepted: 12/06/2021] [Indexed: 12/28/2022]
Abstract
Expanded short tandem repeats in the genome cause various monogenic diseases, particularly neurological disorders. Since the discovery of a CGG repeat expansion in the FMR1 gene in 1991, more than 40 repeat expansion diseases have been identified to date. In the coding repeat expansion diseases, in which the expanded repeat sequence is located in the coding regions of genes, the toxicity of repeat polypeptides, particularly misfolding and aggregation of proteins containing an expanded polyglutamine tract, have been the focus of investigation. On the other hand, in the non-coding repeat expansion diseases, in which the expanded repeat sequence is located in introns or untranslated regions, the toxicity of repeat RNAs has been the focus of investigation. Recently, these repeat RNAs were demonstrated to be translated into repeat polypeptides by the novel mechanism of repeat-associated non-AUG translation, which has extended the research direction of the pathological mechanisms of this disease entity to include polypeptide toxicity. Thus, a common pathogenesis has been suggested for both coding and non-coding repeat expansion diseases. In this review, we briefly outline the major pathogenic mechanisms of repeat expansion diseases, including a loss-of-function mechanism caused by repeat expansion, repeat RNA toxicity caused by RNA foci formation and protein sequestration, and toxicity by repeat polypeptides. We also discuss perturbation of the physiological liquid-liquid phase separation state caused by these repeat RNAs and repeat polypeptides, as well as potential therapeutic approaches against repeat expansion diseases.
Collapse
|
50
|
Dash HR, Vajpayee K, Srivastava A, Das S. Prevalence and characterisation of size and sequence-based microvariant alleles at nine autosomal STR markers in the Central Indian population. Ann Hum Biol 2021; 48:614-620. [PMID: 34818952 DOI: 10.1080/03014460.2021.2010804] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/19/2022]
Abstract
BACKGROUND Though microvariant alleles are widely reported in global populations, they are not well characterised to date. AIM To study the prevalence and characterisation of size and sequence-based microvariant alleles. SUBJECTS AND METHODS Next Generation Sequencing (NGS) was used to sequence microvariant alleles at nine autosomal STR markers in 138 samples. RESULTS After sequencing 31 STR markers using Precision ID GlobalFilerTM NGS STR panel v2, only nine markers, i.e. D12S391, D19S433, D1S1656, D21S11, D2S441, D7S820, FGA, Penta D, and TH01 showed the prevalence of microvariant alleles. Occurrence of microvariant alleles was positively correlated with Total Possible Alleles (p < 0.005), Power of Discrimination (p < 0.01), Polymorphic Information Content (p < 0.01), and Power of Exclusion (p < 0.05) and negatively correlated with the Matching Probability (p < 0.01). The average allele frequency of the microvariant alleles was found to be significantly less than the allele frequency value of the complete alleles (p = 0.88). Further, sequencing of these microvariant alleles reveals the deletion of nucleotides from the start, end, or middle of the repeat unit is responsible for the generation of a microvariant allele. CONCLUSIONS Prevalence of microvariant alleles is rare in nature and is limited to 9 STR loci out of 31 STR loci tested in the central Indian population. The occurrence of microvariant alleles in a locus increases its forensic and paternity application.
Collapse
Affiliation(s)
- Hirak Ranjan Dash
- DNA Fingerprinting Unit, Forensic Science Laboratory, Bhopal, Madhya Pradesh, India
| | - Kamayani Vajpayee
- DNA Fingerprinting Unit, Forensic Science Laboratory, Bhopal, Madhya Pradesh, India.,School of Arts and Sciences, Ahmedabad University, Ahmedabad, India
| | - Ankit Srivastava
- Institute of Forensic Science and Criminology, Bundelkhand University, Jhansi, UP, India
| | - Surajit Das
- Department of Life Science, National Institute of Technology, Rourkela, Odisha, India
| |
Collapse
|