1
Liu Q, Tian W. Association of human-specific expanded short tandem repeats with neuron-specific regulatory features. SCIENCE ADVANCES 2025; 11:eadp9707. [PMID: 40446031 PMCID: PMC12124357 DOI: 10.1126/sciadv.adp9707] [Received: 04/21/2024] [Accepted: 04/24/2025] [Indexed: 06/02/2025]
Abstract
Short tandem repeats (STRs), characterized by high-copy number mutations, represent one of the fastest-evolving genomic elements. However, human-specific expanded STRs (heSTRs) have lacked comprehensive genome-wide characterization. Leveraging 148 human and 26 nonhuman primate haploid genomes, we identified 8813 heSTRs with robust expansions in copy number distributions. Our analysis revealed notable associations between heSTRs and brain- and neuron-specific distal regulatory signals. Potential target genes regulated by heSTRs, identified by incorporating distal regulations, are enriched with neuronal development-related functions and disorders, displaying neuron-specific expression enhancement in humans. Moreover, heSTRs are associated with enhanced chromatin accessibility specifically in human neurons. In addition, heSTRs show substantial association with pathogenic STR loci exhibiting abnormal copy number variations, as reported by cohort studies on schizophrenia and autism. This study underscores the role of heSTRs in both human evolution and disorders, offering valuable insights for future research on STRs from an evolutionary perspective.
Affiliation(s)
- Qiming Liu
- State Key Laboratory of Genetics and Development of Complex Phenotypes, Department of Computational Biology, School of Life Sciences, Fudan University, Shanghai, China
- Weidong Tian
- State Key Laboratory of Genetics and Development of Complex Phenotypes, Department of Computational Biology, School of Life Sciences, Fudan University, Shanghai, China
- Children’s Hospital of Fudan University, Shanghai, China
- Children’s Hospital of Shandong University, Jinan, China
2
Chen F, Zhang Y, Li W, Sedlazeck FJ, Shen L, Creighton CJ. Global DNA methylation differences involving germline structural variation impact gene expression in pediatric brain tumors. Nat Commun 2025; 16:4713. [PMID: 40399292 PMCID: PMC12095544 DOI: 10.1038/s41467-025-60110-y] [Received: 11/18/2024] [Accepted: 05/13/2025] [Indexed: 05/23/2025] Open
Abstract
The extent of genetic variation and its influence on gene expression across multiple tissue and cellular contexts is still being characterized, with germline Structural Variants (SVs) being historically understudied. DNA methylation also represents a component of normal germline variation across individuals. Here, we combine germline SVs (by short-read sequencing) with tumor DNA methylation across 1292 pediatric brain tumor patients. For thousands of methylation probes for CpG Islands (CGIs) or enhancers, rare and common SV breakpoints upstream or downstream associate with differential methylation in tumors spanning various histologic types, a significant subset involving genes with SV-associated differential expression. Cancer predisposition genes involving SV-associated differential methylation and expression include MSH2, RSPA, and PALB2. SV breakpoints falling within CGIs or histone marks H3K36me3 or H3K9me3 associate with differential CGI methylation. Genes with SVs and CGI methylation associated with patient survival include POLD4. Our results capture a class of normal phenotypic variation having disease implications.
Affiliation(s)
- Fengju Chen
- Dan L. Duncan Comprehensive Cancer Center, Baylor College of Medicine, Houston, TX, 77030, USA
- Yiqun Zhang
- Dan L. Duncan Comprehensive Cancer Center, Baylor College of Medicine, Houston, TX, 77030, USA
- Wei Li
- Division of Computational Biomedicine, Department of Biological Chemistry, School of Medicine, University of California, Irvine, CA, 92697, USA
- Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA
- Department of Computer Science, Rice University, Houston, TX, 77005, USA
- Lanlan Shen
- USDA Children's Nutrition Research Center, Department of Pediatrics, Baylor College of Medicine, Houston, TX, 77030, USA
- Chad J Creighton
- Dan L. Duncan Comprehensive Cancer Center, Baylor College of Medicine, Houston, TX, 77030, USA.
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA.
- Department of Medicine, Baylor College of Medicine, Houston, TX, 77030, USA.
3
Xia F, Verbiest MA, Lundström O, Sonay TB, Baudis M, Anisimova M. Multicancer analyses of short tandem repeat variations reveal shared gene regulatory mechanisms. Brief Bioinform 2025; 26:bbaf219. [PMID: 40401350 PMCID: PMC12096010 DOI: 10.1093/bib/bbaf219] [Received: 02/07/2025] [Revised: 04/22/2025] [Accepted: 04/25/2025] [Indexed: 05/23/2025] Open
Abstract
Short tandem repeats (STRs) have been reported to influence gene expression across various human tissues. While STR variations are enriched in colorectal, stomach, and endometrial cancers, particularly in microsatellite instable tumors, their functional effects and regulatory mechanisms on gene expression remain poorly understood across these cancer types. Here, we leverage whole-exome sequencing and gene expression data to identify STRs for which repeat lengths are associated with the expression of nearby genes (eSTRs) in colorectal, stomach, and endometrial tumors. While most eSTRs are cancer-specific, shared eSTRs across multiple cancers exhibit consistent effects on gene expression. Notably, coding-region eSTRs identified in all three cancer types show positive correlations with nearby gene expression. We further validate the functional effects of eSTRs by demonstrating associations between somatic eSTR mutations and gene expression changes during the transition from normal to tumor tissues, suggesting their potential roles in tumorigenesis. Combined with DNA methylation data, we perform the first quantitative analysis of the interplay between STR variations and DNA methylation in tumors. We identify eSTRs where repeat lengths are associated with methylation levels of nearby CpG sites (meSTRs) and show that >70% of eSTRs are significantly linked to local DNA methylation. Importantly, the effects of meSTRs on DNA methylation remain consistent across cancer types. Overall, our findings enhance the understanding of how functional STR variations influence gene expression and DNA methylation. Our study highlights shared regulatory mechanisms of STRs across multiple cancers, offering a foundation for future research into their broader implications in tumor biology.
Affiliation(s)
- Feifei Xia
- Institute of Computational Life Sciences, Zurich University of Applied Sciences, Schloss 4, 8820 Wädenswil, Switzerland
- Department of Molecular Life Sciences, University of Zürich, Winterthurerstrasse 190, 8057 Zürich, Switzerland
- Swiss Institute of Bioinformatics, Amphipôle, Quartier UNIL-Sorge, 1015 Lausanne, Switzerland
- Max Adriaan Verbiest
- Institute of Computational Life Sciences, Zurich University of Applied Sciences, Schloss 4, 8820 Wädenswil, Switzerland
- Department of Molecular Life Sciences, University of Zürich, Winterthurerstrasse 190, 8057 Zürich, Switzerland
- Swiss Institute of Bioinformatics, Amphipôle, Quartier UNIL-Sorge, 1015 Lausanne, Switzerland
- Oxana Lundström
- Department of Computer Science and Media Technology, Linnaeus University, Universitetsplatsen 1, 352 52 Växjö, Sweden
- Tugce Bilgin Sonay
- Institute of Computational Life Sciences, Zurich University of Applied Sciences, Schloss 4, 8820 Wädenswil, Switzerland
- Swiss Institute of Bioinformatics, Amphipôle, Quartier UNIL-Sorge, 1015 Lausanne, Switzerland
- Michael Baudis
- Department of Molecular Life Sciences, University of Zürich, Winterthurerstrasse 190, 8057 Zürich, Switzerland
- Swiss Institute of Bioinformatics, Amphipôle, Quartier UNIL-Sorge, 1015 Lausanne, Switzerland
- Maria Anisimova
- Institute of Computational Life Sciences, Zurich University of Applied Sciences, Schloss 4, 8820 Wädenswil, Switzerland
- Swiss Institute of Bioinformatics, Amphipôle, Quartier UNIL-Sorge, 1015 Lausanne, Switzerland
4
Groza C, Ge B, Cheung WA, Pastinen T, Bourque G. Expanded methylome and quantitative trait loci detection by long-read profiling of personal DNA. Genome Res 2025; 35:644-652. [PMID: 40113263 PMCID: PMC12047246 DOI: 10.1101/gr.279240.124] [Received: 03/17/2024] [Accepted: 02/11/2025] [Indexed: 03/22/2025]
Abstract
Structural variants (SVs) are omnipresent in human DNA, yet their genotype and methylation statuses are rarely characterized due to previous limitations in genome assembly and detection of modified nucleotides. Also, the extent to which SVs act as methylation quantitative trait loci (SV-mQTLs) is largely unknown. Here, we generated a pangenome graph summarizing SVs in 782 de novo assemblies obtained from Genomic Answers for Kids, capturing 14.6 million CpG dinucleotides that are absent from the CHM13v2 reference (SV-CpGs), thus expanding their number by 43.6%. Using 435 methylomes, we genotyped 4.06 million SV-CpGs, of which 3.93 million (96.8%) are methylated at least once. Nonrepeat sequences contribute 1.59 × 10^6 novel SV-CpGs, followed by centromeric satellites (6.57 × 10^5), simple repeats (5.40 × 10^5), Alu elements (5.07 × 10^5), satellites (2.17 × 10^5), LINE-1s (1.83 × 10^5), and SVA (SINE-VNTR-Alu) elements (1.50 × 10^5). Centromeric satellites, simple repeats, and SVAs are overrepresented in SV-CpGs versus reference CpGs. Similarly, methylation levels in SV-CpGs are more variable than in reference CpGs. To explore if SVs are potentially causal for functional variation, we measured SV-mQTLs. This revealed over 230,464 methylation bins where the methylation is associated with common SVs within 100 kbp. Finally, we identified 65,659 methylation bins (28.5%) where the leading QTL variant is an SV. In conclusion, we demonstrate that graph pangenomes provide full SV structures, the associated methylation variation, and reveal tens of thousands of SV-mQTLs, underscoring the importance of assembly based analyses of human traits.
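As a quick consistency check on the percentages quoted in this abstract, the short sketch below recomputes them from the stated counts. The variable and function names are illustrative only and do not come from the paper.

```python
# Counts as quoted in the abstract (names here are illustrative, not the authors').
genotyped_sv_cpgs = 4.06e6    # SV-CpGs genotyped across 435 methylomes
methylated_sv_cpgs = 3.93e6   # SV-CpGs methylated at least once
sv_mqtl_bins = 230_464        # methylation bins associated with common SVs
sv_lead_bins = 65_659         # bins where the leading QTL variant is an SV


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


methylated_pct = percent(methylated_sv_cpgs, genotyped_sv_cpgs)
sv_lead_pct = percent(sv_lead_bins, sv_mqtl_bins)
print(methylated_pct, sv_lead_pct)
```

Both values agree with the 96.8% and 28.5% reported in the abstract.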
Affiliation(s)
- Cristian Groza
- Université de Montréal, Montréal Heart Institute, Montréal, Québec H1T 1C8, Canada
- Bing Ge
- McGill University, McGill University and Genome Quebec Innovation Centre, Montréal, Québec H3A 2T8, Canada
- Warren A Cheung
- Children's Mercy Hospital and Research Institute, Genomic Medicine Center, Kansas City, Missouri 64108, USA
- Tomi Pastinen
- Children's Mercy Hospital and Research Institute, Genomic Medicine Center, Kansas City, Missouri 64108, USA;
- Guillaume Bourque
- McGill University, Human Genetics, Montréal, Québec H3A 0C7, Canada;
- Canadian Center for Computational Genomics, McGill University, Montréal, Québec H3A 2R7, Canada
- Victor Phillip Dahdaleh Institute of Genomic Medicine at McGill University, Montréal, Québec H3A 0G1, Canada
5
Liu Y, Xia K. Aberrant Short Tandem Repeats: Pathogenicity, Mechanisms, Detection, and Roles in Neuropsychiatric Disorders. Genes (Basel) 2025; 16:406. [PMID: 40282366 PMCID: PMC12026680 DOI: 10.3390/genes16040406] [Received: 03/03/2025] [Revised: 03/17/2025] [Accepted: 03/19/2025] [Indexed: 04/29/2025] Open
Abstract
Short tandem repeat (STR) sequences are highly variable DNA segments that significantly contribute to human neurodegenerative disorders, highlighting their crucial role in neuropsychiatric conditions. This article examines the pathogenicity of abnormal STRs and classifies tandem repeat expansion disorders (TREDs), emphasizing their genetic characteristics, mechanisms of action, detection methods, and associated animal models. STR expansions exhibit complex genetic patterns that affect the age of onset and symptom severity. These expansions disrupt gene function through mechanisms such as gene silencing, toxic gain-of-function mutations leading to RNA and protein toxicity, and the generation of toxic peptides via repeat-associated non-AUG (RAN) translation. Advances in sequencing technologies, from traditional PCR and Southern blotting to next-generation and long-read sequencing, have enhanced the accuracy of STR variation detection. Research utilizing these technologies has linked STR expansions to a range of neuropsychiatric disorders, including autism spectrum disorders and schizophrenia, highlighting their contribution to disease risk and phenotypic expression through effects on genes involved in neurodevelopment, synaptic function, and neuronal signaling. Therefore, further investigation is essential to elucidate the intricate interplay between STRs and neuropsychiatric diseases, paving the way for improved diagnostic and therapeutic strategies.
Affiliation(s)
- Yuzhong Liu
- Institute of Cytology and Genetics, School of Basic Medical Sciences, Hengyang Medical School, University of South China, Hengyang 421001, China;
- MOE Key Lab of Rare Pediatric Diseases, School of Basic Medicine, Hengyang Medical College, University of South China, Hengyang 421001, China
- Kun Xia
- Institute of Cytology and Genetics, School of Basic Medical Sciences, Hengyang Medical School, University of South China, Hengyang 421001, China;
- MOE Key Lab of Rare Pediatric Diseases, School of Basic Medicine, Hengyang Medical College, University of South China, Hengyang 421001, China
6
Tan X, Zeng W, Yang Y, Lin Z, Li F, Liu J, Chen S, Liu YG, Xie W, Xie X. Genome-wide profiling of polymorphic short tandem repeats and their influence on gene expression and trait variation in diverse rice populations. J Genet Genomics 2025:S1673-8527(25)00078-5. [PMID: 40089018 DOI: 10.1016/j.jgg.2025.03.005] [Received: 03/03/2025] [Revised: 03/10/2025] [Accepted: 03/10/2025] [Indexed: 03/17/2025]
Abstract
Short tandem repeats (STRs) modulate gene expression and contribute to trait variation. However, a systematic evaluation of the genomic characteristics of STRs has not been conducted, and their influence on gene expression in rice remains unclear. Here, we construct a map of 137,629 polymorphic STRs in the rice (Oryza sativa L.) genome using a population-scale resequencing dataset. A genome-wide survey encompassing 4726 accessions shows that the occurrence frequency, mutational patterns, chromosomal distribution, and functional properties of STRs are correlated with the sequences and lengths of repeat motifs. Leveraging a transcriptome dataset from 127 rice accessions, we identify 44,672 expression STRs (eSTRs) by modeling gene expression in response to the length variation of STRs. These eSTRs are notably enriched in the regulatory regions of genes with active transcriptional signatures. Population analysis identifies numerous STRs that have undergone genetic divergence among different rice groups and 1726 tagged STRs that may be associated with agronomic traits. By editing the (ACT)7 STR in OsFD1 promoter, we further experimentally validate its role in regulating gene expression and phenotype. Our study highlights the contribution of STRs to transcriptional regulation in plants and establishes the foundation for their potential use as alternative targets for genetic improvement.
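The phrase "modeling gene expression in response to the length variation of STRs" can be illustrated with a minimal sketch: an ordinary least-squares fit of expression on repeat length at a single locus. The abstract does not specify the model, so the simple linear fit, the function name `slope_and_r`, and the toy values below are assumptions for illustration, not the authors' method or data.

```python
def slope_and_r(str_lengths, expression):
    """Least-squares slope and Pearson r of expression regressed on STR length."""
    n = len(str_lengths)
    mean_x = sum(str_lengths) / n
    mean_y = sum(expression) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(str_lengths, expression))
    sxx = sum((x - mean_x) ** 2 for x in str_lengths)
    syy = sum((y - mean_y) ** 2 for y in expression)
    return sxy / sxx, sxy / (sxx * syy) ** 0.5


# Toy data (hypothetical): repeat copy number per accession and normalized expression.
lengths = [5, 6, 7, 8, 9]
expr = [1.0, 1.5, 2.0, 2.5, 3.0]
slope, r = slope_and_r(lengths, expr)
print(slope, r)
```

A real eSTR scan would repeat such a fit for every STR and nearby gene pair, add covariates such as population structure, and correct for multiple testing; none of that is shown here.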
Affiliation(s)
- Xiyu Tan
- Guangdong Basic Research Center of Excellence for Precise Breeding of Future Crops, State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, College of Agriculture, South China Agricultural University, Guangzhou, Guangdong 510642, China
- Wanyong Zeng
- Guangdong Basic Research Center of Excellence for Precise Breeding of Future Crops, State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, College of Agriculture, South China Agricultural University, Guangzhou, Guangdong 510642, China
- Yujian Yang
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan, Hubei 430070, China
- Zhansheng Lin
- Guangdong Basic Research Center of Excellence for Precise Breeding of Future Crops, State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, College of Agriculture, South China Agricultural University, Guangzhou, Guangdong 510642, China
- Fuquan Li
- Guangdong Basic Research Center of Excellence for Precise Breeding of Future Crops, State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, College of Agriculture, South China Agricultural University, Guangzhou, Guangdong 510642, China
- Jianhong Liu
- Guangdong Basic Research Center of Excellence for Precise Breeding of Future Crops, State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, College of Agriculture, South China Agricultural University, Guangzhou, Guangdong 510642, China
- Shaotong Chen
- Guangdong Basic Research Center of Excellence for Precise Breeding of Future Crops, State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, College of Agriculture, South China Agricultural University, Guangzhou, Guangdong 510642, China
- Yao-Guang Liu
- Guangdong Basic Research Center of Excellence for Precise Breeding of Future Crops, State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, College of Agriculture, South China Agricultural University, Guangzhou, Guangdong 510642, China.
- Weibo Xie
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan, Hubei 430070, China.
- Xianrong Xie
- Guangdong Basic Research Center of Excellence for Precise Breeding of Future Crops, State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, College of Agriculture, South China Agricultural University, Guangzhou, Guangdong 510642, China.
7
Arthur TD, Nguyen JP, Henson BA, D'Antonio-Chronowska A, Jaureguy J, Silva N, Panopoulos AD, Izpisua Belmonte JC, D'Antonio M, McVicker G, Frazer KA. Multiomic QTL mapping reveals phenotypic complexity of GWAS loci and prioritizes putative causal variants. CELL GENOMICS 2025; 5:100775. [PMID: 39986281 PMCID: PMC11960542 DOI: 10.1016/j.xgen.2025.100775] [Received: 03/19/2024] [Revised: 10/18/2024] [Accepted: 01/24/2025] [Indexed: 02/24/2025]
Abstract
Most GWAS loci are presumed to affect gene regulation; however, only ∼43% colocalize with expression quantitative trait loci (eQTLs). To address this colocalization gap, we map eQTLs, chromatin accessibility QTLs (caQTLs), and histone acetylation QTLs (haQTLs) using molecular samples from three early developmental-like tissues. Through colocalization, we annotate 10.4% (n = 540) of GWAS loci in 15 traits by QTL phenotype, temporal specificity, and complexity. We show that integration of chromatin QTLs results in a 2.3-fold higher annotation rate of GWAS loci because they capture distal GWAS loci missed by eQTLs, and that 5.4% (n = 13) of GWAS colocalizing eQTLs are early developmental specific. Finally, we utilize the iPSCORE multiomic QTLs to prioritize putative causal variants overlapping transcription factor motifs to elucidate the potential genetic underpinnings of 296 GWAS-QTL colocalizations.
Affiliation(s)
- Timothy D Arthur
- Biomedical Sciences Program, University of California, San Diego, La Jolla, CA 92093, USA; Department of Biomedical Informatics, University of California, San Diego, La Jolla, CA 92093, USA
- Jennifer P Nguyen
- Department of Biomedical Informatics, University of California, San Diego, La Jolla, CA 92093, USA; Bioinformatics and Systems Biology Graduate Program, University of California, San Diego, La Jolla, CA 92093, USA
- Benjamin A Henson
- Institute of Genomic Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Jeffrey Jaureguy
- Bioinformatics and Systems Biology Graduate Program, University of California, San Diego, La Jolla, CA 92093, USA; Integrative Biology Laboratory, Salk Institute for Biological Studies, La Jolla, CA 92037, USA
- Nayara Silva
- Department of Pediatrics, University of California, San Diego, La Jolla, CA 92093, USA
- Athanasia D Panopoulos
- Board of Governors Regenerative Medicine Institute, Cedars-Sinai Medical Center, Los Angeles, CA 90048, USA; Department of Biomedical Sciences, Cedars-Sinai Medical Center, Los Angeles, CA 90048, USA
- Matteo D'Antonio
- Department of Biomedical Informatics, University of California, San Diego, La Jolla, CA 92093, USA
- Graham McVicker
- Integrative Biology Laboratory, Salk Institute for Biological Studies, La Jolla, CA 92037, USA
- Kelly A Frazer
- Institute of Genomic Medicine, University of California San Diego, La Jolla, CA 92093, USA; Department of Pediatrics, University of California, San Diego, La Jolla, CA 92093, USA.
8
Doss RM, Lopez-Ignacio S, Dischler A, Hiatt L, Dashnow H, Breuss MW, Dias CM. Mosaicism in Short Tandem Repeat Disorders: A Clinical Perspective. Genes (Basel) 2025; 16:216. [PMID: 40004546 PMCID: PMC11855715 DOI: 10.3390/genes16020216] [Received: 12/23/2024] [Revised: 02/06/2025] [Accepted: 02/10/2025] [Indexed: 02/27/2025] Open
Abstract
Fragile X, Huntington disease, and myotonic dystrophy type 1 are prototypical examples of human disorders caused by short tandem repeat variation, repetitive nucleotide stretches that are highly mutable both in the germline and somatic tissue. As short tandem repeats are unstable, they can expand, contract, and acquire and lose epigenetic marks in somatic tissue. This means within an individual, the genotype and epigenetic state at these loci can vary considerably from cell to cell. This somatic mosaicism may play a key role in clinical pathogenesis, and yet, our understanding of mosaicism in driving clinical phenotypes in short tandem repeat disorders is only just emerging. This review focuses on these three relatively well-studied examples where, given the advent of new technologies and bioinformatic approaches, a critical role for mosaicism is coming into focus both with respect to cellular physiology and clinical phenotypes.
Affiliation(s)
- Rose M. Doss
- Section of Genetics and Metabolism, Department of Pediatrics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Susana Lopez-Ignacio
- Section of Genetics and Metabolism, Department of Pediatrics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Anna Dischler
- Section of Genetics and Metabolism, Department of Pediatrics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Laurel Hiatt
- Department of Human Genetics, University of Utah School of Medicine, Salt Lake City, UT 84132, USA
- Harriet Dashnow
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Martin W. Breuss
- Section of Genetics and Metabolism, Department of Pediatrics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Caroline M. Dias
- Section of Genetics and Metabolism, Department of Pediatrics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Section of Developmental Pediatrics, Department of Pediatrics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
9
Cui Y, Arnold FJ, Li JS, Wu J, Wang D, Philippe J, Colwin MR, Michels S, Chen C, Sallam T, Thompson LM, La Spada AR, Li W. Multi-omic quantitative trait loci link tandem repeat size variation to gene regulation in human brain. Nat Genet 2025; 57:369-378. [PMID: 39809899 DOI: 10.1038/s41588-024-02057-2] [Received: 02/26/2024] [Accepted: 12/10/2024] [Indexed: 01/16/2025]
Abstract
Tandem repeat (TR) size variation is implicated in ~50 neurological disorders, yet its impact on gene regulation in the human brain remains largely unknown. In the present study, we quantified the impact of TR size variation on brain gene regulation across distinct molecular phenotypes, based on 4,412 multi-omics samples from 1,597 donors, including 1,586 newly sequenced ones. We identified ~2.2 million TR molecular quantitative trait loci (TR-xQTLs), linking ~139,000 unique TRs to nearby molecular phenotypes, including many known disease-risk TRs, such as the G2C4 expansion in C9orf72 associated with amyotrophic lateral sclerosis. Fine-mapping revealed ~18,700 TRs as potential causal variants. Our in vitro experiments further confirmed the causal and independent regulatory effects of three TRs. Additional colocalization analysis indicated the potential causal role of TR variation in brain-related phenotypes, highlighted by a 3'-UTR TR in NUDT14 linked to cortical surface area and a TG repeat in PLEKHA1, associated with Alzheimer's disease.
Affiliation(s)
- Ya Cui
- Division of Computational Biomedicine, Department of Biological Chemistry, University of California, Irvine, Irvine, CA, USA.
- Frederick J Arnold
- Departments of Pathology & Laboratory Medicine, Neurology, Biological Chemistry, and Neurobiology & Behavior, University of California, Irvine, Irvine, CA, USA
- Jason Sheng Li
- Division of Computational Biomedicine, Department of Biological Chemistry, University of California, Irvine, Irvine, CA, USA
- Jie Wu
- Departments of Psychiatry and Human Behavior, Neurobiology and Behavior, and Biological Chemistry, University of California, Irvine, Irvine, CA, USA
- Dan Wang
- Division of Cardiology, Department of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Julien Philippe
- Departments of Pathology & Laboratory Medicine, Neurology, Biological Chemistry, and Neurobiology & Behavior, University of California, Irvine, Irvine, CA, USA
- Michael R Colwin
- Departments of Pathology & Laboratory Medicine, Neurology, Biological Chemistry, and Neurobiology & Behavior, University of California, Irvine, Irvine, CA, USA
- Sebastian Michels
- Departments of Pathology & Laboratory Medicine, Neurology, Biological Chemistry, and Neurobiology & Behavior, University of California, Irvine, Irvine, CA, USA
- Department of Neurology, University of Ulm, Oberer Eselsberg, Ulm, Germany
- Chaorong Chen
- Division of Computational Biomedicine, Department of Biological Chemistry, University of California, Irvine, Irvine, CA, USA
- Tamer Sallam
- Division of Cardiology, Department of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Leslie M Thompson
- Departments of Psychiatry and Human Behavior, Neurobiology and Behavior, and Biological Chemistry, University of California, Irvine, Irvine, CA, USA.
- Albert R La Spada
- Departments of Pathology & Laboratory Medicine, Neurology, Biological Chemistry, and Neurobiology & Behavior, University of California, Irvine, Irvine, CA, USA.
- UCI Center for Neurotherapeutics, University of California, Irvine, Irvine, CA, USA.
- Wei Li
- Division of Computational Biomedicine, Department of Biological Chemistry, University of California, Irvine, Irvine, CA, USA.
10
Guo MH, Lee WP, Vardarajan B, Schellenberg GD, Phillips-Cremins JE. Polygenic burden of short tandem repeat expansions promotes risk for Alzheimer's disease. Nat Commun 2025; 16:1126. [PMID: 39875385 PMCID: PMC11775329 DOI: 10.1038/s41467-025-56400-0] [Received: 11/16/2023] [Accepted: 01/17/2025] [Indexed: 01/30/2025] Open
Abstract
Studies of the genetics of Alzheimer's disease (AD) have largely focused on single nucleotide variants and short insertions/deletions. However, most of the disease heritability has yet to be uncovered, suggesting that there is substantial genetic risk conferred by other forms of genetic variation. There are over one million short tandem repeats (STRs) in the genome, and their link to AD risk has not been assessed. As pathogenic expansions of STR cause over 30 neurologic diseases, it is important to ascertain whether STRs may also be implicated in AD risk. Here, we genotype 312,731 polymorphic STR tracts genome-wide using PCR-free whole genome sequencing data from 2981 individuals (1489 AD case and 1492 control individuals). We implement an approach to identify STR expansions as STRs with tract lengths that are outliers from the population. We then test for differences in aggregate burden of expansions in case versus control individuals. AD patients harbor a 1.19-fold increase of STR expansions compared to healthy elderly controls (p = 8.27 × 10^-3, two-sided Mann-Whitney test). Individuals carrying >30 STR expansions have a 3.69-fold higher odds of having AD and have more severe AD neuropathology. AD STR expansions are highly enriched within active promoters in post-mortem hippocampal brain tissues and particularly within SINE-VNTR-Alu (SVA) retrotransposons. Together, these results demonstrate that expanded STRs within active promoter regions of the genome associate with risk of AD.
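The two-step design described in this abstract (call expansions as population outliers, then compare per-individual expansion burden between cases and controls) can be sketched as follows. The abstract does not state the outlier rule, so the median plus k times MAD threshold, the floor of 1.0 on the MAD, the function names, and the toy tract lengths are all assumptions for illustration, not the authors' pipeline. Only the Mann-Whitney U statistic is computed; no p-value is derived.

```python
from statistics import median


def expansion_calls(tract_lengths, k=5.0):
    """Flag samples whose tract length at one locus is an upper outlier.

    Assumed rule (not from the paper): length > median + k * MAD,
    with the MAD floored at 1.0 so monomorphic loci still get a threshold.
    """
    med = median(tract_lengths)
    mad = median(abs(x - med) for x in tract_lengths)
    threshold = med + k * max(mad, 1.0)
    return [x > threshold for x in tract_lengths]


def burden_per_sample(genotypes):
    """Count expansion calls per sample; genotypes is a list of loci,
    each a list of tract lengths with one value per sample."""
    burden = [0] * len(genotypes[0])
    for locus in genotypes:
        for i, is_expanded in enumerate(expansion_calls(locus)):
            burden[i] += is_expanded
    return burden


def mann_whitney_u(cases, controls):
    """U statistic for cases: pairs where case > control, ties count 0.5."""
    return sum((a > b) + 0.5 * (a == b) for a in cases for b in controls)


# Toy data (hypothetical): 3 loci x 6 samples; samples 0-2 are controls, 3-5 are cases.
genotypes = [
    [10, 10, 11, 10, 30, 10],
    [5, 5, 5, 20, 5, 5],
    [8, 9, 8, 8, 8, 25],
]
burden = burden_per_sample(genotypes)
u_cases = mann_whitney_u(burden[3:], burden[:3])
print(burden, u_cases)
```

In practice the U statistic would be converted to a two-sided p-value with a statistics library, and genotypes would come from an STR caller rather than a hand-written list.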
Affiliation(s)
- Michael H Guo
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, USA.
- Department of Neurology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, USA.
- Wan-Ping Lee
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, USA
- Badri Vardarajan
- Department of Neurology, College of Physicians and Surgeons, Columbia University, New York, USA
- Gerard D Schellenberg
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, USA
- Jennifer E Phillips-Cremins
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, USA.
- Department of Bioengineering, University of Pennsylvania, Philadelphia, USA.
- Epigenetics Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, USA.
|
11
|
Collins RL, Talkowski ME. Diversity and consequences of structural variation in the human genome. Nat Rev Genet 2025:10.1038/s41576-024-00808-9. [PMID: 39838028 DOI: 10.1038/s41576-024-00808-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 11/26/2024] [Indexed: 01/23/2025]
Abstract
The biomedical community is increasingly invested in capturing all genetic variants across human genomes, interpreting their functional consequences and translating these findings to the clinic. A crucial component of this endeavour is the discovery and characterization of structural variants (SVs), which are ubiquitous in the human population, heterogeneous in their mutational processes, key substrates for evolution and adaptation, and profound drivers of human disease. The recent emergence of new technologies and the remarkable scale of sequence-based population studies have begun to crystalize our understanding of SVs as a mutational class and their widespread influence across phenotypes. In this Review, we summarize recent discoveries and new insights into SVs in the human genome in terms of their mutational patterns, population genetics, functional consequences, and impact on human traits and disease. We conclude by outlining three frontiers to be explored by the field over the next decade.
Affiliation(s)
- Ryan L Collins
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
| | - Michael E Talkowski
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA.
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA.
|
12
|
Berthold N, Gaudieri S, Hood S, Tschochner M, Miller AL, Jordan J, Thornton LM, Bulik CM, Akkari PA, Kennedy MA. Nanopore sequencing as a novel method of characterising anorexia nervosa risk loci. BMC Genomics 2024; 25:1262. [PMID: 39741260 DOI: 10.1186/s12864-024-11172-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/21/2024] [Accepted: 12/19/2024] [Indexed: 01/02/2025]
Abstract
BACKGROUND Anorexia nervosa (AN) is a polygenic, severe metabopsychiatric disorder with poorly understood aetiology. Eight significant loci have been identified by genome-wide association studies (GWAS) and single nucleotide polymorphism (SNP)-based heritability was estimated to be ~11-17%, yet causal variants remain elusive. It is therefore important to define the full spectrum of genetic variants in the wider regions surrounding these significantly associated loci. The hypothesis we evaluate here is that unrecognised or relatively unexplored variants in these regions exist and are promising targets for future functional analyses. To test this hypothesis, we implemented a novel approach with targeted nanopore sequencing (Oxford Nanopore Technologies) for 200 kb regions centred on each of the eight AN-associated loci in 10 AN case samples. Our bioinformatics pipeline entailed base-calling and alignment with Dorado and minimap2 software, followed by variant calling with four separate tools, Sniffles2, Clair3, Straglr, and NanoVar. We then leveraged publicly available databases to characterise these loci in putative functional context and prioritise a subset of potentially relevant variants. RESULTS Targeted nanopore sequencing effectively enriched the target regions (average coverage 14.64x). To test our hypothesis, we curated a list of 20 prioritised variants in non-coding regions, poorly represented in the current human reference genome but that may have functional consequences in AN pathology. Notably, we identified a polymorphic SINE-VNTR-Alu like sub-family D element (SVA-D), intergenic with IP6K2 and PRKAR2A, and a poly-T short tandem repeat (STR) in the 3'UTR of FOXP1. CONCLUSIONS Our results highlight the potential of targeted nanopore sequencing for characterising poorly resolved or complex variation, which may be initially obscured in risk-associated regions detected by GWAS. Some of the variants identified in this way, such as the polymorphic SVA-D and poly-T STR, could contribute to mechanisms of phenotypic risk, through regulation of several neighbouring genes implicated in AN biology, and affect post-transcriptional processing of FOXP1, respectively. This exploratory investigation was not powered to detect functional effects; however, the variants we observed using this method are poorly represented in the current human reference genome and accompanying databases, and further examination of these may provide new opportunities for improved understanding of genetic risk mechanisms of AN.
Affiliation(s)
- Natasha Berthold
- University of Western Australia, Crawley, WA, Australia.
- Perron Research Institute, Nedlands, WA, Australia.
- Pathology and Biomedical Science Department, University of Otago Christchurch, Christchurch, New Zealand.
| | - Silvana Gaudieri
- University of Western Australia, Crawley, WA, Australia
- Murdoch University, Murdoch, WA, Australia
- Vanderbilt University Medical Centre, Nashville, TN, USA
| | - Sean Hood
- University of Western Australia, Crawley, WA, Australia
| | | | - Allison L Miller
- Pathology and Biomedical Science Department, University of Otago Christchurch, Christchurch, New Zealand
| | - Jennifer Jordan
- Department of Psychological Medicine, University of Otago Christchurch, Christchurch, New Zealand
| | - Laura M Thornton
- Department of Psychiatry, University of North Carolina at Chapel Hill, Chapel Hill, USA
| | - Cynthia M Bulik
- Department of Psychiatry, University of North Carolina at Chapel Hill, Chapel Hill, USA
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
- Department of Nutrition, University of North Carolina at Chapel Hill, Chapel Hill, USA
| | - Patrick Anthony Akkari
- University of Western Australia, Crawley, WA, Australia
- Perron Research Institute, Nedlands, WA, Australia
- Murdoch University, Murdoch, WA, Australia
- Duke University, Durham, NC, USA
| | - Martin A Kennedy
- Pathology and Biomedical Science Department, University of Otago Christchurch, Christchurch, New Zealand
|
13
|
Reinar WB, Krabberød AK, Lalun VO, Butenko MA, Jakobsen KS. Short tandem repeats delineate gene bodies across eukaryotes. Nat Commun 2024; 15:10902. [PMID: 39738068 PMCID: PMC11686069 DOI: 10.1038/s41467-024-55276-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2024] [Accepted: 12/05/2024] [Indexed: 01/01/2025]
Abstract
Short tandem repeats (STRs) have emerged as important and hypermutable sites where genetic variation correlates with gene expression in plant and animal systems. Recently, it has been shown that a broad range of transcription factors (TFs) are affected by STRs near or in the DNA target binding site. Despite this, the distribution of STR motif repetitiveness in eukaryote genomes is still largely unknown. Here, we identify monomer and dimer STR motif repetitiveness in 5.1 billion 10-bp windows upstream of translation starts and downstream of translation stops in 25 million genes spanning 1270 species across the eukaryotic Tree of Life. We report that all surveyed genomes have gene-proximal shifts in motif repetitiveness. Within genomes, variation in gene-proximal repetitiveness landscapes correlated to the function of genes; genes with housekeeping functions were depleted in upstream and downstream repetitiveness. Furthermore, the repetitiveness landscapes correlated with TF binding sites, indicating that gene function has evolved in conjunction with cis-regulatory STRs and TFs that recognize repetitive sites. These results suggest that the hypermutability inherent to STRs is canalized along the genome sequence and contributes to regulatory and eco-evolutionary dynamics in all eukaryotes.
Affiliation(s)
- William B Reinar
- Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, Oslo, Norway.
- Section for Genetics and Evolutionary Biology, Department of Biosciences, University of Oslo, Oslo, Norway.
| | - Anders K Krabberød
- Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, Oslo, Norway
- Section for Genetics and Evolutionary Biology, Department of Biosciences, University of Oslo, Oslo, Norway
| | - Vilde O Lalun
- Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, Oslo, Norway
- Section for Genetics and Evolutionary Biology, Department of Biosciences, University of Oslo, Oslo, Norway
| | - Melinka A Butenko
- Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, Oslo, Norway
- Section for Genetics and Evolutionary Biology, Department of Biosciences, University of Oslo, Oslo, Norway
| | - Kjetill S Jakobsen
- Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, Oslo, Norway.
|
14
|
Dawood M, Heavner B, Wheeler MM, Ungar RA, LoTempio J, Wiel L, Berger S, Bernstein JA, Chong JX, Délot EC, Eichler EE, Gibbs RA, Lupski JR, Shojaie A, Talkowski ME, Wagner AH, Wei CL, Wellington C, Wheeler MT, Carvalho CMB, Gifford CA, May S, Miller DE, Rehm HL, Sedlazeck FJ, Vilain E, O'Donnell-Luria A, Posey JE, Chadwick LH, Bamshad MJ, Montgomery SB. GREGoR: Accelerating Genomics for Rare Diseases. ARXIV 2024:arXiv:2412.14338v1. [PMID: 39764392 PMCID: PMC11702807] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/18/2025]
Abstract
Rare diseases are collectively common, affecting approximately one in twenty individuals worldwide. In recent years, rapid progress has been made in rare disease diagnostics due to advances in DNA sequencing, development of new computational and experimental approaches to prioritize genes and genetic variants, and increased global exchange of clinical and genetic data. However, more than half of individuals suspected to have a rare disease lack a genetic diagnosis. The Genomics Research to Elucidate the Genetics of Rare Diseases (GREGoR) Consortium was initiated to study thousands of challenging rare disease cases and families and apply, standardize, and evaluate emerging genomics technologies and analytics to accelerate their adoption in clinical practice. Further, all data generated, currently representing ~7500 individuals from ~3000 families, is rapidly made available to researchers worldwide via the Genomic Data Science Analysis, Visualization, and Informatics Lab-space (AnVIL) to catalyze global efforts to develop approaches for genetic diagnoses in rare diseases (https://gregorconsortium.org/data). The majority of these families have undergone prior clinical genetic testing but remained unsolved, with most being exome-negative. Here, we describe the collaborative research framework, datasets, and discoveries comprising GREGoR that will provide foundational resources and substrates for the future of rare disease genomics.
Affiliation(s)
- Moez Dawood
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Medical Scientist Training Program, Baylor College of Medicine, Houston, TX, USA
| | - Ben Heavner
- Department of Biostatistics, University of Washington, Seattle, WA, USA
| | - Marsha M Wheeler
- Department of Biostatistics, University of Washington, Seattle, WA, USA
| | - Rachel A Ungar
- Department of Genetics, School of Medicine, Stanford University, Stanford, CA, USA
- Department of Pathology, School of Medicine, Stanford University, Stanford, CA, USA
- Stanford Center for Biomedical Ethics, School of Medicine, Stanford University, Stanford, CA, USA
| | - Jonathan LoTempio
- Institute for Clinical and Translational Science, University of California, Irvine, CA, USA
| | - Laurens Wiel
- Department of Genetics, School of Medicine, Stanford University, Stanford, CA, USA
- Department of Pathology, School of Medicine, Stanford University, Stanford, CA, USA
- Division of Cardiovascular Medicine, School of Medicine, Stanford University, Stanford, CA, USA
| | - Seth Berger
- Division of Genetics and Metabolism, Children's National Rare Disease Institute, Washington, DC, USA
- Center for Genetic Medicine Research, Children's National Rare Disease Institute, Washington, DC, USA
- Department of Genomics and Precision Medicine, George Washington University, Washington, DC, USA
| | - Jonathan A Bernstein
- Department of Pediatrics, School of Medicine, Stanford University, Stanford, CA, USA
| | - Jessica X Chong
- Department of Pediatrics, Division of Genetic Medicine, University of Washington, Seattle, WA, USA
- Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA, USA
| | - Emmanuèle C Délot
- Institute for Clinical and Translational Science, University of California, Irvine, CA, USA
| | - Evan E Eichler
- Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA, USA
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| | - Richard A Gibbs
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
| | - James R Lupski
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Department of Pediatrics, Baylor College of Medicine, Houston, TX, USA
| | - Ali Shojaie
- Department of Biostatistics, University of Washington, Seattle, WA, USA
| | - Michael E Talkowski
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Boston, MA, USA
- Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Program in Bioinformatics and Integrative Genomics, Harvard Medical School, Boston, MA, USA
| | - Alex H Wagner
- Steve and Cindy Rasmussen Institute for Genomic Medicine, Nationwide Children's Hospital, Columbus, OH, USA
- Department of Pediatrics, The Ohio State University College of Medicine, Columbus, OH, USA
- Department of Biomedical Informatics, The Ohio State University College of Medicine, Columbus, OH, USA
| | - Chia-Lin Wei
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Christopher Wellington
- Office of Genomic Data Science, National Human Genome Research Institute, Bethesda, MD, USA
| | - Matthew T Wheeler
- Division of Cardiovascular Medicine, School of Medicine, Stanford University, Stanford, CA, USA
| | | | - Casey A Gifford
- Department of Genetics, School of Medicine, Stanford University, Stanford, CA, USA
- Department of Pediatrics, School of Medicine, Stanford University, Stanford, CA, USA
- Basic Science and Engineering Initiative, Stanford Children's Health, Betty Irene Moore Children's Heart Center, Stanford, CA, USA
- Institute for Stem Cell Biology and Regenerative Medicine, School of Medicine, Stanford University, Stanford, CA, USA
| | - Susanne May
- Department of Biostatistics, University of Washington, Seattle, WA, USA
| | - Danny E Miller
- Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA, USA
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, WA, USA
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA, USA
| | - Heidi L Rehm
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Boston, MA, USA
| | - Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Department of Computer Science, Rice University, Houston, TX, USA
| | - Eric Vilain
- Institute for Clinical and Translational Science, University of California, Irvine, CA, USA
| | - Anne O'Donnell-Luria
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Boston, MA, USA
- Division of Genetics and Genomics, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
| | - Jennifer E Posey
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
| | - Lisa H Chadwick
- Division of Genome Sciences, National Human Genome Research Institute, Bethesda, MD, USA
| | - Michael J Bamshad
- Department of Pediatrics, Division of Genetic Medicine, University of Washington, Seattle, WA, USA
- Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA, USA
- Department of Pediatrics, Division of Genetic Medicine, Seattle Children's Hospital, Seattle, WA, USA
| | - Stephen B Montgomery
- Department of Genetics, School of Medicine, Stanford University, Stanford, CA, USA
- Department of Pathology, School of Medicine, Stanford University, Stanford, CA, USA
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
|
15
|
Redelings BD, Holmes I, Lunter G, Pupko T, Anisimova M. Insertions and Deletions: Computational Methods, Evolutionary Dynamics, and Biological Applications. Mol Biol Evol 2024; 41:msae177. [PMID: 39172750 PMCID: PMC11385596 DOI: 10.1093/molbev/msae177] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/10/2024] [Revised: 07/02/2024] [Accepted: 07/09/2024] [Indexed: 08/24/2024]
Abstract
Insertions and deletions constitute the second most important source of natural genomic variation. Insertions and deletions make up to 25% of genomic variants in humans and are involved in complex evolutionary processes including genomic rearrangements, adaptation, and speciation. Recent advances in long-read sequencing technologies allow detailed inference of insertion and deletion variation in species and populations. Yet, despite their importance, evolutionary studies have traditionally ignored or mishandled insertions and deletions due to a lack of comprehensive methodologies and statistical models of insertion and deletion dynamics. Here, we discuss methods for describing insertion and deletion variation and modeling insertions and deletions over evolutionary time. We provide practical advice for tackling insertions and deletions in genomic sequences and illustrate our discussion with examples of insertion- and deletion-induced effects in human and other natural populations and their contribution to evolutionary processes. We outline promising directions for future developments in statistical methodologies that would allow researchers to analyze insertion and deletion variation and their effects in large genomic data sets and to incorporate insertions and deletions in evolutionary inference.
Affiliation(s)
| | - Ian Holmes
- Department of Bioengineering, University of California, Berkeley, CA 94720, USA
- Calico Life Sciences LLC, South San Francisco, CA 94080, USA
| | - Gerton Lunter
- Department of Epidemiology, University Medical Center Groningen, University of Groningen, Groningen 9713 GZ, The Netherlands
| | - Tal Pupko
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 6997801, Israel
| | - Maria Anisimova
- Institute of Computational Life Sciences, Zurich University of Applied Sciences, Wädenswil, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
|
16
|
Uguen K, Michaud JL, Génin E. Short Tandem Repeats in the era of next-generation sequencing: from historical loci to population databases. Eur J Hum Genet 2024; 32:1037-1044. [PMID: 38982300 PMCID: PMC11369099 DOI: 10.1038/s41431-024-01666-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/22/2024] [Revised: 06/20/2024] [Accepted: 06/27/2024] [Indexed: 07/11/2024]
Abstract
In this study, we explore the landscape of short tandem repeats (STRs) within the human genome through the lens of evolving technologies to detect genomic variations. STRs, which encompass approximately 3% of our genomic DNA, are crucial for understanding human genetic diversity, disease mechanisms, and evolutionary biology. The advent of high-throughput sequencing methods has revolutionized our ability to accurately map and analyze STRs, highlighting their significance in genetic disorders, forensic science, and population genetics. We review the current available methodologies for STR analysis, the challenges in interpreting STR variations across different populations, and the implications of STRs in medical genetics. Our findings underscore the urgent need for comprehensive STR databases that reflect the genetic diversity of global populations, facilitating the interpretation of STR data in clinical diagnostics, genetic research, and forensic applications. This work sets the stage for future studies aimed at harnessing STR variations to elucidate complex genetic traits and diseases, reinforcing the importance of integrating STRs into genetic research and clinical practice.
Affiliation(s)
- Kevin Uguen
- Univ Brest, Inserm, EFS, UMR 1078, GGB, Brest, France.
- Service de Génétique Médicale et Biologie de la Reproduction, CHU de Brest, Brest, France.
- CHU Sainte-Justine Azrieli Research Centre, Montréal, QC, Canada.
| | - Jacques L Michaud
- CHU Sainte-Justine Azrieli Research Centre, Montréal, QC, Canada
- Department of Pediatrics, Université de Montréal, Montréal, QC, Canada
- Department of Neurosciences, Université de Montréal, Montréal, QC, Canada
|
17
|
Lamkin M, Gymrek M. The emerging role of tandem repeats in complex traits. Nat Rev Genet 2024; 25:452-453. [PMID: 38714860 DOI: 10.1038/s41576-024-00736-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/21/2024]
Affiliation(s)
- Michael Lamkin
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
| | - Melissa Gymrek
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA.
- Department of Medicine, University of California San Diego, La Jolla, CA, USA.
|
18
|
Tanudisastro HA, Deveson IW, Dashnow H, MacArthur DG. Sequencing and characterizing short tandem repeats in the human genome. Nat Rev Genet 2024; 25:460-475. [PMID: 38366034 DOI: 10.1038/s41576-024-00692-3] [Citation(s) in RCA: 37] [Impact Index Per Article: 37.0] [Accepted: 12/06/2023] [Indexed: 02/18/2024]
Abstract
Short tandem repeats (STRs) are highly polymorphic sequences throughout the human genome that are composed of repeated copies of a 1-6-bp motif. Over 1 million variable STR loci are known, some of which regulate gene expression and influence complex traits, such as height. Moreover, variants in at least 60 STR loci cause genetic disorders, including Huntington disease and fragile X syndrome. Accurately identifying and genotyping STR variants is challenging, in particular mapping short reads to repetitive regions and inferring expanded repeat lengths. Recent advances in sequencing technology and computational tools for STR genotyping from sequencing data promise to help overcome this challenge and solve genetically unresolved cases and the 'missing heritability' of polygenic traits. Here, we compare STR genotyping methods, analytical tools and their applications to understand the effect of STR variation on health and disease. We identify emergent opportunities to refine genotyping and quality-control approaches as well as to integrate STRs into variant-calling workflows and large cohort analyses.
Affiliation(s)
- Hope A Tanudisastro
- Centre for Population Genomics, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
- Centre for Population Genomics, Murdoch Children's Research Institute, Melbourne, Victoria, Australia
- Faculty of Medicine and Health, University of New South Wales, Sydney, New South Wales, Australia
- Faculty of Medicine and Health, University of Sydney, Sydney, New South Wales, Australia
| | - Ira W Deveson
- Faculty of Medicine and Health, University of New South Wales, Sydney, New South Wales, Australia
- Genomics and Inherited Disease Program, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
| | - Harriet Dashnow
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA.
| | - Daniel G MacArthur
- Centre for Population Genomics, Garvan Institute of Medical Research, Sydney, New South Wales, Australia.
- Centre for Population Genomics, Murdoch Children's Research Institute, Melbourne, Victoria, Australia.
- Faculty of Medicine and Health, University of New South Wales, Sydney, New South Wales, Australia.
|
19
|
Chiu R, Rajan-Babu IS, Friedman JM, Birol I. A comprehensive tandem repeat catalog of the human genome. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.06.19.24309173. [PMID: 38947075 PMCID: PMC11213036 DOI: 10.1101/2024.06.19.24309173] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/02/2024]
Abstract
With the increasing availability of long-read sequencing data, high-quality human genome assemblies, and software for fully characterizing tandem repeats, genome-wide genotyping of tandem repeat loci on a population scale becomes more feasible. Such efforts not only expand our knowledge of the tandem repeat landscape in the human genome but also enhance our ability to differentiate pathogenic tandem repeat mutations from benign polymorphisms. To this end, we analyzed 272 genomes assembled using datasets from three public initiatives that employed different long-read sequencing technologies. Here, we report a catalog of over 18 million tandem repeat loci, many of which were previously unannotated. Some of these loci are highly polymorphic, and many of them reside within coding sequences.
Affiliation(s)
- Readman Chiu
- Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC V5Z 4S6, Canada
| | - Indhu-Shree Rajan-Babu
- Department of Medical Genetics, University of British Columbia, Vancouver, BC V5Z 4H4, Canada
| | - Jan M Friedman
- Department of Medical Genetics, University of British Columbia, Vancouver, BC V5Z 4H4, Canada
- BC Children's Hospital Research Institute, Vancouver, BC V5Z 4H4, Canada
| | - Inanc Birol
- Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC V5Z 4S6, Canada
- Department of Medical Genetics, University of British Columbia, Vancouver, BC V5Z 4H4, Canada
|
20
|
Maciocha F, Suchanecka A, Chmielowiec K, Chmielowiec J, Ciechanowicz A, Boroń A. Correlations of the CNR1 Gene with Personality Traits in Women with Alcohol Use Disorder. Int J Mol Sci 2024; 25:5174. [PMID: 38791212 PMCID: PMC11121729 DOI: 10.3390/ijms25105174] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/08/2024] [Revised: 05/02/2024] [Accepted: 05/07/2024] [Indexed: 05/26/2024]
Abstract
Alcohol use disorder (AUD) is a significant issue affecting women, with severe consequences for society, the economy, and most importantly, health. Both personality and alcohol use disorders are phenotypically very complex, and elucidating their shared heritability is a challenge for medical genetics. Therefore, our study investigated the correlations between the microsatellite polymorphism (AAT)n of the Cannabinoid Receptor 1 (CNR1) gene and personality traits in women with AUD. The study group included 187 female subjects. Of these, 93 were diagnosed with alcohol use disorder, and 94 were controls. Repeat length polymorphism of microsatellite regions (AAT)n in the CNR1 gene was identified with PCR. All participants were assessed with the Mini-International Neuropsychiatric Interview and completed the NEO Five-Factor and State-Trait Anxiety Inventories. In the group of AUD subjects, significantly fewer (AAT)n repeats were present when compared with controls (p = 0.0380). When comparing the AUD subjects with the controls, we observed significantly higher scores in the AUD subjects on the STAI trait (p < 0.00001) and state scales (p = 0.0001) and on the NEO Five-Factor Inventory Neuroticism (p < 0.00001) and Openness (p = 0.0237; insignificant after Bonferroni correction) scales. Significantly lower results were obtained on the NEO-FFI Extraversion (p = 0.00003), Agreeableness (p < 0.00001) and Conscientiousness (p < 0.00001) scales by the AUD subjects when compared to controls. There was no statistically significant Pearson's linear correlation between the number of (AAT)n repeats in the CNR1 gene and the STAI and NEO Five-Factor Inventory scores in the group of AUD subjects. In contrast, Pearson's linear correlation analysis in controls showed a positive correlation between the number of the (AAT)n repeats and the STAI state scale (r = 0.184; p = 0.011; insignificant after Bonferroni correction) and a negative correlation with the NEO-FFI Openness scale (r = -0.241; p = 0.001). Interestingly, our study provided data on two separate complex issues, i.e., (1) the association of (AAT)n CNR1 repeats with the AUD in females; (2) the correlation of (AAT)n CNR1 repeats with anxiety as a state and Openness in non-alcohol dependent subjects. In conclusion, our study provided a plethora of valuable data for improving our understanding of alcohol use disorder and anxiety.
Affiliation(s)
- Filip Maciocha
- Department of Clinical and Molecular Biochemistry, Pomeranian Medical University in Szczecin, Powstańców Wielkopolskich 72 St., 70-111 Szczecin, Poland; (F.M.); (A.C.)
| | - Aleksandra Suchanecka
- Independent Laboratory of Behavioral Genetics and Epigenetics, Pomeranian Medical University in Szczecin, Powstańców Wielkopolskich 72 St., 70-111 Szczecin, Poland;
| | - Krzysztof Chmielowiec
- Department of Hygiene and Epidemiology, Collegium Medicum, University of Zielona Góra, 28 Zyty St., 65-046 Zielona Góra, Poland; (K.C.); (J.C.)
| | - Jolanta Chmielowiec
- Department of Hygiene and Epidemiology, Collegium Medicum, University of Zielona Góra, 28 Zyty St., 65-046 Zielona Góra, Poland; (K.C.); (J.C.)
| | - Andrzej Ciechanowicz
- Department of Clinical and Molecular Biochemistry, Pomeranian Medical University in Szczecin, Powstańców Wielkopolskich 72 St., 70-111 Szczecin, Poland; (F.M.); (A.C.)
| | - Agnieszka Boroń
- Department of Clinical and Molecular Biochemistry, Pomeranian Medical University in Szczecin, Powstańców Wielkopolskich 72 St., 70-111 Szczecin, Poland; (F.M.); (A.C.)
|
21
|
Tajeddin N, Arabfard M, Alizadeh S, Salesi M, Khamse S, Delbari A, Ohadi M. Novel islands of GGC and GCC repeats coincide with human evolution. Gene 2024; 902:148194. [PMID: 38262548 DOI: 10.1016/j.gene.2024.148194] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 08/21/2023] [Revised: 10/29/2023] [Accepted: 01/18/2024] [Indexed: 01/25/2024]
Abstract
BACKGROUND Because of their high mutation rate, overrepresentation in genic regions, and link with various neurological, neurodegenerative, and movement disorders, GGC and GCC short tandem repeats (STRs) are prone to natural selection. However, the 3-repeat forms of these STRs remain largely unexplored. RESULTS In a genome-wide search of the human genome, we mapped GGC and GCC STRs of ≥3 repeats and found novel islands of up to 45 of those STRs, populating spans of 1 to 2 kb of genomic DNA. RGPD4 and NOC4L harbored the densest (GGC)3 (probability 3.09061E-71) and (GCC)3 (probability 1.72376E-61) islands, respectively, and these islands were human-specific. We also found prime instances of directionally incremented density of STRs at specific loci in human versus other species, including the FOXK2 and SKI GGC islands. The genes containing those islands significantly diverged in expression in human versus other species, and the proteins encoded by those genes interact closely in a physical interaction network, a consequence of which may be human-specific characteristics such as higher-order brain functions. CONCLUSION We report novel islands of GGC and GCC STRs of evolutionary relevance to human. The density and, in some instances, periodicity of these islands support them as a novel genomic entity, which needs to be further explored in evolutionary, mechanistic, and functional platforms.
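The mapping step described in this abstract can be illustrated with a minimal Python sketch. It finds runs of at least three copies of a repeat unit in a sequence and groups neighbouring runs into "islands". The function names, the 2 kb grouping distance, and the toy sequence are assumptions made for illustration only; this is not the authors' pipeline, and their island definition and probability calculations are not reproduced here.

```python
import re


def find_str_tracts(seq, unit, min_repeats=3):
    """Return (start, end, copies) for each maximal run of `unit`
    containing at least `min_repeats` consecutive copies."""
    pattern = re.compile(r"(?:%s){%d,}" % (re.escape(unit), min_repeats))
    return [
        (m.start(), m.end(), (m.end() - m.start()) // len(unit))
        for m in pattern.finditer(seq.upper())
    ]


def group_into_islands(tracts, max_gap=2000):
    """Group tracts into islands: a tract joins the current island when it
    starts within `max_gap` bp of the end of the previous tract.
    (max_gap is a hypothetical threshold, not taken from the paper.)"""
    islands = []
    for tract in sorted(tracts):
        if islands and tract[0] - islands[-1][-1][1] <= max_gap:
            islands[-1].append(tract)
        else:
            islands.append([tract])
    return islands


if __name__ == "__main__":
    # Toy sequence: two nearby (GGC)n tracts, then a distant third one.
    toy = "AT" + "GGC" * 3 + "A" * 10 + "GGC" * 4 + "T" * 3000 + "GGC" * 3
    for island in group_into_islands(find_str_tracts(toy, "GGC")):
        print(len(island), "tract(s):", island)
```

A real analysis would scan both strands of each chromosome and would need a statistical model to judge whether an island is denser than expected, which this sketch does not attempt.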
Affiliation(s)
- N Tajeddin
- Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
| | - M Arabfard
- Chemical Injuries Research Center, Systems Biology and Poisonings Institute, Baqiyatallah University of Medical Sciences, Tehran, Iran
| | - S Alizadeh
- Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
| | - M Salesi
- Chemical Injuries Research Center, Systems Biology and Poisonings Institute, Baqiyatallah University of Medical Sciences, Tehran, Iran
| | - S Khamse
- Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
| | - A Delbari
- Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
| | - M Ohadi
- Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran.
|
22
|
Arthur TD, Nguyen JP, D'Antonio-Chronowska A, Jaureguy J, Silva N, Henson B, Panopoulos AD, Belmonte JCI, D'Antonio M, McVicker G, Frazer KA. Multi-omic QTL mapping in early developmental tissues reveals phenotypic and temporal complexity of regulatory variants underlying GWAS loci. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.04.10.588874. [PMID: 38645112 PMCID: PMC11030419 DOI: 10.1101/2024.04.10.588874] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/23/2024]
Abstract
Most GWAS loci are presumed to affect gene regulation; however, only ∼43% colocalize with expression quantitative trait loci (eQTLs). To address this colocalization gap, we identify eQTLs, chromatin accessibility QTLs (caQTLs), and histone acetylation QTLs (haQTLs) using molecular samples from three early developmental (EDev) tissues. Through colocalization, we annotate 586 GWAS loci for 17 traits by QTL complexity, QTL phenotype, and QTL temporal specificity. We show that GWAS loci are highly enriched for colocalization with complex QTL modules that affect multiple elements (genes and/or peaks). We also demonstrate that caQTLs and haQTLs capture regulatory variations not associated with eQTLs and explain ∼49% of the functionally annotated GWAS loci. Additionally, we show that EDev-unique QTLs are strongly depleted for colocalizing with GWAS loci. By conducting one of the largest multi-omic QTL studies to date, we demonstrate that many GWAS loci exhibit phenotypic complexity and are therefore missed by traditional eQTL analyses.
|
23
|
Chen F, Zhang Y, Sedlazeck FJ, Creighton CJ. Germline structural variation globally impacts the cancer transcriptome including disease-relevant genes. Cell Rep Med 2024; 5:101446. [PMID: 38442712 PMCID: PMC10983041 DOI: 10.1016/j.xcrm.2024.101446] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2023] [Revised: 01/01/2024] [Accepted: 02/06/2024] [Indexed: 03/07/2024]
Abstract
Germline variation and somatic alterations contribute to the molecular profile of cancers. We combine RNA with whole genome sequencing across 1,218 cancer patients to determine the extent germline structural variants (SVs) impact expression of nearby genes. For hundreds of genes, recurrent and common germline SV breakpoints within 100 kb associate with increased or decreased expression in tumors spanning various tissues of origin. A significant fraction of germline SV expression associations involves duplication of intergenic enhancers or 3' UTR disruption. Genes altered by both somatic and germline SVs include ATRX and CEBPA. Genes essential in cancer cell lines include BARD1 and IRS2. Genes with both expression and germline SV breakpoint patterns associated with patient survival include GCLM. Our results capture a class of phenotypic variation at work in the disease setting, including genes with cancer roles. Specific germline SVs represent potential cancer risk variants for genetic testing, including those involving genes with targeting implications.
Affiliation(s)
- Fengju Chen
- Dan L. Duncan Comprehensive Cancer Center, Baylor College of Medicine, Houston, TX 77030, USA
| | - Yiqun Zhang
- Dan L. Duncan Comprehensive Cancer Center, Baylor College of Medicine, Houston, TX 77030, USA
| | - Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA; Department of Computer Science, Rice University, Houston, TX 77005, USA
| | - Chad J Creighton
- Dan L. Duncan Comprehensive Cancer Center, Baylor College of Medicine, Houston, TX 77030, USA; Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA; Department of Medicine, Baylor College of Medicine, Houston, TX 77030, USA.
|
24
|
Arthur TD, Nguyen JP, D'Antonio-Chronowska A, Matsui H, Silva NS, Joshua IN, Luchessi AD, Greenwald WWY, D'Antonio M, Pera MF, Frazer KA. Complex regulatory networks influence pluripotent cell state transitions in human iPSCs. Nat Commun 2024; 15:1664. [PMID: 38395976 PMCID: PMC10891157 DOI: 10.1038/s41467-024-45506-6] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 05/05/2023] [Accepted: 01/26/2024] [Indexed: 02/25/2024] Open
Abstract
Stem cells exist in vitro in a spectrum of interconvertible pluripotent states. Analyzing hundreds of hiPSCs derived from different individuals, we show that the proportions of these pluripotent states vary considerably across lines. We discover 13 gene network modules (GNMs) and 13 regulatory network modules (RNMs), which are highly correlated with each other, suggesting that the coordinated co-accessibility of regulatory elements in the RNMs likely underlies the coordinated expression of genes in the GNMs. Epigenetic analyses reveal that regulatory networks underlying self-renewal and pluripotency are more complex than previously realized. Genetic analyses identify thousands of regulatory variants that overlap predicted transcription factor binding sites and are associated with chromatin accessibility in the hiPSCs. We show that the master regulator of pluripotency, the NANOG-OCT4 complex, and its associated network are significantly enriched for regulatory variants with large effects, suggesting that they play a role in the varying cellular proportions of pluripotency states between hiPSCs. Our work bins tens of thousands of regulatory elements in hiPSCs into discrete regulatory networks, shows that pluripotency and self-renewal processes have a surprising level of regulatory complexity, and suggests that genetic factors may contribute to cell state transitions in human iPSC lines.
Affiliation(s)
- Timothy D Arthur
- Biomedical Sciences Graduate Program, University of California, San Diego, La Jolla, CA, 92093, USA
- Division of Biomedical Informatics, University of California, San Diego, La Jolla, CA, 92093, USA
| | - Jennifer P Nguyen
- Division of Biomedical Informatics, University of California, San Diego, La Jolla, CA, 92093, USA
- Bioinformatics and Systems Biology Graduate Program, University of California, San Diego, La Jolla, CA, 92093, USA
| | | | - Hiroko Matsui
- Institute of Genomic Medicine, University of California San Diego, 9500 Gilman Dr, La Jolla, CA, 92093, USA
| | - Nayara S Silva
- Northeast Biotechnology Network (RENORBIO), Graduate Program in Biotechnology, Federal University of Rio Grande do Norte, Natal, Brazil
| | - Isaac N Joshua
- Institute of Genomic Medicine, University of California San Diego, 9500 Gilman Dr, La Jolla, CA, 92093, USA
| | - André D Luchessi
- Northeast Biotechnology Network (RENORBIO), Graduate Program in Biotechnology, Federal University of Rio Grande do Norte, Natal, Brazil
- Department of Clinical and Toxicological Analysis, Federal University of Rio Grande do Norte, Natal, Brazil
| | - William W Young Greenwald
- Bioinformatics and Systems Biology Graduate Program, University of California, San Diego, La Jolla, CA, 92093, USA
| | - Matteo D'Antonio
- Division of Biomedical Informatics, University of California, San Diego, La Jolla, CA, 92093, USA
- Institute of Genomic Medicine, University of California San Diego, 9500 Gilman Dr, La Jolla, CA, 92093, USA
| | | | - Kelly A Frazer
- Department of Pediatrics, University of California San Diego, La Jolla, CA, 92093, USA.
- Institute of Genomic Medicine, University of California San Diego, 9500 Gilman Dr, La Jolla, CA, 92093, USA.
|
25
|
Arabfard M, Tajeddin N, Alizadeh S, Salesi M, Bayat H, Khorram Khorshid HR, Khamse S, Delbari A, Ohadi M. Dyads of GGC and GCC form hotspot colonies that coincide with the evolution of human and other great apes. BMC Genom Data 2024; 25:21. [PMID: 38383300 PMCID: PMC10880355 DOI: 10.1186/s12863-024-01207-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2023] [Accepted: 02/11/2024] [Indexed: 02/23/2024] Open
Abstract
BACKGROUND GGC and GCC short tandem repeats (STRs) have various evolutionary, biological, and pathological implications. However, the fundamental two-repeats (dyads) of these STRs are widely unexplored. RESULTS On a genome-wide scale, we mapped (GGC)2 and (GCC)2 dyads in human and found monumental colonies (distance between each dyad < 500 bp) of extraordinary density and, in some instances, periodicity. The largest (GCC)2 and (GGC)2 colonies were intergenic, homogeneous, and human-specific, consisting of 219 (GCC)2 on chromosome 2 (probability < 1.545E-219) and 70 (GGC)2 on chromosome 9 (probability = 1.809E-148). We also found that several colonies were shared with other great apes and directionally increased in density and complexity in human, such as a colony of 99 (GCC)2 on chromosome 20 that specifically expanded in great apes and reached maximum complexity in human (probability 1.545E-220). Numerous other colonies of evolutionary relevance in human were detected in other largely overlooked regions of the genome, such as chromosome Y and pseudogenes. Several of the genes containing or nearest to those colonies were divergently expressed in human. CONCLUSION In conclusion, (GCC)2 and (GGC)2 form unprecedented genomic colonies that coincide with the evolution of human and other great apes. The extent of the genomic rearrangements leading to those colonies supports overlooked recombination hotspots, shared across great apes. The identified colonies deserve to be studied in mechanistic, evolutionary, and functional platforms.
Affiliation(s)
- M Arabfard
- Chemical Injuries Research Center, Systems Biology and Poisonings Institute, Baqiyatallah University of Medical Sciences, Tehran, Iran
| | - N Tajeddin
- Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
- Department of Biology, Central Tehran Branch, Islamic Azad University, Tehran, Iran
| | - S Alizadeh
- Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
| | - M Salesi
- Chemical Injuries Research Center, Systems Biology and Poisonings Institute, Baqiyatallah University of Medical Sciences, Tehran, Iran
- Research Center for Prevention of Oral and Dental Diseases, Baqiyatallah University of Medical Sciences, Tehran, Iran
| | - H Bayat
- Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
| | - H R Khorram Khorshid
- Personalized Medicine and Genometabolomics Research Center, Hope Generation Foundation, Tehran, Iran
| | - S Khamse
- Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
| | - A Delbari
- Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
| | - M Ohadi
- Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran.
|
26
|
Bayat H, Mirahmadi M, Azarshin Z, Ohadi H, Delbari A, Ohadi M. CRISPR/Cas9-mediated deletion of a GA-repeat in human GPM6B leads to disruption of neural cell differentiation from NT2 cells. Sci Rep 2024; 14:2136. [PMID: 38273037 PMCID: PMC10810867 DOI: 10.1038/s41598-024-52675-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 11/24/2023] [Accepted: 01/22/2024] [Indexed: 01/27/2024] Open
Abstract
The human neuron-specific gene GPM6B (Glycoprotein membrane 6B) is considered a key gene in neural cell functionality. This gene contains an exceptionally long and strictly monomorphic short tandem repeat (STR) of 9 repeats, (GA)9. STRs in regulatory regions may affect the expression of nearby genes. We used a CRISPR-based tool to delete this GA-repeat in NT2 cells and analyzed the consequence of this deletion on GPM6B expression. Subsequently, the edited cells were induced to differentiate into neural cells using retinoic acid (RA) treatment. Deletion of the GA-repeat significantly decreased the expression of GPM6B at the RNA (p < 0.05) and protein (40%) levels. Compared to the control cells, the edited cells showed a dramatic decrease of the astrocyte and neural cell markers, including GFAP (0.77-fold), TUBB3 (0.57-fold), and MAP2 (0.2-fold). Subsequent sorting of the edited cells showed an increased number of NES (p < 0.01), but a decreased number of GFAP (p < 0.001), TUBB3 (p < 0.05), and MAP2 (p < 0.01), compared to the control cells. In conclusion, CRISPR/Cas9-mediated deletion of a GA-repeat in human GPM6B led to decreased expression of this gene, which in turn disrupted differentiation of NT2 cells into neural cells.
Affiliation(s)
- Hadi Bayat
- Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Postal Code: 1985713834, Iran
- Department of Molecular Genetics, Faculty of Biological Sciences, Tarbiat Modares University, Postal Box: 331-14115, Tehran, Iran
| | - Maryam Mirahmadi
- Department of Medical Genetics, Faculty of Medical Sciences, Tarbiat Modares University, Postal Box: 331-14115, Tehran, Iran
- Department of Exomine, PardisGene Company, Tehran, Postal Code: 1917635816, Iran
| | - Zohreh Azarshin
- Department of Molecular Genetics, Faculty of Biological Sciences, Tarbiat Modares University, Postal Box: 331-14115, Tehran, Iran
| | - Hamid Ohadi
- School of Physics and Astronomy, University of St Andrews, St Andrews, KY16 9SS, UK
| | - Ahmad Delbari
- Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Postal Code: 1985713834, Iran
| | - Mina Ohadi
- Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Postal Code: 1985713834, Iran.
|
27
|
Zhang J, Zhu B. Short, but matters: short tandem repeats confer variation in transcription factor-DNA binding. Sci Bull (Beijing) 2024; 69:9-10. [PMID: 38042705 DOI: 10.1016/j.scib.2023.11.050] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/04/2023]
Affiliation(s)
- Jing Zhang
- National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China; Key Laboratory of Epigenetic Regulation and Intervention, Chinese Academy of Sciences, Beijing 100101, China; New Cornerstone Science Laboratory, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
| | - Bing Zhu
- National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China; Key Laboratory of Epigenetic Regulation and Intervention, Chinese Academy of Sciences, Beijing 100101, China; New Cornerstone Science Laboratory, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China; College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China.
|
28
|
Guo MH, Lee WP, Vardarajan B, Schellenberg GD, Phillips-Cremins J. Polygenic burden of short tandem repeat expansions promote risk for Alzheimer's disease. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2023:2023.11.16.23298623. [PMID: 38014121 PMCID: PMC10680900 DOI: 10.1101/2023.11.16.23298623] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2023]
Abstract
Studies of the genetics of Alzheimer's disease (AD) have largely focused on single nucleotide variants and short insertions/deletions. However, most of the disease heritability has yet to be uncovered, suggesting that there is substantial genetic risk conferred by other forms of genetic variation. There are over one million short tandem repeats (STRs) in the genome, and their link to AD risk has not been assessed. As pathogenic expansions of STRs cause over 30 neurologic diseases, it is important to ascertain whether STRs may also be implicated in AD risk. Here, we genotyped 321,742 polymorphic STR tracts genome-wide using PCR-free whole genome sequencing data from 2,981 individuals (1,489 AD case and 1,492 control individuals). We implemented an approach to identify STR expansions as STRs with tract lengths that are outliers from the population. We then tested for differences in aggregate burden of expansions in case versus control individuals. AD patients had a 1.19-fold increase of STR expansions compared to healthy elderly controls (p = 8.27 × 10⁻³, two-sided Mann-Whitney test). Individuals carrying > 30 STR expansions had 3.62-fold higher odds of having AD and had more severe AD neuropathology. AD STR expansions were highly enriched within active promoters in post-mortem hippocampal brain tissues and particularly within SINE-VNTR-Alu (SVA) retrotransposons. Together, these results demonstrate that expanded STRs within active promoter regions of the genome promote risk of AD.
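The idea of calling expansions as population outliers and then counting them per individual can be sketched in a few lines of Python. The abstract does not state the outlier rule, so the cutoff used here (third quartile plus k times the interquartile range, with k = 3) is an assumption chosen only for illustration, as are the function names and the toy genotypes.

```python
from statistics import quantiles


def expansion_threshold(lengths, k=3.0):
    """Hypothetical outlier cutoff: Q3 + k * IQR of the tract lengths
    observed at one locus (not the rule used in the study)."""
    q1, _, q3 = quantiles(lengths, n=4, method="inclusive")
    return q3 + k * (q3 - q1)


def expansion_burden(genotypes, k=3.0):
    """Count, for each individual, the loci at which their tract length
    exceeds the locus-specific cutoff.

    genotypes: dict mapping locus -> dict mapping individual -> tract length.
    Returns a dict mapping individual -> number of expansions."""
    burden = {}
    for by_individual in genotypes.values():
        cutoff = expansion_threshold(list(by_individual.values()), k)
        for individual, length in by_individual.items():
            burden.setdefault(individual, 0)
            if length > cutoff:
                burden[individual] += 1
    return burden


if __name__ == "__main__":
    # Toy data: individual "f" carries one unusually long tract.
    toy = {
        "locus1": {"a": 10, "b": 10, "c": 11, "d": 11, "e": 12, "f": 60},
        "locus2": {"a": 5, "b": 5, "c": 5, "d": 6, "e": 6, "f": 6},
    }
    print(expansion_burden(toy))
```

The per-individual counts for cases and controls could then be compared with a two-sided Mann-Whitney test, for example `scipy.stats.mannwhitneyu`, which is the test the abstract names.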
Affiliation(s)
- Michael H Guo
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Department of Neurology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Wan-Ping Lee
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA
| | - Badri Vardarajan
- Department of Neurology, College of Physicians and Surgeons, Columbia University, New York, NY
| | - Gerard D Schellenberg
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA
| | - Jennifer Phillips-Cremins
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Department of Bioengineering, University of Pennsylvania, Philadelphia, PA, USA
- Epigenetics Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
|
29
|
DeGorter MK, Goddard PC, Karakoc E, Kundu S, Yan SM, Nachun D, Abell N, Aguirre M, Carstensen T, Chen Z, Durrant M, Dwaracherla VR, Feng K, Gloudemans MJ, Hunter N, Moorthy MPS, Pomilla C, Rodrigues KB, Smith CJ, Smith KS, Ungar RA, Balliu B, Fellay J, Flicek P, McLaren PJ, Henn B, McCoy RC, Sugden L, Kundaje A, Sandhu MS, Gurdasani D, Montgomery SB. Transcriptomics and chromatin accessibility in multiple African population samples. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.11.04.564839. [PMID: 37986808 PMCID: PMC10659267 DOI: 10.1101/2023.11.04.564839] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2023]
Abstract
Mapping the functional human genome and impact of genetic variants is often limited to European-descendent population samples. To aid in overcoming this limitation, we measured gene expression using RNA sequencing in lymphoblastoid cell lines (LCLs) from 599 individuals from six African populations to identify novel transcripts including those not represented in the hg38 reference genome. We used whole genomes from the 1000 Genomes Project and 164 Maasai individuals to identify 8,881 expression and 6,949 splicing quantitative trait loci (eQTLs/sQTLs), and 2,611 structural variants associated with gene expression (SV-eQTLs). We further profiled chromatin accessibility using ATAC-Seq in a subset of 100 representative individuals to identify chromatin accessibility quantitative trait loci (caQTLs) and allele-specific chromatin accessibility, and provide predictions for the functional effect of 78.9 million variants on chromatin accessibility. Using this map of eQTLs and caQTLs, we fine-mapped GWAS signals for a range of complex diseases. Combined, this work expands global functional genomic data to identify novel transcripts, functional elements and variants, understand population genetic history of molecular quantitative trait loci, and further resolve the genetic basis of multiple human traits and disease.
Affiliation(s)
| | - Page C Goddard
- Department of Genetics, Stanford University, Stanford, CA
| | - Emre Karakoc
- Human Genetics, Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK
| | - Soumya Kundu
- Department of Computer Science, Stanford University, Stanford CA
| | | | - Daniel Nachun
- Department of Pathology, Stanford University, Stanford, CA
| | - Nathan Abell
- Department of Genetics, Stanford University, Stanford, CA
| | - Matthew Aguirre
- Department of Biomedical Data Science, Stanford University, Stanford, CA
| | - Tommy Carstensen
- Human Genetics, Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK
| | - Ziwei Chen
- Department of Computer Science, Stanford University, Stanford CA
| | | | | | - Karen Feng
- Department of Biomedical Data Science, Stanford University, Stanford, CA
| | | | - Naiomi Hunter
- Department of Genetics, Stanford University, Stanford, CA
| | | | - Cristina Pomilla
- Human Genetics, Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK
| | | | | | - Kevin S Smith
- Department of Pathology, Stanford University, Stanford, CA
| | - Rachel A Ungar
- Department of Genetics, Stanford University, Stanford, CA
| | - Brunilda Balliu
- Department of Pathology and Laboratory Medicine, University of California Los Angeles, Los Angeles, CA and Department of Computational Medicine, University of California Los Angeles, Los Angeles, CA
| | - Jacques Fellay
- School of Life Sciences, Ecole Polytechnique Federale de Lausanne, Lausanne, Switzerland and Precision Medicine Unit, Biomedical Data Science Center, Lausanne University Hospital and University of Lausanne, Lausanne, Switzerland
| | - Paul Flicek
- Department of Genetics, University of Cambridge, Cambridge, UK
| | - Paul J McLaren
- Sexually Transmitted and Blood-Borne Infections Division at JC Wilt Infectious Diseases Research Centre, National Microbiology Laboratory Branch, Public Health Agency of Canada, Winnipeg, Canada and Department of Medical Microbiology and Infectious Diseases, University of Manitoba, Winnipeg, Canada
| | - Brenna Henn
- Department of Anthropology, University of California Davis, Davis CA and Genome Center, University of California Davis, Davis CA
| | - Rajiv C McCoy
- Department of Biology, Johns Hopkins University, Baltimore
| | - Lauren Sugden
- Department of Mathematics and Computer Science, Duquesne University, Pittsburgh, PA
| | - Anshul Kundaje
- Department of Genetics, Stanford University, Stanford, CA
- Department of Computer Science, Stanford University, Stanford CA
| | | | - Deepti Gurdasani
- William Harvey Research Institute, Queen Mary University of London, London, UK; Kirby Institute, University of New South Wales, Australia; School of Medicine, University of Western Australia, Australia
|
30
|
Bhati M, Mapel XM, Lloret-Villas A, Pausch H. Structural variants and short tandem repeats impact gene expression and splicing in bovine testis tissue. Genetics 2023; 225:iyad161. [PMID: 37655920 PMCID: PMC10627265 DOI: 10.1093/genetics/iyad161] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 06/05/2023] [Revised: 06/05/2023] [Accepted: 08/24/2023] [Indexed: 09/02/2023] Open
Abstract
Structural variants (SVs) and short tandem repeats (STRs) are significant sources of genetic variation. However, the impacts of these variants on gene regulation have not been investigated in cattle. Here, we genotyped and characterized 19,408 SVs and 374,821 STRs in 183 bovine genomes and investigated their impact on molecular phenotypes derived from testis transcriptomes. We found that 71% of STRs were multiallelic. The vast majority (95%) of STRs and SVs were in intergenic and intronic regions. Only 37% of SVs and 40% of STRs were in high linkage disequilibrium (LD) (R² > 0.8) with surrounding SNPs/insertions and deletions (Indels), indicating that SNP-based association testing and genomic prediction are blind to a nonnegligible portion of genetic variation. We showed that both SVs and STRs were more than 2-fold enriched among expression and splicing QTL (e/sQTL) relative to SNPs/Indels and were often associated with differential expression and splicing of multiple genes. Deletions and duplications had larger impacts on splicing and expression than any other type of SV. Exonic duplications predominantly increased gene expression either through alternative splicing or other mechanisms, whereas expression- and splicing-associated STRs primarily resided in intronic regions and exhibited bimodal effects on the molecular phenotypes investigated. Most e/sQTL resided within 100 kb of the affected genes or splicing junctions. We pinpoint candidate causal STRs and SVs associated with the expression of SLC13A4 and TTC7B and alternative splicing of a lncRNA and CAPP1. We provide a catalog of STRs and SVs for taurine cattle and show that these variants contribute substantially to gene expression and splicing variation.
Affiliation(s)
- Meenu Bhati
- Animal Genomics, ETH Zurich, Universitaetstrasse 2, 8092, Zurich, Switzerland
| | - Xena Marie Mapel
- Animal Genomics, ETH Zurich, Universitaetstrasse 2, 8092, Zurich, Switzerland
| | | | - Hubert Pausch
- Animal Genomics, ETH Zurich, Universitaetstrasse 2, 8092, Zurich, Switzerland
|
31
|
Zong W, Wang J, Zhao R, Niu N, Su Y, Hu Z, Liu X, Hou X, Wang L, Wang L, Zhang L. Associations of genome-wide structural variations with phenotypic differences in cross-bred Eurasian pigs. J Anim Sci Biotechnol 2023; 14:136. [PMID: 37805653 PMCID: PMC10559557 DOI: 10.1186/s40104-023-00929-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2023] [Accepted: 08/03/2023] [Indexed: 10/09/2023] Open
Abstract
BACKGROUND During approximately 10,000 years of domestication and selection, a large number of structural variations (SVs) have emerged in the genome of pig breeds, profoundly influencing their phenotypes and the ability to adapt to the local environment. SVs (≥ 50 bp) are widely distributed in the genome, mainly in the form of insertion (INS), mobile element insertion (MEI), deletion (DEL), duplication (DUP), inversion (INV), and translocation (TRA). While studies have investigated the SVs in pig genomes, genome-wide association studies (GWAS)-based on SVs have been rarely conducted. RESULTS Here, we obtained a high-quality SV map containing 123,151 SVs from 15 Large White and 15 Min pigs through integrating the power of several SV tools, with 53.95% of the SVs being reported for the first time. These high-quality SVs were used to recover the population genetic structure, confirming the accuracy of genotyping. Potential functional SV loci were then identified based on positional effects and breed stratification. Finally, GWAS were performed for 36 traits by genotyping the screened potential causal loci in the F2 population according to their corresponding genomic positions. We identified a large number of loci involved in 8 carcass traits and 6 skeletal traits on chromosome 7, with FKBP5 containing the most significant SV locus for almost all traits. In addition, we found several significant loci in intramuscular fat, abdominal circumference, heart weight, and liver weight, etc. CONCLUSIONS: We constructed a high-quality SV map using high-coverage sequencing data and then analyzed them by performing GWAS for 25 carcass traits, 7 skeletal traits, and 4 meat quality traits to determine that SVs may affect body size between European and Chinese pig breeds.
Affiliation(s)
- Wencheng Zong
- State Key Laboratory of Animal Biotech Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
| | - Jinbu Wang
- State Key Laboratory of Animal Biotech Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
| | - Runze Zhao
- State Key Laboratory of Animal Biotech Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
- College of Animal Science, Shanxi Agricultural University, Jinzhong, 030801, China
| | - Naiqi Niu
- State Key Laboratory of Animal Biotech Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
| | - Yanfang Su
- State Key Laboratory of Animal Biotech Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
| | - Ziping Hu
- State Key Laboratory of Animal Biotech Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
- College of Animal Science and Technology, Qingdao Agricultural University, Qingdao, 266109, China
| | - Xin Liu
- State Key Laboratory of Animal Biotech Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
| | - Xinhua Hou
- State Key Laboratory of Animal Biotech Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
| | - Ligang Wang
- State Key Laboratory of Animal Biotech Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
| | - Lixian Wang
- State Key Laboratory of Animal Biotech Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, China.
| | - Longchao Zhang
- State Key Laboratory of Animal Biotech Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, China.
| |
|
32
|
Liao X, Zhu W, Zhou J, Li H, Xu X, Zhang B, Gao X. Repetitive DNA sequence detection and its role in the human genome. Commun Biol 2023; 6:954. [PMID: 37726397 PMCID: PMC10509279 DOI: 10.1038/s42003-023-05322-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/29/2023] [Accepted: 09/04/2023] [Indexed: 09/21/2023] Open
Abstract
Repetitive DNA sequences play critical roles in driving evolution, inducing variation, and regulating gene expression. In this review, we summarized the definition, arrangement, and structural characteristics of repeats. Besides, we introduced diverse biological functions of repeats and reviewed existing methods for automatic repeat detection, classification, and masking. Finally, we analyzed the type, structure, and regulation of repeats in the human genome and their role in the induction of complex diseases. We believe that this review will facilitate a comprehensive understanding of repeats and provide guidance for repeat annotation and in-depth exploration of their association with human diseases.
Affiliation(s)
- Xingyu Liao
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955, Saudi Arabia
| | - Wufei Zhu
- Department of Endocrinology, Yichang Central People's Hospital, The First College of Clinical Medical Science, China Three Gorges University, 443000, Yichang, P.R. China
| | - Juexiao Zhou
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955, Saudi Arabia
| | - Haoyang Li
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955, Saudi Arabia
| | - Xiaopeng Xu
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955, Saudi Arabia
| | - Bin Zhang
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955, Saudi Arabia
| | - Xin Gao
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955, Saudi Arabia.
| |
|
33
|
Arthur TD, Nguyen JP, D'Antonio-Chronowska A, Matsui H, Silva NS, Joshua IN, Luchessi AD, Young Greenwald WW, D'Antonio M, Pera MF, Frazer KA. Analysis of regulatory network modules in hundreds of human stem cell lines reveals complex epigenetic and genetic factors contribute to pluripotency state differences between subpopulations. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.05.20.541447. [PMID: 37292794 PMCID: PMC10245835 DOI: 10.1101/2023.05.20.541447] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
Stem cells exist in vitro in a spectrum of interconvertible pluripotent states. Analyzing hundreds of hiPSCs derived from different individuals, we show the proportions of these pluripotent states vary considerably across lines. We discovered 13 gene network modules (GNMs) and 13 regulatory network modules (RNMs), which were highly correlated with each other, suggesting that the coordinated co-accessibility of regulatory elements in the RNMs likely underlies the coordinated expression of genes in the GNMs. Epigenetic analyses revealed that regulatory networks underlying self-renewal and pluripotency have a surprising level of complexity. Genetic analyses identified thousands of regulatory variants that overlapped predicted transcription factor binding sites and were associated with chromatin accessibility in the hiPSCs. We show that the master regulator of pluripotency, the NANOG-OCT4 Complex, and its associated network were significantly enriched for regulatory variants with large effects, suggesting that they may play a role in the varying cellular proportions of pluripotency states between hiPSCs. Our work captures the coordinated activity of tens of thousands of regulatory elements in hiPSCs and bins these elements into discrete functionally characterized regulatory networks, shows that regulatory elements in pluripotency networks harbor variants with large effects, and provides a rich resource for future pluripotent stem cell research.
|
34
|
Lutz MW, Chiba-Falek O. Bioinformatics pipeline to guide post-GWAS studies in Alzheimer's: A new catalogue of disease candidate short structural variants. Alzheimers Dement 2023; 19:4094-4109. [PMID: 37253165 PMCID: PMC10524333 DOI: 10.1002/alz.13168] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/22/2022] [Revised: 04/27/2023] [Accepted: 05/08/2023] [Indexed: 06/01/2023]
Abstract
BACKGROUND Short structural variants (SSVs), including insertions/deletions (indels), are common in the human genome and impact disease risk. The role of SSVs in late-onset Alzheimer's disease (LOAD) has been understudied. In this study, we developed a bioinformatics pipeline of SSVs within LOAD genome-wide association study (GWAS) regions to prioritize regulatory SSVs based on the strength of their predicted effect on transcription factor (TF) binding sites. METHODS The pipeline utilized publicly available functional genomics data sources including candidate cis-regulatory elements (cCREs) from ENCODE and single-nucleus (sn)RNA-seq data from LOAD patient samples. RESULTS We catalogued 1581 SSVs in cCREs in LOAD GWAS regions that disrupted 737 TF sites. That included SSVs that disrupted the binding of RUNX3, SPI1, and SMAD3, within the APOE-TOMM40, SPI1, and MS4A6A LOAD regions. CONCLUSIONS The pipeline developed here prioritized non-coding SSVs in cCREs and characterized their putative effects on TF binding. The approach integrates multiomics datasets for validation experiments using disease models.
Affiliation(s)
- Michael W. Lutz
- Division of Translational Brain Sciences, Department of Neurology, Duke University Medical Center, Durham, NC 27710, USA
| | - Ornit Chiba-Falek
- Division of Translational Brain Sciences, Department of Neurology, Duke University Medical Center, Durham, NC 27710, USA
- Center for Genomic and Computational Biology, Duke University Medical Center, Durham, NC 27710, USA
| |
|
35
|
Billingsley KJ, Ding J, Jerez PA, Illarionova A, Levine K, Grenn FP, Makarious MB, Moore A, Vitale D, Reed X, Hernandez D, Torkamani A, Ryten M, Hardy J, UK Brain Expression Consortium (UKBEC), Chia R, Scholz SW, Traynor BJ, Dalgard CL, Ehrlich DJ, Tanaka T, Ferrucci L, Beach T, Serrano GE, Quinn JP, Bubb VJ, Collins RL, Zhao X, Walker M, Pierce-Hoffman E, Brand H, Talkowski ME, Casey B, Cookson MR, Markham A, Nalls MA, Mahmoud M, Sedlazeck FJ, Blauwendraat C, Gibbs JR, Singleton AB. Genome-Wide Analysis of Structural Variants in Parkinson Disease. Ann Neurol 2023; 93:1012-1022. [PMID: 36695634 PMCID: PMC10192042 DOI: 10.1002/ana.26608] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 09/01/2022] [Revised: 01/03/2023] [Accepted: 01/16/2023] [Indexed: 01/26/2023]
Abstract
OBJECTIVE Identification of genetic risk factors for Parkinson disease (PD) has to date been primarily limited to the study of single nucleotide variants, which only represent a small fraction of the genetic variation in the human genome. Consequently, causal variants for most PD risk are not known. Here we focused on structural variants (SVs), which represent a major source of genetic variation in the human genome. We aimed to discover SVs associated with PD risk by performing the first large-scale characterization of SVs in PD. METHODS We leveraged a recently developed computational pipeline to detect and genotype SVs from 7,772 Illumina short-read whole genome sequencing samples. Using this set of SV variants, we performed a genome-wide association study using 2,585 cases and 2,779 controls and identified SVs associated with PD risk. Furthermore, to validate the presence of these variants, we generated a subset of matched whole-genome long-read sequencing data. RESULTS We genotyped and tested 3,154 common SVs, representing over 412 million nucleotides of previously uncatalogued genetic variation. Using long-read sequencing data, we validated the presence of three novel deletion SVs that are associated with risk of PD from our initial association analysis, including a 2 kb intronic deletion within the gene LRRN4. INTERPRETATION We identified three SVs associated with genetic risk of PD. This study represents the most comprehensive assessment of the contribution of SVs to the genetic risk of PD to date. ANN NEUROL 2023;93:1012-1022.
Affiliation(s)
- Kimberley J. Billingsley
- Laboratory of Neurogenetics, National Institute on Aging, Bethesda, Maryland, USA
- Center for Alzheimer’s and Related Dementias, National Institute on Aging, Bethesda, Maryland, USA
| | - Jinhui Ding
- Laboratory of Neurogenetics, National Institute on Aging, Bethesda, Maryland, USA
| | - Pilar Alvarez Jerez
- Laboratory of Neurogenetics, National Institute on Aging, Bethesda, Maryland, USA
- Center for Alzheimer’s and Related Dementias, National Institute on Aging, Bethesda, Maryland, USA
| | | | | | - Francis P. Grenn
- Laboratory of Neurogenetics, National Institute on Aging, Bethesda, Maryland, USA
| | - Mary B. Makarious
- Laboratory of Neurogenetics, National Institute on Aging, Bethesda, Maryland, USA
| | - Anni Moore
- Laboratory of Neurogenetics, National Institute on Aging, Bethesda, Maryland, USA
| | - Daniel Vitale
- Center for Alzheimer’s and Related Dementias, National Institute on Aging, Bethesda, Maryland, USA
- Data Tecnica International, Washington, DC, USA
| | - Xylena Reed
- Center for Alzheimer’s and Related Dementias, National Institute on Aging, Bethesda, Maryland, USA
| | - Dena Hernandez
- Laboratory of Neurogenetics, National Institute on Aging, Bethesda, Maryland, USA
| | - Ali Torkamani
- The Scripps Research Institute, La Jolla, CA, 92037, USA
| | - Mina Ryten
- NIHR Great Ormond Street Hospital Biomedical Research Centre, University College London, London, UK
- Department of Genetics and Genomic Medicine, Great Ormond Street Institute of Child Health, University College London, London, UK
| | - John Hardy
- UK Dementia Research Institute and Department of Neurodegenerative Disease and Reta Lila Weston Institute, UCL Queen Square Institute of Neurology and UCL Movement Disorders Centre, University College London, London, UK
- Institute for Advanced Study, The Hong Kong University of Science and Technology, Hong Kong SAR, China
| | | | - Ruth Chia
- Laboratory of Neurogenetics, National Institute on Aging, Bethesda, Maryland, USA
| | - Sonja W. Scholz
- Neurodegenerative Diseases Research Unit, National Institute of Neurological Disorders and Stroke, Bethesda, Maryland, USA
- Department of Neurology, Johns Hopkins University Medical Center, Baltimore, Maryland, USA
| | - Bryan J. Traynor
- Department of Neurology, Johns Hopkins University Medical Center, Baltimore, Maryland, USA
- Neuromuscular Diseases Research Section, Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD 20892, USA
- Therapeutic Development Branch, National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, MD, USA
- National Institute of Neurological Disorders and Stroke, Bethesda, MD 20892
- Reta Lila Weston Institute, UCL Queen Square Institute of Neurology, University College London, London WC1N 1PJ, UK
| | - Clifton L. Dalgard
- Department of Anatomy Physiology & Genetics, Uniformed Services University of the Health Sciences, Bethesda, MD, USA
- The American Genome Center, Uniformed Services University of the Health Sciences, Bethesda, MD, USA
| | - Debra J. Ehrlich
- Parkinson’s Disease Clinic, Office of the Clinical Director, National Institute of Neurological Disorders and Stroke, Bethesda, Maryland, USA
| | - Toshiko Tanaka
- Translational Gerontology Branch, National Institute on Aging, NIH, Baltimore, MD 21224, USA
| | - Luigi Ferrucci
- Translational Gerontology Branch, National Institute on Aging, NIH, Baltimore, MD 21224, USA
| | - Thomas G. Beach
- Civin Laboratory for Neuropathology, Banner Sun Health Research Institute, Sun City, AZ
| | - Geidy E. Serrano
- Civin Laboratory for Neuropathology, Banner Sun Health Research Institute, Sun City, AZ
| | - John P. Quinn
- Department of Pharmacology and Therapeutics, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 3BX, UK
| | - Vivien J. Bubb
- Department of Pharmacology and Therapeutics, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 3BX, UK
| | - Ryan L Collins
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
- Program in Medical and Population Genetics, Broad Institute of Massachusetts Institute of Technology (M.I.T) and Harvard USA Cambridge, MA 02142, USA
- Division of Medical Sciences and Department of Medicine, Harvard Medical School, Boston, MA 02115
| | - Xuefang Zhao
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
- Program in Medical and Population Genetics, Broad Institute of Massachusetts Institute of Technology (M.I.T) and Harvard USA Cambridge, MA 02142, USA
| | - Mark Walker
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
- Program in Medical and Population Genetics, Broad Institute of Massachusetts Institute of Technology (M.I.T) and Harvard USA Cambridge, MA 02142, USA
- Data Sciences Platform, Broad Institute of Massachusetts Institute of Technology (M.I.T) and Harvard USA Cambridge, MA 02142, USA
| | - Emma Pierce-Hoffman
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
- Program in Medical and Population Genetics, Broad Institute of Massachusetts Institute of Technology (M.I.T) and Harvard USA Cambridge, MA 02142, USA
- Data Sciences Platform, Broad Institute of Massachusetts Institute of Technology (M.I.T) and Harvard USA Cambridge, MA 02142, USA
| | - Harrison Brand
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
- Program in Medical and Population Genetics, Broad Institute of Massachusetts Institute of Technology (M.I.T) and Harvard USA Cambridge, MA 02142, USA
- Division of Medical Sciences and Department of Medicine, Harvard Medical School, Boston, MA 02115
| | - Michael E. Talkowski
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
- Program in Medical and Population Genetics, Broad Institute of Massachusetts Institute of Technology (M.I.T) and Harvard USA Cambridge, MA 02142, USA
- Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA
| | - Bradford Casey
- The Michael J. Fox Foundation for Parkinson’s Research, New York, NY 10001
| | - Mark R Cookson
- Laboratory of Neurogenetics, National Institute on Aging, Bethesda, Maryland, USA
| | | | - Mike A. Nalls
- Laboratory of Neurogenetics, National Institute on Aging, Bethesda, Maryland, USA
- Center for Alzheimer’s and Related Dementias, National Institute on Aging, Bethesda, Maryland, USA
- Data Tecnica International, Washington, DC, USA
| | - Medhat Mahmoud
- Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030
| | - Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030
- Department of Computer Science, Rice University, 6100 Main Street, Houston, TX, 77005, US
| | - Cornelis Blauwendraat
- Laboratory of Neurogenetics, National Institute on Aging, Bethesda, Maryland, USA
- Center for Alzheimer’s and Related Dementias, National Institute on Aging, Bethesda, Maryland, USA
| | - J. Raphael Gibbs
- Laboratory of Neurogenetics, National Institute on Aging, Bethesda, Maryland, USA
| | - Andrew B. Singleton
- Laboratory of Neurogenetics, National Institute on Aging, Bethesda, Maryland, USA
- Center for Alzheimer’s and Related Dementias, National Institute on Aging, Bethesda, Maryland, USA
| |
|
36
|
Shi Y, Niu Y, Zhang P, Luo H, Liu S, Zhang S, Wang J, Li Y, Liu X, Song T, Xu T, He S. Characterization of genome-wide STR variation in 6487 human genomes. Nat Commun 2023; 14:2092. [PMID: 37045857 PMCID: PMC10097659 DOI: 10.1038/s41467-023-37690-8] [Citation(s) in RCA: 31] [Impact Index Per Article: 15.5] [Received: 08/04/2022] [Accepted: 03/27/2023] [Indexed: 04/14/2023] Open
Abstract
Short tandem repeats (STRs) are abundant and highly mutagenic in the human genome. Many STR loci have been associated with a range of human genetic disorders. However, most population-scale studies on STR variation in humans have focused on European ancestry cohorts or are limited by sequencing depth. Here, we depicted a comprehensive map of 366,013 polymorphic STRs (pSTRs) constructed from 6487 deeply sequenced genomes, comprising 3983 Chinese samples (~31.5x, NyuWa) and 2504 samples from the 1000 Genomes Project (~33.3x, 1KGP). We found that STR mutations were affected by motif length, chromosome context and epigenetic features. We identified 3273 and 1117 pSTRs whose repeat numbers were associated with gene expression and 3'UTR alternative polyadenylation, respectively. We also implemented population analysis, investigated population differentiated signatures, and genotyped 60 known disease-causing STRs. Overall, this study further extends the scale of STR variation in humans and propels our understanding of the semantics of STRs.
Affiliation(s)
- Yirong Shi
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Yiwei Niu
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Peng Zhang
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China
| | - Huaxia Luo
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China
| | - Shuai Liu
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Sijia Zhang
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Jiajia Wang
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China
| | - Yanyan Li
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China
| | - Xinyue Liu
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Tingrui Song
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China
| | - Tao Xu
- National Laboratory of Biomacromolecules, CAS Center for Excellence in Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China.
- Shandong First Medical University & Shandong Academy of Medical Sciences, Jinan, 250117, Shandong, China.
| | - Shunmin He
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China.
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, 100049, China.
| |
|
37
|
Zhang G, Andersen EC. Interplay Between Polymorphic Short Tandem Repeats and Gene Expression Variation in Caenorhabditis elegans. Mol Biol Evol 2023; 40:msad067. [PMID: 36999565 PMCID: PMC10075192 DOI: 10.1093/molbev/msad067] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/26/2022] [Revised: 02/20/2023] [Accepted: 03/29/2023] [Indexed: 04/01/2023] Open
Abstract
Short tandem repeats (STRs) have orders of magnitude higher mutation rates than single nucleotide variants (SNVs) and have been proposed to accelerate evolution in many organisms. However, only a few studies have addressed the impact of STR variation on phenotypic variation at both the organismal and molecular levels. Potential driving forces underlying the high mutation rates of STRs also remain largely unknown. Here, we leverage the recently generated expression and STR variation data among wild Caenorhabditis elegans strains to conduct a genome-wide analysis of how STRs affect gene expression variation. We identify thousands of expression STRs (eSTRs) showing regulatory effects and demonstrate that they explain missing heritability beyond SNV-based expression quantitative trait loci. We illustrate specific regulatory mechanisms such as how eSTRs affect splicing sites and alternative splicing efficiency. We also show that differential expression of antioxidant genes and oxidative stresses might affect STR mutations systematically using both wild strains and mutation accumulation lines. Overall, we reveal the interplay between STRs and gene expression variation by providing novel insights into regulatory mechanisms of STRs and highlighting that oxidative stress could lead to higher STR mutation rates.
Affiliation(s)
- Gaotian Zhang
- Department of Molecular Biosciences, Northwestern University, Evanston, IL
| | - Erik C Andersen
- Department of Molecular Biosciences, Northwestern University, Evanston, IL
| |
|
38
|
Hong J, Su S, Wang L, Bai S, Xu J, Li Z, Betts N, Liang W, Wang W, Shi J, Zhang D. Combined genome-wide association study and epistasis analysis reveal multifaceted genetic architectures of plant height in Asian cultivated rice. PLANT, CELL & ENVIRONMENT 2023; 46:1295-1311. [PMID: 36734269 DOI: 10.1111/pce.14557] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 10/25/2022] [Revised: 01/08/2023] [Accepted: 02/01/2023] [Indexed: 06/18/2023]
Abstract
Plant height (PH) in rice (Oryza sativa) is an important trait for its adaptation and agricultural performance. Discovery of the semi-dwarf1 (SD1) mutation initiated the Green Revolution, boosting rice yield and fitness, but the underlying genetic regulation of PH in rice remains largely unknown. Here, we performed a genome-wide association study (GWAS) and identified 12 non-repetitive QTL/genes regulating PH variation in 619 Asian cultivated rice accessions. One of these was an SD1 structural variant, not normally detected in standard GWAS analyses. Given the strong effect of SD1 on PH, we also divided the 619 accessions into subgroups harbouring distinct SD1 haplotypes, and found a further 85 QTL/genes for PH, revealing genetic heterogeneity that may be missed by analysing a broad, diverse population. Moreover, we uncovered two epistatic interaction networks of PH-associated QTL/genes in the japonica (Geng)-dominant SD1NIP subgroup. In one of them, the hub QTL/gene qphSN1.4/GAMYB interacted with qphSN3.1/OsINO80, qphSN3.4/HD16/EL1, qphSN6.2/LOC_Os06g11130, and qphSN10.2/MADS56. Sequence variations in GAMYB and MADS56 were associated with their expression levels and PH variations, and MADS56 was shown to physically interact with MADS57 to coregulate expression of gibberellin (GA) metabolic genes OsGA2ox3 and Elongated Uppermost Internode1 (EUI1). Our study uncovered the multifaceted genetic architectures of rice PH, and provided novel and abundant genetic resources for breeding semi-dwarf rice and new candidates for further mechanistic studies on regulation of PH in rice.
Affiliation(s)
- Jun Hong
- Joint International Research Laboratory of Metabolic and Developmental Sciences, State Key Laboratory of Hybrid Rice, School of Life Sciences and Biotechnology, Yazhou Bay Institute of Deepsea Sci-Tech, Shanghai Jiao Tong University, Shanghai, China
| | - Su Su
- Joint International Research Laboratory of Metabolic and Developmental Sciences, State Key Laboratory of Hybrid Rice, School of Life Sciences and Biotechnology, Yazhou Bay Institute of Deepsea Sci-Tech, Shanghai Jiao Tong University, Shanghai, China
| | - Li Wang
- Joint International Research Laboratory of Metabolic and Developmental Sciences, State Key Laboratory of Hybrid Rice, School of Life Sciences and Biotechnology, Yazhou Bay Institute of Deepsea Sci-Tech, Shanghai Jiao Tong University, Shanghai, China
| | - Shaoxing Bai
- Joint International Research Laboratory of Metabolic and Developmental Sciences, State Key Laboratory of Hybrid Rice, School of Life Sciences and Biotechnology, Yazhou Bay Institute of Deepsea Sci-Tech, Shanghai Jiao Tong University, Shanghai, China
| | - Jianlong Xu
- Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
| | - Zhikang Li
- Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
| | - Natalie Betts
- School of Agriculture, Food and Wine, University of Adelaide, Urrbrae, South Australia, Australia
| | - Wanqi Liang
- Joint International Research Laboratory of Metabolic and Developmental Sciences, State Key Laboratory of Hybrid Rice, School of Life Sciences and Biotechnology, Yazhou Bay Institute of Deepsea Sci-Tech, Shanghai Jiao Tong University, Shanghai, China
| | - Wensheng Wang
- Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
| | - Jianxin Shi
- Joint International Research Laboratory of Metabolic and Developmental Sciences, State Key Laboratory of Hybrid Rice, School of Life Sciences and Biotechnology, Yazhou Bay Institute of Deepsea Sci-Tech, Shanghai Jiao Tong University, Shanghai, China
| | - Dabing Zhang
- Joint International Research Laboratory of Metabolic and Developmental Sciences, State Key Laboratory of Hybrid Rice, School of Life Sciences and Biotechnology, Yazhou Bay Institute of Deepsea Sci-Tech, Shanghai Jiao Tong University, Shanghai, China
- School of Agriculture, Food and Wine, University of Adelaide, Urrbrae, South Australia, Australia
| |
|
39
|
Wright SE, Todd PK. Native functions of short tandem repeats. eLife 2023; 12:e84043. [PMID: 36940239 PMCID: PMC10027321 DOI: 10.7554/elife.84043] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 10/11/2022] [Accepted: 03/08/2023] [Indexed: 03/21/2023] Open
Abstract
Over a third of the human genome is comprised of repetitive sequences, including more than a million short tandem repeats (STRs). While studies of the pathologic consequences of repeat expansions that cause syndromic human diseases are extensive, the potential native functions of STRs are often ignored. Here, we summarize a growing body of research into the normal biological functions for repetitive elements across the genome, with a particular focus on the roles of STRs in regulating gene expression. We propose reconceptualizing the pathogenic consequences of repeat expansions as aberrancies in normal gene regulation. From this altered viewpoint, we predict that future work will reveal broader roles for STRs in neuronal function and as risk alleles for more common human neurological diseases.
Affiliation(s)
- Shannon E Wright
- Department of Neurology, University of Michigan–Ann Arbor, Ann Arbor, United States
- Neuroscience Graduate Program, University of Michigan–Ann Arbor, Ann Arbor, United States
- Department of Neuroscience, Picower Institute, Cambridge, United States
| | - Peter K Todd
- Department of Neurology, University of Michigan–Ann Arbor, Ann Arbor, United States
- VA Ann Arbor Healthcare System, Ann Arbor, United States
| |
|
40
|
Revisiting mutagenesis at non-B DNA motifs in the human genome. Nat Struct Mol Biol 2023; 30:417-424. [PMID: 36914796 DOI: 10.1038/s41594-023-00936-6] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 08/23/2021] [Accepted: 02/03/2023] [Indexed: 03/16/2023]
Abstract
Non-B DNA structures formed by repetitive sequence motifs are known instigators of mutagenesis in experimental systems. Analyzing this phenomenon computationally in the human genome requires careful disentangling of intrinsic confounding factors, including overlapping and interrupted motifs and recurrent sequencing errors. Here, we show that accounting for these factors eliminates all signals of repeat-induced mutagenesis that extend beyond the motif boundary, and eliminates or dramatically shrinks the magnitude of mutagenesis within some motifs, contradicting previous reports. Mutagenesis not attributable to artifacts revealed several biological mechanisms. Polymerase slippage generates frequent indels within every variety of short tandem repeat motif, implicating slipped-strand structures. Interruption-correcting single nucleotide variants within short tandem repeats may originate from error-prone polymerases. Secondary-structure formation promotes single nucleotide variants within palindromic repeats and duplications within direct repeats. G-quadruplex motifs cause recurrent sequencing errors, whereas mutagenesis at Z-DNAs is conspicuously absent.
|
41
|
Jun G, English AC, Metcalf GA, Yang J, Chaisson MJP, Pankratz N, Menon VK, Salerno WJ, Krasheninina O, Smith AV, Lane JA, Blackwell T, Kang HM, Salvi S, Meng Q, Shen H, Pasham D, Bhamidipati S, Kottapalli K, Arnett DK, Ashley-Koch A, Auer PL, Beutel KM, Bis JC, Blangero J, Bowden DW, Brody JA, Cade BE, Chen YDI, Cho MH, Curran JE, Fornage M, Freedman BI, Fingerlin T, Gelb BD, Hou L, Hung YJ, Kane JP, Kaplan R, Kim W, Loos RJ, Marcus GM, Mathias RA, McGarvey ST, Montgomery C, Naseri T, Nouraie SM, Preuss MH, Palmer ND, Peyser PA, Raffield LM, Ratan A, Redline S, Reupena S, Rotter JI, Rich SS, Rienstra M, Ruczinski I, Sankaran VG, Schwartz DA, Seidman CE, Seidman JG, Silverman EK, Smith JA, Stilp A, Taylor KD, Telen MJ, Weiss ST, Williams LK, Wu B, Yanek LR, Zhang Y, Lasky-Su J, Gingras MC, Dutcher SK, Eichler EE, Gabriel S, Germer S, Kim R, Viaud-Martinez KA, Nickerson DA, NHLBI Trans-Omics for Precision Medicine (TOPMed) Consortium, Luo J, Reiner A, Gibbs RA, Boerwinkle E, Abecasis G, Sedlazeck FJ. Structural variation across 138,134 samples in the TOPMed consortium. RESEARCH SQUARE 2023:rs.3.rs-2515453. [PMID: 36778386 PMCID: PMC9915771 DOI: 10.21203/rs.3.rs-2515453/v1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 02/05/2023]
Abstract
Ever larger Structural Variant (SV) catalogs highlighting the diversity within and between populations help researchers better understand the links between SVs and disease. The identification of SVs from DNA sequence data is non-trivial and requires a balance between comprehensiveness and precision. Here we present a catalog of 355,667 SVs (59.34% novel) across autosomes and the X chromosome (50bp+) from 138,134 individuals in the diverse TOPMed consortium. We describe our methodologies for SV inference resulting in high variant quality and >90% allele concordance compared to long-read de-novo assemblies of well-characterized control samples. We demonstrate utility through significant associations between SVs and important various cardio-metabolic and hematologic traits. We have identified 690 SV hotspots and deserts and those that potentially impact the regulation of medically relevant genes. This catalog characterizes SVs across multiple populations and will serve as a valuable tool to understand the impact of SV on disease development and progression.
Affiliation(s)
- Goo Jun
- Human Genetics Center, School of Public Health, University of Texas Health Science Center at Houston
| | - Adam C English
- Baylor College of Medicine Human Genome Sequencing Center, Houston, TX, USA
| | - Ginger A Metcalf
- Baylor College of Medicine Human Genome Sequencing Center, Houston, TX, USA
| | - Jianzhi Yang
- University of Southern California, Los Angeles, CA, USA
| | | | | | - Vipin K Menon
- Baylor College of Medicine Human Genome Sequencing Center, Houston, TX, USA
| | | | | | - Albert V Smith
- Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, MI
| | - John A Lane
- Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, MI
| | - Tom Blackwell
- Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, MI
| | - Hyun Min Kang
- Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, MI
| | - Sejal Salvi
- Baylor College of Medicine Human Genome Sequencing Center, Houston, TX, USA
| | - Qingchang Meng
- Baylor College of Medicine Human Genome Sequencing Center, Houston, TX, USA
| | - Hua Shen
- Baylor College of Medicine Human Genome Sequencing Center, Houston, TX, USA
| | - Divya Pasham
- Baylor College of Medicine Human Genome Sequencing Center, Houston, TX, USA
| | - Sravya Bhamidipati
- Baylor College of Medicine Human Genome Sequencing Center, Houston, TX, USA
| | - Kavya Kottapalli
- Baylor College of Medicine Human Genome Sequencing Center, Houston, TX, USA
| | - Donna K. Arnett
- Department of Epidemiology, University of Kentucky College of Public Health
| | - Allison Ashley-Koch
- Department of Medicine, Duke University Medical Center, Durham, NC
- Duke Molecular Physiology Institute, Duke University Medical Center, Durham, NC
| | - Paul L. Auer
- Division of Biostatistics and Cancer Center, Medical College of Wisconsin, Milwaukee WI
| | | | - Joshua C. Bis
- Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA, USA
| | - John Blangero
- Department of Human Genetics and South Texas Diabetes and Obesity Institute, University of Texas, Rio Grande Valley School of Medicine, Brownsville, TX
| | - Donald W. Bowden
- Biochemistry, Wake Forest School of Medicine, Winston-Salem, NC, USA
| | - Jennifer A. Brody
- Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA, USA
| | - Brian E. Cade
- Division of Sleep and Circadian Disorders, Brigham and Women’s Hospital, Boston, MA
| | - Yii-Der Ida Chen
- Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA USA
| | - Michael H. Cho
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Boston, MA, USA
| | - Joanne E. Curran
- Biochemistry, Wake Forest School of Medicine, Winston-Salem, NC, USA
| | - Myriam Fornage
- Brown Foundation Institute of Molecular Medicine, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX
| | - Barry I. Freedman
- Department of Internal Medicine, Section on Nephrology, Wake Forest School of Medicine, Winston-Salem, NC, USA
| | - Tasha Fingerlin
- Center for Genes, Environment and Health, National Jewish Health, 1400 Jackson St., Denver, CO, 80206, USA
| | - Bruce D. Gelb
- Mindich Child Health and Development Institute and the Departments of Pediatrics and Genetics & Genomic Sciences, Icahn School of Medicine at Mount Sinai
| | | | - Yi-Jen Hung
- Institute of Preventive Medicine, National Defense Medical Center, Taiwan
| | - John P Kane
- Cardiovascular Research Institute, University of California, San Francisco
| | - Robert Kaplan
- Department of epidemiology and population health, Albert Einstein College of Medicine, Bronx NY USA
| | - Wonji Kim
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital, Boston, MA, USA
| | - Ruth J.F. Loos
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY
| | - Gregory M Marcus
- Division of Cardiology, University of California, San Francisco CA
| | - Rasika A. Mathias
- Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD
| | - Stephen T. McGarvey
- Department of Epidemiology, International Health Institute and Department of Anthropology, Brown University
| | - Courtney Montgomery
- Genes and Human Disease Research Program, Oklahoma Medical Research Foundation
| | - Take Naseri
- Ministry of Health, Government of Samoa, Apia, Samoa
| | - S. Mehdi Nouraie
- Department of Medicine, University of Pittsburgh School of Medicine, Pittsburgh, PA, 15213
| | - Michael H. Preuss
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY
| | | | - Patricia A. Peyser
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI USA
| | | | - Aakrosh Ratan
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA USA
| | - Susan Redline
- Division of Sleep and Circadian Disorders, Brigham and Women’s Hospital, Boston, MA
| | | | - Jerome I. Rotter
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA USA
- Department of Cardiology, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands
| | - Stephen S. Rich
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI USA
| | - Michiel Rienstra
- Department of Cardiology, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands
| | - Ingo Ruczinski
- Department of Biostatistics, Johns Hopkins University Bloomberg, School of Public Health, Baltimore, MD, USA
| | - Vijay G. Sankaran
- Division of Hematology/Oncology, Boston Children’s Hospital and Department of Pediatric Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA 02115
- Broad Institute of MIT and Harvard, Cambridge, MA 02142
| | | | - Christine E. Seidman
- Department of Genetics, Harvard Medical School
- Cardiovascular Division, Brigham & Women’s Hospital, Harvard University
- Howard Hughes Medical Institute, Harvard University
| | | | - Edwin K. Silverman
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Boston, MA
| | - Jennifer A. Smith
- Department of Medicine, University of Pittsburgh School of Medicine, Pittsburgh, PA, 15213
| | - Adrienne Stilp
- Department of Biostatistics, University of Washington, Seattle, WA
| | - Kent D. Taylor
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA USA
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA USA
| | - Marilyn J. Telen
- Department of Medicine, Duke University Medical Center, Durham, NC
| | - Scott T. Weiss
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital, Boston, MA, USA
| | - L. Keoki Williams
- Center for Individualized and Genomic Medicine Research (CIGMA), Department of Internal Medicine, Henry Ford Health System, Detroit, Michigan, United States of America
| | - Baojun Wu
- Center for Individualized and Genomic Medicine Research (CIGMA), Department of Internal Medicine, Henry Ford Health System, Detroit, Michigan, United States of America
| | - Lisa R. Yanek
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY
| | - Yingze Zhang
- Department of Medicine, University of Pittsburgh School of Medicine, Pittsburgh, PA, 15213
| | - Jessica Lasky-Su
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital, Boston, MA, USA
| | | | - Susan K. Dutcher
- Department of Genetics, Washington University School of Medicine, Saint Louis, MO 63110, USA
| | - Evan E. Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, Washington, USA
| | | | | | - Ryan Kim
- Psomagen, Inc.,Rockville, Maryland, USA
| | | | | | | | - James Luo
- National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, USA
| | - Alex Reiner
- Department of Epidemiology, University of Washington, Seattle, WA 98109, USA
| | - Richard A Gibbs
- Baylor College of Medicine Human Genome Sequencing Center, Houston, TX, USA
| | - Eric Boerwinkle
- Human Genetics Center, School of Public Health, University of Texas Health Science Center at Houston
- Baylor College of Medicine Human Genome Sequencing Center, Houston, TX, USA
| | - Goncalo Abecasis
- Regeneron Genetics Center
- Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, MI
| | - Fritz J Sedlazeck
- Baylor College of Medicine Human Genome Sequencing Center, Houston, TX, USA
- Department of Computer Science, Rice University, 6100 Main Street, Houston, TX, 77005, USA
| |
|
42
|
Verbiest M, Maksimov M, Jin Y, Anisimova M, Gymrek M, Bilgin Sonay T. Mutation and selection processes regulating short tandem repeats give rise to genetic and phenotypic diversity across species. J Evol Biol 2023; 36:321-336. [PMID: 36289560 PMCID: PMC9990875 DOI: 10.1111/jeb.14106] [Citation(s) in RCA: 27] [Impact Index Per Article: 13.5] [Received: 02/08/2022] [Revised: 06/29/2022] [Accepted: 08/01/2022] [Indexed: 02/03/2023]
Abstract
Short tandem repeats (STRs) are units of 1-6 bp that repeat in a tandem fashion in DNA. Along with single nucleotide polymorphisms and large structural variations, they are among the major genomic variants underlying genetic, and likely phenotypic, divergence. STRs experience mutation rates that are orders of magnitude higher than other well-studied genotypic variants. Frequent copy number changes result in a wide range of alleles, and provide unique opportunities for modulating complex phenotypes through variation in repeat length. While classical studies have identified key roles of individual STR loci, the advent of improved sequencing technology, high-quality genome assemblies for diverse species, and bioinformatics methods for genome-wide STR analysis now enable more systematic study of STR variation across wide evolutionary ranges. In this review, we explore mutation and selection processes that affect STR copy number evolution, and how these processes give rise to varying STR patterns both within and across species. Finally, we review recent examples of functional and adaptive changes linked to STRs.
Affiliation(s)
- Max Verbiest
- Institute of Computational Life Sciences, School of Life Sciences and Facility Management, Zürich University of Applied Sciences, Wädenswil, Switzerland
- Department of Molecular Life Sciences, University of Zurich, Zurich, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Mikhail Maksimov
- Department of Computer Science & Engineering, University of California San Diego, La Jolla, California, USA
- Department of Medicine, University of California San Diego, La Jolla, California, USA
| | - Ye Jin
- Department of Medicine, University of California San Diego, La Jolla, California, USA
- Department of Bioengineering, University of California San Diego, La Jolla, California, USA
| | - Maria Anisimova
- Institute of Computational Life Sciences, School of Life Sciences and Facility Management, Zürich University of Applied Sciences, Wädenswil, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Melissa Gymrek
- Department of Computer Science & Engineering, University of California San Diego, La Jolla, California, USA
- Department of Medicine, University of California San Diego, La Jolla, California, USA
| | - Tugce Bilgin Sonay
- Institute of Ecology, Evolution and Environmental Biology, Columbia University, New York, New York, USA
| |
|
43
|
Nguyen TV, Vander Jagt CJ, Wang J, Daetwyler HD, Xiang R, Goddard ME, Nguyen LT, Ross EM, Hayes BJ, Chamberlain AJ, MacLeod IM. In it for the long run: perspectives on exploiting long-read sequencing in livestock for population scale studies of structural variants. Genet Sel Evol 2023; 55:9. [PMID: 36721111 PMCID: PMC9887926 DOI: 10.1186/s12711-023-00783-5] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 09/27/2022] [Accepted: 01/23/2023] [Indexed: 02/02/2023] Open
Abstract
Studies have demonstrated that structural variants (SV) play a substantial role in the evolution of species and have an impact on Mendelian traits in the genome. However, unlike small variants (< 50 bp), it has been challenging to accurately identify and genotype SV at the population scale using short-read sequencing. Long-read sequencing technologies are becoming competitively priced and can address several of the disadvantages of short-read sequencing for the discovery and genotyping of SV. In livestock species, analysis of SV at the population scale still faces challenges due to the lack of resources, high costs, technological barriers, and computational limitations. In this review, we summarize recent progress in the characterization of SV in the major livestock species, the obstacles that still need to be overcome, as well as the future directions in this growing field. It seems timely that research communities pool resources to build global population-scale long-read sequencing consortiums for the major livestock species for which the application of genomic tools has become cost-effective.
Affiliation(s)
- Tuan V. Nguyen
- Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC 3083 Australia
| | | | - Jianghui Wang
- Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC 3083 Australia
| | - Hans D. Daetwyler
- Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC 3083 Australia
- School of Applied Systems Biology, La Trobe University, Bundoora, VIC 3083 Australia
| | - Ruidong Xiang
- Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC 3083 Australia
- Faculty of Veterinary & Agricultural Science, The University of Melbourne, Parkville, VIC 3052 Australia
| | - Michael E. Goddard
- Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC 3083 Australia
- Faculty of Veterinary & Agricultural Science, The University of Melbourne, Parkville, VIC 3052 Australia
| | - Loan T. Nguyen
- Queensland Alliance for Agriculture and Food Innovation, University of Queensland, St Lucia, QLD 4072 Australia
| | - Elizabeth M. Ross
- Queensland Alliance for Agriculture and Food Innovation, University of Queensland, St Lucia, QLD 4072 Australia
| | - Ben J. Hayes
- Queensland Alliance for Agriculture and Food Innovation, University of Queensland, St Lucia, QLD 4072 Australia
| | - Amanda J. Chamberlain
- Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC 3083 Australia
- School of Applied Systems Biology, La Trobe University, Bundoora, VIC 3083 Australia
| | - Iona M. MacLeod
- Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC 3083 Australia
| |
|
44
|
Jun G, English AC, Metcalf GA, Yang J, Chaisson MJP, Pankratz N, Menon VK, Salerno WJ, Krasheninina O, Smith AV, Lane JA, Blackwell T, Kang HM, Salvi S, Meng Q, Shen H, Pasham D, Bhamidipati S, Kottapalli K, Arnett DK, Ashley-Koch A, Auer PL, Beutel KM, Bis JC, Blangero J, Bowden DW, Brody JA, Cade BE, Chen YDI, Cho MH, Curran JE, Fornage M, Freedman BI, Fingerlin T, Gelb BD, Hou L, Hung YJ, Kane JP, Kaplan R, Kim W, Loos RJ, Marcus GM, Mathias RA, McGarvey ST, Montgomery C, Naseri T, Nouraie SM, Preuss MH, Palmer ND, Peyser PA, Raffield LM, Ratan A, Redline S, Reupena S, Rotter JI, Rich SS, Rienstra M, Ruczinski I, Sankaran VG, Schwartz DA, Seidman CE, Seidman JG, Silverman EK, Smith JA, Stilp A, Taylor KD, Telen MJ, Weiss ST, Williams LK, Wu B, Yanek LR, Zhang Y, Lasky-Su J, Gingras MC, Dutcher SK, Eichler EE, Gabriel S, Germer S, Kim R, Viaud-Martinez KA, Nickerson DA, NHLBI Trans-Omics for Precision Medicine (TOPMed) Consortium, Luo J, Reiner A, Gibbs RA, Boerwinkle E, Abecasis G, Sedlazeck FJ. Structural variation across 138,134 samples in the TOPMed consortium. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.01.25.525428. [PMID: 36747810 PMCID: PMC9900832 DOI: 10.1101/2023.01.25.525428] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Indexed: 01/28/2023]
Abstract
Ever larger Structural Variant (SV) catalogs highlighting the diversity within and between populations help researchers better understand the links between SVs and disease. The identification of SVs from DNA sequence data is non-trivial and requires a balance between comprehensiveness and precision. Here we present a catalog of 355,667 SVs (59.34% novel) across autosomes and the X chromosome (50bp+) from 138,134 individuals in the diverse TOPMed consortium. We describe our methodologies for SV inference resulting in high variant quality and >90% allele concordance compared to long-read de-novo assemblies of well-characterized control samples. We demonstrate utility through significant associations between SVs and important various cardio-metabolic and hematologic traits. We have identified 690 SV hotspots and deserts and those that potentially impact the regulation of medically relevant genes. This catalog characterizes SVs across multiple populations and will serve as a valuable tool to understand the impact of SV on disease development and progression.
Affiliation(s)
- Goo Jun
- Human Genetics Center, School of Public Health, University of Texas Health Science Center at Houston
| | - Adam C English
- Baylor College of Medicine Human Genome Sequencing Center, Houston, TX, USA
| | - Ginger A Metcalf
- Baylor College of Medicine Human Genome Sequencing Center, Houston, TX, USA
| | - Jianzhi Yang
- University of Southern California, Los Angeles, CA, USA
| | | | | | - Vipin K Menon
- Baylor College of Medicine Human Genome Sequencing Center, Houston, TX, USA
| | | | | | - Albert V Smith
- Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, MI
| | - John A Lane
- Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, MI
| | - Tom Blackwell
- Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, MI
| | - Hyun Min Kang
- Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, MI
| | - Sejal Salvi
- Baylor College of Medicine Human Genome Sequencing Center, Houston, TX, USA
| | - Qingchang Meng
- Baylor College of Medicine Human Genome Sequencing Center, Houston, TX, USA
| | - Hua Shen
- Baylor College of Medicine Human Genome Sequencing Center, Houston, TX, USA
| | - Divya Pasham
- Baylor College of Medicine Human Genome Sequencing Center, Houston, TX, USA
| | - Sravya Bhamidipati
- Baylor College of Medicine Human Genome Sequencing Center, Houston, TX, USA
| | - Kavya Kottapalli
- Baylor College of Medicine Human Genome Sequencing Center, Houston, TX, USA
| | - Donna K. Arnett
- Department of Epidemiology, University of Kentucky College of Public Health
| | - Allison Ashley-Koch
- Department of Medicine, Duke University Medical Center, Durham, NC
- Duke Molecular Physiology Institute, Duke University Medical Center, Durham, NC
| | - Paul L. Auer
- Division of Biostatistics and Cancer Center, Medical College of Wisconsin, Milwaukee WI
| | | | - Joshua C. Bis
- Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA, USA
| | - John Blangero
- Department of Human Genetics and South Texas Diabetes and Obesity Institute, University of Texas, Rio Grande Valley School of Medicine, Brownsville, TX
| | - Donald W. Bowden
- Biochemistry, Wake Forest School of Medicine, Winston-Salem, NC, USA
| | - Jennifer A. Brody
- Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA, USA
| | - Brian E. Cade
- Division of Sleep and Circadian Disorders, Brigham and Women's Hospital, Boston, MA
| | - Yii-Der Ida Chen
- Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA USA
| | - Michael H. Cho
- Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, MA, USA
| | - Joanne E. Curran
- Biochemistry, Wake Forest School of Medicine, Winston-Salem, NC, USA
| | - Myriam Fornage
- Brown Foundation Institute of Molecular Medicine, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX
| | - Barry I. Freedman
- Department of Internal Medicine, Section on Nephrology, Wake Forest School of Medicine, Winston-Salem, NC, USA
| | - Tasha Fingerlin
- Center for Genes, Environment and Health, National Jewish Health, 1400 Jackson St., Denver, CO, 80206, USA
| | - Bruce D. Gelb
- Mindich Child Health and Development Institute and the Departments of Pediatrics and Genetics & Genomic Sciences, Icahn School of Medicine at Mount Sinai
| | | | - Yi-Jen Hung
- Institute of Preventive Medicine, National Defense Medical Center, Taiwan
| | - John P Kane
- Cardiovascular Research Institute, University of California, San Francisco
| | - Robert Kaplan
- Department of epidemiology and population health, Albert Einstein College of Medicine, Bronx NY USA
| | - Wonji Kim
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
| | - Ruth J.F. Loos
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY
| | - Gregory M Marcus
- Division of Cardiology, University of California, San Francisco CA
| | - Rasika A. Mathias
- Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD
| | - Stephen T. McGarvey
- Department of Epidemiology, International Health Institute and Department of Anthropology, Brown University
| | - Courtney Montgomery
- Genes and Human Disease Research Program, Oklahoma Medical Research Foundation
| | - Take Naseri
- Ministry of Health, Government of Samoa, Apia, Samoa
| | - S. Mehdi Nouraie
- Department of Medicine, University of Pittsburgh School of Medicine, Pittsburgh, PA, 15213
| | - Michael H. Preuss
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY
| | | | - Patricia A. Peyser
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI USA
| | | | - Aakrosh Ratan
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA USA
| | - Susan Redline
- Division of Sleep and Circadian Disorders, Brigham and Women's Hospital, Boston, MA
| | | | - Jerome I. Rotter
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA USA
- Department of Cardiology, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands
| | - Stephen S. Rich
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI USA
| | - Michiel Rienstra
- Department of Cardiology, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands
| | - Ingo Ruczinski
- Department of Biostatistics, Johns Hopkins University Bloomberg, School of Public Health, Baltimore, MD, USA
| | - Vijay G. Sankaran
- Division of Hematology/Oncology, Boston Children's Hospital and Department of Pediatric Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA 02115
- Broad Institute of MIT and Harvard, Cambridge, MA 02142
| | | | - Christine E. Seidman
- Department of Genetics, Harvard Medical School
- Cardiovascular Division, Brigham & Women’s Hospital, Harvard University
- Howard Hughes Medical Institute, Harvard University
| | | | - Edwin K. Silverman
- Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, MA
| | - Jennifer A. Smith
- Department of Medicine, University of Pittsburgh School of Medicine, Pittsburgh, PA, 15213
| | - Adrienne Stilp
- Department of Biostatistics, University of Washington, Seattle, WA
| | - Kent D. Taylor
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA USA
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA USA
| | - Marilyn J. Telen
- Department of Medicine, Duke University Medical Center, Durham, NC
| | - Scott T. Weiss
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
| | - L. Keoki Williams
- Center for Individualized and Genomic Medicine Research (CIGMA), Department of Internal Medicine, Henry Ford Health System, Detroit, Michigan, United States of America
| | - Baojun Wu
- Center for Individualized and Genomic Medicine Research (CIGMA), Department of Internal Medicine, Henry Ford Health System, Detroit, Michigan, United States of America
| | - Lisa R. Yanek
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY
| | - Yingze Zhang
- Department of Medicine, University of Pittsburgh School of Medicine, Pittsburgh, PA, 15213
| | - Jessica Lasky-Su
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
| | | | - Susan K. Dutcher
- Department of Genetics, Washington University School of Medicine, Saint Louis, MO 63110, USA
| | - Evan E. Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, Washington, USA
| | | | | | - Ryan Kim
- Psomagen, Inc.,Rockville, Maryland, USA
| | | | | | | | - James Luo
- National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, USA
| | - Alex Reiner
- Department of Epidemiology, University of Washington, Seattle, WA 98109, USA
| | - Richard A Gibbs
- Baylor College of Medicine Human Genome Sequencing Center, Houston, TX, USA
| | - Eric Boerwinkle
- Human Genetics Center, School of Public Health, University of Texas Health Science Center at Houston
- Baylor College of Medicine Human Genome Sequencing Center, Houston, TX, USA
| | - Goncalo Abecasis
- Regeneron Genetics Center
- Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, MI
| | - Fritz J Sedlazeck
- Baylor College of Medicine Human Genome Sequencing Center, Houston, TX, USA
- Department of Computer Science, Rice University, 6100 Main Street, Houston, TX, 77005, USA
| |
|
45
|
Wheeler MM, Stilp AM, Rao S, Halldórsson BV, Beyter D, Wen J, Mihkaylova AV, McHugh CP, Lane J, Jiang MZ, Raffield LM, Jun G, Sedlazeck FJ, Metcalf G, Yao Y, Bis JB, Chami N, de Vries PS, Desai P, Floyd JS, Gao Y, Kammers K, Kim W, Moon JY, Ratan A, Yanek LR, Almasy L, Becker LC, Blangero J, Cho MH, Curran JE, Fornage M, Kaplan RC, Lewis JP, Loos RJF, Mitchell BD, Morrison AC, Preuss M, Psaty BM, Rich SS, Rotter JI, Tang H, Tracy RP, Boerwinkle E, Abecasis GR, Blackwell TW, Smith AV, Johnson AD, Mathias RA, Nickerson DA, Conomos MP, Li Y, Þorsteinsdóttir U, Magnússon MK, Stefansson K, Pankratz ND, Bauer DE, Auer PL, Reiner AP. Whole genome sequencing identifies structural variants contributing to hematologic traits in the NHLBI TOPMed program. Nat Commun 2022; 13:7592. [PMID: 36481753 PMCID: PMC9732337 DOI: 10.1038/s41467-022-35354-7] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 12/21/2021] [Accepted: 11/29/2022] [Indexed: 12/13/2022] Open
Abstract
Genome-wide association studies have identified thousands of single nucleotide variants and small indels that contribute to variation in hematologic traits. While structural variants are known to cause rare blood or hematopoietic disorders, the genome-wide contribution of structural variants to quantitative blood cell trait variation is unknown. Here we utilized whole genome sequencing data in ancestrally diverse participants of the NHLBI Trans Omics for Precision Medicine program (N = 50,675) to detect structural variants associated with hematologic traits. Using single variant tests, we assessed the association of common and rare structural variants with red cell-, white cell-, and platelet-related quantitative traits and observed 21 independent signals (12 common and 9 rare) reaching genome-wide significance. The majority of these associations (N = 18) replicated in independent datasets. In genome-editing experiments, we provide evidence that a deletion associated with lower monocyte counts leads to disruption of an S1PR3 monocyte enhancer and decreased S1PR3 expression.
Affiliation(s)
- Marsha M. Wheeler
- Department of Genome Sciences, University of Washington, Seattle, WA 98105 USA
| | - Adrienne M. Stilp
- Department of Biostatistics, University of Washington, Seattle, WA 98105 USA
| | - Shuquan Rao
- Division of Hematology/Oncology, Boston Children’s Hospital, Boston, MA 02115 USA
- Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, MA 02115 USA
- Harvard Stem Cell Institute, Boston, MA 02138 USA
- Broad Institute, Cambridge, MA 02142 USA
- Department of Pediatrics, Harvard Medical School, Boston, MA 02115 USA
- State Key Laboratory of Experimental Hematology, National Clinical Research Center for Blood Diseases, Haihe Laboratory of Cell Ecosystem, Institute of Hematology & Blood Diseases Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Tianjin, 300020 China
| | - Bjarni V. Halldórsson
- deCODE genetics/Amgen Inc., Reykjavik, Iceland
- School of Technology, Reykjavik University, Reykjavík, Iceland
| | - Doruk Beyter
- deCODE genetics/Amgen Inc., Reykjavik, Iceland
| | - Jia Wen
- Departments of Biostatistics, Genetics, Computer Science, Applied Physical Sciences, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599 USA
| | - Anna V. Mihkaylova
- Department of Biostatistics, University of Washington, Seattle, WA 98105 USA
| | - Caitlin P. McHugh
- Department of Biostatistics, University of Washington, Seattle, WA 98105 USA
| | - John Lane
- Department of Laboratory Medicine and Pathology, University of Minnesota Medical School, Minneapolis, MN 55455 USA
| | - Min-Zhi Jiang
- Departments of Biostatistics, Genetics, Computer Science, Applied Physical Sciences, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599 USA
| | - Laura M. Raffield
- Department of Genetics, University of North Carolina, Chapel Hill, NC 27599 USA
| | - Goo Jun
- Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX 77030 USA
| | - Fritz J. Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX USA
| | - Ginger Metcalf
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX USA
| | - Yao Yao
- Division of Hematology/Oncology, Boston Children’s Hospital, Boston, MA 02115 USA
- Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, MA 02115 USA
- Harvard Stem Cell Institute, Boston, MA 02138 USA
- Broad Institute, Cambridge, MA 02142 USA
- Department of Pediatrics, Harvard Medical School, Boston, MA 02115 USA
| | - Joshua B. Bis
- Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA 98101 USA
| | - Nathalie Chami
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY 10029 USA
| | - Paul S. de Vries
- Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX 77030 USA
- Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX 77030 USA
| | - Pinkal Desai
- Division of Hematology and Oncology, Weill Cornell Medical College, New York, NY 10065 USA
| | - James S. Floyd
- Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA 98101 USA
| | - Yan Gao
- Jackson Heart Study, Department of Medicine, University of Mississippi, Jackson, MS 39216 USA
| | - Kai Kammers
- GeneSTAR Research Program, Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21287 USA
| | - Wonji Kim
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Boston, MA 02115 USA
| | - Jee-Young Moon
- Department of Epidemiology and Population Health, Albert Einstein College of Medicine, Bronx, NY 10461 USA
| | - Aakrosh Ratan
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA 22908 USA
| | - Lisa R. Yanek
- GeneSTAR Research Program, Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21287 USA
| | - Laura Almasy
- Children’s Hospital of Philadelphia and University of Pennsylvania School of Medicine, Philadelphia, PA 19104 USA
| | - Lewis C. Becker
- GeneSTAR Research Program, Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21287 USA
| | - John Blangero
- Department of Human Genetics and South Texas Diabetes and Obesity Institute, University of Texas Rio Grande Valley School of Medicine, Brownsville, TX 78520 USA
| | - Michael H. Cho
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Boston, MA 02115 USA
| | - Joanne E. Curran
- Department of Human Genetics and South Texas Diabetes and Obesity Institute, University of Texas Rio Grande Valley School of Medicine, Brownsville, TX 78520 USA
| | - Myriam Fornage
- Brown Foundation Institute of Molecular Medicine, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX 77030 USA
| | - Robert C. Kaplan
- Department of Epidemiology and Population Health, Albert Einstein College of Medicine, Bronx, NY 10461 USA
| | - Joshua P. Lewis
- Department of Medicine, Division of Endocrinology, Diabetes, and Nutrition, University of Maryland School of Medicine, Baltimore, MD USA
| | - Ruth J. F. Loos
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY 10029 USA
- Department of Environmental Medicine and Public Health, Icahn School of Medicine at Mount Sinai, New York, NY USA
- The Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY USA
- Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Braxton D. Mitchell
- Department of Medicine, Division of Endocrinology, Diabetes, and Nutrition, University of Maryland School of Medicine, Baltimore, MD USA
| | - Alanna C. Morrison
- Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX 77030 USA
| | - Michael Preuss
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY 10029 USA
| | - Bruce M. Psaty
- Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA 98101 USA
| | - Stephen S. Rich
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA 22908 USA
| | - Jerome I. Rotter
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA 90502 USA
| | - Hua Tang
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305 USA
| | - Russell P. Tracy
- Departments of Pathology & Laboratory Medicine and Biochemistry, Larner College of Medicine at the University of Vermont, Colchester, VT 05446 USA
| | - Eric Boerwinkle
- Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX 77030 USA
| | - Goncalo R. Abecasis
- TOPMed Informatics Research Center, University of Michigan, Department of Biostatistics, Ann Arbor, MI 48109 USA
| | - Thomas W. Blackwell
- TOPMed Informatics Research Center, University of Michigan, Department of Biostatistics, Ann Arbor, MI 48109 USA
| | - Albert V. Smith
- TOPMed Informatics Research Center, University of Michigan, Department of Biostatistics, Ann Arbor, MI 48109 USA
| | - Andrew D. Johnson
- Population Sciences Branch, Division of Intramural Research, National Heart, Lung and Blood Institute, Framingham, MA 01702 USA
| | - Rasika A. Mathias
- GeneSTAR Research Program, Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21287 USA
| | - Deborah A. Nickerson
- Department of Genome Sciences, University of Washington, Seattle, WA 98105 USA
| | - Matthew P. Conomos
- Department of Biostatistics, University of Washington, Seattle, WA 98105 USA
| | - Yun Li
- Departments of Biostatistics, Genetics, Computer Science, Applied Physical Sciences, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599 USA
| | - Unnur Þorsteinsdóttir
- deCODE genetics/Amgen Inc., Reykjavik, Iceland
- Faculty of Medicine, University of Iceland, 101 Reykjavik, Iceland
| | - Magnús K. Magnússon
- deCODE genetics/Amgen Inc., Reykjavik, Iceland
- Faculty of Medicine, University of Iceland, 101 Reykjavik, Iceland
| | - Kari Stefansson
- deCODE genetics/Amgen Inc., Reykjavik, Iceland
- Faculty of Medicine, University of Iceland, 101 Reykjavik, Iceland
| | - Nathan D. Pankratz
- Department of Laboratory Medicine and Pathology, University of Minnesota Medical School, Minneapolis, MN 55455 USA
| | - Daniel E. Bauer
- Division of Hematology/Oncology, Boston Children’s Hospital, Boston, MA 02115 USA
- Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, MA 02115 USA
- Harvard Stem Cell Institute, Boston, MA 02138 USA
- Broad Institute, Cambridge, MA 02142 USA
- Department of Pediatrics, Harvard Medical School, Boston, MA 02115 USA
| | - Paul L. Auer
- Division of Biostatistics, Institute for Health and Equity, and Cancer Center, Medical College of Wisconsin, Milwaukee, WI 53226 USA
| | - Alex P. Reiner
- Department of Epidemiology, University of Washington, Seattle, WA 98105 USA
| |
|
46
|
Wang T, Liu J, Chen J, Qin B. Generation and Differentiation of Induced Pluripotent Stem Cells from Mononuclear Cells in An Age-Related Macular Degeneration Patient. CELL JOURNAL 2022; 24:764-773. [PMID: 36527349 PMCID: PMC9790072 DOI: 10.22074/cellj.2022.557559.1072] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2022] [Indexed: 01/05/2023]
Abstract
OBJECTIVE: We aimed to generate induced pluripotent stem cell (iPSC)-derived retinal pigmented epithelium (RPE) cells from the peripheral blood mononuclear cells (PBMCs) of an age-related macular degeneration (AMD) patient to provide potential cell sources for both basic scientific research and clinical application. MATERIALS AND METHODS: In this experimental study, PBMCs were isolated from the whole blood of a 70-year-old female patient with AMD and reprogrammed into iPSCs by transfection of Sendai virus that contained Yamanaka factors (OCT4, SOX2, KLF4, and c-MYC). Flow cytometry, real-time quantitative polymerase chain reaction (qPCR), karyotype analysis, embryoid body (EB) formation, and teratoma detection were performed to confirm that AMD-iPSCs exhibited full pluripotency and maintained a normal karyotype after reprogramming. AMD-iPSCs were induced into RPE cells by stepwise induced differentiation, and specific markers of RPE cells were examined by immunofluorescence and flow cytometry. RESULTS: The iPSC colonies started to form three weeks post-infection. AMD-iPSCs exhibited typical morphology including roundness, a large nucleus, sparse cytoplasm, and conspicuous nucleoli. qPCR data showed that AMD-iPSCs expressed pluripotency markers (endo-OCT4, endo-SOX2, NANOG and REX1). Flow cytometry indicated that 99.7% of generated iPSCs were TRA-1-60 positive. Methylation sequencing showed that the OCT4 and NANOG promoter regions were demethylated in iPSCs. EB and teratoma formation assays showed that iPSCs had strong differentiation potential and pluripotency. After a series of inductions with differentiation media, a monolayer of AMD-iPSC-RPE cells was observed on day 50. The AMD-iPSC-RPEs highly expressed specific RPE markers (MITF, ZO-1, Bestrophin, and PMEL17). CONCLUSION: High-quality iPSCs could be established from the PBMCs of an elderly AMD patient. The AMD-iPSCs displayed complete pluripotency, enabling scientific study, disease modeling, pharmacological testing, and therapeutic applications in personalized medicine. Collectively, we successfully differentiated the iPSCs into RPE with native RPE characteristics, which might provide potential regenerative treatments for AMD patients.
Affiliation(s)
- Tongmiao Wang
- Shenzhen Aier Eye Hospital, Shenzhen, China
- Aier Eye Hospital, Jinan University, Shenzhen, China
- Shenzhen Aier Ophthalmic Technology Institute, Shenzhen, China
| | - Jingwen Liu
- Shenzhen Aier Eye Hospital, Shenzhen, China
- Aier Eye Hospital, Jinan University, Shenzhen, China
- Shenzhen Aier Ophthalmic Technology Institute, Shenzhen, China
| | - Jianhua Chen
- Shenzhen Aier Eye Hospital, Shenzhen, China
- Aier Eye Hospital, Jinan University, Shenzhen, China
- Shenzhen Aier Ophthalmic Technology Institute, Shenzhen, China
- Aier Eye Hospital Group, Changsha, China
- *Corresponding Address: Shenzhen Aier Eye Hospital, Shenzhen, China
| | - Bo Qin
- Shenzhen Aier Eye Hospital, Shenzhen, China
- Aier Eye Hospital, Jinan University, Shenzhen, China
- Shenzhen Aier Ophthalmic Technology Institute, Shenzhen, China
- Aier Eye Hospital Group, Changsha, China
- *Corresponding Address: Shenzhen Aier Eye Hospital, Shenzhen, China
| |
|
47
|
Comparative analysis of microsatellites in coding regions provides insights into the adaptability of the giant panda, polar bear and brown bear. Genetica 2022; 150:355-366. [DOI: 10.1007/s10709-022-00173-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2021] [Accepted: 09/13/2022] [Indexed: 11/27/2022]
|
48
|
Wang Y, Ling Y, Gong J, Zhao X, Zhou H, Xie B, Lou H, Zhuang X, Jin L, The Han100K Initiative, Fan S, Zhang G, Xu S. PGG.SV: a whole-genome-sequencing-based structural variant resource and data analysis platform. Nucleic Acids Res 2022; 51:D1109-D1116. [PMID: 36243989 PMCID: PMC9825616 DOI: 10.1093/nar/gkac905] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2022] [Revised: 09/21/2022] [Accepted: 10/04/2022] [Indexed: 01/30/2023] Open
Abstract
Structural variations (SVs) play important roles in human evolution and diseases, but there is a lack of data resources concerning representative samples, especially for East Asians. Taking advantage of both next-generation sequencing and third-generation sequencing data at the whole-genome level, we developed the database PGG.SV to provide a practical platform for both regionally and globally representative structural variants. In its current version, PGG.SV archives 584 277 SVs obtained from whole-genome sequencing data of 6048 samples, including 1030 long-read sequencing genomes representing 177 global populations. PGG.SV provides (i) high-quality SVs with fine-scale and precise genomic locations in both GRCh37 and GRCh38, covering underrepresented SVs in existing sequencing and microarray data; (ii) hierarchical estimation of SV prevalence in geographical populations; (iii) informative annotations of SV-related genes, potential functions and clinical effects; (iv) an analysis platform to facilitate SV-based case-control association studies and (v) various visualization tools for understanding the SV structures in the human genome. Taken together, PGG.SV provides a user-friendly online interface, easy-to-use analysis tools and a detailed presentation of results. PGG.SV is freely accessible via https://www.biosino.org/pggsv.
Affiliation(s)
- Xiaohan Zhao
- State Key Laboratory of Genetic Engineering, Center for Evolutionary Biology, Collaborative Innovation Center of Genetics and Development, School of Life Sciences, Fudan University, Shanghai 200438, China
- Human Phenome Institute, Zhangjiang Fudan International Innovation Center, and Ministry of Education Key Laboratory of Contemporary Anthropology, Fudan University, Shanghai 201203, China
| | - Hanwen Zhou
- Key Laboratory of Computational Biology, National Genomics Data Center & Bio-Med Big Data Center, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
| | - Bo Xie
- Key Laboratory of Computational Biology, National Genomics Data Center & Bio-Med Big Data Center, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
| | - Haiyi Lou
- State Key Laboratory of Genetic Engineering, Center for Evolutionary Biology, Collaborative Innovation Center of Genetics and Development, School of Life Sciences, Fudan University, Shanghai 200438, China
| | - Xinhao Zhuang
- Key Laboratory of Computational Biology, National Genomics Data Center & Bio-Med Big Data Center, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
| | - Li Jin
- State Key Laboratory of Genetic Engineering, Center for Evolutionary Biology, Collaborative Innovation Center of Genetics and Development, School of Life Sciences, Fudan University, Shanghai 200438, China
- Human Phenome Institute, Zhangjiang Fudan International Innovation Center, and Ministry of Education Key Laboratory of Contemporary Anthropology, Fudan University, Shanghai 201203, China
| | - Shaohua Fan
- Correspondence may also be addressed to Shaohua Fan.
| | - Guoqing Zhang
- Correspondence may also be addressed to Guoqing Zhang.
| | - Shuhua Xu
- To whom correspondence should be addressed. Tel: +86 21 31246617; Fax: +86 21 31246617;
| |
|
49
|
Lye Z, Choi JY, Purugganan MD. Deleterious mutations and the rare allele burden on rice gene expression. Mol Biol Evol 2022; 39:6693943. [PMID: 36073358 PMCID: PMC9512150 DOI: 10.1093/molbev/msac193] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 11/13/2022] Open
Abstract
Deleterious genetic variation is maintained in populations at low frequencies. Under a model of stabilizing selection, rare (and presumably deleterious) genetic variants are associated with increase or decrease in gene expression from some intermediate optimum. We investigate this phenomenon in a population of largely Oryza sativa ssp. indica rice landraces under normal unstressed wet and stressful drought field conditions. We include single nucleotide polymorphisms, insertion/deletion mutations, and structural variants in our analysis and find a stronger association between rare variants and gene expression outliers under the stress condition. We also show an association of the strength of this rare variant effect with linkage, gene expression levels, network connectivity, local recombination rate, and fitness consequence scores, consistent with the stabilizing selection model of gene expression.
Affiliation(s)
- Zoe Lye
- Center for Genomics and Systems Biology, New York University, New York, NY 10003
| | - Jae Young Choi
- Center for Genomics and Systems Biology, New York University, New York, NY 10003
| | - Michael D Purugganan
- Center for Genomics and Systems Biology, New York University, New York, NY 10003
- Center for Genomics and Systems Biology, New York University Abu Dhabi, Abu Dhabi, United Arab Emirates
| |
|
50
|
Berthold N, Pytte J, Bulik CM, Tschochner M, Medland SE, Akkari PA. Bridging the gap: Short structural variants in the genetics of anorexia nervosa. Int J Eat Disord 2022; 55:747-753. [PMID: 35470453 PMCID: PMC9545787 DOI: 10.1002/eat.23716] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2021] [Revised: 03/30/2022] [Accepted: 03/31/2022] [Indexed: 11/07/2022]
Abstract
Anorexia nervosa (AN) is a devastating disorder with evidence of underexplored heritability. Twin and family studies estimate heritability (h²) to be 57%-64%, and genome-wide association studies (GWAS) reveal significant genetic correlations with psychiatric and anthropometric traits and a total of nine genome-wide significant loci. Whether significantly associated single nucleotide polymorphisms identified by GWAS are causal or tag true causal variants, remains to be elucidated. We propose a novel method for bridging this knowledge gap by fine-mapping short structural variants (SSVs) in and around GWAS-identified loci. SSV fine-mapping of loci associated with complex disorders such as schizophrenia, amyotrophic lateral sclerosis, and Alzheimer's disease has uncovered genetic risk markers, phenotypic variability between patients, new pathological mechanisms, and potential therapeutic targets. We analyze previous investigations' methods and propose utilizing an evaluation algorithm to prioritize 10 SSVs for each of the top two AN GWAS-identified loci followed by Sanger sequencing and fragment analysis via capillary electrophoresis to characterize these SSVs for case/control association studies. Success of previous SSV analyses in complex disorders and effective utilization of similar methodologies supports our proposed method. Furthermore, the structural and spatial properties of the 10 SSVs identified for each of the top two AN GWAS-associated loci, cell adhesion molecule 1 (CADM1) and NCK interacting protein with SH3 domain (NCKIPSD), are similar to previous studies. We propose SSV fine-mapping of AN-associated loci will identify causal genetic architecture. Deepening understandings of AN may lead to novel therapeutic targets and subsequently increase quality-of-life for individuals living with the illness.
PUBLIC SIGNIFICANCE STATEMENT: Anorexia nervosa is a severe and complex illness, arising from a combination of environmental and genetic factors. Recent studies estimate the contribution of genetic variability; however, the specific DNA sequences and how they contribute remain unknown. We present a novel approach, arguing that the genetic variant class, short structural variants, could answer this knowledge gap and allow development of biologically targeted therapeutics, improving quality-of-life and patient outcomes for affected individuals.
Affiliation(s)
- Natasha Berthold
- School of Nursing, Midwifery, Health Sciences & Physiotherapy, University of Notre Dame Australia, Fremantle, Western Australia, Australia
- Perron Institute for Neurological and Translational Science, Nedlands, Western Australia, Australia
- School of Human Sciences, University of Western Australia, Crawley, Western Australia, Australia
| | - Julia Pytte
- Perron Institute for Neurological and Translational Science, Nedlands, Western Australia, Australia
- School of Human Sciences, University of Western Australia, Crawley, Western Australia, Australia
| | - Cynthia M. Bulik
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
- Department of Psychiatry, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Department of Nutrition, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - Monika Tschochner
- School of Nursing, Midwifery, Health Sciences & Physiotherapy, University of Notre Dame Australia, Fremantle, Western Australia, Australia
| | - Sarah E. Medland
- QIMR Berghofer Medical Research Institute, Brisbane, Queensland, Australia
| | - Patrick Anthony Akkari
- Perron Institute for Neurological and Translational Science, Nedlands, Western Australia, Australia
- Centre for Molecular Medicine and Innovative Therapeutics, Murdoch University, Perth, Western Australia, Australia
- Centre for Neuromuscular and Neurological Disorders, University of Western Australia, Nedlands, Western Australia, Australia
- Department of Neurology, Duke University, Durham, North Carolina
| |
|