1
Clarkson C, Chen Z, Rocca C, Jadhav B, Ibañez K, Ryten M, Sharp AJ, Houlden H, Tucci A. A Population-Wide Exploration of the THAP11 CAG Repeat Size and Structure in the 100,000 Genomes Project and UK Biobank. Mov Disord 2025; 40:561-566. [PMID: 39651830 PMCID: PMC11926500 DOI: 10.1002/mds.30073] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/13/2024] [Revised: 10/23/2024] [Accepted: 11/14/2024] [Indexed: 03/22/2025] [Open Access]
Abstract
BACKGROUND: A CAG repeat expansion in THAP11 was recently found to be associated with spinocerebellar ataxia in two Chinese families. Expanded repeats ranged from 45 to 100 units, with CAA sequence interruptions in the 5' region and an uninterrupted CAG tract in the 3' tail. OBJECTIVE: Here, we assess the population distribution of the THAP11 repeat, and its contribution to neurological diseases. METHODS: We interrogated data from 54,788 individuals from Genomics England, 10,686 patients from the UCL Queen Square Institute of Neurology in-house database (UCL IoN), and 424,340 individuals from the UK Biobank. RESULTS: We identified expanded repeats in four individuals with learning difficulties without ataxia and in three individuals in UK Biobank, one with hereditary ataxia, one with hereditary neuropathy, and one with neurodegenerative disease. We showed a linear relationship between the number of CAA interruptions and overall repeat length. CONCLUSIONS: These results indicate that THAP11 expansions are rare in the British population and that sequence structures predisposed to expansions may be more common in non-British ancestries. © 2024 The Author(s). Movement Disorders published by Wiley Periodicals LLC on behalf of International Parkinson and Movement Disorder Society.
Affiliation(s)
- Chris Clarkson
  - Clinical Pharmacology and Precision Medicine, William Harvey Research Institute, School of Medicine and Dentistry, Queen Mary University of London, London, UK
  - Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, London, UK
- Zhongbo Chen
  - Department of Clinical and Movement Neuroscience, UCL Queen Square Institute of Neurology, London, UK
  - The Francis Crick Institute, London, UK
- Clarissa Rocca
  - Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, London, UK
- Bharati Jadhav
  - Department of Genetics and Genomic Sciences and Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Kristina Ibañez
  - Clinical Pharmacology and Precision Medicine, William Harvey Research Institute, School of Medicine and Dentistry, Queen Mary University of London, London, UK
- Mina Ryten
  - UK Dementia Research Institute at Cambridge, University of Cambridge, Cambridge, UK
- Andrew J. Sharp
  - Department of Genetics and Genomic Sciences and Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Henry Houlden
  - Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, London, UK
- Arianna Tucci
  - Clinical Pharmacology and Precision Medicine, William Harvey Research Institute, School of Medicine and Dentistry, Queen Mary University of London, London, UK
  - Department of Neurodegenerative Disease, UCL Queen Square Institute of Neurology, London, UK
2
Zhang S, Song Q, Zhang P, Wang X, Guo R, Li Y, Liu S, Yan X, Zhang J, Niu Y, Shi Y, Song T, Xu T, He S. Genome-wide investigation of VNTR motif polymorphisms in 8,222 genomes: Implications for biological regulation and human traits. Cell Genomics 2024; 4:100699. [PMID: 39609246 DOI: 10.1016/j.xgen.2024.100699] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/08/2024] [Revised: 08/31/2024] [Accepted: 11/01/2024] [Indexed: 11/30/2024]
Abstract
Variable number tandem repeat (VNTR) is a pervasive and highly mutable genetic feature that varies in both length and repeat sequence. Despite the well-studied copy-number variants, the functional impacts of repeat motif polymorphisms remain unknown. Here, we present the largest genome-wide VNTR polymorphism map to date, with over 2.5 million VNTR length polymorphisms (VNTR-LPs) and over 11 million VNTR motif polymorphisms (VNTR-MPs) detected in 8,222 high-coverage genomes. Leveraging the large-scale NyuWa cohort, we identified 2,982,456 (31.8%) NyuWa-specific VNTR-MPs, of which 95.3% were rare. Moreover, we found 1,937 out of 38,685 VNTRs that were associated with gene expression through VNTR-MPs in lymphoblastoid cell lines. Specifically, we clarified that the expansion of a likely causal motif could upregulate gene expression by improving the binding concentration of PU.1. We also explored the potential impacts of VNTR polymorphisms on phenotypic differentiation and disease susceptibility. This study expands our knowledge of VNTR-MPs and their functional implications.
Affiliation(s)
- Sijia Zhang
  - Key Laboratory of Epigenetic Regulation and Intervention, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China; College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China; Department of Scientific Research, Jiangsu Cancer Hospital & Jiangsu Institute of Cancer Research & Affiliated Cancer Hospital of Nanjing Medical University, Nanjing, China
- Qiao Song
  - Key Laboratory of Epigenetic Regulation and Intervention, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- Peng Zhang
  - Key Laboratory of Epigenetic Regulation and Intervention, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- Xiaona Wang
  - Key Laboratory of Epigenetic Regulation and Intervention, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- Rong Guo
  - Key Laboratory of Epigenetic Regulation and Intervention, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- Yanyan Li
  - Key Laboratory of Epigenetic Regulation and Intervention, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- Shuai Liu
  - Key Laboratory of Epigenetic Regulation and Intervention, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- Xiaoyu Yan
  - Key Laboratory of Epigenetic Regulation and Intervention, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- Jingjing Zhang
  - Key Laboratory of Epigenetic Regulation and Intervention, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- Yiwei Niu
  - Key Laboratory of Epigenetic Regulation and Intervention, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- Yirong Shi
  - Key Laboratory of Epigenetic Regulation and Intervention, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- Tingrui Song
  - Key Laboratory of Epigenetic Regulation and Intervention, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- Tao Xu
  - College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China; National Laboratory of Biomacromolecules, CAS Center for Excellence in Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China; Shandong First Medical University & Shandong Academy of Medical Sciences, Jinan, Shandong 250117, China
- Shunmin He
  - Key Laboratory of Epigenetic Regulation and Intervention, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China; College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
3
Manigbas CA, Jadhav B, Garg P, Shadrina M, Lee W, Altman G, Martin-Trujillo A, Sharp AJ. A phenome-wide association study of tandem repeat variation in 168,554 individuals from the UK Biobank. Nat Commun 2024; 15:10521. [PMID: 39627187 PMCID: PMC11614882 DOI: 10.1038/s41467-024-54678-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/02/2024] [Accepted: 11/18/2024] [Indexed: 12/06/2024] [Open Access]
Abstract
Most genetic association studies focus on binary variants. To identify the effects of multi-allelic variation of tandem repeats (TRs) on human traits, we perform direct TR genotyping and phenome-wide association studies in 168,554 individuals from the UK Biobank, identifying 47 TRs showing fine-mapped associations with 73 traits. We replicate 23 of 31 (74%) of these associations in the All of Us cohort. While this set includes several known repeat expansion disorders, novel associations we found are attributable to common polymorphic variation in TR length rather than rare expansions and include e.g. a coding polyhistidine motif in HRCT1 influencing risk of hypertension and a poly(CGC) in the 5'UTR of GNB2 influencing heart rate. Fine-mapped TRs are strongly enriched for associations with local gene expression and DNA methylation. Our study highlights the contribution of multi-allelic TRs to the "missing heritability" of the human genome.
Affiliation(s)
- Celine A Manigbas
  - Department of Genetics and Genomic Sciences and Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Bharati Jadhav
  - Department of Genetics and Genomic Sciences and Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Paras Garg
  - Department of Genetics and Genomic Sciences and Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Mariya Shadrina
  - Department of Genetics and Genomic Sciences and Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- William Lee
  - Department of Genetics and Genomic Sciences and Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Gabrielle Altman
  - Department of Genetics and Genomic Sciences and Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Alejandro Martin-Trujillo
  - Department of Genetics and Genomic Sciences and Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Andrew J Sharp
  - Department of Genetics and Genomic Sciences and Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
4
Jadhav B, Garg P, van Vugt JJFA, Ibanez K, Gagliardi D, Lee W, Shadrina M, Mokveld T, Dolzhenko E, Martin-Trujillo A, Gies SJ, Altman G, Rocca C, Barbosa M, Jain M, Lahiri N, Lachlan K, Houlden H, Paten B, Veldink J, Tucci A, Sharp AJ. A phenome-wide association study of methylated GC-rich repeats identifies a GCC repeat expansion in AFF3 associated with intellectual disability. Nat Genet 2024; 56:2322-2332. [PMID: 39313615 PMCID: PMC11560504 DOI: 10.1038/s41588-024-01917-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2023] [Accepted: 08/20/2024] [Indexed: 09/25/2024]
Abstract
GC-rich tandem repeat expansions (TREs) are often associated with DNA methylation, gene silencing and folate-sensitive fragile sites, and underlie several congenital and late-onset disorders. Through a combination of DNA-methylation profiling and tandem repeat genotyping, we identified 24 methylated TREs and investigated their effects on human traits using phenome-wide association studies in 168,641 individuals from the UK Biobank, identifying 156 significant TRE-trait associations involving 17 different TREs. Of these, a GCC expansion in the promoter of AFF3 was associated with a 2.4-fold reduced probability of completing secondary education, an effect size comparable to several recurrent pathogenic microdeletions. In a cohort of 6,371 probands with neurodevelopmental problems of suspected genetic etiology, we observed a significant enrichment of AFF3 expansions compared with controls. With a population prevalence that is at least fivefold higher than the TRE that causes fragile X syndrome, AFF3 expansions represent a major cause of neurodevelopmental delay.
Affiliation(s)
- Bharati Jadhav
  - Department of Genetics and Genomic Sciences and Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Paras Garg
  - Department of Genetics and Genomic Sciences and Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Joke J F A van Vugt
  - Department of Neurology, UMC Utrecht Brain Center, Utrecht University, Utrecht, The Netherlands
- Kristina Ibanez
  - William Harvey Research Institute, Queen Mary University of London, London, UK
- Delia Gagliardi
  - William Harvey Research Institute, Queen Mary University of London, London, UK
  - Department of Neuromuscular Diseases, Institute of Neurology, University College London, London, UK
- William Lee
  - Department of Genetics and Genomic Sciences and Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Mariya Shadrina
  - Department of Genetics and Genomic Sciences and Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Alejandro Martin-Trujillo
  - Department of Genetics and Genomic Sciences and Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Scott J Gies
  - Department of Genetics and Genomic Sciences and Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Gabrielle Altman
  - Department of Genetics and Genomic Sciences and Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Clarissa Rocca
  - Department of Neuromuscular Diseases, Institute of Neurology, University College London, London, UK
- Mafalda Barbosa
  - Department of Genetics and Genomic Sciences and Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Miten Jain
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
  - Northeastern University, Boston, MA, USA
- Nayana Lahiri
  - SW Thames Centre for Genomics, St George's University of London & St George's University Hospitals NHS, London, UK
- Katherine Lachlan
  - Wessex Clinical Genetics Service, University Hospital Southampton NHS Trust and Department of Human Genetics and Genomic Medicine, Southampton University, Southampton, UK
- Henry Houlden
  - Department of Neuromuscular Diseases, Institute of Neurology, University College London, London, UK
- Benedict Paten
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
- Jan Veldink
  - Department of Neurology, UMC Utrecht Brain Center, Utrecht University, Utrecht, The Netherlands
- Arianna Tucci
  - William Harvey Research Institute, Queen Mary University of London, London, UK
- Andrew J Sharp
  - Department of Genetics and Genomic Sciences and Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
5
TRGT-ing the dark genome to accurately characterize tandem repeats at scale. Nat Biotechnol 2024; 42:1504-1505. [PMID: 38168998 DOI: 10.1038/s41587-023-02073-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/05/2024]
6
Betschart RO, Riccio C, Aguilera-Garcia D, Blankenberg S, Guo L, Moch H, Seidl D, Solleder H, Thalén F, Thiéry A, Twerenbold R, Zeller T, Zoche M, Ziegler A. Biostatistical Aspects of Whole Genome Sequencing Studies: Preprocessing and Quality Control. Biom J 2024; 66:e202300278. [PMID: 38988195 DOI: 10.1002/bimj.202300278] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2023] [Revised: 03/21/2024] [Accepted: 05/14/2024] [Indexed: 07/12/2024]
Abstract
Rapid advances in high-throughput DNA sequencing technologies have enabled large-scale whole genome sequencing (WGS) studies. Before performing association analysis between phenotypes and genotypes, preprocessing and quality control (QC) of the raw sequence data need to be performed. Because many biostatisticians have not been working with WGS data so far, we first sketch Illumina's short-read sequencing technology. Second, we explain the general preprocessing pipeline for WGS studies. Third, we provide an overview of important QC metrics, which are applied to WGS data: on the raw data, after mapping and alignment, after variant calling, and after multisample variant calling. Fourth, we illustrate the QC with the data from the GENEtic SequencIng Study Hamburg-Davos (GENESIS-HD), a study involving more than 9000 human whole genomes. All samples were sequenced on an Illumina NovaSeq 6000 with an average coverage of 35× using a PCR-free protocol. For QC, one genome in a bottle (GIAB) trio was sequenced in four replicates, and one GIAB sample was successfully sequenced 70 times in different runs. Fifth, we provide empirical data on the compression of raw data using the DRAGEN original read archive (ORA). The most important quality metrics in the application were genetic similarity, sample cross-contamination, deviations from the expected Het/Hom ratio, relatedness, and coverage. The compression ratio of the raw files using DRAGEN ORA was 5.6:1, and compression time was linear by genome coverage. In summary, the preprocessing, joint calling, and QC of large WGS studies are feasible within a reasonable time, and efficient QC procedures are readily available.
Affiliation(s)
- Domingo Aguilera-Garcia
  - Institute of Pathology and Molecular Pathology, University Hospital Zurich, Zurich, Switzerland
- Stefan Blankenberg
  - Cardio-CARE, Medizincampus Davos, Davos, Switzerland
  - Department of Cardiology, University Heart and Vascular Center Hamburg, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
  - Center for Population Health Innovation (POINT), University Heart and Vascular Center Hamburg, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Linlin Guo
  - Department of Cardiology, University Heart and Vascular Center Hamburg, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Holger Moch
  - Institute of Pathology and Molecular Pathology, University Hospital Zurich, Zurich, Switzerland
- Dagmar Seidl
  - Institute of Pathology and Molecular Pathology, University Hospital Zurich, Zurich, Switzerland
- Hugo Solleder
  - Cardio-CARE, Medizincampus Davos, Davos, Switzerland
- Felix Thalén
  - Cardio-CARE, Medizincampus Davos, Davos, Switzerland
- Raphael Twerenbold
  - Department of Cardiology, University Heart and Vascular Center Hamburg, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
  - Center for Population Health Innovation (POINT), University Heart and Vascular Center Hamburg, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
  - German Center for Cardiovascular Research (DZHK), partner site Hamburg/Kiel/Lübeck, Hamburg, Germany
- Tanja Zeller
  - Department of Cardiology, University Heart and Vascular Center Hamburg, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
  - Center for Population Health Innovation (POINT), University Heart and Vascular Center Hamburg, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
  - German Center for Cardiovascular Research (DZHK), partner site Hamburg/Kiel/Lübeck, Hamburg, Germany
- Martin Zoche
  - Institute of Pathology and Molecular Pathology, University Hospital Zurich, Zurich, Switzerland
- Andreas Ziegler
  - Cardio-CARE, Medizincampus Davos, Davos, Switzerland
  - Department of Cardiology, University Heart and Vascular Center Hamburg, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
  - Center for Population Health Innovation (POINT), University Heart and Vascular Center Hamburg, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
  - School of Mathematics, Statistics and Computer Science, University of KwaZulu-Natal, Pietermaritzburg, South Africa
7
Moya R, Wang X, Tsien RW, Maurano MT. Structural characterization of a polymorphic repeat at the CACNA1C schizophrenia locus. medRxiv 2024:2024.03.05.24303780. [PMID: 38798557 PMCID: PMC11118589 DOI: 10.1101/2024.03.05.24303780] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/29/2024]
Abstract
Genetic variation within intron 3 of the CACNA1C calcium channel gene is associated with schizophrenia and bipolar disorder, but analysis of the causal variants and their effect is complicated by a nearby variable-number tandem repeat (VNTR). Here, we used 155 long-read genome assemblies from 78 diverse individuals to delineate the structure and population variability of the CACNA1C intron 3 VNTR. We categorized VNTR sequences into 7 Types of structural alleles using sequence differences among repeat units. Only 12 repeat units at the 5' end of the VNTR were shared across most Types, but several Types were related through a series of large and small duplications. The most diverged Types were rare and present only in individuals with African ancestry, but the multiallelic structural polymorphism Variable Region 2 was present across populations at different frequencies, consistent with expansion of the VNTR preceding the emergence of early hominins. VR2 was in complete linkage disequilibrium with fine-mapped schizophrenia variants (SNPs) from genome-wide association studies (GWAS). This risk haplotype was associated with decreased CACNA1C gene expression in brain tissues profiled by the GTEx project. Our work suggests that sequence variation within a human-specific VNTR affects gene expression, and provides a detailed characterization of new alleles at a flagship neuropsychiatric locus.
Affiliation(s)
- Raquel Moya
  - Institute for Systems Genetics, NYU School of Medicine, New York, NY 10016, USA
- Xiaohan Wang
  - Neuroscience Institute, NYU School of Medicine, New York, NY 10016, USA
  - Department of Neuroscience and Physiology, New York University, New York, NY 10016
- Richard W. Tsien
  - Neuroscience Institute, NYU School of Medicine, New York, NY 10016, USA
  - Department of Neuroscience and Physiology, New York University, New York, NY 10016
- Matthew T. Maurano
  - Institute for Systems Genetics, NYU School of Medicine, New York, NY 10016, USA
  - Department of Pathology, NYU School of Medicine, New York, NY 10016, USA
8
Ferreira T, Rodriguez S. Mitochondrial DNA: Inherent Complexities Relevant to Genetic Analyses. Genes (Basel) 2024; 15:617. [PMID: 38790246 PMCID: PMC11121663 DOI: 10.3390/genes15050617] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2024] [Revised: 05/09/2024] [Accepted: 05/10/2024] [Indexed: 05/26/2024] [Open Access]
Abstract
Mitochondrial DNA (mtDNA) exhibits distinct characteristics distinguishing it from the nuclear genome, necessitating specific analytical methods in genetic studies. This comprehensive review explores the complex role of mtDNA in a variety of genetic studies, including genome-wide, epigenome-wide, and phenome-wide association studies, with a focus on its implications for human traits and diseases. Here, we discuss the structure and gene-encoding properties of mtDNA, along with the influence of environmental factors and epigenetic modifications on its function and variability. Particularly significant are the challenges posed by mtDNA's high mutation rate, heteroplasmy, and copy number variations, and their impact on disease susceptibility and population genetic analyses. The review also highlights recent advances in methodological approaches that enhance our understanding of mtDNA associations, advocating for refined genetic research techniques that accommodate its complexities. By providing a comprehensive overview of the intricacies of mtDNA, this paper underscores the need for an integrated approach to genetic studies that considers the unique properties of mitochondrial genetics. Our findings aim to inform future research and encourage the development of innovative methodologies to better interpret the broad implications of mtDNA in human health and disease.
Affiliation(s)
- Tomas Ferreira
  - Bristol Medical School, University of Bristol, Bristol BS8 1UD, UK
  - Department of Clinical Neurosciences, School of Clinical Medicine, University of Cambridge, Cambridge CB2 0SL, UK
- Santiago Rodriguez
  - Bristol Medical School, University of Bristol, Bristol BS8 1UD, UK
  - MRC Integrative Epidemiology Unit, Population Health Sciences, Bristol Medical School, University of Bristol, Bristol BS8 1QU, UK
9
Sureshkumar S, Bandaranayake C, Lv J, Dent CI, Bhagat PK, Mukherjee S, Sarwade R, Atri C, York HM, Tamizhselvan P, Shamaya N, Folini G, Bergey BG, Yadav AS, Kumar S, Grummisch OS, Saini P, Yadav RK, Arumugam S, Rosonina E, Sadanandom A, Liu H, Balasubramanian S. SUMO protease FUG1, histone reader AL3 and chromodomain protein LHP1 are integral to repeat expansion-induced gene silencing in Arabidopsis thaliana. Nature Plants 2024; 10:749-759. [PMID: 38641663 DOI: 10.1038/s41477-024-01672-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/20/2023] [Accepted: 03/15/2024] [Indexed: 04/21/2024]
Abstract
Epigenetic gene silencing induced by expanded repeats can cause diverse phenotypes ranging from severe growth defects in plants to genetic diseases such as Friedreich's ataxia in humans. The molecular mechanisms underlying repeat expansion-induced epigenetic silencing remain largely unknown. Using a plant model with a temperature-sensitive phenotype, we have previously shown that expanded repeats can induce small RNAs, which in turn can lead to epigenetic silencing through the RNA-dependent DNA methylation pathway. Here, using a genetic suppressor screen and yeast two-hybrid assays, we identified novel components required for epigenetic silencing caused by expanded repeats. We show that FOURTH ULP GENE CLASS 1 (FUG1)-an uncharacterized SUMO protease with no known role in gene silencing-is required for epigenetic silencing caused by expanded repeats. In addition, we demonstrate that FUG1 physically interacts with ALFIN-LIKE 3 (AL3)-a histone reader that is known to bind to active histone mark H3K4me2/3. Loss of function of AL3 abolishes epigenetic silencing caused by expanded repeats. AL3 physically interacts with the chromodomain protein LIKE HETEROCHROMATIN 1 (LHP1)-known to be associated with the spread of the repressive histone mark H3K27me3 to cause repeat expansion-induced epigenetic silencing. Loss of any of these components suppresses repeat expansion-associated phenotypes coupled with an increase in IIL1 expression with the reversal of gene silencing and associated change in epigenetic marks. Our findings suggest that the FUG1-AL3-LHP1 module is essential to confer repeat expansion-associated epigenetic silencing and highlight the importance of post-translational modifiers and histone readers in epigenetic silencing.
Affiliation(s)
- Sridevi Sureshkumar
  - School of Biological Sciences, Monash University, Clayton Campus, Melbourne, Victoria, Australia
- Champa Bandaranayake
  - School of Biological Sciences, Monash University, Clayton Campus, Melbourne, Victoria, Australia
- Junqing Lv
  - National Key Laboratory of Plant Molecular Genetics, CAS Centre for Excellence in Molecular Plant Sciences, Institute of Plant Physiology and Ecology, Chinese Academy of Sciences, Shanghai, China
- Craig I Dent
  - School of Biological Sciences, Monash University, Clayton Campus, Melbourne, Victoria, Australia
- Sourav Mukherjee
  - School of Biological Sciences, Monash University, Clayton Campus, Melbourne, Victoria, Australia
- Rucha Sarwade
  - School of Biological Sciences, Monash University, Clayton Campus, Melbourne, Victoria, Australia
- Chhaya Atri
  - School of Biological Sciences, Monash University, Clayton Campus, Melbourne, Victoria, Australia
- Harrison M York
  - Monash Biomedicine Discovery Institute, Faculty of Medicine, Nursing and Health Sciences, Monash University, Clayton Campus, Melbourne, Victoria, Australia
  - European Molecular Biology Laboratory, Australia (EMBL Australia), Monash University, Clayton Campus, Melbourne, Victoria, Australia
- Prashanth Tamizhselvan
  - School of Biological Sciences, Monash University, Clayton Campus, Melbourne, Victoria, Australia
- Nawar Shamaya
  - School of Biological Sciences, Monash University, Clayton Campus, Melbourne, Victoria, Australia
- Giulia Folini
  - School of Biological Sciences, Monash University, Clayton Campus, Melbourne, Victoria, Australia
- Avilash Singh Yadav
  - School of Biological Sciences, Monash University, Clayton Campus, Melbourne, Victoria, Australia
- Subhasree Kumar
  - School of Biological Sciences, Monash University, Clayton Campus, Melbourne, Victoria, Australia
- Oliver S Grummisch
  - School of Biological Sciences, Monash University, Clayton Campus, Melbourne, Victoria, Australia
- Prince Saini
  - Department of Biological Sciences, Indian Institute of Science Education and Research, Mohali, India
- Ram K Yadav
  - Department of Biological Sciences, Indian Institute of Science Education and Research, Mohali, India
- Senthil Arumugam
  - Monash Biomedicine Discovery Institute, Faculty of Medicine, Nursing and Health Sciences, Monash University, Clayton Campus, Melbourne, Victoria, Australia
  - European Molecular Biology Laboratory, Australia (EMBL Australia), Monash University, Clayton Campus, Melbourne, Victoria, Australia
- Emanuel Rosonina
  - Department of Biology, York University, Toronto, Ontario, Canada
- Ari Sadanandom
  - Department of Biosciences, Durham University, Durham, UK
- Hongtao Liu
  - National Key Laboratory of Plant Molecular Genetics, CAS Centre for Excellence in Molecular Plant Sciences, Institute of Plant Physiology and Ecology, Chinese Academy of Sciences, Shanghai, China
10
Manigbas CA, Jadhav B, Garg P, Shadrina M, Lee W, Martin-Trujillo A, Sharp AJ. A phenome-wide association study of tandem repeat variation in 168,554 individuals from the UK Biobank. medRxiv 2024:2024.01.22.24301630. [PMID: 38343850 PMCID: PMC10854328 DOI: 10.1101/2024.01.22.24301630] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 06/15/2024]
Abstract
Most genetic association studies focus on binary variants. To identify the effects of multi-allelic variation of tandem repeats (TRs) on human traits, we performed direct TR genotyping and phenome-wide association studies in 168,554 individuals from the UK Biobank, identifying 47 TRs showing causal associations with 73 traits. We replicated 23 of 31 (74%) of these causal associations in the All of Us cohort. While this set included several known repeat expansion disorders, novel associations we found were attributable to common polymorphic variation in TR length rather than rare expansions and include e.g. a coding polyhistidine motif in HRCT1 influencing risk of hypertension and a poly(CGC) in the 5'UTR of GNB2 influencing heart rate. Causal TRs were strongly enriched for associations with local gene expression and DNA methylation. Our study highlights the contribution of multi-allelic TRs to the "missing heritability" of the human genome.
11
Birnbaum R. Rediscovering tandem repeat variation in schizophrenia: challenges and opportunities. Transl Psychiatry 2023; 13:402. [PMID: 38123544 PMCID: PMC10733427 DOI: 10.1038/s41398-023-02689-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2023] [Revised: 11/23/2023] [Accepted: 11/27/2023] [Indexed: 12/23/2023] Open
Abstract
Tandem repeats (TRs) are prevalent throughout the genome, constituting at least 3% of it, and are often highly polymorphic. The high mutation rate of TRs, which can be orders of magnitude higher than that of single-nucleotide polymorphisms and indels, indicates that they are likely to make significant contributions to phenotypic variation, yet their contribution to schizophrenia has been largely ignored by recent genome-wide association studies (GWAS). Tandem repeat expansions are already known causative factors for over 50 disorders, while common tandem repeat variation is increasingly being identified as significantly associated with complex disease and gene regulation. The current review summarizes key background concepts of tandem repeat variation as they pertain to disease risk, elucidating their potential for schizophrenia association. An overview of next-generation sequencing-based methods that may be applied for genome-wide TR identification is provided, and some key methodological challenges in TR analyses are delineated.
Affiliation(s)
- Rebecca Birnbaum
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
- Department of Genetics and Genomics Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
12
Jadhav B, Garg P, van Vugt JJFA, Ibanez K, Gagliardi D, Lee W, Shadrina M, Mokveld T, Dolzhenko E, Martin-Trujillo A, Gies SL, Rocca C, Barbosa M, Jain M, Lahiri N, Lachlan K, Houlden H, Paten B, Veldink J, Tucci A, Sharp AJ. A phenome-wide association study of methylated GC-rich repeats identifies a GCC repeat expansion in AFF3 as a significant cause of intellectual disability. medRxiv 2023:2023.05.03.23289461. [PMID: 37205357 PMCID: PMC10187445 DOI: 10.1101/2023.05.03.23289461] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 05/21/2023]
Abstract
GC-rich tandem repeat expansions (TREs) are often associated with DNA methylation, gene silencing and folate-sensitive fragile sites and underlie several congenital and late-onset disorders. Through a combination of DNA methylation profiling and tandem repeat genotyping, we identified 24 methylated TREs and investigated their effects on human traits using PheWAS in 168,641 individuals from the UK Biobank, identifying 156 significant TRE:trait associations involving 17 different TREs. Of these, a GCC expansion in the promoter of AFF3 was linked with a 2.4-fold reduced probability of completing secondary education, an effect size comparable to several recurrent pathogenic microdeletions. In a cohort of 6,371 probands with neurodevelopmental problems of suspected genetic etiology, we observed a significant enrichment of AFF3 expansions compared to controls. With a population prevalence that is at least 5-fold higher than the TRE that causes fragile X syndrome, AFF3 expansions represent a significant cause of neurodevelopmental delay.
13
Mukamel RE, Handsaker RE, Sherman MA, Barton AR, Hujoel MLA, McCarroll SA, Loh PR. Repeat polymorphisms underlie top genetic risk loci for glaucoma and colorectal cancer. Cell 2023; 186:3659-3673.e23. [PMID: 37527660 PMCID: PMC10528368 DOI: 10.1016/j.cell.2023.07.002] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Received: 10/04/2022] [Revised: 04/07/2023] [Accepted: 07/03/2023] [Indexed: 08/03/2023]
Abstract
Many regions in the human genome vary in length among individuals due to variable numbers of tandem repeats (VNTRs). To assess the phenotypic impact of VNTRs genome-wide, we applied a statistical imputation approach to estimate the lengths of 9,561 autosomal VNTR loci in 418,136 unrelated UK Biobank participants and 838 GTEx participants. Association and statistical fine-mapping analyses identified 58 VNTRs that appeared to influence a complex trait in UK Biobank, 18 of which also appeared to modulate expression or splicing of a nearby gene. Non-coding VNTRs at TMCO1 and EIF3H appeared to generate the largest known contributions of common human genetic variation to risk of glaucoma and colorectal cancer, respectively. Each of these two VNTRs associated with a >2-fold range of risk across individuals. These results reveal a substantial and previously unappreciated role of non-coding VNTRs in human health and gene regulation.
Affiliation(s)
- Ronen E Mukamel
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA; Center for Data Sciences, Brigham and Women's Hospital, Boston, MA, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Robert E Handsaker
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Department of Genetics, Harvard Medical School, Boston, MA, USA.
- Maxwell A Sherman
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA; Center for Data Sciences, Brigham and Women's Hospital, Boston, MA, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
- Alison R Barton
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA; Center for Data Sciences, Brigham and Women's Hospital, Boston, MA, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Bioinformatics and Integrative Genomics Program, Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Margaux L A Hujoel
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA; Center for Data Sciences, Brigham and Women's Hospital, Boston, MA, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Steven A McCarroll
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Department of Genetics, Harvard Medical School, Boston, MA, USA.
- Po-Ru Loh
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA; Center for Data Sciences, Brigham and Women's Hospital, Boston, MA, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
14
Samelak-Czajka A, Wojciechowski P, Marszalek-Zenczak M, Figlerowicz M, Zmienko A. Differences in the intraspecies copy number variation of Arabidopsis thaliana conserved and nonconserved miRNA genes. Funct Integr Genomics 2023; 23:120. [PMID: 37036577 PMCID: PMC10085913 DOI: 10.1007/s10142-023-01043-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/15/2023] [Revised: 03/23/2023] [Accepted: 03/25/2023] [Indexed: 04/11/2023]
Abstract
MicroRNAs (miRNAs) regulate gene expression through the RNA interference mechanism. In plants, miRNA genes (MIRs) that are grouped into conserved families, i.e. families present across different plant taxa, are involved in the regulation of many developmental and physiological processes. The roles of the nonconserved MIRs (MIRs restricted to one plant family, genus, or even species) are less well recognized; however, many of them participate in the responses to biotic and abiotic stresses. Both over- and underproduction of miRNAs may influence various biological processes. Consequently, maintaining intracellular miRNA homeostasis seems to be crucial for the organism. Deletions and duplications in the genomic sequence may alter gene dosage and/or activity. We evaluated the extent of copy number variations (CNVs) among Arabidopsis thaliana (Arabidopsis) MIRs in over 1000 natural accessions, using population-based analysis of short-read sequencing data. We showed that the conserved MIRs were unlikely to display CNVs and their deletions were extremely rare, whereas nonconserved MIRs presented moderate variation. Transposon-derived MIRs displayed exceptionally high diversity. Conversely, MIRs involved in the epigenetic control of transposons reactivated during development were mostly invariable. MIR overlap with protein-coding genes also limited their variability. At the expression level, a higher rate of nonvariable, nonconserved miRNAs was detectable in Col-0 leaves, inflorescence, and siliques compared to nonconserved variable miRNAs, although the expression of both groups was much lower than that of the conserved MIRs. Our data indicate that the CNV rate of Arabidopsis MIRs is related to their age, function, and genomic localization.
Affiliation(s)
- Anna Samelak-Czajka
- Institute of Bioorganic Chemistry, Polish Academy of Sciences, 61-704, Poznan, Poland
- Pawel Wojciechowski
- Institute of Bioorganic Chemistry, Polish Academy of Sciences, 61-704, Poznan, Poland
- Institute of Computing Science, Faculty of Computing and Telecommunications, Poznan University of Technology, 60-965, Poznan, Poland
- Marek Figlerowicz
- Institute of Bioorganic Chemistry, Polish Academy of Sciences, 61-704, Poznan, Poland.
- Agnieszka Zmienko
- Institute of Bioorganic Chemistry, Polish Academy of Sciences, 61-704, Poznan, Poland.
15
Target-allele-specific probe single-base extension (TASP-SBE): a novel MALDI-TOF-MS strategy for multi-variants analysis and its application in simultaneous detection of α-/β-thalassemia mutations. Hum Genet 2023; 142:445-456. [PMID: 36658365 DOI: 10.1007/s00439-023-02520-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/14/2022] [Accepted: 01/07/2023] [Indexed: 01/20/2023]
Abstract
Single-nucleotide variants (SNVs) and copy number variations (CNVs) are the most common genomic variations that cause phenotypic diversity and genetic disorders. MALDI-TOF-MS is a rapid and cost-effective technique for multi-variant genotyping, but it is challenging to efficiently detect CNVs and clustered SNVs, and especially to detect CNVs and SNVs simultaneously in one reaction. Herein, a novel strategy termed Target-Allele-Specific Probe Single-Base Extension (TASP-SBE) was devised to efficiently detect CNVs and clustered SNVs with MALDI-TOF-MS. By combining the traditional SBE and TASP-SBE strategies, a MALDI-TOF-MS assay was also developed to simultaneously detect 28 α-/β-thalassemia mutations in a single reaction system, including 4 α-thalassemia deletions, 3 HBA SNVs, and 21 HBB SNVs. The results showed that all 28 mutations were sensitively identified, and the CNVs of HBA/HBB genes were also accurately analyzed based on the ratio of peak height (RPH) between the target allele and reference gene. The double-blind evaluation of 989 thalassemia carrier samples showed 100% concordance of this assay with other methods. In conclusion, a one-tube MALDI-TOF-MS assay was developed to simultaneously genotype 28 thalassemia mutations. The novel TASP-SBE was also verified as a practicable strategy for the detection of CNVs and clustered SNVs, providing a feasible approach for multi-variant analysis with the MALDI-TOF-MS technique.
16
Pokrovac I, Pezer Ž. Recent advances and current challenges in population genomics of structural variation in animals and plants. Front Genet 2022; 13:1060898. [PMID: 36523759 PMCID: PMC9745067 DOI: 10.3389/fgene.2022.1060898] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 10/03/2022] [Accepted: 11/15/2022] [Indexed: 05/02/2024] Open
Abstract
The field of population genomics has seen a surge of studies on genomic structural variation over the past two decades. These studies have shown that structural variation is taxonomically ubiquitous and represents a dominant form of genetic variation within species. Recent advances in technology, especially the development of long-read sequencing platforms, have enabled the discovery of structural variants (SVs) in previously inaccessible genomic regions, which unlocked additional structural variation for population studies and revealed that more SVs contribute to evolution than previously perceived. An increasing number of studies suggest that SVs of all types and sizes may have a large effect on phenotype and consequently a major impact on rapid adaptation, population divergence, and speciation. However, the functional effect of the vast majority of SVs is unknown, and the field generally lacks evidence on the phenotypic consequences of most SVs that are suggested to have adaptive potential. Non-human genomes are heavily under-represented in population-scale studies of SVs. We argue that more research on other species is needed to objectively estimate the contribution of SVs to evolution. We discuss technical challenges associated with SV detection and outline the most recent advances towards more representative reference genomes, which open a new era in population-scale studies of structural variation.
Affiliation(s)
- Željka Pezer
- Laboratory for Evolutionary Genetics, Division of Molecular Biology, Ruđer Bošković Institute, Zagreb, Croatia