1
Peña-Martínez EG, Rodríguez-Martínez JA. Decoding Non-coding Variants: Recent Approaches to Studying Their Role in Gene Regulation and Human Diseases. Front Biosci (Schol Ed) 2024; 16:4. [PMID: 38538340 DOI: 10.31083/j.fbs1601004] [Received: 10/16/2023] [Revised: 12/23/2023] [Accepted: 01/02/2024] [Indexed: 04/19/2024]
Abstract
Genome-wide association studies (GWAS) have mapped over 90% of disease- and quantitative-trait-associated variants within the non-coding genome. Non-coding regulatory DNA (e.g., promoters and enhancers) and RNA (e.g., 5' and 3' UTRs and splice sites) are essential in regulating temporal and tissue-specific gene expression. Non-coding variants can potentially impact the phenotype of an organism by altering the molecular recognition of the cis-regulatory elements, leading to gene dysregulation. However, determining causality between non-coding variants, gene regulation, and human disease has remained challenging. Experimental and computational methods have been developed to understand the molecular mechanisms involved in non-coding variant interference at the transcriptional and post-transcriptional levels. This review discusses recent approaches to evaluating disease-associated single-nucleotide variants (SNVs) and determining their impact on transcription factor (TF) binding, gene expression, chromatin conformation, post-transcriptional regulation, and translation.
Affiliation(s)
- Edwin G Peña-Martínez
- Department of Biology, University of Puerto Rico-Río Piedras, 00931 San Juan, Puerto Rico
2
Han D, Li Y, Wang L, Liang X, Miao Y, Li W, Wang S, Wang Z. Comparative analysis of models in predicting the effects of SNPs on TF-DNA binding using large-scale in vitro and in vivo data. Brief Bioinform 2024; 25:bbae110. [PMID: 38517697 PMCID: PMC10959158 DOI: 10.1093/bib/bbae110] [Received: 11/08/2023] [Revised: 02/22/2024] [Accepted: 02/26/2024] [Indexed: 03/24/2024]
Abstract
Non-coding variants associated with complex traits can alter the motifs of transcription factor (TF)-deoxyribonucleic acid binding. Although many computational models have been developed to predict the effects of non-coding variants on TF binding, their predictive power lacks systematic evaluation. Here we have evaluated 14 different models built on position weight matrices (PWMs), support vector machines, ordinary least squares and deep neural networks (DNNs), using large-scale in vitro (i.e. SNP-SELEX) and in vivo (i.e. allele-specific binding, ASB) TF binding data. Our results show that the accuracy of each model in predicting SNP effects in vitro significantly exceeds that achieved in vivo. For in vitro variant impact prediction, kmer/gkm-based machine learning methods (deltaSVM_HT-SELEX, QBiC-Pred) trained on in vitro datasets exhibit the best performance. For in vivo ASB variant prediction, DNN-based multitask models (DeepSEA, Sei, Enformer) trained on the ChIP-seq dataset exhibit relatively superior performance. Among the PWM-based methods, tRap demonstrates better performance in both in vitro and in vivo evaluations. In addition, we find that TF classes such as basic leucine zipper factors could be predicted more accurately, whereas those such as C2H2 zinc finger factors are predicted less accurately, aligning with the evolutionary conservation of these TF classes. We also underscore the significance of non-sequence factors such as cis-regulatory element type, TF expression, interactions and post-translational modifications in influencing the in vivo predictive performance of TFs. Our research provides valuable insights into selecting prioritization methods for non-coding variants and further optimizing such models.
Affiliation(s)
- Dongmei Han
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, 320 Yueyang Road, Shanghai, 200031, China
- Yurun Li
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, 320 Yueyang Road, Shanghai, 200031, China
- Linxiao Wang
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, 320 Yueyang Road, Shanghai, 200031, China
- Xuan Liang
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, 320 Yueyang Road, Shanghai, 200031, China
- Yuanyuan Miao
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, 320 Yueyang Road, Shanghai, 200031, China
- Wenran Li
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, 320 Yueyang Road, Shanghai, 200031, China
- Sijia Wang
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, 320 Yueyang Road, Shanghai, 200031, China
- Zhen Wang
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, 320 Yueyang Road, Shanghai, 200031, China
3
McAfee JC, Lee S, Lee J, Bell JL, Krupa O, Davis J, Insigne K, Bond ML, Zhao N, Boyle AP, Phanstiel DH, Love MI, Stein JL, Ruzicka WB, Davila-Velderrain J, Kosuri S, Won H. Systematic investigation of allelic regulatory activity of schizophrenia-associated common variants. Cell Genom 2023; 3:100404. [PMID: 37868037 PMCID: PMC10589626 DOI: 10.1016/j.xgen.2023.100404] [Received: 07/22/2022] [Revised: 02/23/2023] [Accepted: 08/21/2023] [Indexed: 10/24/2023]
Abstract
Genome-wide association studies (GWASs) have successfully identified 145 genomic regions that contribute to schizophrenia risk, but linkage disequilibrium makes it challenging to discern causal variants. We performed a massively parallel reporter assay (MPRA) on 5,173 fine-mapped schizophrenia GWAS variants in primary human neural progenitors and identified 439 variants with allelic regulatory effects (MPRA-positive variants). Transcription factor binding had modest predictive power, while fine-map posterior probability, enhancer overlap, and evolutionary conservation failed to predict MPRA-positive variants. Furthermore, 64% of MPRA-positive variants did not exhibit an expression quantitative trait locus (eQTL) signature, suggesting that MPRA could identify as-yet unexplored variants with regulatory potential. To predict the combinatorial effect of MPRA-positive variants on gene regulation, we propose an accessibility-by-contact model that combines MPRA-measured allelic activity with neuronal chromatin architecture.
Affiliation(s)
- Jessica C. McAfee
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Curriculum in Genetics and Molecular Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Sool Lee
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Curriculum in Bioinformatics and Computational Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Jiseok Lee
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Jessica L. Bell
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Oleh Krupa
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Jessica Davis
- Department of Chemistry and Biochemistry, University of California, Los Angeles, Los Angeles, CA 90095, USA
- UCLA-DOE Institute for Genomics and Proteomics, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Molecular Biology Institute, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Quantitative and Computational Biology Institute, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Eli and Edythe Broad Center of Regenerative Medicine and Stem Cell Research, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Kimberly Insigne
- Department of Chemistry and Biochemistry, University of California, Los Angeles, Los Angeles, CA 90095, USA
- UCLA-DOE Institute for Genomics and Proteomics, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Molecular Biology Institute, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Quantitative and Computational Biology Institute, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Eli and Edythe Broad Center of Regenerative Medicine and Stem Cell Research, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Marielle L. Bond
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Curriculum in Genetics and Molecular Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Nanxiang Zhao
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Alan P. Boyle
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Department of Human Genetics, University of Michigan, Ann Arbor, MI 48109, USA
- Douglas H. Phanstiel
- Curriculum in Genetics and Molecular Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Curriculum in Bioinformatics and Computational Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Thurston Arthritis Research Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Department of Cell Biology and Physiology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Michael I. Love
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Jason L. Stein
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- W. Brad Ruzicka
- Laboratory for Epigenomics in Human Psychopathology, McLean Hospital, Belmont, MA 02141, USA
- Harvard Medical School, Boston, MA 02115, USA
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Sriram Kosuri
- Department of Chemistry and Biochemistry, University of California, Los Angeles, Los Angeles, CA 90095, USA
- UCLA-DOE Institute for Genomics and Proteomics, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Molecular Biology Institute, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Quantitative and Computational Biology Institute, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Eli and Edythe Broad Center of Regenerative Medicine and Stem Cell Research, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Hyejung Won
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
4
Peña-Martínez EG, Pomales-Matos DA, Rivera-Madera A, Messon-Bird JL, Medina-Feliciano JG, Sanabria-Alberto L, Barreiro-Rosario AC, Rodriguez-Rios JM, Rodríguez-Martínez JA. Prioritizing Cardiovascular Disease-Associated Variants Altering NKX2-5 Binding through an Integrative Computational Approach. medRxiv 2023:2023.09.01.23294951. [PMID: 37693486 PMCID: PMC10491373 DOI: 10.1101/2023.09.01.23294951] [Indexed: 09/12/2023]
Abstract
Cardiovascular diseases (CVDs) are the leading cause of death worldwide and are heavily influenced by genetic factors. Genome-wide association studies (GWAS) have mapped > 90% of CVD-associated variants within the non-coding genome, which can alter the function of regulatory proteins, like transcription factors (TFs). However, due to the overwhelming number of GWAS single nucleotide polymorphisms (SNPs) (>500,000), prioritizing variants for in vitro analysis remains challenging. In this work, we implemented a computational approach that considers support vector machine (SVM)-based TF binding site classification and cardiac expression quantitative trait loci (eQTL) analysis to identify and prioritize potential CVD-causing SNPs. We identified 1,535 CVD-associated SNPs that occur within human heart footprints/enhancers and 9,309 variants in linkage disequilibrium (LD) with differential gene expression profiles in cardiac tissue. Using hiPSC-CM ChIP-seq data from NKX2-5 and TBX5, two cardiac TFs essential for proper heart development, we trained a large-scale gapped k-mer SVM (LS-GKM-SVM) predictive model that can identify binding sites altered by CVD-associated SNPs. The computational predictive model was tested by scoring human heart footprints and enhancers in vitro through electrophoretic mobility shift assay (EMSA). Three variants (rs59310144, rs6715570, and rs61872084) were prioritized for in vitro validation based on their eQTL in cardiac tissue and LS-GKM-SVM prediction to alter NKX2-5 DNA binding. All three variants altered NKX2-5 DNA binding. In summary, we present a bioinformatic approach that considers tissue-specific eQTL analysis and SVM-based TF binding site classification to prioritize CVD-associated variants for in vitro experimental analysis.
5
Ma W, Fu Y, Bao Y, Wang Z, Lei B, Zheng W, Wang C, Liu Y. DeepSATA: A Deep Learning-Based Sequence Analyzer Incorporating the Transcription Factor Binding Affinity to Dissect the Effects of Non-Coding Genetic Variants. Int J Mol Sci 2023; 24:12023. [PMID: 37569400 PMCID: PMC10418434 DOI: 10.3390/ijms241512023] [Received: 06/12/2023] [Revised: 07/13/2023] [Accepted: 07/24/2023] [Indexed: 08/13/2023]
Abstract
Utilizing large-scale epigenomics data, deep learning tools can predict the regulatory activity of genomic sequences, annotate non-coding genetic variants, and uncover mechanisms behind complex traits. However, these tools primarily rely on human or mouse data for training, limiting their performance when applied to other species. Furthermore, the limited exploration of many species, particularly in the case of livestock, has led to a scarcity of comprehensive and high-quality epigenetic data, posing challenges in developing reliable deep learning models for decoding their non-coding genomes. The cross-species prediction of the regulatory genome can be achieved by leveraging publicly available data from extensively studied organisms and making use of the conserved DNA binding preferences of transcription factors within the same tissue. In this study, we introduced DeepSATA, a novel deep learning-based sequence analyzer that incorporates the transcription factor binding affinity for the cross-species prediction of chromatin accessibility. By applying DeepSATA to analyze the genomes of pigs, chickens, cattle, humans, and mice, we demonstrated its ability to improve the prediction accuracy of chromatin accessibility and achieve reliable cross-species predictions in animals. Additionally, we showcased its effectiveness in analyzing pig genetic variants associated with economic traits and in increasing the accuracy of genomic predictions. Overall, our study presents a valuable tool to explore the epigenomic landscape of various species and pinpoint regulatory deoxyribonucleic acid (DNA) variants associated with complex traits.
Affiliation(s)
- Wenlong Ma
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China
- Innovation Group of Pig Genome Design and Breeding, Research Centre for Animal Genome, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China
- Yang Fu
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China
- Innovation Group of Pig Genome Design and Breeding, Research Centre for Animal Genome, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China
- Yongzhou Bao
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China
- Innovation Group of Pig Genome Design and Breeding, Research Centre for Animal Genome, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China
- School of Life Sciences, Henan University, Kaifeng 475004, China
- Zhen Wang
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China
- Innovation Group of Pig Genome Design and Breeding, Research Centre for Animal Genome, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China
- School of Life Sciences, Henan University, Kaifeng 475004, China
- Bowen Lei
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China
- Innovation Group of Pig Genome Design and Breeding, Research Centre for Animal Genome, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education & Key Lab of Swine Genetics and Breeding of Ministry of Agriculture and Rural Affairs, Huazhong Agricultural University, Wuhan 430070, China
- Weigang Zheng
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China
- Innovation Group of Pig Genome Design and Breeding, Research Centre for Animal Genome, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education & Key Lab of Swine Genetics and Breeding of Ministry of Agriculture and Rural Affairs, Huazhong Agricultural University, Wuhan 430070, China
- Chao Wang
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China
- Innovation Group of Pig Genome Design and Breeding, Research Centre for Animal Genome, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education & Key Lab of Swine Genetics and Breeding of Ministry of Agriculture and Rural Affairs, Huazhong Agricultural University, Wuhan 430070, China
- Yuwen Liu
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China
- Innovation Group of Pig Genome Design and Breeding, Research Centre for Animal Genome, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China
- Kunpeng Institute of Modern Agriculture at Foshan, Chinese Academy of Agricultural Sciences, Foshan 528226, China
6
Heshmatzad K, Naderi N, Maleki M, Abbasi S, Ghasemi S, Ashrafi N, Fazelifar AF, Mahdavi M, Kalayinia S. Role of non-coding variants in cardiovascular disease. J Cell Mol Med 2023; 27:1621-1636. [PMID: 37183561 PMCID: PMC10273088 DOI: 10.1111/jcmm.17762] [Received: 10/31/2022] [Revised: 03/29/2023] [Accepted: 04/25/2023] [Indexed: 05/16/2023]
Abstract
Cardiovascular diseases (CVDs) constitute one of the significant causes of death worldwide. Different pathological states are linked to CVDs, which, despite interventions and treatments, still have poor prognoses. The genetic component, a useful tool in stratifying the risk of CVD development, plays a role in the pathogenesis of this group of diseases. The emergence of genome-wide association studies (GWAS) has led to the identification of non-coding regions associated with cardiovascular traits and disorders. Variants located in functional non-coding regions, including promoters/enhancers, introns, miRNAs and 5'/3' UTRs, account for 90% of all identified single-nucleotide polymorphisms associated with CVDs. Here, for the first time, we conducted a comprehensive review of the reported non-coding variants for different CVDs, including hypercholesterolemia, cardiomyopathies, congenital heart diseases, thoracic aortic aneurysms/dissections and coronary artery diseases. Additionally, we present the most commonly reported genes involved in each CVD. Of the 1469 non-coding variants reviewed in total, most were reported in familial hypercholesterolemia, hypertrophic cardiomyopathy and dilated cardiomyopathy. The application and identification of non-coding variants are beneficial for the genetic diagnosis and better therapeutic management of CVDs.
Affiliation(s)
- Katayoun Heshmatzad
- Rajaie Cardiovascular Medical and Research Center, Iran University of Medical Sciences, Tehran, Iran
- Niloofar Naderi
- Cardiogenetic Research Center, Rajaie Cardiovascular Medical and Research Center, Iran University of Medical Sciences, Tehran, Iran
- Majid Maleki
- Rajaie Cardiovascular Medical and Research Center, Iran University of Medical Sciences, Tehran, Iran
- Shiva Abbasi
- Rajaie Cardiovascular Medical and Research Center, Iran University of Medical Sciences, Tehran, Iran
- Serwa Ghasemi
- Cardiogenetic Research Center, Rajaie Cardiovascular Medical and Research Center, Iran University of Medical Sciences, Tehran, Iran
- Nooshin Ashrafi
- Cardiogenetic Research Center, Rajaie Cardiovascular Medical and Research Center, Iran University of Medical Sciences, Tehran, Iran
- Amir Farjam Fazelifar
- Rajaie Cardiovascular Medical and Research Center, Iran University of Medical Sciences, Tehran, Iran
- Mohammad Mahdavi
- Rajaie Cardiovascular Medical and Research Center, Iran University of Medical Sciences, Tehran, Iran
- Samira Kalayinia
- Cardiogenetic Research Center, Rajaie Cardiovascular Medical and Research Center, Iran University of Medical Sciences, Tehran, Iran
7
Shu L, Maroilley T, Tarailo-Graovac M. The Power of Clinical Diagnosis for Deciphering Complex Genetic Mechanisms in Rare Diseases. Genes (Basel) 2023; 14. [PMID: 36672937 DOI: 10.3390/genes14010196] [Received: 12/15/2022] [Revised: 01/05/2023] [Accepted: 01/09/2023] [Indexed: 01/13/2023]
Abstract
Complex genetic disease mechanisms, such as structural or non-coding variants, currently pose a substantial difficulty for frontline diagnostic tests. They may thus account for most unsolved rare disease cases, regardless of the clinical phenotype. However, for patients with well-established syndromes defined by prominent physical and/or unique biochemical phenotypes, the clinical diagnosis can narrow the genetic focus to just a couple of genes, allowing deeper analyses that consider a complex genetic origin. In such cases, clinical-diagnosis-driven genome sequencing strategies may expedite the development of testing and analytical methods that account for complex disease mechanisms, and may advance functional assays for the confirmation of complex variants, clinical management, and the development of new therapies.
8
Connally NJ, Nazeen S, Lee D, Shi H, Stamatoyannopoulos J, Chun S, Cotsapas C, Cassa CA, Sunyaev SR. The missing link between genetic association and regulatory function. eLife 2022; 11:74970. [PMID: 36515579 PMCID: PMC9842386 DOI: 10.7554/elife.74970] [Received: 10/25/2021] [Accepted: 12/02/2022] [Indexed: 12/15/2022]
Abstract
The genetic basis of most traits is highly polygenic and dominated by non-coding alleles. It is widely assumed that such alleles exert small regulatory effects on the expression of cis-linked genes. However, despite the availability of gene expression and epigenomic datasets, few variant-to-gene links have emerged. It is unclear whether these sparse results are due to limitations in available data and methods, or to deficiencies in the underlying assumed model. To better distinguish between these possibilities, we identified 220 gene-trait pairs in which protein-coding variants influence a complex trait or its Mendelian cognate. Despite the presence of expression quantitative trait loci near most GWAS associations, by applying a gene-based approach we found limited evidence that the baseline expression of trait-related genes explains GWAS associations, whether using colocalization methods (8% of genes implicated), transcriptome-wide association (2% of genes implicated), or a combination of regulatory annotations and distance (4% of genes implicated). These results contradict the hypothesis that most complex trait-associated variants coincide with homeostatic expression QTLs, suggesting that better models are needed. The field must confront this deficit and pursue this 'missing regulation.'
Affiliation(s)
- Noah J Connally
- Department of Biomedical Informatics, Harvard Medical School, Boston, United States
- Brigham and Women’s Hospital, Division of Genetics, Harvard Medical School, Boston, United States
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, United States
- Sumaiya Nazeen
- Department of Biomedical Informatics, Harvard Medical School, Boston, United States
- Brigham and Women’s Hospital, Division of Genetics, Harvard Medical School, Boston, United States
- Brigham and Women’s Hospital, Department of Neurology, Harvard Medical School, Boston, United States
- Daniel Lee
- Department of Biomedical Informatics, Harvard Medical School, Boston, United States
- Brigham and Women’s Hospital, Division of Genetics, Harvard Medical School, Boston, United States
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, United States
- Huwenbo Shi
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, United States
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, United States
- Sung Chun
- Division of Pulmonary Medicine, Boston Children’s Hospital, Boston, United States
- Chris Cotsapas
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, United States
- Department of Neurology, Yale Medical School, New Haven, United States
- Department of Genetics, Yale Medical School, New Haven, United States
- Christopher A Cassa
- Brigham and Women’s Hospital, Division of Genetics, Harvard Medical School, Boston, United States
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, United States
- Shamil R Sunyaev
- Department of Biomedical Informatics, Harvard Medical School, Boston, United States
- Brigham and Women’s Hospital, Division of Genetics, Harvard Medical School, Boston, United States
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, United States
9
Chen L, Li MJ. Editorial: Deciphering Non-Coding Regulatory Variants: Computational and Functional Validation. Front Bioeng Biotechnol 2021; 9:769614. [PMID: 34805122 PMCID: PMC8595122 DOI: 10.3389/fbioe.2021.769614] [Received: 09/02/2021] [Accepted: 10/11/2021] [Indexed: 11/26/2022]
Affiliation(s)
- Li Chen
- Department of Biostatistics and Health Data Science, Indiana University School of Medicine, Indianapolis, IN, United States
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN, United States
- Mulin Jun Li
- Department of Bioinformatics, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, China
10
Guglielmi C, Scarpitta R, Gambino G, Conti E, Bellè F, Tancredi M, Cervelli T, Falaschi E, Cosini C, Aretini P, Congregati C, Marino M, Patruno M, Pilato B, Spina F, Balestrino L, Tenedini E, Carnevali I, Cortesi L, Tagliafico E, Tibiletti MG, Tommasi S, Ghilli M, Vivanet C, Galli A, Caligo MA. Detection of Germline Variants in 450 Breast/Ovarian Cancer Families with a Multi-Gene Panel Including Coding and Regulatory Regions. Int J Mol Sci 2021; 22:ijms22147693. [PMID: 34299313 PMCID: PMC8305371 DOI: 10.3390/ijms22147693] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/12/2021] [Revised: 07/10/2021] [Accepted: 07/14/2021] [Indexed: 12/24/2022]
Abstract
With the progress of sequencing technologies, an ever-increasing number of variants of unknown functional and clinical significance (VUS) have been identified in both coding and non-coding regions of the main breast cancer (BC) predisposition genes. The aim of this study is to identify a mutational profile of the coding and intron-exon junction regions of 12 moderate-penetrance genes (ATM, BRIP1, CDH1, CHEK2, NBN, PALB2, PTEN, RAD50, RAD51C, RAD51D, STK11, TP53) in a cohort of 450 Italian patients with Hereditary Breast/Ovarian Cancer Syndrome who are wild type for germline mutations in the BRCA1/2 genes. The analysis was extended to the 5′UTR and 3′UTR of all the genes listed above and to the known BRCA1 and BRCA2 regulatory regions in a subset of 120 patients. Screening was performed by targeted NGS resequencing on the Illumina MiSeq platform. Of the patients analyzed, 8.7% are carriers of class 5/4 coding variants in the ATM (3.6%), BRIP1 (1.6%), CHEK2 (1.8%), PALB2 (0.7%), RAD51C (0.4%), RAD51D (0.4%), and TP53 (0.2%) genes, while variants of uncertain pathological significance (VUSs)/class 3 were identified in 9.1% of the samples. Variants in intron-exon junctions and in regulatory regions were detected in 5.1% and 32.5% of the cases analyzed, respectively. The average age at disease onset in non-coding variant carriers (44.4 years) is very similar to that in coding variant carriers for each proband group with the same cancer type. Furthermore, the proportion of cases with tumor onset before age 40 does not differ significantly between the two groups. However, the presence of multiple non-coding variants in the same patient may affect tumor aggressiveness; notably, 25% of patients with an aggressive tumor carry a PTEN 3′UTR variant. These data provide initial evidence of how important it might be to extend mutational screening to regulatory regions in clinical practice.
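The per-gene carrier rates in this abstract can be cross-checked with a few lines of arithmetic. The sketch below makes two assumptions that the abstract does not state. First, every percentage uses the full 450-patient cohort as its denominator. Second, no patient carries class 5/4 variants in more than one gene. Under those assumptions, the rates should sum to the reported 8.7% and back-calculate to whole-number carrier counts.

```python
# Class 5/4 coding-variant carrier rates reported in the abstract (percent of patients).
CARRIER_RATES_PCT = {
    "ATM": 3.6, "BRIP1": 1.6, "CHEK2": 1.8, "PALB2": 0.7,
    "RAD51C": 0.4, "RAD51D": 0.4, "TP53": 0.2,
}
REPORTED_TOTAL_PCT = 8.7
COHORT_SIZE = 450  # assumption: every rate uses the full cohort as its denominator


def approx_count(rate_pct: float, n: int) -> int:
    """Back-calculate an approximate patient count from a percentage rounded to 0.1%."""
    return round(rate_pct * n / 100)


total_pct = sum(CARRIER_RATES_PCT.values())
counts = {gene: approx_count(pct, COHORT_SIZE) for gene, pct in CARRIER_RATES_PCT.items()}

if __name__ == "__main__":
    print(f"sum of per-gene rates: {total_pct:.1f}% (reported: {REPORTED_TOTAL_PCT}%)")
    print("approximate carriers per gene:", counts)
    print("sum of counts:", sum(counts.values()),
          "| 8.7% of 450:", approx_count(REPORTED_TOTAL_PCT, COHORT_SIZE))
```

Under these assumptions, the rounded per-gene counts (16 + 7 + 8 + 3 + 2 + 2 + 1) add up to 39, which matches 8.7% of 450. These are back-calculated estimates, not counts reported by the authors.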
Affiliation(s)
- Chiara Guglielmi
- SOD Molecular Genetics, University Hospital of Pisa, 56126 Pisa, Italy
- Rosa Scarpitta
- Division of Pathology, University of Pisa, 56126 Pisa, Italy
- Gaetana Gambino
- Department of Clinical and Experimental Medicine, University of Pisa, 56126 Pisa, Italy
- Eleonora Conti
- SOD Molecular Genetics, University Hospital of Pisa, 56126 Pisa, Italy
- Francesca Bellè
- Functional Genetics and Genomics Laboratory, Institute of Clinical Physiology, IFC-CNR, 56127 Pisa, Italy
- Mariella Tancredi
- SOD Molecular Genetics, University Hospital of Pisa, 56126 Pisa, Italy
- Tiziana Cervelli
- Functional Genetics and Genomics Laboratory, Institute of Clinical Physiology, IFC-CNR, 56127 Pisa, Italy
- Elisabetta Falaschi
- SOD Molecular Genetics, University Hospital of Pisa, 56126 Pisa, Italy
- Cinzia Cosini
- SOD Molecular Genetics, University Hospital of Pisa, 56126 Pisa, Italy
- Paolo Aretini
- Section of Oncological Genomics, Fondazione Pisana per la Scienza, 56017 Pisa, Italy
- Caterina Congregati
- Division of Internal Medicine, University Hospital of Pisa, 56126 Pisa, Italy
- Marco Marino
- Department of Life Sciences, University of Modena and Reggio Emilia, 41125 Modena, Italy
- Margherita Patruno
- IRCCS Istituto Tumori “Giovanni Paolo II”, 70124 Bari, Italy
- Brunella Pilato
- IRCCS Istituto Tumori “Giovanni Paolo II”, 70124 Bari, Italy
- Francesca Spina
- SC Medical Genetics, ASSL Cagliari, 09126 Cagliari, Italy
- Luisa Balestrino
- SC Medical Genetics, ASSL Cagliari, 09126 Cagliari, Italy
- Elena Tenedini
- Department of Life Sciences, University of Modena and Reggio Emilia, 41125 Modena, Italy
- Ileana Carnevali
- Ospedale di Circolo ASST Settelaghi, 21100 Varese, Italy
- Laura Cortesi
- Department of Oncology, Haematology and Respiratory Diseases, University Hospital of Modena, 41124 Modena, Italy
- Enrico Tagliafico
- Department of Life Sciences, University of Modena and Reggio Emilia, 41125 Modena, Italy
- Stefania Tommasi
- IRCCS Istituto Tumori “Giovanni Paolo II”, 70124 Bari, Italy
- Matteo Ghilli
- Breast Cancer Center, University Hospital, 56126 Pisa, Italy
- Caterina Vivanet
- SC Medical Genetics, ASSL Cagliari, 09126 Cagliari, Italy
- Alvaro Galli
- Functional Genetics and Genomics Laboratory, Institute of Clinical Physiology, IFC-CNR, 56127 Pisa, Italy
- Corresponding author
- Maria Adelaide Caligo
- SOD Molecular Genetics, University Hospital of Pisa, 56126 Pisa, Italy
- Corresponding author
11
Donner I, Sipilä LJ, Plaketti RM, Kuosmanen A, Forsström L, Katainen R, Kuismin O, Aavikko M, Romsi P, Kariniemi J, Aaltonen LA. Next-generation sequencing in a large pedigree segregating visceral artery aneurysms suggests potential role of COL4A1/COL4A2 in disease etiology. Vascular 2021; 30:842-847. [PMID: 34281442 DOI: 10.1177/17085381211033157] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
BACKGROUND Visceral artery aneurysms (VAAs) can be fatal if ruptured. Although rupture is a relatively rare event, it carries a contemporary mortality rate of approximately 12%. VAAs have multiple possible causes, one of which is genetic predisposition. Here, we present a striking family with seven individuals affected by VAAs and one individual affected by a visceral artery pseudoaneurysm. METHODS We performed exome sequencing on the affected family members and the parents of the proband to find a possible underlying genetic defect. Because exome sequencing did not reveal any likely causal protein-coding variants, we combined whole-genome sequencing of two individuals with linkage analysis to find a plausible non-coding culprit variant. Variants were ranked by the deep learning framework DeepSEA. RESULTS Two of the seven top-ranking variants, NC_000013.11:g.108154659C>T and NC_000013.11:g.110409638C>T, were found in all VAA-affected individuals but not in the individual affected by the pseudoaneurysm. The second variant lies in a candidate cis-regulatory element in the fourth intron of COL4A2, proximal to COL4A1. CONCLUSIONS Type IV collagens are essential for the stability and integrity of the vascular basement membrane and are involved in vascular disease. We therefore conclude that COL4A1 and COL4A2 are strong candidates for VAA susceptibility genes.
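The two top-ranking variants are written in HGVS genomic notation. Each string has four parts: a RefSeq chromosome accession with its version (NC_000013.11), the `g.` genomic-coordinate prefix, a position, and a reference>alternate substitution. Below is a minimal Python sketch for decomposing such strings. It handles only simple single-nucleotide substitutions on human RefSeq chromosome accessions. The `parse_hgvs_sub` and `chromosome_name` helpers are hypothetical illustrations, not part of any HGVS library.

```python
import re
from typing import NamedTuple

# Simple HGVS genomic substitution on a RefSeq chromosome accession,
# e.g. "NC_000013.11:g.108154659C>T". Deletions, insertions, ranges and
# other HGVS forms are deliberately not handled.
HGVS_SUB = re.compile(r"^(NC_(\d{6}))\.(\d+):g\.(\d+)([ACGT])>([ACGT])$")


class GenomicSub(NamedTuple):
    accession: str   # e.g. "NC_000013"
    chromosome: str  # e.g. "13"
    version: int     # accession version, e.g. 11
    position: int    # genomic coordinate (HGVS g. numbering starts at 1)
    ref: str
    alt: str


def chromosome_name(number: int) -> str:
    """Human RefSeq accessions NC_000001-NC_000024 correspond to chromosomes 1-22, X and Y."""
    return {23: "X", 24: "Y"}.get(number, str(number))


def parse_hgvs_sub(text: str) -> GenomicSub:
    """Split a simple HGVS genomic substitution into its components."""
    match = HGVS_SUB.match(text)
    if match is None:
        raise ValueError(f"not a simple genomic substitution: {text!r}")
    accession, number, version, position, ref, alt = match.groups()
    if ref == alt:
        raise ValueError(f"reference and alternate alleles are identical: {text!r}")
    return GenomicSub(accession, chromosome_name(int(number)), int(version),
                      int(position), ref, alt)


if __name__ == "__main__":
    first = parse_hgvs_sub("NC_000013.11:g.108154659C>T")
    second = parse_hgvs_sub("NC_000013.11:g.110409638C>T")
    print(first)
    print("distance between the two variants (bp):", second.position - first.position)
```

For the two variants in this abstract, the sketch reports chromosome 13 and a separation of 2,254,979 bp between positions 108,154,659 and 110,409,638.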
Affiliation(s)
- Iikki Donner
- Department of Medical and Clinical Genetics, Medicum, University of Helsinki, Helsinki, Finland
- Genome-Scale Biology Research Program, Research Programs Unit, University of Helsinki, Helsinki, Finland
- Lauri J Sipilä
- Department of Medical and Clinical Genetics, Medicum, University of Helsinki, Helsinki, Finland
- Genome-Scale Biology Research Program, Research Programs Unit, University of Helsinki, Helsinki, Finland
- Roosa-Maria Plaketti
- Department of Medical and Clinical Genetics, Medicum, University of Helsinki, Helsinki, Finland
- Genome-Scale Biology Research Program, Research Programs Unit, University of Helsinki, Helsinki, Finland
- Anna Kuosmanen
- Department of Medical and Clinical Genetics, Medicum, University of Helsinki, Helsinki, Finland
- Genome-Scale Biology Research Program, Research Programs Unit, University of Helsinki, Helsinki, Finland
- Linda Forsström
- Department of Medical and Clinical Genetics, Medicum, University of Helsinki, Helsinki, Finland
- Genome-Scale Biology Research Program, Research Programs Unit, University of Helsinki, Helsinki, Finland
- Riku Katainen
- Department of Medical and Clinical Genetics, Medicum, University of Helsinki, Helsinki, Finland
- Genome-Scale Biology Research Program, Research Programs Unit, University of Helsinki, Helsinki, Finland
- Outi Kuismin
- Department of Clinical Genetics, Oulu University Hospital, Oulu, Finland
- PEDEGO Research Unit, Medical Research Center Oulu, Oulu University Hospital and University of Oulu, Oulu, Finland
- Mervi Aavikko
- Department of Medical and Clinical Genetics, Medicum, University of Helsinki, Helsinki, Finland
- Genome-Scale Biology Research Program, Research Programs Unit, University of Helsinki, Helsinki, Finland
- Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland
- Pekka Romsi
- Department of Vascular Surgery, Oulu University Hospital, Oulu, Finland
- Juho Kariniemi
- Department of Radiology, Oulu University Hospital, Oulu, Finland
- Lauri A Aaltonen
- Department of Medical and Clinical Genetics, Medicum, University of Helsinki, Helsinki, Finland
- Genome-Scale Biology Research Program, Research Programs Unit, University of Helsinki, Helsinki, Finland
12
Pérez-Agustín A, Pinsach-Abuin M, Pagans S. Role of Non-Coding Variants in Brugada Syndrome. Int J Mol Sci 2020; 21:ijms21228556. [PMID: 33202810 PMCID: PMC7698069 DOI: 10.3390/ijms21228556] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 10/14/2020] [Revised: 11/09/2020] [Accepted: 11/10/2020] [Indexed: 12/15/2022]
Abstract
Brugada syndrome (BrS) is an inherited electrical heart disease associated with a high risk of sudden cardiac death (SCD). The genetic characterization of BrS has always been challenging. Although several cardiac ion channel genes have been associated with BrS, SCN5A is the only gene with definitive evidence of causality that can be used for the clinical diagnosis of BrS. However, more than 65% of diagnosed cases cannot be explained by variants in SCN5A or other genes. Therefore, in a substantial number of BrS cases, the underlying mechanisms remain elusive. Common variants, mostly located in non-coding regions, have emerged as potential modulators of the disease by affecting different regulatory mechanisms, including transcription factors (TFs), the three-dimensional organization of the genome, and non-coding RNAs (ncRNAs). These common variants have been hypothesized to modulate interindividual susceptibility to the disease, which could explain the incomplete penetrance of BrS observed within families. Altogether, studying common and rare variants in parallel is becoming increasingly important for understanding the genetic basis of BrS. In this review, we aim to describe the challenges of studying non-coding variants associated with disease, re-examine the studies that have linked non-coding variants with BrS, and provide further evidence for the relevance of regulatory elements in understanding this cardiac disorder.
Affiliation(s)
- Adrian Pérez-Agustín
- Department of Medical Sciences, School of Medicine, University of Girona, 17003 Girona, Spain
- Biomedical Research Institute of Girona, 17190 Salt, Spain
- Sara Pagans
- Department of Medical Sciences, School of Medicine, University of Girona, 17003 Girona, Spain
- Biomedical Research Institute of Girona, 17190 Salt, Spain
- Corresponding author
13
Lima Cunha D, Arno G, Corton M, Moosajee M. The Spectrum of PAX6 Mutations and Genotype-Phenotype Correlations in the Eye. Genes (Basel) 2019; 10:genes10121050. [PMID: 31861090 PMCID: PMC6947179 DOI: 10.3390/genes10121050] [Citation(s) in RCA: 81] [Impact Index Per Article: 16.2] [Received: 11/15/2019] [Revised: 12/09/2019] [Accepted: 12/12/2019] [Indexed: 12/13/2022]
Abstract
The transcription factor PAX6 is essential for ocular development in vertebrates and is considered the master regulator of the eye. During eye development, it is essential for the correct patterning and formation of the multi-layered optic cup, and it is involved in the development of the lens and corneal epithelium. In adulthood, it is mostly expressed in the cornea, iris, and lens. PAX6 is a dosage-sensitive gene that is tightly regulated by several elements located upstream, downstream, and within the gene. More than 500 different mutations affecting PAX6 and its regulatory regions have been described, most of which lead to PAX6 haploinsufficiency and cause several ocular and systemic abnormalities. Aniridia is an autosomal dominant disorder marked by complete or partial absence of the iris, foveal hypoplasia, and nystagmus, and it is caused by heterozygous PAX6 mutations. Other ocular abnormalities have also been associated with PAX6 changes, and genotype-phenotype correlations are emerging. This review covers recent advances in PAX6 regulation, particularly the role of several enhancers known to regulate PAX6 during eye development and disease. We also present an updated overview of the mutation spectrum, in which an increasing number of mutations in non-coding regions have been reported. Novel genotype-phenotype correlations are also discussed.
Affiliation(s)
- Gavin Arno
- Institute of Ophthalmology, UCL, London EC1V 9EL, UK
- Moorfields Eye Hospital NHS Foundation Trust, London EC1V 2PD, UK
- Great Ormond Street Hospital for Children NHS Foundation Trust, London WC1N 3JH, UK
- Marta Corton
- Department of Genetics & Genomics, Instituto de Investigación Sanitaria-Fundación Jiménez Díaz University Hospital—Universidad Autónoma de Madrid (IIS-FJD, UAM), 28040 Madrid, Spain
- Centre for Biomedical Network Research on Rare Diseases (CIBERER), 28029 Madrid, Spain
- Mariya Moosajee
- Institute of Ophthalmology, UCL, London EC1V 9EL, UK
- Moorfields Eye Hospital NHS Foundation Trust, London EC1V 2PD, UK
- Great Ormond Street Hospital for Children NHS Foundation Trust, London WC1N 3JH, UK
- Corresponding author
14
Santana Dos Santos E, Lallemand F, Burke L, Stoppa-Lyonnet D, Brown M, Caputo SM, Rouleau E. Non-Coding Variants in BRCA1 and BRCA2 Genes: Potential Impact on Breast and Ovarian Cancer Predisposition. Cancers (Basel) 2018; 10:E453. [PMID: 30453575 DOI: 10.3390/cancers10110453] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 09/30/2018] [Revised: 11/04/2018] [Accepted: 11/12/2018] [Indexed: 12/21/2022]
Abstract
BRCA1 and BRCA2 are major breast cancer susceptibility genes whose pathogenic variants are associated with a significant increase in the risk of breast and ovarian cancers. Current genetic screening is generally limited to BRCA1/2 exons and intron/exon boundaries. Most identified pathogenic variants cause partial or complete loss of protein function. However, it is becoming increasingly clear that variants in these regions account for only a small proportion of cancer risk. The role of variants in non-coding regions beyond splice donor and acceptor sites, including those that have no qualitative effect on the protein, has not been thoroughly investigated. The key transcriptional regulatory elements of BRCA1 and BRCA2 are housed in gene promoters, untranslated regions, introns, and long-range elements. Germline and somatic variants have been described within these sequences, but the clinical significance of most of them is currently unknown, which remains a significant clinical challenge. This review summarizes the available data on the impact of variants in non-coding regions of the BRCA1/2 genes and their role in breast and ovarian cancer predisposition.
15
Lee PH, Lee C, Li X, Wee B, Dwivedi T, Daly M. Principles and methods of in-silico prioritization of non-coding regulatory variants. Hum Genet 2018; 137:15-30. [PMID: 29288389 PMCID: PMC5892192 DOI: 10.1007/s00439-017-1861-0] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Received: 10/24/2017] [Accepted: 12/14/2017] [Indexed: 12/13/2022]
Abstract
Over a decade of genome-wide association studies has made great strides toward the detection of genes and genetic mechanisms underlying complex traits. However, the majority of associated loci reside in non-coding regions that are, in general, functionally uncharacterized. Now, the availability of large-scale tissue- and cell type-specific transcriptome and epigenome data enables us to elucidate how non-coding genetic variants can affect gene expression and are associated with phenotypic changes. Here, we provide an overview of this emerging field in human genomics, summarizing available data resources and state-of-the-art analytic methods to facilitate in-silico prioritization of non-coding regulatory mutations. We also highlight the limitations of current approaches and discuss the direction of much-needed future research.
Affiliation(s)
- Phil H Lee
- Center for Genomic Medicine, Massachusetts General Hospital and Harvard Medical School, Simches Research Building, 185 Cambridge St, Boston, MA, 02114, USA
- Quantitative Genomics Program, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Christian Lee
- Center for Genomic Medicine, Massachusetts General Hospital and Harvard Medical School, Simches Research Building, 185 Cambridge St, Boston, MA, 02114, USA
- Department of Life Sciences, Harvard University, Cambridge, MA, USA
- Xihao Li
- Quantitative Genomics Program, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Brian Wee
- Center for Genomic Medicine, Massachusetts General Hospital and Harvard Medical School, Simches Research Building, 185 Cambridge St, Boston, MA, 02114, USA
- Tushar Dwivedi
- Center for Genomic Medicine, Massachusetts General Hospital and Harvard Medical School, Simches Research Building, 185 Cambridge St, Boston, MA, 02114, USA
- John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, MA, USA
- Mark Daly
- Center for Genomic Medicine, Massachusetts General Hospital and Harvard Medical School, Simches Research Building, 185 Cambridge St, Boston, MA, 02114, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
16
Zhu Y, Tazearslan C, Suh Y. Challenges and progress in interpretation of non-coding genetic variants associated with human disease. Exp Biol Med (Maywood) 2017; 242:1325-1334. [PMID: 28581336 DOI: 10.1177/1535370217713750] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.6] [Indexed: 01/15/2023]
Abstract
Genome-wide association studies have shown that the vast majority of disease-associated variants reside in the non-coding regions of the genome, suggesting that gene regulatory changes contribute to disease risk. Identifying truly causal non-coding variants and their target genes remains challenging, but it is a critical step toward translating genetic associations into molecular mechanisms and, ultimately, clinical applications. Here we review genomic/epigenomic resources and in silico tools that can be used to identify causal non-coding variants, as well as experimental strategies to validate their function. Impact statement: Most signals from genome-wide association studies (GWASs) map to the non-coding genome, and functional interpretation of these associations has remained challenging. We review recent progress in methodologies for studying the non-coding genome and argue that no single approach can effectively identify the causal regulatory variants from GWAS results. By illustrating the advantages and limitations of each method, our review may serve as a guideline for taking a combinatorial approach to accurately predict, prioritize, and eventually experimentally validate the causal variants.
Affiliation(s)
- Yizhou Zhu
- Department of Genetics, Albert Einstein College of Medicine, Bronx, NY 10461, USA
- Cagdas Tazearslan
- Department of Genetics, Albert Einstein College of Medicine, Bronx, NY 10461, USA
- Yousin Suh
- Department of Genetics, Albert Einstein College of Medicine, Bronx, NY 10461, USA
- Department of Ophthalmology & Visual Sciences, Albert Einstein College of Medicine, Bronx, NY 10461, USA
- Department of Medicine, Albert Einstein College of Medicine, Bronx, NY 10461, USA
17
Zhao J, Li D, Seo J, Allen AS, Gordân R. Quantifying the Impact of Non-coding Variants on Transcription Factor-DNA Binding. Res Comput Mol Biol 2017; 10229:336-352. [PMID: 28691125 DOI: 10.1007/978-3-319-56970-3_21] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Indexed: 12/28/2022]
Abstract
Many recent studies have emphasized the importance of genetic variants and mutations in cancer and other complex human diseases. The overwhelming majority of these variants occur in non-coding portions of the genome, where they can have a functional impact by disrupting regulatory interactions between transcription factors (TFs) and DNA. Here, we present a method for assessing the impact of non-coding mutations on TF-DNA interactions, based on regression models of DNA-binding specificity trained on high-throughput in vitro data. We use ordinary least squares (OLS) to estimate the parameters of the binding model for each TF, and we show that our predictions of TF-binding changes due to DNA mutations correlate well with measured changes in gene expression. In addition, by leveraging distributional results associated with OLS estimation, for each predicted change in TF binding we also compute a normalized score (z-score) and a significance value (p-value) reflecting our confidence that the mutation affects TF binding. We use this approach to analyze a large set of pathogenic non-coding variants, and we show that these variants lead to significant differences in TF binding between alleles, compared to a control set of common variants. Thus, our results indicate that there is a strong regulatory component to the pathogenic non-coding variants identified thus far.
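Attaching a z-score and p-value to a predicted TF-binding change follows from standard OLS theory. If binding is modeled as a linear function of sequence features, the alt-minus-ref difference is a linear contrast d of the fitted coefficients. Its standard error is sqrt(sigma^2 * d^T (X^T X)^-1 d). The sketch below illustrates only that principle, on synthetic data. It is not the authors' model, and it rests on three assumptions:

- per-position base indicators serve as regression features (the abstract does not specify the authors' feature set);
- simulated measurements stand in for in vitro binding data;
- a normal rather than a t reference distribution is used for the p-value.

`encode`, `fit_ols` and `binding_change` are hypothetical helpers.

```python
import math

import numpy as np

BASES = "ACGT"


def encode(seq: str) -> np.ndarray:
    """Intercept plus C/G/T indicators at each position (A is the per-position baseline)."""
    features = [1.0]
    for base in seq:
        features.extend(1.0 if base == b else 0.0 for b in "CGT")
    return np.array(features)


def fit_ols(seqs, y):
    """Ordinary least squares fit of y on encoded sequences.

    Returns the coefficients, (X^T X)^-1 and the residual variance estimate.
    """
    X = np.vstack([encode(s) for s in seqs])
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = float(resid @ resid) / (X.shape[0] - X.shape[1])
    return beta, xtx_inv, sigma2


def binding_change(beta, xtx_inv, sigma2, ref_seq, alt_seq):
    """Predicted alt-minus-ref binding change, its z-score and a two-sided normal p-value."""
    d = encode(alt_seq) - encode(ref_seq)  # contrast vector between the two alleles
    delta = float(d @ beta)
    se = math.sqrt(sigma2 * float(d @ xtx_inv @ d))
    if se == 0.0:
        raise ValueError("reference and alternate sequences are identical")
    z = delta / se
    return delta, z, math.erfc(abs(z) / math.sqrt(2))


# Synthetic training data (assumption): random 8-mers whose "binding signal" depends only
# on position 3, where T scores 2.0 units higher than A, plus Gaussian noise.
rng = np.random.default_rng(0)
L, N = 8, 500
true_w = np.zeros(1 + 3 * L)
true_w[1 + 3 * 3 + 2] = 2.0  # feature index for "T at position 3"
seqs = ["".join(rng.choice(list(BASES), size=L)) for _ in range(N)]
y = np.array([encode(s) @ true_w for s in seqs]) + rng.normal(0.0, 0.3, size=N)

beta, xtx_inv, sigma2 = fit_ols(seqs, y)

if __name__ == "__main__":
    print(binding_change(beta, xtx_inv, sigma2, "AAAAAAAA", "AAATAAAA"))
```

The A-to-T change at position 3 should come out with a predicted change near +2 and a very large z-score. A change at a position with no simulated effect should give a predicted change near zero.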
Affiliation(s)
- Jingkang Zhao
- Center for Genomic and Computational Biology, Duke University, Durham, NC 27708, USA
- Program in Computational Biology and Bioinformatics, Duke University, Durham, NC 27708, USA
- Dongshunyi Li
- Department of Biostatistics and Bioinformatics, Duke University, Durham, NC 27708, USA
- Jungkyun Seo
- Program in Computational Biology and Bioinformatics, Duke University, Durham, NC 27708, USA
- Andrew S Allen
- Center for Genomic and Computational Biology, Duke University, Durham, NC 27708, USA
- Department of Biostatistics and Bioinformatics, Duke University, Durham, NC 27708, USA
- Raluca Gordân
- Center for Genomic and Computational Biology, Duke University, Durham, NC 27708, USA
- Department of Biostatistics and Bioinformatics, Duke University, Durham, NC 27708, USA
- Department of Computer Science, Duke University, Durham, NC 27708, USA
18
Abstract
The evolutionarily conserved transcription factor SOX9, encoded by the dosage-sensitive SOX9 gene on chromosome 17q24.3, plays an important role in the development of multiple organs, including bones and testes. Heterozygous point mutations and genomic copy-number variant (CNV) deletions involving SOX9 have been reported in patients with campomelic dysplasia (CD), a skeletal malformation syndrome often associated with male-to-female sex reversal. Balanced and unbalanced structural genomic variants with breakpoints mapping up to 1.3 Mb upstream and downstream of SOX9 have been described in patients with milder phenotypes, including acampomelic campomelic dysplasia, sex reversal, and Pierre Robin sequence. Based on the locations of the breakpoints of genomic rearrangements causing different phenotypes, five genomic intervals upstream of SOX9 have been defined. We analyzed publicly available high-throughput chromosome conformation capture (Hi-C) data from multiple cell lines for the genomic regions flanking SOX9. Consistent with the literature, chromatin domain boundaries in the SOX9 locus are conserved across species and remain largely constant across multiple cell types. Interestingly, we found that chromatin folding domains in the SOX9 locus are associated with the genomic intervals harboring real and putative regulatory elements of SOX9. This implies that variation in intra-domain interactions may be critical for the dynamic, cell type-specific regulation of SOX9 expression. We propose that tissue-specific enhancers of other transcription factor genes may similarly use chromatin folding sub-domains in gene regulation.
Affiliation(s)
- Marta Smyk
- Department of Medical Genetics, Institute of Mother and Child, Warsaw, Poland
- Kadir Caner Akdemir
- Genomic Medicine Department, MD Anderson Cancer Center, Houston, TX, USA
- Paweł Stankiewicz
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
19
Hoffmann A, Ziller M, Spengler D. The Future is The Past: Methylation QTLs in Schizophrenia. Genes (Basel) 2016; 7:E104. [PMID: 27886132 DOI: 10.3390/genes7120104] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Received: 09/30/2016] [Revised: 11/13/2016] [Accepted: 11/16/2016] [Indexed: 12/12/2022]
Abstract
Genome-wide association studies (GWAS) have remarkably advanced insight into the genetic basis of schizophrenia (SCZ). Still, most of the functional variance in disease risk remains unexplained. Hence, there is a growing need to map genetic variation to genes and to functions, both to understand the pathophysiology of SCZ and to develop better treatments. Genetic variation can regulate various cellular functions, including DNA methylation, an epigenetic mark with important roles in transcription and in mediating environmental influences. Methylation quantitative trait loci (meQTLs) are derived by mapping DNA methylation levels in genetically different, genotyped individuals; they define loci at which DNA methylation is influenced by genetic variation. Recent evidence points to an abundance of meQTLs in brain tissues, whose functional contributions to development and mental disease are still poorly understood. Interestingly, fetal meQTLs reside in regulatory domains affecting methylome reconfiguration during early brain development and are enriched in loci identified by GWAS for SCZ. Moreover, fetal meQTLs are preserved in the adult brain and could trace early epigenomic deregulation during vulnerable periods. Overall, these findings highlight the role of fetal meQTLs in the genetic risk for SCZ and in its possible neurodevelopmental origin.