1
|
Wu K, Bu F, Wu Y, Zhang G, Wang X, He S, Liu MF, Chen R, Yuan H. Exploring noncoding variants in genetic diseases: from detection to functional insights. J Genet Genomics 2024; 51:111-132. [PMID: 38181897 DOI: 10.1016/j.jgg.2024.01.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/05/2023] [Revised: 12/26/2023] [Accepted: 01/01/2024] [Indexed: 01/07/2024]
Abstract
Previous studies on genetic diseases predominantly focused on protein-coding variations, overlooking the vast noncoding regions in the human genome. The development of high-throughput sequencing technologies and functional genomics tools has enabled the systematic identification of functional noncoding variants. These variants can impact gene expression, regulation, and chromatin conformation, thereby contributing to disease pathogenesis. Understanding the mechanisms that underlie the impact of noncoding variants on genetic diseases is indispensable for the development of precisely targeted therapies and the implementation of personalized medicine strategies. The intricacies of noncoding regions introduce a multitude of challenges and research opportunities. In this review, we introduce a spectrum of noncoding variants involved in genetic diseases, along with research strategies and advanced technologies for their precise identification and in-depth understanding of the complexity of the noncoding genome. We will delve into the research challenges and propose potential solutions for unraveling the genetic basis of rare and complex diseases.
Affiliation(s)
- Ke Wu
- Institute of Rare Diseases, West China Hospital of Sichuan University, Chengdu, Sichuan 610041, China
| | - Fengxiao Bu
- Institute of Rare Diseases, West China Hospital of Sichuan University, Chengdu, Sichuan 610041, China
| | - Yang Wu
- Institute of Rare Diseases, West China Hospital of Sichuan University, Chengdu, Sichuan 610041, China
| | - Gen Zhang
- Institute of Rare Diseases, West China Hospital of Sichuan University, Chengdu, Sichuan 610041, China
| | - Xin Wang
- Key Laboratory of Systems Health Science of Zhejiang Province, School of Life Science, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou, Zhejiang 310024, China
| | - Shunmin He
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
| | - Mo-Fang Liu
- Key Laboratory of Systems Health Science of Zhejiang Province, School of Life Science, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou, Zhejiang 310024, China; State Key Laboratory of Molecular Biology, State Key Laboratory of Cell Biology, Shanghai Key Laboratory of Molecular Andrology, Shanghai Institute of Biochemistry and Cell Biology, Center for Excellence in Molecular Cell Science, Chinese Academy of Sciences, Shanghai 200031, China.
| | - Runsheng Chen
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China.
| | - Huijun Yuan
- Institute of Rare Diseases, West China Hospital of Sichuan University, Chengdu, Sichuan 610041, China.
| |
|
2
|
Hill C, Hudaiberdiev S, Ovcharenko I. ChromDL: a next-generation regulatory DNA classifier. Bioinformatics 2023; 39:i377-i385. [PMID: 37387183 PMCID: PMC10311331 DOI: 10.1093/bioinformatics/btad217] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/01/2023] Open
Abstract
MOTIVATION: Predicting the regulatory function of non-coding DNA using only the DNA sequence continues to be a major challenge in genomics. With the advent of improved optimization algorithms, faster GPU speeds, and more intricate machine-learning libraries, hybrid convolutional and recurrent neural network architectures can be constructed and applied to extract crucial information from non-coding DNA. RESULTS: Using a comparative analysis of the performance of thousands of Deep Learning architectures, we developed ChromDL, a neural network architecture combining bidirectional gated recurrent units, convolutional neural networks, and bidirectional long short-term memory units, which significantly improves upon a range of prediction metrics compared to its predecessors in transcription factor binding site, histone modification, and DNase-I hypersensitive site detection. Combined with a secondary model, it can be utilized for accurate classification of gene regulatory elements. The model can also detect weak transcription factor binding as compared to previously developed methods and has the potential to help delineate transcription factor binding motif specificities. AVAILABILITY AND IMPLEMENTATION: The ChromDL source code can be found at https://github.com/chrishil1/ChromDL.
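Sequence-only models of this kind need the DNA string turned into numbers before any convolutional or recurrent layer can read it. The sketch below shows one-hot encoding, a common choice for such inputs. It is an assumption here that ChromDL uses this encoding, and the helper name `one_hot` and the A, C, G, T column order are hypothetical.

```python
# Illustrative sketch only: one-hot encoding of a DNA sequence.
# Assumption: column order A, C, G, T; ambiguous bases such as N become all zeros.
BASES = "ACGT"


def one_hot(seq):
    """Return one 4-element row per nucleotide in `seq`."""
    rows = []
    for base in seq.upper():
        row = [0, 0, 0, 0]
        idx = BASES.find(base)
        if idx >= 0:
            row[idx] = 1
        rows.append(row)
    return rows


if __name__ == "__main__":
    for base, row in zip("ACGTN", one_hot("ACGTN")):
        print(base, row)
```

Each row can then be stacked into a matrix of shape (sequence length, 4), which is the form a convolutional layer typically consumes.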
Affiliation(s)
- Christopher Hill
- Computational Biology Branch, Intramural Research Program, National Library of Medicine, National Institutes of Health, Bethesda, MD 20892, United States
- School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA 19104, United States
| | - Sanjarbek Hudaiberdiev
- Computational Biology Branch, Intramural Research Program, National Library of Medicine, National Institutes of Health, Bethesda, MD 20892, United States
| | - Ivan Ovcharenko
- Computational Biology Branch, Intramural Research Program, National Library of Medicine, National Institutes of Health, Bethesda, MD 20892, United States
| |
|
3
|
Hill C, Hudaiberdiev S, Ovcharenko I. ChromDL: A Next-Generation Regulatory DNA Classifier. bioRxiv 2023:2023.01.27.525971. [PMID: 36789431 PMCID: PMC9928050 DOI: 10.1101/2023.01.27.525971] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/18/2023]
Abstract
Motivation: Predicting the regulatory function of non-coding DNA using only the DNA sequence continues to be a major challenge in genomics. With the advent of improved optimization algorithms, faster GPU speeds, and more intricate machine learning libraries, hybrid convolutional and recurrent neural network architectures can be constructed and applied to extract crucial information from non-coding DNA. Results: Using a comparative analysis of the performance of thousands of Deep Learning (DL) architectures, we developed ChromDL, a neural network architecture combining bidirectional gated recurrent units (BiGRU), convolutional neural networks (CNNs), and bidirectional long short-term memory units (BiLSTM), which significantly improves upon a range of prediction metrics compared to its predecessors in transcription factor binding site (TFBS), histone modification (HM), and DNase-I hypersensitive site (DHS) detection. Combined with a secondary model, it can be utilized for accurate classification of gene regulatory elements. The model can also detect weak transcription factor (TF) binding with higher accuracy as compared to previously developed methods and has the potential to accurately delineate TF binding motif specificities. Availability: The ChromDL source code can be found at https://github.com/chrishil1/ChromDL.
Affiliation(s)
- Christopher Hill
- Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20892, USA
- School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA, 19104, USA
| | - Sanjarbek Hudaiberdiev
- Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20892, USA
| | - Ivan Ovcharenko
- Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20892, USA
| |
|
4
|
Mansour M, Giudice E, Xu X, Akarsu H, Bordes P, Guillet V, Bigot DJ, Slama N, D'urso G, Chat S, Redder P, Falquet L, Mourey L, Gillet R, Genevaux P. Substrate recognition and cryo-EM structure of the ribosome-bound TAC toxin of Mycobacterium tuberculosis. Nat Commun 2022; 13:2641. [PMID: 35552387 PMCID: PMC9098466 DOI: 10.1038/s41467-022-30373-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/04/2022] [Accepted: 04/27/2022] [Indexed: 11/16/2022] Open
Abstract
Toxins of toxin-antitoxin systems use diverse mechanisms to control bacterial growth. Here, we focus on the deleterious toxin of the atypical tripartite toxin-antitoxin-chaperone (TAC) system of Mycobacterium tuberculosis, whose inhibition requires the concerted action of the antitoxin and its dedicated SecB-like chaperone. We show that the TAC toxin is a bona fide ribonuclease and identify exact cleavage sites in mRNA targets on a transcriptome-wide scale in vivo. mRNA cleavage by the toxin occurs after the second nucleotide of the ribosomal A-site codon during translation, with a strong preference for CCA codons in vivo. Finally, we report the cryo-EM structure of the ribosome-bound TAC toxin in the presence of native M. tuberculosis cspA mRNA, revealing the specific mechanism by which the TAC toxin interacts with the ribosome and the tRNA in the P-site to cleave its mRNA target. Toxin-antitoxin systems are widespread in bacteria. Here the authors present structures of M. tuberculosis HigBTAC alone and bound to the ribosome in the presence of native cspA mRNA, shedding light on its mechanism of translation inhibition.
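The cleavage rule described above (a cut after the second nucleotide of the codon, with a preference for CCA) can be illustrated on a plain sequence. This is a minimal sketch under strong simplifying assumptions: it treats every in-frame CCA codon as a candidate, ignores ribosome position and any other determinants of cleavage, and the function name `cleavage_sites` is hypothetical.

```python
# Illustrative sketch only: list candidate cut positions in an open reading frame.
# Assumption: the reading frame starts at index 0 of the supplied sequence.


def cleavage_sites(orf, codon="CCA"):
    """Return 0-based positions p where the cut falls between orf[p-1] and orf[p],
    i.e. after the second nucleotide of each in-frame occurrence of `codon`."""
    orf = orf.upper().replace("U", "T")  # accept RNA or DNA alphabets
    sites = []
    for start in range(0, len(orf) - 2, 3):
        if orf[start:start + 3] == codon:
            sites.append(start + 2)
    return sites


if __name__ == "__main__":
    # Codons: ATG CCA GGC CCA TAA
    print(cleavage_sites("ATGCCAGGCCCATAA"))
```

Out-of-frame matches are deliberately skipped, because the rule in the abstract is stated relative to the codon being decoded.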
Affiliation(s)
- Moise Mansour
- Laboratoire de Microbiologie et de Génétique Moléculaires, Centre de Biologie Intégrative (CBI), Université de Toulouse, CNRS, UPS, Toulouse, France
| | - Emmanuel Giudice
- Institut de Génétique et Développement de Rennes (IGDR), UMR6290, Université de Rennes, CNRS, Rennes, France
| | - Xibing Xu
- Laboratoire de Microbiologie et de Génétique Moléculaires, Centre de Biologie Intégrative (CBI), Université de Toulouse, CNRS, UPS, Toulouse, France
| | - Hatice Akarsu
- Department of Biology, University of Fribourg & Swiss Institute of Bioinformatics, Fribourg, Switzerland.,Institute of Veterinary Bacteriology, University of Bern, Bern, Switzerland
| | - Patricia Bordes
- Laboratoire de Microbiologie et de Génétique Moléculaires, Centre de Biologie Intégrative (CBI), Université de Toulouse, CNRS, UPS, Toulouse, France
| | - Valérie Guillet
- Institut de Pharmacologie et de Biologie Structurale, IPBS, Université de Toulouse, CNRS, UPS, Toulouse, France
| | - Donna-Joe Bigot
- Laboratoire de Microbiologie et de Génétique Moléculaires, Centre de Biologie Intégrative (CBI), Université de Toulouse, CNRS, UPS, Toulouse, France.,Institut de Pharmacologie et de Biologie Structurale, IPBS, Université de Toulouse, CNRS, UPS, Toulouse, France
| | - Nawel Slama
- Laboratoire de Microbiologie et de Génétique Moléculaires, Centre de Biologie Intégrative (CBI), Université de Toulouse, CNRS, UPS, Toulouse, France
| | - Gaetano D'urso
- Institut de Génétique et Développement de Rennes (IGDR), UMR6290, Université de Rennes, CNRS, Rennes, France
| | - Sophie Chat
- Institut de Génétique et Développement de Rennes (IGDR), UMR6290, Université de Rennes, CNRS, Rennes, France
| | - Peter Redder
- Laboratoire de Microbiologie et de Génétique Moléculaires, Centre de Biologie Intégrative (CBI), Université de Toulouse, CNRS, UPS, Toulouse, France
| | - Laurent Falquet
- Department of Biology, University of Fribourg & Swiss Institute of Bioinformatics, Fribourg, Switzerland
| | - Lionel Mourey
- Institut de Pharmacologie et de Biologie Structurale, IPBS, Université de Toulouse, CNRS, UPS, Toulouse, France
| | - Reynald Gillet
- Institut de Génétique et Développement de Rennes (IGDR), UMR6290, Université de Rennes, CNRS, Rennes, France.
| | - Pierre Genevaux
- Laboratoire de Microbiologie et de Génétique Moléculaires, Centre de Biologie Intégrative (CBI), Université de Toulouse, CNRS, UPS, Toulouse, France.
| |
|
5
|
Hernandez-Beeftink T, Marcelino-Rodríguez I, Guillen-Guio B, Rodríguez-Pérez H, Lorenzo-Salazar JM, Corrales A, Díaz-de Usera A, González-Montelongo R, Domínguez D, Espinosa E, Villar J, Flores C. Admixture Mapping of Sepsis in European Individuals With African Ancestries. Front Med (Lausanne) 2022; 9:754440. [PMID: 35345767 PMCID: PMC8957104 DOI: 10.3389/fmed.2022.754440] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/06/2021] [Accepted: 01/24/2022] [Indexed: 11/30/2022] Open
Abstract
Sepsis is a severe systemic inflammatory response to infections that is accompanied by organ dysfunction. Although the ancestral genetic background is a relevant factor for sepsis susceptibility, there is a lack of studies using the genetic singularities of a recently admixed population to identify loci involved in sepsis susceptibility. Here we aimed to discover new sepsis loci by completing the first admixture mapping study of sepsis in Canary Islanders, leveraging their distinctive genetic makeup as a mixture of European and African ancestries. We used a case-control approach and inferred local ancestry blocks from genome-wide data from 113,414 polymorphisms genotyped in 343 patients with sepsis and 410 unrelated controls, all ascertained for grandparental origin in the Canary Islands (Spain). Deviations in local ancestries between cases and controls were tested using logistic regressions, followed by fine-mapping analyses based on imputed genotypes, in silico functional assessments, and gene expression analysis centered on the region of interest. The admixture mapping analysis detected that local European ancestry in a locus spanning 1.2 megabases of chromosome 8p23.1 was associated with sepsis (lowest p = 1.37 × 10⁻⁴; Odds Ratio [OR] = 0.51; 95%CI = 0.40–0.66). Fine-mapping studies prioritized the variant rs13249564, within intron 1 of the MFHAS1 gene, as associated with sepsis (p = 9.94 × 10⁻⁴; OR = 0.65; 95%CI = 0.50–0.84). Functional and gene expression analyses focused on 8p23.1 allowed us to identify alternative genes with possible biological plausibility such as defensins, which are well-known effector molecules of innate immunity. By completing the first admixture mapping study of sepsis, our results revealed a new genetic locus (8p23.1) harboring a number of genes with plausible implications in sepsis susceptibility.
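For readers less familiar with the odds ratios and 95% confidence intervals quoted above, the sketch below computes both from a simple 2×2 case-control table using the Wald interval. This is a simplification and not the study's method, which used logistic regressions on local ancestry. The counts in the usage example are made up for illustration, and the function name `odds_ratio_wald` is hypothetical.

```python
# Illustrative sketch only: odds ratio with a Wald 95% confidence interval.
import math


def odds_ratio_wald(a, b, c, d, z=1.96):
    """a, b: cases with / without the factor; c, d: controls with / without it.
    All four counts must be positive."""
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical counts, not taken from the study.
    or_, lo, hi = odds_ratio_wald(20, 80, 40, 60)
    print(f"OR = {or_:.3f}, 95% CI = {lo:.3f}-{hi:.3f}")
```

An interval that lies entirely below 1, as in the reported results, indicates that the factor is associated with lower odds of the outcome.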
Affiliation(s)
- Tamara Hernandez-Beeftink
- Research Unit, Hospital Universitario Nuestra Señora de Candelaria, Universidad de La Laguna, Santa Cruz de Tenerife, Spain.,Research Unit, Hospital Universitario de Gran Canaria Dr. Negrín, Las Palmas de Gran Canaria, Spain
| | - Itahisa Marcelino-Rodríguez
- Research Unit, Hospital Universitario Nuestra Señora de Candelaria, Universidad de La Laguna, Santa Cruz de Tenerife, Spain
| | - Beatriz Guillen-Guio
- Research Unit, Hospital Universitario Nuestra Señora de Candelaria, Universidad de La Laguna, Santa Cruz de Tenerife, Spain
| | - Héctor Rodríguez-Pérez
- Research Unit, Hospital Universitario Nuestra Señora de Candelaria, Universidad de La Laguna, Santa Cruz de Tenerife, Spain
| | - Jose M Lorenzo-Salazar
- Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), Santa Cruz de Tenerife, Spain
| | - Almudena Corrales
- Research Unit, Hospital Universitario Nuestra Señora de Candelaria, Universidad de La Laguna, Santa Cruz de Tenerife, Spain.,CIBER de Enfermedades Respiratorias, Instituto de Salud Carlos III, Madrid, Spain
| | - Ana Díaz-de Usera
- Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), Santa Cruz de Tenerife, Spain
| | | | - David Domínguez
- Department of Anesthesiology, Hospital Universitario Nuestra Señora de Candelaria, Santa Cruz de Tenerife, Spain
| | - Elena Espinosa
- Department of Anesthesiology, Hospital Universitario Nuestra Señora de Candelaria, Santa Cruz de Tenerife, Spain
| | - Jesús Villar
- Research Unit, Hospital Universitario de Gran Canaria Dr. Negrín, Las Palmas de Gran Canaria, Spain.,CIBER de Enfermedades Respiratorias, Instituto de Salud Carlos III, Madrid, Spain
| | - Carlos Flores
- Research Unit, Hospital Universitario Nuestra Señora de Candelaria, Universidad de La Laguna, Santa Cruz de Tenerife, Spain.,Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), Santa Cruz de Tenerife, Spain.,CIBER de Enfermedades Respiratorias, Instituto de Salud Carlos III, Madrid, Spain
| |
|
6
|
Ge F, Zhang Y, Xu J, Muhammad A, Song J, Yu DJ. Prediction of disease-associated nsSNPs by integrating multi-scale ResNet models with deep feature fusion. Brief Bioinform 2022; 23:bbab530. [PMID: 34953462 PMCID: PMC8769912 DOI: 10.1093/bib/bbab530] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 09/02/2021] [Revised: 11/13/2021] [Accepted: 11/16/2021] [Indexed: 11/13/2022] Open
Abstract
More than 6000 human diseases have been recorded to be caused by non-synonymous single nucleotide polymorphisms (nsSNPs). Rapid and accurate prediction of pathogenic nsSNPs can improve our understanding of disease principles and the design of new drugs, but it remains an unresolved challenge. In the present work, a new computational approach, termed MSRes-MutP, is proposed based on ResNet blocks with multi-scale kernel size to predict disease-associated nsSNPs. Feeding MSRes-MutP the serial concatenation of the four extracted feature types does not noticeably improve its performance. To address this, a second model, FFMSRes-MutP, is developed, which utilizes a deep feature fusion strategy and multi-scale 2D-ResNet and 1D-ResNet blocks to extract relevant two-dimensional features and physicochemical properties. FFMSRes-MutP with the concatenated features achieves better performance than with individual features. The performance of FFMSRes-MutP is benchmarked on five different datasets. It achieves Matthews correlation coefficients (MCC) of 0.593 and 0.618 on the PredictSNP and MMP datasets, which are 0.101 and 0.210 higher than those of the existing best method PredictSNP1. When tested on the HumDiv and HumVar datasets, it achieves MCC of 0.9605 and 0.9507, and area under curve (AUC) of 0.9796 and 0.9748, improvements of 0.1747 and 0.2669 in MCC and 0.0853 and 0.1335 in AUC, respectively, over the existing best methods PolyPhen-2 and FATHMM (weighted). In addition, on a blind test using a third-party dataset, FFMSRes-MutP performs as the second-best predictor (with MCC and AUC of 0.5215 and 0.7633, respectively) when compared with the other four predictors. Extensive benchmarking experiments demonstrate that FFMSRes-MutP achieves effective feature fusion and can be explored as a useful approach for predicting disease-associated nsSNPs. The webserver is freely available at http://csbio.njust.edu.cn/bioinf/ffmsresmutp/ for academic use.
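Because the benchmarks above are reported as Matthews correlation coefficients, a short sketch of how MCC is computed from a binary confusion matrix may help when comparing numbers across predictors. The function name `mcc` is hypothetical, and returning 0.0 when the denominator is zero is a common convention rather than something stated in the abstract.

```python
# Illustrative sketch only: Matthews correlation coefficient for binary labels.
import math


def mcc(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # convention when a row or column of the table is empty
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    print(mcc(50, 50, 0, 0))    # perfect prediction
    print(mcc(25, 25, 25, 25))  # no better than chance
```

Unlike accuracy, MCC stays informative when pathogenic and benign variants are heavily imbalanced, which is one reason it is widely used for this task.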
Affiliation(s)
- Fang Ge
- School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing, 210094, China
| | - Ying Zhang
- School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing, 210094, China
| | - Jian Xu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing, 210094, China
| | - Arif Muhammad
- School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing, 210094, China
| | - Jiangning Song
- Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia
| | - Dong-Jun Yu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing, 210094, China
| |
|
7
|
Suarez-Pajes E, Díaz-García C, Rodríguez-Pérez H, Lorenzo-Salazar JM, Marcelino-Rodríguez I, Corrales A, Zheng X, Callero A, Perez-Rodriguez E, Garcia-Robaina JC, González-Montelongo R, Flores C, Guillen-Guio B. Targeted analysis of genomic regions enriched in African ancestry reveals novel classical HLA alleles associated with asthma in Southwestern Europeans. Sci Rep 2021; 11:23686. [PMID: 34880287 PMCID: PMC8654850 DOI: 10.1038/s41598-021-02893-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 07/23/2021] [Accepted: 11/24/2021] [Indexed: 12/30/2022] Open
Abstract
Although asthma has a considerable genetic component, an important proportion of its genetic risk remains unknown, especially for non-European populations. Canary Islanders have the largest African genetic ancestry observed among Southwestern Europeans and the highest asthma prevalence in Spain. Here we examined broad chromosomal regions previously associated with an excess of African genetic ancestry in Canary Islanders, with the aim of identifying novel risk variants associated with asthma susceptibility. In a two-stage case-control study, we identified a variant within HLA-DQB1 significantly associated with asthma risk (rs1049213, meta-analysis p = 1.30 × 10⁻⁷, OR [95% CI] = 1.74 [1.41-2.13]) that had previously been associated with asthma and broad allergic phenotype. Subsequent fine-mapping analyses of classical HLA alleles revealed a novel allele significantly associated with asthma protection (HLA-DQA1*01:02, meta-analysis p = 3.98 × 10⁻⁴, OR [95% CI] = 0.64 [0.50-0.82]) that had been linked to infectious and autoimmune diseases, and peanut allergy. HLA haplotype analyses revealed a novel haplotype DQA1*01:02-DQB1*06:04 conferring asthma protection (meta-analysis p = 4.71 × 10⁻⁴, OR [95% CI] = 0.47 [0.29-0.73]).
Affiliation(s)
- Eva Suarez-Pajes
- Research Unit, Hospital Universitario Nuestra Señora de Candelaria, Universidad de La Laguna, Santa Cruz de Tenerife, Spain
| | - Claudio Díaz-García
- Research Unit, Hospital Universitario Nuestra Señora de Candelaria, Universidad de La Laguna, Santa Cruz de Tenerife, Spain
| | - Héctor Rodríguez-Pérez
- Research Unit, Hospital Universitario Nuestra Señora de Candelaria, Universidad de La Laguna, Santa Cruz de Tenerife, Spain
| | - Jose M Lorenzo-Salazar
- Genomics Division, Instituto Tecnológico Y de Energías Renovables (ITER), Santa Cruz de Tenerife, Spain
| | - Itahisa Marcelino-Rodríguez
- Research Unit, Hospital Universitario Nuestra Señora de Candelaria, Universidad de La Laguna, Santa Cruz de Tenerife, Spain
| | - Almudena Corrales
- Research Unit, Hospital Universitario Nuestra Señora de Candelaria, Universidad de La Laguna, Santa Cruz de Tenerife, Spain
- CIBER de Enfermedades Respiratorias, Instituto de Salud Carlos III, Madrid, Spain
| | - Xiuwen Zheng
- Department of Biostatistics, University of Washington, Seattle, WA, USA
| | - Ariel Callero
- Allergy Unit, Hospital Universitario N.S. de Candelaria, Santa Cruz de Tenerife, Spain
| | - Eva Perez-Rodriguez
- Allergy Unit, Hospital Universitario N.S. de Candelaria, Santa Cruz de Tenerife, Spain
| | - Jose C Garcia-Robaina
- Allergy Unit, Hospital Universitario N.S. de Candelaria, Santa Cruz de Tenerife, Spain
| | | | - Carlos Flores
- Research Unit, Hospital Universitario Nuestra Señora de Candelaria, Universidad de La Laguna, Santa Cruz de Tenerife, Spain.
- Genomics Division, Instituto Tecnológico Y de Energías Renovables (ITER), Santa Cruz de Tenerife, Spain.
- CIBER de Enfermedades Respiratorias, Instituto de Salud Carlos III, Madrid, Spain.
| | - Beatriz Guillen-Guio
- Research Unit, Hospital Universitario Nuestra Señora de Candelaria, Universidad de La Laguna, Santa Cruz de Tenerife, Spain.
- Department of Health Sciences, University of Leicester, Leicester, UK.
| |
|
8
|
Wang Y, Jiang Y, Yao B, Huang K, Liu Y, Wang Y, Qin X, Saykin AJ, Chen L. WEVar: a novel statistical learning framework for predicting noncoding regulatory variants. Brief Bioinform 2021; 22:6279833. [PMID: 34021560 DOI: 10.1093/bib/bbab189] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 02/18/2021] [Revised: 04/05/2021] [Accepted: 04/23/2021] [Indexed: 11/15/2022] Open
Abstract
Understanding the functional consequence of noncoding variants is of great interest. Though genome-wide association studies or quantitative trait locus analyses have identified variants associated with traits or molecular phenotypes, most of them are located in the noncoding regions, making the identification of causal variants a particular challenge. Existing computational approaches developed for prioritizing noncoding variants produce inconsistent and even conflicting results. To address these challenges, we propose a novel statistical learning framework, which directly integrates the precomputed functional scores from representative scoring methods. It maximizes the use of the integrated methods by automatically learning the relative contribution of each method and produces an ensemble score as the final prediction. The framework consists of two modes. The first, 'context-free' mode is trained using curated causal regulatory variants from a wide range of contexts and is applicable to predicting regulatory variants of unknown and diverse context. The second, 'context-dependent' mode further improves the prediction when the training and testing variants are from the same context. By evaluating the framework via both simulation and empirical studies, we demonstrate that it outperforms the integrated scoring methods and that the ensemble score successfully prioritizes experimentally validated regulatory variants in multiple risk loci.
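The idea of learning each method's relative contribution and emitting one ensemble score can be illustrated with a small logistic regression over precomputed scores. This sketch is not the framework's actual model, which the abstract does not specify; the training data are invented, and the names `fit_weights` and `ensemble_score` are hypothetical.

```python
# Illustrative sketch only: learn per-method weights, then combine scores.
# Assumption: each row holds precomputed scores from several methods for one variant.
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def fit_weights(score_rows, labels, lr=0.5, epochs=2000):
    """Plain batch gradient descent on the logistic loss."""
    n_methods = len(score_rows[0])
    weights = [0.0] * n_methods
    bias = 0.0
    n = len(score_rows)
    for _ in range(epochs):
        grad_w = [0.0] * n_methods
        grad_b = 0.0
        for row, y in zip(score_rows, labels):
            p = sigmoid(bias + sum(w * s for w, s in zip(weights, row)))
            err = p - y
            for j, s in enumerate(row):
                grad_w[j] += err * s
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def ensemble_score(row, weights, bias):
    """Weighted combination of method scores mapped to (0, 1)."""
    return sigmoid(bias + sum(w * s for w, s in zip(weights, row)))


if __name__ == "__main__":
    # Made-up scores: method 0 separates the classes, method 1 is mostly noise.
    rows = [[0.9, 0.4], [0.8, 0.6], [0.7, 0.5], [0.2, 0.5], [0.3, 0.4], [0.1, 0.6]]
    labels = [1, 1, 1, 0, 0, 0]
    weights, bias = fit_weights(rows, labels)
    print("weights:", weights, "bias:", bias)
    print("score for a new variant:", ensemble_score([0.85, 0.5], weights, bias))
```

In a setup like this, a method whose scores track the labels would be expected to receive a larger weight than one that does not, which is the behavior the abstract attributes to learning relative contributions.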
Affiliation(s)
- Ye Wang
- Department of Biostatistics and Health Data Science, Indiana University School of Medicine, Indianapolis, IN, 46202, USA
| | - Yuchao Jiang
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN, 46202, USA
| | - Bing Yao
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, 36849, USA
| | - Kun Huang
- Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC 27599, USA
| | - Yunlong Liu
- Department of Genetics, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Yue Wang
- Department of Human Genetics, Emory University, Atlanta, GA 30322, USA
| | - Xiao Qin
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN, 46202, USA
| | - Andrew J Saykin
- Department of Radiology and Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN, 46202, USA
| | - Li Chen
- Department of Biostatistics and Health Data Science, Indiana University School of Medicine, Indianapolis, IN, 46202, USA
| |
|
9
|
Hoffman GE, Bendl J, Girdhar K, Schadt EE, Roussos P. Functional interpretation of genetic variants using deep learning predicts impact on chromatin accessibility and histone modification. Nucleic Acids Res 2019; 47:10597-10611. [PMID: 31544924 PMCID: PMC6847046 DOI: 10.1093/nar/gkz808] [Citation(s) in RCA: 33] [Impact Index Per Article: 5.5] [Received: 11/20/2018] [Revised: 08/28/2019] [Accepted: 09/12/2019] [Indexed: 12/19/2022] Open
Abstract
Identifying functional variants underlying disease risk and adoption of personalized medicine are currently limited by the challenge of interpreting the functional consequences of genetic variants. Predicting the functional effects of disease-associated protein-coding variants is increasingly routine. Yet, the vast majority of risk variants are non-coding, and predicting the functional consequence and prioritizing variants for functional validation remains a major challenge. Here, we develop a deep learning model to accurately predict locus-specific signals from four epigenetic assays using only DNA sequence as input. Given the predicted epigenetic signal from DNA sequence for the reference and alternative alleles at a given locus, we generate a score of the predicted epigenetic consequences for 438 million variants observed in previous sequencing projects. These impact scores are assay-specific, are predictive of allele-specific transcription factor binding and are enriched for variants associated with gene expression and disease risk. Nucleotide-level functional consequence scores for non-coding variants can refine the mechanism of known functional variants, identify novel risk variants and prioritize downstream experiments.
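Scoring a variant by comparing the predicted signal for the reference and alternative alleles can be sketched in a few lines. The predictor below is a toy stand-in that counts occurrences of a motif; it is not the deep learning model from the paper, and the names `toy_signal` and `variant_delta` as well as the GATA motif are hypothetical choices for illustration.

```python
# Illustrative sketch only: variant impact as predicted(alt) - predicted(ref).


def toy_signal(seq, motif="GATA"):
    """Stand-in for a trained sequence model: counts motif occurrences."""
    seq = seq.upper()
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == motif)


def variant_delta(ref_seq, pos, alt, predict=toy_signal):
    """Substitute `alt` at 0-based `pos` and return the change in predicted signal."""
    if not 0 <= pos < len(ref_seq):
        raise IndexError("variant position outside the sequence window")
    alt_seq = ref_seq[:pos] + alt + ref_seq[pos + 1:]
    return predict(alt_seq) - predict(ref_seq)


if __name__ == "__main__":
    # The A>C change at position 3 disrupts the only GATA site in this window.
    print(variant_delta("CCGATACC", 3, "C"))
```

Swapping `toy_signal` for a real model's prediction function would keep the same structure: one forward pass per allele, then a difference.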
Affiliation(s)
- Gabriel E Hoffman
- Pamela Sklar Division of Psychiatric Genomics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Jaroslav Bendl
- Pamela Sklar Division of Psychiatric Genomics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Kiran Girdhar
- Pamela Sklar Division of Psychiatric Genomics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Eric E Schadt
- Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Sema4, Stamford, CT, USA
| | - Panos Roussos
- Pamela Sklar Division of Psychiatric Genomics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Mental Illness Research, Education, and Clinical Center (VISN 2 South), James J. Peters VA Medical Center, Bronx, NY, USA
| |
|
10
|
Yang H, Chen R, Wang Q, Wei Q, Ji Y, Zheng G, Zhong X, Cox NJ, Li B. De novo pattern discovery enables robust assessment of functional consequences of non-coding variants. Bioinformatics 2019; 35:1453-1460. [PMID: 30256891 PMCID: PMC6499232 DOI: 10.1093/bioinformatics/bty826] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 05/15/2018] [Revised: 08/17/2018] [Accepted: 09/25/2018] [Indexed: 12/13/2022] Open
Abstract
MOTIVATION: Given the complexity of genomic regions, prioritizing the functional effects of non-coding variants remains a challenge. Although several frameworks have been proposed for evaluating the functionality of non-coding variants, most of them use 'black-box' methods that reduce the task to a pathogenic/benign classification problem, which ignores the distinct regulatory mechanisms of variants and leads to less desirable performance. In this study, we developed DVAR, an unsupervised framework that leverages various biochemical and evolutionary evidence to distinguish the gene regulatory categories of variants and assess their comprehensive functional impact simultaneously. RESULTS: DVAR performed de novo pattern discovery in high-dimensional data and identified five regulatory clusters of non-coding variants. Leveraging the new insights into the multiple functional patterns, it measures both the between-class and the within-class functional implication of the variants to achieve accurate prioritization. Compared to other two-class learning methods, it showed improved performance in identification of clinically significant variants, fine-mapped GWAS variants, eQTLs and expression-modulating variants. Moreover, it has superior performance on disease causal variants verified by genome-editing (like CRISPR-Cas9), which could provide a pre-selection strategy for genome-editing technologies across the whole genome. Finally, evaluated in BioVU and UK Biobank, two large-scale DNA biobanks linked to complete electronic health records, DVAR demonstrated its effectiveness in prioritizing non-coding variants associated with medical phenotypes. AVAILABILITY AND IMPLEMENTATION: The C++ and Python source code, the pre-computed DVAR-cluster labels and DVAR-scores across the whole genome are available at https://www.vumc.org/cgg/dvar. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Hai Yang
- Department of Molecular Physiology & Biophysics, Nashville, TN, USA
- Vanderbilt Genetics Institute, Vanderbilt University, Nashville, TN, USA
| | - Rui Chen
- Department of Molecular Physiology & Biophysics, Nashville, TN, USA
- Vanderbilt Genetics Institute, Vanderbilt University, Nashville, TN, USA
| | - Quan Wang
- Department of Molecular Physiology & Biophysics, Nashville, TN, USA
- Vanderbilt Genetics Institute, Vanderbilt University, Nashville, TN, USA
| | - Qiang Wei
- Department of Molecular Physiology & Biophysics, Nashville, TN, USA
- Vanderbilt Genetics Institute, Vanderbilt University, Nashville, TN, USA
| | - Ying Ji
- Department of Molecular Physiology & Biophysics, Nashville, TN, USA
- Vanderbilt Genetics Institute, Vanderbilt University, Nashville, TN, USA
| | - Guangze Zheng
- Department of Molecular Physiology & Biophysics, Nashville, TN, USA
- Vanderbilt Genetics Institute, Vanderbilt University, Nashville, TN, USA
| | - Xue Zhong
- Vanderbilt Genetics Institute, Vanderbilt University, Nashville, TN, USA
- Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Nancy J Cox
- Vanderbilt Genetics Institute, Vanderbilt University, Nashville, TN, USA
- Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Bingshan Li
- Department of Molecular Physiology & Biophysics, Nashville, TN, USA
- Vanderbilt Genetics Institute, Vanderbilt University, Nashville, TN, USA
| |
|