1
Dwivedi A, Chauhan L, Kumar P, Nanda A, Jayakrishnan VY. Novel WAC gene variant identified in the first documented case of DeSanto-Shinawi Syndrome in India. Mol Cell Pediatr 2025; 12:7. [PMID: 40347397; PMCID: PMC12065696; DOI: 10.1186/s40348-025-00193-1] [Received: 09/12/2024] [Accepted: 03/27/2025] [Indexed: 05/12/2025]
Abstract
BACKGROUND DeSanto-Shinawi Syndrome (DESSH) is a rare neurodevelopmental disorder characterized by intellectual disability, behavioral abnormalities, and distinctive dysmorphic features, linked to likely pathogenic/pathogenic variants in the WAC gene. We report the first documented case of DESSH in India, identified in a 3-year-old male presenting with global developmental delay and coarse facies. RESULTS Exome sequencing revealed a novel heterozygous nonsense likely pathogenic variant (c.1661C>A, p.Ser554*) in the WAC gene, expanding the genotypic spectrum associated with this condition. We employed computational methodologies to understand the effects of this novel variant on protein structure and function. The in silico prediction score suggested protein truncation due to the c.1661C>A (p.Ser554*) variant in the WAC gene, which is expected to result in loss of normal protein function. CONCLUSION The findings advocate for increased awareness and genetic testing in atypical cases to facilitate accurate diagnosis and management. This case underscores the importance of considering DESSH in the differential diagnosis of similar neurodevelopmental disorders and enhances our understanding of the genetic diversity within the WAC gene.
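The nonsense call can be sanity-checked with simple codon arithmetic. The sketch below is an illustration, not the authors' pipeline: it maps the coding position c.1661 to its codon number and the position within that codon, then enumerates the six serine codons to see which would become a stop codon under a C>A change. The abstract does not state the reference codon, so the code deliberately does not assume one; the helper names are hypothetical.

```python
from typing import List, Optional, Tuple

# Serine codons and stop codons of the standard genetic code (DNA alphabet).
SERINE_CODONS = ["TCT", "TCC", "TCA", "TCG", "AGT", "AGC"]
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codon_index(c_pos: int) -> Tuple[int, int]:
    """Map a 1-based HGVS coding position (c.N) to (codon number, position in codon)."""
    codon_number = (c_pos - 1) // 3 + 1
    position_in_codon = (c_pos - 1) % 3 + 1
    return codon_number, position_in_codon


def apply_snv(codon: str, position: int, ref: str, alt: str) -> Optional[str]:
    """Replace ref with alt at a 1-based codon position; None if ref does not match."""
    if codon[position - 1] != ref:
        return None
    return codon[: position - 1] + alt + codon[position:]


def stop_producing_codons(c_pos: int, ref: str, alt: str,
                          candidates: List[str]) -> List[Tuple[str, str]]:
    """Return (reference codon, mutated codon) pairs where the SNV creates a stop."""
    _, position = codon_index(c_pos)
    hits = []
    for codon in candidates:
        mutated = apply_snv(codon, position, ref, alt)
        if mutated is not None and mutated in STOP_CODONS:
            hits.append((codon, mutated))
    return hits


if __name__ == "__main__":
    print(codon_index(1661))  # (554, 2)
    print(stop_producing_codons(1661, "C", "A", SERINE_CODONS))
    # [('TCA', 'TAA'), ('TCG', 'TAG')]
```

Only TCA and TCG carry a C at the second position and become TAA or TAG; TCT and TCC would instead give tyrosine (a missense change), and AGT/AGC have no C at that position.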
Affiliation(s)
- Aradhana Dwivedi
- Division of Clinical Genetics, Advance Centre of Pediatrics Medicine, Army Hospital Research & Referral, Delhi Cantt, New Delhi, India
- Lakshita Chauhan
- Division of Clinical Genetics, Advance Centre of Pediatrics Medicine, Army Hospital Research & Referral, Delhi Cantt, New Delhi, India
- Pramod Kumar
- Division of Clinical Genetics, Advance Centre of Pediatrics Medicine, Army Hospital Research & Referral, Delhi Cantt, New Delhi, India.
- Aashna Nanda
- Division of Clinical Genetics, Advance Centre of Pediatrics Medicine, Army Hospital Research & Referral, Delhi Cantt, New Delhi, India
2
Zhou K, Gheybi K, Soh PXY, Hayes VM. Evaluating variant pathogenicity prediction tools to establish African inclusive guidelines for germline genetic testing. Communications Medicine 2025; 5:157. [PMID: 40328947; PMCID: PMC12056225; DOI: 10.1038/s43856-025-00883-x] [Received: 10/06/2024] [Accepted: 04/24/2025] [Indexed: 05/08/2025]
Abstract
BACKGROUND Germline genetic testing is restricted for African patients. Lack of ancestrally relevant genomic data perpetuated by African diversity has resulted in European-biased curated clinical variant databases and pathogenicity prediction guidelines. While numerous variant pathogenicity prediction tools (VPPTs) exist, their performance has yet to be established within the context of African diversity. METHODS To address this limitation, we assessed 54 VPPTs for predictive performance (sensitivity, specificity, false positive and negative rates) across 145,291 known pathogenic or benign variants derived from 50 Southern African and 50 European men matched for advanced prostate cancer. Prioritising VPPTs for optimal ancestral performance, we screened 5.3 million variants of unknown significance for predicted functional and oncogenic potential. RESULTS We observe a 2.1- and 4.1-fold increase in the number of known and predicted rare pathogenic or benign variants, respectively, against a 1.6-fold decrease in the number of available interrogated variants in our European over African data. Although sensitivity was significantly lower for our African data overall (0.66 vs 0.71, p = 9.86E-06), MetaSVM, CADD, Eigen-raw, BayesDel-noAF, phyloP100way-vertebrate and MVP outperformed other tools irrespective of ancestry. Conversely, MutationTaster, DANN, LRT and GERP-RS were African-specific top performers, while MutationAssessor, PROVEAN, LIST-S2 and REVEL were European-specific top performers. Using these pathogenic prediction workflows, we narrow the ancestral gap for potentially deleterious and oncogenic variant prediction in favour of our African data by 1.15- and 1.1-fold, respectively. CONCLUSION Although VPPT sensitivity favours European data, our findings provide guidelines for VPPT selection to maximise rare pathogenic variant prediction for African disease studies.
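The four performance measures named in the Methods follow directly from a confusion matrix of known pathogenic (positive) and benign (negative) variants. A minimal sketch, assuming boolean labels where True means pathogenic; the toy labels are invented for illustration and are not data from the study.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class Confusion:
    tp: int  # pathogenic variants called pathogenic
    fn: int  # pathogenic variants called benign
    tn: int  # benign variants called benign
    fp: int  # benign variants called pathogenic


def confusion_from_calls(truth: Iterable[bool], predicted: Iterable[bool]) -> Confusion:
    """Tally tool calls against reference labels; True means pathogenic."""
    tp = fn = tn = fp = 0
    for is_pathogenic, called_pathogenic in zip(truth, predicted):
        if is_pathogenic and called_pathogenic:
            tp += 1
        elif is_pathogenic:
            fn += 1
        elif called_pathogenic:
            fp += 1
        else:
            tn += 1
    return Confusion(tp, fn, tn, fp)


def rates(c: Confusion) -> Dict[str, float]:
    sensitivity = c.tp / (c.tp + c.fn)
    specificity = c.tn / (c.tn + c.fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "false_negative_rate": 1 - sensitivity,  # FN / (TP + FN)
        "false_positive_rate": 1 - specificity,  # FP / (TN + FP)
    }


if __name__ == "__main__":
    truth = [True, True, True, False, False]
    predicted = [True, False, True, False, True]
    print(rates(confusion_from_calls(truth, predicted)))
```

Computed per tool and per ancestry group, the first of these quantities is what the abstract compares as 0.66 (African) vs 0.71 (European).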
Affiliation(s)
- Kangping Zhou
- Ancestry and Health Genomics Laboratory, Charles Perkins Centre, School of Medical Sciences, Faculty of Medicine and Health, University of Sydney, Camperdown, Sydney, NSW, Australia
- Kazzem Gheybi
- Ancestry and Health Genomics Laboratory, Charles Perkins Centre, School of Medical Sciences, Faculty of Medicine and Health, University of Sydney, Camperdown, Sydney, NSW, Australia
- Pamela X Y Soh
- Ancestry and Health Genomics Laboratory, Charles Perkins Centre, School of Medical Sciences, Faculty of Medicine and Health, University of Sydney, Camperdown, Sydney, NSW, Australia
- Vanessa M Hayes
- Ancestry and Health Genomics Laboratory, Charles Perkins Centre, School of Medical Sciences, Faculty of Medicine and Health, University of Sydney, Camperdown, Sydney, NSW, Australia.
- Manchester Cancer Research Centre, University of Manchester, Manchester, UK.
- School of Health Systems and Public Health, Faculty of Health Sciences, University of Pretoria, Pretoria, South Africa.
3
Lucas MC, Keßler T, Scharf F, Steinke-Lange V, Klink B, Laner A, Holinski-Feder E. A series of reviews in familial cancer: genetic cancer risk in context variants of uncertain significance in MMR genes: which procedures should be followed? Fam Cancer 2025; 24:42. [PMID: 40317406; DOI: 10.1007/s10689-025-00470-y] [Received: 11/25/2024] [Accepted: 04/18/2025] [Indexed: 05/07/2025]
Abstract
Interpreting variants of uncertain significance (VUS) in mismatch repair (MMR) genes remains a major challenge in managing Lynch syndrome and other hereditary cancer syndromes. This review outlines recommended VUS classification procedures, encompassing foundational and specialized methodologies tailored for MMR genes by expert organizations, including InSiGHT and ClinGen's Hereditary Colorectal Cancer/Polyposis Variant Curation Expert Panel (VCEP). Key approaches include: (1) functional data, encompassing direct assays measuring MMR proficiency such as in vitro MMR assays, deep mutational scanning, and MMR cell-based assays, as well as techniques like methylation-tolerant assays, proteomic-based approaches, and RNA sequencing, all of which provide critical functional evidence supporting variant pathogenicity; (2) computational data/tools, including in silico meta-predictors and models, which contribute to robust VUS classification when integrated with experimental evidence; and (3) enhanced variant detection through whole-genome sequencing and long-read sequencing, which can identify the actual causal variant when it is missed by traditional methods. These strategies improve diagnostic precision, support clinical decision-making for Lynch syndrome, and establish a flexible framework that can be applied to other OMIM-listed genes.
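One published way to combine computational and functional evidence into a single classification is the Bayesian adaptation of the ACMG/AMP guidelines by Tavtigian et al. (2018), which has been used within ClinGen to calibrate evidence strength. It is swapped in here purely as an illustration; the abstract does not say it is the exact procedure the review recommends, and InSiGHT also relies on gene-specific multifactorial models. The sketch assumes the published parameters (prior probability 0.10, odds of 350 for one very strong criterion, each lower tier being the square root of the one above) and models pathogenic evidence only.

```python
# Parameters from the Bayesian ACMG/AMP framework (Tavtigian et al., 2018).
PRIOR = 0.10              # prior probability that a variant is pathogenic
ODDS_VERY_STRONG = 350.0  # odds of pathogenicity for one "very strong" criterion


def odds_of_pathogenicity(supporting: int = 0, moderate: int = 0,
                          strong: int = 0, very_strong: int = 0) -> float:
    """Combine evidence counts; each tier down is the square root of the tier above."""
    exponent = supporting / 8 + moderate / 4 + strong / 2 + very_strong
    return ODDS_VERY_STRONG ** exponent


def posterior(odds: float, prior: float = PRIOR) -> float:
    """Posterior probability of pathogenicity given the combined odds."""
    return odds * prior / ((odds - 1) * prior + 1)


def classify(post: float) -> str:
    """Pathogenic-side thresholds only; benign evidence is not modelled here."""
    if post > 0.99:
        return "pathogenic"
    if post >= 0.90:
        return "likely pathogenic"
    return "VUS"


if __name__ == "__main__":
    for counts in ({"strong": 1, "moderate": 2}, {"moderate": 1, "supporting": 2}):
        p = posterior(odds_of_pathogenicity(**counts))
        print(counts, round(p, 3), classify(p))
```

With one strong and two moderate criteria the combined odds are 350 and the posterior is about 0.975 (likely pathogenic), whereas one moderate plus two supporting criteria give about 0.675 and the variant stays a VUS.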
Affiliation(s)
- Morghan C Lucas
- MGZ- Medical Genetics Center, Munich, Germany.
- Medizinische Klinik und Poliklinik IV- Campus Innenstadt, Klinikum der Universität München, Munich, Germany.
- Verena Steinke-Lange
- MGZ- Medical Genetics Center, Munich, Germany
- Medizinische Klinik und Poliklinik IV- Campus Innenstadt, Klinikum der Universität München, Munich, Germany
- Genturis European Reference Network (ERN) Genetic Tumor Risk (GENTURIS), Nijmegen, Netherlands
- Barbara Klink
- MGZ- Medical Genetics Center, Munich, Germany
- Genturis European Reference Network (ERN) Genetic Tumor Risk (GENTURIS), Nijmegen, Netherlands
- Elke Holinski-Feder
- MGZ- Medical Genetics Center, Munich, Germany
- Medizinische Klinik und Poliklinik IV- Campus Innenstadt, Klinikum der Universität München, Munich, Germany
- Genturis European Reference Network (ERN) Genetic Tumor Risk (GENTURIS), Nijmegen, Netherlands
4
Radjasandirane R, Diharce J, Gelly JC, de Brevern AG. Insights for variant clinical interpretation based on a benchmark of 65 variant effect predictors. Genomics 2025; 117:111036. [PMID: 40127826; DOI: 10.1016/j.ygeno.2025.111036] [Received: 12/05/2024] [Revised: 02/20/2025] [Accepted: 03/20/2025] [Indexed: 03/26/2025]
Abstract
Single amino acid substitutions in protein sequences are generally harmless, but a certain number of these changes can lead to disease. Accurately predicting the effect of genetic variants is crucial for clinicians as it accelerates the diagnosis of patients with missense variants associated with health problems. Many computational tools have been developed to predict the pathogenicity of genetic variants with various approaches. Analysing the performance of these different computational tools is crucial to provide guidance to both future users and especially clinicians. In this study, a large-scale investigation of 65 tools was conducted. Variants from both clinical and functional contexts were used, incorporating data from the ClinVar database and bibliographic sources. The analysis showed that AlphaMissense often performed very well and was in fact one of the best options among the existing tools. In addition, as expected, meta-predictors performed well on average. Tools using evolutionary information showed the best performance for functional variants. These results also highlighted heterogeneity in prediction difficulty: some specific variants are difficult to predict, while others are always well categorized. Strikingly, the majority of variants from the ClinVar database appear to be easy to predict, while variants from other sources of data are more challenging. This raises questions about the use of ClinVar and the datasets used to validate tool accuracy. In addition, these results show that variant predictability can be divided into three distinct classes: easy, moderate and hard to predict. We analyzed the parameters leading to these differences and showed that the classes are related to structural and functional information.
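The abstract groups variants into easy, moderate and hard to predict but does not give the criterion. A minimal sketch, assuming one plausible definition: the fraction of tools that call a variant correctly, binned with hypothetical cutoffs of 0.9 and 0.5. These cutoffs and function names are illustrative assumptions, not values or code from the paper.

```python
from typing import Sequence

# Hypothetical cutoffs; the benchmark's actual criterion is not stated in the abstract.
EASY_CUTOFF = 0.9
HARD_CUTOFF = 0.5


def fraction_correct(tool_calls: Sequence[bool], label: bool) -> float:
    """Share of tools whose call (True = pathogenic) matches the reference label."""
    return sum(call == label for call in tool_calls) / len(tool_calls)


def predictability_class(tool_calls: Sequence[bool], label: bool) -> str:
    """Bin a variant by how many tools get it right."""
    frac = fraction_correct(tool_calls, label)
    if frac >= EASY_CUTOFF:
        return "easy"
    if frac >= HARD_CUTOFF:
        return "moderate"
    return "hard"


if __name__ == "__main__":
    # Ten hypothetical tools scoring one pathogenic variant: six call it correctly.
    calls = [True] * 6 + [False] * 4
    print(predictability_class(calls, label=True))  # moderate
```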
Affiliation(s)
- Ragousandirane Radjasandirane
- Université Paris Cité and Université de la Réunion, INSERM, EFS, BIGR U1134, DSIMB Bioinformatics team, F-75015 Paris, France
- Julien Diharce
- Université Paris Cité and Université de la Réunion, INSERM, EFS, BIGR U1134, DSIMB Bioinformatics team, F-75015 Paris, France
- Jean-Christophe Gelly
- Université Paris Cité and Université de la Réunion, INSERM, EFS, BIGR U1134, DSIMB Bioinformatics team, F-75015 Paris, France
- Alexandre G de Brevern
- Université Paris Cité and Université de la Réunion, INSERM, EFS, BIGR U1134, DSIMB Bioinformatics team, F-75015 Paris, France.
5
Pilalis E, Zisis D, Andrinopoulou C, Karamanidou T, Antonara M, Stavropoulos TG, Chatziioannou A. Genome-wide functional annotation of variants: a systematic review of state-of-the-art tools, techniques and resources. Front Pharmacol 2025; 16:1474026. [PMID: 40098614; PMCID: PMC11911558; DOI: 10.3389/fphar.2025.1474026] [Received: 07/31/2024] [Accepted: 02/03/2025] [Indexed: 03/19/2025]
Abstract
The recent advancement of sequencing technologies marks a significant shift in the character and complexity of the digital genomic data universe, encompassing diverse types of molecular data, screened through manifold technological platforms. As a result, a plethora of fully assembled genomes are generated that span vertically the evolutionary scale. Notwithstanding the tsunami of thriving innovations that accomplish unprecedented, nucleotide-level, structural and functional annotation, an exhaustive, systemic, massive genome-wide functional annotation remains elusive, particularly when the criterion is automation and efficiency in data-agnostic interpretation. The latter is of paramount importance for the elaboration of strategies for sophisticated, data-driven genome-wide annotation, which aim to impart a sustainable and comprehensive systemic approach to addressing whole genome variation. Therefore, it is essential to develop methods and tools that promote systematic functional genomic annotation, with emphasis on mechanistic information exceeding the limits of coding regions, and exploiting the chunks of pertinent information residing in non-coding regions, including promoter and enhancer sequences, non-coding RNAs, DNA methylation sites, transcription factor binding sites, transposable elements and more. This review provides an overview of the current state-of-the-art in genome-wide functional annotation of genetic variation, including existing bioinformatic tools, resources, databases and platforms currently available or reported in the literature. Particular emphasis is placed on the functional annotation of variants that lie outside protein-coding genomic regions (intronic or intergenic), their potential co-localization with regulatory element areas, such as putative non-coding RNA regions, and the assessment of their functional impact on the investigated phenotype. 
In addition, state-of-the-art tools that leverage data obtained from WGS and GWAS-based analyses are discussed, along with future bioinformatics directions and developments. These future directions emphasize efficient, comprehensive, and largely automated functional annotation of both coding and non-coding genomic variants, as well as their optimal evaluation.
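Checking whether a non-coding variant co-localizes with a regulatory element reduces to an interval lookup. A minimal sketch, assuming BED-style 0-based half-open coordinates on a single chromosome and non-overlapping regions; the region coordinates and labels are hypothetical, and overlapping annotation tracks would need an interval tree or a tool such as bedtools instead.

```python
import bisect
from typing import List, Optional, Tuple

Region = Tuple[int, int, str]  # (start, end, label), half-open [start, end)


class RegionIndex:
    """Sorted, non-overlapping regions on one chromosome, queried by binary search."""

    def __init__(self, regions: List[Region]) -> None:
        self.regions = sorted(regions)
        self.starts = [start for start, _, _ in self.regions]

    def annotate(self, pos: int) -> Optional[str]:
        """Return the label of the region containing pos, or None if it is unannotated."""
        # Rightmost region whose start is <= pos is the only candidate.
        i = bisect.bisect_right(self.starts, pos) - 1
        if i >= 0:
            start, end, label = self.regions[i]
            if start <= pos < end:
                return label
        return None


if __name__ == "__main__":
    index = RegionIndex([(500, 650, "enhancer"), (100, 200, "promoter")])
    for pos in (50, 150, 200, 600):
        print(pos, index.annotate(pos))
```

Each lookup costs O(log n) after a one-time sort, which is why sorted-interval indexing is a common building block for genome-wide annotation of millions of variants.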
Affiliation(s)
- Maria Antonara
- Pfizer Center for Digital Innovation, Thessaloniki, Greece
- Aristotelis Chatziioannou
- e-NIOS Applications PC, Kallithea, Greece
- Biomedical Research Foundation of the Academy of Athens, Athens, Greece
6
Rastogi R, Chung R, Li S, Li C, Lee K, Woo J, Kim DW, Keum C, Babbi G, Martelli PL, Savojardo C, Casadio R, Chennen K, Weber T, Poch O, Ancien F, Cia G, Pucci F, Raimondi D, Vranken W, Rooman M, Marquet C, Olenyi T, Rost B, Andreoletti G, Kamandula A, Peng Y, Bakolitsa C, Mort M, Cooper DN, Bergquist T, Pejaver V, Liu X, Radivojac P, Brenner SE, Ioannidis NM. Critical assessment of missense variant effect predictors on disease-relevant variant data. Hum Genet 2025; 144:281-293. [PMID: 40113603; PMCID: PMC11976771; DOI: 10.1007/s00439-025-02732-2] [Received: 06/06/2024] [Accepted: 02/07/2025] [Indexed: 03/22/2025]
Abstract
Regular, systematic, and independent assessments of computational tools that are used to predict the pathogenicity of missense variants are necessary to evaluate their clinical and research utility and guide future improvements. The Critical Assessment of Genome Interpretation (CAGI) conducts the ongoing Annotate-All-Missense (Missense Marathon) challenge, in which missense variant effect predictors (also called variant impact predictors) are evaluated on missense variants added to disease-relevant databases following the prediction submission deadline. Here we assess predictors submitted to the CAGI 6 Annotate-All-Missense challenge, predictors commonly used in clinical genetics, and recently developed deep learning methods. We examine performance across a range of settings relevant for clinical and research applications, focusing on different subsets of the evaluation data as well as high-specificity and high-sensitivity regimes. Our evaluations reveal notable advances in current methods relative to older, well-cited tools in the field. While meta-predictors tend to outperform their constituent individual predictors, several newer individual predictors perform comparably to commonly used meta-predictors. Predictor performance varies between high-specificity and high-sensitivity regimes, highlighting that different methods may be optimal for different use cases. We also characterize two potential sources of bias. Predictors that incorporate allele frequency as a predictive feature tend to have reduced performance when distinguishing pathogenic variants from very rare benign variants, and predictors trained on pathogenicity labels from curated variant databases often inherit gene-level label imbalances. Our findings help illuminate the clinical and research utility of modern missense variant effect predictors and identify potential areas for future development.
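Evaluating a predictor in a high-specificity regime means fixing the score threshold from the benign variants first and then reading off sensitivity on the pathogenic ones. A minimal sketch, assuming higher scores mean more likely pathogenic and that a variant is called pathogenic when its score is strictly above the threshold; the scores are invented for illustration. A high-sensitivity regime is the mirror image, with the threshold set from the pathogenic scores instead.

```python
import math
from typing import Sequence


def threshold_for_specificity(benign_scores: Sequence[float], target: float) -> float:
    """Pick a threshold from the benign scores so that specificity is at least target.

    Variants are called pathogenic when score > threshold, so every benign score
    <= threshold counts as a true negative.
    """
    ordered = sorted(benign_scores)
    k = math.ceil(target * len(ordered))  # benign variants that must stay at or below
    k = max(1, min(k, len(ordered)))
    return ordered[k - 1]


def specificity_at(benign_scores: Sequence[float], threshold: float) -> float:
    return sum(s <= threshold for s in benign_scores) / len(benign_scores)


def sensitivity_at(pathogenic_scores: Sequence[float], threshold: float) -> float:
    return sum(s > threshold for s in pathogenic_scores) / len(pathogenic_scores)


if __name__ == "__main__":
    benign = [0.05, 0.1, 0.15, 0.2, 0.3, 0.35, 0.4, 0.5, 0.6, 0.7]
    pathogenic = [0.45, 0.55, 0.8, 0.9, 0.95]
    t = threshold_for_specificity(benign, target=0.8)
    print(t, specificity_at(benign, t), sensitivity_at(pathogenic, t))
```

Raising the specificity target from 0.8 to 0.9 on these toy scores moves the threshold from 0.5 to 0.6 and drops sensitivity from 0.8 to 0.6, which is the kind of regime-dependent trade-off the abstract describes.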
Affiliation(s)
- Ruchir Rastogi
- Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA, USA.
- Ryan Chung
- Center for Computational Biology, University of California, Berkeley, CA, USA
- Sindy Li
- Department of Plant and Microbial Biology, University of California, Berkeley, CA, USA
- Chang Li
- USF Genomics, College of Public Health, University of South Florida, Tampa, FL, USA
- Giulia Babbi
- Bologna Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Pier Luigi Martelli
- Bologna Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Castrense Savojardo
- Bologna Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Rita Casadio
- Bologna Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- François Ancien
- Computational Biology and Bioinformatics, Université Libre de Bruxelles, Brussels, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, Brussels, Belgium
- Gabriel Cia
- Computational Biology and Bioinformatics, Université Libre de Bruxelles, Brussels, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, Brussels, Belgium
- Fabrizio Pucci
- Computational Biology and Bioinformatics, Université Libre de Bruxelles, Brussels, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, Brussels, Belgium
- Daniele Raimondi
- ESAT-STADIUS, KU Leuven, Leuven, Belgium
- Institut de Génétique Moléculaire de Montpellier, Université de Montpellier, Montpellier, France
- Wim Vranken
- Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, Brussels, Belgium
- Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, Belgium
- Marianne Rooman
- Computational Biology and Bioinformatics, Université Libre de Bruxelles, Brussels, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, Brussels, Belgium
- Céline Marquet
- Department of Informatics, Bioinformatics and Computational Biology, Technical University of Munich, Munich, Germany
- Tobias Olenyi
- Department of Informatics, Bioinformatics and Computational Biology, Technical University of Munich, Munich, Germany
- Burkhard Rost
- Department of Informatics, Bioinformatics and Computational Biology, Technical University of Munich, Munich, Germany
- Gaia Andreoletti
- Department of Plant and Microbial Biology, University of California, Berkeley, CA, USA
- Sage Bionetworks, Seattle, WA, USA
- Akash Kamandula
- Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA
- Yisu Peng
- Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA
- Constantina Bakolitsa
- Department of Plant and Microbial Biology, University of California, Berkeley, CA, USA
- Matthew Mort
- Institute of Medical Genetics, School of Medicine, Cardiff University, Cardiff, UK
- David N Cooper
- Institute of Medical Genetics, School of Medicine, Cardiff University, Cardiff, UK
- Timothy Bergquist
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Vikas Pejaver
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Xiaoming Liu
- USF Genomics, College of Public Health, University of South Florida, Tampa, FL, USA
- Predrag Radivojac
- Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA
- Steven E Brenner
- Center for Computational Biology, University of California, Berkeley, CA, USA.
- Department of Plant and Microbial Biology, University of California, Berkeley, CA, USA.
- Nilah M Ioannidis
- Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA, USA.
- Center for Computational Biology, University of California, Berkeley, CA, USA.
- Chan Zuckerberg Biohub, San Francisco, CA, USA.
7
Zhao W, Tao Y, Xiong J, Liu L, Wang Z, Shao C, Shang L, Hu Y, Xu Y, Su Y, Yu J, Feng T, Xie J, Xu H, Zhang Z, Peng J, Wu J, Zhang Y, Zhu S, Xia K, Tang B, Zhao G, Li J, Li B. GoFCards: an integrated database and analytic platform for gain of function variants in humans. Nucleic Acids Res 2025; 53:D976-D988. [PMID: 39578693; PMCID: PMC11701611; DOI: 10.1093/nar/gkae1079] [Received: 08/15/2024] [Revised: 10/20/2024] [Accepted: 10/28/2024] [Indexed: 11/24/2024]
Abstract
Gain-of-function (GOF) variants, which introduce new or amplify protein functions, are essential for understanding disease mechanisms. Despite advances in genomics and functional research, identifying and analyzing pathogenic GOF variants remains challenging owing to fragmented data and database limitations, underscoring the difficulty in accessing critical genetic information. To address this challenge, we manually reviewed the literature, pinpointing 3089 single-nucleotide variants and 72 insertions and deletions in 579 genes associated with 1299 diseases from 2069 studies, and integrated these with the 3.5 million predicted GOF variants. Our approach is complemented by a proprietary scoring system that prioritizes GOF variants on the basis of the evidence supporting their GOF effects and provides predictive scores for variants that lack existing documentation. We then developed a database named GoFCards for general geneticists and clinicians to easily obtain GOF variants in humans (http://www.genemed.tech/gofcards). This database also contains data from >150 sources and offers comprehensive variant-level and gene-level annotations, with the aim of providing users with convenient access to detailed and relevant genetic information. Furthermore, GoFCards empowers users with limited bioinformatic skills to analyze and annotate genetic data, and prioritize GOF variants. GoFCards offers an efficient platform for interpreting GOF variants and thereby advancing genetic research.
Affiliation(s)
- Wenjing Zhao
- National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital & Center for Medical Genetics, School of Life Sciences, Central South University, No. 87 Xiangya Road, Furong District, Changsha, Hunan 410008, China
- Department of Medical Genetics, NHC Key Laboratory of Healthy Birth and Birth Defect Prevention in Western China, The First People's Hospital of Yunnan Province, No. 157 Jinbi Road, Xishan District, Kunming, Yunnan 650000, China
- School of Medicine, Kunming University of Science and Technology, No. 727 Jingming South Road, Chenggong District, Kunming, Yunnan 650000, China
- Youfu Tao
- Xiangya School of Medicine, Central South University, No. 172 Tongzipo Road, Yuelu District, Changsha, Hunan 410008, China
- Jiayi Xiong
- National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital & Center for Medical Genetics, School of Life Sciences, Central South University, No. 87 Xiangya Road, Furong District, Changsha, Hunan 410008, China
- Lei Liu
- School of Life Science, Central South University, No. 172 Tongzipo Road, Yuelu District, Changsha, Hunan 410008, China
- Zhongqing Wang
- School of Medicine, Kunming University of Science and Technology, No. 727 Jingming South Road, Chenggong District, Kunming, Yunnan 650000, China
- Chuhan Shao
- Xiangya School of Medicine, Central South University, No. 172 Tongzipo Road, Yuelu District, Changsha, Hunan 410008, China
- Ling Shang
- Xiangya School of Medicine, Central South University, No. 172 Tongzipo Road, Yuelu District, Changsha, Hunan 410008, China
- Yue Hu
- Xiangya School of Medicine, Central South University, No. 172 Tongzipo Road, Yuelu District, Changsha, Hunan 410008, China
- Yishu Xu
- Xiangya School of Medicine, Central South University, No. 172 Tongzipo Road, Yuelu District, Changsha, Hunan 410008, China
- Yingluo Su
- Xiangya School of Medicine, Central South University, No. 172 Tongzipo Road, Yuelu District, Changsha, Hunan 410008, China
- Jiahui Yu
- Xiangya School of Medicine, Central South University, No. 172 Tongzipo Road, Yuelu District, Changsha, Hunan 410008, China
- Tianyi Feng
- Xiangya School of Medicine, Central South University, No. 172 Tongzipo Road, Yuelu District, Changsha, Hunan 410008, China
- Junyi Xie
- School of Life Science, Central South University, No. 172 Tongzipo Road, Yuelu District, Changsha, Hunan 410008, China
- Huijuan Xu
- School of Life Science, Central South University, No. 172 Tongzipo Road, Yuelu District, Changsha, Hunan 410008, China
- Zijun Zhang
- School of Life Science, Central South University, No. 172 Tongzipo Road, Yuelu District, Changsha, Hunan 410008, China
- Jiayi Peng
- School of Life Science, Central South University, No. 172 Tongzipo Road, Yuelu District, Changsha, Hunan 410008, China
- Jianbin Wu
- School of Life Science, Central South University, No. 172 Tongzipo Road, Yuelu District, Changsha, Hunan 410008, China
- Yuchang Zhang
- School of Life Science, Central South University, No. 172 Tongzipo Road, Yuelu District, Changsha, Hunan 410008, China
- Shaobo Zhu
- School of Life Science, Central South University, No. 172 Tongzipo Road, Yuelu District, Changsha, Hunan 410008, China
- Kun Xia
- MOE Key Laboratory of Pediatric Rare Diseases & Hunan Key Laboratory of Medical Genetics, Central South University, No. 110 Xiangya Road, Furong District, Changsha, Hunan 410008, China
- Beisha Tang
- National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital & Center for Medical Genetics, School of Life Sciences, Central South University, No. 87 Xiangya Road, Furong District, Changsha, Hunan 410008, China
- Department of Neurology & Multi-omics Research Center for Brain Disorders, The First Affiliated Hospital University of South China, 69 Chuan Shan Road, Shi Gu District, Hengyang, Hunan 421000, China
- Key Laboratory of Hunan Province in Neurodegenerative Disorders, Department of Neurology, Xiangya Hospital, Central South University, No. 87 Xiangya Road, Furong District, Changsha, Hunan 410008, China
- Guihu Zhao
- National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital & Center for Medical Genetics, School of Life Sciences, Central South University, No. 87 Xiangya Road, Furong District, Changsha, Hunan 410008, China
- Jinchen Li
- National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital & Center for Medical Genetics, School of Life Sciences, Central South University, No. 87 Xiangya Road, Furong District, Changsha, Hunan 410008, China
- Key Laboratory of Hunan Province in Neurodegenerative Disorders, Department of Neurology, Xiangya Hospital, Central South University, No. 87 Xiangya Road, Furong District, Changsha, Hunan 410008, China
- Bioinformatics Center, Furong Laboratory & Xiangya Hospital, Central South University, No. 87 Xiangya Road, Furong District, Changsha, Hunan 410008, China
- Bin Li
- National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital & Center for Medical Genetics, School of Life Sciences, Central South University, No. 87 Xiangya Road, Furong District, Changsha, Hunan 410008, China
8
Katsonis P, Lichtarge O. Meta-EA: a gene-specific combination of available computational tools for predicting missense variant effects. Nat Commun 2025; 16:159. [PMID: 39746940; PMCID: PMC11696468; DOI: 10.1038/s41467-024-55066-4] [Received: 12/14/2023] [Accepted: 11/27/2024] [Indexed: 01/04/2025]
Abstract
Computational methods for estimating missense variant impact suffer from inconsistent performance across genes, which poses a major challenge for their reliable use in clinical practice. While ensemble scores leverage multiple prediction methods to enhance consistency, the overrepresentation of certain genes in the training data can bias their outcomes. To address this critical limitation, we propose a gene-specific ensemble framework trained on reference computational annotations rather than on clinical or experimental data. Accordingly, we generate Meta-EA ensemble scores that achieve comparable performance to the top individual predicting method for each gene set. Incorporating the effects of splicing and the allele frequency of human polymorphisms further enhances the performance of Meta-EA, achieving an area under the receiver operating characteristic curve of 0.97 for both gene-balanced and imbalanced clinical assessments. In conclusion, this work leverages the wealth of existing variant impact prediction approaches to generate improved estimations for clinical interpretation.
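The reported area under the ROC curve can be read as the probability that a randomly chosen pathogenic variant scores higher than a randomly chosen benign one. A minimal sketch of that pairwise (Mann-Whitney) definition, assuming higher scores mean more damaging and counting ties as one half; it is quadratic in the number of variants and shows the definition only. Meta-EA's gene-specific ensemble itself is not implemented here, and the scores are invented.

```python
from typing import Sequence


def roc_auc(pathogenic_scores: Sequence[float], benign_scores: Sequence[float]) -> float:
    """AUC as P(pathogenic score > benign score), with ties counted as 1/2."""
    wins = 0.0
    for p in pathogenic_scores:
        for b in benign_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(pathogenic_scores) * len(benign_scores))


if __name__ == "__main__":
    pathogenic = [0.9, 0.8, 0.4]
    benign = [0.1, 0.4, 0.3]
    print(roc_auc(pathogenic, benign))  # 8.5 / 9, about 0.944
```

Because the statistic only depends on score ranks, it is comparable across tools with different score scales, which is what allows an ensemble and its constituent predictors to be compared on one AUC axis.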
Affiliation(s)
- Panagiotis Katsonis
- Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA.
- Olivier Lichtarge
- Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA.
- Department of Biochemistry & Molecular Biology, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA.
- Department of Pharmacology, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA.
- Computational and Integrative Biomedical Research Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA.
9
Wang X, Zhang M, Yang X, Yu DJ, Ge F. GPTrans: A Biological Language Model-Based Approach for Predicting Disease-Associated Mutations in G Protein-Coupled Receptors. J Chem Inf Model 2024; 64:9626-9642. [PMID: 39610143; DOI: 10.1021/acs.jcim.4c01999] [Indexed: 11/30/2024]
Abstract
Accurately predicting mutations in G protein-coupled receptors (GPCRs) is critical for advancing disease diagnosis and drug discovery. In response to this imperative, GPTrans has emerged as a highly accurate predictor of disease-related mutations in GPCRs. The core innovation of GPTrans resides in the design of a novel feature extraction network that integrates features from both wildtype and mutant protein variant sites, utilizing multifeature connections within a transformer framework to ensure comprehensive feature extraction. A key aspect of GPTrans's effectiveness is our introduction of an innovative deep feature integration strategy, which merges embeddings and class tokens from multiple protein language models, including evolutionary scale modeling and ProtTrans, thus shedding light on the biochemical properties of proteins. Leveraging transformer components and a self-attention mechanism, GPTrans captures higher-level representations of protein features. Employing both wildtype and mutation site information for feature fusion not only enriches the predictive feature set but also avoids the common issue of overestimation associated with sequence-based predictions. This approach distinguishes GPTrans, enabling it to significantly outperform existing methods. Our evaluations across diverse GPCR data sets, including ClinVar and MutHTP, demonstrate GPTrans's superior performance, with average AUC values of 0.874 and 0.590, respectively, in 10-fold cross-validation. Notably, compared to the AlphaMissense method, GPTrans exhibited a remarkable 38.03% improvement in accuracy when predicting disease-associated mutations in the MutHTP data set. A thorough analysis of the predicted results further validates the model's effectiveness. The source code, data sets, and prediction results for GPTrans are available for academic use at https://github.com/EduardWang/GPTrans.
Affiliation(s)
- Xiaohua Wang
- School of Computer, Jiangsu University of Science and Technology, 666 Changhui Road, Zhenjiang 212100, China
- Ming Zhang
- School of Computer, Jiangsu University of Science and Technology, 666 Changhui Road, Zhenjiang 212100, China
- Xibei Yang
- School of Computer, Jiangsu University of Science and Technology, 666 Changhui Road, Zhenjiang 212100, China
- Dong-Jun Yu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing 210094, China
- Fang Ge
- State Key Laboratory of Organic Electronics and Information Displays & Institute of Advanced Materials (IAM), Nanjing University of Posts & Telecommunications, 9 Wenyuan Road, Nanjing 210023, China
10
Huang K, Zeng T, Koc S, Pettet A, Zhou J, Jain M, Sun D, Ruiz C, Ren H, Howe L, Richardson TG, Cortes A, Aiello K, Branson K, Pfenning A, Engreitz JM, Zhang MJ, Leskovec J. Small-cohort GWAS discovery with AI over massive functional genomics knowledge graph. medRxiv 2024:2024.12.03.24318375. [PMID: 39677475 PMCID: PMC11643201 DOI: 10.1101/2024.12.03.24318375] [Indexed: 12/17/2024]
Abstract
Genome-wide association studies (GWASs) have identified tens of thousands of disease-associated variants and provided critical insights into developing effective treatments. However, limited sample sizes have hindered the discovery of variants for uncommon and rare diseases. Here, we introduce KGWAS, a novel geometric deep learning method that leverages a massive functional knowledge graph across variants and genes to significantly improve detection power in small-cohort GWASs. KGWAS assesses the strength of a variant's association with disease based on the aggregate GWAS evidence across molecular elements interacting with the variant within the knowledge graph. Comprehensive simulations and replication experiments showed that, for small sample sizes (N = 1-10K), KGWAS identified up to 100% more statistically significant associations than state-of-the-art GWAS methods and achieved the same statistical power with up to 2.67× fewer samples. We applied KGWAS to 554 uncommon UK Biobank diseases (N_case < 5K) and identified 183 more associations (46.9% improvement) than the original GWAS, with the gain increasing further to 79.8% for 141 rare diseases (N_case < 300). The KGWAS-only discoveries are supported by abundant functional evidence, such as rs2155219 (on 11q13), associated with ulcerative colitis potentially via regulating LRRC32 expression in CD4+ regulatory T cells, and rs7312765 (on 12q12), associated with the rare disease myasthenia gravis potentially via regulating PPHLN1 expression in neuron-related cell types. Furthermore, KGWAS consistently improves downstream analyses such as identifying disease-specific network links for interpreting GWAS variants, disease-associated genes, and disease-relevant cell populations. Overall, KGWAS is a flexible and powerful AI model that integrates growing functional genomics data to discover novel variants, genes, cells, and networks, and is especially valuable for small-cohort diseases.
11
Jayasinghe D, Eshetie S, Beckmann K, Benyamin B, Lee SH. Advancements and limitations in polygenic risk score methods for genomic prediction: a scoping review. Hum Genet 2024; 143:1401-1431. [PMID: 39542907 DOI: 10.1007/s00439-024-02716-8] [Received: 05/22/2024] [Accepted: 10/31/2024] [Indexed: 11/17/2024]
Abstract
This scoping review aims to identify and evaluate the landscape of Polygenic Risk Score (PRS)-based methods for genomic prediction from 2013 to 2023, highlighting their advancements, key concepts, and existing gaps in knowledge, research, and technology. Over the past decade, various PRS-based methods have emerged, each employing a different statistical framework aimed at enhancing prediction accuracy, processing speed, and memory efficiency. Despite notable advancements, challenges persist, including unrealistic assumptions regarding sample sizes and the polygenicity of traits necessary for accurate predictions, as well as limitations in exploring hyper-parameter spaces and considering environmental interactions. We included studies focusing on PRS-based methods for risk prediction that underwent methodological evaluations using valid approaches and released computational tools/software. We further restricted our selection to studies involving human participants that were published in English. This review followed the standard protocol recommended by the Joanna Briggs Institute Reviewer's Manual, systematically searching the Ovid MEDLINE, Ovid Embase, Scopus, and Web of Science databases. Searches also covered grey literature, such as preprint servers (e.g., bioRxiv), and articles recommended by experts to ensure comprehensive and diverse coverage of relevant records. This study identified 34 studies detailing 37 genomic prediction methods, the majority of which rely on linkage disequilibrium (LD) information and necessitate hyper-parameter tuning. Nine methods integrate functional/gene annotation, while 12 are suitable for cross-ancestry genomic prediction, with only one considering gene-environment (GxE) interaction. While some methods require individual-level data, most leverage summary statistics, offering flexibility. Despite progress, challenges remain. These include computational complexity and the need for large sample sizes for high prediction accuracy. Furthermore, recent methods exhibit varying effectiveness across traits, with absolute accuracies often falling short of clinical utility. Transferability across ancestries varies, influenced by trait heritability and the diversity of training data, while handling admixed populations remains challenging. Additionally, the absence of standard error measurements for individual PRSs, crucial in clinical settings, underscores a critical gap. Another issue is the lack of customizable graphical visualization tools among current software packages. While genomic prediction methods have advanced significantly, there is still room for improvement. Addressing current challenges and embracing future research directions will lead to the development of more universally applicable, robust, and clinically relevant genomic prediction tools.
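The reviewed methods differ mainly in how they estimate per-variant weights, for example by using LD information and tuned hyper-parameters. Once the weights are estimated, the score itself is commonly computed as an additive weighted sum of effect-allele dosages. The sketch below covers only that final scoring step. It assumes the weights are already available, and the variant IDs are hypothetical:

```python
from typing import Mapping


def polygenic_score(dosages: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Additive polygenic score: sum of per-variant weight * effect-allele dosage.

    `dosages` maps variant IDs to the individual's effect-allele count (0-2);
    `weights` maps variant IDs to per-allele effect sizes. Variants absent
    from `dosages` are skipped (a simplifying assumption; real pipelines
    often impute or mean-fill missing genotypes instead).
    """
    score = 0.0
    for variant_id, beta in weights.items():
        dosage = dosages.get(variant_id)
        if dosage is None:
            continue
        if not 0.0 <= dosage <= 2.0:
            raise ValueError(f"dosage for {variant_id} must lie in [0, 2]")
        score += beta * dosage
    return score


# Hypothetical variant IDs and weights, for illustration only.
weights = {"rsA": 0.2, "rsB": -0.1, "rsC": 0.05}
person = {"rsA": 2, "rsB": 1}  # rsC not genotyped, so it is skipped
print(round(polygenic_score(person, weights), 4))  # 0.3
```

Note that this sketch returns a point estimate with no standard error, which corresponds to the per-individual uncertainty gap the review highlights.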
Affiliation(s)
- Dovini Jayasinghe
- Australian Centre for Precision Health, University of South Australia, Adelaide, SA, 5000, Australia.
- UniSA Allied Health and Human Performance, University of South Australia, Adelaide, SA, 5000, Australia.
- South Australian Health and Medical Research Institute (SAHMRI), University of South Australia, Adelaide, SA, 5000, Australia.
- Setegn Eshetie
- Australian Centre for Precision Health, University of South Australia, Adelaide, SA, 5000, Australia
- UniSA Allied Health and Human Performance, University of South Australia, Adelaide, SA, 5000, Australia
- South Australian Health and Medical Research Institute (SAHMRI), University of South Australia, Adelaide, SA, 5000, Australia
- College of Medicine and Health Sciences, University of Gondar, Gondar, Ethiopia
- Kerri Beckmann
- UniSA Allied Health and Human Performance, University of South Australia, Adelaide, SA, 5000, Australia
- Beben Benyamin
- Australian Centre for Precision Health, University of South Australia, Adelaide, SA, 5000, Australia
- UniSA Allied Health and Human Performance, University of South Australia, Adelaide, SA, 5000, Australia
- South Australian Health and Medical Research Institute (SAHMRI), University of South Australia, Adelaide, SA, 5000, Australia
- S Hong Lee
- Australian Centre for Precision Health, University of South Australia, Adelaide, SA, 5000, Australia
- UniSA Allied Health and Human Performance, University of South Australia, Adelaide, SA, 5000, Australia
- South Australian Health and Medical Research Institute (SAHMRI), University of South Australia, Adelaide, SA, 5000, Australia
12
Lu S, Liu K, Wang D, Ye Y, Jiang Z, Gao Y. Genomic structural variants analysis in leukemia by a novel cytogenetic technique: Optical genome mapping. Cancer Sci 2024; 115:3543-3551. [PMID: 39180374 PMCID: PMC11531954 DOI: 10.1111/cas.16325] [Received: 07/02/2024] [Revised: 08/05/2024] [Accepted: 08/12/2024] [Indexed: 08/26/2024]
Abstract
Genomic structural variants (SVs) play a pivotal role in driving the evolution of hematologic malignancies, particularly in leukemia, in which genetic abnormalities are crucial features. Detecting SVs is essential for achieving precise diagnosis and prognosis in these cases. Karyotyping, often complemented by fluorescence in situ hybridization and/or chromosomal microarray analysis, provides standard diagnostic results for various types of SVs in front-line testing for leukemia. Recently, optical genome mapping (OGM) has emerged as a promising technique due to its ability to detect all SVs identified by other cytogenetic methods within a single assay. Furthermore, OGM has revealed additional clinically significant SVs in various clinical laboratories, underscoring its considerable potential for enhancing front-line testing in cases of leukemia. This review aims to elucidate the principles of conventional cytogenetic techniques and OGM, with a focus on the technical performance of OGM and its applications in diagnosing and prognosticating myelodysplastic syndromes, acute myeloid leukemia, acute lymphoblastic leukemia, and chronic lymphocytic leukemia.
Affiliation(s)
- Song Lu
- Center for Advanced Measurement Science, National Institute of Metrology, Beijing, China
- Kefu Liu
- MOE Key Laboratory of Rare Pediatric Diseases & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha, Hunan, China
- Di Wang
- Center for Advanced Measurement Science, National Institute of Metrology, Beijing, China
- Yuan Ye
- College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, China
- Zhiping Jiang
- Department of Hematology, Xiangya Hospital, Central South University, Changsha, Hunan, China
- National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, Hunan, China
- Hunan Hematology Oncology Clinical Medical Research Center, Changsha, Hunan, China
- Yunhua Gao
- Center for Advanced Measurement Science, National Institute of Metrology, Beijing, China
13
Ahmad RM, Ali BR, Al-Jasmi F, Al Dhaheri N, Al Turki S, Kizhakkedath P, Mohamad MS. AI-derived comparative assessment of the performance of pathogenicity prediction tools on missense variants of breast cancer genes. Hum Genomics 2024; 18:99. [PMID: 39256852 PMCID: PMC11389290 DOI: 10.1186/s40246-024-00667-9] [Received: 03/18/2024] [Accepted: 08/22/2024] [Indexed: 09/12/2024]
Abstract
Single nucleotide variants (SNVs) can exert substantial and highly variable effects on various cellular functions, making accurate prediction of their consequences challenging yet crucial, especially in clinical settings such as oncology. Laboratory-based experimental methods for assessing these effects are time-consuming and often impractical, highlighting the importance of in-silico tools for variant impact prediction. However, the performance of currently available tools on breast cancer missense variants from benchmarking databases has not been thoroughly investigated, leaving a knowledge gap in the accurate prediction of pathogenicity. In this study, the benchmarking datasets ClinVar and HGMD were used to evaluate 21 Artificial Intelligence (AI)-derived in-silico tools. Missense variants in breast cancer genes were extracted from ClinVar and HGMD Professional v2023.1. Because the HGMD dataset contains only pathogenic variants, benign variants for the same genes were added from ClinVar to balance it. Interestingly, our analysis of both datasets revealed variants across genes with low and moderate as well as high penetrance, reinforcing the value of disease-specific tools. The top-performing tools on the ClinVar dataset were MutPred (Accuracy = 0.73), MetaRNN (Accuracy = 0.72), ClinPred (Accuracy = 0.71), and Meta-SVM, REVEL, and Fathmm-XF (Accuracy = 0.70 each). On the HGMD dataset, they were ClinPred (Accuracy = 0.72), MetaRNN (Accuracy = 0.71), CADD (Accuracy = 0.69), Fathmm-MKL (Accuracy = 0.68), and Fathmm-XF (Accuracy = 0.67). These findings offer clinicians and researchers valuable insights for selecting, improving, and developing effective in-silico tools for breast cancer pathogenicity prediction. Bridging this knowledge gap contributes to advancing precision medicine and enhancing diagnostic and therapeutic approaches for breast cancer patients, with potential implications for other conditions.
Affiliation(s)
- Rahaf M Ahmad
- Health Data Science Lab, Department of Genetics and Genomics, College of Medical and Health Sciences, United Arab Emirates University, Tawam road, Al Maqam district, Al Ain, Abu Dhabi, United Arab Emirates
- Bassam R Ali
- Department of Genetics and Genomics, College of Medical and Health Sciences, United Arab Emirates University, Tawam road, Al Maqam district, Al Ain, Abu Dhabi, United Arab Emirates
- Fatma Al-Jasmi
- Health Data Science Lab, Department of Genetics and Genomics, College of Medical and Health Sciences, United Arab Emirates University, Tawam road, Al Maqam district, Al Ain, Abu Dhabi, United Arab Emirates
- Division of Metabolic Genetics, Department of Pediatrics, Tawam Hospital, Al Ain, United Arab Emirates
- Noura Al Dhaheri
- Health Data Science Lab, Department of Genetics and Genomics, College of Medical and Health Sciences, United Arab Emirates University, Tawam road, Al Maqam district, Al Ain, Abu Dhabi, United Arab Emirates
- Division of Metabolic Genetics, Department of Pediatrics, Tawam Hospital, Al Ain, United Arab Emirates
- Saeed Al Turki
- Health Data Science Lab, Department of Genetics and Genomics, College of Medical and Health Sciences, United Arab Emirates University, Tawam road, Al Maqam district, Al Ain, Abu Dhabi, United Arab Emirates
- Praseetha Kizhakkedath
- Department of Genetics and Genomics, College of Medical and Health Sciences, United Arab Emirates University, Tawam road, Al Maqam district, Al Ain, Abu Dhabi, United Arab Emirates
- Mohd Saberi Mohamad
- Health Data Science Lab, Department of Genetics and Genomics, College of Medical and Health Sciences, United Arab Emirates University, Tawam road, Al Maqam district, Al Ain, Abu Dhabi, United Arab Emirates.
- Center for Engineering Computational Intelligence, Faculty of Engineering and Technology, Multimedia University, Melaka, Malaysia.
14
Liu J, Chen Y, Huang K, Guan X. Enhancing Missense Variant Pathogenicity Prediction with MissenseNet: Integrating Structural Insights and ShuffleNet-Based Deep Learning Techniques. Biomolecules 2024; 14:1105. [PMID: 39334871 PMCID: PMC11429773 DOI: 10.3390/biom14091105] [Received: 06/24/2024] [Revised: 07/17/2024] [Accepted: 07/22/2024] [Indexed: 09/30/2024]
Abstract
The classification of missense variant pathogenicity continues to pose significant challenges in human genetics, necessitating precise predictions of functional impacts for effective disease diagnosis and personalized treatment strategies. Traditional methods, often compromised by suboptimal feature selection and limited generalizability, are outpaced by the enhanced classification model MissenseNet (Missense Classification Network). This model, advancing beyond standard predictive features, incorporates structural insights from AlphaFold2 protein predictions, thus optimizing structural data utilization. MissenseNet, built on the ShuffleNet architecture, incorporates an encoder-decoder framework and a Squeeze-and-Excitation (SE) module designed to adaptively adjust channel weights and enhance feature fusion and interaction. The model's efficacy in classifying pathogenicity has been validated through superior accuracy compared to conventional methods and by achieving the highest area under the Receiver Operating Characteristic (ROC) curve (AUC) and area under the Precision-Recall (PR) curve (AUPRC) on an independent test set, thus underscoring its superiority.
Affiliation(s)
- Jing Liu
- College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
- Yingying Chen
- College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
- Kai Huang
- School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
- National Grain Industry (Urban Grain and Oil Security) Technology Innovation Center, Shanghai 200093, China
- Xiao Guan
- School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
- National Grain Industry (Urban Grain and Oil Security) Technology Innovation Center, Shanghai 200093, China
15
Lin YJ, Menon AS, Hu Z, Brenner SE. Variant Impact Predictor database (VIPdb), version 2: trends from three decades of genetic variant impact predictors. Hum Genomics 2024; 18:90. [PMID: 39198917 PMCID: PMC11360829 DOI: 10.1186/s40246-024-00663-z] [Received: 06/22/2024] [Accepted: 08/19/2024] [Indexed: 09/01/2024]
Abstract
BACKGROUND Variant interpretation is essential for identifying patients' disease-causing genetic variants amongst the millions detected in their genomes. Hundreds of Variant Impact Predictors (VIPs), also known as Variant Effect Predictors (VEPs), have been developed for this purpose, with a variety of methodologies and goals. To facilitate the exploration of available VIP options, we have created the Variant Impact Predictor database (VIPdb). RESULTS The Variant Impact Predictor database (VIPdb) version 2 presents a collection of VIPs developed over the past three decades, summarizing their characteristics, ClinGen calibrated scores, CAGI assessment results, publication details, access information, and citation patterns. We previously summarized 217 VIPs and their features in VIPdb in 2019. Building upon this foundation, we identified and categorized an additional 190 VIPs, resulting in a total of 407 VIPs in VIPdb version 2. The majority of the VIPs have the capacity to predict the impacts of single nucleotide variants and nonsynonymous variants. More VIPs tailored to predict the impacts of insertions and deletions have been developed since the 2010s. In contrast, relatively few VIPs are dedicated to the prediction of splicing, structural, synonymous, and regulatory variants. The increasing rate of citations to VIPs reflects the ongoing growth in their use, and the evolving trends in citations reveal development in the field and individual methods. CONCLUSIONS VIPdb version 2 summarizes 407 VIPs and their features, potentially facilitating VIP exploration for various variant interpretation applications. VIPdb is available at https://genomeinterpretation.org/vipdb.
Affiliation(s)
- Yu-Jen Lin
- Department of Molecular and Cell Biology, University of California, Berkeley, CA, 94720, USA
- Center for Computational Biology, University of California, Berkeley, CA, 94720, USA
- Arul S Menon
- Department of Molecular and Cell Biology, University of California, Berkeley, CA, 94720, USA
- College of Computing, Data Science, and Society, University of California, Berkeley, CA, 94720, USA
- Zhiqiang Hu
- Department of Plant and Microbial Biology, University of California, 111 Koshland Hall #3102, Berkeley, CA, 94720-3102, USA
- Illumina, Foster City, CA, 94404, USA
- Steven E Brenner
- Department of Molecular and Cell Biology, University of California, Berkeley, CA, 94720, USA.
- Center for Computational Biology, University of California, Berkeley, CA, 94720, USA.
- College of Computing, Data Science, and Society, University of California, Berkeley, CA, 94720, USA.
- Department of Plant and Microbial Biology, University of California, 111 Koshland Hall #3102, Berkeley, CA, 94720-3102, USA.
16
Abu-Amara H, Zhao W, Li Z, Leung YY, Schellenberg GD, Wang LS, Moorjani P, Dey AB, Dey S, Zhou X, Gross AL, Lee J, Kardia SLR, Smith JA. Region-based analysis with functional annotation identifies genes associated with cognitive function in South Asians from India. Research Square 2024:rs.3.rs-4712660. [PMID: 39149469 PMCID: PMC11326367 DOI: 10.21203/rs.3.rs-4712660/v1] [Indexed: 08/17/2024]
Abstract
The prevalence of dementia among South Asians across India is approximately 7.4% in those aged 60 years and older, yet little is known about genetic risk factors for dementia in this population. Most known risk loci for Alzheimer's disease (AD) have been identified in studies of populations of European ancestry (EA), and their relevance in South Asians is unknown. Using whole-genome sequence data from 2680 participants from the Diagnostic Assessment of Dementia for the Longitudinal Aging Study of India (LASI-DAD), we performed a gene-based analysis of 84 genes previously associated with AD in EA. We investigated associations with the Hindi Mental State Examination (HMSE) score and factor scores for general cognitive function and five cognitive domains. For each gene, we examined missense/loss-of-function (LoF) variants and brain-specific promoter/enhancer variants separately, both with and without incorporating additional annotation weights (e.g., deleteriousness, conservation scores), using the variant-Set Test for Association using Annotation infoRmation (STAAR). In the missense/LoF analysis without annotation weights and controlling for age, sex, state/territory, and genetic ancestry, three genes had an association with at least one measure of cognitive function (FDR q<0.1). APOE was associated with four measures of cognitive function, PICALM was associated with HMSE score, and TSPOAP1 was associated with executive function. The most strongly associated variants in each gene were rs429358 (APOE ε4), rs779406084 (PICALM), and rs9913145 (TSPOAP1). rs779406084 is a rare missense mutation that is more prevalent in LASI-DAD than in EA (minor allele frequency=0.075% vs. 0.0015%); the other two are common variants. No genes in the brain-specific promoter/enhancer analysis met criteria for significance. Results with and without annotation weights were similar.
Missense/LoF variants in some genes previously associated with AD in EA are associated with measures of cognitive function in South Asians from India. Analyzing genome sequence data allows identification of potential novel causal variants enriched in South Asians.
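The abstract reports gene associations at FDR q<0.1 but does not state which FDR procedure was used. The sketch below assumes the Benjamini-Hochberg procedure, applied across gene-level p-values, and shows how such q-values are derived. The gene names and p-values are hypothetical:

```python
from typing import List, Sequence


def bh_qvalues(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, ascending p
    qvalues = [0.0] * m
    running_min = 1.0
    # Step down from the largest p-value so q-values are monotone in p.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


# Hypothetical gene-level p-values; genes with q < 0.1 would be reported.
genes = ["GENE1", "GENE2", "GENE3", "GENE4"]
q = bh_qvalues([0.01, 0.04, 0.03, 0.20])
print([g for g, qv in zip(genes, q) if qv < 0.1])  # ['GENE1', 'GENE2', 'GENE3']
```

The step-down minimum ensures that a gene with a smaller p-value never receives a larger q-value than a gene with a bigger p-value.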
Affiliation(s)
- A B Dey
- All India Institute of Medical Sciences
- Alden L Gross
- Johns Hopkins Bloomberg School of Public Health, Johns Hopkins University
17
Giovannetti A, Lazzari S, Mangoni M, Traversa A, Mazza T, Parisi C, Caputo V. Exploring non-coding genetic variability in ACE2: Functional annotation and in vitro validation of regulatory variants. Gene 2024; 915:148422. [PMID: 38570058 DOI: 10.1016/j.gene.2024.148422] [Received: 01/22/2024] [Revised: 02/23/2024] [Accepted: 03/13/2024] [Indexed: 04/05/2024]
Abstract
The surge in human whole-genome sequencing data has facilitated the study of non-coding region variation, yet understanding its biological significance remains a challenge. We used a computational workflow to assess the regulatory potential of non-coding variants, with a particular focus on the Angiotensin Converting Enzyme 2 (ACE2) gene. This gene is crucial in physiological processes and serves as the entry point for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the virus causing coronavirus disease 2019 (COVID-19). In our analysis, using data from the gnomAD population database and functional annotation, we identified 17 significant single nucleotide variants (SNVs) in ACE2, particularly in its enhancers, promoters, and 3' untranslated regions (UTRs). We found preliminary evidence supporting the regulatory impact of some of these variants on ACE2 expression. Our detailed examination of two SNVs, rs147718775 and rs140394675, in the ACE2 promoter revealed that these co-occurring SNVs, when mutated, significantly enhance promoter activity, suggesting a possible increase in specific ACE2 isoform expression. This method proves effective in identifying and interpreting impactful non-coding variants, aiding further studies and enhancing understanding of the molecular bases of monogenic and complex traits.
Affiliation(s)
- Agnese Giovannetti
- Clinical Genomics Laboratory, Fondazione IRCCS Casa Sollievo della Sofferenza, Viale Cappuccini, snc, 71013 S. Giovanni Rotondo (FG), Italy.
- Sara Lazzari
- Department of Experimental Medicine, Sapienza University of Rome, Viale Regina Elena, 324, 00161 Rome, Italy.
- Manuel Mangoni
- Department of Experimental Medicine, Sapienza University of Rome, Viale Regina Elena, 324, 00161 Rome, Italy; Bioinformatics Laboratory, Fondazione IRCCS Casa Sollievo della Sofferenza, Viale Cappuccini, snc, 71013 S. Giovanni Rotondo (FG), Italy.
- Alice Traversa
- Department of Experimental Medicine, Sapienza University of Rome, Viale Regina Elena, 324, 00161 Rome, Italy; Dipartimento di Scienze della Vita, della Salute e delle Professioni Sanitarie, Università degli Studi "Link Campus University", Via del Casale di San Pio V 44, 00165 Roma, Italy.
- Tommaso Mazza
- Bioinformatics Laboratory, Fondazione IRCCS Casa Sollievo della Sofferenza, Viale Cappuccini, snc, 71013 S. Giovanni Rotondo (FG), Italy.
- Chiara Parisi
- Institute of Biochemistry and Cell Biology, CNR-National Research Council, Via Ercole Ramarini, 32, 00015 Monterotondo Scalo (RM), Italy.
- Viviana Caputo
- Department of Experimental Medicine, Sapienza University of Rome, Viale Regina Elena, 324, 00161 Rome, Italy.
18
Tabet DR, Kuang D, Lancaster MC, Li R, Liu K, Weile J, Coté AG, Wu Y, Hegele RA, Roden DM, Roth FP. Benchmarking computational variant effect predictors by their ability to infer human traits. Genome Biol 2024; 25:172. [PMID: 38951922 PMCID: PMC11218265 DOI: 10.1186/s13059-024-03314-7] [Received: 10/10/2022] [Accepted: 06/17/2024] [Indexed: 07/03/2024]
Abstract
BACKGROUND Computational variant effect predictors offer a scalable and increasingly reliable means of interpreting human genetic variation, but concerns of circularity and bias have limited previous methods for evaluating and comparing predictors. Population-level cohorts of genotyped and phenotyped participants that have not been used in predictor training can facilitate an unbiased benchmarking of available methods. Using a curated set of human gene-trait associations with a reported rare-variant burden association, we evaluate the correlations of 24 computational variant effect predictors with associated human traits in the UK Biobank and All of Us cohorts. RESULTS AlphaMissense outperformed all other predictors in inferring human traits based on rare missense variants in UK Biobank and All of Us participants. The overall rankings of computational variant effect predictors in these two cohorts showed a significant positive correlation. CONCLUSION We describe a method to assess computational variant effect predictors that sidesteps the limitations of previous evaluations. This approach is generalizable to future predictors and could continue to inform predictor choice for personal and clinical genetics.
Affiliation(s)
- Daniel R Tabet
- Donnelly Centre, University of Toronto, Toronto, ON, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON, Canada
- Da Kuang
- Donnelly Centre, University of Toronto, Toronto, ON, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON, Canada
- Megan C Lancaster
- Division of Cardiovascular Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Roujia Li
- Donnelly Centre, University of Toronto, Toronto, ON, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON, Canada
- Karen Liu
- Donnelly Centre, University of Toronto, Toronto, ON, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON, Canada
- Jochen Weile
- Donnelly Centre, University of Toronto, Toronto, ON, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON, Canada
- Atina G Coté
- Donnelly Centre, University of Toronto, Toronto, ON, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON, Canada
- Yingzhou Wu
- Donnelly Centre, University of Toronto, Toronto, ON, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON, Canada
- Robert A Hegele
- Department of Medicine, Department of Biochemistry, Schulich School of Medicine and Dentistry, Robarts Research Institute, Western University, London, ON, Canada
- Dan M Roden
- Division of Cardiovascular Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Department of Pharmacology, Vanderbilt University Medical Center, Nashville, TN, USA
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- Frederick P Roth
- Donnelly Centre, University of Toronto, Toronto, ON, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON, Canada
- Department of Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
19
Lin YJ, Menon AS, Hu Z, Brenner SE. Variant Impact Predictor database (VIPdb), version 2: Trends from 25 years of genetic variant impact predictors. bioRxiv 2024:2024.06.25.600283. [PMID: 38979289 PMCID: PMC11230257 DOI: 10.1101/2024.06.25.600283] [Indexed: 07/10/2024]
Abstract
Background Variant interpretation is essential for identifying patients' disease-causing genetic variants amongst the millions detected in their genomes. Hundreds of Variant Impact Predictors (VIPs), also known as Variant Effect Predictors (VEPs), have been developed for this purpose, with a variety of methodologies and goals. To facilitate the exploration of available VIP options, we have created the Variant Impact Predictor database (VIPdb). Results VIPdb version 2 presents a collection of VIPs developed over the past 25 years, summarizing their characteristics, ClinGen calibrated scores, CAGI assessment results, publication details, access information, and citation patterns. We previously summarized 217 VIPs and their features in VIPdb in 2019. Building upon this foundation, we identified and categorized an additional 186 VIPs, resulting in a total of 403 VIPs in VIPdb version 2. Most VIPs can predict the impacts of single nucleotide variants and nonsynonymous variants. More VIPs tailored to predicting the impacts of insertions and deletions have been developed since the 2010s. In contrast, relatively few VIPs are dedicated to the prediction of splicing, structural, synonymous, and regulatory variants. The increasing rate of citations to VIPs reflects their growing use, and the evolving citation trends reveal developments in the field and in individual methods. Conclusions VIPdb version 2 summarizes 403 VIPs and their features, potentially facilitating VIP exploration for various variant interpretation applications. Availability VIPdb version 2 is available at https://genomeinterpretation.org/vipdb.
Affiliation(s)
- Yu-Jen Lin
- Department of Molecular and Cell Biology, University of California, Berkeley, California 94720, USA
- Center for Computational Biology, University of California, Berkeley, California 94720, USA
- Arul S. Menon
- Department of Molecular and Cell Biology, University of California, Berkeley, California 94720, USA
- College of Computing, Data Science, and Society, University of California, Berkeley, California 94720, USA
- Zhiqiang Hu
- Department of Plant and Microbial Biology, University of California, Berkeley, California 94720, USA
- Currently at: Illumina, Foster City, California 94404, USA
- Steven E. Brenner
- Department of Molecular and Cell Biology, University of California, Berkeley, California 94720, USA
- Center for Computational Biology, University of California, Berkeley, California 94720, USA
- College of Computing, Data Science, and Society, University of California, Berkeley, California 94720, USA
- Department of Plant and Microbial Biology, University of California, Berkeley, California 94720, USA
20
Rastogi R, Chung R, Li S, Li C, Lee K, Woo J, Kim DW, Keum C, Babbi G, Martelli PL, Savojardo C, Casadio R, Chennen K, Weber T, Poch O, Ancien F, Cia G, Pucci F, Raimondi D, Vranken W, Rooman M, Marquet C, Olenyi T, Rost B, Andreoletti G, Kamandula A, Peng Y, Bakolitsa C, Mort M, Cooper DN, Bergquist T, Pejaver V, Liu X, Radivojac P, Brenner SE, Ioannidis NM. Critical assessment of missense variant effect predictors on disease-relevant variant data. bioRxiv 2024:2024.06.06.597828. [PMID: 38895200 PMCID: PMC11185644 DOI: 10.1101/2024.06.06.597828] [Indexed: 06/21/2024]
Abstract
Regular, systematic, and independent assessment of computational tools used to predict the pathogenicity of missense variants is necessary to evaluate their clinical and research utility and suggest directions for future improvement. Here, as part of the sixth edition of the Critical Assessment of Genome Interpretation (CAGI) challenge, we assess missense variant effect predictors (or variant impact predictors) on an evaluation dataset of rare missense variants from disease-relevant databases. Our assessment evaluates predictors submitted to the CAGI6 Annotate-All-Missense challenge, predictors commonly used by the clinical genetics community, and recently developed deep learning methods for variant effect prediction. To explore a variety of settings that are relevant for different clinical and research applications, we assess performance within different subsets of the evaluation data and within high-specificity and high-sensitivity regimes. We find strong performance of many predictors across multiple settings. Meta-predictors tend to outperform their constituent individual predictors; however, several individual predictors have performance similar to that of commonly used meta-predictors. The relative performance of predictors differs in high-specificity and high-sensitivity regimes, suggesting that different methods may be best suited to different use cases. We also characterize two potential sources of bias. Predictors that incorporate allele frequency as a predictive feature tend to have reduced performance when distinguishing pathogenic variants from very rare benign variants, and predictors supervised on pathogenicity labels from curated variant databases often learn label imbalances within genes. 
Overall, we find notable advances over the oldest and most cited missense variant effect predictors and continued improvements among the most recently developed tools, and the CAGI Annotate-All-Missense challenge (also termed the Missense Marathon) will continue to assess state-of-the-art methods as the field progresses. Together, our results help illuminate the current clinical and research utility of missense variant effect predictors and identify potential areas for future development.
21
Zhou Y, Pirmann S, Lauschke VM. APF2: an improved ensemble method for pharmacogenomic variant effect prediction. Pharmacogenomics J 2024; 24:17. [PMID: 38802404 PMCID: PMC11129946 DOI: 10.1038/s41397-024-00338-x] [Received: 02/29/2024] [Revised: 04/26/2024] [Accepted: 05/15/2024] [Indexed: 05/29/2024]
Abstract
Lack of efficacy and adverse drug responses are common in pharmacological therapy and cause considerable morbidity and mortality. It is estimated that 20-30% of this variability in drug response stems from variations in genes encoding drug targets or factors involved in drug disposition. Leveraging such pharmacogenomic information for the preemptive identification of patients who would benefit from dose adjustments or alternative medications thus constitutes an important frontier of precision medicine. Computational methods can be used to predict the functional effects of variants of unknown significance. However, their performance on pharmacogenomic variant data has been lackluster. To overcome this limitation, we previously developed an ensemble classifier, termed APF, specifically designed for pharmacogenomic variant prediction. Here, we aimed to further improve predictions by leveraging recent key advances in deep neural network-based protein folding prediction. Benchmarking of 28 variant effect predictors on 530 pharmacogenetic missense variants revealed that structural predictions using AlphaMissense were most specific, whereas APF exhibited the most balanced performance. We then developed a new tool, APF2, by optimizing the parametrization of the top-performing algorithms for pharmacogenomic variations and aggregating their predictions into a unified ensemble score. Importantly, APF2 provides quantitative variant effect estimates that correlate well with experimental results (R² = 0.91, p = 0.003) and predicts the functional impact of pharmacogenomic variants with higher accuracy than previous methods, particularly for clinically relevant variations with actionable pharmacogenomic guidelines. We furthermore demonstrate better performance (92% accuracy) on an independent test set of 146 variants across 61 pharmacogenes not used for model training or validation.
Application of APF2 to population-scale sequencing data from over 800,000 individuals revealed drastic ethnogeographic differences with important implications for pharmacotherapy. We thus think that APF2 holds the potential to improve the translation of genetic information into pharmacogenetic recommendations, thereby facilitating the use of Next-Generation Sequencing data for stratified medicine.
Affiliation(s)
- Yitian Zhou
- Department of Physiology and Pharmacology, Karolinska Institutet, Stockholm, Sweden
- Center for Molecular Medicine, Karolinska Institutet and University Hospital, Stockholm, Sweden
- Sebastian Pirmann
- Computational Oncology Group, Molecular Precision Oncology Program, National Center for Tumor Diseases (NCT) Heidelberg and German Cancer Research Center (DKFZ), Heidelberg, Germany
- Helmholtz Information and Data Science School for Health, Karlsruhe/Heidelberg, Germany
- Faculty of Biosciences, Heidelberg University, Heidelberg, Germany
- Volker M Lauschke
- Department of Physiology and Pharmacology, Karolinska Institutet, Stockholm, Sweden
- Center for Molecular Medicine, Karolinska Institutet and University Hospital, Stockholm, Sweden
- Dr Margarete Fischer-Bosch Institute of Clinical Pharmacology, Stuttgart, Germany
- University of Tübingen, Tübingen, Germany
22
Ginete C, Delgadinho M, Santos B, Miranda A, Silva C, Guerreiro P, Chimusa ER, Brito M. Genetic Modifiers of Sickle Cell Anemia Phenotype in a Cohort of Angolan Children. Genes (Basel) 2024; 15:469. [PMID: 38674403 PMCID: PMC11049512 DOI: 10.3390/genes15040469] [Received: 03/18/2024] [Revised: 04/04/2024] [Accepted: 04/05/2024] [Indexed: 04/28/2024]
Abstract
The aim of this study was to identify genetic markers in the HBB cluster, the HBS1L-MYB intergenic region, and the BCL11A, KLF1, FOX3, and ZBTB7A genes associated with the heterogeneous phenotypes of Sickle Cell Anemia (SCA) using next-generation sequencing, and to assess their influence and prevalence in an Angolan population. Hematological, biochemical, and clinical data were used to determine patients' severity phenotypes. Samples from 192 patients were sequenced, and 5,019,378 high-quality variants were recorded. A catalog of candidate modifier genes clustering in pathophysiological pathways important for SCA was generated, and candidate genes associated with increased vaso-occlusive crises (VOC) and with lower fetal hemoglobin (HbF) were identified. These data support a polygenic view of the genetic architecture underlying phenotypic variability in SCA. Two single nucleotide polymorphisms in an intronic region at 2p16.1, harboring the BCL11A gene, are significantly associated at the genome-wide level with decreased HbF. A set of variants was nominally associated with increased VOC; these are potential genetic modifiers underlying phenotypic variation among patients. To the best of our knowledge, this is the first investigation of clinical variation in SCA in Angola using a customized, targeted sequencing approach.
Affiliation(s)
- Catarina Ginete
- H&TRC-Health & Technology Research Center, ESTeSL-Escola Superior de Tecnologia da Saúde, Instituto Politécnico de Lisboa, 1990-096 Lisbon, Portugal
- Mariana Delgadinho
- H&TRC-Health & Technology Research Center, ESTeSL-Escola Superior de Tecnologia da Saúde, Instituto Politécnico de Lisboa, 1990-096 Lisbon, Portugal
- Brígida Santos
- Centro de Investigação em Saúde de Angola (CISA), Bengo 9999, Angola
- Hospital Pediátrico David Bernardino (HPDB), Luanda 3067, Angola
- Armandina Miranda
- Instituto Nacional de Saúde Doutor Ricardo Jorge (INSA), 1649-016 Lisbon, Portugal
- Carina Silva
- H&TRC-Health & Technology Research Center, ESTeSL-Escola Superior de Tecnologia da Saúde, Instituto Politécnico de Lisboa, 1990-096 Lisbon, Portugal
- Centro de Estatística e Aplicações, Universidade de Lisboa, 1649-013 Lisbon, Portugal
- Paulo Guerreiro
- H&TRC-Health & Technology Research Center, ESTeSL-Escola Superior de Tecnologia da Saúde, Instituto Politécnico de Lisboa, 1990-096 Lisbon, Portugal
- Emile R. Chimusa
- Department of Applied Sciences, Faculty of Health and Life Sciences, Northumbria University, Newcastle upon Tyne NE1 8ST, UK
- Miguel Brito
- H&TRC-Health & Technology Research Center, ESTeSL-Escola Superior de Tecnologia da Saúde, Instituto Politécnico de Lisboa, 1990-096 Lisbon, Portugal
- Centro de Investigação em Saúde de Angola (CISA), Bengo 9999, Angola
23
Wei X, Li H, Zhu T, Sun Z, Sui R. Genotype-Phenotype Associations in an X-Linked Retinoschisis Patient Cohort: The Molecular Dynamic Insight and a Promising SD-OCT Indicator. Invest Ophthalmol Vis Sci 2024; 65:17. [PMID: 38324300 PMCID: PMC10854265 DOI: 10.1167/iovs.65.2.17] [Received: 07/31/2023] [Accepted: 01/23/2024] [Indexed: 02/08/2024]
Abstract
Purpose This study investigated a three-dimensional indicator in spectral-domain optical coherence tomography (SD-OCT) and established genotype-phenotype correlations in X-linked retinoschisis (XLRS). Methods Thirty-seven patients with XLRS underwent comprehensive ophthalmic examinations, including visual acuity (VA), fundus examination, electroretinogram (ERG), and SD-OCT. The SD-OCT parameters central foveal thickness (CFT), cyst cavity volume (CCV), and photoreceptor outer segment length were assessed. CCV was defined as the sum of the areas of cyst cavities in sequential B-scans and was measured automatically by in-house software (OCT-CCSEG). Protein structural changes associated with missense variants were quantified by molecular dynamics (MD) simulations. The correlation between genotype and phenotype was analyzed. Results Twenty-seven different RS1 variants were identified, including a novel variant, c.336_337insT (p.L113Sfs*8). The mean age of onset was 14.76 ± 15.75 years, and the mean VA was 0.84 ± 0.43 logMAR. The mean CCV was 1.69 ± 1.87 mm³, correlating significantly with CFT (R = 0.66; P < 0.01). In the genotype-phenotype analysis of missense variants, CCV correlated significantly with the structural effects of the variants on the protein relative to wild type, including root-mean-square deviation (R = 0.34; P = 0.04), solvent accessible surface area (R = 0.38; P = 0.02), and surface hydrophobic area (R = 0.37; P = 0.03). The amplitudes of scotopic 3.0 ERG a-waves and b-waves correlated significantly with the percentage change in β-strand content of the secondary structure (a-wave: R = -0.58, P < 0.01; b-wave: R = -0.53, P < 0.01). Conclusions CCV is a promising indicator for quantifying the structural disorganization of the XLRS retina. The OCT-CCSEG software calculates CCV automatically, potentially facilitating prognosis assessment and the development of personalized treatment.
Moreover, MD-based genotype-phenotype analysis suggests an association between protein structural alterations and XLRS severity as measured by CCV and ERG.
Affiliation(s)
- Xing Wei
- Department of Ophthalmology, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences, Peking Union Medical College, Beijing, China
- Hui Li
- Department of Ophthalmology, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences, Peking Union Medical College, Beijing, China
- Tian Zhu
- Department of Ophthalmology, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences, Peking Union Medical College, Beijing, China
- Zixi Sun
- Department of Ophthalmology, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences, Peking Union Medical College, Beijing, China
- Ruifang Sui
- Department of Ophthalmology, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences, Peking Union Medical College, Beijing, China
24
Nourbakhsh M, Degn K, Saksager A, Tiberti M, Papaleo E. Prediction of cancer driver genes and mutations: the potential of integrative computational frameworks. Brief Bioinform 2024; 25:bbad519. [PMID: 38261338 PMCID: PMC10805075 DOI: 10.1093/bib/bbad519] [Received: 06/09/2023] [Revised: 11/27/2023] [Accepted: 12/11/2023] [Indexed: 01/24/2024]
Abstract
The vast amount of available sequencing data allows the scientific community to explore different genetic alterations that may drive cancer or favor cancer progression. Software developers have proposed a myriad of predictive tools, allowing researchers and clinicians to compare and prioritize driver genes and mutations and their relative pathogenicity. However, there is little consensus on the computational approach or on a gold standard for comparison. Hence, benchmarking results for the different tools depend heavily on the input data, indicating that overfitting is still a major problem. One solution is to limit the scope and usage of specific tools. However, such limitations force researchers to walk a tightrope between creating and using high-quality tools for a specific purpose and describing the complex alterations driving cancer. While knowledge of cancer development increases daily, many bioinformatic pipelines consider single nucleotide variants or alterations in isolation, without accounting for cellular compartments, mutational burden or disease progression. Even within bioinformatics and computational cancer biology, research fields work in silos, risking overlooking potential synergies or breakthroughs. Here, we provide an overview of databases and datasets for building or testing predictive cancer driver tools. Furthermore, we introduce predictive tools for driver genes, driver mutations, and the impact of these based on structural analysis. Additionally, we recommend directions for the field that avoid siloed research and move towards integrative frameworks.
Affiliation(s)
- Mona Nourbakhsh
- Cancer Systems Biology, Section for Bioinformatics, Department of Health Technology, Technical University of Denmark, 2800 Lyngby, Denmark
- Kristine Degn
- Cancer Systems Biology, Section for Bioinformatics, Department of Health Technology, Technical University of Denmark, 2800 Lyngby, Denmark
- Astrid Saksager
- Cancer Systems Biology, Section for Bioinformatics, Department of Health Technology, Technical University of Denmark, 2800 Lyngby, Denmark
- Matteo Tiberti
- Cancer Structural Biology, Danish Cancer Institute, 2100 Copenhagen, Denmark
- Elena Papaleo
- Cancer Systems Biology, Section for Bioinformatics, Department of Health Technology, Technical University of Denmark, 2800 Lyngby, Denmark
- Cancer Structural Biology, Danish Cancer Institute, 2100 Copenhagen, Denmark
25
Wang Z, Zhao G, Zhu Z, Wang Y, Xiang X, Zhang S, Luo T, Zhou Q, Qiu J, Tang B, Xia K, Li B, Li J. VarCards2: an integrated genetic and clinical database for ACMG-AMP variant-interpretation guidelines in the human whole genome. Nucleic Acids Res 2024; 52:D1478-D1489. [PMID: 37956311 PMCID: PMC10767961 DOI: 10.1093/nar/gkad1061] [Received: 09/15/2023] [Revised: 10/21/2023] [Accepted: 10/25/2023] [Indexed: 11/15/2023]
Abstract
VarCards, an online database, combines comprehensive variant- and gene-level annotation data to streamline genetic counselling for coding variants. As the clinical relevance of non-coding variation has become increasingly recognised, the development of bioinformatics tools dedicated to interpreting non-coding variation, including single-nucleotide variants and copy number variations, has accelerated. However, most of these tools remain either locally installed databases or command-line tools dispersed across diverse online platforms. This landscape poses inconveniences and challenges for genetic counsellors who wish to use these resources without advanced bioinformatics expertise. We therefore developed VarCards2, which incorporates nearly nine billion artificially generated single-nucleotide variants (including those from mitochondrial DNA) and compiles vital annotation information for genetic counselling based on ACMG-AMP variant-interpretation guidelines. These annotations include (I) functional effects; (II) minor allele frequencies; (III) comprehensive function and pathogenicity predictions covering all potential variants, such as non-synonymous substitutions, non-canonical splicing variants, and non-coding variations; and (IV) gene-level information. Furthermore, VarCards2 incorporates 368 820 266 documented short insertions and deletions and 2 773 555 documented copy number variations, complemented by their corresponding annotation and prediction tools. In conclusion, VarCards2, by integrating over 150 variant- and gene-level annotation sources, substantially improves the efficiency of genetic counselling and can be freely accessed at http://www.genemed.tech/varcards2/.
Affiliation(s)
- Zheng Wang
- National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Department of Neurology, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Hunan Key Laboratory of Molecular Precision Medicine, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Guihu Zhao
- National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Department of Neurology, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Bioinformatics Center, Furong Laboratory & Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Zhaopo Zhu
- Center for Medical Genetics & Hunan Key Laboratory, School of Life Sciences, Central South University, Changsha, Hunan 410008, China
- Yijing Wang
- National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Bioinformatics Center, Furong Laboratory & Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Xudong Xiang
- National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Shiyu Zhang
- Xiangya School of Medicine, Central South University, Changsha, Hunan 410013, China
- Tengfei Luo
- Center for Medical Genetics & Hunan Key Laboratory, School of Life Sciences, Central South University, Changsha, Hunan 410008, China
- Qiao Zhou
- National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Bioinformatics Center, Furong Laboratory & Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Jian Qiu
- National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Department of Neurology, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Hunan Key Laboratory of Molecular Precision Medicine, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Beisha Tang
- National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Department of Neurology, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Department of Neurology, & Multi-Omics Research Center for Brain Disorders, The First Affiliated Hospital, University of South China, Hengyang, Hunan, China
- Kun Xia
- Center for Medical Genetics & Hunan Key Laboratory, School of Life Sciences, Central South University, Changsha, Hunan 410008, China
- Bin Li
- National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Department of Neurology, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Bioinformatics Center, Furong Laboratory & Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Jinchen Li
- National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Center for Medical Genetics & Hunan Key Laboratory, School of Life Sciences, Central South University, Changsha, Hunan 410008, China
- Department of Neurology, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Bioinformatics Center, Furong Laboratory & Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
26
Tan HJ, Deng ZH, Shen H, Deng HW, Xiao HM. Single-cell RNA-seq identified novel genes involved in primordial follicle formation. Front Endocrinol (Lausanne) 2023; 14:1285667. [PMID: 38149096 PMCID: PMC10750415 DOI: 10.3389/fendo.2023.1285667] [Received: 08/30/2023] [Accepted: 11/27/2023] [Indexed: 12/28/2023]
Abstract
Introduction The number of primordial follicles (PFs) in mammals determines the ovarian reserve, and impaired primordial follicle formation (PFF) can cause premature ovarian insufficiency (POI). Methods By analyzing public single-cell RNA sequencing data from mouse and human ovaries during PFF, we identified novel functional genes and a novel ligand-receptor interaction involved in PFF. Using immunofluorescence and in vitro ovarian culture, we confirmed the mechanisms by which these genes and the ligand-receptor interaction act in PFF. We also applied whole exome sequencing (WES) to 93 patients with POI and whole genome sequencing (WGS) to 465 controls. Variants in POI patients were further investigated by in silico analysis and functional verification. Results We identified ANXA7 (annexin A7) and GTF2F1 (general transcription factor IIF subunit 1) in germ cells as potential novel genes promoting PFF. The ligand Mdk (midkine) in germ cells and its receptor Sdc1 (syndecan 1) in granulosa cells form a novel interaction crucial for PFF. Immunofluorescence of the 25-week human fetal ovary confirmed significant up-regulation of ANXA7 in PFs compared with germline cysts, and uniform expression of GTF2F1, MDK and SDC1 during PFF. In vitro investigation indicated that Anxa7 and Gtf2f1 are vital for mouse PFF, regulating the Jak/Stat3 and Jnk signaling pathways, respectively. The Mdk-Sdc1 ligand-receptor pair is crucial for PFF, acting through the Pi3k-Akt signaling pathway. Two heterozygous variants in GTF2F1 and one heterozygous variant in SDC1 were identified in cases, whereas none were identified in controls. Protein levels of GTF2F1 and SDC1 were significantly lower in POI cases than in controls, indicating that the pathogenic effects of these two genes on ovarian function are dosage dependent. Discussion Our study identified novel genes and a novel ligand-receptor interaction during PFF, further expanding the known genetic architecture of POI.
Affiliation(s)
- Hang-Jing Tan
- Institute of Reproduction and Stem Cell Engineering, School of Basic Medical Science, Central South University, Changsha, China
- Center for Reproductive Health, and System Biology, Data Sciences, School of Basic Medical Science, Central South University, Changsha, China
- Zi-Heng Deng
- Institute of Reproduction and Stem Cell Engineering, School of Basic Medical Science, Central South University, Changsha, China
- Center for Reproductive Health, and System Biology, Data Sciences, School of Basic Medical Science, Central South University, Changsha, China
| | - Hui Shen
- Center of Biomedical Informatics and Genomics, Deming Department of Medicine, Tulane University School of Medicine, New Orleans, LA, United States
| | - Hong-Wen Deng
- Center of Biomedical Informatics and Genomics, Deming Department of Medicine, Tulane University School of Medicine, New Orleans, LA, United States
| | - Hong-Mei Xiao
- Institute of Reproduction and Stem Cell Engineering, School of Basic Medical Science, Central South University, Changsha, China
- Center for Reproductive Health, and System Biology, Data Sciences, School of Basic Medical Science, Central South University, Changsha, China
| |
27. Stein D, Kars ME, Wu Y, Bayrak ÇS, Stenson PD, Cooper DN, Schlessinger A, Itan Y. Genome-wide prediction of pathogenic gain- and loss-of-function variants from ensemble learning of a diverse feature set. Genome Med 2023; 15:103. [PMID: 38037155 PMCID: PMC10688473 DOI: 10.1186/s13073-023-01261-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/29/2023] [Accepted: 11/16/2023] [Indexed: 12/02/2023]
Abstract
Gain-of-function (GOF) variants give rise to increased/novel protein functions whereas loss-of-function (LOF) variants lead to diminished protein function. Experimental approaches for identifying GOF and LOF are generally slow and costly, whilst available computational methods have not been optimized to discriminate between GOF and LOF variants. We have developed LoGoFunc, a machine learning method for predicting pathogenic GOF, pathogenic LOF, and neutral genetic variants, trained on a broad range of gene-, protein-, and variant-level features describing diverse biological characteristics. LoGoFunc outperforms other tools trained solely to predict pathogenicity for identifying pathogenic GOF and LOF variants and is available at https://itanlab.shinyapps.io/goflof/ .
Affiliation(s)
- David Stein
  - Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
  - The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Meltem Ece Kars
  - The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Yiming Wu
  - The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
  - College of Life Science, China West Normal University, Nan Chong, Si Chuan, 637009, China
- Çiğdem Sevim Bayrak
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Peter D Stenson
  - Institute of Medical Genetics, School of Medicine, Cardiff University, Cardiff, CF14 4XN, UK
- David N Cooper
  - Institute of Medical Genetics, School of Medicine, Cardiff University, Cardiff, CF14 4XN, UK
- Avner Schlessinger
  - Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
  - Department of Artificial Intelligence and Human Health, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Yuval Itan
  - The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
28. Jorge SD, Chi YI, Mazaba JL, Haque N, Wagenknecht J, Smith BC, Volkman BF, Mathison AJ, Lomberk G, Zimmermann MT, Urrutia R. Deep computational phenotyping of genomic variants impacting the SET domain of KMT2C reveal molecular mechanisms for their dysfunction. Front Genet 2023; 14:1291307. [PMID: 38090150 PMCID: PMC10715303 DOI: 10.3389/fgene.2023.1291307] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2023] [Accepted: 11/17/2023] [Indexed: 12/29/2023]
Abstract
Introduction: Kleefstra Syndrome type 2 (KLEFS-2) is a genetic, neurodevelopmental disorder characterized by intellectual disability, infantile hypotonia, severe expressive language delay, and a characteristic facial appearance, with a spectrum of other distinct clinical manifestations. Pathogenic mutations in the epigenetic modifier type 2 lysine methyltransferase KMT2C have been identified as causative in KLEFS-2 individuals. Methods: This work reports a translational genomic study that applies a multidimensional computational approach for deep variant phenotyping, combining conventional genomic analyses, advanced protein bioinformatics, computational biophysics, biochemistry, and biostatistics-based modeling. We use standard variant annotation, paralog annotation analyses, molecular mechanics, and molecular dynamics simulations to evaluate damaging scores and provide potential mechanisms underlying KMT2C variant dysfunction. Results: We integrated data derived from the structure and dynamics of KMT2C to classify variants into SV (Structural Variant), DV (Dynamic Variant), SDV (Structural and Dynamic Variant), and VUS (Variant of Uncertain Significance). When compared with controls, these variants show values reflecting alterations in molecular fitness in both structure and dynamics. Discussion: We demonstrate that our 3D models for KMT2C variants suggest distinct mechanisms that lead to their imbalance and are not predictable from sequence alone. Thus, the missense variants studied here cause destabilizing effects on KMT2C function through different biophysical and biochemical mechanisms, which we describe. This new knowledge extends our understanding of how variations in the KMT2C gene cause dysfunction of its methyltransferase enzyme product, thereby bearing significant biomedical relevance for carriers of KLEFS-2-associated genomic mutations.
Affiliation(s)
- Salomão Dória Jorge
  - Linda T. and John A. Mellowes Center for Genomic Sciences and Precision Medicine, Medical College of Wisconsin, Milwaukee, WI, United States
- Young-In Chi
  - Linda T. and John A. Mellowes Center for Genomic Sciences and Precision Medicine, Medical College of Wisconsin, Milwaukee, WI, United States
  - Division of Research, Department of Surgery, Medical College of Wisconsin, Milwaukee, WI, United States
- Jose Lizarraga Mazaba
  - Linda T. and John A. Mellowes Center for Genomic Sciences and Precision Medicine, Medical College of Wisconsin, Milwaukee, WI, United States
- Neshatul Haque
  - Linda T. and John A. Mellowes Center for Genomic Sciences and Precision Medicine, Medical College of Wisconsin, Milwaukee, WI, United States
- Jessica Wagenknecht
  - Linda T. and John A. Mellowes Center for Genomic Sciences and Precision Medicine, Medical College of Wisconsin, Milwaukee, WI, United States
- Brian C. Smith
  - Linda T. and John A. Mellowes Center for Genomic Sciences and Precision Medicine, Medical College of Wisconsin, Milwaukee, WI, United States
  - Department of Biochemistry, Medical College of Wisconsin, Milwaukee, WI, United States
- Brian F. Volkman
  - Linda T. and John A. Mellowes Center for Genomic Sciences and Precision Medicine, Medical College of Wisconsin, Milwaukee, WI, United States
  - Department of Biochemistry, Medical College of Wisconsin, Milwaukee, WI, United States
- Angela J. Mathison
  - Linda T. and John A. Mellowes Center for Genomic Sciences and Precision Medicine, Medical College of Wisconsin, Milwaukee, WI, United States
  - Division of Research, Department of Surgery, Medical College of Wisconsin, Milwaukee, WI, United States
- Gwen Lomberk
  - Linda T. and John A. Mellowes Center for Genomic Sciences and Precision Medicine, Medical College of Wisconsin, Milwaukee, WI, United States
  - Division of Research, Department of Surgery, Medical College of Wisconsin, Milwaukee, WI, United States
  - Department of Pharmacology and Toxicology, Medical College of Wisconsin, Milwaukee, WI, United States
- Michael T. Zimmermann
  - Linda T. and John A. Mellowes Center for Genomic Sciences and Precision Medicine, Medical College of Wisconsin, Milwaukee, WI, United States
  - Department of Biochemistry, Medical College of Wisconsin, Milwaukee, WI, United States
  - Clinical and Translational Sciences Institute, Medical College of Wisconsin, Milwaukee, WI, United States
- Raul Urrutia
  - Linda T. and John A. Mellowes Center for Genomic Sciences and Precision Medicine, Medical College of Wisconsin, Milwaukee, WI, United States
  - Division of Research, Department of Surgery, Medical College of Wisconsin, Milwaukee, WI, United States
  - Department of Biochemistry, Medical College of Wisconsin, Milwaukee, WI, United States
29. Ge F, Arif M, Yan Z, Alahmadi H, Worachartcheewan A, Yu DJ, Shoombuatong W. MMPatho: Leveraging Multilevel Consensus and Evolutionary Information for Enhanced Missense Mutation Pathogenic Prediction. J Chem Inf Model 2023; 63:7239-7257. [PMID: 37947586 PMCID: PMC10685454 DOI: 10.1021/acs.jcim.3c00950] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2023] [Revised: 10/21/2023] [Accepted: 10/23/2023] [Indexed: 11/12/2023]
Abstract
Understanding the pathogenicity of missense mutations (MMs) is essential for shedding light on genetic diseases, gene functions, and individual variations. In this study, we propose a novel computational approach, called MMPatho, for enhancing missense mutation pathogenicity prediction. First, we established a large-scale nonredundant MM benchmark data set based on the entire Ensembl database, complemented by a focused blind test set specifically for pathogenic GOF/LOF MMs. Based on this data set, for each mutation, we utilized Ensembl VEP v104 and dbNSFP v4.1a to extract variant-level, amino acid-level, individuals' outputs, and genome-level features. Additionally, protein sequences were generated using ENSP identifiers with the Ensembl API and then encoded. The mutant sites' ESM-1b and ProtTrans-T5 embeddings were subsequently extracted. Building on these efforts, we then developed our model group (MMPatho), which comprises ConsMM and EvoIndMM. Specifically, ConsMM employs individuals' outputs and XGBoost with SHAP explanation analysis, while EvoIndMM investigates the potential enhancement of predictive capability by incorporating evolutionary information from ESM-1b and ProtT5-XL-U50, which provide large protein language model embeddings. Through rigorous comparative experiments, ConsMM and EvoIndMM achieved AUROC values of 0.9836 and 0.9854 and AUPR values of 0.9852 and 0.9902, respectively, on the blind test set devoid of overlapping variations and proteins from the training data, thus highlighting the superiority of our computational approach in the prediction of MM pathogenicity. Our web server, available at http://csbio.njust.edu.cn/bioinf/mmpatho/, allows researchers to predict the pathogenicity (alongside the reliability index score) of MMs using the ConsMM and EvoIndMM models and provides extensive annotations for user input. Additionally, the newly constructed benchmark data set and blind test set can be accessed via the data page of our web server.
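AUROC and AUPR, the metrics quoted above, summarize how well a score ranks pathogenic variants above benign ones. As a minimal illustration only (this is not MMPatho's evaluation code, and the abstract does not state exactly how its AUPR was computed), AUROC can be computed as the fraction of positive/negative pairs in which the positive scores higher, and AUPR approximated by step-wise average precision. The labels and scores are made-up toy values.

```python
from typing import Sequence

def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random positive outscores a random negative.

    Ties count as half a win (Mann-Whitney formulation). O(P*N), fine for toy data.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve.

    Averages the precision at the rank of each positive; assumes no tied scores.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / n_pos

# Toy data: 1 = pathogenic, 0 = benign (made-up values).
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
print(round(auroc(labels, scores), 4))              # 0.8889
print(round(average_precision(labels, scores), 4))  # 0.9167
```

The pairwise form of AUROC makes its meaning explicit; for large benchmark sets a rank-based or library implementation would be preferred.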
Affiliation(s)
- Fang Ge
  - School of Geographic and Biologic Information, Nanjing University of Posts and Telecommunications, 9 Wenyuanlu, Nanjing 210023, China
  - Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
- Muhammad Arif
  - College of Science and Engineering, Hamad Bin Khalifa University, Doha 34110, Qatar
  - Department of Community Medical Technology, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
- Zihao Yan
  - School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing 210094, China
- Hanin Alahmadi
  - College of Computer Science and Engineering, Taibah University, Madinah 344, Saudi Arabia
- Apilak Worachartcheewan
  - Department of Community Medical Technology, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
- Dong-Jun Yu
  - School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing 210094, China
- Watshara Shoombuatong
  - Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
30. Moore A, Marks JA, Quach BC, Guo Y, Bierut LJ, Gaddis NC, Hancock DB, Page GP, Johnson EO. Evaluating 17 methods incorporating biological function with GWAS summary statistics to accelerate discovery demonstrates a tradeoff between high sensitivity and high positive predictive value. Commun Biol 2023; 6:1199. [PMID: 38001305 PMCID: PMC10673847 DOI: 10.1038/s42003-023-05413-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2022] [Accepted: 10/03/2023] [Indexed: 11/26/2023]
Abstract
Where sufficiently large genome-wide association study (GWAS) samples are not currently available or feasible, methods that leverage increasing knowledge of the biological function of variants may illuminate discoveries without increasing sample size. We comprehensively evaluated 17 functional weighting methods for identifying novel associations. We assessed the performance of these methods using published results from multiple GWAS waves for each of five complex traits. Although no method achieved both high sensitivity and high positive predictive value (PPV) for any trait, a subset of methods utilizing pleiotropy and expression quantitative trait loci nominated variants with high PPV (>75%) for multiple traits. Applying functional weighting methods to enhance GWAS power for locus discovery is unlikely to circumvent the need for larger sample sizes in truly underpowered GWAS, but these results suggest that functional weighting can accurately nominate additional novel loci from available samples for follow-up studies.
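Sensitivity and PPV in this setting can be read as set overlaps between the loci a method nominates and the loci later confirmed in larger GWAS waves. The sketch below assumes that simplified framing, with exact matching on hypothetical locus IDs; the study's own locus-matching criteria are not reproduced here.

```python
from __future__ import annotations

def sensitivity_ppv(nominated: set[str], confirmed: set[str]) -> tuple[float, float]:
    """Sensitivity = TP / |confirmed|, PPV = TP / |nominated|.

    TP counts loci that were both nominated and later confirmed (exact ID match).
    Returns NaN for a ratio whose denominator set is empty.
    """
    tp = len(nominated & confirmed)
    sensitivity = tp / len(confirmed) if confirmed else float("nan")
    ppv = tp / len(nominated) if nominated else float("nan")
    return sensitivity, ppv

# Hypothetical locus IDs: loci a weighting method nominated from an earlier,
# smaller GWAS wave vs. loci reaching significance in a later, larger wave.
nominated = {"locus1", "locus2", "locus3", "locus4"}
confirmed = {"locus1", "locus2", "locus3", "locus5", "locus6", "locus7"}
print(sensitivity_ppv(nominated, confirmed))  # (0.5, 0.75)
```

A method that nominates only a few well-supported loci can reach high PPV while missing most confirmed loci (low sensitivity), which is the tradeoff the abstract describes.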
Affiliation(s)
- Amy Moore
  - Genomics and Translational Research Center, RTI International, Research Triangle Park, NC, 27709, USA
- Jesse A Marks
  - Genomics and Translational Research Center, RTI International, Research Triangle Park, NC, 27709, USA
- Bryan C Quach
  - Genomics and Translational Research Center, RTI International, Research Triangle Park, NC, 27709, USA
- Yuelong Guo
  - GeneCentric Therapeutics, Inc., Cary, NC, USA
- Laura J Bierut
  - Department of Psychiatry, Washington University School of Medicine, St. Louis, MO, USA
- Nathan C Gaddis
  - Genomics and Translational Research Center, RTI International, Research Triangle Park, NC, 27709, USA
- Dana B Hancock
  - Genomics and Translational Research Center, RTI International, Research Triangle Park, NC, 27709, USA
- Grier P Page
  - Genomics and Translational Research Center, RTI International, Research Triangle Park, NC, 27709, USA
  - Fellow Program, RTI International, Research Triangle Park, NC, 27709, USA
- Eric O Johnson
  - Genomics and Translational Research Center, RTI International, Research Triangle Park, NC, 27709, USA
  - Fellow Program, RTI International, Research Triangle Park, NC, 27709, USA
31. Tao LR, Ye Y, Zhao H. Early breast cancer risk detection: a novel framework leveraging polygenic risk scores and machine learning. J Med Genet 2023; 60:960-964. [PMID: 37055164 DOI: 10.1136/jmg-2022-108582] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/01/2022] [Accepted: 03/27/2023] [Indexed: 04/15/2023]
Abstract
BACKGROUND Breast cancer (BC) is the most common cancer and the second leading cause of cancer death in women; an estimated one in eight women in the USA will develop BC during her lifetime. However, current methods of BC screening, including clinical breast exams, mammograms, biopsies and others, are often underused due to limited access, expense and a lack of risk awareness, causing 30% (up to 80% in low-income and middle-income countries) of patients with BC to miss the precious early detection phase. METHODS This study adds a key step to supplement the current BC diagnostic pipeline: a prescreening platform that precedes traditional detection and diagnostic steps. We have developed BREast CAncer Risk Detection Application (BRECARDA), a novel framework that personalises BC risk assessment using artificial intelligence neural networks to incorporate relevant genetic and non-genetic risk factors. A polygenic risk score (PRS) was enhanced by employing AnnoPred and validated by fivefold cross-validation, outperforming three existing state-of-the-art PRS methods. RESULTS We used data from 97 597 female participants of the UK Biobank to train our algorithm. Using the enhanced PRS thus trained together with non-genetic information, BRECARDA was evaluated in a testing dataset of 48 074 UK Biobank female participants and achieved a high accuracy of 94.28% and an area under the curve of 0.7861. Our optimised AnnoPred outperformed other state-of-the-art methods in quantifying genetic risk, indicating its potential for supplementing current BC detection tests, population screening and risk evaluation. CONCLUSION BRECARDA can enhance disease risk prediction, identify high-risk individuals for BC screening, facilitate disease diagnosis and improve population-level screening efficiency. It can serve as a valuable supplemental platform to assist doctors in BC diagnosis and evaluation.
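Whatever method supplies the per-variant weights (AnnoPred in this study), a PRS is commonly scored as a weighted sum of effect-allele dosages. The minimal sketch below assumes the weights are already available, uses hypothetical variant IDs and values, and does not implement AnnoPred's annotation-informed weighting. It also shows standardizing raw scores across a cohort so that effects can be expressed per standard deviation.

```python
from __future__ import annotations

import statistics
from typing import Mapping, Sequence

def polygenic_score(dosages: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Weighted sum of effect-allele dosages (0, 1 or 2) over weighted variants.

    Variants missing from `dosages` are simply skipped in this sketch; real
    pipelines usually impute or mean-fill missing genotypes instead.
    """
    return sum(w * dosages[v] for v, w in weights.items() if v in dosages)

def standardize(scores: Sequence[float]) -> list[float]:
    """Convert raw PRS values to z-scores across a cohort (population SD)."""
    mean = statistics.fmean(scores)
    sd = statistics.pstdev(scores)
    if sd == 0:
        raise ValueError("scores have zero variance")
    return [(s - mean) / sd for s in scores]

# Hypothetical per-variant weights and one individual's dosages.
weights = {"rs_a": 0.2, "rs_b": -0.1, "rs_c": 0.05}
person = {"rs_a": 2, "rs_b": 1, "rs_c": 0}
print(polygenic_score(person, weights))  # about 0.3 (floating point)
print(standardize([1.0, 2.0, 3.0]))
```

Keeping the scoring step separate from weight estimation mirrors how PRS tools are usually organized: methods differ mainly in how they estimate `weights`, while the final score is the same linear sum.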
Affiliation(s)
- Lynn Rose Tao
  - Thomas Jefferson High School for Science and Technology, Alexandria, Virginia, USA
- Yixuan Ye
  - Interdepartmental Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA
- Hongyu Zhao
  - Department of Biostatistics, Yale School of Public Health, Yale University, New Haven, CT, USA
32. He Q, Keding TJ, Zhang Q, Miao J, Russell JD, Herringa RJ, Lu Q, Travers BG, Li JJ. Neurogenetic mechanisms of risk for ADHD: Examining associations of polygenic scores and brain volumes in a population cohort. J Neurodev Disord 2023; 15:30. [PMID: 37653373 PMCID: PMC10469494 DOI: 10.1186/s11689-023-09498-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2022] [Accepted: 08/21/2023] [Indexed: 09/02/2023]
Abstract
BACKGROUND ADHD polygenic scores (PGSs) have been previously shown to predict ADHD outcomes in several studies. However, ADHD PGSs are typically correlated with ADHD but not necessarily reflective of causal mechanisms. More research is needed to elucidate the neurobiological mechanisms underlying ADHD. We incorporated functional annotation information into an ADHD PGS to (1) improve prediction performance over a non-annotated ADHD PGS and (2) test whether volumetric variation in brain regions putatively associated with ADHD mediates the association between PGSs and ADHD outcomes. METHODS Data were from the Philadelphia Neurodevelopmental Cohort (N = 555). Multiple mediation models were tested to examine the indirect effects of two ADHD PGSs (one using a traditional computation involving clumping and thresholding, the other using a functionally annotated approach, i.e., AnnoPred) on ADHD inattention (IA) and hyperactivity-impulsivity (HI) symptoms, via gray matter volumes in the cingulate gyrus, angular gyrus, caudate, dorsolateral prefrontal cortex (DLPFC), and inferior temporal lobe. RESULTS A direct effect was detected between the AnnoPred ADHD PGS and IA symptoms in adolescents. No indirect effects via brain volumes were detected for either IA or HI symptoms. However, both ADHD PGSs were negatively associated with DLPFC volume. CONCLUSIONS The AnnoPred ADHD PGS was a more developmentally specific predictor of adolescent IA symptoms than the traditional ADHD PGS. However, brain volumes did not mediate the effects of either the traditional or the AnnoPred ADHD PGS on ADHD symptoms, suggesting that we may still be underpowered to clarify brain-based biomarkers for ADHD using genetic measures.
Affiliation(s)
- Quanfa He
  - Department of Psychology, University of Wisconsin-Madison, 1202 W. Johnson Street, Madison, WI, 53706, USA
  - Waisman Center, University of Wisconsin-Madison, Madison, USA
- Qi Zhang
  - Department of Educational Psychology, University of Wisconsin-Madison, Madison, USA
- Jiacheng Miao
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, USA
- Justin D Russell
  - Department of Psychiatry, School of Medicine and Public Health, University of Wisconsin, Madison, USA
- Ryan J Herringa
  - Department of Psychiatry, School of Medicine and Public Health, University of Wisconsin, Madison, USA
- Qiongshi Lu
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, USA
  - Center for Demography of Health and Aging, University of Wisconsin-Madison, Madison, USA
  - Department of Statistics, University of Wisconsin-Madison, Madison, USA
- Brittany G Travers
  - Waisman Center, University of Wisconsin-Madison, Madison, USA
  - Department of Kinesiology, University of Wisconsin-Madison, Madison, USA
- James J Li
  - Department of Psychology, University of Wisconsin-Madison, 1202 W. Johnson Street, Madison, WI, 53706, USA
  - Waisman Center, University of Wisconsin-Madison, Madison, USA
  - Center for Demography of Health and Aging, University of Wisconsin-Madison, Madison, USA
33. Ye Y, Noche RB, Szejko N, Both CP, Acosta JN, Leasure AC, Brown SC, Sheth KN, Gill TM, Zhao H, Falcone GJ. A genome-wide association study of frailty identifies significant genetic correlation with neuropsychiatric, cardiovascular, and inflammation pathways. GeroScience 2023; 45:2511-2523. [PMID: 36928559 PMCID: PMC10651618 DOI: 10.1007/s11357-023-00771-z] [Citation(s) in RCA: 29] [Impact Index Per Article: 14.5] [Received: 07/02/2022] [Accepted: 03/10/2023] [Indexed: 03/18/2023]
Abstract
Frailty is an aging-related clinical phenotype defined as a state of increased vulnerability to dependency and/or mortality when a person is exposed to a stressor. While the underlying mechanisms leading to frailty are complex, the importance of genetic factors has not been fully investigated. We conducted a large-scale genome-wide association study (GWAS) of frailty, as defined by the five criteria (weight loss, exhaustion, physical activity, walking speed, and grip strength) captured in the Fried Frailty Score (FFS), in 386,565 participants of European descent enrolled in the UK Biobank (mean age 57 [SD 8] years, 208,481 [54%] females). We identified 37 independent, novel loci associated with the FFS (p < 5 × 10⁻⁸), including seven loci without previously described associations with other traits. The variants associated with FFS were significantly enriched in brain tissues as well as aging-related pathways. Our post-GWAS bioinformatic analyses revealed significant genetic correlations between FFS and cardiovascular-, neurological-, and inflammation-related diseases/traits, and subsequent Mendelian randomization analyses identified causal associations with chronic pain, obesity, diabetes, education-related traits, joint disorders, and depressive/neurological, metabolic, and respiratory diseases. The GWAS signals were replicated in the Health and Retirement Study (HRS, n = 9,720, mean age 73 [SD 7], 5,582 [57%] females), where the polygenic risk score built from the UKB GWAS was significantly associated with the FFS in HRS individuals (OR per SD of the score 1.27, 95% CI 1.22-1.31, p = 1.3 × 10⁻¹¹). These results provide new insight into the biology of frailty by comprehensively evaluating its genetic architecture.
Affiliation(s)
- Yixuan Ye
  - Program of Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA
- Rommell B Noche
  - Department of Neurology, Yale School of Medicine, 15 York Street, LLCI Room 1004D, P.O. Box 20801, New Haven, CT, 06510, USA
- Natalia Szejko
  - Department of Neurology, Yale School of Medicine, 15 York Street, LLCI Room 1004D, P.O. Box 20801, New Haven, CT, 06510, USA
  - Department of Neurology, Medical University of Warsaw, Warsaw, Poland
  - Department of Bioethics, Medical University of Warsaw, Warsaw, Poland
- Cameron P Both
  - Department of Neurology, Yale School of Medicine, 15 York Street, LLCI Room 1004D, P.O. Box 20801, New Haven, CT, 06510, USA
- Julian N Acosta
  - Department of Neurology, Yale School of Medicine, 15 York Street, LLCI Room 1004D, P.O. Box 20801, New Haven, CT, 06510, USA
- Audrey C Leasure
  - Department of Neurology, Yale School of Medicine, 15 York Street, LLCI Room 1004D, P.O. Box 20801, New Haven, CT, 06510, USA
- Stacy C Brown
  - University of Hawai'i, John A. Burns School of Medicine, Honolulu, HI, USA
- Kevin N Sheth
  - Department of Neurology, Yale School of Medicine, 15 York Street, LLCI Room 1004D, P.O. Box 20801, New Haven, CT, 06510, USA
- Thomas M Gill
  - Department of Internal Medicine, Yale School of Medicine, New Haven, CT, USA
- Hongyu Zhao
  - Program of Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA
  - Department of Biostatistics, Yale School of Public Health, 60 College Street, P.O. Box 208034, New Haven, CT, 06520, USA
- Guido J Falcone
  - Department of Neurology, Yale School of Medicine, 15 York Street, LLCI Room 1004D, P.O. Box 20801, New Haven, CT, 06510, USA
34. Shi FY, Wang Y, Huang D, Liang Y, Liang N, Chen XW, Gao G. Computational Assessment of the Expression-modulating Potential for Non-coding Variants. Genomics, Proteomics & Bioinformatics 2023; 21:662-673. [PMID: 34890839 PMCID: PMC10787178 DOI: 10.1016/j.gpb.2021.10.003] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/27/2021] [Revised: 10/13/2021] [Accepted: 11/01/2021] [Indexed: 06/13/2023]
Abstract
Large-scale genome-wide association studies (GWAS) and expression quantitative trait locus (eQTL) studies have identified multiple non-coding variants associated with genetic diseases through their effects on gene expression. However, pinpointing causal variants effectively and efficiently remains a serious challenge. Here, we developed CARMEN, a novel algorithm to identify functional non-coding expression-modulating variants. Multiple evaluations demonstrated CARMEN's superior performance over state-of-the-art tools. Applying CARMEN to GWAS and eQTL datasets further pinpointed several causal variants other than the reported lead single-nucleotide polymorphisms (SNPs). CARMEN scales well to massive datasets and is available online as a web server at http://carmen.gao-lab.org.
Affiliation(s)
- Fang-Yuan Shi
  - State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Biomedical Pioneering Innovative Center (BIOPIC) & Beijing Advanced Innovation Center for Genomics (ICG), Center for Bioinformatics (CBI), Peking University, Beijing 100871, China
- Yu Wang
  - State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Biomedical Pioneering Innovative Center (BIOPIC) & Beijing Advanced Innovation Center for Genomics (ICG), Center for Bioinformatics (CBI), Peking University, Beijing 100871, China
- Dong Huang
  - State Key Laboratory of Membrane Biology, Institute of Molecular Medicine, Peking University, Beijing 100871, China
- Yu Liang
  - Human Aging Research Institute, School of Life Science, Nanchang University, Nanchang 330031, China
- Nan Liang
  - State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Biomedical Pioneering Innovative Center (BIOPIC) & Beijing Advanced Innovation Center for Genomics (ICG), Center for Bioinformatics (CBI), Peking University, Beijing 100871, China
- Xiao-Wei Chen
  - State Key Laboratory of Membrane Biology, Institute of Molecular Medicine, Peking University, Beijing 100871, China
  - Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
- Ge Gao
  - State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Biomedical Pioneering Innovative Center (BIOPIC) & Beijing Advanced Innovation Center for Genomics (ICG), Center for Bioinformatics (CBI), Peking University, Beijing 100871, China
35. Wang Z, Zhao G, Li B, Fang Z, Chen Q, Wang X, Luo T, Wang Y, Zhou Q, Li K, Xia L, Zhang Y, Zhou X, Pan H, Zhao Y, Wang Y, Wang L, Guo J, Tang B, Xia K, Li J. Performance Comparison of Computational Methods for the Prediction of the Function and Pathogenicity of Non-coding Variants. Genomics, Proteomics & Bioinformatics 2023; 21:649-661. [PMID: 35272052 PMCID: PMC10787016 DOI: 10.1016/j.gpb.2022.02.002] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 03/26/2021] [Revised: 12/28/2021] [Accepted: 02/27/2022] [Indexed: 06/14/2023]
Abstract
Non-coding variants in the human genome significantly influence human traits and complex diseases via their regulation and modification effects. Hence, an increasing number of computational methods have been developed to predict the effects of variants in human non-coding sequences. However, it is difficult for inexperienced users to select appropriate methods from the dozens available. To address this issue, we assessed 12 performance metrics of 24 methods on four independent non-coding variant benchmark datasets: (1) rare germline variants from clinically relevant sequence variants (ClinVar), (2) rare somatic variants from the Catalogue Of Somatic Mutations In Cancer (COSMIC), (3) common regulatory variants from curated expression quantitative trait locus (eQTL) data, and (4) disease-associated common variants from curated genome-wide association studies (GWAS). All 24 tested methods performed differently under various conditions, indicating varying strengths and weaknesses under different scenarios. Importantly, the performance of existing methods was acceptable for rare germline variants from ClinVar, with an area under the receiver operating characteristic curve (AUROC) of 0.4481-0.8033, and poor for rare somatic variants from COSMIC (AUROC = 0.4984-0.7131), common regulatory variants from curated eQTL data (AUROC = 0.4837-0.6472), and disease-associated common variants from curated GWAS (AUROC = 0.4766-0.5188). We also compared the prediction performance of the 24 methods for non-coding de novo mutations in autism spectrum disorder and found that the combined annotation-dependent depletion (CADD) and context-dependent tolerance score (CDTS) methods performed better. In summary, we assessed the performance of 24 computational methods under diverse scenarios, providing preliminary advice for proper tool selection and guiding the development of new techniques for interpreting non-coding variants.
Affiliation(s)
- Zheng Wang
- National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha 410008, China
- Guihu Zhao
- National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha 410008, China; Department of Neurology, Xiangya Hospital, Central South University, Changsha 410008, China
- Bin Li
- National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha 410008, China; Department of Neurology, Xiangya Hospital, Central South University, Changsha 410008, China
- Zhenghuan Fang
- Centre for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha 410008, China
- Qian Chen
- National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha 410008, China
- Xiaomeng Wang
- Centre for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha 410008, China
- Tengfei Luo
- Centre for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha 410008, China
- Yijing Wang
- Centre for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha 410008, China
- Qiao Zhou
- National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha 410008, China
- Kuokuo Li
- Centre for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha 410008, China
- Lu Xia
- Centre for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha 410008, China
- Yi Zhang
- National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha 410008, China
- Xun Zhou
- National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha 410008, China
- Hongxu Pan
- Department of Neurology, Xiangya Hospital, Central South University, Changsha 410008, China
- Yuwen Zhao
- Department of Neurology, Xiangya Hospital, Central South University, Changsha 410008, China
- Yige Wang
- Department of Neurology, Xiangya Hospital, Central South University, Changsha 410008, China
- Lin Wang
- Centre for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha 410008, China; Reproductive Medicine Center, Xiangya Hospital, Central South University, Changsha 410008, China
- Jifeng Guo
- National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha 410008, China; Department of Neurology, Xiangya Hospital, Central South University, Changsha 410008, China
- Beisha Tang
- National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha 410008, China; Department of Neurology, Xiangya Hospital, Central South University, Changsha 410008, China
- Kun Xia
- Centre for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha 410008, China
- Jinchen Li
- National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha 410008, China; Department of Neurology, Xiangya Hospital, Central South University, Changsha 410008, China; Centre for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha 410008, China
36
Johnson EC, Kapoor M, Hatoum AS, Zhou H, Polimanti R, Wendt FR, Walters RK, Lai D, Kember RL, Hartz S, Meyers JL, Peterson RE, Ripke S, Bigdeli TB, Fanous AH, Pato CN, Pato MT, Goate AM, Kranzler HR, O'Donovan MC, Walters JTR, Gelernter J, Edenberg HJ, Agrawal A. Investigation of convergent and divergent genetic influences underlying schizophrenia and alcohol use disorder. Psychol Med 2023; 53:1196-1204. [PMID: 34231451 PMCID: PMC8738774 DOI: 10.1017/s003329172100266x] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 01/09/2023]
Abstract
BACKGROUND Alcohol use disorder (AUD) and schizophrenia (SCZ) frequently co-occur, and large-scale genome-wide association studies (GWAS) have identified significant genetic correlations between these disorders. METHODS We used the largest published GWAS for AUD (total cases = 77,822) and SCZ (total cases = 46,827) to identify genetic variants that influence both disorders (with either the same or opposite direction of effect) and those that are disorder specific. RESULTS We identified 55 independent genome-wide significant single nucleotide polymorphisms with the same direction of effect on AUD and SCZ, 8 with robust effects in opposite directions, and 98 with disorder-specific effects. We also found evidence for 12 genes whose pleiotropic associations with AUD and SCZ are consistent with mediation via gene expression in the prefrontal cortex. The genetic covariance between AUD and SCZ was concentrated in genomic regions functional in brain tissues (p = 0.001). CONCLUSIONS Our findings provide further evidence that SCZ shares meaningful genetic overlap with AUD.
Affiliation(s)
- Emma C Johnson
- Department of Psychiatry, Washington University School of Medicine, Saint Louis, MO, USA
- Manav Kapoor
- Department of Neuroscience, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Alexander S Hatoum
- Department of Psychiatry, Washington University School of Medicine, Saint Louis, MO, USA
- Hang Zhou
- Department of Psychiatry, Division of Human Genetics, Yale University School of Medicine, New Haven, CT, USA
- Department of Psychiatry, Veterans Affairs Connecticut Healthcare System, West Haven, CT, USA
- Renato Polimanti
- Department of Psychiatry, Division of Human Genetics, Yale University School of Medicine, New Haven, CT, USA
- Department of Psychiatry, Veterans Affairs Connecticut Healthcare System, West Haven, CT, USA
- Frank R Wendt
- Department of Psychiatry, Division of Human Genetics, Yale University School of Medicine, New Haven, CT, USA
- Department of Psychiatry, Veterans Affairs Connecticut Healthcare System, West Haven, CT, USA
- Raymond K Walters
- Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Dongbing Lai
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN, USA
- Rachel L Kember
- Department of Psychiatry, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- VISN 4 MIRECC, Crescenz VAMC, Philadelphia, PA, USA
- Sarah Hartz
- Department of Psychiatry, Washington University School of Medicine, Saint Louis, MO, USA
- Jacquelyn L Meyers
- Department of Psychiatry and Behavioral Sciences, SUNY Downstate Health Sciences University, Brooklyn, NY, USA
- Henri Begleiter Neurodynamics Laboratory, SUNY Downstate Health Sciences University, Brooklyn, NY, USA
- Roseann E Peterson
- Department of Psychiatry, Virginia Institute for Psychiatric and Behavioral Genetics, Virginia Commonwealth University, Richmond, VA, USA
- Stephan Ripke
- Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Psychiatry and Psychotherapy, Charité - Universitätsmedizin Berlin, Campus Mitte, Berlin, Germany
- Tim B Bigdeli
- Department of Psychiatry and Behavioral Sciences, SUNY Downstate Health Sciences University, Brooklyn, NY, USA
- Ayman H Fanous
- Department of Psychiatry and Behavioral Sciences, SUNY Downstate Health Sciences University, Brooklyn, NY, USA
- Carlos N Pato
- Department of Psychiatry and Behavioral Sciences, SUNY Downstate Health Sciences University, Brooklyn, NY, USA
- Michele T Pato
- Department of Psychiatry and Behavioral Sciences, SUNY Downstate Health Sciences University, Brooklyn, NY, USA
- Alison M Goate
- Department of Neuroscience, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Henry R Kranzler
- Department of Psychiatry, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- VISN 4 MIRECC, Crescenz VAMC, Philadelphia, PA, USA
- Michael C O'Donovan
- Division of Psychological Medicine and Clinical Neurosciences, MRC Centre for Neuropsychiatric Genetics and Genomics, Cardiff University School of Medicine, Cardiff, UK
- James T R Walters
- Division of Psychological Medicine and Clinical Neurosciences, MRC Centre for Neuropsychiatric Genetics and Genomics, Cardiff University School of Medicine, Cardiff, UK
- Joel Gelernter
- Department of Psychiatry, Division of Human Genetics, Yale University School of Medicine, New Haven, CT, USA
- Department of Psychiatry, Veterans Affairs Connecticut Healthcare System, West Haven, CT, USA
- Department of Genetics, Yale University School of Medicine, New Haven, CT, USA
- Department of Neuroscience, Yale University School of Medicine, New Haven, CT, USA
- Howard J Edenberg
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN, USA
- Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, Indianapolis, IN, USA
- Arpana Agrawal
- Department of Psychiatry, Washington University School of Medicine, Saint Louis, MO, USA
37
Zhang J, Zhao H. eQTL Studies: from Bulk Tissues to Single Cells. arXiv 2023: arXiv:2302.11662v1. [PMID: 36866231 PMCID: PMC9980190] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/24/2023]
Abstract
An expression quantitative trait locus (eQTL) is a chromosomal region where genetic variants are associated with the expression levels of certain genes, which can be either nearby or distant. The identification of eQTLs in different tissues, cell types, and contexts has led to a better understanding of the dynamic regulation of gene expression and has implicated functional genes and variants in complex traits and diseases. Although most eQTL studies to date have been performed on data collected from bulk tissues, recent studies have demonstrated the importance of cell-type-specific and context-dependent gene regulation in biological processes and disease mechanisms. In this review, we discuss statistical methods that have been developed to detect cell-type-specific and context-dependent eQTLs from bulk tissues, purified cell types, and single cells. We also discuss the limitations of current methods and future research opportunities.
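In its simplest form, an eQTL test regresses one gene's expression on the allele dosage (0, 1 or 2 copies of the alternative allele) at one variant and asks whether the slope differs from zero. The following is a minimal ordinary-least-squares sketch of that basic model only. It assumes no covariates (real pipelines typically add ancestry components, hidden factors and multiple-testing correction), the function name is hypothetical, and the data are invented. The cell-type-specific and context-dependent methods the review discusses generally build on models of this kind rather than replace them.

```python
import math

def eqtl_ols(dosage, expression):
    """Fit expression = a + b * dosage by ordinary least squares.

    dosage: alternative-allele counts (0, 1 or 2) per individual
    expression: normalized expression of one gene in the same individuals
    Returns (slope b, t statistic for b). No covariates; needs n > 2.
    """
    n = len(dosage)
    if n != len(expression) or n <= 2:
        raise ValueError("need matching inputs with more than two samples")
    mx = sum(dosage) / n
    my = sum(expression) / n
    sxx = sum((x - mx) ** 2 for x in dosage)
    if sxx == 0:
        raise ValueError("monomorphic genotype: effect is not estimable")
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosage, expression))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual sum of squares gives the standard error of the slope.
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(dosage, expression))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0:  # perfect fit: the t statistic is unbounded
        return slope, math.copysign(math.inf, slope) if slope else 0.0
    return slope, slope / se

# Hypothetical data: six individuals, expression rising with allele dosage.
b, t = eqtl_ols([0, 0, 1, 1, 2, 2], [1.0, 1.2, 2.0, 2.2, 3.0, 3.2])
print(f"slope={b:.2f}, t={t:.1f}")
```

Here each extra copy of the alternative allele adds about one expression unit; in practice the t statistic would be converted to a p-value with n - 2 degrees of freedom.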
Affiliation(s)
- Jingfei Zhang
- Information Systems and Operations Management, Emory University
- Hongyu Zhao
- Department of Biostatistics, Yale University
38
Molecular Dynamic Simulation Analysis of a Novel Missense Variant in CYB5R3 Gene in Patients with Methemoglobinemia. Medicina (Kaunas) 2023; 59:medicina59020379. [PMID: 36837579 PMCID: PMC9967277 DOI: 10.3390/medicina59020379] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2022] [Revised: 02/13/2023] [Accepted: 02/14/2023] [Indexed: 02/18/2023]
Abstract
Background and Objective: Mutations in the CYB5R3 gene reduce the function of the NADH-dependent cytochrome b5 reductase enzyme and consequently lead to recessive congenital methemoglobinemia (RCM). RCM occurs as RCM type I (RCM1) and RCM type II (RCM2). In RCM1, elevated methemoglobin levels cause cyanosis only, whereas in RCM2, neurological complications are present along with cyanosis. Materials and Methods: In the current study, a consanguineous Pakistani family with three individuals showing clinical manifestations of cyanosis, chest pain radiating to the left arm, dyspnea, orthopnea, and hemoptysis was studied. Following clinical assessment, the causative gene was sought using whole exome sequencing (WES) and Sanger sequencing. Various variant effect prediction tools and ACMG criteria were applied to interpret the pathogenicity of the prioritized variants. Molecular dynamics simulations of the wild-type and mutant systems were performed to determine the stability of the mutant CYB5R3 protein. Results: WES data analysis revealed a novel homozygous missense variant, NM_001171660.2:c.670A>T (NP_001165131.1:p.(Ile224Phe)), in exon 8 of the CYB5R3 gene, located on chromosome 22q13.2. Sanger sequencing confirmed segregation of the identified variant with the disease phenotype within the family. Bioinformatics prediction tools and ACMG guidelines classified p.(Ile224Phe) as disease-causing and likely pathogenic, respectively. The molecular dynamics study showed that p.(Ile224Phe) lies in the NADH domain of CYB5R3, whose aberrant function is detrimental. Conclusions: The present study expands the variant spectrum of the CYB5R3 gene, which will facilitate genetic counselling of this family and of other families carrying mutations in CYB5R3.
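The c. and p. descriptions above are linked by simple arithmetic. In HGVS coding-DNA numbering, c.1 is the A of the ATG start codon, so nucleotide c.N falls in codon (N - 1) // 3 + 1, at position (N - 1) % 3 + 1 within that codon. The following is a minimal sketch applying this to c.670A>T. The helper names and the partial codon table are illustrative assumptions. The sketch does not read the actual NM_001171660.2 sequence: it simply tries all three isoleucine codons as the possible reference.

```python
# Partial standard genetic code: only the codons needed for this example.
CODON_TABLE = {
    "ATT": "Ile", "ATC": "Ile", "ATA": "Ile",
    "TTT": "Phe", "TTC": "Phe",
    "TTA": "Leu",
}

def locate_codon(c_pos):
    """Map a coding-DNA position c.N (N >= 1) to (codon number, position in codon)."""
    if c_pos < 1:
        raise ValueError("coding-sequence positions start at c.1")
    return (c_pos - 1) // 3 + 1, (c_pos - 1) % 3 + 1

def apply_snv(codon, pos_in_codon, ref, alt):
    """Replace ref with alt at 1-based pos_in_codon, checking the reference base."""
    i = pos_in_codon - 1
    if codon[i] != ref:
        raise ValueError(f"reference mismatch: expected {ref}, found {codon[i]}")
    return codon[:i] + alt + codon[i + 1:]

codon_no, pos = locate_codon(670)           # c.670A>T
print(codon_no, pos)                        # 224 1
for ref_codon in ("ATT", "ATC", "ATA"):     # the three Ile codons
    mut = apply_snv(ref_codon, pos, "A", "T")
    print(ref_codon, "->", mut, CODON_TABLE[ref_codon], "->", CODON_TABLE[mut])
```

Only ATT and ATC become phenylalanine codons after an A>T change at the first position (ATA would give Leu). The reported p.(Ile224Phe) is therefore consistent with codon 224 being ATT or ATC.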
39
Li RY, Huang Y, Zhao Z, Qin ZS. Comprehensive 100-bp resolution genome-wide epigenomic profiling data for the hg38 human reference genome. Data Brief 2023; 46:108827. [PMID: 36582986 PMCID: PMC9792340 DOI: 10.1016/j.dib.2022.108827] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2022] [Revised: 11/21/2022] [Accepted: 12/09/2022] [Indexed: 12/15/2022]
Abstract
This manuscript presents a comprehensive collection of diverse epigenomic profiling data for the human genome at 100-bp resolution with full genome-wide coverage. The datasets were processed from raw read-count data generated by five types of sequencing-based assays collected by the Encyclopedia of DNA Elements consortium (ENCODE, http://www.encodeproject.org). Data from these high-throughput sequencing assays were processed into a total of 6,305 genome-wide profiles. To ensure feature quality, we filtered out assays with low read depth, inconsistent read counts, or poor data quality. The assay types include DNase-seq, histone and TF ChIP-seq, ATAC-seq, and Poly(A) RNA-seq. Processed data were merged by averaging read counts across technical replicates to obtain signals in about 30 million predefined 100-bp bins that tile the entire genome. We provide an example of fetching read counts using disease-related risk variants from the GWAS Catalog. Additionally, we have created a tabix index that enables fast retrieval of read counts for given coordinates in the human genome. The data processing pipeline is replicable for users' own purposes and for other experimental assays. The processed data can be found on Zenodo at https://zenodo.org/record/7015783. These data can be used as features for statistical and machine learning models to predict or infer a wide range of variables of biological interest. They can also be applied to generate novel insights into gene expression, chromatin accessibility, and epigenetic modifications across the human genome. Finally, the processing pipeline can easily be applied to data from any other genome-wide profiling assay, expanding the amount of available data.
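Because the bins have a fixed 100-bp width, looking up the bin for a variant is integer arithmetic. The following is a minimal sketch that assumes per-chromosome, 0-based, half-open bins, where bin k covers [100k, 100k + 100). This is an assumption about the layout, not the documented format of the Zenodo files. Positions from sources such as the GWAS Catalog or VCF are 1-based, so they are shifted by one first. The helper names are hypothetical. The hg38 chr1 length (248,956,422 bp) shows the per-chromosome bin count. Summed over all hg38 chromosomes (about 3.1 Gb), such counts give roughly the 30 million bins quoted above.

```python
BIN_SIZE = 100  # bp, the bin width described in the abstract

def bin_index_0based(pos0):
    """Bin holding a 0-based coordinate (BED-style)."""
    return pos0 // BIN_SIZE

def bin_index_1based(pos1):
    """Bin holding a 1-based coordinate (VCF / GWAS Catalog style)."""
    return (pos1 - 1) // BIN_SIZE

def bin_bounds(k):
    """0-based half-open interval [start, end) covered by bin k."""
    return k * BIN_SIZE, (k + 1) * BIN_SIZE

def n_bins(chrom_length):
    """Bins needed to tile a chromosome; the last bin may be partial."""
    return (chrom_length + BIN_SIZE - 1) // BIN_SIZE  # integer ceiling

print(n_bins(248_956_422))    # hg38 chr1 -> 2489565
print(bin_index_1based(100))  # last base of bin 0 -> 0
print(bin_index_1based(101))  # first base of bin 1 -> 1
print(bin_bounds(1))          # (100, 200)
```

Mixing up 0-based and 1-based conventions shifts a variant into the neighbouring bin only when it sits on a bin boundary, which is exactly the kind of off-by-one error that is easy to miss.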
Affiliation(s)
- Ronnie Y. Li
- Graduate program in Neuroscience, Emory University, United States
- Yanting Huang
- Department of Computer Science, Emory University, United States
- Zhiyue Zhao
- Department of Computer Science, Emory University, United States
- Zhaohui S. Qin
- Department of Biostatistics and Bioinformatics, Emory University, United States
40
Garcia FADO, de Andrade ES, Palmero EI. Insights on variant analysis in silico tools for pathogenicity prediction. Front Genet 2022; 13:1010327. [PMID: 36568376 PMCID: PMC9774026 DOI: 10.3389/fgene.2022.1010327] [Citation(s) in RCA: 40] [Impact Index Per Article: 13.3] [Received: 08/02/2022] [Accepted: 11/14/2022] [Indexed: 12/03/2022]
Abstract
Molecular biology is currently a fast-advancing science. Sequencing techniques are getting cheaper, but the interpretation of genetic variants requires expertise and computational power and therefore remains a challenge. Next-generation sequencing yields thousands of variants, and to classify them, researchers have proposed protocols with several parameters. Here we review several in silico pathogenicity prediction tools used in the variant prioritization/classification process of some international variant analysis protocols, together with studies evaluating their efficiency.
Affiliation(s)
- Edenir Inez Palmero (corresponding author)
- Molecular Oncology Research Center, Barretos Cancer Hospital, Barretos, Brazil; National Institute of Cancer, Rio de Janeiro, Brazil
41
He Z, Liu L, Belloy ME, Le Guen Y, Sossin A, Liu X, Qi X, Ma S, Gyawali PK, Wyss-Coray T, Tang H, Sabatti C, Candès E, Greicius MD, Ionita-Laza I. GhostKnockoff inference empowers identification of putative causal variants in genome-wide association studies. Nat Commun 2022; 13:7209. [PMID: 36418338 PMCID: PMC9684164 DOI: 10.1038/s41467-022-34932-z] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 11/26/2021] [Accepted: 11/09/2022] [Indexed: 11/27/2022]
Abstract
Recent advances in genome sequencing and imputation technologies provide an exciting opportunity to comprehensively study the contribution of genetic variants to complex phenotypes. However, our ability to translate genetic discoveries into mechanistic insights remains limited. In this paper, we propose an efficient knockoff-based method for genome-wide association studies (GWAS), GhostKnockoff, which improves power and the ability to prioritize putative causal variants relative to conventional GWAS approaches. The method requires only Z-scores from conventional GWAS and hence can easily be applied to enhance existing and future studies. The method can also be applied to meta-analysis of multiple GWAS, allowing for arbitrary sample overlap. We demonstrate its performance using empirical simulations and two applications: (1) a meta-analysis for Alzheimer's disease comprising nine overlapping large-scale GWAS, whole-exome and whole-genome sequencing studies and (2) analysis of 1403 binary phenotypes from UK Biobank data on 408,961 samples of European ancestry. Our results demonstrate that GhostKnockoff can identify putatively functional variants with weaker statistical effects that are missed by conventional association tests.
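Knockoff-based methods end with a selection step. Each variant j gets a statistic W_j that is large and positive when the original variant looks more important than its knockoff copy. A data-dependent threshold then controls the false discovery rate (FDR) at a target level q. The following is a minimal sketch of the generic single-knockoff "knockoff+" threshold of Barber and Candès: the smallest t among the nonzero |W_j| with (1 + #{W_j <= -t}) / max(1, #{W_j >= t}) <= q. It does not reproduce GhostKnockoff itself. Generating knockoff Z-scores from GWAS summary statistics is the paper's contribution and is not shown here, and GhostKnockoff's own statistic and threshold may differ, for example when several knockoffs per variant are used. The W values below are hypothetical.

```python
import math

def knockoff_plus_threshold(W, q):
    """Smallest t among the nonzero |W_j| such that
    (1 + #{j: W_j <= -t}) / max(1, #{j: W_j >= t}) <= q; inf if none qualifies."""
    for t in sorted({abs(w) for w in W if w != 0}):
        fdp_estimate = (1 + sum(w <= -t for w in W)) / max(1, sum(w >= t for w in W))
        if fdp_estimate <= q:
            return t
    return math.inf

def knockoff_select(W, q):
    """Indices of variants whose statistic clears the knockoff+ threshold."""
    t = knockoff_plus_threshold(W, q)
    return [j for j, w in enumerate(W) if w >= t]

# Hypothetical statistics for ten variants; positive = original beats knockoff.
W = [5.0, 4.0, 3.5, 3.0, 2.5, 2.0, 1.5, -1.0, 0.5, -0.2]
print(knockoff_plus_threshold(W, q=0.2))  # 1.5
print(knockoff_select(W, q=0.2))          # [0, 1, 2, 3, 4, 5, 6]
```

The "+1" in the numerator gives knockoff+ its FDR guarantee. It also means that at least 1/q variants must pass before anything is selected (five at q = 0.2 here), so very small sets of strong signals can go unreported.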
Affiliation(s)
- Zihuai He
- Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, 94305, USA
- Quantitative Sciences Unit, Department of Medicine, Stanford University, Stanford, CA, 94305, USA
- Linxi Liu
- Department of Statistics, University of Pittsburgh, Pittsburgh, PA, 15260, USA
- Michael E Belloy
- Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, 94305, USA
- Yann Le Guen
- Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, 94305, USA
- Institut du Cerveau - Paris Brain Institute - ICM, Paris, 75013, France
- Aaron Sossin
- Department of Biomedical Data Science, Stanford University, Stanford, CA, 94305, USA
- Xiaoxia Liu
- Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, 94305, USA
- Xinran Qi
- Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, 94305, USA
- Shiyang Ma
- Department of Biostatistics, Columbia University, New York, NY, 10032, USA
- Prashnna K Gyawali
- Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, 94305, USA
- Tony Wyss-Coray
- Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, 94305, USA
- Hua Tang
- Department of Genetics, Stanford University, Stanford, CA, 94305, USA
- Chiara Sabatti
- Department of Biomedical Data Science, Stanford University, Stanford, CA, 94305, USA
- Emmanuel Candès
- Department of Statistics, Stanford University, Stanford, CA, 94305, USA
- Department of Mathematics, Stanford University, Stanford, CA, 94305, USA
- Michael D Greicius
- Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, 94305, USA
42
Van de Sompele S, Small KW, Cicekdal MB, López Soriano V, D'haene E, Shaya FS, Agemy S, Van der Snickt T, Dueñas Rey A, Rosseel T, Van Heetvelde M, Vergult S, Balikova I, Bergen AA, Boon CJF, De Zaeytijd J, Inglehearn CF, Kousal B, Leroy BP, Rivolta C, Vaclavik V, van den Ende J, van Schooneveld MJ, Gómez-Skarmeta JL, Tena JJ, Martinez-Morales JR, Liskova P, Vleminckx K, De Baere E. Multi-omics approach dissects cis-regulatory mechanisms underlying North Carolina macular dystrophy, a retinal enhanceropathy. Am J Hum Genet 2022; 109:2029-2048. [PMID: 36243009 PMCID: PMC9674966 DOI: 10.1016/j.ajhg.2022.09.013] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 05/26/2022] [Accepted: 09/28/2022] [Indexed: 01/26/2023]
Abstract
North Carolina macular dystrophy (NCMD) is a rare autosomal-dominant disease affecting macular development. The disease is caused by non-coding single-nucleotide variants (SNVs) in two hotspot regions near PRDM13 and by duplications in two distinct chromosomal loci, overlapping DNase I hypersensitive sites near either PRDM13 or IRX1. To unravel the mechanisms by which these variants cause disease, we first established a genome-wide multi-omics retinal database, RegRet. Integration of UMI-4C profiles that we generated on adult human retina then allowed fine-mapping of the interactions of the PRDM13 and IRX1 promoters and the identification of eighteen candidate cis-regulatory elements (cCREs), whose activity was investigated by luciferase and Xenopus enhancer assays. Next, luciferase assays showed that the non-coding SNVs located in the two PRDM13 hotspot regions affect cCRE activity, including two NCMD-associated non-coding SNVs that we identified herein. Interestingly, the cCRE containing one of these SNVs was shown to interact with the PRDM13 promoter, demonstrated in vivo activity in Xenopus, and is active at the developmental stage when progenitor cells of the central retina exit mitosis, suggesting that this region is a PRDM13 enhancer. Finally, mining of single-cell transcriptional data from embryonic and adult retina revealed the highest expression of PRDM13 and IRX1 when amacrine cells start to synapse with retinal ganglion cells, supporting the hypothesis that altered PRDM13 or IRX1 expression impairs interactions between these cells during retinogenesis. Overall, this study provides insight into the cis-regulatory mechanisms of NCMD and supports the view that this condition is a retinal enhanceropathy.
Affiliation(s)
- Stijn Van de Sompele
- Department of Biomolecular Medicine, Ghent University, Ghent, Belgium; Center for Medical Genetics, Ghent University Hospital, Ghent, Belgium
- Kent W Small
- Macula and Retina Institute, Los Angeles and Glendale, California, USA
- Munevver Burcu Cicekdal
- Department of Biomolecular Medicine, Ghent University, Ghent, Belgium; Center for Medical Genetics, Ghent University Hospital, Ghent, Belgium; Department of Biomedical Molecular Biology, Ghent University, Ghent, Belgium
- Víctor López Soriano
- Department of Biomolecular Medicine, Ghent University, Ghent, Belgium; Center for Medical Genetics, Ghent University Hospital, Ghent, Belgium
- Eva D'haene
- Department of Biomolecular Medicine, Ghent University, Ghent, Belgium; Center for Medical Genetics, Ghent University Hospital, Ghent, Belgium
- Fadi S Shaya
- Macula and Retina Institute, Los Angeles and Glendale, California, USA
- Steven Agemy
- Department of Ophthalmology, SUNY Downstate Medical Center University, Brooklyn, New York, USA
- Thijs Van der Snickt
- Department of Biomolecular Medicine, Ghent University, Ghent, Belgium; Center for Medical Genetics, Ghent University Hospital, Ghent, Belgium
- Alfredo Dueñas Rey
- Department of Biomolecular Medicine, Ghent University, Ghent, Belgium; Center for Medical Genetics, Ghent University Hospital, Ghent, Belgium
- Toon Rosseel
- Department of Biomolecular Medicine, Ghent University, Ghent, Belgium; Center for Medical Genetics, Ghent University Hospital, Ghent, Belgium
- Mattias Van Heetvelde
- Department of Biomolecular Medicine, Ghent University, Ghent, Belgium; Center for Medical Genetics, Ghent University Hospital, Ghent, Belgium
- Sarah Vergult
- Department of Biomolecular Medicine, Ghent University, Ghent, Belgium; Center for Medical Genetics, Ghent University Hospital, Ghent, Belgium
- Irina Balikova
- Department of Ophthalmology, University Hospitals Leuven, Leuven, Belgium
- Arthur A Bergen
- Department of Human Genetics, Amsterdam UMC, Academic Medical Center, 1105 AZ Amsterdam, The Netherlands; Queen Emma Centre of Precision Medicine, Amsterdam University Medical Centre, University of Amsterdam, Amsterdam, The Netherlands
- Camiel J F Boon
- Department of Ophthalmology, Amsterdam University Medical Centers, University of Amsterdam, Amsterdam, The Netherlands; Department of Ophthalmology, Leiden University Medical Center, Leiden, The Netherlands
- Julie De Zaeytijd
- Department of Ophthalmology, Ghent University Hospital, Ghent, Belgium
- Chris F Inglehearn
- Division of Molecular Medicine, Leeds Institute of Medical Research, University of Leeds, Leeds, UK
- Bohdan Kousal
- Department of Ophthalmology, First Faculty of Medicine, Charles University and General University Hospital in Prague, Prague, Czech Republic
- Bart P Leroy
- Center for Medical Genetics, Ghent University Hospital, Ghent, Belgium; Department of Ophthalmology, Ghent University Hospital, Ghent, Belgium; Department of Head & Skin, Ghent University, Ghent, Belgium; Division of Ophthalmology & Center for Cellular & Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Carlo Rivolta
- Institute of Molecular and Clinical Ophthalmology Basel (IOB), Basel, Switzerland; Department of Ophthalmology, University of Basel, Basel, Switzerland; Department of Genetics and Genome Biology, University of Leicester, Leicester, UK
- Veronika Vaclavik
- University of Lausanne, Jules-Gonin Eye Hospital, Lausanne, Switzerland
- Mary J van Schooneveld
- Department of Ophthalmology, Amsterdam University Medical Centers, University of Amsterdam, Amsterdam, The Netherlands; Bartiméus, Diagnostic Center for Complex Visual Disorders, Zeist, The Netherlands
- José Luis Gómez-Skarmeta
- Centro Andaluz de Biología del Desarrollo, Consejo Superior de Investigaciones Científicas and Universidad Pablo de Olavide, Sevilla, Spain
- Juan J Tena
- Centro Andaluz de Biología del Desarrollo, Consejo Superior de Investigaciones Científicas and Universidad Pablo de Olavide, Sevilla, Spain
- Juan R Martinez-Morales
- Centro Andaluz de Biología del Desarrollo, Consejo Superior de Investigaciones Científicas and Universidad Pablo de Olavide, Sevilla, Spain
- Petra Liskova
- Department of Ophthalmology, First Faculty of Medicine, Charles University and General University Hospital in Prague, Prague, Czech Republic; Department of Paediatrics and Inherited Metabolic Disorders, First Faculty of Medicine, Charles University and General University Hospital in Prague, Prague, Czech Republic
- Kris Vleminckx
- Center for Medical Genetics, Ghent University Hospital, Ghent, Belgium; Department of Biomedical Molecular Biology, Ghent University, Ghent, Belgium
- Elfride De Baere
- Department of Biomolecular Medicine, Ghent University, Ghent, Belgium; Center for Medical Genetics, Ghent University Hospital, Ghent, Belgium
43
Exploration of Tools for the Interpretation of Human Non-Coding Variants. Int J Mol Sci 2022; 23:ijms232112977. [PMID: 36361767 PMCID: PMC9654743 DOI: 10.3390/ijms232112977] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 09/23/2022] [Revised: 10/17/2022] [Accepted: 10/23/2022] [Indexed: 02/01/2023]
Abstract
The advent of Whole Genome Sequencing (WGS) broadened the range of detectable genetic variation, revealing variants even in non-coding regions of the genome that would have been missed by targeted approaches. One of the most challenging issues in WGS analysis is the interpretation of annotated variants. This review focuses on tools suitable for the functional annotation of variants falling in non-coding regions. It pairs a description of non-coding genomic regions with the results and performance of existing tools for the functional interpretation of variant effects in these regions. The tools were tested in a controlled genomic scenario that represented the ground truth, allowing us to determine software performance.
44
Li C, Zhi D, Wang K, Liu X. MetaRNN: differentiating rare pathogenic and rare benign missense SNVs and InDels using deep learning. Genome Med 2022; 14:115. [PMID: 36209109 PMCID: PMC9548151 DOI: 10.1186/s13073-022-01120-z] [Citation(s) in RCA: 64] [Impact Index Per Article: 21.3] [Received: 11/24/2021] [Accepted: 09/22/2022] [Indexed: 11/22/2022]
Abstract
Multiple computational approaches have been developed to improve our understanding of genetic variants. However, their ability to distinguish rare pathogenic variants from rare benign ones remains limited. Using context annotations and deep learning methods, we present pathogenicity prediction models, MetaRNN and MetaRNN-indel, to help identify and prioritize rare nonsynonymous single nucleotide variants (nsSNVs) and non-frameshift insertions/deletions (nfINDELs). Using independent test sets, we demonstrate that these new models outperform state-of-the-art competitors and achieve a more interpretable score distribution. Importantly, the prediction scores of the two models are comparable, enabling easy adoption of integrated genotype-phenotype association analysis methods. All pre-computed nsSNV scores are available at http://www.liulab.science/MetaRNN. The stand-alone program is also available at https://github.com/Chang-Li2019/MetaRNN.
Affiliation(s)
- Chang Li
- USF Genomics & College of Public Health, University of South Florida, 3720 Spectrum Boulevard, Suite 304, Tampa, FL 33612 USA
- Degui Zhi
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX USA
- Kai Wang
- Children’s Hospital of Philadelphia & Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA USA
- Xiaoming Liu
- USF Genomics & College of Public Health, University of South Florida, 3720 Spectrum Boulevard, Suite 304, Tampa, FL 33612 USA
45
Katsonis P, Wilhelm K, Williams A, Lichtarge O. Genome interpretation using in silico predictors of variant impact. Hum Genet 2022; 141:1549-1577. [PMID: 35488922 PMCID: PMC9055222 DOI: 10.1007/s00439-022-02457-6] [Citation(s) in RCA: 41] [Impact Index Per Article: 13.7] [Received: 08/02/2021] [Accepted: 04/17/2022] [Indexed: 02/06/2023]
Abstract
Estimating the effects of variants found in disease driver genes opens the door to personalized therapeutic opportunities. Clinical associations and laboratory experiments can only characterize a tiny fraction of all the available variants, leaving the majority as variants of unknown significance (VUS). In silico methods bridge this gap by providing instant estimates on a large scale, most often based on the numerous genetic differences between species. Despite concerns that these methods may lack reliability in individual subjects, their numerous practical applications over cohorts suggest they are already helpful and have a role to play in genome interpretation when used at the proper scale and context. In this review, we aim to gain insights into the training and validation of these variant effect predicting methods and illustrate representative types of experimental and clinical applications. Objective performance assessments using various datasets that are not yet published indicate the strengths and limitations of each method. These show that cautious use of in silico variant impact predictors is essential for addressing genome interpretation challenges.
Affiliation(s)
- Panagiotis Katsonis
- Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
- Kevin Wilhelm
- Graduate School of Biomedical Sciences, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
- Amanda Williams
- Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
- Olivier Lichtarge
- Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
- Department of Biochemistry, Human Genetics and Molecular Biology, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
- Department of Pharmacology, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
- Computational and Integrative Biomedical Research Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
46
Huang YS, Hsu C, Chune YC, Liao IC, Wang H, Lin YL, Hwu WL, Lee NC, Lai F. Diagnosis of a Single-Nucleotide Variant in Whole-Exome Sequencing Data for Patients With Inherited Diseases: Machine Learning Study Using Artificial Intelligence Variant Prioritization. JMIR Bioinformatics and Biotechnology 2022; 3:e37701. [PMID: 38935959 PMCID: PMC11168239 DOI: 10.2196/37701] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/04/2022] [Revised: 07/29/2022] [Accepted: 08/22/2022] [Indexed: 06/29/2024]
Abstract
BACKGROUND In recent years, rapid advances in next-generation sequencing (NGS) technology have made it possible to sequence an entire human genome in a short period. As a result, NGS is now widely used in clinical diagnostic practice, especially for diagnosing hereditary disorders. Although exome single-nucleotide variant (SNV) data can be generated with these approaches, processing a patient's DNA sequence data requires multiple tools and complex bioinformatics pipelines. OBJECTIVE This study aims to help physicians automatically and quickly interpret the genetic variation information generated by NGS. Currently, to determine the true causal variants in a patient with a genetic disease, physicians often need to review numerous features of every variant manually and search the literature in different databases to understand the effect of each genetic variant. METHODS We constructed a machine learning model for predicting disease-causing variants in exome data. We collected whole-exome sequencing (WES) and gene panel sequencing data as a training set and then integrated variant annotations from multiple genetic databases for model training. The resulting model ranks SNVs and outputs the most likely disease-causing candidates. For model testing, we collected WES data from 108 patients with rare genetic disorders at National Taiwan University Hospital. We fed the sequencing data, together with phenotypic information automatically extracted from the patients' electronic medical records by a keyword extraction tool, into our machine learning model. RESULTS We located 92.5% (124/134) of the causative variants within the top 10 of the ranking list, out of an average of 741 candidate variants per person after filtering. AI Variant Prioritizer ranked the target gene first for about 61.1% (66/108) of the patients, followed by Variant Prioritizer, which did so for 44.4% (48/108) of the patients. The cumulative rank results revealed that AI Variant Prioritizer has the highest accuracy at ranks 1, 5, 10, and 20, showing that it performs better than the other tools. After Human Phenotype Ontology (HPO) terms were adopted by looking them up in the databases, the top-10 hit rate increased to 93.5% (101/108). CONCLUSIONS We successfully used WES sequencing data and free-text phenotypic information, automatically extracted from patients' records by the keyword extraction tool, for model training and testing. By interpreting the model, we identified which variant features are important. In addition, we achieved a satisfactory result in finding the target variant in our testing data set; after HPO terms were adopted by database lookup, the top-10 hit rate increased to 93.5% (101/108). The performance of the model is similar to that of manual analysis, and it has been used to help National Taiwan University Hospital with genetic diagnosis.
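The cumulative rank evaluation above reduces to a top-k hit rate: the share of patients whose causal variant appears at or above rank k in the prioritized list. The sketch below assumes each patient's outcome is recorded as the 1-based rank of the causal variant, or None if the variant was filtered out; the function name `top_k_hit_rates` and the example ranks are hypothetical and not taken from the study's code.

```python
from typing import Dict, Optional, Sequence


def top_k_hit_rates(ranks: Sequence[Optional[int]],
                    ks: Sequence[int] = (1, 5, 10, 20)) -> Dict[int, float]:
    """Share of patients whose causal variant is ranked within the top k.

    ranks[i] is the 1-based rank of patient i's causal variant in the
    prioritized candidate list, or None if the variant was filtered out
    (such patients count as misses at every k).
    """
    if not ranks:
        raise ValueError("need at least one patient")
    rates: Dict[int, float] = {}
    for k in ks:
        # A hit at k is any ranked causal variant at position <= k.
        hits = sum(1 for r in ranks if r is not None and r <= k)
        rates[k] = hits / len(ranks)
    return rates
```

For the hypothetical ranks [1, 3, 1, 12, None, 7, 2, 25], this gives 0.25, 0.5, 0.625 and 0.75 at k = 1, 5, 10 and 20. Because a hit at k is also a hit at every larger k, the rates are cumulative by construction, which is why accuracy is reported at several cut-offs. The same arithmetic underlies the reported figures, e.g. 124/134 ≈ 92.5%.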
Affiliation(s)
- Yu-Shan Huang
- Department of Computer Science and Information Engineering, National Taiwan University, Taipei City, Taiwan
- Ching Hsu
- Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taipei City, Taiwan
- Yu-Chang Chune
- Department of Computer Science and Information Engineering, National Taiwan University, Taipei City, Taiwan
- I-Cheng Liao
- Department of Computer Science and Information Engineering, National Taiwan University, Taipei City, Taiwan
- Hsin Wang
- Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taipei City, Taiwan
- Yi-Lin Lin
- Department of Medical Genetics, National Taiwan University Hospital, Taipei City, Taiwan
- Wuh-Liang Hwu
- Department of Pediatrics, National Taiwan University Hospital, Taipei City, Taiwan
- Ni-Chung Lee
- Department of Medical Genetics, National Taiwan University Hospital, Taipei City, Taiwan
- Feipei Lai
- Department of Computer Science and Information Engineering, National Taiwan University, Taipei City, Taiwan
- Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taipei City, Taiwan
47
Integrating variant functional annotation scores have varied abilities to improve power of genome-wide association studies. Sci Rep 2022; 12:10720. [PMID: 35750789 PMCID: PMC9232605 DOI: 10.1038/s41598-022-14924-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2022] [Accepted: 06/15/2022] [Indexed: 11/12/2022]
Abstract
Functional annotations have the potential to increase the power of genome-wide association studies (GWAS) by prioritizing variants according to their biological function, but this potential has not been well studied. We comprehensively evaluated all 1132 traits in the UK Biobank whose SNP-heritability estimates were given “medium” or “high” labels by Neale’s lab. For each trait, we integrated GWAS summary statistics of close to 8 million common variants (minor allele frequency > 1%) with either their 75 individual functional scores or their meta-scores, using three different data-integration methods. Overall, the number of new genome-wide significant findings after data integration increases as a trait's SNP-heritability estimate increases. However, there is a trade-off between new findings and the loss of baseline GWAS findings, resulting in similar total numbers of significant findings between using GWAS alone and integrating GWAS with functional scores, across all 1132 traits analyzed and all three data-integration methods considered. Our findings suggest that, even with the current biobank-level sample size, more informative functional scores and/or new data-integration methods are needed to further improve the power of GWAS of common variants. For example, studying variants in coding sequence and obtaining cell-type-specific scores are potential future directions.
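The abstract does not name its three data-integration methods. One widely used way to fold a functional score into GWAS is weighted p-value (weighted Bonferroni) testing: each variant receives a positive weight, the weights average to 1, and a variant is called significant when p / w falls below the genome-wide threshold. The sketch below uses that technique purely as an illustrative stand-in, not necessarily one of the study's methods; the exp(score) weight mapping, the helper names `normalized_weights` and `compare_findings`, and the toy data are all assumptions. It reproduces in miniature the trade-off reported above: up-weighting some variants forces others below a weight of 1, so new hits can come at the cost of lost baseline hits.

```python
import math
from typing import Dict, List, Set

GWS = 5e-8  # conventional genome-wide significance threshold


def normalized_weights(scores: List[float]) -> List[float]:
    """Map functional scores to positive weights whose mean is exactly 1.

    exp(score) is a hypothetical mapping chosen for illustration; keeping
    the mean weight at 1 is what preserves the weighted Bonferroni
    procedure's family-wise error control.
    """
    raw = [math.exp(s) for s in scores]
    mean = sum(raw) / len(raw)
    return [r / mean for r in raw]


def compare_findings(pvals: List[float], scores: List[float],
                     threshold: float = GWS) -> Dict[str, Set[int]]:
    """Indices significant before and after score-based weighting."""
    weights = normalized_weights(scores)
    baseline = {i for i, p in enumerate(pvals) if p < threshold}
    weighted = {i for i, (p, w) in enumerate(zip(pvals, weights))
                if p / w < threshold}
    return {
        "baseline": baseline,
        "weighted": weighted,
        "new": weighted - baseline,   # gained by up-weighting
        "lost": baseline - weighted,  # lost by down-weighting
    }
```

With hypothetical p-values [1e-8, 4e-8, 8e-8, 2e-7] and scores [-1, -1, 1, 1], variant 2 becomes newly significant while baseline hit 1 is lost, leaving the total number of findings unchanged.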
48
Chimusa ER, Alosaimi S, Bope CD. Dissecting Generalizability and Actionability of Disease-Associated Genes From 20 Worldwide Ethnolinguistic Cultural Groups. Front Genet 2022; 13:835713. [PMID: 35812734 PMCID: PMC9263835 DOI: 10.3389/fgene.2022.835713] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/12/2022] [Accepted: 04/29/2022] [Indexed: 11/30/2022]
Abstract
Findings from whole-genome sequencing (WGS) have markedly increased owing to rapid advances in sequencing methods and have led to further investigations such as the clinical actionability of genes, as documented by the American College of Medical Genetics and Genomics (ACMG). ACMG's actionable genes (ACGs) may not necessarily be clinically actionable across all populations worldwide, so it is critical to examine the actionability of these genes in different populations. Here, we leveraged combined whole-exome sequencing (WES) data from the African Genome Variation Project and the 1000 Genomes Project to examine the generalizability of ACGs and of potentially actionable genes for four high-burden diseases: malaria, tuberculosis (TB), HIV/AIDS, and sickle cell disease. Our results suggest that ethnolinguistic cultural groups from Africa, particularly Bantu and Khoesan, have high genetic diversity, a high proportion of derived alleles at low minor allele frequency (0.0-0.1), and the highest proportion of pathogenic variants within HIV, TB, malaria, and sickle cell disease genes. In contrast, ethnolinguistic cultural groups outside Africa, including Latin American, Afro-related, and European-related groups, have a higher proportion of pathogenic variants within ACGs than most ethnolinguistic cultural groups from Africa. Overall, our results show high genetic diversity in the current actionable and known disease-associated genes of four high-burden African diseases, suggesting limited transferability or generalizability of ACGs. This supports personalized medicine as beneficial to the worldwide population, as well as actionable gene list recommendations that further foster equitable global healthcare. The results highlight bias in current knowledge of the frequency distribution of these phenotypes and of genetic variants associated with some diseases, especially in African and African-ancestry populations.
Affiliation(s)
- Emile R Chimusa
- Division of Human Genetics, Department of Pathology, University of Cape Town, Medical School Cape Town, Cape Town, South Africa
- Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Cape Town, South Africa
- Shatha Alosaimi
- Division of Human Genetics, Department of Pathology, University of Cape Town, Medical School Cape Town, Cape Town, South Africa
- Christian D Bope
- Division of Human Genetics, Department of Pathology, University of Cape Town, Medical School Cape Town, Cape Town, South Africa
- Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Cape Town, South Africa
- Department of Mathematics and Computer Science, University of Kinshasa, Kinshasa, Congo
- Centre for Bioinformatics, Department of Informatics, University of Oslo, Oslo, Norway
49
Chen D, Wang X, Huang T, Jia J. Sleep and Late-Onset Alzheimer's Disease: Shared Genetic Risk Factors, Drug Targets, Molecular Mechanisms, and Causal Effects. Front Genet 2022; 13:794202. [PMID: 35656316 PMCID: PMC9152224 DOI: 10.3389/fgene.2022.794202] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/13/2021] [Accepted: 03/23/2022] [Indexed: 12/30/2022]
Abstract
Late-onset Alzheimer's disease (AD) is associated with sleep-related phenotypes (SRPs), but whether they share a common genetic etiology remains largely unknown. We explored the shared genetics and causality between AD and SRPs using high-definition likelihood (HDL), cross-phenotype association study (CPASSOC), transcriptome-wide association study (TWAS), and bidirectional Mendelian randomization (MR) on summary-level data for AD (N = 455,258) and for seven SRPs (sample sizes ranging from 359,916 to 1,331,010). AD shared a strong genetic basis with insomnia (r_g = 0.20; p = 9.70 × 10⁻⁵), snoring (r_g = 0.13; p = 2.45 × 10⁻³), and sleep duration (r_g = -0.11; p = 1.18 × 10⁻³). CPASSOC identified 31 independent loci shared between AD and SRPs, including four novel shared loci. Functional analysis and TWAS showed that shared genes were enriched in liver, brain, breast, and heart tissues and highlighted the regulatory roles of immunological disorders, very-low-density lipoprotein particle clearance, triglyceride-rich lipoprotein particle clearance, chylomicron remnant clearance, and positive regulation of T-cell-mediated cytotoxicity pathways. Protein-protein interaction analysis identified three potential drug target genes (APOE, MARK4, and HLA-DRA) that interacted with known FDA-approved drug target genes. CPASSOC and TWAS indicated that three regions (11p11.2, 6p22.3, and 16p11.2) may account for the shared basis between AD and sleep duration or snoring. MR showed that insomnia had a causal effect on AD (OR_IVW = 1.02, P_IVW = 6.7 × 10⁻⁶), and multivariate MR suggested a potential role of sleep duration and major depression in this association. Our findings provide strong evidence of shared genetics and causation between AD and sleep abnormalities and advance our understanding of the genetic overlap between them. Identifying shared drug targets and molecular pathways can help treat AD and sleep disorders more efficiently.
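The OR_IVW reported above comes from an inverse-variance weighted (IVW) Mendelian randomization estimate. A minimal fixed-effect IVW sketch on hypothetical per-SNP summary statistics follows: the estimate is a weighted regression of SNP-outcome effects on SNP-exposure effects through the origin, with weights 1/se_Y². It assumes independent instruments, first-order weights, and SNP-outcome effects on the log-odds scale, so the exponentiated estimate is an odds ratio. The function name `ivw_mr` and the data are illustrative; the study may have used additional MR estimators and dedicated software.

```python
import math
from typing import Dict, List


def ivw_mr(beta_x: List[float], beta_y: List[float],
           se_y: List[float]) -> Dict[str, float]:
    """Fixed-effect inverse-variance weighted (IVW) MR estimate.

    beta_x: per-SNP effects on the exposure (e.g. insomnia)
    beta_y: per-SNP effects on the outcome (log-odds of AD)
    se_y:   standard errors of beta_y
    Assumes independent SNPs and first-order weights 1 / se_y**2.
    """
    if not beta_x or not (len(beta_x) == len(beta_y) == len(se_y)):
        raise ValueError("inputs must be non-empty and of equal length")
    num = sum(bx * by / s ** 2 for bx, by, s in zip(beta_x, beta_y, se_y))
    den = sum(bx ** 2 / s ** 2 for bx, s in zip(beta_x, se_y))
    beta = num / den            # slope of beta_y on beta_x, no intercept
    se = math.sqrt(1.0 / den)   # fixed-effect standard error
    z = beta / se
    return {
        "beta": beta,
        "se": se,
        "or": math.exp(beta),                     # odds ratio per unit exposure
        "p": math.erfc(abs(z) / math.sqrt(2.0)),  # two-sided normal p-value
    }
```

When every SNP gives the same ratio beta_y / beta_x, as in the hypothetical input beta_x = [0.1, 0.2, 0.05], beta_y = [0.002, 0.004, 0.001], the IVW estimate recovers that common ratio (0.02 here); with heterogeneous ratios it is their precision-weighted average.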
Affiliation(s)
- Dongze Chen
- Department of Biostatistics, School of Public Health, Peking University, Beijing, China
- Xinpei Wang
- Department of Biostatistics, School of Public Health, Peking University, Beijing, China
- Tao Huang
- Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing, China
- Key Laboratory of Molecular Cardiovascular Sciences (Peking University), Ministry of Education, Beijing, China
- Center for Intelligent Public Health, Institute for Artificial Intelligence, Peking University, Beijing, China
- Jinzhu Jia
- Department of Biostatistics, School of Public Health, Peking University, Beijing, China
- Center for Statistical Science, Peking University, Beijing, China
50
Chen L, Wang Y, Zhao F. Exploiting deep transfer learning for the prediction of functional non-coding variants using genomic sequence. Bioinformatics 2022; 38:3164-3172. [PMID: 35389435 PMCID: PMC9890318 DOI: 10.1093/bioinformatics/btac214] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 09/03/2021] [Revised: 03/04/2022] [Accepted: 04/06/2022] [Indexed: 02/04/2023]
Abstract
MOTIVATION Although genome-wide association studies have identified tens of thousands of variants associated with complex traits, most of which fall within non-coding regions, these variants may not be the causal ones. The development of high-throughput functional assays has led to the discovery of experimentally validated non-coding functional variants. However, such validated variants are rare because of technical difficulty and financial cost, and their small sample size makes it difficult to develop a reliable supervised machine learning model for genome-wide prediction of non-coding causal variants. RESULTS We exploit a deep transfer learning model, based on a convolutional neural network, to improve the prediction of functional non-coding variants (NCVs). To address the challenge of small sample size, the transfer learning model leverages both large-scale generic functional NCVs, to improve the learning of low-level features, and context-specific functional NCVs, to learn high-level features for the context-specific prediction task. By evaluating the deep transfer learning model on three massively parallel reporter assay (MPRA) datasets and 16 GWAS datasets, we demonstrate that the proposed model outperforms deep learning models without pretraining or retraining. In addition, the deep transfer learning model outperforms 18 existing computational methods on both the MPRA and GWAS datasets. AVAILABILITY AND IMPLEMENTATION https://github.com/lichen-lab/TLVar. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
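The pretrain-then-retrain idea described above can be illustrated in miniature. The sketch below is a toy stand-in, not TLVar: a plain logistic regression replaces the convolutional network, and transfer is done by warm-starting, i.e. fitting on a large generic labeled set first and then continuing training from those weights on a small context-specific set with a lower learning rate. The feature vectors, labels, learning rates and epoch counts are all hypothetical. Unlike a CNN, this model has no separate low- and high-level feature layers, so it only shows how pretrained parameters carry over into the small-sample task.

```python
import math
from typing import List, Optional, Sequence, Tuple

# (feature vector, label) pairs; label 1 = functional, 0 = non-functional
Example = Tuple[Sequence[float], int]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(w: List[float], x: Sequence[float]) -> float:
    """Probability of the functional class; the last weight is the bias."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + w[-1])


def train_logistic(data: List[Example], init: Optional[List[float]] = None,
                   lr: float = 0.1, epochs: int = 200) -> List[float]:
    """Stochastic gradient descent on the logistic (log) loss.

    If `init` is given, training starts from a copy of those weights
    (warm start); this is how the pretrained model is transferred.
    """
    n_features = len(data[0][0])
    w = list(init) if init is not None else [0.0] * (n_features + 1)
    for _ in range(epochs):
        for x, y in data:
            err = predict(w, x) - y  # d(log-loss)/d(logit)
            for j, xj in enumerate(x):
                w[j] -= lr * err * xj
            w[-1] -= lr * err
    return w


# Hypothetical data: a large "generic" set and a small "context-specific" set.
generic = [((1.0, 0.0), 1), ((0.0, 1.0), 0)] * 50
context = [((1.0, 0.5), 1), ((0.2, 1.0), 0)]

pretrained = train_logistic(generic, lr=0.1, epochs=20)
fine_tuned = train_logistic(context, init=pretrained, lr=0.01, epochs=50)
```

Using a smaller learning rate during retraining keeps the fine-tuned weights close to the pretrained ones, which is the usual rationale when the context-specific set is too small to train a model from scratch.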
Affiliation(s)
- Li Chen
- To whom correspondence should be addressed.
- Fengdi Zhao
- Department of Biostatistics and Health Data Science, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN 46202, USA