1
Zhao Y, Lan T, Zhong G, Hagen J, Pan H, Chung WK, Shen Y. A probabilistic graphical model for estimating selection coefficients of nonsynonymous variants from human population sequence data. Nat Commun 2025; 16:4670. [PMID: 40393980 PMCID: PMC12092651 DOI: 10.1038/s41467-025-59937-2] [Received: 04/04/2024] [Accepted: 05/06/2025] [Indexed: 05/22/2025] Open
Abstract
Accurately predicting the effect of missense variants is important in discovering disease risk genes and clinical genetic diagnostics. Commonly used computational methods predict pathogenicity, which does not capture the quantitative impact on fitness in humans. We develop a method, MisFit, to estimate missense fitness effect using a graphical model. MisFit jointly models the effect at a molecular level (d) and a population level (selection coefficient, s), assuming that in the same gene, missense variants with similar d have similar s. We train it by maximizing the probability of observed allele counts in 236,017 individuals of European ancestry. We show that s is informative in predicting allele frequency across ancestries and consistent with the fraction of de novo mutations in sites under strong selection. Further, s outperforms previous methods in prioritizing de novo missense variants in individuals with neurodevelopmental disorders. In conclusion, MisFit accurately predicts s and yields new insights from genomic data.
Affiliation(s)
- Yige Zhao
  - Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA
  - The Integrated Program in Cellular, Molecular, and Biomedical Studies, Columbia University, New York, NY, USA
- Tian Lan
  - Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA
- Guojie Zhong
  - Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA
  - The Integrated Program in Cellular, Molecular, and Biomedical Studies, Columbia University, New York, NY, USA
- Jake Hagen
  - Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA
  - Department of Pediatrics, Boston Children's Hospital and Harvard Medical School, Boston, MA, USA
- Hongbing Pan
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, USA
- Wendy K Chung
  - Department of Pediatrics, Boston Children's Hospital and Harvard Medical School, Boston, MA, USA
- Yufeng Shen
  - Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, USA
  - JP Sulzberger Columbia Genome Center, Columbia University, New York, NY, USA
2
Ruiz-Alías G, Soldevila S, Altafaj X, Cordomí A, Olivella M. Missense variants pathogenicity annotation from homologous proteins. Bioinformatics 2025; 41:btaf305. [PMID: 40366734 PMCID: PMC12122210 DOI: 10.1093/bioinformatics/btaf305] [Received: 01/28/2025] [Revised: 04/09/2025] [Accepted: 05/13/2025] [Indexed: 05/15/2025] Open
Abstract
MOTIVATION High-throughput DNA sequencing has revealed millions of single nucleotide variants (SNVs) in the human genome, with a small fraction linked to disease. The effect of missense variants, which alter the protein sequence, is particularly challenging to interpret due to the scarcity of clinical annotations and experimental information. Although they use conservation and structural information, current prediction tools still struggle to predict variant pathogenicity. In this study, we explored the pathogenicity of homologous missense variants (variants in equivalent positions across homologous proteins), focusing on proteins involved in autosomal dominant diseases. RESULTS Our analysis of 2976 pathogenic and 17,555 non-pathogenic homologous variants demonstrated that pathogenicity can be extrapolated with 95% accuracy within a family, or up to 98% for closer homologs. Remarkably, the evaluation of 27 commonly used mutation predictor methods revealed that they were not fully capturing this biological feature. To facilitate the exploration of homologous variants, we created HomolVar, a web server that computationally predicts the pathogenicity of missense variants using annotations from homologous variants. Overall, these findings and the accompanying tool offer a robust method for predicting the pathogenicity of unannotated variants, enhancing genotype-phenotype correlations, and contributing to diagnosing rare genetic disorders. AVAILABILITY AND IMPLEMENTATION HomolVar is freely available at https://rarevariants.org/HomolVar.
Affiliation(s)
- Gabriel Ruiz-Alías
  - Department of Biosciences, Faculty of Sciences and Technology, University of Vic-Central University of Catalonia, Vic, Barcelona 08500, Spain
  - Institute for Research and Innovation in Life and Health Sciences (IRIS-CC), University of Vic-Central University of Catalonia, Vic, Barcelona 08500, Spain
- Sergi Soldevila
  - Department of Biosciences, Faculty of Sciences and Technology, University of Vic-Central University of Catalonia, Vic, Barcelona 08500, Spain
  - Institute for Research and Innovation in Life and Health Sciences (IRIS-CC), University of Vic-Central University of Catalonia, Vic, Barcelona 08500, Spain
- Xavier Altafaj
  - Department of Biomedicine, School of Medicine and Health Sciences, Institute of Neurosciences, University of Barcelona, Barcelona 08036, Spain
  - Agustí Pi i Sunyer Biomedical Research Institute (IDIBAPS), University of Barcelona, Barcelona 08036, Spain
- Arnau Cordomí
  - Department of Biochemistry and Molecular Biology, Faculty of Biosciences, Universitat Autònoma de Barcelona (UAB), Barcelona 08193, Spain
- Mireia Olivella
  - Department of Biosciences, Faculty of Sciences and Technology, University of Vic-Central University of Catalonia, Vic, Barcelona 08500, Spain
  - Institute for Research and Innovation in Life and Health Sciences (IRIS-CC), University of Vic-Central University of Catalonia, Vic, Barcelona 08500, Spain
3
Banerjee A, Bogetti AT, Bahar I. Accurate identification and mechanistic evaluation of pathogenic missense variants with Rhapsody-2. Proc Natl Acad Sci U S A 2025; 122:e2418100122. [PMID: 40314982 PMCID: PMC12067267 DOI: 10.1073/pnas.2418100122] [Received: 09/05/2024] [Accepted: 04/06/2025] [Indexed: 05/03/2025] Open
Abstract
Understanding the effects of missense mutations or single amino acid variants (SAVs) on protein function is crucial for elucidating the molecular basis of diseases/disorders and designing rational therapies. We introduce here Rhapsody-2, a machine learning tool for discriminating pathogenic and neutral SAVs, significantly expanding on a precursor limited by the availability of structural data. With the advent of AlphaFold2 as a powerful tool for structure prediction, Rhapsody-2 is trained on a significantly expanded dataset of 117,525 SAVs corresponding to 12,094 human proteins reported in the ClinVar database. Adopting a broad set of descriptors composed of sequence evolutionary, structural, dynamic, and energetics features in the training algorithm, Rhapsody-2 achieved an AUROC of 0.94 in 10-fold cross-validation when all SAVs of a particular test protein (mutant) were excluded from the training set. Benchmarking against a variety of testing datasets demonstrated the high performance of Rhapsody-2. While sequence evolutionary descriptors play a dominant role in pathogenicity prediction, those based on structural dynamics provide a mechanistic interpretation. Notably, residues involved in allosteric communication and those distinguished by pronounced fluctuations in the high-frequency modes of motion or subject to spatial constraints in soft modes usually give rise to pathogenicity when mutated. Overall, Rhapsody-2 provides an efficient and transparent tool for accurately predicting the pathogenicity of SAVs and unraveling the mechanistic basis of the observed behavior, thus advancing our understanding of genotype-to-phenotype relations.
Affiliation(s)
- Anupam Banerjee
  - Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, NY 11794
  - Department of Biochemistry and Cell Biology, Renaissance School of Medicine, Stony Brook University, Stony Brook, NY 11794
- Anthony T. Bogetti
  - Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, NY 11794
  - Department of Biochemistry and Cell Biology, Renaissance School of Medicine, Stony Brook University, Stony Brook, NY 11794
- Ivet Bahar
  - Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, NY 11794
  - Department of Biochemistry and Cell Biology, Renaissance School of Medicine, Stony Brook University, Stony Brook, NY 11794
4
Zhou K, Gheybi K, Soh PXY, Hayes VM. Evaluating variant pathogenicity prediction tools to establish African inclusive guidelines for germline genetic testing. Communications Medicine 2025; 5:157. [PMID: 40328947 PMCID: PMC12056225 DOI: 10.1038/s43856-025-00883-x] [Received: 10/06/2024] [Accepted: 04/24/2025] [Indexed: 05/08/2025] Open
Abstract
BACKGROUND Genetic germline testing is restricted for African patients. Lack of ancestrally relevant genomic data perpetuated by African diversity has resulted in European-biased curated clinical variant databases and pathogenic prediction guidelines. While numerous variant pathogenicity prediction tools (VPPTs) exist, their performance has yet to be established within the context of African diversity. METHODS To address this limitation, we assessed 54 VPPTs for predictive performance (sensitivity, specificity, false positive and negative rates) across 145,291 known pathogenic or benign variants derived from 50 Southern African and 50 European men matched for advanced prostate cancer. Prioritising VPPTs for optimal ancestral performance, we screened 5.3 million variants of unknown significance for predicted functional and oncogenic potential. RESULTS We observe a 2.1- and 4.1-fold increase in the number of known and predicted rare pathogenic or benign variants, respectively, against a 1.6-fold decrease in the number of available interrogated variants in our European over African data. Although sensitivity was significantly lower for our African data overall (0.66 vs 0.71, p = 9.86E-06), MetaSVM, CADD, Eigen-raw, BayesDel-noAF, phyloP100way-vertebrate and MVP outperformed irrespective of ancestry. Conversely, MutationTaster, DANN, LRT and GERP-RS were African-specific top performers, while MutationAssessor, PROVEAN, LIST-S2 and REVEL are European-specific. Using these pathogenic prediction workflows, we narrow the ancestral gap for potentially deleterious and oncogenic variant prediction in favour of our African data by 1.15- and 1.1-fold, respectively. CONCLUSION Although VPPT sensitivity favours European data, our findings provide guidelines for VPPT selection to maximise rare pathogenic variant prediction for African disease studies.
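The four performance measures named in the METHODS above (sensitivity, specificity, false positive rate, false negative rate) all derive from one confusion matrix. The following is a minimal sketch of that arithmetic only, not the authors' pipeline; the function names, the boolean encoding (True = pathogenic) and the toy labels are assumptions made for illustration.

```python
import math
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # pathogenic variants predicted pathogenic
    fp: int  # benign variants predicted pathogenic
    tn: int  # benign variants predicted benign
    fn: int  # pathogenic variants predicted benign


def count_outcomes(truth: Sequence[bool], predicted: Sequence[bool]) -> ConfusionCounts:
    """Tally the confusion matrix; True means 'pathogenic' (illustrative encoding)."""
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    tp = sum(t and p for t, p in zip(truth, predicted))
    fp = sum((not t) and p for t, p in zip(truth, predicted))
    tn = sum((not t) and (not p) for t, p in zip(truth, predicted))
    fn = sum(t and (not p) for t, p in zip(truth, predicted))
    return ConfusionCounts(tp, fp, tn, fn)


def _ratio(numerator: int, denominator: int) -> float:
    # A rate is undefined when its class is absent, so report NaN instead of raising.
    return numerator / denominator if denominator else math.nan


def rates(c: ConfusionCounts) -> Dict[str, float]:
    """Sensitivity, specificity, and their complements (FNR and FPR)."""
    return {
        "sensitivity": _ratio(c.tp, c.tp + c.fn),
        "specificity": _ratio(c.tn, c.tn + c.fp),
        "false_positive_rate": _ratio(c.fp, c.fp + c.tn),
        "false_negative_rate": _ratio(c.fn, c.fn + c.tp),
    }


# Toy labels, invented for illustration: three pathogenic and two benign variants.
truth = [True, True, True, False, False]
predicted = [True, True, False, False, True]
counts = count_outcomes(truth, predicted)
metrics = rates(counts)
```

Because the false negative rate is one minus sensitivity and the false positive rate is one minus specificity, comparing a tool across two ancestry groups in practice means building one confusion matrix per group and comparing the resulting pairs.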
Affiliation(s)
- Kangping Zhou
  - Ancestry and Health Genomics Laboratory, Charles Perkins Centre, School of Medical Sciences, Faculty of Medicine and Health, University of Sydney, Camperdown, Sydney, NSW, Australia
- Kazzem Gheybi
  - Ancestry and Health Genomics Laboratory, Charles Perkins Centre, School of Medical Sciences, Faculty of Medicine and Health, University of Sydney, Camperdown, Sydney, NSW, Australia
- Pamela X Y Soh
  - Ancestry and Health Genomics Laboratory, Charles Perkins Centre, School of Medical Sciences, Faculty of Medicine and Health, University of Sydney, Camperdown, Sydney, NSW, Australia
- Vanessa M Hayes
  - Ancestry and Health Genomics Laboratory, Charles Perkins Centre, School of Medical Sciences, Faculty of Medicine and Health, University of Sydney, Camperdown, Sydney, NSW, Australia
  - Manchester Cancer Research Centre, University of Manchester, Manchester, UK
  - School of Health Systems and Public Health, Faculty of Health Sciences, University of Pretoria, Pretoria, South Africa
5
Lucas MC, Keßler T, Scharf F, Steinke-Lange V, Klink B, Laner A, Holinski-Feder E. A series of reviews in familial cancer: genetic cancer risk in context variants of uncertain significance in MMR genes: which procedures should be followed? Fam Cancer 2025; 24:42. [PMID: 40317406 DOI: 10.1007/s10689-025-00470-y] [Received: 11/25/2024] [Accepted: 04/18/2025] [Indexed: 05/07/2025]
Abstract
Interpreting variants of uncertain significance (VUS) in mismatch repair (MMR) genes remains a major challenge in managing Lynch syndrome and other hereditary cancer syndromes. This review outlines recommended VUS classification procedures, encompassing foundational and specialized methodologies tailored for MMR genes by expert organizations, including InSiGHT and ClinGen's Hereditary Colorectal Cancer/Polyposis Variant Curation Expert Panel (VCEP). Key approaches include: (1) functional data, encompassing direct assays measuring MMR proficiency such as in vitro MMR assays, deep mutational scanning, and MMR cell-based assays, as well as techniques like methylation-tolerant assays, proteomic-based approaches, and RNA sequencing, all of which provide critical functional evidence supporting variant pathogenicity; (2) computational data/tools, including in silico meta-predictors and models, which contribute to robust VUS classification when integrated with experimental evidence; and (3) enhanced variant detection to identify the actual causal variant through whole-genome sequencing and long-read sequencing to detect pathogenic variants missed by traditional methods. These strategies improve diagnostic precision, support clinical decision-making for Lynch syndrome, and establish a flexible framework that can be applied to other OMIM-listed genes.
Affiliation(s)
- Morghan C Lucas
  - MGZ- Medical Genetics Center, Munich, Germany
  - Medizinische Klinik und Poliklinik IV- Campus Innenstadt, Klinikum der Universität München, Munich, Germany
- Verena Steinke-Lange
  - MGZ- Medical Genetics Center, Munich, Germany
  - Medizinische Klinik und Poliklinik IV- Campus Innenstadt, Klinikum der Universität München, Munich, Germany
  - Genturis European Reference Network (ERN) Genetic Tumor Risk (GENTURIS), Nijmegen, Netherlands
- Barbara Klink
  - MGZ- Medical Genetics Center, Munich, Germany
  - Genturis European Reference Network (ERN) Genetic Tumor Risk (GENTURIS), Nijmegen, Netherlands
- Elke Holinski-Feder
  - MGZ- Medical Genetics Center, Munich, Germany
  - Medizinische Klinik und Poliklinik IV- Campus Innenstadt, Klinikum der Universität München, Munich, Germany
  - Genturis European Reference Network (ERN) Genetic Tumor Risk (GENTURIS), Nijmegen, Netherlands
6
Radjasandirane R, Diharce J, Gelly JC, de Brevern AG. Insights for variant clinical interpretation based on a benchmark of 65 variant effect predictors. Genomics 2025; 117:111036. [PMID: 40127826 DOI: 10.1016/j.ygeno.2025.111036] [Received: 12/05/2024] [Revised: 02/20/2025] [Accepted: 03/20/2025] [Indexed: 03/26/2025]
Abstract
Single amino acid substitutions in protein sequences are generally harmless, but some of these changes can lead to disease. Accurately predicting the effect of genetic variants is crucial for clinicians, as it accelerates the diagnosis of patients with missense variants associated with health problems. Many computational tools using various approaches have been developed to predict the pathogenicity of genetic variants. Analysing the performance of these tools is crucial to guide future users, especially clinicians. In this study, a large-scale investigation of 65 tools was conducted. Variants from both clinical and functional contexts were used, incorporating data from the ClinVar database and bibliographic sources. The analysis showed that AlphaMissense often performed very well and was in fact one of the best options among the existing tools. In addition, as expected, meta-predictors perform well on average. Tools using evolutionary information showed the best performance for functional variants. These results also highlighted some heterogeneity in the difficulty of predicting some specific variants, while others are always well categorized. Strikingly, the majority of variants from the ClinVar database appear to be easy to predict, while variants from other sources of data are more challenging. This raises questions about the use of ClinVar and of the datasets used to validate tool accuracy. In addition, these results show that variant predictability can be divided into three distinct classes: easy, moderate and hard to predict. We analyzed the parameters leading to these differences and showed that the classes are related to structural and functional information.
Affiliation(s)
- Ragousandirane Radjasandirane
  - Université Paris Cité and Université de la Réunion, INSERM, EFS, BIGR U1134, DSIMB Bioinformatics team, F-75015 Paris, France
- Julien Diharce
  - Université Paris Cité and Université de la Réunion, INSERM, EFS, BIGR U1134, DSIMB Bioinformatics team, F-75015 Paris, France
- Jean-Christophe Gelly
  - Université Paris Cité and Université de la Réunion, INSERM, EFS, BIGR U1134, DSIMB Bioinformatics team, F-75015 Paris, France
- Alexandre G de Brevern
  - Université Paris Cité and Université de la Réunion, INSERM, EFS, BIGR U1134, DSIMB Bioinformatics team, F-75015 Paris, France
7
Germain DP, Gruson D, Malcles M, Garcelon N. Applying artificial intelligence to rare diseases: a literature review highlighting lessons from Fabry disease. Orphanet J Rare Dis 2025; 20:186. [PMID: 40247315 PMCID: PMC12007257 DOI: 10.1186/s13023-025-03655-x] [Received: 07/31/2024] [Accepted: 03/06/2025] [Indexed: 04/19/2025] Open
Abstract
BACKGROUND Use of artificial intelligence (AI) in rare diseases has grown rapidly in recent years. In this review we have outlined the most common machine-learning and deep-learning methods currently being used to classify and analyse large amounts of data, such as standardized images or specific text in electronic health records. To illustrate how these methods have been adapted or developed for use with rare diseases, we have focused on Fabry disease, an X-linked genetic disorder caused by lysosomal α-galactosidase A deficiency that can result in multiple organ damage. METHODS We searched PubMed for articles focusing on AI, rare diseases, and Fabry disease published anytime up to 08 January 2025. Further searches, limited to articles published between 01 January 2021 and 31 December 2023, were also performed using double combinations of keywords related to AI and each organ affected in Fabry disease, and AI and rare diseases. RESULTS In total, 20 articles on AI and Fabry disease were included. In the rare disease field, AI methods may be applied prospectively to large populations to identify specific patients, or retrospectively to large data sets to diagnose a previously overlooked rare disease. Different AI methods may facilitate Fabry disease diagnosis, help monitor progression in affected organs, and potentially contribute to personalized therapy development. The implementation of AI methods in general healthcare and medical imaging centres may help raise awareness of rare diseases and prompt general practitioners to consider these conditions earlier in the diagnostic pathway, while chatbots and telemedicine may accelerate patient referral to rare disease experts. The use of AI technologies in healthcare may generate specific ethical risks, prompting new AI regulatory frameworks aimed at addressing these issues to be established in Europe and the United States.
CONCLUSION AI-based methods will lead to substantial improvements in the diagnosis and management of rare diseases. The need for a human guarantee of AI is a key issue in pursuing innovation while ensuring that human involvement remains at the centre of patient care during this technological revolution.
Affiliation(s)
- Dominique P Germain
  - Division of Medical Genetics, University of Versailles-St Quentin en Yvelines (UVSQ), Paris-Saclay University, 2 avenue de la Source de la Bièvre, 78180, Montigny, France
  - First Faculty of Medicine, Charles University, Prague, Czech Republic
- David Gruson
  - Ethik-IA, PariSanté Campus, 10 Rue Oradour-Sur-Glane, 75015, Paris, France
- Nicolas Garcelon
  - Imagine Institute, Data Science Platform, INSERM UMR 1163, Université de Paris, 75015, Paris, France
8
Lillback V, Bergant G, Di Feo MF, Bozović IB, Torella A, Johari M, Maver A, Pelin K, Santorelli FMM, Nigro V, Hackman P, Peterlin B, Udd B, Savarese M. Gene prioritisation for enhancing molecular diagnosis in rare skeletal muscle disease cohort. J Med Genet 2025; 62:350-357. [PMID: 40044418 DOI: 10.1136/jmg-2024-110212] [Received: 06/28/2024] [Accepted: 02/16/2025] [Indexed: 04/19/2025]
Abstract
BACKGROUND Inherited rare skeletal muscle diseases cause muscle weakness and wasting of variable severity. Without a molecular diagnosis, patients often endure prolonged diagnostic journeys, leading to delays in appropriate management of the disease. This occurs in approximately 60% of patients with rare diseases. METHODS To facilitate reanalysis of 278 unsolved patients, we used the gene prioritisation tool Exomiser, which standardises analysis by ranking candidate causative variants based on phenotype relevance and variant pathogenicity. Before analysis, we benchmarked Exomiser for variant prioritisation with solved cases and for novel disease gene discovery with mock cases with variants in candidate disease genes. Additionally, we studied the significance of the specificity of the phenotype descriptions. RESULTS In our study, Exomiser ranked genes in the top 10 correctly in 97.4% of controls with previously detected causative variants. Moreover, 57.1% of candidate genes in mock cases were similarly prioritised in the top 10. We also showed that three parental muscle disease human phenotype ontologies describing the patient phenotype performed as well as patient-specific ones, with a p value of 0.68 for difference in performance. The provided automation and standardisation of variant interpretation resulted in two novel diagnoses and in findings, either in known muscle disease genes or in novel candidate genes, which need further investigation. CONCLUSIONS Exomiser is recommended for initial and periodic reanalyses of exomes in unsolved patients with myopathy, as it benefits from literature updates and minimises effort. This approach could also extend to whole genome sequencing data, aiding the interpretation of variants beyond coding regions.
Affiliation(s)
- Gaber Bergant
  - University Medical Centre Ljubljana, Ljubljana, Slovenia
- Maria Francesca Di Feo
  - Department of Neuroscience, Rehabilitation, Ophthalmology, Genetics, and Maternal and Child Health (DINOGMI), University of Genoa, Genova, Italy
- Annalaura Torella
  - Dipartimento di Biochimica, Biofisica e Patologia Generale, Università degli studi della Campania Luigi Vanvitelli, Napoli, Italy
  - Telethon Institute of Genetics and Medicine, Pozzuoli, Italy
- Aleš Maver
  - University Medical Centre Ljubljana, Ljubljana, Slovenia
  - University of Ljubljana, Ljubljana, Slovenia
- Vincenzo Nigro
  - Telethon Institute of Genetics and Medicine, Napoli, Italy
  - Department of Precision Medicine, Universita degli Studi della Campania Luigi Vanvitelli, Napoli, Italy
- Borut Peterlin
  - University Medical Centre Ljubljana, Ljubljana, Slovenia
  - University of Ljubljana, Ljubljana, Slovenia
- Bjarne Udd
  - Neuromuscular Center, Tampere University Hospital, Vasa, Finland
9
Luppino F, Lenz S, Chow CFW, Toth-Petroczy A. Deep learning tools predict variants in disordered regions with lower sensitivity. BMC Genomics 2025; 26:367. [PMID: 40221640 PMCID: PMC11992697 DOI: 10.1186/s12864-025-11534-9] [Received: 01/14/2025] [Accepted: 03/27/2025] [Indexed: 04/14/2025] Open
Abstract
BACKGROUND The recent AI breakthrough of AlphaFold2 has revolutionized 3D protein structural modeling, proving crucial for protein design and variant effect prediction. However, intrinsically disordered regions, known for their lack of well-defined structure and lower sequence conservation, often yield low-confidence models. The latest Variant Effect Predictor (VEP), AlphaMissense, leverages AlphaFold2 models, achieving over 90% sensitivity and specificity in predicting variant effects. However, the effectiveness of tools for variants in disordered regions, which account for 30% of the human proteome, remains unclear. RESULTS In this study, we found that predicting pathogenicity for variants in disordered regions is less accurate than in ordered regions, particularly for mutations at the first N-Methionine site. Investigations into the efficacy of variant effect predictors on intrinsically disordered regions (IDRs) indicated that mutations in IDRs are predicted with lower sensitivity and that the gap between sensitivity and specificity is largest in disordered regions, especially for AlphaMissense and VARITY. CONCLUSIONS The prevalence of IDRs within the human proteome, coupled with the increasing repertoire of biological functions they are known to perform, necessitated an investigation into the efficacy of state-of-the-art VEPs on such regions. This analysis revealed their consistently reduced sensitivity and a prediction performance profile that differs from that of ordered regions, indicating that new IDR-specific features and paradigms are needed to accurately classify disease mutations within those regions.
Affiliation(s)
- Federica Luppino
  - Max Planck Institute of Molecular Cell Biology and Genetics, Pfotenhauerstrasse 108, 01307, Dresden, Germany
  - Center for Systems Biology Dresden, Pfotenhauerstrasse 108, 01307, Dresden, Germany
- Swantje Lenz
  - Max Planck Institute of Molecular Cell Biology and Genetics, Pfotenhauerstrasse 108, 01307, Dresden, Germany
  - Center for Systems Biology Dresden, Pfotenhauerstrasse 108, 01307, Dresden, Germany
- Chi Fung Willis Chow
  - Max Planck Institute of Molecular Cell Biology and Genetics, Pfotenhauerstrasse 108, 01307, Dresden, Germany
  - Center for Systems Biology Dresden, Pfotenhauerstrasse 108, 01307, Dresden, Germany
  - Cluster of Excellence Physics of Life, TU Dresden, 01062, Dresden, Germany
- Agnes Toth-Petroczy
  - Max Planck Institute of Molecular Cell Biology and Genetics, Pfotenhauerstrasse 108, 01307, Dresden, Germany
  - Center for Systems Biology Dresden, Pfotenhauerstrasse 108, 01307, Dresden, Germany
  - Cluster of Excellence Physics of Life, TU Dresden, 01062, Dresden, Germany
10
Zhao Y, Lan T, Zhong G, Hagen J, Pan H, Chung WK, Shen Y. A probabilistic graphical model for estimating selection coefficient of nonsynonymous variants from human population sequence data. medRxiv 2025:2023.12.11.23299809. [PMID: 38168397 PMCID: PMC10760286 DOI: 10.1101/2023.12.11.23299809] [Indexed: 01/05/2024]
Abstract
Accurately predicting the effect of missense variants is important in discovering disease risk genes and clinical genetic diagnostics. Commonly used computational methods predict pathogenicity, which does not capture the quantitative impact on fitness in humans. We developed a method, MisFit, to estimate missense fitness effect using a graphical model. MisFit jointly models the effect at a molecular level (𝑑) and a population level (selection coefficient, 𝑠), assuming that in the same gene, missense variants with similar 𝑑 have similar 𝑠. We trained it by maximizing probability of observed allele counts in 236,017 European individuals. We show that 𝑠 is informative in predicting allele frequency across ancestries and consistent with the fraction of de novo mutations in sites under strong selection. Further, 𝑠 outperforms previous methods in prioritizing de novo missense variants in individuals with neurodevelopmental disorders. In conclusion, MisFit accurately predicts 𝑠 and yields new insights from genomic data.
Affiliation(s)
- Yige Zhao
  - Department of Systems Biology, Columbia University Irving Medical Center, New York, NY 10032
  - The Integrated Program in Cellular, Molecular, and Biomedical Studies, Columbia University, New York, NY 10032
- Tian Lan
  - Department of Systems Biology, Columbia University Irving Medical Center, New York, NY 10032
- Guojie Zhong
  - Department of Systems Biology, Columbia University Irving Medical Center, New York, NY 10032
  - The Integrated Program in Cellular, Molecular, and Biomedical Studies, Columbia University, New York, NY 10032
- Jake Hagen
  - Department of Systems Biology, Columbia University Irving Medical Center, New York, NY 10032
  - Department of Pediatrics, Boston Children's Hospital and Harvard Medical School, Boston, MA 02115
- Hongbing Pan
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY 10032
- Wendy K. Chung
  - Department of Pediatrics, Boston Children's Hospital and Harvard Medical School, Boston, MA 02115
- Yufeng Shen
  - Department of Systems Biology, Columbia University Irving Medical Center, New York, NY 10032
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY 10032
  - JP Sulzberger Columbia Genome Center, Columbia University, New York, NY 10032
11
Di Donna MG, Colona VL, Bagnato MR, Bonomi CG, Tirrito L, Marchionni E, Motta C, Sangiuolo FC, Martorana A. NOTCH3 variants of unknown significance underpin vascular dysfunction in neurodegenerative disease: a case series of three nfvPPA-FTD patients. Neurol Sci 2025; 46:1637-1646. [PMID: 39652165 DOI: 10.1007/s10072-024-07908-8] [Received: 08/20/2024] [Accepted: 11/25/2024] [Indexed: 03/19/2025]
Abstract
INTRODUCTION The NOTCH3 gene encodes an evolutionarily conserved protein whose functions encompass both embryonic cell proliferation and adult tissue-specific differentiation. Among others, a pivotal role in maintaining the functional integrity of the neurovascular unit (NVU) is supported by the association of several NOTCH3 gene mutations with neuroimaging markers of cerebral small vessel disease (SVD). Indeed, a pathogenic role of NOTCH3 is recognised in cerebral autosomal dominant arteriopathy with subcortical infarcts and leukoencephalopathy (CADASIL). However, an increasing number of NOTCH3 variants with an unclear pathogenic role have been identified in patients suspected of having CADASIL. The following case series describes three patients under the age of 65 with a clinical diagnosis of the nonfluent variant of primary progressive aphasia (nfvPPA), whose genetic analysis revealed three distinct novel variants of unknown significance (VUS) in the NOTCH3 gene. RESULTS The diagnostic work-up revealed common features among the patients: clinical presentation - nfvPPA at neuropsychological evaluation with consistent extrapyramidal symptoms; neuroimaging - low brain MR burden of SVD and FDG-PET impairment of cortical areas involved in the speech production network; and biomarkers - cerebrospinal fluid (CSF) analysis negative for Alzheimer's disease (AD), corroborating suspicion of underlying frontotemporal lobar degeneration (FTLD). DISCUSSION AND CONCLUSION The retrieved VUS in NOTCH3 suggest that the involvement of Notch signalling in the pathophysiology of neurodegenerative disease is more complex and needs to be fully explored. Rare variants in SVD-associated genes may influence progression of neurodegeneration via the dysfunction of several vascular pathways.
Affiliation(s)
- M G Di Donna
- UOSD Centro Demenze, University of Rome Tor Vergata, Viale Oxford 81, 00133, Rome, Italy
- Stroke Unit, Ospedale F. Spaziani, Via A. Fabi 5, 03100, Frosinone, Italy
| | - V L Colona
- Research Unit of Neuromuscular and Neurodegenerative Disorders, Bambino Gesù Children's Hospital, IRCCS, Rome, Italy
- Movement Analysis and Robotics Laboratory (MARlab), Research Unit of Neurorehabilitation, Bambino Gesù Children's Hospital, IRCCS, Rome, Italy
| | - M R Bagnato
- Stroke Unit, Ospedale F. Spaziani, Via A. Fabi 5, 03100, Frosinone, Italy
- Stroke Unit, University of Rome Tor Vergata, Viale Oxford 81, 00133, Rome, Italy
| | - C G Bonomi
- UOSD Centro Demenze, University of Rome Tor Vergata, Viale Oxford 81, 00133, Rome, Italy
| | - L Tirrito
- Department of Biomedicine and Prevention, University of Rome Tor Vergata, Viale Oxford 81, 00133, Rome, Italy
| | - E Marchionni
- Department of Biomedicine and Prevention, University of Rome Tor Vergata, Viale Oxford 81, 00133, Rome, Italy
| | - C Motta
- UOSD Centro Demenze, University of Rome Tor Vergata, Viale Oxford 81, 00133, Rome, Italy
| | - F C Sangiuolo
- Department of Biomedicine and Prevention, University of Rome Tor Vergata, Viale Oxford 81, 00133, Rome, Italy
| | - A Martorana
- UOSD Centro Demenze, University of Rome Tor Vergata, Viale Oxford 81, 00133, Rome, Italy
| |
|
12
|
Hamelin D, Scicluna M, Saadie I, Mostefai F, Grenier J, Baron C, Caron E, Hussin J. Predicting pathogen evolution and immune evasion in the age of artificial intelligence. Comput Struct Biotechnol J 2025; 27:1370-1382. [PMID: 40235636 PMCID: PMC11999473 DOI: 10.1016/j.csbj.2025.03.044] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2024] [Revised: 03/21/2025] [Accepted: 03/26/2025] [Indexed: 04/17/2025] Open
Abstract
The genomic diversification of viral pathogens during viral epidemics and pandemics represents a major adaptive route for infectious agents to circumvent therapeutic and public health initiatives. Historically, strategies to address viral evolution have relied on responding to emerging variants after their detection, leading to delays in effective public health responses. Because of this, a long-standing yet challenging objective has been to forecast viral evolution by predicting potentially harmful viral mutations prior to their emergence. The promise of artificial intelligence (AI), coupled with the exponential growth of viral data collection infrastructures spurred by the COVID-19 pandemic, has resulted in a research ecosystem highly conducive to this objective. Because the COVID-19 pandemic accelerated the development of pandemic mitigation and preparedness strategies, many of the methods discussed here were designed in the context of SARS-CoV-2 evolution. However, most of these pipelines were intentionally designed to be adaptable across RNA viruses, with several strategies already applied to multiple viral species. In this review, we explore recent breakthroughs that have facilitated the forecasting of viral evolution in the context of an ongoing pandemic, with particular emphasis on deep learning architectures, including the promising potential of language models (LM). The approaches discussed here employ strategies that leverage genomic, epidemiologic, immunologic and biological information.
Affiliation(s)
- D.J. Hamelin
- Montreal Heart Institute, Université de Montréal, Montréal, Quebec, Canada
- Mila - Quebec AI Institute, Montréal, Quebec, Canada
- Department of Biochemistry and Molecular Medicine, Faculty of Medicine, Université de Montréal, Montréal, Quebec, Canada
| | - M. Scicluna
- Montreal Heart Institute, Université de Montréal, Montréal, Quebec, Canada
- Mila - Quebec AI Institute, Montréal, Quebec, Canada
- Department of Biochemistry and Molecular Medicine, Faculty of Medicine, Université de Montréal, Montréal, Quebec, Canada
| | - I. Saadie
- Montreal Heart Institute, Université de Montréal, Montréal, Quebec, Canada
- Department of Biochemistry and Molecular Medicine, Faculty of Medicine, Université de Montréal, Montréal, Quebec, Canada
| | - F. Mostefai
- Montreal Heart Institute, Université de Montréal, Montréal, Quebec, Canada
- Mila - Quebec AI Institute, Montréal, Quebec, Canada
- Department of Biochemistry and Molecular Medicine, Faculty of Medicine, Université de Montréal, Montréal, Quebec, Canada
| | - J.C. Grenier
- Montreal Heart Institute, Université de Montréal, Montréal, Quebec, Canada
| | - C. Baron
- Montreal Heart Institute, Université de Montréal, Montréal, Quebec, Canada
- Mila - Quebec AI Institute, Montréal, Quebec, Canada
- Department of Biochemistry and Molecular Medicine, Faculty of Medicine, Université de Montréal, Montréal, Quebec, Canada
| | - E. Caron
- CHU Sainte-Justine Research Center, Université de Montréal, Montréal, Quebec, Canada
- Yale Center for Immuno-Oncology, Yale Center for Systems and Engineering Immunology, Yale Center for Infection and Immunity, Yale School of Medicine, New Haven, CT, USA
| | - J.G. Hussin
- Montreal Heart Institute, Université de Montréal, Montréal, Quebec, Canada
- Mila - Quebec AI Institute, Montréal, Quebec, Canada
- Department of Biochemistry and Molecular Medicine, Faculty of Medicine, Université de Montréal, Montréal, Quebec, Canada
- Department of Medicine, Faculty of Medicine, Université de Montréal, Montréal, Quebec, Canada
| |
|
13
|
He R, Sarwal V, Qiu X, Zhuang Y, Zhang L, Liu Y, Chiang J. Generative AI Models in Time-Varying Biomedical Data: Scoping Review. J Med Internet Res 2025; 27:e59792. [PMID: 40063929 PMCID: PMC11933772 DOI: 10.2196/59792] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/30/2024] [Revised: 08/08/2024] [Accepted: 11/15/2024] [Indexed: 03/28/2025] Open
Abstract
BACKGROUND Trajectory modeling is a long-standing challenge in the application of computational methods to health care. In the age of big data, traditional statistical and machine learning methods do not achieve satisfactory results as they often fail to capture the complex underlying distributions of multimodal health data and long-term dependencies throughout medical histories. Recent advances in generative artificial intelligence (AI) have provided powerful tools to represent complex distributions and patterns with minimal underlying assumptions, with major impact in fields such as finance and environmental sciences, prompting researchers to apply these methods for disease modeling in health care. OBJECTIVE While AI methods have proven powerful, their application in clinical practice remains limited due to their highly complex nature. The proliferation of AI algorithms also poses a significant challenge for nondevelopers to track and incorporate these advances into clinical research and application. In this paper, we introduce basic concepts in generative AI and discuss current algorithms and how they can be applied to health care for practitioners with little background in computer science. METHODS We surveyed peer-reviewed papers on generative AI models with specific applications to time-series health data. Our search included single- and multimodal generative AI models that operated over structured and unstructured data, physiological waveforms, medical imaging, and multi-omics data. We introduce current generative AI methods, review their applications, and discuss their limitations and future directions in each data modality. RESULTS We followed the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) guidelines and reviewed 155 articles on generative AI applications to time-series health care data across modalities. Furthermore, we offer a systematic framework for clinicians to easily identify suitable AI methods for their data and task at hand. CONCLUSIONS We reviewed and critiqued existing applications of generative AI to time-series health data with the aim of bridging the gap between computational methods and clinical application. We also identified the shortcomings of existing approaches and highlighted recent advances in generative AI that represent promising directions for health care modeling.
Affiliation(s)
- Rosemary He
- Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, United States
- Department of Computational Medicine, University of California, Los Angeles, Los Angeles, CA, United States
| | - Varuni Sarwal
- Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, United States
- Department of Computational Medicine, University of California, Los Angeles, Los Angeles, CA, United States
| | - Xinru Qiu
- Division of Biomedical Sciences, School of Medicine, University of California Riverside, Riverside, CA, United States
| | - Yongwen Zhuang
- Department of Biostatistics, University of Michigan, Ann Arbor, MI, United States
| | - Le Zhang
- Institute for Integrative Genome Biology, University of California Riverside, Riverside, CA, United States
| | - Yue Liu
- Institute for Cellular and Molecular Biology, University of Texas at Austin, Austin, TX, United States
| | - Jeffrey Chiang
- Department of Computational Medicine, University of California, Los Angeles, Los Angeles, CA, United States
- Department of Neurosurgery, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, United States
| |
|
14
|
Banerjee A, Bogetti A, Bahar I. Accurate Identification and Mechanistic Evaluation of Pathogenic Missense Variants with Rhapsody-2. bioRxiv 2025:2025.02.17.638727. [PMID: 40027614 PMCID: PMC11870481 DOI: 10.1101/2025.02.17.638727] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/05/2025]
Abstract
Understanding the effects of missense mutations or single amino acid variants (SAVs) on protein function is crucial for elucidating the molecular basis of diseases/disorders and designing rational therapies. We introduce here Rhapsody-2, a machine learning tool for discriminating pathogenic and neutral SAVs, significantly expanding on a precursor limited by the availability of structural data. With the advent of AlphaFold2 as a powerful tool for structure prediction, Rhapsody-2 is trained on a significantly expanded dataset of 117,525 SAVs corresponding to 12,094 human proteins reported in the ClinVar database. Adopting a broad set of descriptors composed of sequence evolutionary, structural, dynamic, and energetics features in the training algorithm, Rhapsody-2 achieved an AUROC of 0.94 in 10-fold cross-validation when all SAVs of a particular test protein (mutant) were excluded from the training set. Benchmarking against a variety of testing datasets demonstrated the high performance of Rhapsody-2. While sequence evolutionary descriptors play a dominant role in pathogenicity prediction, those based on structural dynamics provide a mechanistic interpretation. Notably, residues involved in allosteric communication, and those distinguished by pronounced fluctuations in the high frequency modes of motion or subject to spatial constraints in soft modes usually give rise to pathogenicity when mutated. Overall, Rhapsody-2 provides an efficient and transparent tool for accurately predicting the pathogenicity of SAVs and unraveling the mechanistic basis of the observed behavior, thus advancing our understanding of genotype-to-phenotype relations.
Affiliation(s)
- Anupam Banerjee
- Laufer Center for Physical and Quantitative Biology, Stony Brook University, New York 11794, USA
| | - Anthony Bogetti
- Laufer Center for Physical and Quantitative Biology, Stony Brook University, New York 11794, USA
| | - Ivet Bahar
- Laufer Center for Physical and Quantitative Biology, Stony Brook University, New York 11794, USA
- Department of Biochemistry and Cell Biology, Renaissance School of Medicine, Stony Brook University, New York 11794, USA
| |
|
15
|
Yu H, He G, Wang W, Qin S, Wang Y, Bai M, Shu K, Pu D. A graph neural network approach for accurate prediction of pathogenicity in multi-type variants. Brief Bioinform 2025; 26:bbaf151. [PMID: 40251830 PMCID: PMC12008122 DOI: 10.1093/bib/bbaf151] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/22/2025] [Revised: 03/05/2025] [Accepted: 03/19/2025] [Indexed: 04/21/2025] Open
Abstract
Accurate prediction of pathogenic variants in human disease-associated genes would have a profound effect on clinical decision-making; however, it remains a significant challenge due to the overwhelming number of these variants. We propose the graph neural network for multimodal annotation-based pathogenicity prediction (GNN-MAP), a novel deep learning framework that effectively integrates multimodal annotations and similarity relationships among variants to predict the pathogenicity of multi-type variants. Trained on the ClinVar dataset, GNN-MAP exhibits superior predictive performance in internal validation and orthogonal test datasets, accurately predicting variant pathogenicity. Notably, GNN-MAP accurately predicts the pathogenicity of rare variants and remains accurate on highly imbalanced datasets. Furthermore, it achieves high performance in the pathogenicity prediction of inherited retinal disease-specific variants, highlighting its effectiveness in disease-specific variant prediction. These findings suggest that the robust capability of GNN-MAP to predict pathogenicity across multiple variant types and datasets holds significant potential for applications in research and clinical settings.
Affiliation(s)
- Hongtao Yu
- College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, No. 2 Chongwen Road, Nan'an District, Chongqing 400065, China
- Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, No. 2 Chongwen Road, Nan'an District, Chongqing 400065, China
| | - Guojing He
- College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, No. 2 Chongwen Road, Nan'an District, Chongqing 400065, China
- Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, No. 2 Chongwen Road, Nan'an District, Chongqing 400065, China
| | - Wei Wang
- College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, No. 2 Chongwen Road, Nan'an District, Chongqing 400065, China
- Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, No. 2 Chongwen Road, Nan'an District, Chongqing 400065, China
| | - Senbiao Qin
- Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, No. 2 Chongwen Road, Nan'an District, Chongqing 400065, China
| | - Yu Wang
- Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, No. 2 Chongwen Road, Nan'an District, Chongqing 400065, China
| | - Mingze Bai
- Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, No. 2 Chongwen Road, Nan'an District, Chongqing 400065, China
| | - Kunxian Shu
- Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, No. 2 Chongwen Road, Nan'an District, Chongqing 400065, China
| | - Dan Pu
- Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, No. 2 Chongwen Road, Nan'an District, Chongqing 400065, China
| |
|
16
|
Chen Q, Quan L, Cao L, Zhang B, Zhang Z, Peng L, Wang J, Jiang Y, Nie L, Li G, Wu T, Lyu Q. DS-MVP: identifying disease-specific pathogenicity of missense variants by pre-training representation. Brief Bioinform 2025; 26:bbaf119. [PMID: 40127180 PMCID: PMC11932084 DOI: 10.1093/bib/bbaf119] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/10/2024] [Revised: 01/26/2025] [Accepted: 03/01/2025] [Indexed: 03/26/2025] Open
Abstract
Accurately predicting the pathogenicity of missense variants is crucial for improving disease diagnosis and advancing clinical research. However, existing computational methods primarily focus on general pathogenicity predictions, overlooking assessments of disease-specific conditions. In this study, we propose DS-MVP, a method capable of predicting disease-specific pathogenicity of missense variants in human genomes. DS-MVP first leverages a deep learning model pre-trained on a large general pathogenicity dataset to learn rich representation of missense variants. It then fine-tunes these representations with an XGBoost model on smaller datasets for specific diseases. We evaluated the learned representation by testing it on multiple binary pathogenicity datasets and gene-level statistics, demonstrating that DS-MVP outperforms existing state-of-the-art methods, such as MetaRNN and AlphaMissense. Additionally, DS-MVP excels in multi-label and multi-class classification, effectively classifying disease-specific pathogenic missense variants based on disease conditions. It further enhances predictions by fine-tuning the pre-trained model on disease-specific datasets. Finally, we analyzed the contributions of the pre-trained model and various feature types, with gene description corpus features from large language model and genetic feature fusion contributing the most. These results underscore that DS-MVP represents a broader perspective on pathogenicity prediction and holds potential as an effective tool for disease diagnosis.
Affiliation(s)
- Qiufeng Chen
- School of Computer Science and Technology, Soochow University, Jiangsu 215006, China
| | - Lijun Quan
- School of Computer Science and Technology, Soochow University, Jiangsu 215006, China
- Collaborative Innovation Center of Novel Software Technology and Industrialization, Jiangsu 210000, China
| | - Lexin Cao
- School of Computer Science and Technology, Soochow University, Jiangsu 215006, China
| | - Bei Zhang
- School of Computer Science and Technology, Soochow University, Jiangsu 215006, China
| | - Zhijun Zhang
- School of Computer Science and Technology, Soochow University, Jiangsu 215006, China
| | - Liangchen Peng
- School of Computer Science and Technology, Soochow University, Jiangsu 215006, China
| | - Junkai Wang
- School of Computer Science and Technology, Soochow University, Jiangsu 215006, China
| | - Yelu Jiang
- School of Computer Science and Technology, Soochow University, Jiangsu 215006, China
| | - Liangpeng Nie
- School of Computer Science and Technology, Soochow University, Jiangsu 215006, China
| | - Geng Li
- School of Computer Science and Technology, Soochow University, Jiangsu 215006, China
| | - Tingfang Wu
- School of Computer Science and Technology, Soochow University, Jiangsu 215006, China
- Collaborative Innovation Center of Novel Software Technology and Industrialization, Jiangsu 210000, China
| | - Qiang Lyu
- School of Computer Science and Technology, Soochow University, Jiangsu 215006, China
- Collaborative Innovation Center of Novel Software Technology and Industrialization, Jiangsu 210000, China
| |
|
17
|
Caron O. [Prophylactic surgery and genetic counselling: What impact of the artificial intelligence?]. Bull Cancer 2025; 112:241-250. [PMID: 40049793 DOI: 10.1016/j.bulcan.2024.07.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/24/2024] [Revised: 07/05/2024] [Accepted: 07/10/2024] [Indexed: 05/13/2025]
Abstract
In the area of cancer predisposition, certain situations may lead to the discussion of prophylactic surgery. This is rarely strictly recommended and depends on the patient's choice. The advantages and disadvantages must be weighed up. The main advantage of prophylactic surgery is obviously risk reduction. At present, this risk is assessed on an individual basis using "classical" instruments. Artificial intelligence is expected to improve the selection of useful genetic information, the classification of variants of unknown significance, the combination of more comprehensive analysis results and the ability to associate them with non-genetic features. Artificial intelligence could also help to make genetic testing more accessible to people, or even contribute to direct patient information. This last point is likely to require considerable vigilance.
Affiliation(s)
- Olivier Caron
- Département de médecine oncologique, Gustave-Roussy, 94805 Villejuif, France.
| |
|
18
|
Ding M, Chen K, Yang Y, Zhao H. Prioritizing genomic variants pathogenicity via DNA, RNA, and protein-level features based on extreme gradient boosting. Hum Genet 2025; 144:253-263. [PMID: 38575818 DOI: 10.1007/s00439-024-02667-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2023] [Accepted: 03/05/2024] [Indexed: 04/06/2024]
Abstract
Genetic diseases are mostly associated with genetic variants, including missense, synonymous, nonsense, and copy number variants. Previous studies indicate that these different kinds of variants affect phenotypes in various ways. It remains essential but challenging to understand the functional consequences of these genetic variants, especially the noncoding ones, due to the lack of corresponding annotations. While many computational methods have been proposed to identify risk variants, most of them have curated only DNA-level and protein-level annotations to predict the pathogenicity of the variants, and others have been restricted to missense variants exclusively. In this study, we have curated DNA-, RNA-, and protein-level features to discriminate disease-causing variants in both coding and noncoding regions. The features of protein sequences and protein structures have been shown to be essential for analyzing missense variants in coding regions, while the features related to RNA splicing and RBP binding are significant for variants in noncoding regions and synonymous variants in coding regions. Through the integration of these features, we have formulated the Multi-level feature Genomic Variants Predictor (ML-GVP) using the gradient boosting tree. The method has been trained on more than 400,000 variants in the Sherloc-training set from the 6th critical assessment of genome interpretation with superior performance. The method is one of the two best-performing predictors on the blind test in the Sherloc assessment, and is further confirmed by another independent test dataset of de novo variants.
Affiliation(s)
- Maolin Ding
- School of Data and Computer Science, Sun Yat-Sen University, Guangzhou, 510000, China
| | - Ken Chen
- School of Data and Computer Science, Sun Yat-Sen University, Guangzhou, 510000, China
| | - Yuedong Yang
- School of Data and Computer Science, Sun Yat-Sen University, Guangzhou, 510000, China.
- Key Laboratory of Machine Intelligence and Advanced Computing (Sun Yat-Sen University), Ministry of Education, Guangzhou, China.
| | - Huiying Zhao
- Sun Yat-Sen Memorial Hospital, Sun Yat-Sen University, Guangzhou, 510000, China.
| |
|
19
|
Phogat A, Krishnan SR, Pandey M, Gromiha MM. ZFP-CanPred: Predicting the effect of mutations in zinc-finger proteins in cancers using protein language models. Methods 2025; 235:55-63. [PMID: 39909391 DOI: 10.1016/j.ymeth.2025.01.020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2024] [Revised: 01/21/2025] [Accepted: 01/27/2025] [Indexed: 02/07/2025] Open
Abstract
Zinc-finger proteins (ZNFs) constitute the largest family of transcription factors and play crucial roles in various cellular processes. Missense mutations in ZNFs significantly alter protein-DNA interactions, potentially leading to the development of various types of cancers. This study presents ZFP-CanPred, a novel deep learning-based model for predicting cancer-associated driver mutations in ZNFs. The representations derived from protein language models (PLMs) from the structural neighbourhood of mutated sites were utilized to train ZFP-CanPred for differentiating between cancer-causing and neutral mutations. ZFP-CanPred achieved superior performance, with an accuracy of 0.72, an F1-score of 0.79, and an area under the receiver operating characteristic (ROC) curve (AUC) of 0.74 on an independent test set. In a comparative analysis against 11 existing prediction tools using a curated dataset of 331 mutations, ZFP-CanPred demonstrated the highest AUC (0.74), outperforming both generic and cancer-specific methods. The model's balanced performance across specificity and sensitivity addresses a significant limitation of current methodologies. The source code and other related files are available on GitHub at https://github.com/amitphogat/ZFP-CanPred.git. We envisage that the present study contributes to understanding oncogenic processes and developing targeted therapeutic strategies.
Affiliation(s)
- Amit Phogat
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600036 India
| | - Sowmya Ramaswamy Krishnan
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600036 India
| | - Medha Pandey
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600036 India
| | - M Michael Gromiha
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600036 India
- International Research Frontiers Initiative, School of Computing, Tokyo Institute of Technology, Yokohama 226-8501 Japan
| |
|
20
|
Rastogi R, Chung R, Li S, Li C, Lee K, Woo J, Kim DW, Keum C, Babbi G, Martelli PL, Savojardo C, Casadio R, Chennen K, Weber T, Poch O, Ancien F, Cia G, Pucci F, Raimondi D, Vranken W, Rooman M, Marquet C, Olenyi T, Rost B, Andreoletti G, Kamandula A, Peng Y, Bakolitsa C, Mort M, Cooper DN, Bergquist T, Pejaver V, Liu X, Radivojac P, Brenner SE, Ioannidis NM. Critical assessment of missense variant effect predictors on disease-relevant variant data. Hum Genet 2025; 144:281-293. [PMID: 40113603 PMCID: PMC11976771 DOI: 10.1007/s00439-025-02732-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2024] [Accepted: 02/07/2025] [Indexed: 03/22/2025]
Abstract
Regular, systematic, and independent assessments of computational tools that are used to predict the pathogenicity of missense variants are necessary to evaluate their clinical and research utility and guide future improvements. The Critical Assessment of Genome Interpretation (CAGI) conducts the ongoing Annotate-All-Missense (Missense Marathon) challenge, in which missense variant effect predictors (also called variant impact predictors) are evaluated on missense variants added to disease-relevant databases following the prediction submission deadline. Here we assess predictors submitted to the CAGI 6 Annotate-All-Missense challenge, predictors commonly used in clinical genetics, and recently developed deep learning methods. We examine performance across a range of settings relevant for clinical and research applications, focusing on different subsets of the evaluation data as well as high-specificity and high-sensitivity regimes. Our evaluations reveal notable advances in current methods relative to older, well-cited tools in the field. While meta-predictors tend to outperform their constituent individual predictors, several newer individual predictors perform comparably to commonly used meta-predictors. Predictor performance varies between high-specificity and high-sensitivity regimes, highlighting that different methods may be optimal for different use cases. We also characterize two potential sources of bias. Predictors that incorporate allele frequency as a predictive feature tend to have reduced performance when distinguishing pathogenic variants from very rare benign variants, and predictors trained on pathogenicity labels from curated variant databases often inherit gene-level label imbalances. Our findings help illuminate the clinical and research utility of modern missense variant effect predictors and identify potential areas for future development.
Affiliation(s)
- Ruchir Rastogi
- Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA, USA.
| | - Ryan Chung
- Center for Computational Biology, University of California, Berkeley, CA, USA
| | - Sindy Li
- Department of Plant and Microbial Biology, University of California, Berkeley, CA, USA
| | - Chang Li
- USF Genomics, College of Public Health, University of South Florida, Tampa, FL, USA
| | - Giulia Babbi
- Bologna Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
| | - Pier Luigi Martelli
- Bologna Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
| | - Castrense Savojardo
- Bologna Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
| | - Rita Casadio
- Bologna Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
| | - François Ancien
- Computational Biology and Bioinformatics, Université Libre de Bruxelles, Brussels, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, Brussels, Belgium
| | - Gabriel Cia
- Computational Biology and Bioinformatics, Université Libre de Bruxelles, Brussels, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, Brussels, Belgium
| | - Fabrizio Pucci
- Computational Biology and Bioinformatics, Université Libre de Bruxelles, Brussels, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, Brussels, Belgium
| | - Daniele Raimondi
- ESAT-STADIUS, KU Leuven, Leuven, Belgium
- Institut de Génétique Moléculaire de Montpellier, Université de Montpellier, Montpellier, France
| | - Wim Vranken
- Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, Brussels, Belgium
- Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, Belgium
| | - Marianne Rooman
- Computational Biology and Bioinformatics, Université Libre de Bruxelles, Brussels, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, Brussels, Belgium
| | - Céline Marquet
- Department of Informatics, Bioinformatics and Computational Biology, Technical University of Munich, Munich, Germany
| | - Tobias Olenyi
- Department of Informatics, Bioinformatics and Computational Biology, Technical University of Munich, Munich, Germany
| | - Burkhard Rost
- Department of Informatics, Bioinformatics and Computational Biology, Technical University of Munich, Munich, Germany
| | - Gaia Andreoletti
- Department of Plant and Microbial Biology, University of California, Berkeley, CA, USA
- Sage Bionetworks, Seattle, WA, USA
| | - Akash Kamandula
- Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA
| | - Yisu Peng
- Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA
| | - Constantina Bakolitsa
- Department of Plant and Microbial Biology, University of California, Berkeley, CA, USA
| | - Matthew Mort
- Institute of Medical Genetics, School of Medicine, Cardiff University, Cardiff, UK
| | - David N Cooper
- Institute of Medical Genetics, School of Medicine, Cardiff University, Cardiff, UK
| | - Timothy Bergquist
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Vikas Pejaver
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Xiaoming Liu
- USF Genomics, College of Public Health, University of South Florida, Tampa, FL, USA
| | - Predrag Radivojac
- Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA
| | - Steven E Brenner
- Center for Computational Biology, University of California, Berkeley, CA, USA.
- Department of Plant and Microbial Biology, University of California, Berkeley, CA, USA.
| | - Nilah M Ioannidis
- Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA, USA.
- Center for Computational Biology, University of California, Berkeley, CA, USA.
- Chan Zuckerberg Biohub, San Francisco, CA, USA.
|
21
|
Jeong R, Bulyk ML. Meta-analysis reveals transcription factors and DNA binding domain variants associated with congenital heart defect and orofacial cleft. medRxiv 2025:2025.01.30.25321274. [PMID: 39974057 PMCID: PMC11838631 DOI: 10.1101/2025.01.30.25321274] [Indexed: 02/21/2025]
Abstract
Many patients with structural birth defects lack a genetic diagnosis because many disease genes have yet to be discovered. We applied a gene burden test incorporating de novo predicted-loss-of-function (pLoF) and likely damaging missense variants together with inherited pLoF variants to a collection of congenital heart defect (CHD) and orofacial cleft (OC) parent-offspring trio cohorts (n = 3,835 and 1,844, respectively). We identified 17 novel candidate CHD genes and 10 novel candidate OC genes, many of which were known developmental disorder genes. Shorter genes were better powered in a "de novo only" analysis than in an analysis that included inherited pLoF variants. Transcription factor (TF) genes were enriched among the significant genes; 14 and 8 TF genes showed significant variant burden for CHD and OC, respectively. In total, 30 affected children had a de novo missense variant in a DNA binding domain of a known CHD, OC, or other developmental disorder TF gene. Our results suggest candidate pathogenic variants in CHD and OC and their potentially pleiotropic effects in other developmental disorders.
Affiliation(s)
- Raehoon Jeong
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA 02115, USA
- Bioinformatics and Integrative Genomics Graduate Program, Harvard University, Cambridge, MA 02138, USA
| | - Martha L. Bulyk
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA 02115, USA
- Bioinformatics and Integrative Genomics Graduate Program, Harvard University, Cambridge, MA 02138, USA
- Department of Pathology, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA 02115, USA
|
22
|
Gromiha MM, Pandey M, Kulandaisamy A, Sharma D, Ridha F. Progress on the development of prediction tools for detecting disease causing mutations in proteins. Comput Biol Med 2025; 185:109510. [PMID: 39637461 DOI: 10.1016/j.compbiomed.2024.109510] [Received: 07/13/2024] [Revised: 11/27/2024] [Accepted: 11/29/2024] [Indexed: 12/07/2024]
Abstract
Proteins are involved in a variety of functions in living organisms. The mutation of amino acid residues in a protein alters its structure, stability, binding, and function, with some mutations leading to diseases. Understanding the influence of mutations on protein structure and function helps to gain deep insights into the molecular mechanisms of diseases and to devise therapeutic strategies. Hence, several generic and disease-specific methods have been proposed to reveal the pathogenic effects of mutations. In this review, we focus on the development of prediction methods for identifying disease-causing mutations in proteins. We briefly outline the existing databases for disease-causing mutations, followed by a discussion of the sequence- and structure-based features used for prediction. Further, we discuss computational tools based on machine learning, deep learning and large language models for detecting disease-causing mutations. Specifically, we emphasize the advances in predicting hotspots and mutations for targets involved in cancer, neurodegenerative and infectious diseases as well as in membrane proteins. The computational resources, including databases and algorithms for understanding and predicting the effect of mutations, are listed. Moreover, limitations of existing methods and possible improvements are discussed.
Affiliation(s)
- M Michael Gromiha
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600036, India.
| | - Medha Pandey
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600036, India
| | - A Kulandaisamy
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600036, India
| | - Divya Sharma
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600036, India
| | - Fathima Ridha
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600036, India
|
23
|
Zhang Y, Leung AK, Kang JJ, Sun Y, Wu G, Li L, Sun J, Cheng L, Qiu T, Zhang J, Wierbowski SD, Gupta S, Booth JG, Yu H. A multiscale functional map of somatic mutations in cancer integrating protein structure and network topology. Nat Commun 2025; 16:975. [PMID: 39856048 PMCID: PMC11760531 DOI: 10.1038/s41467-024-54176-3] [Received: 03/11/2024] [Accepted: 11/04/2024] [Indexed: 01/27/2025]
Abstract
A major goal of cancer biology is to understand the mechanisms driven by somatically acquired mutations. Two distinct methodologies, one analyzing mutation clustering within protein sequences and 3D structures and the other leveraging protein-protein interaction network topology, offer complementary strengths. We present NetFlow3D, a unified, end-to-end, 3D structurally informed protein interaction network propagation framework that maps the multiscale mechanistic effects of mutations. Built upon the Human Protein Structurome, which incorporates the 3D structures of every protein and the binding interfaces of all known protein interactions, NetFlow3D integrates atomic, residue, protein and network-level information: it clusters mutations on 3D protein structures to identify driver mutations and propagates their impacts anisotropically across the protein interaction network, guided by the involved interaction interfaces, to reveal systems-level impacts. Applied to 33 cancer types, NetFlow3D identifies 2 times more 3D clusters and incorporates 8 times more proteins in significantly interconnected network modules compared to traditional methods.
Affiliation(s)
- Yingying Zhang
- Department of Computational Biology, Cornell University, Ithaca, 14853, NY, USA
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, 14853, NY, USA
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, 14853, NY, USA
| | - Alden K Leung
- Department of Computational Biology, Cornell University, Ithaca, 14853, NY, USA
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, 14853, NY, USA
| | - Jin Joo Kang
- Department of Computational Biology, Cornell University, Ithaca, 14853, NY, USA
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, 14853, NY, USA
| | - Yu Sun
- Department of Computational Biology, Cornell University, Ithaca, 14853, NY, USA
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, 14853, NY, USA
| | - Guanxi Wu
- College of Agriculture and Life Sciences, Cornell University, Ithaca, 14853, NY, USA
| | - Le Li
- Department of Computational Biology, Cornell University, Ithaca, 14853, NY, USA
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, 14853, NY, USA
| | - Jiayang Sun
- Department of Computational Biology, Cornell University, Ithaca, 14853, NY, USA
| | - Lily Cheng
- Department of Science and Technology Studies, Cornell University, Ithaca, 14853, NY, USA
| | - Tian Qiu
- School of Electrical and Computer Engineering, Cornell University, Ithaca, 14853, NY, USA
| | - Junke Zhang
- Department of Computational Biology, Cornell University, Ithaca, 14853, NY, USA
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, 14853, NY, USA
| | - Shayne D Wierbowski
- Department of Computational Biology, Cornell University, Ithaca, 14853, NY, USA
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, 14853, NY, USA
| | - Shagun Gupta
- Department of Computational Biology, Cornell University, Ithaca, 14853, NY, USA
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, 14853, NY, USA
| | - James G Booth
- Department of Computational Biology, Cornell University, Ithaca, 14853, NY, USA
- Department of Statistics and Data Science, Cornell University, Ithaca, 14853, NY, USA
| | - Haiyuan Yu
- Department of Computational Biology, Cornell University, Ithaca, 14853, NY, USA.
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, 14853, NY, USA.
|
24
|
Das S, Patel V, Chakravarty S, Ghosh A, Mukhopadhyay A, Biswas NK. An ensemble machine learning-based performance evaluation identifies top In-Silico pathogenicity prediction methods that best classify driver mutations in cancer. BioData Min 2025; 18:7. [PMID: 39833905 PMCID: PMC11744934 DOI: 10.1186/s13040-024-00420-x] [Received: 10/04/2024] [Accepted: 12/26/2024] [Indexed: 01/22/2025]
Abstract
BACKGROUND AND OBJECTIVE: Accurate identification and prioritization of driver mutations in cancer is critical for effective patient management. Despite the presence of numerous bioinformatic algorithms for estimating mutation pathogenicity, there is significant variation in their assessments. This inconsistency is evident even for well-established cancer driver mutations. This study aims to develop an ensemble machine learning approach to evaluate the performance (rank) of pathogenic and conservation scoring algorithms (PCSAs) based on their ability to distinguish pathogenic driver mutations from benign passenger (non-driver) mutations in head and neck squamous cell carcinoma (HNSC). METHODS: The study used a dataset from 502 HNSC patients, classifying mutations based on 299 known high-confidence cancer driver genes. Missense somatic mutations in driver genes were treated as driver mutations, while non-driver mutations were randomly selected from other genes. Each mutation was annotated with 41 PCSAs. Three machine learning algorithms (logistic regression, random forest, and support vector machine), along with recursive feature elimination, were used to rank these PCSAs. The final ranking of the PCSAs was determined using rank-average-sort and rank-sum-sort methods. RESULTS: The random forest algorithm emerged as the top performer among the three tested ML algorithms, with an AUC-ROC of 0.89, compared to 0.83 for the other two, in distinguishing pathogenic driver mutations from benign passenger mutations using all 41 PCSAs. The top 11 PCSAs were selected based on the first quintile cut-off from the final rank-sum distribution. Classifiers built using these top 11 PCSAs (DEOGEN2, Integrated_fitCons, MVP, etc.) demonstrated significantly higher performance (p-value < 2.22e-16) compared to those using the remaining 30 PCSAs across all three ML algorithms, in separating pathogenic driver from benign passenger mutations. The top PCSAs demonstrated strong performance on a validation cohort including independent HNSC and other cancer types (breast, lung, and colorectal), reflecting their consistency, robustness and generalizability. CONCLUSIONS: The ensemble machine learning approach effectively evaluates the performance of PCSAs based on their ability to differentiate pathogenic drivers from benign passenger mutations in HNSC and other cancer types. Notably, some well-known PCSAs performed poorly, underscoring the importance of data-driven selection over relying solely on popularity.
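The abstract mentions combining per-model rankings with a rank-sum-sort step. A minimal sketch of one way such an aggregation could work is given below: each feature's positions across the model-specific rankings are summed, and features are sorted by that sum. The function name, the alphabetical tie-break, and the toy feature names are assumptions for illustration; the study's actual procedure may differ in its details.

```python
from collections import defaultdict


def rank_sum_sort(rankings):
    """Combine several rankings (best first) into one order by summed rank.

    Assumes every ranking contains the same set of names (hypothetical setup).
    """
    totals = defaultdict(int)
    for ranking in rankings:
        for position, name in enumerate(ranking, start=1):
            totals[name] += position
    # Lower summed rank is better; ties are broken alphabetically so the
    # result is deterministic.
    return sorted(totals, key=lambda name: (totals[name], name))


# Toy rankings from three hypothetical models (not data from the study).
logistic_ranking = ["score_a", "score_b", "score_c", "score_d"]
forest_ranking = ["score_b", "score_a", "score_d", "score_c"]
svm_ranking = ["score_a", "score_c", "score_b", "score_d"]

combined = rank_sum_sort([logistic_ranking, forest_ranking, svm_ranking])
```

Here `score_a` has the lowest summed rank (1 + 2 + 1 = 4) and therefore comes first. A rank-average variant would divide each sum by the number of rankings, which gives the same order when every ranking covers the same items.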
Affiliation(s)
- Subrata Das
- Biotechnology Research and Innovation Council-National Institute of Biomedical Genomics (BRIC-NIBMG), National Institute of Biomedical Genomics, Kalyani, West Bengal, India
| | - Vatsal Patel
- Biotechnology Research and Innovation Council-National Institute of Biomedical Genomics (BRIC-NIBMG), National Institute of Biomedical Genomics, Kalyani, West Bengal, India
| | - Shouvik Chakravarty
- Biotechnology Research and Innovation Council-National Institute of Biomedical Genomics (BRIC-NIBMG), National Institute of Biomedical Genomics, Kalyani, West Bengal, India
- Biotechnology Research and Innovation Council-Regional Centre for Biotechnology (BRIC- RCB), Faridabad, India
| | - Arnab Ghosh
- Biotechnology Research and Innovation Council-National Institute of Biomedical Genomics (BRIC-NIBMG), National Institute of Biomedical Genomics, Kalyani, West Bengal, India
- Biotechnology Research and Innovation Council-Regional Centre for Biotechnology (BRIC- RCB), Faridabad, India
| | - Anirban Mukhopadhyay
- Department of Computer Science and Engineering, University of Kalyani, Kalyani, West Bengal, 741235, India.
| | - Nidhan K Biswas
- Biotechnology Research and Innovation Council-National Institute of Biomedical Genomics (BRIC-NIBMG), National Institute of Biomedical Genomics, Kalyani, West Bengal, India.
|
25
|
Abuelrub A, Erol I, Nalbant Bingol N, Ozemri Sag S, Temel SG, Durdağı S. Computational Analysis of CC2D1A Missense Mutations: Insight into Protein Structure and Interaction Dynamics. ACS Chem Neurosci 2025. [PMID: 39791913 DOI: 10.1021/acschemneuro.4c00570] [Indexed: 01/12/2025]
Abstract
CC2D1A is implicated in a range of conditions, including autism spectrum disorder, intellectual disability, seizures, autosomal recessive nonsyndromic intellectual disability, heterotaxy, and ciliary dysfunction. To understand the molecular mechanisms underlying these conditions, we focused on the structural and dynamic consequences of mutations within this gene. In this study, whole exome sequencing identified the c.1552G > A (GLU518LYS) missense mutation in CC2D1A in an 18-year-old male, linking it to intellectual disability and autism. In addition to the GLU518LYS mutation, we conducted a comprehensive analysis of other predefined missense mutations (i.e., PRO192LEU, GLN506ARG, PRO532LEU, GLY781VAL, and GLY781GLU) found within CC2D1A. Using all-atom molecular dynamics (MD) simulations and neighborhood interaction analyses, we examined the impact of these mutations on protein structure and function at an atomic level, aiming to shed light on their contribution to the pathogenesis of related diseases. The results suggest that the GLU518LYS, GLY781VAL, and GLY781GLU mutations did not significantly alter the overall global protein structure compared to the wild type, while PRO192LEU, GLN506ARG, and PRO532LEU exhibited slightly higher protein root-mean-square deviation (RMSD) values, which may indicate potential impacts on whole-protein stability. Moreover, neighborhood interaction analysis indicated that ASP85 emerges as a unique interaction partner specifically associated with the GLU518LYS mutation, whereas LYS75, which interacts with ASP85 in the mutated form, is absent in the wild type. This alteration signifies a crucial reconfiguration in the local interaction network at the site of the mutation.
Affiliation(s)
- Anwar Abuelrub
- Laboratory for Innovative Drugs (Lab4IND), Computational Drug Design Center (HITMER), Bahçeşehir University, 34734 İstanbul, Türkiye
- Computational Biology and Molecular Simulations Laboratory, Department of Biophysics, School of Medicine, Bahçeşehir University, 34734 Istanbul, Türkiye
- Graduate School of Natural and Applied Sciences, Artificial Intelligence Program, Bahçeşehir University, 34734 Istanbul, Türkiye
| | - Ismail Erol
- Laboratory for Innovative Drugs (Lab4IND), Computational Drug Design Center (HITMER), Bahçeşehir University, 34734 İstanbul, Türkiye
- Department of Analytical Chemistry, School of Pharmacy, Bahçeşehir University, 34351 İstanbul, Türkiye
| | - Nurdeniz Nalbant Bingol
- Department of Translational Medicine, Institute of Health Sciences, Bursa Uludag University, 16059 Bursa, Türkiye
| | - Sebnem Ozemri Sag
- Department of Medical Genetics, Faculty of Medicine, Bursa Uludag University, 16059 Bursa, Türkiye
| | - Sehime G Temel
- Department of Translational Medicine, Institute of Health Sciences, Bursa Uludag University, 16059 Bursa, Türkiye
- Department of Medical Genetics, Faculty of Medicine, Bursa Uludag University, 16059 Bursa, Türkiye
- Department of Histology and Embryology, Faculty of Medicine, Bursa Uludag University, 16059 Bursa, Türkiye
| | - Serdar Durdağı
- Laboratory for Innovative Drugs (Lab4IND), Computational Drug Design Center (HITMER), Bahçeşehir University, 34734 İstanbul, Türkiye
- Computational Biology and Molecular Simulations Laboratory, Department of Biophysics, School of Medicine, Bahçeşehir University, 34734 Istanbul, Türkiye
- Molecular Therapy Laboratory, Department of Pharmaceutical Chemistry, School of Pharmacy, Bahçeşehir University, 34351 İstanbul, Türkiye
|
26
|
Zhao W, Tao Y, Xiong J, Liu L, Wang Z, Shao C, Shang L, Hu Y, Xu Y, Su Y, Yu J, Feng T, Xie J, Xu H, Zhang Z, Peng J, Wu J, Zhang Y, Zhu S, Xia K, Tang B, Zhao G, Li J, Li B. GoFCards: an integrated database and analytic platform for gain of function variants in humans. Nucleic Acids Res 2025; 53:D976-D988. [PMID: 39578693 PMCID: PMC11701611 DOI: 10.1093/nar/gkae1079] [Received: 08/15/2024] [Revised: 10/20/2024] [Accepted: 10/28/2024] [Indexed: 11/24/2024]
Abstract
Gain-of-function (GOF) variants, which introduce new protein functions or amplify existing ones, are essential for understanding disease mechanisms. Despite advances in genomics and functional research, identifying and analyzing pathogenic GOF variants remains challenging owing to fragmented data and database limitations, underscoring the difficulty in accessing critical genetic information. To address this challenge, we manually reviewed the literature, pinpointing 3089 single-nucleotide variants and 72 insertions and deletions in 579 genes associated with 1299 diseases from 2069 studies, and integrated these with 3.5 million predicted GOF variants. Our approach is complemented by a proprietary scoring system that prioritizes GOF variants on the basis of the evidence supporting their GOF effects and provides predictive scores for variants that lack existing documentation. We then developed a database named GoFCards for general geneticists and clinicians to easily obtain GOF variants in humans (http://www.genemed.tech/gofcards). This database also contains data from >150 sources and offers comprehensive variant-level and gene-level annotations, with the aim of providing users with convenient access to detailed and relevant genetic information. Furthermore, GoFCards empowers users with limited bioinformatic skills to analyze and annotate genetic data and to prioritize GOF variants. GoFCards offers an efficient platform for interpreting GOF variants, thereby advancing genetic research.
Affiliation(s)
- Wenjing Zhao
- National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital & Center for Medical Genetics, School of Life Sciences, Central South University, No. 87 Xiangya Road, Furong District, Changsha, Hunan 410008, China
- Department of Medical Genetics, NHC Key Laboratory of Healthy Birth and Birth Defect Prevention in Western China, The First People's Hospital of Yunnan Province, No. 157 Jinbi Road, Xishan District, Kunming, Yunnan 650000, China
- School of Medicine, Kunming University of Science and Technology, No. 727 Jingming South Road, Chenggong District, Kunming, Yunnan 650000, China
| | - Youfu Tao
- Xiangya School of Medicine, Central South University, No. 172 Tongzipo Road, Yuelu District, Changsha, Hunan 410008, China
| | - Jiayi Xiong
- National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital & Center for Medical Genetics, School of Life Sciences, Central South University, No. 87 Xiangya Road, Furong District, Changsha, Hunan 410008, China
| | - Lei Liu
- School of Life Science, Central South University, No. 172 Tongzipo Road, Yuelu District, Changsha, Hunan 410008, China
| | - Zhongqing Wang
- School of Medicine, Kunming University of Science and Technology, No. 727 Jingming South Road, Chenggong District, Kunming, Yunnan 650000, China
| | - Chuhan Shao
- Xiangya School of Medicine, Central South University, No. 172 Tongzipo Road, Yuelu District, Changsha, Hunan 410008, China
| | - Ling Shang
- Xiangya School of Medicine, Central South University, No. 172 Tongzipo Road, Yuelu District, Changsha, Hunan 410008, China
| | - Yue Hu
- Xiangya School of Medicine, Central South University, No. 172 Tongzipo Road, Yuelu District, Changsha, Hunan 410008, China
| | - Yishu Xu
- Xiangya School of Medicine, Central South University, No. 172 Tongzipo Road, Yuelu District, Changsha, Hunan 410008, China
| | - Yingluo Su
- Xiangya School of Medicine, Central South University, No. 172 Tongzipo Road, Yuelu District, Changsha, Hunan 410008, China
| | - Jiahui Yu
- Xiangya School of Medicine, Central South University, No. 172 Tongzipo Road, Yuelu District, Changsha, Hunan 410008, China
| | - Tianyi Feng
- Xiangya School of Medicine, Central South University, No. 172 Tongzipo Road, Yuelu District, Changsha, Hunan 410008, China
| | - Junyi Xie
- School of Life Science, Central South University, No. 172 Tongzipo Road, Yuelu District, Changsha, Hunan 410008, China
| | - Huijuan Xu
- School of Life Science, Central South University, No. 172 Tongzipo Road, Yuelu District, Changsha, Hunan 410008, China
| | - Zijun Zhang
- School of Life Science, Central South University, No. 172 Tongzipo Road, Yuelu District, Changsha, Hunan 410008, China
| | - Jiayi Peng
- School of Life Science, Central South University, No. 172 Tongzipo Road, Yuelu District, Changsha, Hunan 410008, China
| | - Jianbin Wu
- School of Life Science, Central South University, No. 172 Tongzipo Road, Yuelu District, Changsha, Hunan 410008, China
| | - Yuchang Zhang
- School of Life Science, Central South University, No. 172 Tongzipo Road, Yuelu District, Changsha, Hunan 410008, China
| | - Shaobo Zhu
- School of Life Science, Central South University, No. 172 Tongzipo Road, Yuelu District, Changsha, Hunan 410008, China
| | - Kun Xia
- MOE Key Laboratory of Pediatric Rare Diseases & Hunan Key Laboratory of Medical Genetics, Central South University, No. 110 Xiangya Road, Furong District, Changsha, Hunan 410008, China
| | - Beisha Tang
- National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital & Center for Medical Genetics, School of Life Sciences, Central South University, No. 87 Xiangya Road, Furong District, Changsha, Hunan 410008, China
- Department of Neurology & Multi-omics Research Center for Brain Disorders, The First Affiliated Hospital University of South China, 69 Chuan Shan Road, Shi Gu District, Hengyang, Hunan 421000, China
- Key Laboratory of Hunan Province in Neurodegenerative Disorders, Department of Neurology, Xiangya Hospital, Central South University, No. 87 Xiangya Road, Furong District, Changsha, Hunan 410008, China
| | - Guihu Zhao
- National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital & Center for Medical Genetics, School of Life Sciences, Central South University, No. 87 Xiangya Road, Furong District, Changsha, Hunan 410008, China
| | - Jinchen Li
- National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital & Center for Medical Genetics, School of Life Sciences, Central South University, No. 87 Xiangya Road, Furong District, Changsha, Hunan 410008, China
- Key Laboratory of Hunan Province in Neurodegenerative Disorders, Department of Neurology, Xiangya Hospital, Central South University, No. 87 Xiangya Road, Furong District, Changsha, Hunan 410008, China
- Bioinformatics Center, Furong Laboratory & Xiangya Hospital, Central South University, No. 87 Xiangya Road, Furong District, Changsha, Hunan 410008, China
| | - Bin Li
- National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital & Center for Medical Genetics, School of Life Sciences, Central South University, No. 87 Xiangya Road, Furong District, Changsha, Hunan 410008, China
|
27
|
Katsonis P, Lichtarge O. Meta-EA: a gene-specific combination of available computational tools for predicting missense variant effects. Nat Commun 2025; 16:159. [PMID: 39746940 PMCID: PMC11696468 DOI: 10.1038/s41467-024-55066-4] [Received: 12/14/2023] [Accepted: 11/27/2024] [Indexed: 01/04/2025]
Abstract
Computational methods for estimating missense variant impact suffer from inconsistent performance across genes, which poses a major challenge for their reliable use in clinical practice. While ensemble scores leverage multiple prediction methods to enhance consistency, the overrepresentation of certain genes in the training data can bias their outcomes. To address this critical limitation, we propose a gene-specific ensemble framework trained on reference computational annotations rather than on clinical or experimental data. Accordingly, we generate Meta-EA ensemble scores that achieve comparable performance to the top individual predicting method for each gene set. Incorporating the effects of splicing and the allele frequency of human polymorphisms further enhances the performance of Meta-EA, achieving an area under the receiver operating characteristic curve of 0.97 for both gene-balanced and imbalanced clinical assessments. In conclusion, this work leverages the wealth of existing variant impact prediction approaches to generate improved estimations for clinical interpretation.
Affiliation(s)
- Panagiotis Katsonis
- Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA.
| | - Olivier Lichtarge
- Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA.
- Department of Biochemistry & Molecular Biology, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA.
- Department of Pharmacology, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA.
- Computational and Integrative Biomedical Research Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA.
|
28
|
Hong J, Lee D, Hwang A, Kim T, Ryu HY, Choi J. Rare disease genomics and precision medicine. Genomics Inform 2024; 22:28. [PMID: 39627904 PMCID: PMC11616305 DOI: 10.1186/s44342-024-00032-1] [Received: 09/17/2024] [Accepted: 11/16/2024] [Indexed: 12/06/2024]
Abstract
Rare diseases, though individually uncommon, collectively affect millions worldwide. Genomic technologies and big data analytics have revolutionized diagnosing and understanding these conditions. This review explores the role of genomics in rare disease research, the impact of large consortium initiatives, advancements in extensive data analysis, the integration of artificial intelligence (AI) and machine learning (ML), and the therapeutic implications in precision medicine. We also discuss the challenges of data sharing and privacy concerns, emphasizing the need for collaborative efforts and secure data practices to advance rare disease research.
Affiliation(s)
- Juhyeon Hong
- Department of Biomedical Sciences, Korea University College of Medicine, Seoul, 02841, Republic of Korea
| | - Dajun Lee
- Department of Biomedical Sciences, Korea University College of Medicine, Seoul, 02841, Republic of Korea
| | - Ayoung Hwang
- Department of Biomedical Sciences, Korea University College of Medicine, Seoul, 02841, Republic of Korea
| | - Taekeun Kim
- Department of Biomedical Sciences, Korea University College of Medicine, Seoul, 02841, Republic of Korea
| | - Hong-Yeoul Ryu
- School of Life Sciences, BK21 FOUR KNU Creative BioResearch Group, College of Natural Sciences, Kyungpook National University, Daegu, 41566, Republic of Korea
| | - Jungmin Choi
- Department of Biomedical Sciences, Korea University College of Medicine, Seoul, 02841, Republic of Korea.
|
29
|
Ma K, Huang S, Ng KK, Lake NJ, Joseph S, Xu J, Lek A, Ge L, Woodman KG, Koczwara KE, Cohen J, Ho V, O'Connor CL, Brindley MA, Campbell KP, Lek M. Saturation mutagenesis-reinforced functional assays for disease-related genes. Cell 2024; 187:6707-6724.e22. [PMID: 39326416 PMCID: PMC11568926 DOI: 10.1016/j.cell.2024.08.047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/20/2023] [Revised: 07/29/2024] [Accepted: 08/23/2024] [Indexed: 09/28/2024]
Abstract
Interpretation of disease-causing genetic variants remains a challenge in human genetics. Current costs and complexity of deep mutational scanning methods are obstacles for achieving genome-wide resolution of variants in disease-related genes. Our framework, saturation mutagenesis-reinforced functional assays (SMuRF), offers simple and cost-effective saturation mutagenesis paired with streamlined functional assays to enhance the interpretation of unresolved variants. Applying SMuRF to neuromuscular disease genes FKRP and LARGE1, we generated functional scores for all possible coding single-nucleotide variants, which aid in resolving clinically reported variants of uncertain significance. SMuRF also demonstrates utility in predicting disease severity, resolving critical structural regions, and providing training datasets for the development of computational predictors. Overall, our approach enables variant-to-function insights for disease genes in a cost-effective manner that can be broadly implemented by standard research laboratories.
Affiliation(s)
- Kaiyue Ma
- Department of Genetics, Yale School of Medicine, New Haven, CT, USA; Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China.
| | - Shushu Huang
- Department of Genetics, Yale School of Medicine, New Haven, CT, USA
| | - Kenneth K Ng
- Department of Genetics, Yale School of Medicine, New Haven, CT, USA
| | - Nicole J Lake
- Department of Genetics, Yale School of Medicine, New Haven, CT, USA
| | - Soumya Joseph
- Howard Hughes Medical Institute, Senator Paul D. Wellstone Muscular Dystrophy Specialized Research Center, Department of Molecular Physiology and Biophysics and Department of Neurology, Roy J. and Lucille A. Carver College of Medicine, The University of Iowa, Iowa City, IA, USA
| | - Jenny Xu
- Yale University, New Haven, CT, USA
| | - Angela Lek
- Department of Genetics, Yale School of Medicine, New Haven, CT, USA; Muscular Dystrophy Association, Chicago, IL, USA
| | - Lin Ge
- Department of Genetics, Yale School of Medicine, New Haven, CT, USA; Department of Neurology, Beijing Children's Hospital, Capital Medical University, National Center for Children's Health, Beijing, China
| | - Keryn G Woodman
- Department of Genetics, Yale School of Medicine, New Haven, CT, USA
| | | | - Justin Cohen
- Department of Genetics, Yale School of Medicine, New Haven, CT, USA
| | - Vincent Ho
- Department of Genetics, Yale School of Medicine, New Haven, CT, USA
| | | | - Melinda A Brindley
- Department of Infectious Diseases, Department of Population Health, University of Georgia, Athens, GA, USA
| | - Kevin P Campbell
- Howard Hughes Medical Institute, Senator Paul D. Wellstone Muscular Dystrophy Specialized Research Center, Department of Molecular Physiology and Biophysics and Department of Neurology, Roy J. and Lucille A. Carver College of Medicine, The University of Iowa, Iowa City, IA, USA
| | - Monkol Lek
- Department of Genetics, Yale School of Medicine, New Haven, CT, USA.
|
30
|
Jia R, He Z, Wang C, Guo X, Li F. MetalPrognosis: A Biological Language Model-Based Approach for Disease-Associated Mutations in Metal-Binding Site Prediction. IEEE/ACM Trans Comput Biol Bioinform 2024; 21:2340-2348. [PMID: 39320992 DOI: 10.1109/tcbb.2024.3467093] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/27/2024]
Abstract
Protein-metal ion interactions play a central role in the onset of numerous diseases. When amino acid changes lead to missense mutations in metal-binding sites, the disrupted interaction with metal ions can compromise protein function, potentially causing severe human ailments. Identifying these disease-associated mutation sites within metal-binding regions is paramount for understanding protein function and fostering innovative drug development. While some computational methods aim to tackle this challenge, they often fall short in accuracy, commonly due to manual feature extraction and the absence of structural data. We introduce MetalPrognosis, an innovative, alignment-free solution that predicts disease-associated mutations within metal-binding sites of metalloproteins with heightened precision. Rather than relying on manual feature extraction, MetalPrognosis employs sliding window sequences as input, extracting deep semantic insights from pre-trained protein language models. These insights are then incorporated into a convolutional neural network, facilitating the derivation of intricate features. Comparative evaluations show MetalPrognosis outperforms leading methodologies like MCCNN and M-Ionic across various metalloprotein test sets. Furthermore, an ablation study reiterates the effectiveness of our model architecture.
|
31
|
González-Padilla D, Camara MD, Lauschke VM, Zhou Y. Population-scale variability of the human UDP-glycosyltransferase gene family. J Genet Genomics 2024; 51:1228-1236. [PMID: 38969258 DOI: 10.1016/j.jgg.2024.06.018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/12/2024] [Revised: 06/10/2024] [Accepted: 06/26/2024] [Indexed: 07/07/2024]
Abstract
Human UDP-glycosyltransferases (UGTs) are responsible for the glycosylation of a wide variety of endogenous substrates and commonly prescribed drugs. Different genetic polymorphisms in UGT genes are implicated in interindividual differences in drug response and cancer risk. However, the genetic complexity beyond these variants has not been comprehensively assessed. We here leveraged whole-exome and whole-genome sequencing data from 141,456 unrelated individuals across 7 major human populations to provide a comprehensive profile of genetic variability across the human UGT gene family. Overall, 9666 exonic variants were observed, of which 98.9% were rare. To interpret the functional impact of UGT missense variants, we developed a gene family-specific variant effect predictor. This algorithm identified a total of 1208 deleterious variants, most of which were found in African and South Asian populations. Structural analysis corroborated the predicted effects for multiple variations in substrate binding sites. Combined, our analyses provide a systematic overview of UGT variability, which can yield insights into interindividual differences in phase 2 metabolism and facilitate the translation of sequencing data into personalized predictions of UGT substrate disposition.
Affiliation(s)
| | - Mahamadou D Camara
- Department of Physiology and Pharmacology, Karolinska Institutet, Stockholm, Sweden; Center for Molecular Medicine, Karolinska Institutet and University Hospital, Stockholm, Sweden
| | - Volker M Lauschke
- Department of Physiology and Pharmacology, Karolinska Institutet, Stockholm, Sweden; Center for Molecular Medicine, Karolinska Institutet and University Hospital, Stockholm, Sweden; Dr Margarete Fischer-Bosch Institute of Clinical Pharmacology, Stuttgart, Germany; University of Tübingen, Tübingen, Germany.
| | - Yitian Zhou
- Department of Physiology and Pharmacology, Karolinska Institutet, Stockholm, Sweden; Center for Molecular Medicine, Karolinska Institutet and University Hospital, Stockholm, Sweden.
|
32
|
Lim EW, Fallon RJ, Bates C, Ideguchi Y, Nagasaki T, Handzlik MK, Joulia E, Bonelli R, Green CR, Ansell BRE, Kitano M, Polis I, Roberts AJ, Furuya S, Allikmets R, Wallace M, Friedlander M, Metallo CM, Gantner ML. Serine and glycine physiology reversibly modulate retinal and peripheral nerve function. Cell Metab 2024; 36:2315-2328.e6. [PMID: 39191258 DOI: 10.1016/j.cmet.2024.07.021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2023] [Revised: 05/11/2024] [Accepted: 07/30/2024] [Indexed: 08/29/2024]
Abstract
Metabolic homeostasis is maintained by redundant pathways to ensure adequate nutrient supply during fasting and other stresses. These pathways are regulated locally in tissues and systemically via the liver, kidney, and circulation. Here, we characterize how serine, glycine, and one-carbon (SGOC) metabolism fluxes across the eye, liver, and kidney sustain retinal amino acid levels and function. Individuals with macular telangiectasia (MacTel), an age-related retinal disease with reduced circulating serine and glycine, carrying deleterious alleles in SGOC metabolic enzymes exhibit an exaggerated reduction in circulating serine. A Phgdh+/- mouse model of this haploinsufficiency experiences accelerated retinal defects upon dietary serine/glycine restriction, highlighting how otherwise silent haploinsufficiencies can impact retinal health. We demonstrate that serine-associated retinopathy and peripheral neuropathy are reversible, as both are restored in mice upon serine supplementation. These data provide molecular insights into the genetic and metabolic drivers of neuro-retinal dysfunction while highlighting therapeutic opportunities to ameliorate this pathogenesis.
Affiliation(s)
- Esther W Lim
- Molecular and Cellular Biology Laboratory, The Salk Institute for Biological Studies, La Jolla, CA 92037, USA; Department of Bioengineering, University of California, San Diego, La Jolla, CA 92093, USA
| | - Regis J Fallon
- Lowy Medical Research Institute, La Jolla, CA 92037, USA
| | - Caleb Bates
- Lowy Medical Research Institute, La Jolla, CA 92037, USA
| | | | | | - Michal K Handzlik
- Molecular and Cellular Biology Laboratory, The Salk Institute for Biological Studies, La Jolla, CA 92037, USA
| | - Emeline Joulia
- Molecular and Cellular Biology Laboratory, The Salk Institute for Biological Studies, La Jolla, CA 92037, USA
| | - Roberto Bonelli
- Lowy Medical Research Institute, La Jolla, CA 92037, USA; Walter and Eliza Hall Institute of Medical Research, Parkville, VIC 3052, Australia
| | - Courtney R Green
- Molecular and Cellular Biology Laboratory, The Salk Institute for Biological Studies, La Jolla, CA 92037, USA
| | - Brendan R E Ansell
- Walter and Eliza Hall Institute of Medical Research, Parkville, VIC 3052, Australia; Department of Medical Biology, University of Melbourne, Parkville, VIC 3052, Australia
| | - Maki Kitano
- The Scripps Research Institute, La Jolla, CA 92037, USA
| | - Ilham Polis
- The Scripps Research Institute, La Jolla, CA 92037, USA
| | | | - Shigeki Furuya
- Department of Bioscience and Biotechnology, Kyushu University, Fukuoka 812-0053, Japan
| | | | - Martina Wallace
- School of Agriculture and Food Science, University College Dublin, Dublin 4, Ireland
| | - Martin Friedlander
- Lowy Medical Research Institute, La Jolla, CA 92037, USA; The Scripps Research Institute, La Jolla, CA 92037, USA
| | - Christian M Metallo
- Molecular and Cellular Biology Laboratory, The Salk Institute for Biological Studies, La Jolla, CA 92037, USA; Department of Bioengineering, University of California, San Diego, La Jolla, CA 92093, USA.
|
33
|
Ahmad RM, Ali BR, Al-Jasmi F, Al Dhaheri N, Al Turki S, Kizhakkedath P, Mohamad MS. AI-derived comparative assessment of the performance of pathogenicity prediction tools on missense variants of breast cancer genes. Hum Genomics 2024; 18:99. [PMID: 39256852 PMCID: PMC11389290 DOI: 10.1186/s40246-024-00667-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2024] [Accepted: 08/22/2024] [Indexed: 09/12/2024] Open
Abstract
Single nucleotide variants (SNVs) can exert substantial and extremely variable impacts on various cellular functions, making accurate prediction of their consequences challenging yet crucial, especially in clinical settings such as oncology. Laboratory-based experimental methods for assessing these effects are time-consuming and often impractical, highlighting the importance of in-silico tools for variant impact prediction. However, the performance metrics of currently available tools on breast cancer missense variants from benchmarking databases have not been thoroughly investigated, creating a knowledge gap in the accurate prediction of pathogenicity. In this study, the benchmarking datasets ClinVar and HGMD were used to evaluate 21 Artificial Intelligence (AI)-derived in-silico tools. Missense variants in breast cancer genes were extracted from ClinVar and HGMD professional v2023.1. The HGMD dataset focused on pathogenic variants only; to ensure balance, benign variants for the same genes were included from the ClinVar database. Interestingly, our analysis of both datasets revealed variants across genes with varying penetrance levels (low and moderate in addition to high), reinforcing the value of disease-specific tools. The top-performing tools on the ClinVar dataset were MutPred (Accuracy = 0.73), MetaRNN (Accuracy = 0.72), ClinPred (Accuracy = 0.71), Meta-SVM, REVEL, and Fathmm-XF (Accuracy = 0.70). On the HGMD dataset, they were ClinPred (Accuracy = 0.72), MetaRNN (Accuracy = 0.71), CADD (Accuracy = 0.69), Fathmm-MKL (Accuracy = 0.68), and Fathmm-XF (Accuracy = 0.67). These findings offer clinicians and researchers valuable insights for selecting, improving, and developing effective in-silico tools for breast cancer pathogenicity prediction. Bridging this knowledge gap contributes to advancing precision medicine and enhancing diagnostic and therapeutic approaches for breast cancer patients, with potential implications for other conditions.
Affiliation(s)
- Rahaf M Ahmad
- Health Data Science Lab, Department of Genetics and Genomics, College of Medical and Health Sciences, United Arab Emirates University, Tawam road, Al Maqam district, Al Ain, Abu Dhabi, United Arab Emirates
| | - Bassam R Ali
- Department of Genetics and Genomics, College of Medical and Health Sciences, United Arab Emirates University, Tawam road, Al Maqam district, Al Ain, Abu Dhabi, United Arab Emirates
| | - Fatma Al-Jasmi
- Health Data Science Lab, Department of Genetics and Genomics, College of Medical and Health Sciences, United Arab Emirates University, Tawam road, Al Maqam district, Al Ain, Abu Dhabi, United Arab Emirates
- Division of Metabolic Genetics, Department of Pediatrics, Tawam Hospital, Al Ain, United Arab Emirates
| | - Noura Al Dhaheri
- Health Data Science Lab, Department of Genetics and Genomics, College of Medical and Health Sciences, United Arab Emirates University, Tawam road, Al Maqam district, Al Ain, Abu Dhabi, United Arab Emirates
- Division of Metabolic Genetics, Department of Pediatrics, Tawam Hospital, Al Ain, United Arab Emirates
| | - Saeed Al Turki
- Health Data Science Lab, Department of Genetics and Genomics, College of Medical and Health Sciences, United Arab Emirates University, Tawam road, Al Maqam district, Al Ain, Abu Dhabi, United Arab Emirates
| | - Praseetha Kizhakkedath
- Department of Genetics and Genomics, College of Medical and Health Sciences, United Arab Emirates University, Tawam road, Al Maqam district, Al Ain, Abu Dhabi, United Arab Emirates
| | - Mohd Saberi Mohamad
- Health Data Science Lab, Department of Genetics and Genomics, College of Medical and Health Sciences, United Arab Emirates University, Tawam road, Al Maqam district, Al Ain, Abu Dhabi, United Arab Emirates.
- Center for Engineering Computational Intelligence, Faculty of Engineering and Technology, Multimedia University, Melaka, Malaysia.
|
34
|
Wang L, Sun H, Yue Z, Xia J, Li X. CDMPred: a tool for predicting cancer driver missense mutations with high-quality passenger mutations. PeerJ 2024; 12:e17991. [PMID: 39253604 PMCID: PMC11382650 DOI: 10.7717/peerj.17991] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/10/2024] [Accepted: 08/07/2024] [Indexed: 09/11/2024] Open
Abstract
Most computational methods for predicting driver mutations have been trained using positive samples, while negative samples are typically derived from statistical methods or putative samples. The representativeness of these negative samples in capturing the diversity of passenger mutations remains to be determined. To tackle these issues, we curated a balanced dataset comprising driver mutations sourced from the COSMIC database and high-quality passenger mutations obtained from the Cancer Passenger Mutation database. Subsequently, we encoded the distinctive features of these mutations. Utilizing feature correlation analysis, we developed a cancer driver missense mutation predictor called CDMPred employing feature selection through the ensemble learning technique XGBoost. The proposed CDMPred method, utilizing the top 10 features and XGBoost, achieved an area under the receiver operating characteristic curve (AUC) value of 0.83 and 0.80 on the training and independent test sets, respectively. Furthermore, CDMPred demonstrated superior performance compared to existing state-of-the-art methods for cancer-specific and general diseases, as measured by AUC and area under the precision-recall curve. Including high-quality passenger mutations in the training data proves advantageous for CDMPred's prediction performance. We anticipate that CDMPred will be a valuable tool for predicting cancer driver mutations, furthering our understanding of personalized therapy.
Affiliation(s)
- Lihua Wang
- Information Materials and Intelligent Sensing Laboratory of Anhui Province, Institutes of Physical Science and Information Technology, Anhui University, Hefei, Anhui, China
- School of Information Engineering, Huangshan University, Huangshan, Anhui, China
| | - Haiyang Sun
- State Key Laboratory of Medicinal Chemical Biology, NanKai University, Tianjin, Tianjin, China
| | - Zhenyu Yue
- School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, Anhui, China
| | - Junfeng Xia
- Information Materials and Intelligent Sensing Laboratory of Anhui Province, Institutes of Physical Science and Information Technology, Anhui University, Hefei, Anhui, China
| | - Xiaoyan Li
- Information Materials and Intelligent Sensing Laboratory of Anhui Province, Institutes of Physical Science and Information Technology, Anhui University, Hefei, Anhui, China
|
35
|
Liu J, Chen Y, Huang K, Guan X. Enhancing Missense Variant Pathogenicity Prediction with MissenseNet: Integrating Structural Insights and ShuffleNet-Based Deep Learning Techniques. Biomolecules 2024; 14:1105. [PMID: 39334871 PMCID: PMC11429773 DOI: 10.3390/biom14091105] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/24/2024] [Revised: 07/17/2024] [Accepted: 07/22/2024] [Indexed: 09/30/2024] Open
Abstract
The classification of missense variant pathogenicity continues to pose significant challenges in human genetics, necessitating precise predictions of functional impacts for effective disease diagnosis and personalized treatment strategies. Traditional methods, often compromised by suboptimal feature selection and limited generalizability, are outpaced by the enhanced classification model, MissenseNet (Missense Classification Network). This model, advancing beyond standard predictive features, incorporates structural insights from AlphaFold2 protein predictions, thus optimizing structural data utilization. MissenseNet, built on the ShuffleNet architecture, incorporates an encoder-decoder framework and a Squeeze-and-Excitation (SE) module designed to adaptively adjust channel weights and enhance feature fusion and interaction. The model's efficacy in classifying pathogenicity has been validated through superior accuracy compared to conventional methods and by achieving the highest areas under the Receiver Operating Characteristic (ROC) and Precision-Recall (PR) curves (Area Under the Curve and Area Under the Precision-Recall Curve) in an independent test set, thus underscoring its superiority.
Affiliation(s)
- Jing Liu
- College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
| | - Yingying Chen
- College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
| | - Kai Huang
- School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
- National Grain Industry (Urban Grain and Oil Security) Technology Innovation Center, Shanghai 200093, China
| | - Xiao Guan
- School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
- National Grain Industry (Urban Grain and Oil Security) Technology Innovation Center, Shanghai 200093, China
|
36
|
Zyluk A, Debniak T, Flicinski F, Rudnicka H. Inherited Variants in the COL11A, COL1A, COL5A1, COMP, GSTM1 Genes and the Risk of Carpal Tunnel Syndrome. Handchir Mikrochir Plast Chir 2024; 56:359-367. [PMID: 39333034 DOI: 10.1055/a-2375-3737] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/29/2024] Open
Abstract
The pathogenesis of most cases of carpal tunnel syndrome (CTS) is not clearly defined. Some aspects of the disease suggest a potential effect of genetic predisposition. Mutations (variants) within the genes encoding various subtypes of collagen synthesis, oligomerisation in the endoplasmic reticulum and inactivation of reactive oxygen species may be involved in the development of carpal tunnel syndrome. The objective of this study was to determine the role of DNA alterations within the COL11A, COL1A, COL5A1, COMP and GSTM1 genes in the pathogenesis of carpal tunnel syndrome based on a Polish population.
STUDY DESIGN: In the discovery phase, a total of 96 patients with familial aggregation of CTS were genotyped using a Next Generation Sequencing (NGS) panel in order to find possible mutations within the studied genes. The potential pathogenicity of the detected variants was investigated using the predictions of several in-silico algorithms and the TaqMan technology. In the association phase of the study, a group of 345 CTS patients and 1035 healthy controls were genotyped.
RESULTS: A total of 35 splice-site or exonic non-synonymous variants were detected by NGS. We did not identify any clearly pathogenic or likely pathogenic alterations. Thirty variants were classified as benign or likely benign. Five missense changes were classified as variants of uncertain significance (VUS) and selected for the association study. The COL5A1 c.1595 C>T (p.Ala532Val) variant was detected in one out of 345 cases and three out of 1035 controls (P=1, OR=1); this indicates that the variant is a neutral alteration. The four remaining variants (c.2840 C>A, c.5395 G>A, c.1331 C>G, c.1590 C>A) were absent from all 345 CTS patients and all 1035 controls.
CONCLUSION: The main finding of this study was that there was no independent association between the variants of the five examined genes and carpal tunnel syndrome. Four uncertain variants were identified that seem to be extremely rare in the Polish population.
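The odds ratio reported for COL5A1 c.1595 C>T can be checked directly from the counts given in the abstract. The following is a minimal sketch, assuming the standard unadjusted 2x2 odds ratio; the abstract does not state which estimator or test the authors used, and the function name `odds_ratio` is illustrative only.

```python
from fractions import Fraction

def odds_ratio(case_carriers, case_total, control_carriers, control_total):
    """Unadjusted odds ratio from carrier counts in cases and controls.

    Assumes a plain 2x2 table; not necessarily the authors' method.
    """
    case_non = case_total - case_carriers
    control_non = control_total - control_carriers
    # OR = (a / b) / (c / d) = (a * d) / (b * c); Fraction keeps the result exact
    return Fraction(case_carriers * control_non, case_non * control_carriers)

# Counts from the abstract: 1 of 345 cases, 3 of 1035 controls
print(odds_ratio(1, 345, 3, 1035))  # prints 1
```

The carrier frequency is identical in both groups (1/345 = 3/1035), which is why the odds ratio comes out at exactly 1.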
Affiliation(s)
- Andrzej Zyluk
- Department of General and Hand Surgery, Pomeranian Medical University, Szczecin, Poland
| | - Tadeusz Debniak
- Department of Genetics and Pathomorfology, Pomeranian Medical University, Szczecin, Poland
| | - Filip Flicinski
- Department of General and Hand Surgery, Pomeranian Medical University, Szczecin, Poland
| | - Helena Rudnicka
- Department of Genetics and Pathomorfology, Pomeranian Medical University, Szczecin, Poland
|
37
|
Lin YJ, Menon AS, Hu Z, Brenner SE. Variant Impact Predictor database (VIPdb), version 2: trends from three decades of genetic variant impact predictors. Hum Genomics 2024; 18:90. [PMID: 39198917 PMCID: PMC11360829 DOI: 10.1186/s40246-024-00663-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2024] [Accepted: 08/19/2024] [Indexed: 09/01/2024] Open
Abstract
BACKGROUND: Variant interpretation is essential for identifying patients' disease-causing genetic variants amongst the millions detected in their genomes. Hundreds of Variant Impact Predictors (VIPs), also known as Variant Effect Predictors (VEPs), have been developed for this purpose, with a variety of methodologies and goals. To facilitate the exploration of available VIP options, we have created the Variant Impact Predictor database (VIPdb).
RESULTS: The Variant Impact Predictor database (VIPdb) version 2 presents a collection of VIPs developed over the past three decades, summarizing their characteristics, ClinGen calibrated scores, CAGI assessment results, publication details, access information, and citation patterns. We previously summarized 217 VIPs and their features in VIPdb in 2019. Building upon this foundation, we identified and categorized an additional 190 VIPs, resulting in a total of 407 VIPs in VIPdb version 2. The majority of the VIPs have the capacity to predict the impacts of single nucleotide variants and nonsynonymous variants. More VIPs tailored to predict the impacts of insertions and deletions have been developed since the 2010s. In contrast, relatively few VIPs are dedicated to the prediction of splicing, structural, synonymous, and regulatory variants. The increasing rate of citations to VIPs reflects the ongoing growth in their use, and the evolving trends in citations reveal development in the field and individual methods.
CONCLUSIONS: VIPdb version 2 summarizes 407 VIPs and their features, potentially facilitating VIP exploration for various variant interpretation applications. VIPdb is available at https://genomeinterpretation.org/vipdb.
Affiliation(s)
- Yu-Jen Lin
- Department of Molecular and Cell Biology, University of California, Berkeley, CA, 94720, USA
- Center for Computational Biology, University of California, Berkeley, CA, 94720, USA
| | - Arul S Menon
- Department of Molecular and Cell Biology, University of California, Berkeley, CA, 94720, USA
- College of Computing, Data Science, and Society, University of California, Berkeley, CA, 94720, USA
| | - Zhiqiang Hu
- Department of Plant and Microbial Biology, University of California, 111 Koshland Hall #3102, Berkeley, CA, 94720-3102, USA
- Illumina, Foster City, CA, 94404, USA
| | - Steven E Brenner
- Department of Molecular and Cell Biology, University of California, Berkeley, CA, 94720, USA.
- Center for Computational Biology, University of California, Berkeley, CA, 94720, USA.
- College of Computing, Data Science, and Society, University of California, Berkeley, CA, 94720, USA.
- Department of Plant and Microbial Biology, University of California, 111 Koshland Hall #3102, Berkeley, CA, 94720-3102, USA.
|
38
|
Zhang Y, Leung AK, Kang JJ, Sun Y, Wu G, Li L, Sun J, Cheng L, Qiu T, Zhang J, Wierbowski S, Gupta S, Booth J, Yu H. A multiscale functional map of somatic mutations in cancer integrating protein structure and network topology. bioRxiv 2024:2023.03.06.531441. [PMID: 36945530 PMCID: PMC10028849 DOI: 10.1101/2023.03.06.531441] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/09/2023]
Abstract
A major goal of cancer biology is to understand the mechanisms underlying tumorigenesis driven by somatically acquired mutations. Two distinct types of computational methodologies have emerged: one focuses on analyzing clustering of mutations within protein sequences and 3D structures, while the other characterizes mutations by leveraging the topology of protein-protein interaction networks. Their insights are largely non-overlapping, offering complementary strengths. Here, we established a unified, end-to-end 3D structurally-informed protein interaction network propagation framework, NetFlow3D, that systematically maps the multiscale mechanistic effects of somatic mutations in cancer. The establishment of NetFlow3D hinges upon the Human Protein Structurome, a comprehensive repository we compiled that incorporates the 3D structures of every single protein as well as the binding interfaces of all known protein interactions in humans. NetFlow3D leverages the Structurome to integrate information across atomic, residue, protein and network levels: it conducts 3D clustering of mutations across atomic and residue levels on protein structures to identify potential driver mutations. It then anisotropically propagates their impacts across the protein interaction network, with propagation guided by the specific 3D structural interfaces involved, to identify significantly interconnected network "modules", thereby uncovering key biological processes underlying disease etiology. Applied to 1,038,899 somatic protein-altering mutations in 9,946 TCGA tumors across 33 cancer types, NetFlow3D identified 1,4444 significant 3D clusters throughout the Human Protein Structurome, of which ~55% would not have been found if using only experimentally-determined structures. It then identified 26 significantly interconnected modules that encompass ~8-fold more proteins than applying standard network analyses. NetFlow3D and our pan-cancer results can be accessed from http://netflow3d.yulab.org.
Affiliation(s)
- Yingying Zhang
- Department of Computational Biology, Cornell University; Ithaca, 14853, USA
- Weill Institute for Cell and Molecular Biology, Cornell University; Ithaca, 14853, USA
- Department of Molecular Biology and Genetics, Cornell University; Ithaca, 14853, USA
| | - Alden K. Leung
- Department of Computational Biology, Cornell University; Ithaca, 14853, USA
- Weill Institute for Cell and Molecular Biology, Cornell University; Ithaca, 14853, USA
| | - Jin Joo Kang
- Department of Computational Biology, Cornell University; Ithaca, 14853, USA
- Weill Institute for Cell and Molecular Biology, Cornell University; Ithaca, 14853, USA
| | - Yu Sun
- Department of Computational Biology, Cornell University; Ithaca, 14853, USA
- Weill Institute for Cell and Molecular Biology, Cornell University; Ithaca, 14853, USA
| | - Guanxi Wu
- College of Agriculture and Life Sciences, Cornell University; Ithaca, 14853, USA
| | - Le Li
- Department of Computational Biology, Cornell University; Ithaca, 14853, USA
- Weill Institute for Cell and Molecular Biology, Cornell University; Ithaca, 14853, USA
| | - Jiayang Sun
- Department of Computational Biology, Cornell University; Ithaca, 14853, USA
| | - Lily Cheng
- Department of Science and Technology Studies, Cornell University; Ithaca, 14853, USA
| | - Tian Qiu
- School of Electrical and Computer Engineering, Cornell University; Ithaca, 14853, USA
| | - Junke Zhang
- Department of Computational Biology, Cornell University; Ithaca, 14853, USA
- Weill Institute for Cell and Molecular Biology, Cornell University; Ithaca, 14853, USA
| | - Shayne Wierbowski
- Department of Computational Biology, Cornell University; Ithaca, 14853, USA
- Weill Institute for Cell and Molecular Biology, Cornell University; Ithaca, 14853, USA
| | - Shagun Gupta
- Department of Computational Biology, Cornell University; Ithaca, 14853, USA
- Weill Institute for Cell and Molecular Biology, Cornell University; Ithaca, 14853, USA
| | - James Booth
- Department of Computational Biology, Cornell University; Ithaca, 14853, USA
- Department of Statistics and Data Science, Cornell University; Ithaca, 14853, USA
| | - Haiyuan Yu
- Department of Computational Biology, Cornell University; Ithaca, 14853, USA
- Weill Institute for Cell and Molecular Biology, Cornell University; Ithaca, 14853, USA
| |
|
39
|
Tabet DR, Kuang D, Lancaster MC, Li R, Liu K, Weile J, Coté AG, Wu Y, Hegele RA, Roden DM, Roth FP. Benchmarking computational variant effect predictors by their ability to infer human traits. Genome Biol 2024; 25:172. [PMID: 38951922 PMCID: PMC11218265 DOI: 10.1186/s13059-024-03314-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2022] [Accepted: 06/17/2024] [Indexed: 07/03/2024] Open
Abstract
BACKGROUND: Computational variant effect predictors offer a scalable and increasingly reliable means of interpreting human genetic variation, but concerns of circularity and bias have limited previous methods for evaluating and comparing predictors. Population-level cohorts of genotyped and phenotyped participants that have not been used in predictor training can facilitate an unbiased benchmarking of available methods. Using a curated set of human gene-trait associations with a reported rare-variant burden association, we evaluate the correlations of 24 computational variant effect predictors with associated human traits in the UK Biobank and All of Us cohorts. RESULTS: AlphaMissense outperformed all other predictors in inferring human traits based on rare missense variants in UK Biobank and All of Us participants. The overall rankings of computational variant effect predictors in these two cohorts showed a significant positive correlation. CONCLUSION: We describe a method to assess computational variant effect predictors that sidesteps the limitations of previous evaluations. This approach is generalizable to future predictors and could continue to inform predictor choice for personal and clinical genetics.
Affiliation(s)
- Daniel R Tabet
- Donnelly Centre, University of Toronto, Toronto, ON, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON, Canada
| | - Da Kuang
- Donnelly Centre, University of Toronto, Toronto, ON, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON, Canada
| | - Megan C Lancaster
- Division of Cardiovascular Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Roujia Li
- Donnelly Centre, University of Toronto, Toronto, ON, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON, Canada
| | - Karen Liu
- Donnelly Centre, University of Toronto, Toronto, ON, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON, Canada
| | - Jochen Weile
- Donnelly Centre, University of Toronto, Toronto, ON, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON, Canada
| | - Atina G Coté
- Donnelly Centre, University of Toronto, Toronto, ON, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON, Canada
| | - Yingzhou Wu
- Donnelly Centre, University of Toronto, Toronto, ON, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON, Canada
| | - Robert A Hegele
- Department of Medicine, Department of Biochemistry, Schulich School of Medicine and Dentistry, Robarts Research Institute, Western University, London, ON, Canada
| | - Dan M Roden
- Division of Cardiovascular Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Department of Pharmacology, Vanderbilt University Medical Center, Nashville, TN, USA
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Frederick P Roth
- Donnelly Centre, University of Toronto, Toronto, ON, Canada.
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada.
- Department of Computer Science, University of Toronto, Toronto, ON, Canada.
- Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON, Canada.
- Department of Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA.
| |
|
40
|
Bromberg Y, Prabakaran R, Kabir A, Shehu A. Variant Effect Prediction in the Age of Machine Learning. Cold Spring Harb Perspect Biol 2024; 16:a041467. [PMID: 38621825 PMCID: PMC11216171 DOI: 10.1101/cshperspect.a041467] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/17/2024]
Abstract
Over the years, many computational methods have been created for the analysis of the impact of single amino acid substitutions resulting from single-nucleotide variants in genome coding regions. Historically, all methods have been supervised and thus limited by the inadequate sizes of experimentally curated data sets and by the lack of a standardized definition of variant effect. The emergence of unsupervised, deep learning (DL)-based methods raised an important question: Can machines learn the language of life from the unannotated protein sequence data well enough to identify significant errors in the protein "sentences"? Our analysis suggests that some unsupervised methods perform as well or better than existing supervised methods. Unsupervised methods are also faster and can, thus, be useful in large-scale variant evaluations. For all other methods, however, their performance varies by both evaluation metrics and by the type of variant effect being predicted. We also note that the evaluation of method performance is still lacking on less-studied, nonhuman proteins where unsupervised methods hold the most promise.
Affiliation(s)
- Yana Bromberg
- Department of Biology, Emory University, Atlanta 30322, Georgia, USA
- Department of Computer Science, Emory University, Atlanta 30322, Georgia, USA
| | - R Prabakaran
- Department of Biology, Emory University, Atlanta 30322, Georgia, USA
| | - Anowarul Kabir
- Department of Computer Science, George Mason University, Fairfax 22030, Virginia, USA
| | - Amarda Shehu
- Department of Computer Science, George Mason University, Fairfax 22030, Virginia, USA
| |
|
41
|
Lin YJ, Menon AS, Hu Z, Brenner SE. Variant Impact Predictor database (VIPdb), version 2: Trends from 25 years of genetic variant impact predictors. bioRxiv 2024:2024.06.25.600283. [PMID: 38979289 PMCID: PMC11230257 DOI: 10.1101/2024.06.25.600283] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/10/2024]
Abstract
Background: Variant interpretation is essential for identifying patients' disease-causing genetic variants amongst the millions detected in their genomes. Hundreds of Variant Impact Predictors (VIPs), also known as Variant Effect Predictors (VEPs), have been developed for this purpose, with a variety of methodologies and goals. To facilitate the exploration of available VIP options, we have created the Variant Impact Predictor database (VIPdb). Results: The Variant Impact Predictor database (VIPdb) version 2 presents a collection of VIPs developed over the past 25 years, summarizing their characteristics, ClinGen calibrated scores, CAGI assessment results, publication details, access information, and citation patterns. We previously summarized 217 VIPs and their features in VIPdb in 2019. Building upon this foundation, we identified and categorized an additional 186 VIPs, resulting in a total of 403 VIPs in VIPdb version 2. The majority of the VIPs have the capacity to predict the impacts of single nucleotide variants and nonsynonymous variants. More VIPs tailored to predict the impacts of insertions and deletions have been developed since the 2010s. In contrast, relatively few VIPs are dedicated to the prediction of splicing, structural, synonymous, and regulatory variants. The increasing rate of citations to VIPs reflects the ongoing growth in their use, and the evolving trends in citations reveal development in the field and individual methods. Conclusions: VIPdb version 2 summarizes 403 VIPs and their features, potentially facilitating VIP exploration for various variant interpretation applications. Availability: VIPdb version 2 is available at https://genomeinterpretation.org/vipdb.
Affiliation(s)
- Yu-Jen Lin
- Department of Molecular and Cell Biology, University of California, Berkeley, California 94720, USA
- Center for Computational Biology, University of California, Berkeley, California 94720, USA
| | - Arul S. Menon
- Department of Molecular and Cell Biology, University of California, Berkeley, California 94720, USA
- College of Computing, Data Science, and Society, University of California, Berkeley, California 94720, USA
| | - Zhiqiang Hu
- Department of Plant and Microbial Biology, University of California, Berkeley, California 94720, USA
- Currently at: Illumina, Foster City, California 94404, USA
| | - Steven E. Brenner
- Department of Molecular and Cell Biology, University of California, Berkeley, California 94720, USA
- Center for Computational Biology, University of California, Berkeley, California 94720, USA
- College of Computing, Data Science, and Society, University of California, Berkeley, California 94720, USA
- Department of Plant and Microbial Biology, University of California, Berkeley, California 94720, USA
| |
|
42
|
Ma K, Huang S, Ng KK, Lake NJ, Joseph S, Xu J, Lek A, Ge L, Woodman KG, Koczwara KE, Cohen J, Ho V, O’Connor CL, Brindley MA, Campbell KP, Lek M. Deep Mutational Scanning in Disease-related Genes with Saturation Mutagenesis-Reinforced Functional Assays (SMuRF). bioRxiv 2024:2023.07.12.548370. [PMID: 37873263 PMCID: PMC10592615 DOI: 10.1101/2023.07.12.548370] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2023]
Abstract
Interpretation of disease-causing genetic variants remains a challenge in human genetics. Current costs and complexity of deep mutational scanning methods hamper crowd-sourcing approaches toward genome-wide resolution of variants in disease-related genes. Our framework, Saturation Mutagenesis-Reinforced Functional assays (SMuRF), addresses these issues by offering simple and cost-effective saturation mutagenesis, as well as streamlining functional assays to enhance the interpretation of unresolved variants. Applying SMuRF to neuromuscular disease genes FKRP and LARGE1, we generated functional scores for all possible coding single nucleotide variants, which aid in resolving clinically reported variants of uncertain significance. SMuRF also demonstrates utility in predicting disease severity, resolving critical structural regions, and providing training datasets for the development of computational predictors. Our approach opens new directions for enabling variant-to-function insights for disease genes in a manner that is broadly useful for crowd-sourcing implementation across standard research laboratories.
Affiliation(s)
- Kaiyue Ma
- Department of Genetics, Yale School of Medicine, New Haven, CT, USA
| | - Shushu Huang
- Department of Genetics, Yale School of Medicine, New Haven, CT, USA
- Equal second authors
| | - Kenneth K. Ng
- Department of Genetics, Yale School of Medicine, New Haven, CT, USA
- Equal second authors
| | - Nicole J. Lake
- Department of Genetics, Yale School of Medicine, New Haven, CT, USA
| | - Soumya Joseph
- Howard Hughes Medical Institute, Senator Paul D. Wellstone Muscular Dystrophy Specialized Research Center, Department of Molecular Physiology and Biophysics and Department of Neurology, Roy J. and Lucille A. Carver College of Medicine, The University of Iowa, Iowa City, IA, USA
| | - Jenny Xu
- Yale University, New Haven, CT, USA
| | - Angela Lek
- Department of Genetics, Yale School of Medicine, New Haven, CT, USA
- Muscular Dystrophy Association, Chicago, IL, USA
| | - Lin Ge
- Department of Genetics, Yale School of Medicine, New Haven, CT, USA
- Department of Neurology, National Center for Children’s Health, Beijing Children’s Hospital, Capital Medical University, Beijing, China
| | - Keryn G. Woodman
- Department of Genetics, Yale School of Medicine, New Haven, CT, USA
| | | | - Justin Cohen
- Department of Genetics, Yale School of Medicine, New Haven, CT, USA
| | - Vincent Ho
- Department of Genetics, Yale School of Medicine, New Haven, CT, USA
| | | | - Melinda A. Brindley
- Department of Infectious Diseases, Department of Population Health, University of Georgia, Athens, GA, USA
- Senior Authors
| | - Kevin P. Campbell
- Howard Hughes Medical Institute, Senator Paul D. Wellstone Muscular Dystrophy Specialized Research Center, Department of Molecular Physiology and Biophysics and Department of Neurology, Roy J. and Lucille A. Carver College of Medicine, The University of Iowa, Iowa City, IA, USA
- Senior Authors
| | - Monkol Lek
- Department of Genetics, Yale School of Medicine, New Haven, CT, USA
- Senior Authors
- Lead Contact
| |
|
43
|
Rastogi R, Chung R, Li S, Li C, Lee K, Woo J, Kim DW, Keum C, Babbi G, Martelli PL, Savojardo C, Casadio R, Chennen K, Weber T, Poch O, Ancien F, Cia G, Pucci F, Raimondi D, Vranken W, Rooman M, Marquet C, Olenyi T, Rost B, Andreoletti G, Kamandula A, Peng Y, Bakolitsa C, Mort M, Cooper DN, Bergquist T, Pejaver V, Liu X, Radivojac P, Brenner SE, Ioannidis NM. Critical assessment of missense variant effect predictors on disease-relevant variant data. bioRxiv 2024:2024.06.06.597828. [PMID: 38895200 PMCID: PMC11185644 DOI: 10.1101/2024.06.06.597828] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/21/2024]
Abstract
Regular, systematic, and independent assessment of computational tools used to predict the pathogenicity of missense variants is necessary to evaluate their clinical and research utility and suggest directions for future improvement. Here, as part of the sixth edition of the Critical Assessment of Genome Interpretation (CAGI) challenge, we assess missense variant effect predictors (or variant impact predictors) on an evaluation dataset of rare missense variants from disease-relevant databases. Our assessment evaluates predictors submitted to the CAGI6 Annotate-All-Missense challenge, predictors commonly used by the clinical genetics community, and recently developed deep learning methods for variant effect prediction. To explore a variety of settings that are relevant for different clinical and research applications, we assess performance within different subsets of the evaluation data and within high-specificity and high-sensitivity regimes. We find strong performance of many predictors across multiple settings. Meta-predictors tend to outperform their constituent individual predictors; however, several individual predictors have performance similar to that of commonly used meta-predictors. The relative performance of predictors differs in high-specificity and high-sensitivity regimes, suggesting that different methods may be best suited to different use cases. We also characterize two potential sources of bias. Predictors that incorporate allele frequency as a predictive feature tend to have reduced performance when distinguishing pathogenic variants from very rare benign variants, and predictors supervised on pathogenicity labels from curated variant databases often learn label imbalances within genes. 
Overall, we find notable advances over the oldest and most cited missense variant effect predictors and continued improvements among the most recently developed tools, and the CAGI Annotate-All-Missense challenge (also termed the Missense Marathon) will continue to assess state-of-the-art methods as the field progresses. Together, our results help illuminate the current clinical and research utility of missense variant effect predictors and identify potential areas for future development.
|
44
|
Zhou Y, Pirmann S, Lauschke VM. APF2: an improved ensemble method for pharmacogenomic variant effect prediction. Pharmacogenomics J 2024; 24:17. [PMID: 38802404 PMCID: PMC11129946 DOI: 10.1038/s41397-024-00338-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 02/29/2024] [Revised: 04/26/2024] [Accepted: 05/15/2024] [Indexed: 05/29/2024]
Abstract
Lack of efficacy or adverse drug response are common phenomena in pharmacological therapy causing considerable morbidity and mortality. It is estimated that 20-30% of this variability in drug response stems from variations in genes encoding drug targets or factors involved in drug disposition. Leveraging such pharmacogenomic information for the preemptive identification of patients who would benefit from dose adjustments or alternative medications thus constitutes an important frontier of precision medicine. Computational methods can be used to predict the functional effects of variants of unknown significance. However, their performance on pharmacogenomic variant data has been lackluster. To overcome this limitation, we previously developed an ensemble classifier, termed APF, specifically designed for pharmacogenomic variant prediction. Here, we aimed to further improve predictions by leveraging recent key advances in the prediction of protein folding based on deep neural networks. Benchmarking of 28 variant effect predictors on 530 pharmacogenetic missense variants revealed that structural predictions using AlphaMissense were most specific, whereas APF exhibited the most balanced performance. We then developed a new tool, APF2, by optimizing algorithm parametrization of the top performing algorithms for pharmacogenomic variations and aggregating their predictions into a unified ensemble score. Importantly, APF2 provides quantitative variant effect estimates that correlate well with experimental results (R² = 0.91, p = 0.003) and predicts the functional impact of pharmacogenomic variants with higher accuracy than previous methods, particularly for clinically relevant variations with actionable pharmacogenomic guidelines. We furthermore demonstrate better performance (92% accuracy) on an independent test set of 146 variants across 61 pharmacogenes not used for model training or validation.
Application of APF2 to population-scale sequencing data from over 800,000 individuals revealed drastic ethnogeographic differences with important implications for pharmacotherapy. We thus think that APF2 holds the potential to improve the translation of genetic information into pharmacogenetic recommendations, thereby facilitating the use of Next-Generation Sequencing data for stratified medicine.
Affiliation(s)
- Yitian Zhou
- Department of Physiology and Pharmacology, Karolinska Institutet, Stockholm, Sweden
- Center for Molecular Medicine, Karolinska Institutet and University Hospital, Stockholm, Sweden
| | - Sebastian Pirmann
- Computational Oncology Group, Molecular Precision Oncology Program, National Center for Tumor Diseases (NCT) Heidelberg and German Cancer Research Center (DKFZ), Heidelberg, Germany
- Helmholtz Information and Data Science School for Health, Karlsruhe/Heidelberg, Germany
- Faculty of Biosciences, Heidelberg University, Heidelberg, Germany
| | - Volker M Lauschke
- Department of Physiology and Pharmacology, Karolinska Institutet, Stockholm, Sweden.
- Center for Molecular Medicine, Karolinska Institutet and University Hospital, Stockholm, Sweden.
- Dr Margarete Fischer-Bosch Institute of Clinical Pharmacology, Stuttgart, Germany.
- University of Tübingen, Tübingen, Germany.
| |
|
45
|
Al-Saei O, Malka S, Owen N, Aliyev E, Vempalli FR, Ocieczek P, Al-Khathlan B, Fakhro K, Moosajee M. Increasing the diagnostic yield of childhood glaucoma cases recruited into the 100,000 Genomes Project. BMC Genomics 2024; 25:484. [PMID: 38755526 PMCID: PMC11097485 DOI: 10.1186/s12864-024-10353-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2023] [Accepted: 04/25/2024] [Indexed: 05/18/2024] Open
Abstract
Childhood glaucoma (CG) encompasses a heterogeneous group of genetic eye disorders that is responsible for approximately 5% of childhood blindness worldwide. Understanding the molecular aetiology is key to improving diagnosis, prognosis and unlocking the potential for optimising clinical management. In this study, we investigated 86 CG cases from 78 unrelated families of diverse ethnic backgrounds, recruited into the Genomics England 100,000 Genomes Project (GE100KGP) rare disease cohort, to improve the genetic diagnostic yield. Using the Genomics England/Genomic Medicine Centres (GE/GMC) diagnostic pipeline, 13 unrelated families were solved (13/78, 17%). Further interrogation using an expanded gene panel yielded a molecular diagnosis in 7 more unrelated families (7/78, 9%). This analysis effectively raises the total number of solved CG families in the GE100KGP to 26% (20/78 families). Twenty-five percent (5/20) of the solved families had primary congenital glaucoma (PCG), while 75% (15/20) had secondary CG; 53% of this group had non-acquired ocular anomalies (including iris hypoplasia, megalocornea, ectopia pupillae, retinal dystrophy, and refractive errors) and 47% had non-acquired systemic diseases such as cardiac abnormalities, hearing impairment, and developmental delay. CYP1B1 was the most frequently implicated gene, accounting for 55% (11/20) of the solved families. We identified two novel likely pathogenic variants in the TEK gene, in addition to one novel pathogenic copy number variant (CNV) in FOXC1. Variants that passed undetected in the GE100KGP diagnostic pipeline were likely due to limitations of the tiering process, the use of smaller gene panels during analysis, and the prioritisation of coding SNVs and indels over larger structural variants, CNVs, and non-coding variants.
Affiliation(s)
- Omayma Al-Saei
- Institute of Ophthalmology, University College London, London, EC1V 9EL, UK
- Department of Human Genetics, Sidra Medicine, PO Box 26999, Doha, Qatar
| | - Samantha Malka
- Moorfields Eye Hospital NHS Foundation Trust, London, EC1V 2PD, UK
| | - Nicholas Owen
- Institute of Ophthalmology, University College London, London, EC1V 9EL, UK
| | - Elbay Aliyev
- Department of Human Genetics, Sidra Medicine, PO Box 26999, Doha, Qatar
| | | | - Paulina Ocieczek
- Moorfields Eye Hospital NHS Foundation Trust, London, EC1V 2PD, UK
| | | | - Khalid Fakhro
- Department of Human Genetics, Sidra Medicine, PO Box 26999, Doha, Qatar
| | - Mariya Moosajee
- Institute of Ophthalmology, University College London, London, EC1V 9EL, UK.
- Moorfields Eye Hospital NHS Foundation Trust, London, EC1V 2PD, UK.
- The Francis Crick Institute, London, NW1 1AT, UK.
| |
|
46
|
Tordai H, Torres O, Csepi M, Padányi R, Lukács GL, Hegedűs T. Analysis of AlphaMissense data in different protein groups and structural context. Sci Data 2024; 11:495. [PMID: 38744964 PMCID: PMC11094042 DOI: 10.1038/s41597-024-03327-8] [Citation(s) in RCA: 26] [Impact Index Per Article: 26.0] [Received: 12/04/2023] [Accepted: 04/29/2024] [Indexed: 05/16/2024] Open
Abstract
Single amino acid substitutions can profoundly affect protein folding, dynamics, and function. The ability to discern between benign and pathogenic substitutions is pivotal for therapeutic interventions and research directions. Given the limitations in experimental examination of these variants, AlphaMissense has emerged as a promising predictor of the pathogenicity of missense variants. Since heterogeneous performance on different types of proteins can be expected, we assessed the efficacy of AlphaMissense across several protein groups (e.g. soluble, transmembrane, and mitochondrial proteins) and regions (e.g. intramembrane, membrane interacting, and high confidence AlphaFold segments) using ClinVar data for validation. Our comprehensive evaluation showed that AlphaMissense delivers outstanding performance, with MCC scores predominantly between 0.6 and 0.74. We observed low performance on disordered datasets and ClinVar data related to the CFTR ABC protein. However, a superior performance was shown when benchmarked against the high quality CFTR2 database. Our results with CFTR emphasize AlphaMissense's potential in pinpointing functional hot spots, with its performance likely surpassing benchmarks calculated from ClinVar and ProteinGym datasets.
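For readers unfamiliar with the metric quoted in this abstract, the Matthews correlation coefficient (MCC) can be computed from a binary confusion matrix as sketched below. This is only an illustration of the standard MCC definition; the function name `mcc` and the example counts are hypothetical and are not taken from the cited study.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary confusion matrix.

    tp/tn/fp/fn are counts of true positives, true negatives,
    false positives and false negatives (e.g. pathogenic vs. benign calls).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention: MCC is defined as 0 when any marginal is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts, for illustration only.
print(mcc(tp=50, tn=50, fp=0, fn=0))    # perfect agreement
print(mcc(tp=25, tn=25, fp=25, fn=25))  # no better than chance
```

MCC ranges from -1 to 1 and, unlike accuracy, stays informative when the benign and pathogenic classes are imbalanced, which is presumably why it is a common choice for ClinVar-based benchmarks.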
Affiliation(s)
- Hedvig Tordai
- Institute of Biophysics and Radiation Biology, Semmelweis University, Budapest, Hungary
| | - Odalys Torres
- Institute of Biophysics and Radiation Biology, Semmelweis University, Budapest, Hungary
| | - Máté Csepi
- Institute of Biophysics and Radiation Biology, Semmelweis University, Budapest, Hungary
| | - Rita Padányi
- Institute of Biophysics and Radiation Biology, Semmelweis University, Budapest, Hungary
| | - Gergely L Lukács
- Department of Physiology and Biochemistry, McGill University, Montréal, QC, Canada
| | - Tamás Hegedűs
- Institute of Biophysics and Radiation Biology, Semmelweis University, Budapest, Hungary.
- HUN-REN-SU Biophysical Virology Research Group, Budapest, Hungary.
| |
|
47
|
Zheng W. Predicting hotspots for disease-causing single nucleotide variants using sequences-based coevolution, network analysis, and machine learning. PLoS One 2024; 19:e0302504. [PMID: 38743747 PMCID: PMC11093321 DOI: 10.1371/journal.pone.0302504] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2023] [Accepted: 04/05/2024] [Indexed: 05/16/2024] Open
Abstract
To enable personalized medicine, it is important yet highly challenging to accurately predict disease-causing mutations in target proteins at high throughput. Previous computational methods have been developed using evolutionary information in combination with various biochemical and structural features of protein residues to discriminate neutral vs. deleterious mutations. However, the power of these methods is often limited because they either assume known protein structures or treat residues independently without fully considering their interactions. To address the above limitations, we build upon recent progress in machine learning, network analysis, and protein language models, and develop a sequences-based variant site prediction workflow based on the protein residue contact networks: 1. We employ and integrate various methods of building protein residue networks using state-of-the-art coevolution analysis tools (RaptorX, DeepMetaPSICOV, and SPOT-Contact) powered by deep learning. 2. We use machine learning algorithms (Random Forest, Gradient Boosting, and Extreme Gradient Boosting) to optimally combine 20 network centrality scores to jointly predict key residues as hot spots for disease mutations. 3. Using a dataset of 107 proteins rich in disease mutations, we rigorously evaluate the network scores individually and collectively (via machine learning). This work supports a promising strategy of combining an ensemble of network scores based on different coevolution analysis methods (and optionally predictive scores from other methods) via machine learning to predict hotspot sites of disease mutations, which will inform downstream applications of disease diagnosis and targeted drug design.
Affiliation(s)
- Wenjun Zheng
- Department of Physics, State University of New York at Buffalo, Buffalo, NY, United States of America
| |
|
48
|
Chao KR, Wang L, Panchal R, Liao C, Abderrazzaq H, Ye R, Schultz P, Compitello J, Grant RH, Kosmicki JA, Weisburd B, Phu W, Wilson MW, Laricchia KM, Goodrich JK, Goldstein D, Goldstein JI, Vittal C, Poterba T, Baxter S, Watts NA, Solomonson M, Tiao G, Rehm HL, Neale BM, Talkowski ME, MacArthur DG, O'Donnell-Luria A, Karczewski KJ, Radivojac P, Daly MJ, Samocha KE. The landscape of regional missense mutational intolerance quantified from 125,748 exomes. bioRxiv 2024:2024.04.11.588920. [PMID: 38645134 PMCID: PMC11030311 DOI: 10.1101/2024.04.11.588920] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/23/2024]
Abstract
Missense variants can have a range of functional impacts depending on factors such as the specific amino acid substitution and location within the gene. To interpret their deleteriousness, studies have sought to identify regions within genes that are specifically intolerant of missense variation [1-12]. Here, we leverage the patterns of rare missense variation in 125,748 individuals in the Genome Aggregation Database (gnomAD) [13] against a null mutational model to identify transcripts that display regional differences in missense constraint. Missense-depleted regions are enriched for ClinVar [14] pathogenic variants, de novo missense variants from individuals with neurodevelopmental disorders (NDDs) [15,16], and complex trait heritability. Following ClinGen calibration recommendations for the ACMG/AMP guidelines, we establish that regions with less than 20% of their expected missense variation achieve moderate support for pathogenicity. We create a missense deleteriousness metric (MPC) that incorporates regional constraint and outperforms other deleteriousness scores at stratifying case and control de novo missense variation, with a strong enrichment in NDDs. These results provide additional tools to aid in missense variant interpretation.
|
49
|
Camara MD, Zhou Y, Dara A, Tékété MM, Nóbrega de Sousa T, Sissoko S, Dembélé L, Ouologuem N, Hamidou Togo A, Alhousseini ML, Fofana B, Sagara I, Djimde AA, Gil PJ, Lauschke VM. Population-specific variations in KCNH2 predispose patients to delayed ventricular repolarization upon dihydroartemisinin-piperaquine therapy. Antimicrob Agents Chemother 2024; 68:e0139023. [PMID: 38546223 PMCID: PMC11064487 DOI: 10.1128/aac.01390-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2023] [Accepted: 03/05/2024] [Indexed: 05/03/2024] Open
Abstract
Dihydroartemisinin-piperaquine is efficacious for the treatment of uncomplicated malaria and its use is increasing globally. Despite the positive results in fighting malaria, inhibition of the Kv11.1 channel (hERG; encoded by the KCNH2 gene) by piperaquine has raised concerns about cardiac safety. Whether genetic factors could modulate the risk of piperaquine-mediated QT prolongations remained unclear. Here, we first profiled the genetic landscape of KCNH2 variability using data from 141,614 individuals. Overall, we found 1,007 exonic variants distributed over the entire gene body, 555 of which were missense. By optimizing the gene-specific parametrization of 16 partly orthogonal computational algorithms, we developed a KCNH2-specific ensemble classifier that identified a total of 116 putatively deleterious missense variations. To evaluate the clinical relevance of KCNH2 variability, we then sequenced 293 Malian patients with uncomplicated malaria and identified 13 variations within the voltage sensing and pore domains of Kv11.1 that directly interact with channel blockers. Cross-referencing of genetic and electrocardiographic data before and after piperaquine exposure revealed that carriers of two common variants, rs1805121 and rs41314375, experienced significantly higher QT prolongations (ΔQTc of 41.8 ms and 61 ms, respectively, vs 14.4 ms in controls) with more than 50% of carriers having increases in QTc >30 ms. Furthermore, we identified three carriers of rare population-specific variations who experienced clinically relevant delayed ventricular repolarization. Combined, our results map population-scale genetic variability of KCNH2 and identify genetic biomarkers for piperaquine-induced QT prolongation that could help to flag at-risk patients and optimize efficacy and adherence to antimalarial therapy.
Affiliation(s)
- Mahamadou D. Camara
- Department of Physiology and Pharmacology, Karolinska Institutet, Stockholm, Sweden
- Department of Epidemiology of Parasitic Diseases, Malaria Research and Training Center, Faculty of Pharmacy, University of Science, Techniques and Technologies, Bamako, Mali
| | - Yitian Zhou
- Department of Physiology and Pharmacology, Karolinska Institutet, Stockholm, Sweden
| | - Antoine Dara
- Department of Epidemiology of Parasitic Diseases, Malaria Research and Training Center, Faculty of Pharmacy, University of Science, Techniques and Technologies, Bamako, Mali
| | - Mamadou M. Tékété
- Department of Epidemiology of Parasitic Diseases, Malaria Research and Training Center, Faculty of Pharmacy, University of Science, Techniques and Technologies, Bamako, Mali
| | - Taís Nóbrega de Sousa
- Department of Microbiology and Tumour Cell Biology, Karolinska Institutet, Stockholm, Sweden
- Molecular Biology and Malaria Immunology Research Group, Instituto René Rachou, Fundação Oswaldo Cruz (FIOCRUZ), Belo Horizonte, Brazil
| | - Sékou Sissoko
- Department of Epidemiology of Parasitic Diseases, Malaria Research and Training Center, Faculty of Pharmacy, University of Science, Techniques and Technologies, Bamako, Mali
| | - Laurent Dembélé
- Department of Epidemiology of Parasitic Diseases, Malaria Research and Training Center, Faculty of Pharmacy, University of Science, Techniques and Technologies, Bamako, Mali
| | - Nouhoun Ouologuem
- Department of Epidemiology of Parasitic Diseases, Malaria Research and Training Center, Faculty of Pharmacy, University of Science, Techniques and Technologies, Bamako, Mali
| | - Amadou Hamidou Togo
- Department of Epidemiology of Parasitic Diseases, Malaria Research and Training Center, Faculty of Pharmacy, University of Science, Techniques and Technologies, Bamako, Mali
| | - Mohamed L. Alhousseini
- Department of Epidemiology of Parasitic Diseases, Malaria Research and Training Center, Faculty of Pharmacy, University of Science, Techniques and Technologies, Bamako, Mali
| | - Bakary Fofana
- Department of Epidemiology of Parasitic Diseases, Malaria Research and Training Center, Faculty of Pharmacy, University of Science, Techniques and Technologies, Bamako, Mali
| | - Issaka Sagara
- Department of Epidemiology of Parasitic Diseases, Malaria Research and Training Center, Faculty of Pharmacy, University of Science, Techniques and Technologies, Bamako, Mali
| | - Abdoulaye A. Djimde
- Department of Epidemiology of Parasitic Diseases, Malaria Research and Training Center, Faculty of Pharmacy, University of Science, Techniques and Technologies, Bamako, Mali
| | - Pedro J. Gil
- Department of Microbiology and Tumour Cell Biology, Karolinska Institutet, Stockholm, Sweden
- Global Health and Tropical Medicine, Institute of Hygiene and Tropical Medicine, Nova University of Lisbon, Lisbon, Portugal
| | - Volker M. Lauschke
- Department of Physiology and Pharmacology, Karolinska Institutet, Stockholm, Sweden
- Dr. Margarete Fischer-Bosch Institute of Clinical Pharmacology, Stuttgart, Germany
- University of Tübingen, Tübingen, Germany
|
50
|
Stenton SL, O'Leary MC, Lemire G, VanNoy GE, DiTroia S, Ganesh VS, Groopman E, O'Heir E, Mangilog B, Osei-Owusu I, Pais LS, Serrano J, Singer-Berk M, Weisburd B, Wilson MW, Austin-Tse C, Abdelhakim M, Althagafi A, Babbi G, Bellazzi R, Bovo S, Carta MG, Casadio R, Coenen PJ, De Paoli F, Floris M, Gajapathy M, Hoehndorf R, Jacobsen JOB, Joseph T, Kamandula A, Katsonis P, Kint C, Lichtarge O, Limongelli I, Lu Y, Magni P, Mamidi TKK, Martelli PL, Mulargia M, Nicora G, Nykamp K, Pejaver V, Peng Y, Pham THC, Podda MS, Rao A, Rizzo E, Saipradeep VG, Savojardo C, Schols P, Shen Y, Sivadasan N, Smedley D, Soru D, Srinivasan R, Sun Y, Sunderam U, Tan W, Tiwari N, Wang X, Wang Y, Williams A, Worthey EA, Yin R, You Y, Zeiberg D, Zucca S, Bakolitsa C, Brenner SE, Fullerton SM, Radivojac P, Rehm HL, O'Donnell-Luria A. Critical assessment of variant prioritization methods for rare disease diagnosis within the rare genomes project. Hum Genomics 2024; 18:44. [PMID: 38685113 PMCID: PMC11057178 DOI: 10.1186/s40246-024-00604-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2023] [Accepted: 04/02/2024] [Indexed: 05/02/2024] Open
Abstract
BACKGROUND: A major obstacle faced by families with rare diseases is obtaining a genetic diagnosis. The average "diagnostic odyssey" lasts over five years and causal variants are identified in under 50%, even when capturing variants genome-wide. To aid in the interpretation and prioritization of the vast number of variants detected, computational methods are proliferating. Which tools are most effective remains unclear. To evaluate the performance of computational methods, and to encourage innovation in method development, we designed a Critical Assessment of Genome Interpretation (CAGI) community challenge to place variant prioritization models head-to-head in a real-life clinical diagnostic setting.
METHODS: We utilized genome sequencing (GS) data from families sequenced in the Rare Genomes Project (RGP), a direct-to-participant research study on the utility of GS for rare disease diagnosis and gene discovery. Challenge predictors were provided with a dataset of variant calls and phenotype terms from 175 RGP individuals (65 families), including 35 solved training set families with causal variants specified, and 30 unlabeled test set families (14 solved, 16 unsolved). We tasked teams to identify causal variants in as many families as possible. Predictors submitted variant predictions with estimated probability of causal relationship (EPCR) values. Model performance was determined by two metrics: a weighted score based on the rank position of causal variants, and the maximum F-measure, based on precision and recall of causal variants across all EPCR values.
RESULTS: Sixteen teams submitted predictions from 52 models, some with manual review incorporated. Top performers recalled causal variants in up to 13 of 14 solved families within the top 5 ranked variants. Newly discovered diagnostic variants were returned to two previously unsolved families following confirmatory RNA sequencing, and two novel disease gene candidates were entered into Matchmaker Exchange. In one example, RNA sequencing demonstrated aberrant splicing due to a deep intronic indel in ASNS, identified in trans with a frameshift variant in an unsolved proband with phenotypes consistent with asparagine synthetase deficiency.
CONCLUSIONS: Model methodology and performance were highly variable. Models weighing call quality, allele frequency, predicted deleteriousness, segregation, and phenotype were effective in identifying causal variants, and models open to phenotype expansion and non-coding variants were able to capture more difficult diagnoses and discover new diagnoses. Overall, computational models can significantly aid variant prioritization. For use in diagnostics, detailed review and conservative assessment of prioritized variants against established criteria are needed.
Affiliation(s)
- Sarah L Stenton
- Division of Genetics and Genomics, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
| | - Melanie C O'Leary
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Gabrielle Lemire
- Division of Genetics and Genomics, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Grace E VanNoy
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Stephanie DiTroia
- Division of Genetics and Genomics, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Vijay S Ganesh
- Division of Genetics and Genomics, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Neurology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
| | - Emily Groopman
- Division of Genetics and Genomics, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Emily O'Heir
- Division of Genetics and Genomics, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Brian Mangilog
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Ikeoluwa Osei-Owusu
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Lynn S Pais
- Division of Genetics and Genomics, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Jillian Serrano
- Division of Genetics and Genomics, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Moriel Singer-Berk
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Ben Weisburd
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Michael W Wilson
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Christina Austin-Tse
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
| | - Marwa Abdelhakim
- Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), King Abdullah University of Science and Technology (KAUST), 23955-6900, Thuwal, Saudi Arabia
- Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), 23955-6900, Thuwal, Saudi Arabia
| | - Azza Althagafi
- Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), King Abdullah University of Science and Technology (KAUST), 23955-6900, Thuwal, Saudi Arabia
- Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), 23955-6900, Thuwal, Saudi Arabia
- Computer Science Department, College of Computers and Information Technology, Taif University, Taif, Saudi Arabia
| | - Giulia Babbi
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
| | - Riccardo Bellazzi
- enGenome Srl, Pavia, Italy
- Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy
| | - Samuele Bovo
- Department of Agricultural and Food Sciences, University of Bologna, Bologna, Italy
| | - Maria Giulia Carta
- Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy
| | - Rita Casadio
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
| | | | | | - Matteo Floris
- Department of Biomedical Sciences, University of Sassari, Sassari, Italy
| | - Manavalan Gajapathy
- Center for Computational Genomics and Data Science, The University of Alabama at Birmingham, Birmingham, AL, USA
- Department of Genetics, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, AL, USA
- Hugh Kaul Precision Medicine Institute, The University of Alabama at Birmingham, Birmingham, AL, USA
| | - Robert Hoehndorf
- Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), King Abdullah University of Science and Technology (KAUST), 23955-6900, Thuwal, Saudi Arabia
- Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), 23955-6900, Thuwal, Saudi Arabia
| | - Julius O B Jacobsen
- William Harvey Research Institute, Barts & The London School of Medicine and Dentistry, Queen Mary University of London, Charterhouse Square, London, UK
| | - Thomas Joseph
- TCS Research, Tata Consultancy Services (TCS) Ltd, Deccan Park, Madhapur, Hyderabad, India
| | - Akash Kamandula
- Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA
| | - Panagiotis Katsonis
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
| | | | - Olivier Lichtarge
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Structural and Computational Biology and Molecular Biophysics Program, Baylor College of Medicine, Houston, TX, USA
- Computational and Integrative Biomedical Research Center, Baylor College of Medicine, Houston, TX, USA
| | | | - Yulan Lu
- Center for Molecular Medicine, Pediatric Research Institute, Children's Hospital of Fudan University, Shanghai, China
| | - Paolo Magni
- Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy
| | - Tarun Karthik Kumar Mamidi
- Center for Computational Genomics and Data Science, The University of Alabama at Birmingham, Birmingham, AL, USA
- Department of Genetics, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, AL, USA
- Hugh Kaul Precision Medicine Institute, The University of Alabama at Birmingham, Birmingham, AL, USA
| | - Pier Luigi Martelli
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
| | - Marta Mulargia
- Department of Biomedical Sciences, University of Sassari, Sassari, Italy
| | - Giovanna Nicora
- enGenome Srl, Pavia, Italy
- Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy
| | | | - Vikas Pejaver
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Yisu Peng
- Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA
| | | | - Maurizio S Podda
- Department of Biomedical Sciences, University of Sassari, Sassari, Italy
- Institute of Clinical Physiology (IFC), CNR, Via Moruzzi 1, 56124, Pisa, Italy
- University of Siena, Siena, Italy
- CTGLab, Institute of Informatics and Telematics (IIT), CNR, Via Moruzzi 1, 56124, Pisa, Italy
| | - Aditya Rao
- TCS Research, Tata Consultancy Services (TCS) Ltd, Deccan Park, Madhapur, Hyderabad, India
| | | | - Vangala G Saipradeep
- TCS Research, Tata Consultancy Services (TCS) Ltd, Deccan Park, Madhapur, Hyderabad, India
| | - Castrense Savojardo
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
| | - Peter Schols
- Invitae, San Francisco, CA, USA
- Codon One, Louvain, EU, Belgium
| | - Yang Shen
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA
- Department of Computer Science and Engineering, Texas A&M University, College Station, TX, USA
- Institute of Biosciences and Technology and Department of Translational Medical Sciences, College of Medicine, Texas A&M University, Houston, TX, USA
| | - Naveen Sivadasan
- TCS Research, Tata Consultancy Services (TCS) Ltd, Deccan Park, Madhapur, Hyderabad, India
| | - Damian Smedley
- William Harvey Research Institute, Barts & The London School of Medicine and Dentistry, Queen Mary University of London, Charterhouse Square, London, UK
| | | | - Rajgopal Srinivasan
- TCS Research, Tata Consultancy Services (TCS) Ltd, Deccan Park, Madhapur, Hyderabad, India
| | - Yuanfei Sun
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA
| | - Uma Sunderam
- TCS Research, Tata Consultancy Services (TCS) Ltd, Deccan Park, Madhapur, Hyderabad, India
| | - Wuwei Tan
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA
| | - Naina Tiwari
- TCS Research, Tata Consultancy Services (TCS) Ltd, Deccan Park, Madhapur, Hyderabad, India
| | - Xiao Wang
- Center for Molecular Medicine, Pediatric Research Institute, Children's Hospital of Fudan University, Shanghai, China
| | - Yaqiong Wang
- Center for Molecular Medicine, Pediatric Research Institute, Children's Hospital of Fudan University, Shanghai, China
| | - Amanda Williams
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
| | - Elizabeth A Worthey
- Center for Computational Genomics and Data Science, The University of Alabama at Birmingham, Birmingham, AL, USA
- Department of Genetics, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, AL, USA
- Hugh Kaul Precision Medicine Institute, The University of Alabama at Birmingham, Birmingham, AL, USA
| | - Rujie Yin
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA
| | - Yuning You
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA
| | - Daniel Zeiberg
- Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA
| | | | - Constantina Bakolitsa
- Department of Plant and Microbial Biology and Center for Computational Biology, University of California, Berkeley, CA, USA
| | - Steven E Brenner
- Department of Plant and Microbial Biology and Center for Computational Biology, University of California, Berkeley, CA, USA
| | - Stephanie M Fullerton
- Department of Bioethics and Humanities, University of Washington School of Medicine, Seattle, WA, USA
| | - Predrag Radivojac
- Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA
| | - Heidi L Rehm
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
| | - Anne O'Donnell-Luria
- Division of Genetics and Genomics, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
|