1
Dwivedi A, Chauhan L, Kumar P, Nanda A, Jayakrishnan VY. Novel WAC gene variant identified in the first documented case of DeSanto-Shinawi Syndrome in India. Mol Cell Pediatr 2025; 12:7. [PMID: 40347397 PMCID: PMC12065696 DOI: 10.1186/s40348-025-00193-1] [Received: 09/12/2024] [Accepted: 03/27/2025] [Indexed: 05/12/2025] Open
Abstract
BACKGROUND DeSanto-Shinawi Syndrome (DESSH) is a rare neurodevelopmental disorder characterized by intellectual disability, behavioral abnormalities, and distinctive dysmorphic features, linked to likely pathogenic/pathogenic variants in the WAC gene. We report the first documented case of DESSH in India, identified in a 3-year-old male presenting with global developmental delay and coarse facies. RESULTS Exome sequencing revealed a novel heterozygous nonsense likely pathogenic variant (c.1661C>A, p.Ser554*) in the WAC gene, expanding the genotypic spectrum associated with this condition. We employed computational methodologies to understand the effects of this novel variant on protein structure and function. In-silico prediction score suggested protein truncation due to the c.1661C>A (p.Ser554*) variation in the WAC gene, expected to result in a loss of normal protein function. CONCLUSION The findings advocate for increased awareness and genetic testing in atypical cases to facilitate accurate diagnosis and management. This case underscores the importance of considering DESSH in the differential diagnosis of similar neurodevelopmental disorders and enhances our understanding of the genetic diversity within the WAC gene.
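As a worked illustration of the variant notation in this abstract, the sketch below maps coding position 1661 to its codon and lists which serine codons a C>A substitution at that position would turn into a stop codon. It uses only the standard genetic code; the function names are hypothetical, and the actual reference codon in WAC is not stated in the abstract, so the output is a set of compatible codons rather than a claim about the gene sequence.

```python
# Subset of the standard genetic code needed here: serine, tyrosine, stop.
CODON_TABLE = {
    "TCT": "Ser", "TCC": "Ser", "TCA": "Ser", "TCG": "Ser",
    "AGT": "Ser", "AGC": "Ser",
    "TAT": "Tyr", "TAC": "Tyr",
    "TAA": "*", "TAG": "*", "TGA": "*",
}


def codon_position(cdna_pos: int) -> tuple[int, int]:
    """Map a 1-based coding position to (1-based codon number, 0-based offset)."""
    return (cdna_pos - 1) // 3 + 1, (cdna_pos - 1) % 3


def nonsense_compatible_codons(cdna_pos, ref_base, alt_base, ref_aa):
    """List (reference codon, mutated codon) pairs that yield a stop codon."""
    codon_number, offset = codon_position(cdna_pos)
    hits = []
    for codon, amino_acid in CODON_TABLE.items():
        # Keep only codons for the reference residue carrying ref_base at the offset.
        if amino_acid != ref_aa or codon[offset] != ref_base:
            continue
        mutated = codon[:offset] + alt_base + codon[offset + 1:]
        if CODON_TABLE.get(mutated) == "*":
            hits.append((codon, mutated))
    return codon_number, hits


if __name__ == "__main__":
    print(nonsense_compatible_codons(1661, "C", "A", "Ser"))
```

Position 1661 falls on the second base of codon 554, which is consistent with the p.Ser554* protein-level description.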
Affiliation(s)
- Aradhana Dwivedi
- Division of Clinical Genetics, Advance Centre of Pediatrics Medicine, Army Hospital Research & Referral, Delhi Cantt, New Delhi, India
- Lakshita Chauhan
- Division of Clinical Genetics, Advance Centre of Pediatrics Medicine, Army Hospital Research & Referral, Delhi Cantt, New Delhi, India
- Pramod Kumar
- Division of Clinical Genetics, Advance Centre of Pediatrics Medicine, Army Hospital Research & Referral, Delhi Cantt, New Delhi, India.
- Aashna Nanda
- Division of Clinical Genetics, Advance Centre of Pediatrics Medicine, Army Hospital Research & Referral, Delhi Cantt, New Delhi, India
2
Zhou K, Gheybi K, Soh PXY, Hayes VM. Evaluating variant pathogenicity prediction tools to establish African inclusive guidelines for germline genetic testing. Communications Medicine 2025; 5:157. [PMID: 40328947 PMCID: PMC12056225 DOI: 10.1038/s43856-025-00883-x] [Received: 10/06/2024] [Accepted: 04/24/2025] [Indexed: 05/08/2025] Open
Abstract
BACKGROUND Genetic germline testing is restricted for African patients. Lack of ancestrally relevant genomic data perpetuated by African diversity has resulted in European-biased curated clinical variant databases and pathogenic prediction guidelines. While numerous variant pathogenicity prediction tools (VPPTs) exist, their performance has yet to be established within the context of African diversity. METHODS To address this limitation, we assessed 54 VPPTs for predictive performance (sensitivity, specificity, false positive and negative rates) across 145,291 known pathogenic or benign variants derived from 50 Southern African and 50 European men matched for advanced prostate cancer. Prioritising VPPTs for optimal ancestral performance, we screened 5.3 million variants of unknown significance for predicted functional and oncogenic potential. RESULTS We observe a 2.1- and 4.1-fold increase in the number of known and predicted rare pathogenic or benign variants, respectively, against a 1.6-fold decrease in the number of available interrogated variants in our European over African data. Although sensitivity was significantly lower for our African data overall (0.66 vs 0.71, p = 9.86E-06), MetaSVM, CADD, Eigen-raw, BayesDel-noAF, phyloP100way-vertebrate and MVP outperformed irrespective of ancestry. Conversely, MutationTaster, DANN, LRT and GERP-RS were African-specific top performers, while MutationAssessor, PROVEAN, LIST-S2 and REVEL are European-specific. Using these pathogenic prediction workflows, we narrow the ancestral gap for potentially deleterious and oncogenic variant prediction in favour of our African data by 1.15- and 1.1-fold, respectively. CONCLUSION Although VPPT sensitivity favours European data, our findings provide guidelines for VPPT selection to maximise rare pathogenic variant prediction for African disease studies.
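The four performance measures named in this abstract (sensitivity, specificity, false positive rate, false negative rate) all derive from a confusion matrix of tool calls against known labels. The following is a minimal sketch under that textbook definition; the function name and the toy labels are hypothetical and are not taken from the study.

```python
def confusion_metrics(truth, predicted):
    """Compute basic classifier metrics.

    truth, predicted: equal-length sequences of booleans, True = pathogenic.
    """
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one pathogenic and one benign variant")
    return {
        "sensitivity": tp / (tp + fn),          # true positive rate
        "specificity": tn / (tn + fp),          # true negative rate
        "false_positive_rate": fp / (fp + tn),
        "false_negative_rate": fn / (fn + tp),
    }


if __name__ == "__main__":
    # Toy labels only: four known pathogenic and four known benign variants.
    known = [True, True, True, True, False, False, False, False]
    called = [True, True, True, False, False, False, False, True]
    print(confusion_metrics(known, called))
```

Computing these per ancestry group and per tool gives the kind of side-by-side comparison the abstract reports.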
Affiliation(s)
- Kangping Zhou
- Ancestry and Health Genomics Laboratory, Charles Perkins Centre, School of Medical Sciences, Faculty of Medicine and Health, University of Sydney, Camperdown, Sydney, NSW, Australia
- Kazzem Gheybi
- Ancestry and Health Genomics Laboratory, Charles Perkins Centre, School of Medical Sciences, Faculty of Medicine and Health, University of Sydney, Camperdown, Sydney, NSW, Australia
- Pamela X Y Soh
- Ancestry and Health Genomics Laboratory, Charles Perkins Centre, School of Medical Sciences, Faculty of Medicine and Health, University of Sydney, Camperdown, Sydney, NSW, Australia
- Vanessa M Hayes
- Ancestry and Health Genomics Laboratory, Charles Perkins Centre, School of Medical Sciences, Faculty of Medicine and Health, University of Sydney, Camperdown, Sydney, NSW, Australia.
- Manchester Cancer Research Centre, University of Manchester, Manchester, UK.
- School of Health Systems and Public Health, Faculty of Health Sciences, University of Pretoria, Pretoria, South Africa.
3
Lucas MC, Keßler T, Scharf F, Steinke-Lange V, Klink B, Laner A, Holinski-Feder E. A series of reviews in familial cancer: genetic cancer risk in context variants of uncertain significance in MMR genes: which procedures should be followed? Fam Cancer 2025; 24:42. [PMID: 40317406 DOI: 10.1007/s10689-025-00470-y] [Received: 11/25/2024] [Accepted: 04/18/2025] [Indexed: 05/07/2025]
Abstract
Interpreting variants of uncertain significance (VUS) in mismatch repair (MMR) genes remains a major challenge in managing Lynch syndrome and other hereditary cancer syndromes. This review outlines recommended VUS classification procedures, encompassing foundational and specialized methodologies tailored for MMR genes by expert organizations, including InSiGHT and ClinGen's Hereditary Colorectal Cancer/Polyposis Variant Curation Expert Panel (VCEP). Key approaches include: (1) functional data, encompassing direct assays measuring MMR proficiency such as in vitro MMR assays, deep mutational scanning, and MMR cell-based assays, as well as techniques like methylation-tolerant assays, proteomic-based approaches, and RNA sequencing, all of which provide critical functional evidence supporting variant pathogenicity; (2) computational data/tools, including in silico meta-predictors and models, which contribute to robust VUS classification when integrated with experimental evidence; and (3) enhanced variant detection to identify the actual causal variant through whole-genome sequencing and long-read sequencing to detect pathogenic variants missed by traditional methods. These strategies improve diagnostic precision, support clinical decision-making for Lynch syndrome, and establish a flexible framework that can be applied to other OMIM-listed genes.
Affiliation(s)
- Morghan C Lucas
- MGZ- Medical Genetics Center, Munich, Germany.
- Medizinische Klinik und Poliklinik IV- Campus Innenstadt, Klinikum der Universität München, Munich, Germany.
- Verena Steinke-Lange
- MGZ- Medical Genetics Center, Munich, Germany
- Medizinische Klinik und Poliklinik IV- Campus Innenstadt, Klinikum der Universität München, Munich, Germany
- Genturis European Reference Network (ERN) Genetic Tumor Risk (GENTURIS), Nijmegen, Netherlands
- Barbara Klink
- MGZ- Medical Genetics Center, Munich, Germany
- Genturis European Reference Network (ERN) Genetic Tumor Risk (GENTURIS), Nijmegen, Netherlands
- Elke Holinski-Feder
- MGZ- Medical Genetics Center, Munich, Germany
- Medizinische Klinik und Poliklinik IV- Campus Innenstadt, Klinikum der Universität München, Munich, Germany
- Genturis European Reference Network (ERN) Genetic Tumor Risk (GENTURIS), Nijmegen, Netherlands
4
Radjasandirane R, Diharce J, Gelly JC, de Brevern AG. Insights for variant clinical interpretation based on a benchmark of 65 variant effect predictors. Genomics 2025; 117:111036. [PMID: 40127826 DOI: 10.1016/j.ygeno.2025.111036] [Received: 12/05/2024] [Revised: 02/20/2025] [Accepted: 03/20/2025] [Indexed: 03/26/2025]
Abstract
Single amino acid substitutions in protein sequences are generally harmless, but a certain number of these changes can lead to disease. Accurately predicting the effect of genetic variants is crucial for clinicians as it accelerates the diagnosis of patients with missense variants associated with health problems. Many computational tools have been developed to predict the pathogenicity of genetic variants with various approaches. Analysing the performance of these different computational tools is crucial to provide guidance to both future users and especially clinicians. In this study, a large-scale investigation of 65 tools was conducted. Variants from both clinical and functional contexts were used, incorporating data from the ClinVar database and bibliographic sources. The analysis showed that AlphaMissense often performed very well and was in fact one of the best options among the existing tools. In addition, as expected, meta-predictors perform well on average. Tools using evolutionary information showed the best performance for functional variants. These results also highlighted some heterogeneity in the difficulty of predicting some specific variants while others are always well categorized. Strikingly, the majority of variants from the ClinVar database appear to be easy to predict, while variants from other sources of data are more challenging. This raises questions about the use of ClinVar and the dataset used to validate tools accuracy. In addition, these results show that this variant predictability can be divided into three distinct classes: easy, moderate and hard to predict. We analyzed the parameters leading to these differences and showed that the classes are related to structural and functional information.
Affiliation(s)
- Ragousandirane Radjasandirane
- Université Paris Cité and Université de la Réunion, INSERM, EFS, BIGR U1134, DSIMB Bioinformatics team, F-75015 Paris, France
- Julien Diharce
- Université Paris Cité and Université de la Réunion, INSERM, EFS, BIGR U1134, DSIMB Bioinformatics team, F-75015 Paris, France
- Jean-Christophe Gelly
- Université Paris Cité and Université de la Réunion, INSERM, EFS, BIGR U1134, DSIMB Bioinformatics team, F-75015 Paris, France
- Alexandre G de Brevern
- Université Paris Cité and Université de la Réunion, INSERM, EFS, BIGR U1134, DSIMB Bioinformatics team, F-75015 Paris, France.
5
Zhang J, Kinch L, Katsonis P, Lichtarge O, Jagota M, Song YS, Sun Y, Shen Y, Kuru N, Dereli O, Adebali O, Alladin MA, Pal D, Capriotti E, Turina MP, Savojardo C, Martelli PL, Babbi G, Casadio R, Pucci F, Rooman M, Cia G, Tsishyn M, Strokach A, Hu Z, van Loggerenberg W, Roth FP, Radivojac P, Brenner SE, Cong Q, Grishin NV. Assessing predictions on fitness effects of missense variants in HMBS in CAGI6. Hum Genet 2025; 144:173-189. [PMID: 39110250 PMCID: PMC12085147 DOI: 10.1007/s00439-024-02680-3] [Received: 11/18/2023] [Accepted: 05/17/2024] [Indexed: 02/21/2025]
Abstract
This paper presents an evaluation of predictions submitted for the "HMBS" challenge, a component of the sixth round of the Critical Assessment of Genome Interpretation held in 2021. The challenge required participants to predict the effects of missense variants of the human HMBS gene on yeast growth. The HMBS enzyme, critical for the biosynthesis of heme in eukaryotic cells, is highly conserved among eukaryotes. Despite the application of a variety of algorithms and methods, the performance of predictors was relatively similar, with Kendall's tau correlation coefficients between predictions and experimental scores around 0.3 for a majority of submissions. Notably, the median correlation (≥ 0.34) observed among these predictors, especially the top predictions from different groups, was greater than the correlation observed between their predictions and the actual experimental results. Most predictors were moderately successful in distinguishing between deleterious and benign variants, as evidenced by an area under the receiver operating characteristic (ROC) curve (AUC) of approximately 0.7. Compared with the recent two rounds of CAGI competitions, we noticed more predictors outperformed the baseline predictor, which is solely based on the amino acid frequencies. Nevertheless, the overall accuracy of predictions is still far short of positive control, which is derived from experimental scores, indicating the necessity for considerable improvements in the field. The most inaccurately predicted variants in this round were associated with the insertion loop, which is absent in many orthologs, suggesting the predictors still heavily rely on the information from multiple sequence alignment.
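Kendall's tau, the rank correlation quoted in this abstract, compares every pair of variants and asks whether predictions and experimental scores order that pair the same way. The sketch below implements the tau-b form (which corrects for ties) with a simple pairwise loop; the abstract does not say which tau variant the assessors used, so that choice, the function name, and the toy scores are assumptions.

```python
from math import sqrt


def kendall_tau_b(xs, ys):
    """Kendall's tau-b between two equal-length numeric sequences (O(n^2))."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = tied_x_only = tied_y_only = 0
    n = len(xs)
    for i in range(n):
        for j in range(i + 1, n):
            dx = xs[i] - xs[j]
            dy = ys[i] - ys[j]
            if dx == 0 and dy == 0:
                continue  # tied in both sequences: ignored by tau-b
            if dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif (dx > 0) == (dy > 0):
                concordant += 1
            else:
                discordant += 1
    untied = concordant + discordant
    denominator = sqrt((untied + tied_x_only) * (untied + tied_y_only))
    if denominator == 0:
        raise ValueError("tau-b is undefined when one sequence is constant")
    return (concordant - discordant) / denominator


if __name__ == "__main__":
    # Toy values only: predicted vs. measured fitness for four variants.
    predicted = [0.1, 0.4, 0.35, 0.8]
    measured = [0.2, 0.3, 0.5, 0.9]
    print(kendall_tau_b(predicted, measured))
```

A value near 0.3, as reported for most submissions, means predictions order only modestly more variant pairs correctly than incorrectly.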
Affiliation(s)
- Jing Zhang
- Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
- Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
- Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
- Harold C. Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
- Lisa Kinch
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
- Department of Molecular Biology, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
- Panagiotis Katsonis
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA
- Olivier Lichtarge
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA
- Milind Jagota
- Computer Science Division, University of California, Berkeley, CA, 94720, USA
- Yun S Song
- Computer Science Division, University of California, Berkeley, CA, 94720, USA
- Department of Statistics, University of California, Berkeley, Berkeley, CA, 94720, USA
- Yuanfei Sun
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, 77843, USA
- Yang Shen
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, 77843, USA
- Nurdan Kuru
- Faculty of Engineering and Natural Sciences, Sabanci University, Tuzla, Turkey
- Onur Dereli
- Faculty of Engineering and Natural Sciences, Sabanci University, Tuzla, Turkey
- Ogun Adebali
- Faculty of Engineering and Natural Sciences, Sabanci University, Tuzla, Turkey
- Muttaqi Ahmad Alladin
- Department of Computational and Data Sciences, Indian Institute of Science, Bangaluru, 560012, India
- Debnath Pal
- Department of Computational and Data Sciences, Indian Institute of Science, Bangaluru, 560012, India
- Emidio Capriotti
- Department of Pharmacy and Biotechnology, University of Bologna, Via Selmi 3, 40126, Bologna, Italy
- Maria Paola Turina
- Department of Pharmacy and Biotechnology, University of Bologna, Via Selmi 3, 40126, Bologna, Italy
- Castrense Savojardo
- Department of Pharmacy and Biotechnology, University of Bologna, Via Selmi 3, 40126, Bologna, Italy
- Pier Luigi Martelli
- Department of Pharmacy and Biotechnology, University of Bologna, Via Selmi 3, 40126, Bologna, Italy
- Giulia Babbi
- Department of Pharmacy and Biotechnology, University of Bologna, Via Selmi 3, 40126, Bologna, Italy
- Rita Casadio
- Department of Pharmacy and Biotechnology, University of Bologna, Via Selmi 3, 40126, Bologna, Italy
- Fabrizio Pucci
- Computational Biology and Bioinformatics, Université Libre de Bruxelles, 50 Roosevelt Ave, 1050, Brussels, Belgium
- Marianne Rooman
- Computational Biology and Bioinformatics, Université Libre de Bruxelles, 50 Roosevelt Ave, 1050, Brussels, Belgium
- Gabriel Cia
- Computational Biology and Bioinformatics, Université Libre de Bruxelles, 50 Roosevelt Ave, 1050, Brussels, Belgium
- Matsvei Tsishyn
- Computational Biology and Bioinformatics, Université Libre de Bruxelles, 50 Roosevelt Ave, 1050, Brussels, Belgium
- Alexey Strokach
- Department of Computer Science, University of Toronto, Toronto, ON, M5S 2E4, Canada
- Zhiqiang Hu
- Department of Plant and Microbial Biology, University of California, Berkeley, CA, 94720, USA
- Center for Computational Biology, University of California, Berkeley, Berkeley, CA, 94720, USA
- Warren van Loggerenberg
- Department of Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA, 15213, USA
- Donnelly Centre, University of Toronto, Toronto, ON, M5S 3E1, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, M5S 1A8, Canada
- Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON, M5G 1X5, Canada
- Frederick P Roth
- Department of Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA, 15213, USA
- Donnelly Centre, University of Toronto, Toronto, ON, M5S 3E1, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, M5S 1A8, Canada
- Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON, M5G 1X5, Canada
- Predrag Radivojac
- Khoury College of Computer Sciences, Northeastern University, Boston, MA, 02115, USA
- Steven E Brenner
- Department of Plant and Microbial Biology, University of California, Berkeley, CA, 94720, USA
- Center for Computational Biology, University of California, Berkeley, Berkeley, CA, 94720, USA
- Biophysics Graduate Group, University of California, Berkeley, Berkeley, CA, 94720, USA
- Qian Cong
- Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA.
- Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA.
- Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA.
- Harold C. Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA.
- Nick V Grishin
- Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA.
- Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA.
6
Das S, Patel V, Chakravarty S, Ghosh A, Mukhopadhyay A, Biswas NK. An ensemble machine learning-based performance evaluation identifies top In-Silico pathogenicity prediction methods that best classify driver mutations in cancer. BioData Min 2025; 18:7. [PMID: 39833905 PMCID: PMC11744934 DOI: 10.1186/s13040-024-00420-x] [Received: 10/04/2024] [Accepted: 12/26/2024] [Indexed: 01/22/2025] Open
Abstract
BACKGROUND AND OBJECTIVE Accurate identification and prioritization of driver mutations in cancer is critical for effective patient management. Despite the presence of numerous bioinformatic algorithms for estimating mutation pathogenicity, there is significant variation in their assessments. This inconsistency is evident even for well-established cancer driver mutations. This study aims to develop an ensemble machine learning approach to evaluate the performance (rank) of pathogenic and conservation scoring algorithms (PCSAs) based on their ability to distinguish pathogenic driver mutations from benign passenger (non-driver) mutations in head and neck squamous cell carcinoma (HNSC). METHODS The study used a dataset from 502 HNSC patients, classifying mutations based on 299 known high-confidence cancer driver genes. Missense somatic mutations in driver genes were treated as driver mutations, while non-driver mutations were randomly selected from other genes. Each mutation was annotated with 41 PCSAs. Three machine learning algorithms (logistic regression, random forest, and support vector machine), along with recursive feature elimination, were used to rank these PCSAs. The final ranking of the PCSAs was determined using rank-average-sort and rank-sum-sort methods. RESULTS The random forest algorithm emerged as the top performer among the three tested ML algorithms, with an AUC-ROC of 0.89, compared to 0.83 for the other two, in distinguishing pathogenic driver mutations from benign passenger mutations using all 41 PCSAs. The top 11 PCSAs were selected based on the first quintile cut-off from the final rank-sum distribution. Classifiers built using these top 11 PCSAs (DEOGEN2, Integrated_fitCons, MVP, etc.) demonstrated significantly higher performance (p-value < 2.22e-16) compared to those using the remaining 30 PCSAs across all three ML algorithms, in separating pathogenic driver from benign passenger mutations. The top PCSAs demonstrated strong performance on a validation cohort including independent HNSC and other cancer types (breast, lung, and colorectal), reflecting their consistency, robustness and generalizability. CONCLUSIONS The ensemble machine learning approach effectively evaluates the performance of PCSAs based on their ability to differentiate pathogenic drivers from benign passenger mutations in HNSC and other cancer types. Notably, some well-known PCSAs performed poorly, underscoring the importance of data-driven selection over relying solely on popularity.
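The rank-sum-sort step mentioned in this abstract can be pictured as a generic rank aggregation: each model produces an ordered list of scoring algorithms, the positions are summed per algorithm, and the algorithms are sorted by that sum. The sketch below shows that generic idea only; the abstract does not give the study's exact procedure, and the function name, tool names, and orderings are hypothetical placeholders rather than reported results.

```python
def rank_sum_sort(rankings):
    """Aggregate per-model rankings by summing 1-based positions.

    rankings: dict mapping model name -> list of tool names, best first.
    Every model is assumed to rank the same set of tools.
    Returns (tool, rank_sum) pairs, lowest sum first, ties broken by name.
    """
    totals = {}
    for ordered_tools in rankings.values():
        for rank, tool in enumerate(ordered_tools, start=1):
            totals[tool] = totals.get(tool, 0) + rank
    return sorted(totals.items(), key=lambda item: (item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical orderings from three models; not data from the study.
    toy_rankings = {
        "logistic_regression": ["tool_A", "tool_B", "tool_C"],
        "random_forest": ["tool_B", "tool_A", "tool_C"],
        "support_vector_machine": ["tool_B", "tool_C", "tool_A"],
    }
    for tool, rank_sum in rank_sum_sort(toy_rankings):
        print(tool, rank_sum)
```

When every model ranks the same tools, dividing each sum by the number of models gives a rank average with the same ordering, which is why the two aggregation methods are often reported together.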
Affiliation(s)
- Subrata Das
- Biotechnology Research and Innovation Council-National Institute of Biomedical Genomics (BRIC-NIBMG), National Institute of Biomedical Genomics, Kalyani, West Bengal, India
- Vatsal Patel
- Biotechnology Research and Innovation Council-National Institute of Biomedical Genomics (BRIC-NIBMG), National Institute of Biomedical Genomics, Kalyani, West Bengal, India
- Shouvik Chakravarty
- Biotechnology Research and Innovation Council-National Institute of Biomedical Genomics (BRIC-NIBMG), National Institute of Biomedical Genomics, Kalyani, West Bengal, India
- Biotechnology Research and Innovation Council-Regional Centre for Biotechnology (BRIC- RCB), Faridabad, India
- Arnab Ghosh
- Biotechnology Research and Innovation Council-National Institute of Biomedical Genomics (BRIC-NIBMG), National Institute of Biomedical Genomics, Kalyani, West Bengal, India
- Biotechnology Research and Innovation Council-Regional Centre for Biotechnology (BRIC- RCB), Faridabad, India
- Anirban Mukhopadhyay
- Department of Computer Science and Engineering, University of Kalyani, Kalyani, West Bengal, 741235, India.
- Nidhan K Biswas
- Biotechnology Research and Innovation Council-National Institute of Biomedical Genomics (BRIC-NIBMG), National Institute of Biomedical Genomics, Kalyani, West Bengal, India.
7
Zeng G, Zhao C, Li G, Huang Z, Zhuang J, Liang X, Yu X, Fang S. Identifying somatic driver mutations in cancer with a language model of the human genome. Comput Struct Biotechnol J 2025; 27:531-540. [PMID: 39968174 PMCID: PMC11833646 DOI: 10.1016/j.csbj.2025.01.011] [Received: 11/21/2024] [Revised: 01/12/2025] [Accepted: 01/14/2025] [Indexed: 02/20/2025] Open
Abstract
Somatic driver mutations play important roles in cancer and must be precisely identified to advance our understanding of tumorigenesis and its promotion and progression. However, identifying somatic driver mutations remains challenging in Homo sapiens genomics due to the random nature of mutations and the high cost of qualitative experiments. Building on the powerful sequence interpretation capabilities of language models, we propose a self-attention-based contextualized pretrained language model for somatic driver mutation identification. We pretrained the model with the Homo sapiens reference genome to equip it with the ability to understand genome sequences and then fine-tuned it for oncogene and tumor suppressor gene prediction tasks, enabling it to extract features related to driver genes from the original genome sequence. The fine-tuned model was used to obtain the mutations' carcinogenic effect characteristics to further identify whether the mutation is a driver or a passenger. Compared with other computational algorithms, our method achieved excellent somatic driver mutation identification performance on the test set, with an absolute improvement of 4.31% in AUROC over the best comparison method. The strong performance of our method indicates that it can provide new insights into the discovery of cancer drivers.
Affiliation(s)
- Guangjian Zeng
- School of Biomedical Engineering, Shenzhen University, Shenzhen, China
- School of Public Health and Emergency Management, Southern University of Science and Technology, Shenzhen, China
- Chengzhi Zhao
- School of Public Health and Emergency Management, Southern University of Science and Technology, Shenzhen, China
- Guanpeng Li
- School of Public Health and Emergency Management, Southern University of Science and Technology, Shenzhen, China
- Zhengyang Huang
- School of Biomedical Engineering, Shenzhen University, Shenzhen, China
- Jinhu Zhuang
- Shenzhen Health Development Research and Data Management Center, Guangdong, China
- Xiaohua Liang
- Department of Clinical Epidemiology and Biostatistics, Children's Hospital of Chongqing Medical University, National Clinical Research Center for Child Health and Disorders, Ministry of Education Key Laboratory of Child Development and Disorders, Chongqing Key Laboratory of Pediatrics, Chongqing, China
- Xiaxia Yu
- School of Biomedical Engineering, Shenzhen University, Shenzhen, China
- Shenying Fang
- School of Public Health and Emergency Management, Southern University of Science and Technology, Shenzhen, China
8
Zhao W, Tao Y, Xiong J, Liu L, Wang Z, Shao C, Shang L, Hu Y, Xu Y, Su Y, Yu J, Feng T, Xie J, Xu H, Zhang Z, Peng J, Wu J, Zhang Y, Zhu S, Xia K, Tang B, Zhao G, Li J, Li B. GoFCards: an integrated database and analytic platform for gain of function variants in humans. Nucleic Acids Res 2025; 53:D976-D988. [PMID: 39578693 PMCID: PMC11701611 DOI: 10.1093/nar/gkae1079] [Received: 08/15/2024] [Revised: 10/20/2024] [Accepted: 10/28/2024] [Indexed: 11/24/2024] Open
Abstract
Gain-of-function (GOF) variants, which introduce new or amplify protein functions, are essential for understanding disease mechanisms. Despite advances in genomics and functional research, identifying and analyzing pathogenic GOF variants remains challenging owing to fragmented data and database limitations, underscoring the difficulty in accessing critical genetic information. To address this challenge, we manually reviewed the literature, pinpointing 3089 single-nucleotide variants and 72 insertions and deletions in 579 genes associated with 1299 diseases from 2069 studies, and integrated these with the 3.5 million predicted GOF variants. Our approach is complemented by a proprietary scoring system that prioritizes GOF variants on the basis of the evidence supporting their GOF effects and provides predictive scores for variants that lack existing documentation. We then developed a database named GoFCards for general geneticists and clinicians to easily obtain GOF variants in humans (http://www.genemed.tech/gofcards). This database also contains data from >150 sources and offers comprehensive variant-level and gene-level annotations, with the aim of providing users with convenient access to detailed and relevant genetic information. Furthermore, GoFCards empowers users with limited bioinformatic skills to analyze and annotate genetic data, and prioritize GOF variants. GoFCards offers an efficient platform for interpreting GOF variants and thereby advancing genetic research.
Affiliation(s)
- Wenjing Zhao
- National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital & Center for Medical Genetics, School of Life Sciences, Central South University, No. 87 Xiangya Road, Furong District, Changsha, Hunan 410008, China
- Department of Medical Genetics, NHC Key Laboratory of Healthy Birth and Birth Defect Prevention in Western China, The First People's Hospital of Yunnan Province, No. 157 Jinbi Road, Xishan District, Kunming, Yunnan 650000, China
- School of Medicine, Kunming University of Science and Technology, No. 727 Jingming South Road, Chenggong District, Kunming, Yunnan 650000, China
- Youfu Tao
- Xiangya School of Medicine, Central South University, No. 172 Tongzipo Road, Yuelu District, Changsha, Hunan 410008, China
- Jiayi Xiong
- National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital & Center for Medical Genetics, School of Life Sciences, Central South University, No. 87 Xiangya Road, Furong District, Changsha, Hunan 410008, China
- Lei Liu
- School of Life Science, Central South University, No. 172 Tongzipo Road, Yuelu District, Changsha, Hunan 410008, China
- Zhongqing Wang
- School of Medicine, Kunming University of Science and Technology, No. 727 Jingming South Road, Chenggong District, Kunming, Yunnan 650000, China
- Chuhan Shao
- Xiangya School of Medicine, Central South University, No. 172 Tongzipo Road, Yuelu District, Changsha, Hunan 410008, China
- Ling Shang
- Xiangya School of Medicine, Central South University, No. 172 Tongzipo Road, Yuelu District, Changsha, Hunan 410008, China
- Yue Hu
- Xiangya School of Medicine, Central South University, No. 172 Tongzipo Road, Yuelu District, Changsha, Hunan 410008, China
- Yishu Xu
- Xiangya School of Medicine, Central South University, No. 172 Tongzipo Road, Yuelu District, Changsha, Hunan 410008, China
- Yingluo Su
- Xiangya School of Medicine, Central South University, No. 172 Tongzipo Road, Yuelu District, Changsha, Hunan 410008, China
- Jiahui Yu
- Xiangya School of Medicine, Central South University, No. 172 Tongzipo Road, Yuelu District, Changsha, Hunan 410008, China
- Tianyi Feng
- Xiangya School of Medicine, Central South University, No. 172 Tongzipo Road, Yuelu District, Changsha, Hunan 410008, China
| | - Junyi Xie
- School of Life Science, Central South University, No. 172 Tongzipo Road, Yuelu District, Changsha, Hunan 410008, China
| | - Huijuan Xu
- School of Life Science, Central South University, No. 172 Tongzipo Road, Yuelu District, Changsha, Hunan 410008, China
| | - Zijun Zhang
- School of Life Science, Central South University, No. 172 Tongzipo Road, Yuelu District, Changsha, Hunan 410008, China
| | - Jiayi Peng
- School of Life Science, Central South University, No. 172 Tongzipo Road, Yuelu District, Changsha, Hunan 410008, China
| | - Jianbin Wu
- School of Life Science, Central South University, No. 172 Tongzipo Road, Yuelu District, Changsha, Hunan 410008, China
| | - Yuchang Zhang
- School of Life Science, Central South University, No. 172 Tongzipo Road, Yuelu District, Changsha, Hunan 410008, China
| | - Shaobo Zhu
- School of Life Science, Central South University, No. 172 Tongzipo Road, Yuelu District, Changsha, Hunan 410008, China
| | - Kun Xia
- MOE Key Laboratory of Pediatric Rare Diseases & Hunan Key Laboratory of Medical Genetics, Central South University, No. 110 Xiangya Road, Furong District, Changsha, Hunan 410008, China
| | - Beisha Tang
- National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital & Center for Medical Genetics, School of Life Sciences, Central South University, No. 87 Xiangya Road, Furong District, Changsha, Hunan 410008, China
- Department of Neurology & Multi-omics Research Center for Brain Disorders, The First Affiliated Hospital, University of South China, 69 Chuan Shan Road, Shi Gu District, Hengyang, Hunan 421000, China
- Key Laboratory of Hunan Province in Neurodegenerative Disorders, Department of Neurology, Xiangya Hospital, Central South University, No. 87 Xiangya Road, Furong District, Changsha, Hunan 410008, China
| | - Guihu Zhao
- National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital & Center for Medical Genetics, School of Life Sciences, Central South University, No. 87 Xiangya Road, Furong District, Changsha, Hunan 410008, China
| | - Jinchen Li
- National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital & Center for Medical Genetics, School of Life Sciences, Central South University, No. 87 Xiangya Road, Furong District, Changsha, Hunan 410008, China
- Key Laboratory of Hunan Province in Neurodegenerative Disorders, Department of Neurology, Xiangya Hospital, Central South University, No. 87 Xiangya Road, Furong District, Changsha, Hunan 410008, China
- Bioinformatics Center, Furong Laboratory & Xiangya Hospital, Central South University, No. 87 Xiangya Road, Furong District, Changsha, Hunan 410008, China
| | - Bin Li
- National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital & Center for Medical Genetics, School of Life Sciences, Central South University, No. 87 Xiangya Road, Furong District, Changsha, Hunan 410008, China
| |
|
9
|
Katsonis P, Lichtarge O. Meta-EA: a gene-specific combination of available computational tools for predicting missense variant effects. Nat Commun 2025; 16:159. [PMID: 39746940 PMCID: PMC11696468 DOI: 10.1038/s41467-024-55066-4] [Received: 12/14/2023] [Accepted: 11/27/2024] [Indexed: 01/04/2025] Open
Abstract
Computational methods for estimating missense variant impact suffer from inconsistent performance across genes, which poses a major challenge for their reliable use in clinical practice. While ensemble scores leverage multiple prediction methods to enhance consistency, the overrepresentation of certain genes in the training data can bias their outcomes. To address this critical limitation, we propose a gene-specific ensemble framework trained on reference computational annotations rather than on clinical or experimental data. Accordingly, we generate Meta-EA ensemble scores that achieve comparable performance to the top individual predicting method for each gene set. Incorporating the effects of splicing and the allele frequency of human polymorphisms further enhances the performance of Meta-EA, achieving an area under the receiver operating characteristic curve of 0.97 for both gene-balanced and imbalanced clinical assessments. In conclusion, this work leverages the wealth of existing variant impact prediction approaches to generate improved estimations for clinical interpretation.
Affiliation(s)
- Panagiotis Katsonis
- Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA.
| | - Olivier Lichtarge
- Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA.
- Department of Biochemistry & Molecular Biology, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA.
- Department of Pharmacology, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA.
- Computational and Integrative Biomedical Research Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA.
| |
|
10
|
Xia R, Yin X, Huang J, Chen K, Ma J, Wei Z, Su J, Blake N, Rigden DJ, Meng J, Song B. Interpretable deep cross networks unveiled common signatures of dysregulated epitranscriptomes across 12 cancer types. Mol Ther Nucleic Acids 2024; 35:102376. [PMID: 39618823 PMCID: PMC11605186 DOI: 10.1016/j.omtn.2024.102376] [Received: 03/31/2024] [Accepted: 10/25/2024] [Indexed: 01/12/2025]
Abstract
Cancer is a complex and multifaceted group of diseases characterized by uncontrolled cell growth that leads to the formation of malignant tumors. Recent studies suggest that N6-methyladenosine (m6A) RNA methylation plays pivotal roles in cancer pathology by influencing various cellular processes. However, the degree to which these mechanisms are shared across different cancer types remains unclear. In this study, we analyze an expansive array of 167 m6A epitranscriptome profiles covering 12 distinct cancer types and their originating normal tissues. We trained 12 distinct, cancer type-specific interpretable deep cross network models, which successfully distinguish between specific pairs of normal and cancer m6A contexts using integrated information from both the sequences and curated genomic knowledge. Interestingly, cross-cancer type testing indicated the existence of shared genomic patterns across various cancers at the epitranscriptome level. A pan-cancer model was subsequently developed to identify these shared patterns that could not be observed in a single cancer type. Our analysis uncovered, for the first time, a common epitranscriptome signature shared across multiple cancer types, particularly associated with the RNA hybridization process and aberrant splicing. This highlights the importance of a comprehensive understanding of the pan-cancer epitranscriptome and holds potential implications for the development of RNA methylation-based therapeutics for various cancers.
Affiliation(s)
- Rong Xia
- Department of Public Health, School of Medicine, Nanjing University of Chinese Medicine, Nanjing 210023, China
- Department of Biological Sciences, School of Science, Suzhou Key Laboratory of Cancer Biology and Chronic Disease, Xi’an Jiaotong-Liverpool University, Suzhou 215123, China
- School of AI and Advanced Computing, XJTLU Entrepreneur College (Taicang), Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu 215123, China
| | - Xiangyu Yin
- Department of Biological Sciences, School of Science, Suzhou Key Laboratory of Cancer Biology and Chronic Disease, Xi’an Jiaotong-Liverpool University, Suzhou 215123, China
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, L7 8TX Liverpool, UK
| | - Jiaming Huang
- Department of Biological Sciences, School of Science, Suzhou Key Laboratory of Cancer Biology and Chronic Disease, Xi’an Jiaotong-Liverpool University, Suzhou 215123, China
| | - Kunqi Chen
- Key Laboratory of Ministry of Education for Gastrointestinal Cancer, School of Basic Medical Sciences, Fujian Medical University, Fuzhou 350004, China
| | - Jiongming Ma
- Department of Biological Sciences, School of Science, Suzhou Key Laboratory of Cancer Biology and Chronic Disease, Xi’an Jiaotong-Liverpool University, Suzhou 215123, China
| | - Zhen Wei
- Department of Biological Sciences, School of Science, Suzhou Key Laboratory of Cancer Biology and Chronic Disease, Xi’an Jiaotong-Liverpool University, Suzhou 215123, China
- Institute of Infection, Veterinary & Ecological Sciences, University of Liverpool, L7 8TX Liverpool, UK
| | - Jionglong Su
- School of AI and Advanced Computing, XJTLU Entrepreneur College (Taicang), Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu 215123, China
| | - Neil Blake
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, L7 8TX Liverpool, UK
| | - Daniel J. Rigden
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, L7 8TX Liverpool, UK
| | - Jia Meng
- Institute of Biomedical Research, Regulatory Mechanism and Targeted Therapy for Liver Cancer Shiyan Key Laboratory, Hubei Provincial Clinical Research Center for Precise Diagnosis and Treatment of Liver Cancer, Taihe Hospital, Hubei University of Medicine, Shiyan, Hubei 442000, China
- Department of Biological Sciences, School of Science, Suzhou Key Laboratory of Cancer Biology and Chronic Disease, Xi’an Jiaotong-Liverpool University, Suzhou 215123, China
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, L7 8TX Liverpool, UK
| | - Bowen Song
- Department of Public Health, School of Medicine, Nanjing University of Chinese Medicine, Nanjing 210023, China
| |
|
11
|
Wei Y, Zhang T, Wang B, Jiang X, Ling F, Fang M, Jin X, Bai Y. INDELpred: Improving the prediction and interpretation of indel pathogenicity within the clinical genome. HGG Adv 2024; 5:100325. [PMID: 38993112 PMCID: PMC11321314 DOI: 10.1016/j.xhgg.2024.100325] [Received: 02/28/2024] [Revised: 07/04/2024] [Accepted: 07/04/2024] [Indexed: 07/13/2024] Open
Abstract
Small insertions and deletions (indels) are critical yet challenging genetic variations with significant clinical implications. However, the identification of pathogenic indels from neutral variants in clinical contexts remains an understudied problem. Here, we developed INDELpred, a machine-learning-based predictive model for discerning pathogenic from benign indels. INDELpred was established based on key features, including allele frequency, indel length, function-based features, and gene-based features. A set of comprehensive evaluation analyses demonstrated that INDELpred exhibited superior performance over competing methods in terms of computational efficiency and prediction accuracy. Importantly, INDELpred highlighted the crucial role of function-based features in identifying pathogenic indels, with a clear interpretability of the features in understanding the disease-causing variants. We envisage INDELpred as a desirable tool for the detection of pathogenic indels within large-scale genomic datasets, thereby enhancing the precision of genetic diagnoses in clinical settings.
Affiliation(s)
- Yilin Wei
- School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China; BGI Research, Shenzhen 518083, China
| | | | | | | | - Fei Ling
- School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
| | | | - Xin Jin
- BGI Research, Shenzhen 518083, China; The Innovation Centre of Ministry of Education for Development and Diseases, School of Medicine, South China University of Technology, Guangzhou 510006, China; Shanxi Medical University-BGI Collaborative Center for Future Medicine, Shanxi Medical University, Taiyuan 030001, China; Shenzhen Key Laboratory of Transomics Biotechnologies, BGI Research, Shenzhen, China.
| | - Yong Bai
- BGI Research, Shenzhen 518083, China.
| |
|
12
|
Lin YJ, Menon AS, Hu Z, Brenner SE. Variant Impact Predictor database (VIPdb), version 2: trends from three decades of genetic variant impact predictors. Hum Genomics 2024; 18:90. [PMID: 39198917 PMCID: PMC11360829 DOI: 10.1186/s40246-024-00663-z] [Received: 06/22/2024] [Accepted: 08/19/2024] [Indexed: 09/01/2024] Open
Abstract
BACKGROUND Variant interpretation is essential for identifying patients' disease-causing genetic variants amongst the millions detected in their genomes. Hundreds of Variant Impact Predictors (VIPs), also known as Variant Effect Predictors (VEPs), have been developed for this purpose, with a variety of methodologies and goals. To facilitate the exploration of available VIP options, we have created the Variant Impact Predictor database (VIPdb). RESULTS The Variant Impact Predictor database (VIPdb) version 2 presents a collection of VIPs developed over the past three decades, summarizing their characteristics, ClinGen calibrated scores, CAGI assessment results, publication details, access information, and citation patterns. We previously summarized 217 VIPs and their features in VIPdb in 2019. Building upon this foundation, we identified and categorized an additional 190 VIPs, resulting in a total of 407 VIPs in VIPdb version 2. The majority of the VIPs have the capacity to predict the impacts of single nucleotide variants and nonsynonymous variants. More VIPs tailored to predict the impacts of insertions and deletions have been developed since the 2010s. In contrast, relatively few VIPs are dedicated to the prediction of splicing, structural, synonymous, and regulatory variants. The increasing rate of citations to VIPs reflects the ongoing growth in their use, and the evolving trends in citations reveal development in the field and individual methods. CONCLUSIONS VIPdb version 2 summarizes 407 VIPs and their features, potentially facilitating VIP exploration for various variant interpretation applications. VIPdb is available at https://genomeinterpretation.org/vipdb.
Affiliation(s)
- Yu-Jen Lin
- Department of Molecular and Cell Biology, University of California, Berkeley, CA, 94720, USA
- Center for Computational Biology, University of California, Berkeley, CA, 94720, USA
| | - Arul S Menon
- Department of Molecular and Cell Biology, University of California, Berkeley, CA, 94720, USA
- College of Computing, Data Science, and Society, University of California, Berkeley, CA, 94720, USA
| | - Zhiqiang Hu
- Department of Plant and Microbial Biology, University of California, 111 Koshland Hall #3102, Berkeley, CA, 94720-3102, USA
- Illumina, Foster City, CA, 94404, USA
| | - Steven E Brenner
- Department of Molecular and Cell Biology, University of California, Berkeley, CA, 94720, USA.
- Center for Computational Biology, University of California, Berkeley, CA, 94720, USA.
- College of Computing, Data Science, and Society, University of California, Berkeley, CA, 94720, USA.
- Department of Plant and Microbial Biology, University of California, 111 Koshland Hall #3102, Berkeley, CA, 94720-3102, USA.
| |
|
13
|
Zeng T, Spence JP, Mostafavi H, Pritchard JK. Bayesian estimation of gene constraint from an evolutionary model with gene features. Nat Genet 2024; 56:1632-1643. [PMID: 38977852 DOI: 10.1038/s41588-024-01820-9] [Received: 06/02/2023] [Accepted: 05/29/2024] [Indexed: 07/10/2024]
Abstract
Measures of selective constraint on genes have been used for many applications, including clinical interpretation of rare coding variants, disease gene discovery and studies of genome evolution. However, widely used metrics are severely underpowered at detecting constraints for the shortest ~25% of genes, potentially causing important pathogenic mutations to be overlooked. Here we developed a framework combining a population genetics model with machine learning on gene features to enable accurate inference of an interpretable constraint metric, s_het. Our estimates outperform existing metrics for prioritizing genes important for cell essentiality, human disease and other phenotypes, especially for short genes. Our estimates of selective constraint should have wide utility for characterizing genes relevant to human disease. Finally, our inference framework, GeneBayes, provides a flexible platform that can improve the estimation of many gene-level properties, such as rare variant burden or gene expression differences.
Affiliation(s)
- Tony Zeng
- Department of Genetics, Stanford University, Stanford, CA, USA.
| | | | - Hakhamanesh Mostafavi
- Department of Genetics, Stanford University, Stanford, CA, USA
- Department of Population Health, New York University, New York, NY, USA
| | - Jonathan K Pritchard
- Department of Genetics, Stanford University, Stanford, CA, USA.
- Department of Biology, Stanford University, Stanford, CA, USA.
| |
|
14
|
Jin W, Xia Y, Thela SR, Liu Y, Chen L. In silico generation and augmentation of regulatory variants from massively parallel reporter assay using conditional variational autoencoder. bioRxiv 2024:2024.06.25.600715. [PMID: 38979263 PMCID: PMC11230389 DOI: 10.1101/2024.06.25.600715] [Indexed: 07/10/2024]
Abstract
Predicting the functional consequences of genetic variants in non-coding regions is a challenging problem. Massively parallel reporter assays (MPRAs), an in vitro high-throughput method, can simultaneously test thousands of variants by evaluating the existence of allele-specific regulatory activity. Nevertheless, the labelled variants identified by MPRAs, which show differential allelic regulatory effects on gene expression, usually number only in the hundreds, limiting their potential as a training set for robust genome-wide prediction. To address this limitation, we propose a deep generative model, MpraVAE, to generate labelled variants in silico and augment the training sample size. By benchmarking on several MPRA datasets, we demonstrate that MpraVAE significantly improves the prediction performance for MPRA regulatory variants compared to the baseline method, conventional data augmentation approaches, and existing variant scoring methods. Taking autoimmune diseases as one example, we apply MpraVAE to perform a genome-wide prediction of regulatory variants and find that predicted regulatory variants are more enriched than background variants in enhancers, active histone marks, open chromatin regions in immune-related cell types, and chromatin states associated with promoter and enhancer activity and binding sites of c-Myc and Pol II that regulate gene expression. Importantly, predicted regulatory variants are found to link to immune-related genes by leveraging chromatin loops and accessible chromatin, demonstrating the importance of MpraVAE in genetic and gene discovery for complex traits.
Affiliation(s)
- Weijia Jin
- Department of Biostatistics, University of Florida, Gainesville, FL, 32603, USA
| | - Yi Xia
- Department of Biostatistics, University of Florida, Gainesville, FL, 32603, USA
| | - Sai Ritesh Thela
- Department of Biostatistics, University of Florida, Gainesville, FL, 32603, USA
| | - Yunlong Liu
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN 46202, USA
| | - Li Chen
- Department of Biostatistics, University of Florida, Gainesville, FL, 32603, USA
| |
|
15
|
Lin YJ, Menon AS, Hu Z, Brenner SE. Variant Impact Predictor database (VIPdb), version 2: Trends from 25 years of genetic variant impact predictors. bioRxiv 2024:2024.06.25.600283. [PMID: 38979289 PMCID: PMC11230257 DOI: 10.1101/2024.06.25.600283] [Indexed: 07/10/2024]
Abstract
Background Variant interpretation is essential for identifying patients' disease-causing genetic variants amongst the millions detected in their genomes. Hundreds of Variant Impact Predictors (VIPs), also known as Variant Effect Predictors (VEPs), have been developed for this purpose, with a variety of methodologies and goals. To facilitate the exploration of available VIP options, we have created the Variant Impact Predictor database (VIPdb). Results The Variant Impact Predictor database (VIPdb) version 2 presents a collection of VIPs developed over the past 25 years, summarizing their characteristics, ClinGen calibrated scores, CAGI assessment results, publication details, access information, and citation patterns. We previously summarized 217 VIPs and their features in VIPdb in 2019. Building upon this foundation, we identified and categorized an additional 186 VIPs, resulting in a total of 403 VIPs in VIPdb version 2. The majority of the VIPs have the capacity to predict the impacts of single nucleotide variants and nonsynonymous variants. More VIPs tailored to predict the impacts of insertions and deletions have been developed since the 2010s. In contrast, relatively few VIPs are dedicated to the prediction of splicing, structural, synonymous, and regulatory variants. The increasing rate of citations to VIPs reflects the ongoing growth in their use, and the evolving trends in citations reveal development in the field and individual methods. Conclusions VIPdb version 2 summarizes 403 VIPs and their features, potentially facilitating VIP exploration for various variant interpretation applications. Availability VIPdb version 2 is available at https://genomeinterpretation.org/vipdb.
Affiliation(s)
- Yu-Jen Lin
- Department of Molecular and Cell Biology, University of California, Berkeley, California 94720, USA
- Center for Computational Biology, University of California, Berkeley, California 94720, USA
| | - Arul S. Menon
- Department of Molecular and Cell Biology, University of California, Berkeley, California 94720, USA
- College of Computing, Data Science, and Society, University of California, Berkeley, California 94720, USA
| | - Zhiqiang Hu
- Department of Plant and Microbial Biology, University of California, Berkeley, California 94720, USA
- Currently at: Illumina, Foster City, California 94404, USA
| | - Steven E. Brenner
- Department of Molecular and Cell Biology, University of California, Berkeley, California 94720, USA
- Center for Computational Biology, University of California, Berkeley, California 94720, USA
- College of Computing, Data Science, and Society, University of California, Berkeley, California 94720, USA
- Department of Plant and Microbial Biology, University of California, Berkeley, California 94720, USA
| |
|
16
|
Ban HJ, Lee S, Jin HJ. Exploring Stroke Risk through Mendelian Randomization: A Comprehensive Study Integrating Genetics and Metabolic Traits in the Korean Population. Biomedicines 2024; 12:1311. [PMID: 38927518 PMCID: PMC11201557 DOI: 10.3390/biomedicines12061311] [Received: 05/10/2024] [Revised: 06/11/2024] [Accepted: 06/11/2024] [Indexed: 06/28/2024] Open
Abstract
Numerous risk factors play a role in the causation of stroke, and the cardiometabolic condition is one of the most important. In Korea, various treatment methods are employed based on the constitutional type, which is known to differ significantly in cardiometabolic disease. In this study, we compared the estimates obtained for different groups by applying the Mendelian randomization method to investigate the causal effects of genetic characteristics on stroke, according to constitutional type. In clinical analysis, the subtypes differ significantly in diabetes or dyslipidemia. The genetic association estimates for the stroke subtype risk were obtained from MEGASTROKE, the International Stroke Genetics Consortium (ISGC), UK Biobank, and BioBank Japan (BBJ), using group-related SNPs as instrumental variables. The TE subtypes with higher risk of metabolic disease were associated with increased risk (beta = 4.190; s.e. = 1.807; p = 0.035) of cardioembolic stroke (CES), and the SE subtypes were associated with decreased risk (beta = -9.336; s.e. = 1.753; p = 3.87 × 10⁻⁵) of CES. The findings highlight the importance of personalized medicine in assessing disease risk based on an individual's constitutional type.
Affiliation(s)
| | | | - Hee-Jeong Jin
- Korean Medicine (KM) Data Division, Korea Institute of Oriental Medicine, Daejeon 34054, Republic of Korea; (H.-J.B.); (S.L.)
| |
|
17
|
Silva DB, Trinidad M, Ljungdahl A, Revalde JL, Berguig GY, Wallace W, Patrick CS, Bomba L, Arkin M, Dong S, Estrada K, Hutchinson K, LeBowitz JH, Schlessinger A, Johannesen KM, Møller RS, Giacomini KM, Froelich S, Sanders SJ, Wuster A. Haploinsufficiency underlies the neurodevelopmental consequences of SLC6A1 variants. Am J Hum Genet 2024; 111:1222-1238. [PMID: 38781976 PMCID: PMC11179425 DOI: 10.1016/j.ajhg.2024.04.021] [Received: 09/19/2023] [Revised: 04/26/2024] [Accepted: 04/26/2024] [Indexed: 05/25/2024] Open
Abstract
Heterozygous variants in SLC6A1, encoding the GAT-1 GABA transporter, are associated with seizures, developmental delay, and autism. The majority of affected individuals carry missense variants, many of which are recurrent germline de novo mutations, raising the possibility of gain-of-function or dominant-negative effects. To understand the functional consequences, we performed an in vitro GABA uptake assay for 213 unique variants, including 24 control variants. De novo variants consistently resulted in a decrease in GABA uptake, in keeping with haploinsufficiency underlying all neurodevelopmental phenotypes. Where present, ClinVar pathogenicity reports correlated well with GABA uptake data; the functional data can inform future reports for the remaining 72% of unscored variants. Surface localization was assessed for 86 variants; two-thirds of loss-of-function missense variants prevented GAT-1 from being present on the membrane while GAT-1 was on the surface but with reduced activity for the remaining third. Surprisingly, recurrent de novo missense variants showed moderate loss-of-function effects that reduced GABA uptake with no evidence for dominant-negative or gain-of-function effects. Using linear regression across multiple missense severity scores to extrapolate the functional data to all potential SLC6A1 missense variants, we observe an abundance of GAT-1 residues that are sensitive to substitution. The extent of this missense vulnerability accounts for the clinically observed missense enrichment; overlap with hypermutable CpG sites accounts for the recurrent missense variants. Strategies to increase the expression of the wild-type SLC6A1 allele are likely to be beneficial across neurodevelopmental disorders, though the developmental stage and extent of required rescue remain unknown.
Affiliation(s)
- Dina Buitrago Silva
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA, USA
| | - Marena Trinidad
- BioMarin Pharmaceutical Inc., Novato, CA, USA; Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA; Innovative Genomics Institute, University of California, Berkeley, Berkeley, CA, USA
| | - Alicia Ljungdahl
- Department of Psychiatry, UCSF Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, USA; Institute of Developmental and Regenerative Medicine, Department of Paediatrics, University of Oxford, Oxford OX3 7TY, UK
| | - Jezrael L Revalde
- Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, CA, USA
| | | | | | - Cory S Patrick
- Department of Psychiatry, UCSF Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, USA
| | | | - Michelle Arkin
- Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, CA, USA
| | - Shan Dong
- Department of Psychiatry, UCSF Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, USA
| | | | - Keino Hutchinson
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | | | - Avner Schlessinger
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | - Katrine M Johannesen
- Department of Regional Health Research, Faculty of Health Sciences, University of Southern Denmark, Odense, Denmark
| | - Rikke S Møller
- Department of Regional Health Research, Faculty of Health Sciences, University of Southern Denmark, Odense, Denmark; Department of Epilepsy Genetics and Personalized Medicine, Member of ERN Epicare, Danish Epilepsy Centre, Dianalund, Denmark
| | - Kathleen M Giacomini
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA, USA
| | | | - Stephan J Sanders
- Department of Psychiatry, UCSF Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, USA; Institute of Developmental and Regenerative Medicine, Department of Paediatrics, University of Oxford, Oxford OX3 7TY, UK.
| | | |
|
18
|
Zhou Y, Pirmann S, Lauschke VM. APF2: an improved ensemble method for pharmacogenomic variant effect prediction. Pharmacogenomics J 2024; 24:17. [PMID: 38802404 PMCID: PMC11129946 DOI: 10.1038/s41397-024-00338-x] [Received: 02/29/2024] [Revised: 04/26/2024] [Accepted: 05/15/2024] [Indexed: 05/29/2024]
Abstract
Lack of efficacy or adverse drug response are common phenomena in pharmacological therapy causing considerable morbidity and mortality. It is estimated that 20-30% of this variability in drug response stems from variations in genes encoding drug targets or factors involved in drug disposition. Leveraging such pharmacogenomic information for the preemptive identification of patients who would benefit from dose adjustments or alternative medications thus constitutes an important frontier of precision medicine. Computational methods can be used to predict the functional effects of variant of unknown significance. However, their performance on pharmacogenomic variant data has been lackluster. To overcome this limitation, we previously developed an ensemble classifier, termed APF, specifically designed for pharmacogenomic variant prediction. Here, we aimed to further improve predictions by leveraging recent key advances in the prediction of protein folding based on deep neural networks. Benchmarking of 28 variant effect predictors on 530 pharmacogenetic missense variants revealed that structural predictions using AlphaMissense were most specific, whereas APF exhibited the most balanced performance. We then developed a new tool, APF2, by optimizing algorithm parametrization of the top performing algorithms for pharmacogenomic variations and aggregating their predictions into a unified ensemble score. Importantly, APF2 provides quantitative variant effect estimates that correlate well with experimental results (R2 = 0.91, p = 0.003) and predicts the functional impact of pharmacogenomic variants with higher accuracy than previous methods, particularly for clinically relevant variations with actionable pharmacogenomic guidelines. We furthermore demonstrate better performance (92% accuracy) on an independent test set of 146 variants across 61 pharmacogenes not used for model training or validation. 
Application of APF2 to population-scale sequencing data from over 800,000 individuals revealed drastic ethnogeographic differences with important implications for pharmacotherapy. We thus think that APF2 holds the potential to improve the translation of genetic information into pharmacogenetic recommendations, thereby facilitating the use of Next-Generation Sequencing data for stratified medicine.
Affiliation(s)
- Yitian Zhou
- Department of Physiology and Pharmacology, Karolinska Institutet, Stockholm, Sweden
- Center for Molecular Medicine, Karolinska Institutet and University Hospital, Stockholm, Sweden
| | - Sebastian Pirmann
- Computational Oncology Group, Molecular Precision Oncology Program, National Center for Tumor Diseases (NCT) Heidelberg and German Cancer Research Center (DKFZ), Heidelberg, Germany
- Helmholtz Information and Data Science School for Health, Karlsruhe/Heidelberg, Germany
- Faculty of Biosciences, Heidelberg University, Heidelberg, Germany
| | - Volker M Lauschke
- Department of Physiology and Pharmacology, Karolinska Institutet, Stockholm, Sweden.
- Center for Molecular Medicine, Karolinska Institutet and University Hospital, Stockholm, Sweden.
- Dr Margarete Fischer-Bosch Institute of Clinical Pharmacology, Stuttgart, Germany.
- University of Tübingen, Tübingen, Germany.
|
19
|
Camara MD, Zhou Y, Dara A, Tékété MM, Nóbrega de Sousa T, Sissoko S, Dembélé L, Ouologuem N, Hamidou Togo A, Alhousseini ML, Fofana B, Sagara I, Djimde AA, Gil PJ, Lauschke VM. Population-specific variations in KCNH2 predispose patients to delayed ventricular repolarization upon dihydroartemisinin-piperaquine therapy. Antimicrob Agents Chemother 2024; 68:e0139023. [PMID: 38546223 PMCID: PMC11064487 DOI: 10.1128/aac.01390-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2023] [Accepted: 03/05/2024] [Indexed: 05/03/2024] Open
Abstract
Dihydroartemisinin-piperaquine is efficacious for the treatment of uncomplicated malaria and its use is increasing globally. Despite the positive results in fighting malaria, inhibition of the Kv11.1 channel (hERG; encoded by the KCNH2 gene) by piperaquine has raised concerns about cardiac safety. Whether genetic factors could modulate the risk of piperaquine-mediated QT prolongations remained unclear. Here, we first profiled the genetic landscape of KCNH2 variability using data from 141,614 individuals. Overall, we found 1,007 exonic variants distributed over the entire gene body, 555 of which were missense. By optimizing the gene-specific parametrization of 16 partly orthogonal computational algorithms, we developed a KCNH2-specific ensemble classifier that identified a total of 116 putatively deleterious missense variations. To evaluate the clinical relevance of KCNH2 variability, we then sequenced 293 Malian patients with uncomplicated malaria and identified 13 variations within the voltage sensing and pore domains of Kv11.1 that directly interact with channel blockers. Cross-referencing of genetic and electrocardiographic data before and after piperaquine exposure revealed that carriers of two common variants, rs1805121 and rs41314375, experienced significantly higher QT prolongations (ΔQTc of 41.8 ms and 61 ms, respectively, vs 14.4 ms in controls) with more than 50% of carriers having increases in QTc >30 ms. Furthermore, we identified three carriers of rare population-specific variations who experienced clinically relevant delayed ventricular repolarization. Combined, our results map population-scale genetic variability of KCNH2 and identify genetic biomarkers for piperaquine-induced QT prolongation that could help to flag at-risk patients and optimize efficacy and adherence to antimalarial therapy.
Affiliation(s)
- Mahamadou D. Camara
- Department of Physiology and Pharmacology, Karolinska Institutet, Stockholm, Sweden
- Department of Epidemiology of Parasitic Diseases, Malaria Research and Training Center, Faculty of Pharmacy, University of Science, Techniques and Technologies, Bamako, Mali
| | - Yitian Zhou
- Department of Physiology and Pharmacology, Karolinska Institutet, Stockholm, Sweden
| | - Antoine Dara
- Department of Epidemiology of Parasitic Diseases, Malaria Research and Training Center, Faculty of Pharmacy, University of Science, Techniques and Technologies, Bamako, Mali
| | - Mamadou M. Tékété
- Department of Epidemiology of Parasitic Diseases, Malaria Research and Training Center, Faculty of Pharmacy, University of Science, Techniques and Technologies, Bamako, Mali
| | - Taís Nóbrega de Sousa
- Department of Microbiology and Tumour Cell Biology, Karolinska Institutet, Stockholm, Sweden
- Molecular Biology and Malaria Immunology Research Group, Instituto René Rachou, Fundação Oswaldo Cruz (FIOCRUZ), Belo Horizonte, Brazil
| | - Sékou Sissoko
- Department of Epidemiology of Parasitic Diseases, Malaria Research and Training Center, Faculty of Pharmacy, University of Science, Techniques and Technologies, Bamako, Mali
| | - Laurent Dembélé
- Department of Epidemiology of Parasitic Diseases, Malaria Research and Training Center, Faculty of Pharmacy, University of Science, Techniques and Technologies, Bamako, Mali
| | - Nouhoun Ouologuem
- Department of Epidemiology of Parasitic Diseases, Malaria Research and Training Center, Faculty of Pharmacy, University of Science, Techniques and Technologies, Bamako, Mali
| | - Amadou Hamidou Togo
- Department of Epidemiology of Parasitic Diseases, Malaria Research and Training Center, Faculty of Pharmacy, University of Science, Techniques and Technologies, Bamako, Mali
| | - Mohamed L. Alhousseini
- Department of Epidemiology of Parasitic Diseases, Malaria Research and Training Center, Faculty of Pharmacy, University of Science, Techniques and Technologies, Bamako, Mali
| | - Bakary Fofana
- Department of Epidemiology of Parasitic Diseases, Malaria Research and Training Center, Faculty of Pharmacy, University of Science, Techniques and Technologies, Bamako, Mali
| | - Issaka Sagara
- Department of Epidemiology of Parasitic Diseases, Malaria Research and Training Center, Faculty of Pharmacy, University of Science, Techniques and Technologies, Bamako, Mali
| | - Abdoulaye A. Djimde
- Department of Epidemiology of Parasitic Diseases, Malaria Research and Training Center, Faculty of Pharmacy, University of Science, Techniques and Technologies, Bamako, Mali
| | - Pedro J. Gil
- Department of Microbiology and Tumour Cell Biology, Karolinska Institutet, Stockholm, Sweden
- Global Health and Tropical Medicine, Institute of Hygiene and Tropical Medicine, Nova University of Lisbon, Lisbon, Portugal
| | - Volker M. Lauschke
- Department of Physiology and Pharmacology, Karolinska Institutet, Stockholm, Sweden
- Dr. Margarete Fischer-Bosch Institute of Clinical Pharmacology, Stuttgart, Germany
- University of Tübingen, Tübingen, Germany
|
20
|
Ye X, Guerin LN, Chen Z, Rajendren S, Dunker W, Zhao Y, Zhang R, Hodges E, Karijolich J. Enhancer-promoter activation by the Kaposi sarcoma-associated herpesvirus episome maintenance protein LANA. Cell Rep 2024; 43:113888. [PMID: 38416644 PMCID: PMC11005752 DOI: 10.1016/j.celrep.2024.113888] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/02/2023] [Revised: 12/29/2023] [Accepted: 02/14/2024] [Indexed: 03/01/2024] Open
Abstract
Higher-order genome structure influences the transcriptional regulation of cellular genes through the juxtaposition of regulatory elements, such as enhancers, close to promoters of target genes. While enhancer activation has emerged as an important facet of Kaposi sarcoma-associated herpesvirus (KSHV) biology, the mechanisms controlling enhancer-target gene expression remain obscure. Here, we discover that the KSHV genome tethering protein latency-associated nuclear antigen (LANA) potentiates enhancer-target gene expression in primary effusion lymphoma (PEL), a highly aggressive B cell lymphoma causally associated with KSHV. Genome-wide analyses demonstrate increased levels of enhancer RNA transcription as well as activating chromatin marks at LANA-bound enhancers. 3D genome conformation analyses identified genes critical for latency and tumorigenesis as targets of LANA-occupied enhancers, and LANA depletion results in their downregulation. These findings reveal a mechanism in enhancer-gene coordination and describe a role through which the main KSHV tethering protein regulates essential gene expression in PEL.
Affiliation(s)
- Xiang Ye
- Department of Pathology, Microbiology, and Immunology, Vanderbilt University School of Medicine, Nashville, TN 37232, USA
| | - Lindsey N Guerin
- Department of Biochemistry, Vanderbilt University School of Medicine, Nashville, TN 37232, USA
| | - Ziche Chen
- Department of Pathology, Microbiology, and Immunology, Vanderbilt University School of Medicine, Nashville, TN 37232, USA
| | - Suba Rajendren
- Department of Pathology, Microbiology, and Immunology, Vanderbilt University School of Medicine, Nashville, TN 37232, USA
| | - William Dunker
- Department of Pathology, Microbiology, and Immunology, Vanderbilt University School of Medicine, Nashville, TN 37232, USA
| | - Yang Zhao
- Department of Pathology, Microbiology, and Immunology, Vanderbilt University School of Medicine, Nashville, TN 37232, USA
| | - Ruilin Zhang
- Department of Pathology, Microbiology, and Immunology, Vanderbilt University School of Medicine, Nashville, TN 37232, USA
| | - Emily Hodges
- Department of Biochemistry, Vanderbilt University School of Medicine, Nashville, TN 37232, USA; Vanderbilt-Ingram Cancer Center, Nashville, TN 37232, USA; Vanderbilt Genetics Institute, Vanderbilt University School of Medicine, Nashville, TN 37232, USA
| | - John Karijolich
- Department of Pathology, Microbiology, and Immunology, Vanderbilt University School of Medicine, Nashville, TN 37232, USA; Department of Biochemistry, Vanderbilt University School of Medicine, Nashville, TN 37232, USA; Vanderbilt-Ingram Cancer Center, Nashville, TN 37232, USA; Vanderbilt Institute for Infection, Immunology, and Inflammation, Nashville, TN 37232, USA; Vanderbilt Center for Immunobiology, Nashville, TN 37232, USA.
|
21
|
Lim D, Baek C, Blanchette M. Graphylo: A deep learning approach for predicting regulatory DNA and RNA sites from whole-genome multiple alignments. iScience 2024; 27:109002. [PMID: 38362268 PMCID: PMC10867641 DOI: 10.1016/j.isci.2024.109002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/04/2023] [Revised: 12/17/2023] [Accepted: 01/19/2024] [Indexed: 02/17/2024] Open
Abstract
This study focuses on enhancing the prediction of regulatory functional sites in DNA and RNA sequences, a crucial aspect of gene regulation. Current methods, such as motif overrepresentation and machine learning, often lack specificity. To address this issue, the study leverages evolutionary information and introduces Graphylo, a deep-learning approach for predicting transcription factor binding sites in the human genome. Graphylo combines Convolutional Neural Networks for DNA sequences with Graph Convolutional Networks on phylogenetic trees, using information from placental mammals' genomes and evolutionary history. The research demonstrates that Graphylo consistently outperforms both single-species deep learning techniques and methods that incorporate inter-species conservation scores on a wide range of datasets. It achieves this by utilizing a species-based attention model for evolutionary insights and an integrated gradient approach for nucleotide-level model interpretability. This innovative approach offers a promising avenue for improving the accuracy of regulatory site prediction in genomics.
|
22
|
Nakamura T, Ueda J, Mizuno S, Honda K, Kazuno AA, Yamamoto H, Hara T, Takata A. Topologically associating domains define the impact of de novo promoter variants on autism spectrum disorder risk. CELL GENOMICS 2024; 4:100488. [PMID: 38280381 PMCID: PMC10879036 DOI: 10.1016/j.xgen.2024.100488] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2023] [Revised: 08/24/2023] [Accepted: 01/02/2024] [Indexed: 01/29/2024]
Abstract
Whole-genome sequencing (WGS) studies of autism spectrum disorder (ASD) have demonstrated the roles of rare promoter de novo variants (DNVs). However, most promoter DNVs in ASD are not located immediately upstream of known ASD genes. In this study analyzing WGS data of 5,044 ASD probands, 4,095 unaffected siblings, and their parents, we show that promoter DNVs within topologically associating domains (TADs) containing ASD genes are significantly and specifically associated with ASD. An analysis considering TADs as functional units identified specific TADs enriched for promoter DNVs in ASD and indicated that common variants in these regions also confer ASD heritability. Experimental validation using human induced pluripotent stem cells (iPSCs) showed that likely deleterious promoter DNVs in ASD can influence multiple genes within the same TAD, resulting in overall dysregulation of ASD-associated genes. These results highlight the importance of TADs and gene-regulatory mechanisms in better understanding the genetic architecture of ASD.
Affiliation(s)
- Takumi Nakamura
- Laboratory for Molecular Pathology of Psychiatric Disorders, RIKEN Center for Brain Science, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan
| | - Junko Ueda
- Laboratory for Molecular Pathology of Psychiatric Disorders, RIKEN Center for Brain Science, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan.
| | - Shota Mizuno
- Laboratory for Molecular Pathology of Psychiatric Disorders, RIKEN Center for Brain Science, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan
| | - Kurara Honda
- Laboratory for Molecular Pathology of Psychiatric Disorders, RIKEN Center for Brain Science, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan
| | - An-A Kazuno
- Laboratory for Molecular Pathology of Psychiatric Disorders, RIKEN Center for Brain Science, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan
| | - Hirona Yamamoto
- Laboratory for Molecular Pathology of Psychiatric Disorders, RIKEN Center for Brain Science, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan; Department of Neuropsychiatry, Graduate School of Medicine, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8654, Japan
| | - Tomonori Hara
- Laboratory for Molecular Pathology of Psychiatric Disorders, RIKEN Center for Brain Science, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan; Department of Organ Anatomy, Tohoku University Graduate School of Medicine, 2-1 Seiryo-machi, Aoba-ku, Sendai, Miyagi 980-8575, Japan
| | - Atsushi Takata
- Laboratory for Molecular Pathology of Psychiatric Disorders, RIKEN Center for Brain Science, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan; Research Institute for Diseases of Old Age, Juntendo University Graduate School of Medicine, 2-1-1 Hongo, Bunkyo-ku, Tokyo 113-8421, Japan.
|
23
|
Wang Z, Zhao G, Zhu Z, Wang Y, Xiang X, Zhang S, Luo T, Zhou Q, Qiu J, Tang B, Xia K, Li B, Li J. VarCards2: an integrated genetic and clinical database for ACMG-AMP variant-interpretation guidelines in the human whole genome. Nucleic Acids Res 2024; 52:D1478-D1489. [PMID: 37956311 PMCID: PMC10767961 DOI: 10.1093/nar/gkad1061] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 09/15/2023] [Revised: 10/21/2023] [Accepted: 10/25/2023] [Indexed: 11/15/2023] Open
Abstract
VarCards, an online database, combines comprehensive variant- and gene-level annotation data to streamline genetic counselling for coding variants. Recognising the increasing clinical relevance of non-coding variations, there has been an accelerated development of bioinformatics tools dedicated to interpreting non-coding variations, including single-nucleotide variants and copy number variations. Regrettably, most tools remain as either locally installed databases or command-line tools dispersed across diverse online platforms. Such a landscape poses inconveniences and challenges for genetic counsellors seeking to utilise these resources without advanced bioinformatics expertise. Consequently, we developed VarCards2, which incorporates nearly nine billion artificially generated single-nucleotide variants (including those from mitochondrial DNA) and compiles vital annotation information for genetic counselling based on ACMG-AMP variant-interpretation guidelines. These annotations include (I) functional effects; (II) minor allele frequencies; (III) comprehensive function and pathogenicity predictions covering all potential variants, such as non-synonymous substitutions, non-canonical splicing variants, and non-coding variations and (IV) gene-level information. Furthermore, VarCards2 incorporates 368 820 266 documented short insertions and deletions and 2 773 555 documented copy number variations, complemented by their corresponding annotation and prediction tools. In conclusion, VarCards2, by integrating over 150 variant- and gene-level annotation sources, significantly enhances the efficiency of genetic counselling and can be freely accessed at http://www.genemed.tech/varcards2/.
Affiliation(s)
- Zheng Wang
- National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Department of Neurology, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Hunan Key Laboratory of Molecular Precision Medicine, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
| | - Guihu Zhao
- National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Department of Neurology, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Bioinformatics Center, Furong Laboratory & Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
| | - Zhaopo Zhu
- Center for Medical Genetics & Hunan Key Laboratory, School of Life Sciences, Central South University, Changsha, Hunan 410008, China
| | - Yijing Wang
- National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Bioinformatics Center, Furong Laboratory & Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
| | - Xudong Xiang
- National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
| | - Shiyu Zhang
- Xiangya School of Medicine, Central South University, Changsha, Hunan 410013, China
| | - Tengfei Luo
- Center for Medical Genetics & Hunan Key Laboratory, School of Life Sciences, Central South University, Changsha, Hunan 410008, China
| | - Qiao Zhou
- National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Bioinformatics Center, Furong Laboratory & Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
| | - Jian Qiu
- National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Department of Neurology, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Hunan Key Laboratory of Molecular Precision Medicine, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
| | - Beisha Tang
- National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Department of Neurology, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Department of Neurology, & Multi-Omics Research Center for Brain Disorders, The First Affiliated Hospital, University of South China, Hengyang, Hunan, China
| | - Kun Xia
- Center for Medical Genetics & Hunan Key Laboratory, School of Life Sciences, Central South University, Changsha, Hunan 410008, China
| | - Bin Li
- National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Department of Neurology, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Bioinformatics Center, Furong Laboratory & Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
| | - Jinchen Li
- National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Center for Medical Genetics & Hunan Key Laboratory, School of Life Sciences, Central South University, Changsha, Hunan 410008, China
- Department of Neurology, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Bioinformatics Center, Furong Laboratory & Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
|
24
|
Zhu X, Ma S, Wong WH. Genetic effects of sequence-conserved enhancer-like elements on human complex traits. Genome Biol 2024; 25:1. [PMID: 38167462 PMCID: PMC10759394 DOI: 10.1186/s13059-023-03142-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/02/2022] [Accepted: 12/08/2023] [Indexed: 01/05/2024] Open
Abstract
BACKGROUND The vast majority of findings from human genome-wide association studies (GWAS) map to non-coding sequences, complicating their mechanistic interpretations and clinical translations. Non-coding sequences that are evolutionarily conserved and biochemically active could offer clues to the mechanisms underpinning GWAS discoveries. However, genetic effects of such sequences have not been systematically examined across a wide range of human tissues and traits, hampering progress to fully understand regulatory causes of human complex traits. RESULTS Here we develop a simple yet effective strategy to identify functional elements exhibiting high levels of human-mouse sequence conservation and enhancer-like biochemical activity, which scales well to 313 epigenomic datasets across 106 human tissues and cell types. Combined with 468 GWAS of European (EUR) and East Asian (EAS) ancestries, these elements show tissue-specific enrichments of heritability and causal variants for many traits, which are significantly stronger than enrichments based on enhancers without sequence conservation. These elements also help prioritize candidate genes that are functionally relevant to body mass index (BMI) and schizophrenia but were not reported in previous GWAS with large sample sizes. CONCLUSIONS Our findings provide a comprehensive assessment of how sequence-conserved enhancer-like elements affect complex traits in diverse tissues and demonstrate a generalizable strategy of integrating evolutionary and biochemical data to elucidate human disease genetics.
Affiliation(s)
- Xiang Zhu
- Department of Statistics, The Pennsylvania State University, 326 Thomas Building, University Park, 16802, PA, USA.
- Huck Institutes of the Life Sciences, The Pennsylvania State University, 201 Huck Life Sciences Building, University Park, 16802, PA, USA.
- Department of Statistics, Stanford University, 390 Jane Stanford Way, Stanford, 94305, CA, USA.
| | - Shining Ma
- Department of Statistics, Stanford University, 390 Jane Stanford Way, Stanford, 94305, CA, USA
- Department of Biomedical Data Science, Stanford University School of Medicine, 1265 Welch Road MC5464, Stanford, 94305, CA, USA
| | - Wing Hung Wong
- Department of Statistics, Stanford University, 390 Jane Stanford Way, Stanford, 94305, CA, USA.
- Department of Biomedical Data Science, Stanford University School of Medicine, 1265 Welch Road MC5464, Stanford, 94305, CA, USA.
|
25
|
Kim HW, Baek M, Jung S, Jang S, Lee H, Yang SH, Kwak BS, Kim SJ. ELOVL2-AS1 suppresses tamoxifen resistance by sponging miR-1233-3p in breast cancer. Epigenetics 2023; 18:2276384. [PMID: 37908128 PMCID: PMC10621244 DOI: 10.1080/15592294.2023.2276384] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2023] [Accepted: 10/18/2023] [Indexed: 11/02/2023] Open
Abstract
Tamoxifen (Tam) has long been a top treatment option for breast cancer patients, but the challenge of eliminating cancer recurrence remains. Here, we identify a signalling pathway involving ELOVL2, ELOVL2-AS1, and miR-1233-3p, which contributes to drug resistance in Tam-resistant (TamR) breast cancer. ELOVL2-AS1, a long noncoding RNA, was significantly upregulated by its antisense gene, ELOVL2, which is known to be downregulated in TamR cells. Additionally, ELOVL2-AS1 underwent the most hypermethylation in MCF-7/TamR cells. Furthermore, patients with breast cancer who developed TamR during chemotherapy had significantly lower expression of ELOVL2-AS1 compared to those who responded to Tam. Ectopic downregulation of ELOVL2-AS1 by siRNA both stimulated cancer cell growth and deteriorated TamR. We also found that ELOVL2-AS1 sponges miR-1233-3p, which has pro-proliferative activity and elevates TamR, leading to the activation of potential target genes, such as MYEF2, NDST1, and PIK3R1. These findings suggest that ELOVL2-AS1, in association with ELOVL2, may contribute to the suppression of drug resistance by sponging miR-1233-3p in breast cancer.
Affiliation(s)
- Hyeon Woo Kim
- Department of Life Science, Dongguk University-Seoul, Goyang, Republic of Korea
| | - Minjae Baek
- Department of Life Science, Dongguk University-Seoul, Goyang, Republic of Korea
| | - Sanghyun Jung
- Department of Life Science, Dongguk University-Seoul, Goyang, Republic of Korea
| | - Siyeon Jang
- Department of Life Science, Dongguk University-Seoul, Goyang, Republic of Korea
| | - Hyeonjin Lee
- Department of Life Science, Dongguk University-Seoul, Goyang, Republic of Korea
| | - Seung-Hoon Yang
- Department of Biomedical Engineering, Dongguk University-Seoul, Goyang, Republic of Korea
| | - Beom Seok Kwak
- Department of Surgery, Ilsan Hospital, College of Medicine, Dongguk University, Goyang, Republic of Korea
| | - Sun Jung Kim
- Department of Life Science, Dongguk University-Seoul, Goyang, Republic of Korea
|
26
|
Ge F, Arif M, Yan Z, Alahmadi H, Worachartcheewan A, Yu DJ, Shoombuatong W. MMPatho: Leveraging Multilevel Consensus and Evolutionary Information for Enhanced Missense Mutation Pathogenic Prediction. J Chem Inf Model 2023; 63:7239-7257. [PMID: 37947586 PMCID: PMC10685454 DOI: 10.1021/acs.jcim.3c00950] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2023] [Revised: 10/21/2023] [Accepted: 10/23/2023] [Indexed: 11/12/2023]
Abstract
Understanding the pathogenicity of missense mutation (MM) is essential for shedding light on genetic diseases, gene functions, and individual variations. In this study, we propose a novel computational approach, called MMPatho, for enhancing missense mutation pathogenic prediction. First, we established a large-scale nonredundant MM benchmark data set based on the entire Ensembl database, complemented by a focused blind test set specifically for pathogenic GOF/LOF MM. Based on this data set, for each mutation, we utilized Ensembl VEP v104 and dbNSFP v4.1a to extract variant-level, amino acid-level, individuals' outputs, and genome-level features. Additionally, protein sequences were generated using ENSP identifiers with the Ensembl API, and then encoded. The mutant sites' ESM-1b and ProtTrans-T5 embeddings were subsequently extracted. Then, our model group (MMPatho) was developed by building on these efforts, which comprised ConsMM and EvoIndMM. To be specific, ConsMM employs individuals' outputs and XGBoost with SHAP explanation analysis, while EvoIndMM investigates the potential enhancement of predictive capability by incorporating evolutionary information from ESM-1b and ProtT5-XL-U50, large protein language embeddings. Through rigorous comparative experiments, both ConsMM and EvoIndMM were capable of achieving remarkable AUROC (0.9836 and 0.9854) and AUPR (0.9852 and 0.9902) values on the blind test set devoid of overlapping variations and proteins from the training data, thus highlighting the superiority of our computational approach in the prediction of MM pathogenicity. Our Web server, available at http://csbio.njust.edu.cn/bioinf/mmpatho/, allows researchers to predict the pathogenicity (alongside the reliability index score) of MMs using the ConsMM and EvoIndMM models and provides extensive annotations for user input. Additionally, the newly constructed benchmark data set and blind test set can be accessed via the data page of our web server.
Affiliation(s)
- Fang Ge
- School of Geographic and Biologic Information, Nanjing University of Posts and Telecommunications, 9 Wenyuanlu, Nanjing 210023, China
- Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
| | - Muhammad Arif
- College of Science and Engineering, Hamad Bin Khalifa University, Doha 34110, Qatar
- Department of Community Medical Technology, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
| | - Zihao Yan
- School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing 210094, China
| | - Hanin Alahmadi
- College of Computer Science and Engineering, Taibah University, Madinah 344, Saudi Arabia
| | - Apilak Worachartcheewan
- Department of Community Medical Technology, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
| | - Dong-Jun Yu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing 210094, China
| | - Watshara Shoombuatong
- Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
|
27
|
Joynt AT, Kavanagh EW, Newby GA, Mitchell S, Eastman AC, Paul KC, Bowling AD, Osorio DL, Merlo CA, Patel SU, Raraigh KS, Liu DR, Sharma N, Cutting GR. Protospacer modification improves base editing of a canonical splice site variant and recovery of CFTR function in human airway epithelial cells. MOLECULAR THERAPY. NUCLEIC ACIDS 2023; 33:335-350. [PMID: 37547293 PMCID: PMC10400809 DOI: 10.1016/j.omtn.2023.06.020] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 07/14/2022] [Accepted: 06/26/2023] [Indexed: 08/08/2023]
Abstract
Canonical splice site variants affecting the 5' GT and 3' AG nucleotides of introns result in severe missplicing and account for about 10% of disease-causing genomic alterations. Treatment of such variants has proven challenging due to the unstable mRNA or protein isoforms that typically result from disruption of these sites. Here, we investigate CRISPR-Cas9-mediated adenine base editing for such variants in the cystic fibrosis transmembrane conductance regulator (CFTR) gene. We validate a CFTR expression minigene (EMG) system for testing base editing designs for two different targets. We then use the EMG system to test non-standard single-guide RNAs with either shortened or lengthened protospacers to correct the most common cystic fibrosis-causing variant in individuals of African descent (c.2988+1G>A). Varying the spacer region length allowed placement of the editing window in a more efficient context and enabled use of alternate protospacer adjacent motifs. Using these modifications, we restored clinically significant levels of CFTR function to human airway epithelial cells from two donors bearing the c.2988+1G>A variant.
Affiliation(s)
- Anya T. Joynt
- Department of Genetic Medicine, Johns Hopkins University School of Medicine Baltimore, MD 21205, USA
| | - Erin W. Kavanagh
- Department of Genetic Medicine, Johns Hopkins University School of Medicine Baltimore, MD 21205, USA
| | - Gregory A. Newby
- Merkin Institute of Transformative Technologies in Healthcare, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA 02138, USA
- Howard Hughes Medical Institute, Harvard University, Cambridge, MA 02138, USA
| | - Shakela Mitchell
- Department of Genetic Medicine, Johns Hopkins University School of Medicine Baltimore, MD 21205, USA
| | - Alice C. Eastman
- Department of Genetic Medicine, Johns Hopkins University School of Medicine Baltimore, MD 21205, USA
| | - Kathleen C. Paul
- Department of Genetic Medicine, Johns Hopkins University School of Medicine Baltimore, MD 21205, USA
| | - Alyssa D. Bowling
- Department of Genetic Medicine, Johns Hopkins University School of Medicine Baltimore, MD 21205, USA
| | - Derek L. Osorio
- Department of Genetic Medicine, Johns Hopkins University School of Medicine Baltimore, MD 21205, USA
| | - Christian A. Merlo
- Division of Pulmonary and Critical Care Medicine, Department of Medicine, Johns Hopkins Hospital, Baltimore, MD 21287, USA
| | - Shivani U. Patel
- Division of Pulmonary and Critical Care Medicine, Department of Medicine, Johns Hopkins Hospital, Baltimore, MD 21287, USA
| | - Karen S. Raraigh
- Department of Genetic Medicine, Johns Hopkins University School of Medicine Baltimore, MD 21205, USA
| | - David R. Liu
- Merkin Institute of Transformative Technologies in Healthcare, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA 02138, USA
- Howard Hughes Medical Institute, Harvard University, Cambridge, MA 02138, USA
| | - Neeraj Sharma
- Department of Genetic Medicine, Johns Hopkins University School of Medicine Baltimore, MD 21205, USA
| | - Garry R. Cutting
- Department of Genetic Medicine, Johns Hopkins University School of Medicine Baltimore, MD 21205, USA
| |
|
28
|
Jiang TT, Fang L, Wang K. Deciphering "the language of nature": A transformer-based language model for deleterious mutations in proteins. Innovation (N Y) 2023; 4:100487. [PMID: 37636282 PMCID: PMC10448337 DOI: 10.1016/j.xinn.2023.100487] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 02/09/2023] [Accepted: 07/25/2023] [Indexed: 08/29/2023] Open
Abstract
Various machine-learning models, including deep neural network models, have already been developed to predict deleteriousness of missense (non-synonymous) mutations. Potential improvements to the current state of the art, however, may still benefit from a fresh look at the biological problem using more sophisticated self-adaptive machine-learning approaches. Recent advances in the field of natural language processing show that transformer models, a type of deep neural network, are particularly powerful at modeling sequence information with context dependence. In this study, we introduce MutFormer, a transformer-based model for the prediction of deleterious missense mutations, which uses reference and mutated protein sequences from the human genome as the primary features. MutFormer takes advantage of a combination of self-attention layers and convolutional layers to learn both long-range and short-range dependencies between amino acid mutations in a protein sequence. We first pre-trained MutFormer on reference protein sequences and mutated protein sequences resulting from common genetic variants observed in human populations. We next examined different fine-tuning methods to successfully apply the model to deleteriousness prediction of missense mutations. Finally, we evaluated MutFormer's performance on multiple testing datasets. We found that MutFormer showed similar or improved performance over a variety of existing tools, including those that used conventional machine-learning approaches. In conclusion, MutFormer considers sequence features that are not explored in previous studies and can complement existing computational predictions or empirically generated functional scores to improve our understanding of disease variants.
Affiliation(s)
- Theodore T. Jiang
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Palisades Charter High School, Pacific Palisades, CA 90272, USA
- Massachusetts Institute of Technology, Cambridge, MA 02139, USA
| | - Li Fang
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Department of Genetics and Biomedical Informatics, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, China
| | - Kai Wang
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
| |
|
29
|
Roca-Umbert A, Garcia-Calleja J, Vogel-González M, Fierro-Villegas A, Ill-Raga G, Herrera-Fernández V, Bosnjak A, Muntané G, Gutiérrez E, Campelo F, Vicente R, Bosch E. Human genetic adaptation related to cellular zinc homeostasis. PLoS Genet 2023; 19:e1010950. [PMID: 37747921 PMCID: PMC10553801 DOI: 10.1371/journal.pgen.1010950] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2023] [Revised: 10/05/2023] [Accepted: 08/31/2023] [Indexed: 09/27/2023] Open
Abstract
SLC30A9 encodes a ubiquitously expressed zinc transporter (ZnT9) and has been consistently suggested as a candidate for positive selection in humans. However, no direct adaptive molecular phenotype has been demonstrated. Our results provide evidence for directional selection operating in two major complementary haplotypes in Africa and East Asia. These haplotypes are associated with differential gene expression but also differ in the Met50Val substitution (rs1047626) in ZnT9, which we show is found in homozygosis in the Denisovan genome and displays accompanying signatures suggestive of archaic introgression. Although we found no significant differences in systemic zinc content between individuals with different rs1047626 genotypes, we demonstrate that the expression of the derived isoform (ZnT9 50Val) in HEK293 cells shows a gain of function when compared with the ancestral (ZnT9 50Met) variant. Notably, the ZnT9 50Val variant was found associated with differences in zinc handling by the mitochondria and endoplasmic reticulum, with an impact on mitochondrial metabolism. Given the essential role of the mitochondria in skeletal muscle and since the derived allele at rs1047626 is known to be associated with greater susceptibility to several neuropsychiatric traits, we propose that adaptation to cold may have driven this selection event, while also impacting predisposition to neuropsychiatric disorders in modern humans.
Affiliation(s)
- Ana Roca-Umbert
- Institut de Biologia Evolutiva (UPF-CSIC), Departament de Medicina i Ciències de la Vida, Universitat Pompeu Fabra, Parc de Recerca Biomèdica de Barcelona, Barcelona, Spain
| | - Jorge Garcia-Calleja
- Institut de Biologia Evolutiva (UPF-CSIC), Departament de Medicina i Ciències de la Vida, Universitat Pompeu Fabra, Parc de Recerca Biomèdica de Barcelona, Barcelona, Spain
| | - Marina Vogel-González
- Laboratory of Molecular Physiology, Department of Medicine and Life Sciences (MELIS), Universitat Pompeu Fabra, Barcelona, Spain
| | - Alejandro Fierro-Villegas
- Laboratory of Molecular Physiology, Department of Medicine and Life Sciences (MELIS), Universitat Pompeu Fabra, Barcelona, Spain
| | - Gerard Ill-Raga
- Laboratory of Molecular Physiology, Department of Medicine and Life Sciences (MELIS), Universitat Pompeu Fabra, Barcelona, Spain
| | - Víctor Herrera-Fernández
- Laboratory of Molecular Physiology, Department of Medicine and Life Sciences (MELIS), Universitat Pompeu Fabra, Barcelona, Spain
| | - Anja Bosnjak
- Laboratory of Molecular Physiology, Department of Medicine and Life Sciences (MELIS), Universitat Pompeu Fabra, Barcelona, Spain
| | - Gerard Muntané
- Institut de Biologia Evolutiva (UPF-CSIC), Departament de Medicina i Ciències de la Vida, Universitat Pompeu Fabra, Parc de Recerca Biomèdica de Barcelona, Barcelona, Spain
- Hospital Universitari Institut Pere Mata, IISPV, Universitat Rovira i Virgili, Reus, Spain
- Centro de Investigación Biomédica en Red de Salud Mental, Instituto de Salud Carlos III, Madrid, Spain
| | - Esteban Gutiérrez
- Laboratory of Molecular Physiology, Department of Medicine and Life Sciences (MELIS), Universitat Pompeu Fabra, Barcelona, Spain
| | - Felix Campelo
- ICFO-Institut de Ciencies Fotoniques, The Barcelona Institute of Science and Technology, Barcelona, Spain
| | - Rubén Vicente
- Laboratory of Molecular Physiology, Department of Medicine and Life Sciences (MELIS), Universitat Pompeu Fabra, Barcelona, Spain
| | - Elena Bosch
- Institut de Biologia Evolutiva (UPF-CSIC), Departament de Medicina i Ciències de la Vida, Universitat Pompeu Fabra, Parc de Recerca Biomèdica de Barcelona, Barcelona, Spain
- Centro de Investigación Biomédica en Red de Salud Mental, Instituto de Salud Carlos III, Madrid, Spain
| |
|
30
|
Wang Z, Zhao G, Li B, Fang Z, Chen Q, Wang X, Luo T, Wang Y, Zhou Q, Li K, Xia L, Zhang Y, Zhou X, Pan H, Zhao Y, Wang Y, Wang L, Guo J, Tang B, Xia K, Li J. Performance Comparison of Computational Methods for the Prediction of the Function and Pathogenicity of Non-coding Variants. GENOMICS, PROTEOMICS & BIOINFORMATICS 2023; 21:649-661. [PMID: 35272052 PMCID: PMC10787016 DOI: 10.1016/j.gpb.2022.02.002] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 03/26/2021] [Revised: 12/28/2021] [Accepted: 02/27/2022] [Indexed: 06/14/2023]
Abstract
Non-coding variants in the human genome significantly influence human traits and complex diseases via their regulation and modification effects. Hence, an increasing number of computational methods have been developed to predict the effects of variants in human non-coding sequences. However, it is difficult for inexperienced users to select appropriate computational methods from dozens of available methods. To solve this issue, we assessed 12 performance metrics of 24 methods on four independent non-coding variant benchmark datasets: (1) rare germline variants from clinically relevant sequence variants (ClinVar), (2) rare somatic variants from Catalogue Of Somatic Mutations In Cancer (COSMIC), (3) common regulatory variants from curated expression quantitative trait locus (eQTL) data, and (4) disease-associated common variants from curated genome-wide association studies (GWAS). All 24 tested methods performed differently under various conditions, indicating varying strengths and weaknesses under different scenarios. Importantly, the performance of existing methods was acceptable for rare germline variants from ClinVar with the area under the receiver operating characteristic curve (AUROC) of 0.4481-0.8033 and poor for rare somatic variants from COSMIC (AUROC = 0.4984-0.7131), common regulatory variants from curated eQTL data (AUROC = 0.4837-0.6472), and disease-associated common variants from curated GWAS (AUROC = 0.4766-0.5188). We also compared the prediction performance of 24 methods for non-coding de novo mutations in autism spectrum disorder, and found that the combined annotation-dependent depletion (CADD) and context-dependent tolerance score (CDTS) methods showed better performance. In summary, we assessed the performance of 24 computational methods under diverse scenarios, providing preliminary advice for proper tool selection and guiding the development of new techniques in interpreting non-coding variants.
Affiliation(s)
- Zheng Wang
- National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha 410008, China
| | - Guihu Zhao
- National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha 410008, China; Department of Neurology, Xiangya Hospital, Central South University, Changsha 410008, China
| | - Bin Li
- National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha 410008, China; Department of Neurology, Xiangya Hospital, Central South University, Changsha 410008, China
| | - Zhenghuan Fang
- Centre for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha 410008, China
| | - Qian Chen
- National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha 410008, China
| | - Xiaomeng Wang
- Centre for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha 410008, China
| | - Tengfei Luo
- Centre for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha 410008, China
| | - Yijing Wang
- Centre for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha 410008, China
| | - Qiao Zhou
- National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha 410008, China
| | - Kuokuo Li
- Centre for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha 410008, China
| | - Lu Xia
- Centre for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha 410008, China
| | - Yi Zhang
- National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha 410008, China
| | - Xun Zhou
- National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha 410008, China
| | - Hongxu Pan
- Department of Neurology, Xiangya Hospital, Central South University, Changsha 410008, China
| | - Yuwen Zhao
- Department of Neurology, Xiangya Hospital, Central South University, Changsha 410008, China
| | - Yige Wang
- Department of Neurology, Xiangya Hospital, Central South University, Changsha 410008, China
| | - Lin Wang
- Centre for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha 410008, China; Reproductive Medicine Center, Xiangya Hospital, Central South University, Changsha 410008, China
| | - Jifeng Guo
- National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha 410008, China; Department of Neurology, Xiangya Hospital, Central South University, Changsha 410008, China
| | - Beisha Tang
- National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha 410008, China; Department of Neurology, Xiangya Hospital, Central South University, Changsha 410008, China
| | - Kun Xia
- Centre for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha 410008, China
| | - Jinchen Li
- National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha 410008, China; Department of Neurology, Xiangya Hospital, Central South University, Changsha 410008, China; Centre for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha 410008, China.
| |
|
31
|
Ren Z, Li Q, Cao K, Li MM, Zhou Y, Wang K. Model performance and interpretability of semi-supervised generative adversarial networks to predict oncogenic variants with unlabeled data. BMC Bioinformatics 2023; 24:43. [PMID: 36759776 PMCID: PMC9909865 DOI: 10.1186/s12859-023-05141-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2023] [Accepted: 01/05/2023] [Indexed: 02/11/2023] Open
Abstract
BACKGROUND It remains an important challenge to predict the functional consequences or clinical impacts of genetic variants in human diseases, such as cancer. An increasing number of genetic variants in cancer have been discovered and documented in public databases such as COSMIC, but the vast majority of them have no functional or clinical annotations. Some databases, such as CiVIC, are available with manual annotation of functional mutations, but the size of the database is small due to the use of human annotation. Since the unlabeled data (millions of variants) typically outnumber labeled data (thousands of variants), computational tools that take advantage of unlabeled data may improve prediction accuracy. RESULT To leverage unlabeled data to predict functional importance of genetic variants, we introduced a method using semi-supervised generative adversarial networks (SGAN), incorporating features from both labeled and unlabeled data. Our SGAN model incorporated features from clinical guidelines and predictive scores from other computational tools. We also performed comparative analysis to study factors that influence prediction accuracy, such as using different algorithms, types of features, and training sample size, to provide more insights into variant prioritization. We found that SGAN can achieve competitive performances with small labeled training samples by incorporating unlabeled samples, which is a unique advantage compared to traditional machine learning methods. We also found that manually curated samples can achieve a more stable predictive performance than publicly available datasets. CONCLUSIONS By incorporating much larger samples of unlabeled data, the SGAN method can improve the ability to detect novel oncogenic variants, compared to other machine-learning algorithms that use only labeled datasets. SGAN can be potentially used to predict the pathogenicity of more complex variants such as structural variants or non-coding variants, with the availability of more training samples and informative features.
Affiliation(s)
- Zilin Ren
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
| | - Quan Li
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
- Princess Margaret Cancer Centre, University Health Network, University of Toronto, Toronto, ON, M5G2C1, Canada
| | - Kajia Cao
- Division of Genomic Diagnostics, Department of Pathology and Laboratory Medicine, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
| | - Marilyn M Li
- Division of Genomic Diagnostics, Department of Pathology and Laboratory Medicine, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
| | - Yunyun Zhou
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA.
| | - Kai Wang
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA.
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA.
| |
|
32
|
Babushkina NP, Kucher AN. Regulatory Potential of SNP Markers in Genes of DNA Repair Systems. Mol Biol 2023. [DOI: 10.1134/s002689332301003x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/03/2023]
|
33
|
Sneha NP, Dharshini SAP, Taguchi YH, Gromiha MM. Integrative Meta-Analysis of Huntington's Disease Transcriptome Landscape. Genes (Basel) 2022; 13:2385. [PMID: 36553652 PMCID: PMC9777612 DOI: 10.3390/genes13122385] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2022] [Revised: 11/24/2022] [Accepted: 12/12/2022] [Indexed: 12/23/2022] Open
Abstract
Huntington's disease (HD) is a neurodegenerative disorder with autosomal dominant inheritance caused by glutamine expansion in the Huntingtin gene (HTT). Striatal projection neurons (SPNs) in HD are more vulnerable to cell death. The executive striatal population is directly connected with the Brodmann Area (BA9), which is mainly involved in motor functions. Analyzing the disease samples from BA9 from the SRA database provides insights related to neuron degeneration, which helps to identify a promising therapeutic strategy. Most gene expression studies examine the changes in expression and associated biological functions. In this study, we elucidate the relationship between variants and their effect on gene/downstream transcript expression. We computed gene and transcript abundance and identified variants from RNA-seq data using various pipelines. We predicted the effect of genome-wide association studies (GWAS)/novel variants on regulatory functions. We found that many variants affect the histone acetylation pattern in HD, thereby perturbing the transcription factor networks. Interestingly, some variants affect miRNA binding as well as their downstream gene expression. Tissue-specific network analysis showed that mitochondrial, neuroinflammation, vasculature, and angiogenesis-related genes are disrupted in HD. From this integrative omics analysis, we propose that abnormal neuroinflammation acts as a two-edged sword that indirectly affects the vasculature and associated energy metabolism. Rehabilitation of blood-brain barrier functionality and energy metabolism may secure the neuron from cell death.
Affiliation(s)
- Nela Pragathi Sneha
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600036, Tamilnadu, India
| | - S. Akila Parvathy Dharshini
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600036, Tamilnadu, India
| | - Y.-H. Taguchi
- Department of Physics, Chuo University, Kasuga, Bunkyo-ku, Tokyo 112-8551, Japan
| | - M. Michael Gromiha
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600036, Tamilnadu, India
| |
|
34
|
He Z, Liu L, Belloy ME, Le Guen Y, Sossin A, Liu X, Qi X, Ma S, Gyawali PK, Wyss-Coray T, Tang H, Sabatti C, Candès E, Greicius MD, Ionita-Laza I. GhostKnockoff inference empowers identification of putative causal variants in genome-wide association studies. Nat Commun 2022; 13:7209. [PMID: 36418338 PMCID: PMC9684164 DOI: 10.1038/s41467-022-34932-z] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 11/26/2021] [Accepted: 11/09/2022] [Indexed: 11/27/2022] Open
Abstract
Recent advances in genome sequencing and imputation technologies provide an exciting opportunity to comprehensively study the contribution of genetic variants to complex phenotypes. However, our ability to translate genetic discoveries into mechanistic insights remains limited at this point. In this paper, we propose an efficient knockoff-based method, GhostKnockoff, for genome-wide association studies (GWAS) that leads to improved power and ability to prioritize putative causal variants relative to conventional GWAS approaches. The method requires only Z-scores from conventional GWAS and hence can be easily applied to enhance existing and future studies. The method can also be applied to meta-analysis of multiple GWAS allowing for arbitrary sample overlap. We demonstrate its performance using empirical simulations and two applications: (1) a meta-analysis for Alzheimer's disease comprising nine overlapping large-scale GWAS, whole-exome and whole-genome sequencing studies and (2) analysis of 1403 binary phenotypes from the UK Biobank data in 408,961 samples of European ancestry. Our results demonstrate that GhostKnockoff can identify putatively functional variants with weaker statistical effects that are missed by conventional association tests.
Affiliation(s)
- Zihuai He
- Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, 94305, USA.
- Quantitative Sciences Unit, Department of Medicine, Stanford University, Stanford, CA, 94305, USA.
| | - Linxi Liu
- Department of Statistics, University of Pittsburgh, Pittsburgh, PA, 15260, USA
| | - Michael E Belloy
- Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, 94305, USA
| | - Yann Le Guen
- Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, 94305, USA
- Institut du Cerveau - Paris Brain Institute - ICM, Paris, 75013, France
| | - Aaron Sossin
- Department of Biomedical Data Science, Stanford University, Stanford, CA, 94305, USA
| | - Xiaoxia Liu
- Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, 94305, USA
| | - Xinran Qi
- Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, 94305, USA
| | - Shiyang Ma
- Department of Biostatistics, Columbia University, New York, NY, 10032, USA
| | - Prashnna K Gyawali
- Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, 94305, USA
| | - Tony Wyss-Coray
- Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, 94305, USA
| | - Hua Tang
- Department of Genetics, Stanford University, Stanford, CA, 94305, USA
| | - Chiara Sabatti
- Department of Biomedical Data Science, Stanford University, Stanford, CA, 94305, USA
| | - Emmanuel Candès
- Department of Statistics, Stanford University, Stanford, CA, 94305, USA
- Department of Mathematics, Stanford University, Stanford, CA, 94305, USA
| | - Michael D Greicius
- Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, 94305, USA
| | | |
|
35
|
Exploration of Tools for the Interpretation of Human Non-Coding Variants. Int J Mol Sci 2022; 23:ijms232112977. [PMID: 36361767 PMCID: PMC9654743 DOI: 10.3390/ijms232112977] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 09/23/2022] [Revised: 10/17/2022] [Accepted: 10/23/2022] [Indexed: 02/01/2023] Open
Abstract
The advent of Whole Genome Sequencing (WGS) broadened the genetic variation detection range, revealing the presence of variants even in non-coding regions of the genome, which would have been missed using targeted approaches. One of the most challenging issues in WGS analysis regards the interpretation of annotated variants. This review focuses on tools suitable for the functional annotation of variants falling into non-coding regions. It couples the description of non-coding genomic areas with the results and performance of existing tools for a functional interpretation of the effect of variants in these regions. Tools were tested in a controlled genomic scenario, representing the ground-truth and allowing us to determine software performance.
|
36
|
Li C, Zhi D, Wang K, Liu X. MetaRNN: differentiating rare pathogenic and rare benign missense SNVs and InDels using deep learning. Genome Med 2022; 14:115. [PMID: 36209109 PMCID: PMC9548151 DOI: 10.1186/s13073-022-01120-z] [Citation(s) in RCA: 64] [Impact Index Per Article: 21.3] [Received: 11/24/2021] [Accepted: 09/22/2022] [Indexed: 11/22/2022] Open
Abstract
Multiple computational approaches have been developed to improve our understanding of genetic variants. However, their ability to distinguish rare pathogenic variants from rare benign ones is still lacking. Using context annotations and deep learning methods, we present pathogenicity prediction models, MetaRNN and MetaRNN-indel, to help identify and prioritize rare nonsynonymous single nucleotide variants (nsSNVs) and non-frameshift insertion/deletions (nfINDELs). We use independent test sets to demonstrate that these new models outperform state-of-the-art competitors and achieve a more interpretable score distribution. Importantly, prediction scores from both models are comparable, enabling easy adoption of integrated genotype-phenotype association analysis methods. All pre-computed nsSNV scores are available at http://www.liulab.science/MetaRNN. The stand-alone program is also available at https://github.com/Chang-Li2019/MetaRNN.
Affiliation(s)
- Chang Li
- USF Genomics & College of Public Health, University of South Florida, 3720 Spectrum Boulevard, Suite 304, Tampa, FL 33612 USA
| | - Degui Zhi
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX USA
| | - Kai Wang
- Children’s Hospital of Philadelphia & Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA USA
| | - Xiaoming Liu
- USF Genomics & College of Public Health, University of South Florida, 3720 Spectrum Boulevard, Suite 304, Tampa, FL 33612 USA
| |
|
37
|
Katsonis P, Wilhelm K, Williams A, Lichtarge O. Genome interpretation using in silico predictors of variant impact. Hum Genet 2022; 141:1549-1577. [PMID: 35488922 PMCID: PMC9055222 DOI: 10.1007/s00439-022-02457-6] [Citation(s) in RCA: 41] [Impact Index Per Article: 13.7] [Received: 08/02/2021] [Accepted: 04/17/2022] [Indexed: 02/06/2023]
Abstract
Estimating the effects of variants found in disease driver genes opens the door to personalized therapeutic opportunities. Clinical associations and laboratory experiments can only characterize a tiny fraction of all the available variants, leaving the majority as variants of unknown significance (VUS). In silico methods bridge this gap by providing instant estimates on a large scale, most often based on the numerous genetic differences between species. Despite concerns that these methods may lack reliability in individual subjects, their numerous practical applications over cohorts suggest they are already helpful and have a role to play in genome interpretation when used at the proper scale and context. In this review, we aim to gain insights into the training and validation of these variant effect predicting methods and illustrate representative types of experimental and clinical applications. Objective performance assessments using various datasets that are not yet published indicate the strengths and limitations of each method. These show that cautious use of in silico variant impact predictors is essential for addressing genome interpretation challenges.
Affiliation(s)
- Panagiotis Katsonis
- Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA.
| | - Kevin Wilhelm
- Graduate School of Biomedical Sciences, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
| | - Amanda Williams
- Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
| | - Olivier Lichtarge
- Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA.
- Department of Biochemistry, Human Genetics and Molecular Biology, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA.
- Department of Pharmacology, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA.
- Computational and Integrative Biomedical Research Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA.
|
38
|
Nayara Góes de Araújo J, Fernandes de Oliveira V, Bassani Borges J, Dagli-Hernandez C, da Silva Rodrigues Marçal E, Caroline Costa de Freitas R, Medeiros Bastos G, Marques Gonçalves R, Arpad Faludi A, Elim Jannes C, da Costa Pereira A, Dominguez Crespo Hirata R, Hiroyuki Hirata M, Ducati Luchessi A, Nogueira Silbiger V. In silico analysis of upstream variants in Brazilian patients with Familial Hypercholesterolemia. Gene X 2022; 849:146908. [PMID: 36167182 DOI: 10.1016/j.gene.2022.146908] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2022] [Revised: 08/16/2022] [Accepted: 09/19/2022] [Indexed: 10/14/2022] Open
Abstract
Familial hypercholesterolemia (FH) is a prevalent autosomal genetic disease associated with increased risk of early cardiovascular events and death due to chronic exposure to very high levels of low-density lipoprotein cholesterol (LDL-c). Pathogenic variants in the coding regions of LDLR, APOB and PCSK9 account for most FH cases, and variants in non-coding regions may be involved in FH as well. Variants in the upstream regions of LDLR, APOB and PCSK9 were screened by targeted next-generation sequencing and their effects were explored using in silico tools. Twenty-five patients without pathogenic variants in FH-related genes were selected. The 3 kb upstream regions of LDLR, APOB and PCSK9 were sequenced using the AmpliSeq (Illumina) and Miseq Reagent Nano Kit v2 (Illumina). Sequencing data were analyzed using variant discovery and functional annotation tools. Potentially regulatory variants were selected by integrating data from public databases, published data and a context-dependent regulatory prediction score. Thirty-four single nucleotide variants (SNVs) in upstream regions were identified (6 in LDLR, 15 in APOB, and 13 in PCSK9). Five SNVs were prioritized as potentially regulatory variants (rs934197, rs9282606, rs36218923, rs538300761, g.55038486A>G). APOB rs934197 was previously associated with an increased rate of transcription, which in silico analysis suggests could be due to reduced binding affinity of a transcriptional repressor. Our findings highlight the importance of variant screening outside of the coding regions of all relevant genes. Further functional studies are necessary to confirm that the prioritized variants could impact gene regulation and contribute to the FH phenotype.
Affiliation(s)
- Jéssica Nayara Góes de Araújo
- Northeast Biotechnology Network (RENORBIO), Graduate Program in Biotechnology, Federal University of Rio Grande do Norte, Natal 59078-900, Brazil
| | - Victor Fernandes de Oliveira
- Department of Clinical and Toxicological Analyses, School of Pharmaceutical Sciences, University of Sao Paulo, Sao Paulo 05508-000, Brazil
| | - Jéssica Bassani Borges
- Department of Clinical and Toxicological Analyses, School of Pharmaceutical Sciences, University of Sao Paulo, Sao Paulo 05508-000, Brazil; Laboratory of Molecular Research in Cardiology, Institute Dante Pazzanese of Cardiology, Sao Paulo, 04012-909, Brazil
| | - Carolina Dagli-Hernandez
- Department of Clinical and Toxicological Analyses, School of Pharmaceutical Sciences, University of Sao Paulo, Sao Paulo 05508-000, Brazil
| | | | - Renata Caroline Costa de Freitas
- Department of Clinical and Toxicological Analyses, School of Pharmaceutical Sciences, University of Sao Paulo, Sao Paulo 05508-000, Brazil
| | - Gisele Medeiros Bastos
- Laboratory of Molecular Research in Cardiology, Institute Dante Pazzanese of Cardiology, Sao Paulo, 04012-909, Brazil; Medical Clinic Division, Institute Dante Pazzanese of Cardiology, Sao Paulo 04012-909, Brazil
| | | | - André Arpad Faludi
- Medical Clinic Division, Institute Dante Pazzanese of Cardiology, Sao Paulo 04012-909, Brazil
| | - Cinthia Elim Jannes
- Laboratory of Genetics and Molecular Cardiology, Heart Institute, University of Sao Paulo 05403-900, Brazil
| | - Alexandre da Costa Pereira
- Laboratory of Genetics and Molecular Cardiology, Heart Institute, University of Sao Paulo 05403-900, Brazil
| | - Rosario Dominguez Crespo Hirata
- Department of Clinical and Toxicological Analyses, School of Pharmaceutical Sciences, University of Sao Paulo, Sao Paulo 05508-000, Brazil
| | - Mario Hiroyuki Hirata
- Department of Clinical and Toxicological Analyses, School of Pharmaceutical Sciences, University of Sao Paulo, Sao Paulo 05508-000, Brazil
| | - André Ducati Luchessi
- Northeast Biotechnology Network (RENORBIO), Graduate Program in Biotechnology, Federal University of Rio Grande do Norte, Natal 59078-900, Brazil; Department of Clinical and Toxicological Analyses, Federal University of Rio Grande do Norte, Natal 59012-570, Brazil
| | - Vivian Nogueira Silbiger
- Northeast Biotechnology Network (RENORBIO), Graduate Program in Biotechnology, Federal University of Rio Grande do Norte, Natal 59078-900, Brazil; Department of Clinical and Toxicological Analyses, Federal University of Rio Grande do Norte, Natal 59012-570, Brazil.
|
39
|
Huang YS, Hsu C, Chune YC, Liao IC, Wang H, Lin YL, Hwu WL, Lee NC, Lai F. Diagnosis of a Single-Nucleotide Variant in Whole-Exome Sequencing Data for Patients With Inherited Diseases: Machine Learning Study Using Artificial Intelligence Variant Prioritization. JMIR BIOINFORMATICS AND BIOTECHNOLOGY 2022; 3:e37701. [PMID: 38935959 PMCID: PMC11168239 DOI: 10.2196/37701] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/04/2022] [Revised: 07/29/2022] [Accepted: 08/22/2022] [Indexed: 06/29/2024]
Abstract
BACKGROUND In recent years, thanks to the rapid development of next-generation sequencing (NGS) technology, an entire human genome can be sequenced in a short period. As a result, NGS technology is now being widely introduced into clinical diagnosis practice, especially for diagnosis of hereditary disorders. Although exome data on single-nucleotide variants (SNVs) can be generated using these approaches, processing the DNA sequence data of a patient requires multiple tools and complex bioinformatics pipelines. OBJECTIVE This study aims to help physicians automatically interpret the genetic variation information generated by NGS in a short period. Currently, to determine the true causal variants of a patient with genetic disease, physicians often need to manually review numerous features of every variant and search the literature in different databases to understand the effect of genetic variation. METHODS We constructed a machine learning model for predicting disease-causing variants in exome data. We collected sequencing data from whole-exome sequencing (WES) and gene panels as the training set, and then integrated variant annotations from multiple genetic databases for model training. The resulting model ranked SNVs and output the most probable disease-causing candidates. For model testing, we collected WES data from 108 patients with rare genetic disorders at National Taiwan University Hospital. We fed sequencing data and phenotypic information, automatically extracted by a keyword extraction tool from patients' electronic medical records, into our machine learning model. RESULTS We succeeded in locating 92.5% (124/134) of the causative variants in the top 10 ranking list among an average of 741 candidate variants per person after filtering. AI Variant Prioritizer was able to assign the target gene to the top rank for around 61.1% (66/108) of the patients, followed by Variant Prioritizer, which assigned it for 44.4% (48/108) of the patients. The cumulative rank result revealed that our AI Variant Prioritizer has the highest accuracy at ranks 1, 5, 10, and 20. It also shows that AI Variant Prioritizer presents better performance than other tools. After adopting the Human Phenotype Ontology (HPO) terms by looking up the databases, the top 10 ranking list can be increased to 93.5% (101/108). CONCLUSIONS We successfully applied sequencing data from WES and free-text phenotypic information on patients' diseases, automatically extracted by the keyword extraction tool, for model training and testing. By interpreting our model, we identified which features of variants are important. We also achieved a satisfactory result in finding the target variant in our testing data set. After adopting the HPO terms by looking up the databases, the top 10 ranking list can be increased to 93.5% (101/108). The performance of the model is similar to that of manual analysis, and it has been used to help National Taiwan University Hospital with genetic diagnosis.
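The rank-based figures quoted in this abstract (accuracy at ranks 1, 5, 10, and 20) are cumulative top-k accuracies: the share of cases whose causal variant is placed at or above rank k. A minimal sketch of that calculation follows, assuming a hypothetical list of per-patient ranks; the function name and the example ranks are illustrative and are not taken from the study.

```python
def top_k_accuracy(ranks, k):
    """Fraction of cases whose causal variant is ranked at or above position k (1-based)."""
    if not ranks:
        raise ValueError("ranks must be non-empty")
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Hypothetical 1-based ranks of the causal variant for eight patients (illustration only).
example_ranks = [1, 3, 12, 1, 7, 25, 2, 1]

# Cumulative accuracy at the same cut-offs the abstract reports.
cumulative = {k: top_k_accuracy(example_ranks, k) for k in (1, 5, 10, 20)}
```

Because the measure is cumulative, accuracy can only stay flat or rise as k grows, which is why a tool that leads at rank 1 and at rank 20 is compared at several cut-offs rather than one.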
Affiliation(s)
- Yu-Shan Huang
- Department of Computer Science and Information Engineering, National Taiwan University, Taipei City, Taiwan
| | - Ching Hsu
- Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taipei City, Taiwan
| | - Yu-Chang Chune
- Department of Computer Science and Information Engineering, National Taiwan University, Taipei City, Taiwan
| | - I-Cheng Liao
- Department of Computer Science and Information Engineering, National Taiwan University, Taipei City, Taiwan
| | - Hsin Wang
- Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taipei City, Taiwan
| | - Yi-Lin Lin
- Department of Medical Genetics, National Taiwan University Hospital, Taipei City, Taiwan
| | - Wuh-Liang Hwu
- Department of Pediatrics, National Taiwan University Hospital, Taipei City, Taiwan
| | - Ni-Chung Lee
- Department of Medical Genetics, National Taiwan University Hospital, Taipei City, Taiwan
| | - Feipei Lai
- Department of Computer Science and Information Engineering, National Taiwan University, Taipei City, Taiwan
- Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taipei City, Taiwan
|
40
|
Lye Z, Choi JY, Purugganan MD. Deleterious mutations and the rare allele burden on rice gene expression. Mol Biol Evol 2022; 39:6693943. [PMID: 36073358 PMCID: PMC9512150 DOI: 10.1093/molbev/msac193] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 11/13/2022] Open
Abstract
Deleterious genetic variation is maintained in populations at low frequencies. Under a model of stabilizing selection, rare (and presumably deleterious) genetic variants are associated with increase or decrease in gene expression from some intermediate optimum. We investigate this phenomenon in a population of largely Oryza sativa ssp. indica rice landraces under normal unstressed wet and stressful drought field conditions. We include single nucleotide polymorphisms, insertion/deletion mutations, and structural variants in our analysis and find a stronger association between rare variants and gene expression outliers under the stress condition. We also show an association of the strength of this rare variant effect with linkage, gene expression levels, network connectivity, local recombination rate, and fitness consequence scores, consistent with the stabilizing selection model of gene expression.
Affiliation(s)
- Zoe Lye
- Center for Genomics and Systems Biology, New York University, New York, NY 10003
| | - Jae Young Choi
- Center for Genomics and Systems Biology, New York University, New York, NY 10003
| | - Michael D Purugganan
- Center for Genomics and Systems Biology, New York University, New York, NY 10003; Center for Genomics and Systems Biology, New York University Abu Dhabi, Abu Dhabi, United Arab Emirates
|
41
|
Lee B, Cyrill SL, Lee W, Melchiotti R, Andiappan AK, Poidinger M, Rötzschke O. Analysis of archaic human haplotypes suggests that 5hmC acts as an epigenetic guide for NCO recombination. BMC Biol 2022; 20:173. [PMID: 35927700 PMCID: PMC9354366 DOI: 10.1186/s12915-022-01353-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/01/2021] [Accepted: 06/17/2022] [Indexed: 11/17/2022] Open
Abstract
Background Non-crossover (NCO) refers to a mechanism of homologous recombination in which short tracks of DNA are copied between homologue chromatids. The allelic changes are typically restricted to one or few SNPs, which potentially allow for the gradual adaptation and maturation of haplotypes. It is assumed to be a stochastic process but the analysis of archaic and modern human haplotypes revealed a striking variability in local NCO recombination rates. Methods NCO recombination rates of 1.9 million archaic SNPs shared with Denisovan hominids were defined by a linkage study and correlated with functional and genomic annotations as well as ChIP-Seq data from modern humans. Results We detected a strong correlation between NCO recombination rates and the function of the respective region: low NCO rates were evident in introns and quiescent intergenic regions but high rates in splice sites, exons, 5′- and 3′-UTRs, as well as CpG islands. Correlations with ChIP-Seq data from ENCODE and other public sources further identified epigenetic modifications that associated directly with these recombination events. A particularly strong association was observed for 5-hydroxymethylcytosine marks (5hmC), which were enriched in virtually all of the functional regions associated with elevated NCO rates, including CpG islands and ‘poised’ bivalent regions. Conclusion Our results suggest that 5hmC marks may guide the NCO machinery specifically towards functionally relevant regions and, as an intermediate of oxidative demethylation, may open a pathway for environmental influence by specifically targeting recently opened gene loci. Supplementary Information The online version contains supplementary material available at 10.1186/s12915-022-01353-9.
Affiliation(s)
- Bernett Lee
- Singapore Immunology Network (SIgN), Agency of Science Technology and Research (A*STAR), 8A Biomedical Drive, Singapore, 138648, Singapore; Present address: Lee Kong Chian School of Medicine, Nanyang Technological University, 50 Nanyang Avenue, Singapore, 639798, Singapore
| | - Samantha Leeanne Cyrill
- Singapore Immunology Network (SIgN), Agency of Science Technology and Research (A*STAR), 8A Biomedical Drive, Singapore, 138648, Singapore; Present address: Cold Spring Harbor Laboratory, One Bungtown Road, NY, 11724, Cold Spring Harbor, USA
| | - Wendy Lee
- Singapore Immunology Network (SIgN), Agency of Science Technology and Research (A*STAR), 8A Biomedical Drive, Singapore, 138648, Singapore
| | - Rossella Melchiotti
- Singapore Immunology Network (SIgN), Agency of Science Technology and Research (A*STAR), 8A Biomedical Drive, Singapore, 138648, Singapore
| | - Anand Kumar Andiappan
- Singapore Immunology Network (SIgN), Agency of Science Technology and Research (A*STAR), 8A Biomedical Drive, Singapore, 138648, Singapore
| | - Michael Poidinger
- Singapore Immunology Network (SIgN), Agency of Science Technology and Research (A*STAR), 8A Biomedical Drive, Singapore, 138648, Singapore; Present address: Murdoch Children's Research Institute, Royal Children's Hospital, Flemington Road, Parkville, Victoria, 3052, Australia
| | - Olaf Rötzschke
- Singapore Immunology Network (SIgN), Agency of Science Technology and Research (A*STAR), 8A Biomedical Drive, Singapore, 138648, Singapore.
|
42
|
Dukler N, Mughal MR, Ramani R, Huang YF, Siepel A. Extreme purifying selection against point mutations in the human genome. Nat Commun 2022; 13:4312. [PMID: 35879308 PMCID: PMC9314448 DOI: 10.1038/s41467-022-31872-6] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 09/10/2021] [Accepted: 07/07/2022] [Indexed: 12/13/2022] Open
Abstract
Large-scale genome sequencing has enabled the measurement of strong purifying selection in protein-coding genes. Here we describe a new method, called ExtRaINSIGHT, for measuring such selection in noncoding as well as coding regions of the human genome. ExtRaINSIGHT estimates the prevalence of "ultraselection" by the fractional depletion of rare single-nucleotide variants, after controlling for variation in mutation rates. Applying ExtRaINSIGHT to 71,702 whole genome sequences from gnomAD v3, we find abundant ultraselection in evolutionarily ancient miRNAs and neuronal protein-coding genes, as well as at splice sites. By contrast, we find much less ultraselection in other noncoding RNAs and transcription factor binding sites, and only modest levels in ultraconserved elements. We estimate that ~0.4-0.7% of the human genome is ultraselected, implying ~ 0.26-0.51 strongly deleterious mutations per generation. Overall, our study sheds new light on the genome-wide distribution of fitness effects by combining deep sequencing data and classical theory from population genetics.
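The idea of "fractional depletion of rare single-nucleotide variants" can be illustrated with simple arithmetic: compare the number of rare variants observed in a set of sites with the number expected under a neutral, mutation-rate-matched baseline. The sketch below is only that illustration, under the assumption of a single per-site neutral rate; it is not the ExtRaINSIGHT likelihood model, and the function name, parameters, and numbers are hypothetical.

```python
def fractional_depletion(observed_rare, n_sites, neutral_rate):
    """Share of expected rare variants that are missing relative to a neutral baseline.

    observed_rare: rare variants counted in the target sites
    n_sites:       number of target sites
    neutral_rate:  assumed per-site probability of a rare variant at matched neutral sites
    """
    expected = n_sites * neutral_rate
    if expected <= 0:
        raise ValueError("expected count must be positive")
    # Clamp at zero: an excess of rare variants is treated as no depletion (a simplification).
    return max(0.0, 1.0 - observed_rare / expected)


# Hypothetical numbers: 1000 sites, 10% neutral rare-variant rate, 60 rare variants observed.
depletion = fractional_depletion(observed_rare=60, n_sites=1000, neutral_rate=0.1)
```

With these illustrative inputs, 100 rare variants would be expected and 60 are seen, so about 40% are missing. Matching the baseline to local mutation rates matters because a low-mutation region would otherwise look depleted even without selection.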
Affiliation(s)
- Noah Dukler
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
| | - Mehreen R Mughal
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
| | - Ritika Ramani
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
| | - Yi-Fei Huang
- Department of Biology and Huck Institute of the Life Sciences, The Pennsylvania State University, University Park, PA, USA
| | - Adam Siepel
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA.
|
43
|
da Silva GM, Yang J, Leang B, Huang J, Weinreich DM, Rubenstein BM. Covalent docking and molecular dynamics simulations reveal the specificity-shifting mutations Ala237Arg and Ala237Lys in TEM beta-lactamase. PLoS Comput Biol 2022; 18:e1009944. [PMID: 35759512 PMCID: PMC9269908 DOI: 10.1371/journal.pcbi.1009944] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/22/2022] [Revised: 07/08/2022] [Accepted: 06/01/2022] [Indexed: 11/18/2022] Open
Abstract
The rate of modern drug discovery using experimental screening methods still lags behind the rate at which pathogens mutate, underscoring the need for fast and accurate predictive simulations of protein evolution. Multidrug-resistant bacteria evade our defenses by expressing a series of proteins, the most famous of which is the 29-kilodalton enzyme, TEM β-lactamase. Considering these challenges, we applied a covalent docking heuristic to measure the effects of all possible alanine 237 substitutions in TEM due to this codon's importance for catalysis and effects on the binding affinities of commercially-available β-lactam compounds. In addition to the usual mutations that reduce substrate binding due to steric hindrance, we identified two distinctive specificity-shifting TEM mutations, Ala237Arg and Ala237Lys, and their respective modes of action. Notably, we discovered and verified through minimum inhibitory concentration assays that, while these mutations and their bulkier side chains lead to steric clashes that curtail ampicillin binding, these same groups foster salt bridges with the negatively-charged side-chain of the cephalosporin cefixime, widely used in the clinic to treat multi-resistant bacterial infections. To measure the stability of these unexpected interactions, we used molecular dynamics simulations and found the binding modes to be stable despite the application of biasing forces. Finally, we found that both TEM mutants also bind strongly to other drugs containing negatively-charged R-groups, such as carumonam and ceftibuten. As with cefixime, this increased binding affinity stems from a salt bridge between the compounds' negative moieties and the positively-charged side chain of the arginine or lysine, suggesting a shared mechanism. 
In addition to reaffirming the power of using simulations as molecular microscopes, our results can guide the rational design of next-generation β-lactam antibiotics and bring the community closer to retaking the lead against the recurrent threat of multidrug-resistant pathogens.
Affiliation(s)
- Gabriel Monteiro da Silva
- Department of Molecular and Cell Biology, Brown University, Providence, Rhode Island, United States of America
| | - Jordan Yang
- Department of Chemistry, Brown University, Providence, Rhode Island, United States of America
| | - Bunlong Leang
- Department of Health and Human Biology, Brown University, Providence, Rhode Island, United States of America
| | - Jessie Huang
- Department of Chemistry, Wellesley College, Wellesley, Massachusetts, United States of America
| | - Daniel M. Weinreich
- Department of Ecology and Evolutionary Biology, Brown University, Providence, Rhode Island, United States of America
| | - Brenda M. Rubenstein
- Department of Chemistry, Brown University, Providence, Rhode Island, United States of America
|
44
|
Kuru N, Dereli O, Akkoyun E, Bircan A, Tastan O, Adebali O. PHACT: Phylogeny-aware computing of tolerance for missense mutations. Mol Biol Evol 2022; 39:6593375. [PMID: 35639618 PMCID: PMC9178230 DOI: 10.1093/molbev/msac114] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 11/12/2022] Open
Abstract
Evolutionary conservation is a fundamental resource for predicting the substitutability of amino acids and loss of function in proteins. The use of multiple sequence alignment alone, without considering the evolutionary relationships among sequences, results in the redundant counting of evolutionarily related alteration events as if they were independent. Here we propose a new method, PHACT, which predicts the pathogenicity of missense mutations directly from the phylogenetic tree of proteins. PHACT travels through the nodes of the phylogenetic tree and evaluates the deleteriousness of a substitution based on the probability differences of ancestral amino acids between neighboring nodes in the tree. Moreover, PHACT assigns weights to each node in the tree based on their distance to the query organism. For each potential amino acid substitution, the algorithm generates a score that is used to calculate the effect of substitution on protein function. To analyze the predictive performance of PHACT, we performed various experiments over the subsets of two datasets that include 3023 proteins and 61662 variants in total. The experiments demonstrated that our method outperformed the widely used pathogenicity prediction tools (i.e., SIFT and PolyPhen-2) and achieved better predictive performance than did other conventional statistical approaches presented in dbNSFP. The PHACT source code is available at https://github.com/CompGenomeLab/PHACT.
Affiliation(s)
- Nurdan Kuru
- Faculty of Engineering and Natural Sciences, Sabanci University, Istanbul 34956, Turkey
| | - Onur Dereli
- Faculty of Engineering and Natural Sciences, Sabanci University, Istanbul 34956, Turkey
| | - Emrah Akkoyun
- Faculty of Engineering and Natural Sciences, Sabanci University, Istanbul 34956, Turkey
| | - Aylin Bircan
- Faculty of Engineering and Natural Sciences, Sabanci University, Istanbul 34956, Turkey
| | - Oznur Tastan
- Faculty of Engineering and Natural Sciences, Sabanci University, Istanbul 34956, Turkey
| | - Ogun Adebali
- Faculty of Engineering and Natural Sciences, Sabanci University, Istanbul 34956, Turkey
|
45
|
Chen L, Wang Y, Zhao F. Exploiting deep transfer learning for the prediction of functional non-coding variants using genomic sequence. Bioinformatics 2022; 38:3164-3172. [PMID: 35389435 PMCID: PMC9890318 DOI: 10.1093/bioinformatics/btac214] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 09/03/2021] [Revised: 03/04/2022] [Accepted: 04/06/2022] [Indexed: 02/04/2023] Open
Abstract
MOTIVATION Though genome-wide association studies have identified tens of thousands of variants associated with complex traits and most of them fall within the non-coding regions, they may not be the causal ones. The development of high-throughput functional assays has led to the discovery of experimentally validated non-coding functional variants. However, these validated variants are rare due to technical difficulty and financial cost. The small sample size of validated variants makes it less reliable to develop a supervised machine learning model for achieving a whole genome-wide prediction of non-coding causal variants. RESULTS We exploit a deep transfer learning model, which is based on a convolutional neural network, to improve the prediction for functional non-coding variants (NCVs). To address the challenge of small sample size, the transfer learning model leverages both large-scale generic functional NCVs to improve the learning of low-level features and context-specific functional NCVs to learn high-level features toward the context-specific prediction task. By evaluating the deep transfer learning model on three MPRA datasets and 16 GWAS datasets, we demonstrate that the proposed model outperforms deep learning models without pretraining or retraining. In addition, the deep transfer learning model outperforms 18 existing computational methods in both MPRA and GWAS datasets. AVAILABILITY AND IMPLEMENTATION https://github.com/lichen-lab/TLVar. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Li Chen
- To whom correspondence should be addressed.
| | | | - Fengdi Zhao
- Department of Biostatistics and Health Data Science, Indiana University School of Medicine, Indianapolis, IN 46202, USA; Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN 46202, USA
|
46
|
Aganezov S, Yan SM, Soto DC, Kirsche M, Zarate S, Avdeyev P, Taylor DJ, Shafin K, Shumate A, Xiao C, Wagner J, McDaniel J, Olson ND, Sauria MEG, Vollger MR, Rhie A, Meredith M, Martin S, Lee J, Koren S, Rosenfeld JA, Paten B, Layer R, Chin CS, Sedlazeck FJ, Hansen NF, Miller DE, Phillippy AM, Miga KH, McCoy RC, Dennis MY, Zook JM, Schatz MC. A complete reference genome improves analysis of human genetic variation. Science 2022; 376:eabl3533. [PMID: 35357935 PMCID: PMC9336181 DOI: 10.1126/science.abl3533] [Citation(s) in RCA: 193] [Impact Index Per Article: 64.3] [Indexed: 12/11/2022]
Abstract
Compared to its predecessors, the Telomere-to-Telomere CHM13 genome adds nearly 200 million base pairs of sequence, corrects thousands of structural errors, and unlocks the most complex regions of the human genome for clinical and functional study. We show how this reference universally improves read mapping and variant calling for 3202 and 17 globally diverse samples sequenced with short and long reads, respectively. We identify hundreds of thousands of variants per sample in previously unresolved regions, showcasing the promise of the T2T-CHM13 reference for evolutionary and biomedical discovery. Simultaneously, this reference eliminates tens of thousands of spurious variants per sample, including reduction of false positives in 269 medically relevant genes by up to a factor of 12. Because of these improvements in variant discovery coupled with population and functional genomic resources, T2T-CHM13 is positioned to replace GRCh38 as the prevailing reference for human genetics.
Affiliation(s)
- Sergey Aganezov
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Stephanie M. Yan
- Department of Biology, Johns Hopkins University, Baltimore, MD, USA
| | - Daniela C. Soto
- Department of Biochemistry and Molecular Medicine, Genome Center, MIND Institute, University of California, Davis, CA, USA
| | - Melanie Kirsche
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Samantha Zarate
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Pavel Avdeyev
- Genome Informatics Section, National Human Genome Research Institute, Bethesda, MD, USA
| | - Dylan J. Taylor
- Department of Biology, Johns Hopkins University, Baltimore, MD, USA
| | - Kishwar Shafin
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Alaina Shumate
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Chunlin Xiao
- National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD, USA
| | - Justin Wagner
- National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Jennifer McDaniel
- National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Nathan D. Olson
- National Institute of Standards and Technology, Gaithersburg, MD, USA
| | | | | | - Arang Rhie
- Genome Informatics Section, National Human Genome Research Institute, Bethesda, MD, USA
| | - Melissa Meredith
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Skylar Martin
- Department of Computer Science and Biofrontiers Institute, University of Colorado, Boulder, CO, USA
| | - Joyce Lee
- Bionano Genomics, San Diego, CA, USA
| | - Sergey Koren
- Genome Informatics Section, National Human Genome Research Institute, Bethesda, MD, USA
| | | | - Benedict Paten
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Ryan Layer
- Department of Computer Science and Biofrontiers Institute, University of Colorado, Boulder, CO, USA
| | | | - Fritz J. Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
| | - Nancy F. Hansen
- Comparative Genomics Analysis Unit, National Human Genome Research Institute, Rockville, MD, USA
| | - Danny E. Miller
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Department of Pediatrics, Division of Genetic Medicine, University of Washington and Seattle Children’s Hospital, Seattle, WA, USA
| | - Adam M. Phillippy
- Genome Informatics Section, National Human Genome Research Institute, Bethesda, MD, USA
| | - Karen H. Miga
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Rajiv C. McCoy
- Department of Biology, Johns Hopkins University, Baltimore, MD, USA
| | - Megan Y. Dennis
- Department of Biochemistry and Molecular Medicine, Genome Center, MIND Institute, University of California, Davis, CA, USA
| | - Justin M. Zook
- National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Michael C. Schatz
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Department of Biology, Johns Hopkins University, Baltimore, MD, USA
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
|
47
|
Andrades R, Recamonde-Mendoza M. Machine learning methods for prediction of cancer driver genes: a survey paper. Brief Bioinform 2022; 23:6551145. [PMID: 35323900 DOI: 10.1093/bib/bbac062] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 10/06/2021] [Revised: 02/06/2022] [Accepted: 02/08/2022] [Indexed: 12/21/2022]
Abstract
Identifying the genes and mutations that drive the emergence of tumors is a critical step to improving our understanding of cancer and identifying new directions for disease diagnosis and treatment. Despite the large volume of genomics data, the precise detection of driver mutations and their carrying genes, known as cancer driver genes, from the millions of possible somatic mutations remains a challenge. Computational methods play an increasingly important role in discovering genomic patterns associated with cancer drivers and developing predictive models to identify these elements. Machine learning (ML), including deep learning, has been the engine behind many of these efforts and provides excellent opportunities for tackling remaining gaps in the field. Thus, this survey aims to perform a comprehensive analysis of ML-based computational approaches to identify cancer driver mutations and genes, providing an integrated, panoramic view of the broad data and algorithmic landscape within this scientific problem. We discuss how the interactions among data types and ML algorithms have been explored in previous solutions and outline current analytical limitations that deserve further attention from the scientific community. We hope that by helping readers become more familiar with significant developments in the field brought by ML, we may inspire new researchers to address open problems and advance our knowledge towards cancer driver discovery.
Affiliation(s)
- Renan Andrades
- Institute of Informatics, Universidade Federal do Rio Grande do Sul, Porto Alegre/RS, Brazil; Bioinformatics Core, Hospital de Clínicas de Porto Alegre, Porto Alegre/RS, Brazil
| | - Mariana Recamonde-Mendoza
- Institute of Informatics, Universidade Federal do Rio Grande do Sul, Porto Alegre/RS, Brazil; Bioinformatics Core, Hospital de Clínicas de Porto Alegre, Porto Alegre/RS, Brazil
|
48
|
Roca-Umbert A, Caro-Consuegra R, Londono-Correa D, Rodriguez-Lozano GF, Vicente R, Bosch E. Understanding signatures of positive natural selection in human zinc transporter genes. Sci Rep 2022; 12:4320. [PMID: 35279701 PMCID: PMC8918337 DOI: 10.1038/s41598-022-08439-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/15/2021] [Accepted: 02/25/2022] [Indexed: 12/11/2022]
Abstract
Zinc is an essential micronutrient with a tightly regulated systemic and cellular homeostasis. In humans, some zinc transporter genes (ZTGs) have been previously reported as candidates for strong geographically restricted selective sweeps. However, since zinc homeostasis is maintained by the joint action of 24 ZTGs, other more subtle modes of selection could have also facilitated human adaptation to zinc availability. Here, we studied whether the complete set of ZTGs is enriched for signals of positive selection in worldwide populations and population groups from South Asia. ZTGs showed higher levels of genetic differentiation between African and non-African populations than would be randomly expected, as well as other signals of polygenic selection outside Africa. Moreover, in several South Asian population groups, ZTGs were significantly enriched for SNPs with unusually extended haplotypes and displayed SNP genotype-environmental correlations when considering zinc deficiency levels in soil in that geographical area. Our study replicated some well-characterized targets for positive selection in East Asia and sub-Saharan Africa, and proposed new candidates for follow-up in South Asia (SLC39A5) and Africa (SLC39A7). Finally, we identified candidate variants for adaptation in ZTGs that could contribute to different disease susceptibilities and zinc-related human health traits.
Affiliation(s)
- Ana Roca-Umbert
- Institut de Biologia Evolutiva (UPF-CSIC), Departament de Medicina i Ciències de la Vida, Universitat Pompeu Fabra, Parc de Recerca Biomèdica de Barcelona, 08003, Barcelona, Spain
| | - Rocio Caro-Consuegra
- Institut de Biologia Evolutiva (UPF-CSIC), Departament de Medicina i Ciències de la Vida, Universitat Pompeu Fabra, Parc de Recerca Biomèdica de Barcelona, 08003, Barcelona, Spain
| | - Diego Londono-Correa
- Institut de Biologia Evolutiva (UPF-CSIC), Departament de Medicina i Ciències de la Vida, Universitat Pompeu Fabra, Parc de Recerca Biomèdica de Barcelona, 08003, Barcelona, Spain
| | - Gabriel Felipe Rodriguez-Lozano
- Institut de Biologia Evolutiva (UPF-CSIC), Departament de Medicina i Ciències de la Vida, Universitat Pompeu Fabra, Parc de Recerca Biomèdica de Barcelona, 08003, Barcelona, Spain
| | - Ruben Vicente
- Laboratory of Molecular Physiology, Universitat Pompeu Fabra, Parc de Recerca Biomèdica de Barcelona, 08003, Barcelona, Spain
| | - Elena Bosch
- Institut de Biologia Evolutiva (UPF-CSIC), Departament de Medicina i Ciències de la Vida, Universitat Pompeu Fabra, Parc de Recerca Biomèdica de Barcelona, 08003, Barcelona, Spain; Centro de Investigación Biomédica en Red de Salud Mental (CIBERSAM), 43206, Reus, Spain
|
49
|
Zhou J, Wu L, Xu P, Li Y, Ji Z, Kang X. Filamin A Is a Potential Driver of Breast Cancer Metastasis via Regulation of MMP-1. Front Oncol 2022; 12:836126. [PMID: 35359350 PMCID: PMC8962737 DOI: 10.3389/fonc.2022.836126] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/15/2021] [Accepted: 02/14/2022] [Indexed: 01/01/2023]
Abstract
Recurrent metastasis is a major cause of death in breast cancer. Unfortunately, its driving forces and underlying molecular mechanisms have not yet been fully elucidated. In this study, we recruited a cohort of breast cancer patients with locoregional metastasis, collected matched samples of the primary and metastatic tumors, and determined their mutation profiles by whole-exome sequencing (WES). Based on these profiles, we identified a list of deleterious variants in eight susceptible genes. Among them, filamin A (FLNA) was considered a potential driver gene of metastasis, and its low expression could improve the 5-year relapse survival rate by 15%. To test this finding, we constructed a stable FLNA knockout tumor cell line, in which proliferation, migration, and invasion were significantly weakened in response to the gene knockout. Xenograft mouse experiments further showed that FLNA knockout could inhibit local or distal metastasis. Taken together, these results support FLNA as a potential driver gene of breast cancer metastasis, in particular in triple-negative breast cancer. Additional experiments also suggested that FLNA might influence metastasis via the regulation of MMP-1 expression. In summary, this study indicates that FLNA may act as a positive regulator of cancer proliferation and recurrence. It provides new insight into breast cancer metastasis and suggests a potential new therapeutic target for breast cancer therapy.
Affiliation(s)
- Jie Zhou
- Department of Oncology, Xiang’an Hospital of Xiamen University, School of Medicine, Xiamen University, Xiamen, China
| | - Lvying Wu
- State Key Laboratory of Cellular Stress Biology, School of Life Sciences, Xiamen University, Xiamen, China
| | - Pengyan Xu
- Department of Surgical Research, Universitätsklinikum Erlangen, Erlangen, Germany
| | - Yue Li
- Department of Oncology, Xiang’an Hospital of Xiamen University, School of Medicine, Xiamen University, Xiamen, China
| | - Zhiliang Ji
- State Key Laboratory of Cellular Stress Biology, School of Life Sciences, Xiamen University, Xiamen, China
- *Correspondence: Xinmei Kang; Zhiliang Ji
| | - Xinmei Kang
- Department of Oncology, Xiang’an Hospital of Xiamen University, School of Medicine, Xiamen University, Xiamen, China
- *Correspondence: Xinmei Kang; Zhiliang Ji
|
50
|
Analysis of missense variants in the human genome reveals widespread gene-specific clustering and improves prediction of pathogenicity. Am J Hum Genet 2022; 109:457-470. [PMID: 35120630 PMCID: PMC8948164 DOI: 10.1016/j.ajhg.2022.01.006] [Citation(s) in RCA: 44] [Impact Index Per Article: 14.7] [Received: 08/20/2021] [Accepted: 01/11/2022] [Indexed: 12/11/2022]
Abstract
We used a machine learning approach to analyze the within-gene distribution of missense variants observed in hereditary conditions and cancer. When applied to 840 genes from the ClinVar database, this approach detected a significant non-random distribution of pathogenic and benign variants in 387 (46%) and 172 (20%) genes, respectively, revealing that variant clustering is widespread across the human exome. This clustering likely occurs as a consequence of mechanisms shaping pathogenicity at the protein level, as illustrated by the overlap of some clusters with known functional domains. We then took advantage of these findings to develop a pathogenicity predictor, MutScore, that integrates qualitative features of DNA substitutions with the new additional information derived from this positional clustering. Using a random forest approach, MutScore was able to identify pathogenic missense mutations with very high accuracy, outperforming existing predictive tools, especially for variants associated with autosomal-dominant disease and cancer. Thus, the within-gene clustering of pathogenic and benign DNA changes is an important and previously underappreciated feature of the human exome, which can be harnessed to improve the prediction of pathogenicity and disambiguation of DNA variants of uncertain significance.
|