1. Chen Y, Lee K, Woo J, Kim DW, Keum C, Babbi G, Casadio R, Martelli PL, Savojardo C, Manfredi M, Shen Y, Sun Y, Katsonis P, Lichtarge O, Pejaver V, Seward DJ, Kamandula A, Bakolitsa C, Brenner SE, Radivojac P, O’Donnell-Luria A, Mooney SD, Jain S. Evaluating predictors of kinase activity of STK11 variants identified in primary human non-small cell lung cancers. Research Square 2024:rs.3.rs-4587317. [PMID: 39011112 PMCID: PMC11247923 DOI: 10.21203/rs.3.rs-4587317/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/17/2024]
Abstract
Critical evaluation of computational tools for predicting variant effects is important considering their increased use in disease diagnosis and driving molecular discoveries. In the sixth edition of the Critical Assessment of Genome Interpretation (CAGI) challenge, a dataset of 28 STK11 rare variants (27 missense, 1 single amino acid deletion), identified in primary non-small cell lung cancer biopsies, was experimentally assayed to characterize computational methods from four participating teams and five publicly available tools. Predictors demonstrated a high level of performance on key evaluation metrics, measuring correlation with the assay outputs and separating loss-of-function (LoF) variants from wildtype-like (WT-like) variants. The best participant model, 3Cnet, performed competitively with well-known tools. Unique to this challenge was that the functional data was generated with both biological and technical replicates, thus allowing the assessors to realistically establish maximum predictive performance based on experimental variability. Three out of the five publicly available tools and 3Cnet approached the performance of the assay replicates in separating LoF variants from WT-like variants. Surprisingly, REVEL, an often-used model, achieved a correlation with the real-valued assay output comparable to that seen for the experimental replicates. Performing variant interpretation by combining the new functional evidence with computational and population data evidence led to 16 new variants receiving a clinically actionable classification of likely pathogenic (LP) or likely benign (LB). Overall, the STK11 challenge highlights the utility of variant effect predictors in biomedical sciences and provides encouraging results for driving research in the field of computational genome interpretation.
Affiliation(s)
- Yile Chen
  - Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, 98105, WA, USA
- Kyoungyeul Lee
  - 3billion, 3billion Biotechnology company, Seoul, South Korea
- Junwoo Woo
  - 3billion, 3billion Biotechnology company, Seoul, South Korea
- Dong-wook Kim
  - 3billion, 3billion Biotechnology company, Seoul, South Korea
- Changwon Keum
  - 3billion, 3billion Biotechnology company, Seoul, South Korea
- Giulia Babbi
  - Department of Pharmacy and Biotechnology, University of Bologna, Bologna, 40126, Italy
- Rita Casadio
  - Department of Pharmacy and Biotechnology, University of Bologna, Bologna, 40126, Italy
- Pier Luigi Martelli
  - Department of Pharmacy and Biotechnology, University of Bologna, Bologna, 40126, Italy
- Castrense Savojardo
  - Department of Pharmacy and Biotechnology, University of Bologna, Bologna, 40126, Italy
- Matteo Manfredi
  - Department of Pharmacy and Biotechnology, University of Bologna, Bologna, 40126, Italy
- Yang Shen
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, 77843, TX, USA
- Yuanfei Sun
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, 77843, TX, USA
- Panagiotis Katsonis
  - Molecular and Human Genetics, Baylor College of Medicine, Houston, 77030, TX, USA
- Olivier Lichtarge
  - Molecular and Human Genetics, Baylor College of Medicine, Houston, 77030, TX, USA
- Vikas Pejaver
  - Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, 10029, NY, USA
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, 10029, NY, USA
- David J. Seward
  - Department of Pathology, University of Vermont, Burlington, 5445, VT, USA
- Akash Kamandula
  - Khoury College of Computer Sciences, Northeastern University, Boston, 02115, MA, USA
- Predrag Radivojac
  - Khoury College of Computer Sciences, Northeastern University, Boston, 02115, MA, USA
- Anne O’Donnell-Luria
  - Division of Genetics and Genomics, Boston Children’s Hospital, Harvard Medical School, Boston, 02115, MA, USA
  - Broad Center for Mendelian Genomics, Program in Medical and Population Genetics, Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, 02142, MA, USA
- Sean D. Mooney
  - Center for Information Technology, National Institutes of Health, Bethesda, 20892, MD, USA
- Shantanu Jain
  - Khoury College of Computer Sciences, Northeastern University, Boston, 02115, MA, USA
  - The Institute for Experiential AI, Northeastern University, Boston, 02115, MA, USA
2. Ohno S, Manabe N, Yamaguchi Y. Prediction of protein structure and AI. J Hum Genet 2024:10.1038/s10038-023-01215-4. [PMID: 38177398 DOI: 10.1038/s10038-023-01215-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2023] [Accepted: 12/10/2023] [Indexed: 01/06/2024]
Abstract
AlphaFold, an artificial intelligence (AI)-based tool for predicting the 3D structure of proteins, is now widely recognized for its high accuracy and versatility in the folding of human proteins. AlphaFold is useful for understanding structure-function relationships from protein 3D structure models and can serve as a template or a reference for experimental structural analysis including X-ray crystallography, NMR and cryo-EM analysis. Its use is expanding among researchers, not only in structural biology but also in other research fields. Researchers are currently exploring the full potential of AlphaFold-generated protein models. Predicting disease severity caused by missense mutations is one such application. This article provides an overview of the 3D structural modeling of AlphaFold based on deep learning techniques and highlights the challenges in predicting the pathogenicity of missense mutations.
Affiliation(s)
- Shiho Ohno
  - Division of Structural Glycobiology, Institute of Molecular Biomembrane and Glycobiology, Tohoku Medical and Pharmaceutical University, 4-4-1 Komatsushima, Aoba-ku, Sendai, Miyagi, 981-8558, Japan
- Noriyoshi Manabe
  - Division of Structural Glycobiology, Institute of Molecular Biomembrane and Glycobiology, Tohoku Medical and Pharmaceutical University, 4-4-1 Komatsushima, Aoba-ku, Sendai, Miyagi, 981-8558, Japan
- Yoshiki Yamaguchi
  - Division of Structural Glycobiology, Institute of Molecular Biomembrane and Glycobiology, Tohoku Medical and Pharmaceutical University, 4-4-1 Komatsushima, Aoba-ku, Sendai, Miyagi, 981-8558, Japan
3. Shauli T, Brandes N, Linial M. Evolutionary and functional lessons from human-specific amino acid substitution matrices. NAR Genom Bioinform 2021; 3:lqab079. [PMID: 34541526 PMCID: PMC8445205 DOI: 10.1093/nargab/lqab079] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/19/2021] [Revised: 08/02/2021] [Accepted: 09/14/2021] [Indexed: 12/26/2022]
Abstract
Human genetic variation in coding regions is fundamental to the study of protein structure and function. Most methods for interpreting missense variants consider substitution measures derived from homologous proteins across different species. In this study, we introduce human-specific amino acid (AA) substitution matrices that are based on genetic variations in the modern human population. We analyzed the frequencies of >4.8M single nucleotide variants (SNVs) at codon and AA resolution and compiled human-centric substitution matrices that are fundamentally different from classic cross-species matrices (e.g. BLOSUM, PAM). Our matrices are asymmetric, with some AA replacements showing significant directional preference. Moreover, these AA matrices are only partly predicted by nucleotide substitution rates. We further test the utility of our matrices in exposing functional signals of experimentally-validated protein annotations. A significant reduction in AA transition frequencies was observed across nine post-translational modification (PTM) types and four ion-binding sites. Our results propose a purifying selection signal in the human proteome across a diverse set of functional protein annotations and provide an empirical baseline for interpreting human genetic variation in coding regions.
Affiliation(s)
- Tair Shauli
  - The Rachel and Selim Benin School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, 91904, Israel
- Nadav Brandes
  - The Rachel and Selim Benin School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, 91904, Israel
- Michal Linial
  - Department of Biological Chemistry, Institute of Life Sciences, The Hebrew University of Jerusalem, Jerusalem, 91904, Israel
4. Zhou JB, Xiong Y, An K, Ye ZQ, Wu YD. IDRMutPred: predicting disease-associated germline nonsynonymous single nucleotide variants (nsSNVs) in intrinsically disordered regions. Bioinformatics 2021; 36:4977-4983. [PMID: 32756939 PMCID: PMC7755418 DOI: 10.1093/bioinformatics/btaa618] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 10/18/2019] [Revised: 06/28/2020] [Accepted: 07/01/2020] [Indexed: 01/09/2023]
Abstract
Motivation: Despite the lack of folded structure, intrinsically disordered regions (IDRs) of proteins play versatile roles in various biological processes, and many nonsynonymous single nucleotide variants (nsSNVs) in IDRs are associated with human diseases. The continuous accumulation of nsSNVs resulting from the wide application of NGS has driven the development of disease-association prediction methods for decades. However, their performance on nsSNVs in IDRs remains inferior, possibly due to the domination of nsSNVs from structured regions in training data. Therefore, there is a strong need for a better-performing disease-association predictor built specifically for nsSNVs in IDRs. Results: We present IDRMutPred, a machine learning-based tool specifically for predicting disease-associated germline nsSNVs in IDRs. Based on 17 selected optimal features that are extracted from sequence alignments, protein annotations, hydrophobicity indices and disorder scores, IDRMutPred was trained using three ensemble learning algorithms on a training dataset containing only IDR nsSNVs. The evaluation on the two testing datasets shows that all three prediction models significantly outperform 17 other popular general predictors, achieving ACC between 0.856 and 0.868 and MCC between 0.713 and 0.737. IDRMutPred will prioritize disease-associated IDR germline nsSNVs more reliably than general predictors. Availability and implementation: The software is freely available at http://www.wdspdb.com/IDRMutPred. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jing-Bo Zhou
  - Lab of Computational Chemistry and Drug Design, State Key Laboratory of Chemical Oncogenomics, Peking University Shenzhen Graduate School, Shenzhen 518055, China
- Yao Xiong
  - Lab of Computational Chemistry and Drug Design, State Key Laboratory of Chemical Oncogenomics, Peking University Shenzhen Graduate School, Shenzhen 518055, China
- Ke An
  - Lab of Computational Chemistry and Drug Design, State Key Laboratory of Chemical Oncogenomics, Peking University Shenzhen Graduate School, Shenzhen 518055, China
- Zhi-Qiang Ye
  - Lab of Computational Chemistry and Drug Design, State Key Laboratory of Chemical Oncogenomics, Peking University Shenzhen Graduate School, Shenzhen 518055, China
  - Shenzhen Bay Laboratory, Shenzhen 518055, China
- Yun-Dong Wu
  - Lab of Computational Chemistry and Drug Design, State Key Laboratory of Chemical Oncogenomics, Peking University Shenzhen Graduate School, Shenzhen 518055, China
  - Shenzhen Bay Laboratory, Shenzhen 518055, China
  - College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, China
5. Laaksonen J, Mishra PP, Seppälä I, Lyytikäinen LP, Raitoharju E, Mononen N, Lepistö M, Almusa H, Ellonen P, Hutri-Kähönen N, Juonala M, Raitakari O, Kähönen M, Salonen JT, Lehtimäki T. Examining the effect of mitochondrial DNA variants on blood pressure in two Finnish cohorts. Sci Rep 2021; 11:611. [PMID: 33436758 PMCID: PMC7804469 DOI: 10.1038/s41598-020-79931-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/11/2020] [Accepted: 12/10/2020] [Indexed: 12/12/2022]
Abstract
High blood pressure (BP) is a major risk factor for many noncommunicable diseases. The effect of mitochondrial DNA single-nucleotide polymorphisms (mtSNPs) on BP is less known than that of nuclear SNPs. We investigated the mitochondrial genetic determinants of systolic, diastolic, and mean arterial BP. MtSNPs were determined from peripheral blood by sequencing or with genome-wide association study SNP arrays in two independent Finnish cohorts, the Young Finns Study and the Finnish Cardiovascular Study, respectively. In total, over 4200 individuals were included. The effects of individual common mtSNPs, with an additional focus on sex-specificity, and aggregates of rare mtSNPs grouped by mitochondrial genes were evaluated by meta-analysis of linear regression and a sequence kernel association test, respectively. We accounted for the predicted pathogenicity of the rare variants within protein-encoding and the tRNA regions. In the meta-analysis of 87 common mtSNPs, we did not observe significant associations with any of the BP traits. Sex-specific and rare-variant analyses did not pinpoint any significant associations either. Our results are in agreement with several previous studies suggesting that mtDNA variation does not have a significant role in the regulation of BP. Future studies might need to reconsider the mechanisms thought to link mtDNA with hypertension.
Affiliation(s)
- Jaakko Laaksonen
  - Department of Clinical Chemistry, Fimlab Laboratories and Finnish Cardiovascular Research Center Tampere, Faculty of Medicine and Health Technology, Tampere University, Arvo Ylpön katu 34, PO Box 100, 33014, Tampere, Finland
- Pashupati P Mishra
  - Department of Clinical Chemistry, Fimlab Laboratories and Finnish Cardiovascular Research Center Tampere, Faculty of Medicine and Health Technology, Tampere University, Arvo Ylpön katu 34, PO Box 100, 33014, Tampere, Finland
- Ilkka Seppälä
  - Department of Clinical Chemistry, Fimlab Laboratories and Finnish Cardiovascular Research Center Tampere, Faculty of Medicine and Health Technology, Tampere University, Arvo Ylpön katu 34, PO Box 100, 33014, Tampere, Finland
- Leo-Pekka Lyytikäinen
  - Department of Clinical Chemistry, Fimlab Laboratories and Finnish Cardiovascular Research Center Tampere, Faculty of Medicine and Health Technology, Tampere University, Arvo Ylpön katu 34, PO Box 100, 33014, Tampere, Finland
- Emma Raitoharju
  - Department of Clinical Chemistry, Fimlab Laboratories and Finnish Cardiovascular Research Center Tampere, Faculty of Medicine and Health Technology, Tampere University, Arvo Ylpön katu 34, PO Box 100, 33014, Tampere, Finland
- Nina Mononen
  - Department of Clinical Chemistry, Fimlab Laboratories and Finnish Cardiovascular Research Center Tampere, Faculty of Medicine and Health Technology, Tampere University, Arvo Ylpön katu 34, PO Box 100, 33014, Tampere, Finland
- Maija Lepistö
  - Institute for Molecular Medicine (FIMM), University of Helsinki, Helsinki, Finland
- Henrikki Almusa
  - Institute for Molecular Medicine (FIMM), University of Helsinki, Helsinki, Finland
- Pekka Ellonen
  - Institute for Molecular Medicine (FIMM), University of Helsinki, Helsinki, Finland
- Nina Hutri-Kähönen
  - Department of Paediatrics, Tampere University Hospital and Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
- Markus Juonala
  - Department of Medicine, University of Turku, Turku, Finland
  - Division of Medicine, Turku University Hospital, Turku, Finland
  - Murdoch Children's Research Institute, Parkville, VIC, Australia
- Olli Raitakari
  - Centre for Population Health Research, University of Turku and Turku University Hospital, Turku, Finland
  - Research Centre for Applied and Preventive Cardiovascular Medicine, University of Turku, Turku, Finland
  - Department of Clinical Physiology and Nuclear Medicine, University of Turku and Turku University Hospital, Turku, Finland
- Mika Kähönen
  - Department of Clinical Physiology, Tampere University Hospital and Finnish Cardiovascular Research Center Tampere, Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
- Jukka T Salonen
  - Department of Public Health, Faculty of Medicine, University of Helsinki, Helsinki, Finland
  - MAS-Metabolic Analytical Services Oy, Helsinki, Finland
- Terho Lehtimäki
  - Department of Clinical Chemistry, Fimlab Laboratories and Finnish Cardiovascular Research Center Tampere, Faculty of Medicine and Health Technology, Tampere University, Arvo Ylpön katu 34, PO Box 100, 33014, Tampere, Finland
6. OXPHOS remodeling in high-grade prostate cancer involves mtDNA mutations and increased succinate oxidation. Nat Commun 2020; 11:1487. [PMID: 32198407 PMCID: PMC7083862 DOI: 10.1038/s41467-020-15237-5] [Citation(s) in RCA: 70] [Impact Index Per Article: 17.5] [Received: 12/16/2016] [Accepted: 02/25/2020] [Indexed: 02/07/2023]
Abstract
Rewiring of energy metabolism and adaptation of mitochondria are considered to impact on prostate cancer development and progression. Here, we report on mitochondrial respiration, DNA mutations and gene expression in paired benign/malignant human prostate tissue samples. Results reveal reduced respiratory capacities with NADH-pathway substrates glutamate and malate in malignant tissue and a significant metabolic shift towards higher succinate oxidation, particularly in high-grade tumors. The load of potentially deleterious mitochondrial-DNA mutations is higher in tumors and associated with unfavorable risk factors. High levels of potentially deleterious mutations in mitochondrial Complex I-encoding genes are associated with a 70% reduction in NADH-pathway capacity and compensation by increased succinate-pathway capacity. Structural analyses of these mutations reveal amino acid alterations leading to potentially deleterious effects on Complex I, supporting a causal relationship. A metagene signature extracted from the transcriptome of tumor samples exhibiting a severe mitochondrial phenotype enables identification of tumors with shorter survival times.
7. Fert-Bober J, Murray CI, Parker SJ, Van Eyk JE. Precision Profiling of the Cardiovascular Post-Translationally Modified Proteome: Where There Is a Will, There Is a Way. Circ Res 2019; 122:1221-1237. [PMID: 29700069 DOI: 10.1161/circresaha.118.310966] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Indexed: 12/17/2022]
Abstract
There is an exponential increase in biological complexity as initial gene transcripts are spliced, translated into amino acid sequence, and post-translationally modified. Each protein can exist as multiple chemical or sequence-specific proteoforms, and each has the potential to be a critical mediator of a physiological or pathophysiological signaling cascade. Here, we provide an overview of how different proteoforms come about in biological systems and how they are most commonly measured using mass spectrometry-based proteomics and bioinformatics. Our goal is to present this information at a level accessible to every scientist interested in mass spectrometry and its application to proteome profiling. We will specifically discuss recent data linking various protein post-translational modifications to cardiovascular disease and conclude with a discussion of enablement and democratization of proteomics across the cardiovascular and scientific community. The aim is to inform and inspire the readership to explore a larger breadth of proteoforms, particularly post-translational modifications, related to their particular areas of expertise in cardiovascular physiology.
Affiliation(s)
- Justyna Fert-Bober
  - From the Advanced Clinical BioSystems Research Institute, Smidt Heart Institute, Department of Medicine, Cedars Sinai Medical Center, Los Angeles, CA
- Christopher I Murray
  - From the Advanced Clinical BioSystems Research Institute, Smidt Heart Institute, Department of Medicine, Cedars Sinai Medical Center, Los Angeles, CA
- Sarah J Parker
  - From the Advanced Clinical BioSystems Research Institute, Smidt Heart Institute, Department of Medicine, Cedars Sinai Medical Center, Los Angeles, CA
- Jennifer E Van Eyk
  - From the Advanced Clinical BioSystems Research Institute, Smidt Heart Institute, Department of Medicine, Cedars Sinai Medical Center, Los Angeles, CA
8. Pagel KA, Antaki D, Lian A, Mort M, Cooper DN, Sebat J, Iakoucheva LM, Mooney SD, Radivojac P. Pathogenicity and functional impact of non-frameshifting insertion/deletion variation in the human genome. PLoS Comput Biol 2019; 15:e1007112. [PMID: 31199787 PMCID: PMC6594643 DOI: 10.1371/journal.pcbi.1007112] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Received: 11/16/2018] [Revised: 06/26/2019] [Accepted: 05/17/2019] [Indexed: 11/19/2022]
Abstract
Differentiation between phenotypically neutral and disease-causing genetic variation remains an open and relevant problem. Among different types of variation, non-frameshifting insertions and deletions (indels) represent an understudied group with widespread phenotypic consequences. To address this challenge, we present a machine learning method, MutPred-Indel, that predicts pathogenicity and identifies types of functional residues impacted by non-frameshifting insertion/deletion variation. The model shows good predictive performance as well as the ability to identify impacted structural and functional residues including secondary structure, intrinsic disorder, metal and macromolecular binding, post-translational modifications, allosteric sites, and catalytic residues. We identify structural and functional mechanisms impacted preferentially by germline variation from the Human Gene Mutation Database, recurrent somatic variation from COSMIC in the context of different cancers, as well as de novo variants from families with autism spectrum disorder. Further, the distributions of pathogenicity prediction scores generated by MutPred-Indel are shown to differentiate highly recurrent from non-recurrent somatic variation. Collectively, we present a framework to facilitate the interrogation of both pathogenicity and the functional effects of non-frameshifting insertion/deletion variants. The MutPred-Indel webserver is available at http://mutpred.mutdb.org/.
Affiliation(s)
- Kymberleigh A. Pagel
  - School of Informatics, Computing, and Engineering, Indiana University, Bloomington, Indiana, United States of America
- Danny Antaki
  - Department of Psychiatry, University of California San Diego, La Jolla, California, United States of America
- AoJie Lian
  - Department of Psychiatry, University of California San Diego, La Jolla, California, United States of America
  - Center for Medical Genetics, School of Life Sciences, Central South University, Changsha, China
- Matthew Mort
  - Institute of Medical Genetics, Cardiff University, Cardiff, United Kingdom
- David N. Cooper
  - Institute of Medical Genetics, Cardiff University, Cardiff, United Kingdom
- Jonathan Sebat
  - Department of Psychiatry, University of California San Diego, La Jolla, California, United States of America
- Lilia M. Iakoucheva
  - Department of Psychiatry, University of California San Diego, La Jolla, California, United States of America
- Sean D. Mooney
  - Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, Washington, United States of America
- Predrag Radivojac
  - School of Informatics, Computing, and Engineering, Indiana University, Bloomington, Indiana, United States of America
  - Khoury College of Computer Sciences, Northeastern University, Boston, Massachusetts, United States of America
9. Dobson L, Mészáros B, Tusnády GE. Structural Principles Governing Disease-Causing Germline Mutations. J Mol Biol 2018; 430:4955-4970. [DOI: 10.1016/j.jmb.2018.10.005] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 09/18/2018] [Accepted: 10/11/2018] [Indexed: 01/03/2023]
10. Pejaver V, Mooney SD, Radivojac P. Missense variant pathogenicity predictors generalize well across a range of function-specific prediction challenges. Hum Mutat 2017; 38:1092-1108. [PMID: 28508593 PMCID: PMC5561458 DOI: 10.1002/humu.23258] [Citation(s) in RCA: 36] [Impact Index Per Article: 5.1] [Received: 10/13/2016] [Revised: 03/16/2017] [Accepted: 03/26/2017] [Indexed: 11/08/2022]
Abstract
The steady advances in machine learning and accumulation of biomedical data have contributed to the development of numerous computational models that assess the impact of missense variants. Different methods, however, operationalize impact differently. Two common tasks in this context are the prediction of the pathogenicity of variants and the prediction of their effects on a protein's function. These are related but distinct problems, and it is unclear whether methods developed for one are optimized for the other. The Critical Assessment of Genome Interpretation (CAGI) experiment provides a means to address this question empirically. To this end, we participated in various protein-specific challenges in CAGI with two objectives in mind. First, to compare the performance of methods in the MutPred family with the state-of-the-art. Second and more importantly, to investigate the applicability of general-purpose pathogenicity predictors to the classification of specific function-altering variants without additional training or calibration. We find that our pathogenicity predictors performed competitively with other methods, outputting score distributions in agreement with experimental outcomes. Overall, we conclude that binary classifiers learned from disease-causing mutations are capable of modeling important aspects of the underlying biology and the alteration of protein function resulting from mutations.
Affiliation(s)
- Vikas Pejaver
  - Department of Computer Science and Informatics, Indiana University, Bloomington, Indiana 47405
- Sean D. Mooney
  - Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, Washington 98109
- Predrag Radivojac
  - Department of Computer Science and Informatics, Indiana University, Bloomington, Indiana 47405
11. Ittisoponpisan S, Alhuzimi E, Sternberg MJE, David A. Landscape of Pleiotropic Proteins Causing Human Disease: Structural and System Biology Insights. Hum Mutat 2017; 38:289-296. [PMID: 27957775 DOI: 10.1002/humu.23155] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Received: 06/09/2016] [Accepted: 12/03/2016] [Indexed: 12/13/2022]
Abstract
Pleiotropy is the phenomenon by which the same gene can result in multiple phenotypes. Pleiotropic proteins are emerging as important contributors to rare and common disorders. Nevertheless, little is known about the mechanisms underlying pleiotropy and the characteristics of pleiotropic proteins. We analyzed disease-causing proteins reported in UniProt and observed that 12% are pleiotropic (variants in the same protein cause more than one disease). Pleiotropic proteins were enriched in deleterious and rare variants, but not in common variants. Pleiotropic proteins were more likely to be involved in the pathogenesis of neoplasms, neurological and circulatory diseases, and congenital malformations, whereas non-pleiotropic proteins were more likely to be involved in endocrine and metabolic disorders. Pleiotropic proteins were more essential and had a higher number of interacting partners compared with non-pleiotropic proteins. Significantly more pleiotropic than non-pleiotropic proteins contained at least one long intrinsically disordered region (P < 0.001). Deleterious variants occurring in structurally disordered regions were more commonly found in pleiotropic than in non-pleiotropic proteins. In conclusion, pleiotropic proteins are an important contributor to human disease. They represent a biologically different class of proteins compared with non-pleiotropic proteins, and a better understanding of their characteristics and genetic variants can greatly aid in the interpretation of genetic studies and drug design.
Affiliation(s)
- Sirawit Ittisoponpisan
  - Structural Bioinformatics Group, Department of Life Sciences, Imperial College London, London, UK
- Eman Alhuzimi
  - Structural Bioinformatics Group, Department of Life Sciences, Imperial College London, London, UK
- Michael J E Sternberg
  - Structural Bioinformatics Group, Department of Life Sciences, Imperial College London, London, UK
- Alessia David
  - Structural Bioinformatics Group, Department of Life Sciences, Imperial College London, London, UK
12. Rost B, Radivojac P, Bromberg Y. Protein function in precision medicine: deep understanding with machine learning. FEBS Lett 2016; 590:2327-41. [PMID: 27423136 PMCID: PMC5937700 DOI: 10.1002/1873-3468.12307] [Citation(s) in RCA: 35] [Impact Index Per Article: 4.4] [Received: 06/27/2016] [Revised: 07/12/2016] [Accepted: 07/12/2016] [Indexed: 12/21/2022]
Abstract
Precision medicine and personalized health efforts propose leveraging complex molecular, medical and family history, along with other types of personal data toward better life. We argue that this ambitious objective will require advanced and specialized machine learning solutions. Simply skimming some low-hanging results off the data wealth might have limited potential. Instead, we need to better understand all parts of the system to define medically relevant causes and effects: how do particular sequence variants affect particular proteins and pathways? How do these effects, in turn, cause the health or disease-related phenotype? Toward this end, deeper understanding will not simply diffuse from deeper machine learning, but from more explicit focus on understanding protein function, context-specific protein interaction networks, and impact of variation on both.
Affiliation(s)
- Burkhard Rost
- Department of Informatics and Bioinformatics, Institute for Advanced Studies, Technical University of Munich, Garching, Germany
- Predrag Radivojac
- School of Informatics and Computing, Indiana University, Bloomington, IN, USA
- Yana Bromberg
- Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, NJ, USA
|
13
|
Generalized enrichment analysis improves the detection of adverse drug events from the biomedical literature. BMC Bioinformatics 2016; 17:250. [PMID: 27333889 PMCID: PMC4918084 DOI: 10.1186/s12859-016-1080-z] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 10/30/2015] [Accepted: 05/11/2016] [Indexed: 01/12/2023] Open
Abstract
Background: Identification of associations between marketed drugs and adverse events from the biomedical literature assists drug safety monitoring efforts. Assessing the significance of such literature-derived associations and determining the granularity at which they should be captured remains a challenge. Here, we assess how defining a selection of adverse event terms from MeSH, based on information content, can improve the detection of adverse events for drugs and drug classes. Results: We analyze a set of 105,354 candidate drug-adverse event pairs extracted from article indexes in MEDLINE. First, we harmonize extracted adverse event terms by aggregating them into higher-level MeSH terms based on the terms’ information content. Then, we determine statistical enrichment of adverse events associated with drug and drug classes using a conditional hypergeometric test that adjusts for dependencies among associated terms. We compare our results with methods based on disproportionality analysis (proportional reporting ratio, PRR) and quantify the improvement in signal detection with our generalized enrichment analysis (GEA) approach using a gold standard of drug-adverse event associations spanning 174 drugs and four events. For single drugs, the best GEA method (Precision: .92/Recall: .71/F1-measure: .80) outperforms the best PRR based method (.69/.69/.69) on all four adverse event outcomes in our gold standard. For drug classes, our GEA performs similarly (.85/.69/.74) when increasing the level of abstraction for adverse event terms. Finally, on examining the 1609 individual drugs in our MEDLINE set, which map to chemical substances in ATC, we find signals for 1379 drugs (10,122 unique adverse event associations) on applying GEA with p < 0.005.
Conclusions: We present an approach based on generalized enrichment analysis that can be used to detect associations between drugs, drug classes and adverse events at a given level of granularity, at the same time correcting for known dependencies among events. Our study demonstrates the use of GEA, and the importance of choosing appropriate abstraction levels to complement current drug safety methods. We provide an R package for exploration of alternative abstraction levels of adverse event terms based on information content. Electronic supplementary material: The online version of this article (doi:10.1186/s12859-016-1080-z) contains supplementary material, which is available to authorized users.
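The two kinds of statistic compared in this abstract can be illustrated with a minimal Python sketch. It is not the authors' R package, and it deliberately simplifies: it uses a plain (unconditional) hypergeometric upper-tail test, not the conditional test that adjusts for dependencies among terms. All function names and counts below are hypothetical and chosen only for illustration.

```python
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric.

    N: total number of articles (population size)
    K: articles annotated with the adverse event
    n: articles annotated with the drug (sample size)
    k: articles annotated with both
    Assumption: plain test, no adjustment for term dependencies.
    """
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / total


def proportional_reporting_ratio(a, b, c, d):
    """PRR from a 2x2 table.

    a: drug and event        b: drug, other events
    c: other drugs and event d: other drugs, other events
    """
    return (a / (a + b)) / (c / (c + d))


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical counts, not taken from the paper.
    print(hypergeom_upper_tail(k=3, N=20, K=5, n=4))
    print(proportional_reporting_ratio(a=10, b=90, c=10, d=190))
    # Precision and recall reported in the abstract for the best GEA method.
    print(round(f1_score(0.92, 0.71), 2))
```

With the precision and recall quoted above for the best single-drug GEA method (.92 and .71), `f1_score` rounds to 0.8, consistent with the reported F1-measure of .80.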
|
14
|
Shameer K, Tripathi LP, Kalari KR, Dudley JT, Sowdhamini R. Interpreting functional effects of coding variants: challenges in proteome-scale prediction, annotation and assessment. Brief Bioinform 2015; 17:841-62. [PMID: 26494363 DOI: 10.1093/bib/bbv084] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Received: 05/14/2015] [Indexed: 12/20/2022] Open
Abstract
Accurate assessment of genetic variation in human DNA sequencing studies remains a nontrivial challenge in clinical genomics and genome informatics. Ascribing functional roles and/or clinical significances to single nucleotide variants identified from a next-generation sequencing study is an important step in genome interpretation. Experimental characterization of all the observed functional variants is yet impractical; thus, the prediction of functional and/or regulatory impacts of the various mutations using in silico approaches is an important step toward the identification of functionally significant or clinically actionable variants. The relationships between genotypes and the expressed phenotypes are multilayered and biologically complex; such relationships present numerous challenges and at the same time offer various opportunities for the design of in silico variant assessment strategies. Over the past decade, many bioinformatics algorithms have been developed to predict functional consequences of single nucleotide variants in the protein coding regions. In this review, we provide an overview of the bioinformatics resources for the prediction, annotation and visualization of coding single nucleotide variants. We discuss the currently available approaches and major challenges from the perspective of protein sequence, structure, function and interactions that require consideration when interpreting the impact of putatively functional variants. We also discuss the relevance of incorporating integrated workflows for predicting the biomedical impact of the functionally important variations encoded in a genome, exome or transcriptome. Finally, we propose a framework to classify variant assessment approaches and strategies for incorporation of variant assessment within electronic health records.
|
15
|
David A, Sternberg MJE. The Contribution of Missense Mutations in Core and Rim Residues of Protein-Protein Interfaces to Human Disease. J Mol Biol 2015; 427:2886-98. [PMID: 26173036 PMCID: PMC4548493 DOI: 10.1016/j.jmb.2015.07.004] [Citation(s) in RCA: 86] [Impact Index Per Article: 9.6] [Received: 05/01/2015] [Revised: 06/19/2015] [Accepted: 07/06/2015] [Indexed: 01/21/2023]
Abstract
Missense mutations at protein–protein interaction sites, called interfaces, are important contributors to human disease. Interfaces are non-uniform surface areas characterized by two main regions, “core” and “rim”, which differ in terms of evolutionary conservation and physicochemical properties. Moreover, within interfaces, only a small subset of residues (“hot spots”) is crucial for the binding free energy of the protein–protein complex. We performed a large-scale structural analysis of human single amino acid variations (SAVs) and demonstrated that disease-causing mutations are preferentially located within the interface core, as opposed to the rim (p < 0.01). In contrast, the interface rim is significantly enriched in polymorphisms, similar to the remaining non-interacting surface. Energetic hot spots tend to be enriched in disease-causing mutations compared to non-hot spots (p = 0.05), regardless of their occurrence in core or rim residues. For individual amino acids, the frequency of substitution into a polymorphism or disease-causing mutation differed from that of other amino acids and was related to its structural location, as was the type of physicochemical change introduced by the SAV. In conclusion, this study demonstrated the different distribution and properties of disease-causing SAVs and polymorphisms within different structural regions and in relation to the energetic contribution of amino acids in protein–protein interfaces, thus highlighting the importance of a structural system biology approach for predicting the effect of SAVs. Protein–protein interactions are fundamental in all biological processes. The distribution of deleterious and non-SAVs within protein interfaces is unknown. The distribution of deleterious SAVs differs within different interface structural regions. The distribution of SAVs differs in relation to the energetic contribution of interface residues. Structural analysis of protein complexes enhances the understanding of deleterious SAVs.
Affiliation(s)
- Alessia David
- Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, SW7 2AZ London, United Kingdom.
- Michael J E Sternberg
- Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, SW7 2AZ London, United Kingdom.
|
16
|
Pereira L, Soares P, Triska P, Rito T, van der Waerden A, Li B, Radivojac P, Samuels DC. Global human frequencies of predicted nuclear pathogenic variants and the role played by protein hydrophobicity in pathogenicity potential. Sci Rep 2014; 4:7155. [PMID: 25412673 PMCID: PMC4239565 DOI: 10.1038/srep07155] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 08/11/2014] [Accepted: 11/05/2014] [Indexed: 11/30/2022] Open
Abstract
Mitochondrial proteins are coded by nuclear (nDNA) and mitochondrial (mtDNA) genes, implying a complex cross-talk between the two genomes. Here we investigated the diversity displayed in 104 nuclear-coded mitochondrial proteins from 1,092 individuals from the 1000 Genomes dataset, in order to evaluate if these genes are under the effects of purifying selection and how that selection compares with their mitochondrial encoded counterparts. Only the very rare variants (frequency < 0.1%) in these nDNA genes are indistinguishable from a random set from all possible variants in terms of predicted pathogenicity score, but more frequent variants display distinct signs of purifying selection. Comparisons of selection strength indicate stronger selection in the mtDNA genes compared to this set of nDNA genes, accounted for by the high hydrophobicity of the proteins coded by the mtDNA. Most of the predicted pathogenic variants in the nDNA genes were restricted to a single continental population. The proportion of individuals having at least one potential pathogenic mutation in this gene set was significantly lower in Europeans than in Africans and Asians. This difference may reflect demographic asymmetries, since African and Asian populations experienced main expansions in middle Holocene, while in Europeans the main expansions occurred earlier in the post-glacial period.
Affiliation(s)
- Luísa Pereira
- [1] Instituto de Patologia e Imunologia Molecular da Universidade do Porto (IPATIMUP), Porto 4200-465, Portugal [2] Faculdade de Medicina da Universidade do Porto, Porto 4200-319, Portugal
- Pedro Soares
- Instituto de Patologia e Imunologia Molecular da Universidade do Porto (IPATIMUP), Porto 4200-465, Portugal
- Petr Triska
- [1] Instituto de Patologia e Imunologia Molecular da Universidade do Porto (IPATIMUP), Porto 4200-465, Portugal [2] Instituto de Ciências Biomédicas da Universidade do Porto (ICBAS), Porto 4050-313, Portugal
- Teresa Rito
- Instituto de Patologia e Imunologia Molecular da Universidade do Porto (IPATIMUP), Porto 4200-465, Portugal
- Agnes van der Waerden
- Instituto de Patologia e Imunologia Molecular da Universidade do Porto (IPATIMUP), Porto 4200-465, Portugal
- Biao Li
- School of Informatics and Computing, Indiana University, Bloomington, IN 47405, USA
- Predrag Radivojac
- School of Informatics and Computing, Indiana University, Bloomington, IN 47405, USA
- David C Samuels
- Center for Human Genetics Research, Department of Molecular Physiology and Biophysics, Vanderbilt University Medical Center, Nashville, TN 37232-0700, USA
|
17
|
Pan Y, Karagiannis K, Zhang H, Dingerdissen H, Shamsaddini A, Wan Q, Simonyan V, Mazumder R. Human germline and pan-cancer variomes and their distinct functional profiles. Nucleic Acids Res 2014; 42:11570-88. [PMID: 25232094 PMCID: PMC4191387 DOI: 10.1093/nar/gku772] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Indexed: 12/27/2022] Open
Abstract
Identification of non-synonymous single nucleotide variations (nsSNVs) has exponentially increased due to advances in Next-Generation Sequencing technologies. The functional impacts of these variations have been difficult to ascertain because the corresponding knowledge about sequence functional sites is quite fragmented. It is clear that mapping of variations to sequence functional features can help us better understand the pathophysiological role of variations. In this study, we investigated the effect of nsSNVs on more than 17 common types of post-translational modification (PTM) sites, active sites and binding sites. Out of 1 705 285 distinct nsSNVs on 259 216 functional sites we identified 38 549 variations that significantly affect 10 major functional sites. Furthermore, we found distinct patterns of site disruptions due to germline and somatic nsSNVs. Pan-cancer analysis across 12 different cancer types led to the identification of 51 genes with 106 nsSNV affected functional sites found in 3 or more cancer types. 13 of the 51 genes overlap with previously identified Significantly Mutated Genes (Nature. 2013 Oct 17;502(7471)). 62 mutations in these 13 genes affecting functional sites such as DNA, ATP binding and various PTM sites occur across several cancers and can be prioritized for additional validation and investigations.
Affiliation(s)
- Yang Pan
- The Department of Biochemistry & Molecular Medicine, George Washington University Medical Center, Washington, DC 20037, USA
- Konstantinos Karagiannis
- The Department of Biochemistry & Molecular Medicine, George Washington University Medical Center, Washington, DC 20037, USA
- Haichen Zhang
- The Department of Biochemistry & Molecular Medicine, George Washington University Medical Center, Washington, DC 20037, USA
- Hayley Dingerdissen
- The Department of Biochemistry & Molecular Medicine, George Washington University Medical Center, Washington, DC 20037, USA
- Amirhossein Shamsaddini
- The Department of Biochemistry & Molecular Medicine, George Washington University Medical Center, Washington, DC 20037, USA
- Quan Wan
- The Department of Biochemistry & Molecular Medicine, George Washington University Medical Center, Washington, DC 20037, USA
- Vahan Simonyan
- Center for Biologics Evaluation and Research, US Food and Drug Administration, 10903 New Hampshire Avenue, Silver Spring, MD 20993, USA
- Raja Mazumder
- The Department of Biochemistry & Molecular Medicine, George Washington University Medical Center, Washington, DC 20037, USA; McCormick Genomic and Proteomic Center, George Washington University, Washington, DC 20037, USA
|
18
|
Finlayson SG, LePendu P, Shah NH. Building the graph of medicine from millions of clinical narratives. Sci Data 2014; 1:140032. [PMID: 25977789 PMCID: PMC4322575 DOI: 10.1038/sdata.2014.32] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.8] [Received: 06/16/2014] [Accepted: 08/18/2014] [Indexed: 01/08/2023] Open
Abstract
Electronic health records (EHR) represent a rich and relatively untapped resource for characterizing the true nature of clinical practice and for quantifying the degree of inter-relatedness of medical entities such as drugs, diseases, procedures and devices. We provide a unique set of co-occurrence matrices, quantifying the pairwise mentions of 3 million terms mapped onto 1 million clinical concepts, calculated from the raw text of 20 million clinical notes spanning 19 years of data. Co-frequencies were computed by means of a parallelized annotation, hashing, and counting pipeline that was applied over clinical notes from Stanford Hospitals and Clinics. The co-occurrence matrix quantifies the relatedness among medical concepts which can serve as the basis for many statistical tests, and can be used to directly compute Bayesian conditional probabilities, association rules, as well as a range of test statistics such as relative risks and odds ratios. This dataset can be leveraged to quantitatively assess comorbidity, drug-drug, and drug-disease patterns for a range of clinical, epidemiological, and financial applications.
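As a rough illustration of how co-occurrence counts of this kind can yield conditional probabilities, relative risks and odds ratios, here is a minimal Python sketch. The function name, argument names and counts are hypothetical; it assumes each count is a number of notes (or patients) mentioning a concept, and it does not reproduce the published matrices or pipeline.

```python
def association_stats(n_ab, n_a, n_b, n_total):
    """Derive simple association measures from co-occurrence counts.

    n_ab:    units mentioning both concept A and concept B
    n_a:     units mentioning A
    n_b:     units mentioning B
    n_total: all units
    Assumes every cell of the resulting 2x2 table is non-zero.
    """
    a = n_ab                          # A and B
    b = n_a - n_ab                    # A without B
    c = n_b - n_ab                    # B without A
    d = n_total - n_a - n_b + n_ab    # neither A nor B
    return {
        "p_b_given_a": a / (a + b),
        "relative_risk": (a / (a + b)) / (c / (c + d)),
        "odds_ratio": (a * d) / (b * c),
    }


if __name__ == "__main__":
    # Hypothetical counts for a drug (A) and a disease (B).
    for name, value in association_stats(30, 100, 80, 1000).items():
        print(f"{name}: {value:.3f}")
```

In practice a zero cell would need a continuity correction or an exact test; that choice is left open here.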
Affiliation(s)
- Samuel G. Finlayson
- Center for Biomedical Informatics Research, Stanford University, Stanford, California 94305, USA
- Paea LePendu
- Center for Biomedical Informatics Research, Stanford University, Stanford, California 94305, USA
- Nigam H. Shah
- Center for Biomedical Informatics Research, Stanford University, Stanford, California 94305, USA
|
19
|
Shihab HA, Gough J, Mort M, Cooper DN, Day INM, Gaunt TR. Ranking non-synonymous single nucleotide polymorphisms based on disease concepts. Hum Genomics 2014; 8:11. [PMID: 24980617 PMCID: PMC4083756 DOI: 10.1186/1479-7364-8-11] [Citation(s) in RCA: 130] [Impact Index Per Article: 13.0] [Received: 02/19/2014] [Accepted: 06/21/2014] [Indexed: 11/10/2022] Open
Abstract
As the number of non-synonymous single nucleotide polymorphisms (nsSNPs) identified through whole-exome/whole-genome sequencing programs increases, researchers and clinicians are becoming increasingly reliant upon computational prediction algorithms designed to prioritize potential functional variants for further study. A large proportion of existing prediction algorithms are 'disease agnostic' but are nevertheless quite capable of predicting when a mutation is likely to be deleterious. However, most clinical and research applications of these algorithms relate to specific diseases and would therefore benefit from an approach that discriminates between functional variants specifically related to that disease from those which are not. In a whole-exome/whole-genome sequencing context, such an approach could substantially reduce the number of false positive candidate mutations. Here, we test this postulate by incorporating a disease-specific weighting scheme into the Functional Analysis through Hidden Markov Models (FATHMM) algorithm. When compared to traditional prediction algorithms, we observed an overall reduction in the number of false positives identified using a disease-specific approach to functional prediction across 17 distinct disease concepts/categories. Our results illustrate the potential benefits of making disease-specific predictions when prioritizing candidate variants in relation to specific diseases. A web-based implementation of our algorithm is available at http://fathmm.biocompute.org.uk.
Affiliation(s)
- Tom R Gaunt
- Bristol Centre for Systems Biomedicine and MRC Integrative Epidemiology Unit, School of Social and Community Medicine, University of Bristol, Oakfield House, Oakfield Grove, Bristol BS8 2BN, UK.
|
20
|
Vihinen M. Majority vote and other problems when using computational tools. Hum Mutat 2014; 35:912-4. [PMID: 24915749 DOI: 10.1002/humu.22600] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 03/05/2014] [Accepted: 05/28/2014] [Indexed: 11/06/2022]
Abstract
Computational tools are essential for most of our research. To use these tools, one needs to know how they work. Problems in application of computational methods to variation analysis can appear at several stages and affect, for example, the interpretation of results. Such cases are discussed along with suggestions on how to avoid them. The applications include incomplete reporting of methods, especially about the use of prediction tools; method selection on unscientific grounds and without consulting independent method performance assessments; extending application area of methods outside their intended purpose; use of the same data several times for obtaining majority vote; and filtering of datasets so that variants of interest are excluded. All these issues can be avoided by discontinuing the use of software tools as black boxes.
Affiliation(s)
- Mauno Vihinen
- Department of Experimental Medical Science, BMC D10, Lund University, Lund, Sweden
|
21
|
Ali H, Urolagin S, Gurarslan Ö, Vihinen M. Performance of Protein Disorder Prediction Programs on Amino Acid Substitutions. Hum Mutat 2014; 35:794-804. [DOI: 10.1002/humu.22564] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Received: 12/20/2013] [Accepted: 04/04/2014] [Indexed: 01/04/2023]
Affiliation(s)
- Heidi Ali
- Institute of Biomedical Technology; FI-33014 University of Tampere; Tampere Finland
- BioMediTech; Tampere Finland
- Siddhaling Urolagin
- Department of Experimental Medical Science; Lund University; SE-22184 Lund Sweden
- Ömer Gurarslan
- Institute of Biomedical Technology; FI-33014 University of Tampere; Tampere Finland
- BioMediTech; Tampere Finland
- Mauno Vihinen
- Institute of Biomedical Technology; FI-33014 University of Tampere; Tampere Finland
- BioMediTech; Tampere Finland
- Department of Experimental Medical Science; Lund University; SE-22184 Lund Sweden
- Tampere University Hospital; Tampere Finland
|
22
|
Peterson TA, Doughty E, Kann MG. Towards precision medicine: advances in computational approaches for the analysis of human variants. J Mol Biol 2013; 425:4047-63. [PMID: 23962656 PMCID: PMC3807015 DOI: 10.1016/j.jmb.2013.08.008] [Citation(s) in RCA: 93] [Impact Index Per Article: 8.5] [Received: 06/04/2013] [Revised: 08/07/2013] [Accepted: 08/08/2013] [Indexed: 12/26/2022]
Abstract
Variations and similarities in our individual genomes are part of our history, our heritage, and our identity. Some human genomic variants are associated with common traits such as hair and eye color, while others are associated with susceptibility to disease or response to drug treatment. Identifying the human variations producing clinically relevant phenotypic changes is critical for providing accurate and personalized diagnosis, prognosis, and treatment for diseases. Furthermore, a better understanding of the molecular underpinning of disease can lead to development of new drug targets for precision medicine. Several resources have been designed for collecting and storing human genomic variations in highly structured, easily accessible databases. Unfortunately, a vast amount of information about these genetic variants and their functional and phenotypic associations is currently buried in the literature, only accessible by manual curation or sophisticated text-mining technology to extract the relevant information. In addition, the low cost of sequencing technologies coupled with increasing computational power has enabled the development of numerous computational methodologies to predict the pathogenicity of human variants. This review provides a detailed comparison of current human variant resources, including HGMD, OMIM, ClinVar, and UniProt/Swiss-Prot, followed by an overview of the computational methods and techniques used to leverage the available data to predict novel deleterious variants. We expect these resources and tools to become the foundation for understanding the molecular details of genomic variants leading to disease, which in turn will enable the promise of precision medicine.
Affiliation(s)
- Thomas A Peterson
- Department of Biological Sciences, University of Maryland, Baltimore County, 1000 Hilltop Circle, Baltimore, MD 21250, USA
- Emily Doughty
- Biomedical Informatics Program, Stanford University, Stanford, CA 94305, USA
- Maricel G Kann
- Department of Biological Sciences, University of Maryland, Baltimore County, 1000 Hilltop Circle, Baltimore, MD 21250, USA
|
23
|
Patnala R, Clements J, Batra J. Candidate gene association studies: a comprehensive guide to useful in silico tools. BMC Genet 2013; 14:39. [PMID: 23656885 PMCID: PMC3655892 DOI: 10.1186/1471-2156-14-39] [Citation(s) in RCA: 80] [Impact Index Per Article: 7.3] [Received: 11/07/2012] [Accepted: 04/15/2013] [Indexed: 01/01/2023] Open
Abstract
The candidate gene approach has been a pioneer in the field of genetic epidemiology, identifying risk alleles and their association with clinical traits. With the advent of rapidly changing technology, there has been an explosion of in silico tools available to researchers, giving them fast, efficient resources and reliable strategies important to find causal gene variants for candidate or genome wide association studies (GWAS). In this review, following a description of candidate gene prioritisation, we summarise the approaches to single nucleotide polymorphism (SNP) prioritisation and discuss the tools available to assess functional relevance of the risk variant with consideration to its genomic location. The strategy and the tools discussed are applicable to any study investigating genetic risk factors associated with a particular disease. Some of the tools are also applicable for the functional validation of variants relevant to the era of GWAS and next generation sequencing (NGS).
Affiliation(s)
- Radhika Patnala
- Australian Prostate Cancer Research Centre - Queensland, Institute of Health and Biomedical Innovation, Queensland University of Technology, Brisbane, QLD 4059, Australia
|
24
|
Shihab HA, Gough J, Cooper DN, Day INM, Gaunt TR. Predicting the functional consequences of cancer-associated amino acid substitutions. Bioinformatics 2013; 29:1504-10. [PMID: 23620363 PMCID: PMC3673218 DOI: 10.1093/bioinformatics/btt182] [Citation(s) in RCA: 171] [Impact Index Per Article: 15.5] [Indexed: 11/25/2022]
Abstract
Motivation: The number of missense mutations being identified in cancer genomes has greatly increased as a consequence of technological advances and the reduced cost of whole-genome/whole-exome sequencing methods. However, a high proportion of the amino acid substitutions detected in cancer genomes have little or no effect on tumour progression (passenger mutations). Therefore, accurate automated methods capable of discriminating between driver (cancer-promoting) and passenger mutations are becoming increasingly important. In our previous work, we developed the Functional Analysis through Hidden Markov Models (FATHMM) software and, using a model weighted for inherited disease mutations, observed improved performances over alternative computational prediction algorithms. Here, we describe an adaptation of our original algorithm that incorporates a cancer-specific model to potentiate the functional analysis of driver mutations. Results: The performance of our algorithm was evaluated using two separate benchmarks. In our analysis, we observed improved performances when distinguishing between driver mutations and other germ line variants (both disease-causing and putatively neutral mutations). In addition, when discriminating between somatic driver and passenger mutations, we observed performances comparable with the leading computational prediction algorithms: SPF-Cancer and TransFIC. Availability and implementation: A web-based implementation of our cancer-specific model, including a downloadable stand-alone package, is available at http://fathmm.biocompute.org.uk. Contact: fathmm@biocompute.org.uk. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Hashem A Shihab
- Bristol Centre for Systems Biomedicine and MRC CAiTE Centre, School of Social and Community Medicine, University of Bristol, Bristol BS8 2BN, UK
|
25
|
Zhao H, Yang Y, Lin H, Zhang X, Mort M, Cooper DN, Liu Y, Zhou Y. DDIG-in: discriminating between disease-associated and neutral non-frameshifting micro-indels. Genome Biol 2013; 14:R23. [PMID: 23497682 PMCID: PMC4053752 DOI: 10.1186/gb-2013-14-3-r23] [Citation(s) in RCA: 54] [Impact Index Per Article: 4.9] [Received: 07/11/2012] [Accepted: 03/13/2013] [Indexed: 02/07/2023] Open
Abstract
Micro-indels (insertions or deletions shorter than 21 bps) constitute the second most frequent class of human gene mutation after single nucleotide variants. Despite the relative abundance of non-frameshifting indels, their damaging effect on protein structure and function has gone largely unstudied. We have developed a support vector machine-based method named DDIG-in (Detecting disease-causing genetic variations due to indels) to prioritize non-frameshifting indels by comparing disease-associated mutations with putatively neutral mutations from the 1,000 Genomes Project. The final model gives good discrimination for indels and is robust against annotation errors. A webserver implementing DDIG-in is available at http://sparks-lab.org/ddig.
|
26
|
Abstract
Advanced statistical methods used to analyze high-throughput data such as gene-expression assays result in long lists of “significant genes.” One way to gain insight into the significance of altered expression levels is to determine whether Gene Ontology (GO) terms associated with a particular biological process, molecular function, or cellular component are over- or under-represented in the set of genes deemed significant. This process, referred to as enrichment analysis, profiles a gene-set, and is widely used to make sense of the results of high-throughput experiments. The canonical example of enrichment analysis is when the output dataset is a list of genes differentially expressed in some condition. To determine the biological relevance of a lengthy gene list, the usual solution is to perform enrichment analysis with the GO. We can aggregate the annotating GO concepts for each gene in this list, and arrive at a profile of the biological processes or mechanisms affected by the condition under study. While GO has been the principal target for enrichment analysis, the methods of enrichment analysis are generalizable. We can conduct the same sort of profiling along other ontologies of interest. Just as scientists can ask “Which biological process is over-represented in my set of interesting genes or proteins?” we can also ask “Which disease (or class of diseases) is over-represented in my set of interesting genes or proteins?” For example, by annotating known protein mutations with disease terms from the ontologies in BioPortal, Mort et al. recently identified a class of diseases, blood coagulation disorders, that were associated with a 14-fold depletion in substitutions at O-linked glycosylation sites. With the availability of tools for automatic annotation of datasets with terms from disease ontologies, there is no reason to restrict enrichment analyses to the GO.
In this chapter, we will discuss methods to perform enrichment analysis using any ontology available in the biomedical domain. We will review the general methodology of enrichment analysis, the associated challenges, and discuss the novel translational analyses enabled by the existence of public, national computational infrastructure and by the use of disease ontologies in such analyses.
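The profiling step described above (aggregate the ontology terms annotating each gene, then ask which terms are over- or under-represented in the gene set) might be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the chapter's method: the annotation dictionary, gene identifiers and term labels are hypothetical, no ontology hierarchy is propagated, and no multiple-testing correction is applied.

```python
from collections import Counter
from math import comb


def term_enrichment(annotations, study_genes):
    """Per-term fold enrichment and hypergeometric tail probabilities.

    annotations: dict mapping gene -> set of ontology terms
                 (its keys define the background set)
    study_genes: iterable of genes of interest (must overlap the background)
    Returns {term: (k, K, fold, p_over, p_under)} where k and K are the
    numbers of study and background genes annotated with the term.
    """
    background = set(annotations)
    study = set(study_genes) & background
    N, n = len(background), len(study)
    bg_counts = Counter(t for g in background for t in annotations[g])
    st_counts = Counter(t for g in study for t in annotations[g])
    total = comb(N, n)
    results = {}
    for term, K in bg_counts.items():
        k = st_counts.get(term, 0)
        fold = (k / n) / (K / N)
        # P(X >= k): evidence for over-representation.
        p_over = sum(comb(K, i) * comb(N - K, n - i)
                     for i in range(k, min(K, n) + 1)) / total
        # P(X <= k): evidence for under-representation (depletion).
        p_under = sum(comb(K, i) * comb(N - K, n - i)
                      for i in range(0, k + 1)) / total
        results[term] = (k, K, fold, p_over, p_under)
    return results


if __name__ == "__main__":
    # Hypothetical annotations with made-up term labels.
    annotations = {
        "g1": {"T1"}, "g2": {"T1"}, "g3": {"T1", "T2"},
        "g4": {"T2"}, "g5": {"T2"}, "g6": {"T2"},
    }
    for term, row in sorted(term_enrichment(annotations, ["g1", "g2"]).items()):
        print(term, row)
```

Because the function only needs a gene-to-term mapping, the same code applies whether the terms come from GO or from a disease ontology, which is the generalization the chapter argues for.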
Affiliation(s)
- Nigam H Shah
- Center for Biomedical Informatics Research, Stanford University, Stanford, California, United States of America.
|
27
|
Shihab HA, Gough J, Cooper DN, Stenson PD, Barker GLA, Edwards KJ, Day INM, Gaunt TR. Predicting the functional, molecular, and phenotypic consequences of amino acid substitutions using hidden Markov models. Hum Mutat 2012; 34:57-65. [PMID: 23033316 PMCID: PMC3558800 DOI: 10.1002/humu.22225] [Citation(s) in RCA: 887] [Impact Index Per Article: 73.9] [Received: 04/26/2012] [Accepted: 09/02/2012] [Indexed: 01/30/2023]
Abstract
The rate at which nonsynonymous single nucleotide polymorphisms (nsSNPs) are being identified in the human genome is increasing dramatically owing to advances in whole-genome/whole-exome sequencing technologies. Automated methods capable of accurately and reliably distinguishing between pathogenic and functionally neutral nsSNPs are therefore assuming ever-increasing importance. Here, we describe the Functional Analysis Through Hidden Markov Models (FATHMM) software and server: a species-independent method with optional species-specific weightings for the prediction of the functional effects of protein missense variants. Using a model weighted for human mutations, we obtained performance accuracies that outperformed traditional prediction methods (i.e., SIFT, PolyPhen, and PANTHER) on two separate benchmarks. Furthermore, in one benchmark, we achieve performance accuracies that outperform current state-of-the-art prediction methods (i.e., SNPs&GO and MutPred). We demonstrate that FATHMM can be efficiently applied to high-throughput/large-scale human and nonhuman genome sequencing projects with the added benefit of phenotypic outcome associations. To illustrate this, we evaluated nsSNPs in wheat (Triticum spp.) to identify some of the important genetic variants responsible for the phenotypic differences introduced by intense selection during domestication. A Web-based implementation of FATHMM, including a high-throughput batch facility and a downloadable standalone package, is available at http://fathmm.biocompute.org.uk.
Affiliation(s)
- Hashem A Shihab
- Bristol Centre for Systems Biomedicine and MRC CAiTE Centre, School of Social and Community Medicine, University of Bristol, Bristol, United Kingdom
|
28
|
Nair PS, Vihinen M. VariBench: A Benchmark Database for Variations. Hum Mutat 2012; 34:42-9. [DOI: 10.1002/humu.22204] [Citation(s) in RCA: 106] [Impact Index Per Article: 8.8] [Received: 01/20/2012] [Accepted: 07/31/2012] [Indexed: 12/21/2022]
|
29
|
Disease-associated mutations disrupt functionally important regions of intrinsic protein disorder. PLoS Comput Biol 2012; 8:e1002709. [PMID: 23055912 PMCID: PMC3464192 DOI: 10.1371/journal.pcbi.1002709] [Citation(s) in RCA: 110] [Impact Index Per Article: 9.2] [Received: 01/31/2012] [Accepted: 08/14/2012] [Indexed: 01/01/2023] Open
Abstract
The effects of disease mutations on protein structure and function have been extensively investigated, and many predictors of the functional impact of single amino acid substitutions are publicly available. The majority of these predictors are based on protein structure and evolutionary conservation, following the assumption that disease mutations predominantly affect folded and conserved protein regions. However, the prevalence of the intrinsically disordered proteins (IDPs) and regions (IDRs) in the human proteome together with their lack of fixed structure and low sequence conservation raise a question about the impact of disease mutations in IDRs. Here, we investigate annotated missense disease mutations and show that 21.7% of them are located within such intrinsically disordered regions. We further demonstrate that 20% of disease mutations in IDRs cause local disorder-to-order transitions, which represents a 1.7–2.7 fold increase compared to annotated polymorphisms and neutral evolutionary substitutions, respectively. Secondary structure predictions show elevated rates of transition from helices and strands into loops and vice versa in the disease mutations dataset. Disease disorder-to-order mutations also influence predicted molecular recognition features (MoRFs) more often than the control mutations. The repertoire of disorder-to-order transition mutations is limited, with five most frequent mutations (R→W, R→C, E→K, R→H, R→Q) collectively accounting for 44% of all deleterious disorder-to-order transitions. As a proof of concept, we performed accelerated molecular dynamics simulations on a deleterious disorder-to-order transition mutation of tumor protein p63 and, in agreement with our predictions, observed an increased α-helical propensity of the region harboring the mutation. Our findings highlight the importance of mutations in IDRs and refine the traditional structure-centric view of disease mutations. 
The results of this study offer a new perspective on the role of mutations in disease, with implications for improving predictors of the functional impact of missense mutations. Intrinsically unstructured or disordered proteins have been implicated in the etiology of a wide spectrum of diseases. However, the molecular mechanisms that relate mutations in intrinsically disordered regions (IDRs) to disease pathogenesis have not been investigated. Disordered proteins do not conform to the prevailing view of deleterious mutations which equates function, structure and evolutionary conservation – intrinsically disordered regions are functional, but lack a fixed three-dimensional structure and in general have low sequence conservation. Here we demonstrate that >20% of disease-associated missense mutations affect IDRs and interfere with their functions. We further show that 20% of deleterious mutations in IDRs induce predicted disorder-to-order transitions. Our predictions are supported by accelerated molecular dynamics simulations that show an increase in helical propensity of the region harboring a disease disorder-to-order transition mutation of tumor protein p63. Our results refine the traditional structure-centric view of disease mutations and offer a new perspective on the role of non-synonymous mutations in disease. Our findings have broad implications for improving predictors of the functional impact of missense mutations, and for interpretation of novel variants identified in large genome sequencing projects that aim to provide a better understanding of human genetic variation and its relevance to common diseases.
|
30
|
Giudicessi JR, Kapplinger JD, Tester DJ, Alders M, Salisbury BA, Wilde AAM, Ackerman MJ. Phylogenetic and physicochemical analyses enhance the classification of rare nonsynonymous single nucleotide variants in type 1 and 2 long-QT syndrome. Circ Cardiovasc Genet 2012; 5:519-28. [PMID: 22949429 DOI: 10.1161/circgenetics.112.963785] [Citation(s) in RCA: 56] [Impact Index Per Article: 4.7] [Indexed: 12/31/2022]
Abstract
BACKGROUND: Hundreds of nonsynonymous single nucleotide variants (nsSNVs) have been identified in the 2 most common long-QT syndrome-susceptibility genes (KCNQ1 and KCNH2). Unfortunately, an ≈3% background rate of rare KCNQ1 and KCNH2 nsSNVs amongst healthy individuals complicates the ability to distinguish rare pathogenic mutations from similarly rare yet presumably innocuous variants. METHODS AND RESULTS: In this study, 4 tools [(1) conservation across species, (2) Grantham values, (3) sorting intolerant from tolerant, and (4) polymorphism phenotyping] were used to predict pathogenic or benign status for nsSNVs identified across 388 clinically definite long-QT syndrome cases and 1344 ostensibly healthy controls. From these data, estimated predictive values were determined for each tool independently, in concert with previously published protein topology-derived estimated predictive values, and synergistically when ≥3 tools were in agreement. Overall, all 4 tools displayed a statistically significant ability to distinguish between case-derived and control-derived nsSNVs in KCNQ1, whereas each tool, except Grantham values, displayed a similar ability to differentiate KCNH2 nsSNVs. Collectively, when at least 3 of the 4 tools agreed on the pathogenic status of C-terminal nsSNVs located outside the KCNH2/Kv11.1 cyclic nucleotide-binding domain, the topology-specific estimated predictive value improved from 56% to 91%. CONCLUSIONS: Although in silico prediction tools should not be used to predict independently the pathogenicity of a novel, rare nsSNV, our results support the potential clinical use of the synergistic utility of these tools to enhance the classification of nsSNVs, particularly for Kv11.1's difficult to interpret C-terminal region.
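The "at least 3 of 4 tools agree" rule in this abstract can be sketched as a simple vote. The second function is an assumption: the abstract does not define the estimated predictive value, so the sketch uses one plausible definition, (case frequency minus control frequency) divided by case frequency, expressed as a percentage. All names and the example numbers are hypothetical and are not taken from the study.

```python
def consensus_pathogenic(tool_calls, min_agree=3):
    """True when at least min_agree tools call the variant pathogenic.

    tool_calls maps a tool name to True (pathogenic) or False (benign).
    """
    return sum(bool(call) for call in tool_calls.values()) >= min_agree


def estimated_predictive_value(case_freq, control_freq):
    """ASSUMED definition: 100 * (case_freq - control_freq) / case_freq."""
    if case_freq <= 0:
        raise ValueError("case_freq must be positive")
    return 100.0 * (case_freq - control_freq) / case_freq


# Hypothetical calls for one variant from the four kinds of tool listed above.
calls = {"conservation": True, "grantham": False, "sift": True, "polyphen": True}
is_pathogenic = consensus_pathogenic(calls)
# Hypothetical frequencies: 10% of cases vs 1% of controls carry such variants.
epv = estimated_predictive_value(0.10, 0.01)
```

Requiring agreement trades sensitivity for specificity: fewer variants receive a pathogenic call, but those that do are less likely to be background noise.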
Affiliation(s)
- John R Giudicessi
- Department of Medicine/Division of Cardiovascular Diseases, Mayo Clinic, 200 First Street SW, Rochester, MN 55905, USA
|
31
|
Thomas L, Richards M, Mort M, Dunlop E, Cooper DN, Upadhyaya M. Assessment of the potential pathogenicity of missense mutations identified in the GTPase-activating protein (GAP)-related domain of the neurofibromatosis type-1 (NF1) gene. Hum Mutat 2012; 33:1687-96. [PMID: 22807134 DOI: 10.1002/humu.22162] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Received: 05/10/2012] [Accepted: 06/28/2012] [Indexed: 11/09/2022]
Abstract
Neurofibromatosis type-1 (NF1) is caused by constitutional mutations of the NF1 tumor-suppressor gene. Although ∼85% of inherited NF1 microlesions constitute truncating mutations, the remaining ∼15% are missense mutations whose pathological relevance is often unclear. The GTPase-activating protein-related domain (GRD) of the NF1-encoded protein, neurofibromin, serves to define its major function as a negative regulator of the Ras-MAPK (mitogen-activated protein kinase) signaling pathway. We have established a functional assay to assess the potential pathogenicity of 15 constitutional nonsynonymous NF1 missense mutations (11 novel and 4 previously reported but not functionally characterized) identified in the NF1-GRD (p.R1204G, p.R1204W, p.R1276Q, p.L1301R, p.I1307V, p.T1324N, p.E1327G, p.Q1336R, p.E1356G, p.R1391G, p.V1398D, p.K1409E, p.P1412R, p.K1436Q, p.S1463F). Individual mutations were introduced into an NF1-GRD expression vector and activated Ras was assayed by an enzyme-linked immunosorbent assay (ELISA). Ten NF1-GRD variants were deemed to be potentially pathogenic by virtue of significantly elevated levels of activated GTP-bound Ras in comparison to wild-type NF1 protein. The remaining five NF1-GRD variants were deemed less likely to be of pathological significance as they exhibited similar levels of activated Ras to the wild-type protein. These conclusions received broad support from both bioinformatic analysis and molecular modeling and serve to improve our understanding of NF1-GRD structure and function.
Affiliation(s)
- Laura Thomas
- Institute of Medical Genetics, Cardiff University, Cardiff, UK
|
32
|
Vucic EA, Thu KL, Robison K, Rybaczyk LA, Chari R, Alvarez CE, Lam WL. Translating cancer 'omics' to improved outcomes. Genome Res 2012; 22:188-95. [PMID: 22301133 DOI: 10.1101/gr.124354.111] [Citation(s) in RCA: 91] [Impact Index Per Article: 7.6] [Indexed: 12/20/2022]
Abstract
The genomics era has yielded great advances in the understanding of cancer biology. At the same time, the immense complexity of the cancer genome has been revealed, as well as a striking heterogeneity at the whole-genome (or omics) level that exists between even histologically similar tumors. The vast accrual and public availability of multi-omics databases with associated clinical annotation including tumor histology, patient response, and outcome are a rich resource that has the potential to lead to rapid translation of high-throughput omics to improved overall survival. We focus on the unique advantages of a multidimensional approach to genomic analysis in this new high-throughput omics age and discuss the implications of the changing cancer demographic to translational omics research.
Affiliation(s)
- Emily A Vucic
- British Columbia Cancer Research Centre, Vancouver V5Z 1L3, Canada.
|
33
|
Rubio JP, Topp S, Warren L, St Jean PL, Wegmann D, Kessner D, Novembre J, Shen J, Fraser D, Aponte J, Nangle K, Cardon LR, Ehm MG, Chissoe SL, Whittaker JC, Nelson MR, Mooser VE. Deep sequencing of the LRRK2 gene in 14,002 individuals reveals evidence of purifying selection and independent origin of the p.Arg1628Pro mutation in Europe. Hum Mutat 2012; 33:1087-98. [PMID: 22415848 DOI: 10.1002/humu.22075] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.8] [Received: 07/11/2011] [Accepted: 02/24/2012] [Indexed: 12/12/2022]
Abstract
Genetic variation in LRRK2 predisposes to Parkinson disease (PD), which underpins its development as a therapeutic target. Here, we aimed to identify novel genotype-phenotype associations that might support developing LRRK2 therapies for other conditions. We sequenced the 51 exons of LRRK2 in cases comprising 12 common diseases (n = 9,582), and in 4,420 population controls. We identified 739 single-nucleotide variants, 62% of which were observed in only one person, including 316 novel exonic variants. We found evidence of purifying selection for the LRRK2 gene and a trend suggesting that this is more pronounced in the central (ROC-COR-kinase) core protein domains of LRRK2 than the flanking domains. Population genetic analyses revealed that LRRK2 is not especially polymorphic or differentiated in comparison to 201 other drug target genes. Among Europeans, we identified 17 carriers (0.13%) of pathogenic LRRK2 mutations that were not significantly enriched within any disease or in those reporting a family history of PD. Analysis of pathogenic mutations within Europe reveals that the p.Arg1628Pro (c.4883G>C) mutation arose independently in Europe and Asia. Taken together, these findings demonstrate how targeted deep sequencing can help to reveal fundamental characteristics of clinically important loci.
Affiliation(s)
- Justin P Rubio
- Quantitative Sciences, Research and Development, GlaxoSmithKline, Gunnels Wood Road, Stevenage, Hertfordshire, England, United Kingdom.
|
34
|
Vacic V, Iakoucheva LM. Disease mutations in disordered regions--exception to the rule? MOLECULAR BIOSYSTEMS 2012; 8:27-32. [PMID: 22080206 PMCID: PMC3307532 DOI: 10.1039/c1mb05251a] [Citation(s) in RCA: 77] [Impact Index Per Article: 6.4] [Indexed: 12/20/2022]
Abstract
Intrinsically disordered proteins (IDPs) have been implicated in a number of human diseases, including cancer, diabetes, neurodegenerative and cardiovascular disorders. Although for some of these conditions molecular mechanisms are now better understood, the big picture connecting distinct structural properties and functional repertoire of IDPs to pathogenesis and disease progression is still incomplete. Recent studies suggest that signaling and regulatory roles carried out by IDPs require them to be tightly regulated, and that altered IDP abundance may lead to disease. Here, we propose another link between IDPs and disease that takes into account disease-associated missense mutations located in the intrinsically disordered regions. We argue that such mutations are more prevalent and have larger functional impact than previously thought. In addition, we demonstrate that deleterious amino acid substitutions that cause disorder-to-order transitions are particularly enriched among disease mutations compared to neutral polymorphisms. Finally, we discuss potential differences in functional outcomes between disease mutations in ordered and disordered regions, and challenge the conventional structure-centric view of missense mutations.
Affiliation(s)
- Vladimir Vacic
- Department of Computer Science, Columbia University, New York, NY 10027
- Lilia M. Iakoucheva
- Department of Psychiatry, University of California, San Diego, La Jolla, CA 92093
|
35
|
Thomas L, Spurlock G, Eudall C, Thomas NS, Mort M, Hamby SE, Chuzhanova N, Brems H, Legius E, Cooper DN, Upadhyaya M. Exploring the somatic NF1 mutational spectrum associated with NF1 cutaneous neurofibromas. Eur J Hum Genet 2011; 20:411-9. [PMID: 22108604 DOI: 10.1038/ejhg.2011.207] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.5] [Indexed: 12/14/2022] Open
Abstract
Neurofibromatosis type-1 (NF1), caused by heterozygous inactivation of the NF1 tumour suppressor gene, is associated with the development of benign and malignant peripheral nerve sheath tumours (MPNSTs). Although numerous germline NF1 mutations have been identified, relatively few somatic NF1 mutations have been described in neurofibromas. Here we have screened 109 cutaneous neurofibromas, excised from 46 unrelated NF1 patients, for somatic NF1 mutations. NF1 mutation screening (involving loss-of-heterozygosity (LOH) analysis, multiplex ligation-dependent probe amplification and DNA sequencing) identified 77 somatic NF1 point mutations, of which 53 were novel. LOH spanning the NF1 gene region was evident in 25 neurofibromas, but in contrast to previous data from MPNSTs, it was absent at the TP53, CDKN2A and RB1 gene loci. Analysis of DNA/RNA from neurofibroma-derived Schwann cell cultures revealed NF1 mutations in four tumours whose presence had been overlooked in the tumour DNA. Bioinformatics analysis suggested that four of seven novel somatic NF1 missense mutations (p.A330T, p.Q519P, p.A776T, p.S1463F) could be of functional/clinical significance. Functional analysis confirmed this prediction for p.S1463F, located within the GTPase-activating protein-related domain, as this mutation resulted in a 150-fold increase in activated GTP-bound Ras. Comparison of the relative frequencies of the different types of somatic NF1 mutation observed with those of their previously reported germline counterparts revealed significant (P=0.001) differences. Although non-identical somatic mutations involving either the same or adjacent nucleotides were identified in three pairs of tumours from the same patients (P<0.0002), no association was noted between the type of germline and somatic NF1 lesion within the same individual.
Affiliation(s)
- Laura Thomas
- Institute of Medical Genetics, School of Medicine, Cardiff University, Heath Park Way, Cardiff, UK
|
36
|
Musen MA, Noy NF, Shah NH, Whetzel PL, Chute CG, Story MA, Smith B. The National Center for Biomedical Ontology. J Am Med Inform Assoc 2011; 19:190-5. [PMID: 22081220 DOI: 10.1136/amiajnl-2011-000523] [Citation(s) in RCA: 142] [Impact Index Per Article: 10.9] [Indexed: 11/03/2022] Open
Abstract
The National Center for Biomedical Ontology is now in its seventh year. The goals of this National Center for Biomedical Computing are to: create and maintain a repository of biomedical ontologies and terminologies; build tools and web services to enable the use of ontologies and terminologies in clinical and translational research; educate their trainees and the scientific community broadly about biomedical ontology and ontology-based technology and best practices; and collaborate with a variety of groups who develop and use ontologies and terminologies in biomedicine. The centerpiece of the National Center for Biomedical Ontology is a web-based resource known as BioPortal. BioPortal makes available for research in computationally useful forms more than 270 of the world's biomedical ontologies and terminologies, and supports a wide range of web services that enable investigators to use the ontologies to annotate and retrieve data, to generate value sets and special-purpose lexicons, and to perform advanced analytics on a wide range of biomedical data.
Affiliation(s)
- Mark A Musen
- Center for Biomedical Informatics Research, Stanford University, Stanford, California 94305-5479, USA.
|
37
|
Kumar S, Dudley JT, Filipski A, Liu L. Phylomedicine: an evolutionary telescope to explore and diagnose the universe of disease mutations. Trends Genet 2011; 27:377-86. [PMID: 21764165 PMCID: PMC3272884 DOI: 10.1016/j.tig.2011.06.004] [Citation(s) in RCA: 66] [Impact Index Per Article: 5.1] [Received: 04/18/2011] [Revised: 06/10/2011] [Accepted: 06/13/2011] [Indexed: 12/30/2022]
Abstract
Modern technologies have made the sequencing of personal genomes routine. They have revealed thousands of nonsynonymous (amino acid altering) single nucleotide variants (nSNVs) of protein-coding DNA per genome. What do these variants foretell about an individual's predisposition to diseases? The experimental technologies required to carry out such evaluations at a genomic scale are not yet available. Fortunately, the process of natural selection has lent us an almost infinite set of tests in nature. During long-term evolution, new mutations and existing variations have been evaluated for their biological consequences in countless species, and outcomes are readily revealed by multispecies genome comparisons. We review studies that have investigated evolutionary characteristics and in silico functional diagnoses of nSNVs found in thousands of disease-associated genes. We conclude that the patterns of long-term evolutionary conservation and permissible sequence divergence are essential and instructive modalities for functional assessment of human genetic variations.
Affiliation(s)
- Sudhir Kumar
- School of Life Sciences, Arizona State University, Tempe, AZ 85287-4501, USA.
|
38
|
Torkamani A, Scott-Van Zeeland AA, Topol EJ, Schork NJ. Annotating individual human genomes. Genomics 2011; 98:233-41. [PMID: 21839162 DOI: 10.1016/j.ygeno.2011.07.006] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Received: 05/25/2011] [Accepted: 07/26/2011] [Indexed: 02/03/2023]
Abstract
Advances in DNA sequencing technologies have made it possible to rapidly, accurately and affordably sequence entire individual human genomes. As impressive as this ability seems, however, it will not likely amount to much if one cannot extract meaningful information from individual sequence data. Annotating variations within individual genomes and providing information about their biological or phenotypic impact will thus be crucially important in moving individual sequencing projects forward, especially in the context of the clinical use of sequence information. In this paper we consider the various ways in which one might annotate individual sequence variations and point out limitations in the available methods for doing so. It is arguable that, in the foreseeable future, DNA sequencing of individual genomes will become routine for clinical, research, forensic, and personal purposes. We therefore also consider directions and areas for further research in annotating genomic variants.
|
39
|
Grossi S, Regis S, Biancheri R, Mort M, Lualdi S, Bertini E, Uziel G, Boespflug-Tanguy O, Simonati A, Corsolini F, Demir E, Marchiani V, Percesepe A, Stanzial F, Rossi A, Vaurs-Barrière C, Cooper DN, Filocamo M. Molecular genetic analysis of the PLP1 gene in 38 families with PLP1-related disorders: identification and functional characterization of 11 novel PLP1 mutations. Orphanet J Rare Dis 2011; 6:40. [PMID: 21679407 PMCID: PMC3125326 DOI: 10.1186/1750-1172-6-40] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.2] [Received: 02/14/2011] [Accepted: 06/16/2011] [Indexed: 12/18/2022] Open
Abstract
Background: The breadth of the clinical spectrum underlying Pelizaeus-Merzbacher disease and spastic paraplegia type 2 is due to the extensive allelic heterogeneity in the X-linked PLP1 gene encoding myelin proteolipid protein (PLP). PLP1 mutations range from gene duplications of variable size found in 60-70% of patients to intragenic lesions present in 15-20% of patients. Methods: Forty-eight male patients from 38 unrelated families with a PLP1-related disorder were studied. All DNA samples were screened for PLP1 gene duplications using real-time PCR. PLP1 gene sequencing analysis was performed on patients negative for the duplication. The mutational status of all 14 potential carrier mothers of the familial PLP1 gene mutation was determined as well as 15/24 potential carrier mothers of the PLP1 duplication. Results and Conclusions: PLP1 gene duplications were identified in 24 of the unrelated patients whereas a variety of intragenic PLP1 mutations were found in the remaining 14 patients. Of the 14 different intragenic lesions, 11 were novel; these included one nonsense and 7 missense mutations, a 657-bp deletion, a microdeletion and a microduplication. The functional significance of the novel PLP1 missense mutations, all occurring at evolutionarily conserved residues, was analysed by the MutPred tool whereas their potential effect on splicing was ascertained using the Skippy algorithm and a neural network. Although MutPred predicted that all 7 novel missense mutations would be likely to be deleterious, in silico analysis indicated that four of them (p.Leu146Val, p.Leu159Pro, p.Thr230Ile, p.Ala247Asp) might cause exon skipping by altering exonic splicing elements. These predictions were then investigated in vitro for both p.Leu146Val and p.Thr230Ile by means of RNA or minigene studies and were subsequently confirmed in the case of p.Leu146Val.
Peripheral neuropathy was noted in four patients harbouring intragenic mutations that altered RNA processing, but was absent from all PLP1-duplication patients. Unprecedentedly, family studies revealed the de novo occurrence of the PLP1 duplication at a frequency of 20%.
Affiliation(s)
- Serena Grossi
- SSD Lab, Diagnosi Pre-Postnatale Malattie Metaboliche, IRCCS G, Gaslini, Genova, Italy
|
40
|
Enabling enrichment analysis with the Human Disease Ontology. J Biomed Inform 2011; 44 Suppl 1:S31-S38. [PMID: 21550421 DOI: 10.1016/j.jbi.2011.04.007] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.9] [Received: 01/15/2011] [Revised: 04/12/2011] [Accepted: 04/22/2011] [Indexed: 01/30/2023]
Abstract
Advanced statistical methods used to analyze high-throughput data such as gene-expression assays result in long lists of "significant genes." One way to gain insight into the significance of altered expression levels is to determine whether Gene Ontology (GO) terms associated with a particular biological process, molecular function, or cellular component are over- or under-represented in the set of genes deemed significant. This process, referred to as enrichment analysis, profiles a gene set, and is widely used to make sense of the results of high-throughput experiments. Our goal is to develop and apply general enrichment analysis methods to profile other sets of interest, such as patient cohorts from the electronic medical record, using a variety of ontologies including SNOMED CT, MedDRA, RxNorm, and others. Although it is possible to perform enrichment analysis using ontologies other than the GO, a key pre-requisite is the availability of a background set of annotations to enable the enrichment calculation. In the case of the GO, this background set is provided by the Gene Ontology Annotations. In the current work, we describe: (i) a general method that uses hand-curated GO annotations as a starting point for creating background datasets for enrichment analysis using other ontologies; and (ii) a gene-disease background annotation set - that enables disease-based enrichment - to demonstrate feasibility of our method.
|
41
|
Abstract
A key goal in cancer research is to find the genomic alterations that underlie malignant cells. Genomics has proved successful in identifying somatic variants at a large scale. However, it has become evident that a typical cancer exhibits a heterogeneous mutation pattern across samples. Cases where the same alteration is observed repeatedly seem to be the exception rather than the norm. Thus, pinpointing the key alterations (driver mutations) from a background of variations with no direct causal link to cancer (passenger mutations) is difficult. Here we analyze somatic missense mutations from cancer samples and their healthy tissue counterparts (germline mutations) from the viewpoint of germline fitness. We calibrate a scoring system from protein domain alignments to score mutations and their target loci. We show first that this score predicts to a good degree the rate of polymorphism of the observed germline variation. The scoring is then applied to somatic mutations. We show that candidate cancer genes prone to copy number loss harbor mutations with germline fitness effects that are significantly more deleterious than expected by chance. This suggests that missense mutations play a driving role in tumor suppressor genes. Furthermore, these mutations fall preferentially on loci in sequence neighborhoods that are high scoring in terms of germline fitness. In contrast, for somatic mutations in candidate oncogenes we do not observe a statistically significant effect. These results help to inform how to exploit germline fitness predictions in discovering new genes and mutations responsible for cancer.
|
42
|
Neale BM, Rivas MA, Voight BF, Altshuler D, Devlin B, Orho-Melander M, Kathiresan S, Purcell SM, Roeder K, Daly MJ. Testing for an unusual distribution of rare variants. PLoS Genet 2011; 7:e1001322. [PMID: 21408211 PMCID: PMC3048375 DOI: 10.1371/journal.pgen.1001322] [Citation(s) in RCA: 455] [Impact Index Per Article: 35.0] [Received: 03/19/2010] [Accepted: 01/31/2011] [Indexed: 01/31/2023] Open
Abstract
Technological advances make it possible to use high-throughput sequencing as a primary discovery tool of medical genetics, specifically for assaying rare variation. Still, this approach faces the analytic challenge that the influence of very rare variants can only be evaluated effectively as a group. A further complication is that any given rare variant could have no effect, could increase risk, or could be protective. We propose here the C-alpha test statistic as a novel approach for testing for the presence of this mixture of effects across a set of rare variants. Unlike existing burden tests, C-alpha, by testing the variance rather than the mean, maintains consistent power when the target set contains both risk and protective variants. Through simulations and analysis of case/control data, we demonstrate good power relative to existing methods that assess the burden of rare variants in individuals.
Developments in sequencing technology now enable us to assay all genetic variation, much of which is extremely rare. We propose to test the distribution of rare variants we observe in cases versus controls. To do so, we present a novel application of the C-alpha statistic to test these rare variants. C-alpha aims to determine whether the set of variants observed in cases and controls is a mixture, such that some of the variants confer risk or protection or are phenotypically neutral. Risk variants are expected to be more common in cases; protective variants more common in controls. C-alpha is sensitive to this imbalance, regardless of its origin (risk, protective, or both), but is ideally suited for a mixture of protective and risk variants. Variation in APOB nicely illustrates a mixture, in that certain rare variants increase triglyceride levels while others decrease them. The hallmark feature of C-alpha is that it uses the distribution of variation observed in cases and controls to detect the presence of a mixture, thus implicating genes or pathways as risk factors for disease.
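The idea of testing the variance rather than the mean can be illustrated with a short sketch. The abstract does not state the formula, so the form used here (per-variant squared deviation of the case count from its binomial expectation, minus the binomial variance) is an assumption based on the commonly cited definition of the statistic. The function names are hypothetical, and the sketch omits refinements such as pooling singletons.

```python
import math


def binom_pmf(u, n, p):
    """Binomial probability of u successes in n trials."""
    return math.comb(n, u) * p**u * (1 - p) ** (n - u)


def c_alpha(case_counts, total_counts, p0):
    """Return (T, Z) for a set of rare variants.

    case_counts[i]  : copies of variant i seen in cases
    total_counts[i] : copies of variant i seen in cases + controls
    p0              : expected fraction in cases under the null
                      (assumed here to be the share of cases in the sample)
    """
    t = 0.0
    var = 0.0
    for y, n in zip(case_counts, total_counts):
        null_var = n * p0 * (1 - p0)
        # Over-dispersion term: observed squared deviation minus binomial variance.
        t += (y - n * p0) ** 2 - null_var
        # Null variance of that term, summed over all possible case counts u.
        var += sum(
            ((u - n * p0) ** 2 - null_var) ** 2 * binom_pmf(u, n, p0)
            for u in range(n + 1)
        )
    z = t / math.sqrt(var) if var > 0 else 0.0
    return t, z


# One variant seen only in cases, one seen only in controls (equal sample sizes).
t_mixed, z_mixed = c_alpha([10, 0], [10, 10], p0=0.5)
```

In this toy input the total case count (10 of 20 copies) matches the null expectation exactly, so a test on the mean would see nothing, while `t_mixed` is positive because each variant deviates strongly in its own direction.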
Affiliation(s)
- Benjamin M. Neale
- The Center for Human Genetic Research, Massachusetts General Hospital, Boston, Massachusetts, United States of America
- The Broad Institute of Harvard and MIT, Cambridge, Massachusetts, United States of America
- * E-mail: (BMN); (MJD); (KR)
- Manuel A. Rivas
- The Center for Human Genetic Research, Massachusetts General Hospital, Boston, Massachusetts, United States of America
- The Broad Institute of Harvard and MIT, Cambridge, Massachusetts, United States of America
- Benjamin F. Voight
- The Center for Human Genetic Research, Massachusetts General Hospital, Boston, Massachusetts, United States of America
- The Broad Institute of Harvard and MIT, Cambridge, Massachusetts, United States of America
- David Altshuler
- The Broad Institute of Harvard and MIT, Cambridge, Massachusetts, United States of America
- Department of Genetics, Harvard Medical School, Boston, Massachusetts, United States of America
- Department of Molecular Biology, Massachusetts General Hospital, Boston, Massachusetts, United States of America
- Bernie Devlin
- Department of Psychiatry, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania, United States of America
- Marju Orho-Melander
- Department of Clinical Sciences Malmö, Diabetes and Cardiovascular Diseases, Genetic Epidemiology CRC, University Hospital Malmö, Malmö, Sweden
- Sekar Kathiresan
- The Center for Human Genetic Research, Massachusetts General Hospital, Boston, Massachusetts, United States of America
- The Broad Institute of Harvard and MIT, Cambridge, Massachusetts, United States of America
- Cardiovascular Research Center, Massachusetts General Hospital, Boston, Massachusetts, United States of America
- Department of Medicine, Harvard Medical School, Boston, Massachusetts, United States of America
- Shaun M. Purcell
- The Broad Institute of Harvard and MIT, Cambridge, Massachusetts, United States of America
- Psychiatric and Neurodevelopmental Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts, United States of America
- Kathryn Roeder
- Department of Statistics, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
- Mark J. Daly
- The Center for Human Genetic Research, Massachusetts General Hospital, Boston, Massachusetts, United States of America
- The Broad Institute of Harvard and MIT, Cambridge, Massachusetts, United States of America
|
43
|
Tappino B, Biancheri R, Mort M, Regis S, Corsolini F, Rossi A, Stroppiano M, Lualdi S, Fiumara A, Bembi B, Di Rocco M, Cooper DN, Filocamo M. Identification and characterization of 15 novel GALC gene mutations causing Krabbe disease. Hum Mutat 2011; 31:E1894-914. [PMID: 20886637 PMCID: PMC3052420 DOI: 10.1002/humu.21367] [Citation(s) in RCA: 81] [Impact Index Per Article: 6.2] [Indexed: 12/01/2022]
Abstract
The characterization of the underlying GALC gene lesions was performed in 30 unrelated patients affected by Krabbe disease, an autosomal recessive leukodystrophy caused by the deficiency of lysosomal enzyme galactocerebrosidase. The GALC mutational spectrum comprised 33 distinct mutant (including 15 previously unreported) alleles. With the exception of 4 novel missense mutations that replaced evolutionarily highly conserved residues (p.P318R, p.G323R, p.I384T, p.Y490N), most of the newly described lesions altered mRNA processing. These included 7 frameshift mutations (c.61delG, c.408delA, c.521delA, c.1171_1175delCATTCinsA, c.1405_1407delCTCinsT, c.302_308dupAAATAGG, c.1819_1826dupGTTACAGG), 3 nonsense mutations (p.R69X, p.K88X, p.R127X) one of which (p.K88X) mediated the skipping of exon 2, and a splicing mutation (c.1489+1G>A) which induced the partial skipping of exon 13. In addition, 6 previously unreported GALC polymorphisms were identified. The functional significance of the novel GALC missense mutations and polymorphisms was investigated using the MutPred analysis tool. This study, reporting one of the largest genotype-phenotype analyses of the GALC gene so far performed in a European Krabbe disease cohort, revealed that the Italian GALC mutational profile differs significantly from other populations of European origin. This is due in part to a GALC missense substitution (p.G553R) that occurs at high frequency on a common founder haplotype background in patients originating from the Naples region. © 2010 Wiley-Liss, Inc.
Affiliation(s)
- Barbara Tappino
- S.S.D. Lab. Diagnosi Pre-Postnatale Malattie Metaboliche, IRCCS G. Gaslini, Genova, Italy
|
44
|
Thusberg J, Olatubosun A, Vihinen M. Performance of mutation pathogenicity prediction methods on missense variants. Hum Mutat 2011; 32:358-68. [PMID: 21412949 DOI: 10.1002/humu.21445] [Citation(s) in RCA: 384] [Impact Index Per Article: 29.5] [Received: 05/20/2010] [Accepted: 12/07/2010] [Indexed: 11/10/2022]
Abstract
Single nucleotide polymorphisms (SNPs) are the most common form of genetic variation in humans. The number of SNPs identified in the human genome is growing rapidly, but attaining experimental knowledge about the possible disease association of variants is laborious and time-consuming. Several computational methods have been developed for the classification of SNPs according to their predicted pathogenicity. In this study, we have evaluated the performance of nine widely used pathogenicity prediction methods available on the Internet. The evaluated methods were MutPred, nsSNPAnalyzer, Panther, PhD-SNP, PolyPhen, PolyPhen2, SIFT, SNAP, and SNPs&GO. The methods were tested with a set of over 40,000 pathogenic and neutral variants. We also assessed whether the type of original or substituting amino acid residue, the structural class of the protein, or the structural environment of the amino acid substitution, had an effect on the prediction performance. The performances of the programs ranged from poor (MCC 0.19) to reasonably good (MCC 0.65), and the results from the programs correlated poorly. The overall best performing methods in this study were SNPs&GO and MutPred, with accuracies reaching 0.82 and 0.81, respectively.
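The two metrics quoted above, accuracy and the Matthews correlation coefficient (MCC), can both be computed from a binary confusion matrix. The sketch below uses the standard definitions; the function names and the example counts are illustrative and are not taken from the study.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all variants classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient, in [-1, 1].

    Returns 0.0 when any marginal total is zero, a common convention
    for the otherwise undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# A predictor that calls every variant pathogenic on an imbalanced set:
# accuracy looks high, but MCC shows there is no real discrimination.
acc_all_positive = accuracy(tp=90, tn=0, fp=10, fn=0)
mcc_all_positive = mcc(tp=90, tn=0, fp=10, fn=0)
```

Because MCC uses all four cells of the confusion matrix, it is less easily inflated by class imbalance than accuracy, which is one reason benchmark studies often report both.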
Affiliation(s)
- Janita Thusberg
- Institute of Biomedical Technology, FI-33014 University of Tampere, Finland
|
45
|
Reilich P, Krause S, Schramm N, Klutzny U, Bulst S, Zehetmayer B, Schneiderat P, Walter MC, Schoser B, Lochmüller H. A novel mutation in the myotilin gene (MYOT) causes a severe form of limb girdle muscular dystrophy 1A (LGMD1A). J Neurol 2011; 258:1437-44. [PMID: 21336781 DOI: 10.1007/s00415-011-5953-9] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Received: 12/01/2010] [Accepted: 02/07/2011] [Indexed: 01/04/2023]
Abstract
Here we describe a patient with limb girdle muscular dystrophy 1A (LGMD1A) due to a novel myotilin gene (MYOT) mutation with late onset, rapid progression, loss of ambulation and respiratory failure. The onset of weakness in proximal muscles and muscle MRI findings are clearly different from the pattern identified in myofibrillar myopathies (MFM) related to MYOT mutations. Moreover, there was very limited evidence of myofibrillar pathology in several muscle biopsies obtained during the disease course. We conclude that MYOT mutations need to be considered as a rare cause of adult-onset, dominant LGMD without clear-cut MFM pathology.
Affiliation(s)
- Peter Reilich
- Friedrich-Baur-Institut, Department of Neurology, Ludwig-Maximilians-University, Munich, Germany.
|
46
|
Li Y, Wen Z, Xiao J, Yin H, Yu L, Yang L, Li M. Predicting disease-associated substitution of a single amino acid by analyzing residue interactions. BMC Bioinformatics 2011; 12:14. [PMID: 21223604 PMCID: PMC3027113 DOI: 10.1186/1471-2105-12-14] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.2] [Received: 06/10/2010] [Accepted: 01/12/2011] [Indexed: 01/16/2023]
Abstract
BACKGROUND: The rapid accumulation of data on non-synonymous single nucleotide polymorphisms (nsSNPs, also called SAPs) should allow us to further our understanding of the underlying disease-associated mechanisms. Here, we use complex networks to study the role of an amino acid in both local and global structures and determine the extent to which disease-associated and polymorphic SAPs differ in terms of their interactions with other residues. RESULTS: We found that SAPs can be well characterized by network topological features. Mutations are probably disease-associated when they occur at a site with a high centrality value and/or high degree value in a protein structure network. We also discovered that study of the neighboring residues around a mutation site can help to determine whether the mutation is disease-related or not. We compiled a dataset from the Swiss-Prot variant pages and constructed a model to predict disease-associated SAPs based on the random forest algorithm. The values of total accuracy and MCC were 83.0% and 0.64, respectively, as determined by 5-fold cross-validation. With an independent dataset, our model achieved a total accuracy of 80.8% and an MCC of 0.59. CONCLUSIONS: The satisfactory performance suggests that network topological features can be used as quantification measures to determine the importance of a site on a protein, and this approach can complement existing methods for prediction of disease-associated SAPs. Moreover, the use of this method in SAP studies would help to determine the underlying linkage between SAPs and diseases through extensive investigation of mutual interactions between residues.
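To make the notions of degree and centrality in a protein structure network concrete, here is a minimal sketch. The abstract does not say how the network was built, so the construction below (one node per residue, an edge when two residue coordinates lie within a distance cutoff) and the 7.0 cutoff are assumptions chosen for illustration. Closeness is used as one example of a centrality measure, and all function names are hypothetical.

```python
import math
from collections import deque


def contact_graph(coords, cutoff=7.0):
    """Build an undirected residue graph from 3D coordinates.

    coords : list of (x, y, z) tuples, one per residue (assumed input)
    cutoff : distance threshold for an edge (assumed value)
    """
    adj = {i: set() for i in range(len(coords))}
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) <= cutoff:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def degree(adj, node):
    """Number of residues in direct contact with this residue."""
    return len(adj[node])


def closeness(adj, node):
    """Closeness centrality: reachable nodes / sum of shortest-path lengths."""
    dist = {node: 0}
    queue = deque([node])
    while queue:  # breadth-first search over the contact graph
        cur = queue.popleft()
        for nb in adj[cur]:
            if nb not in dist:
                dist[nb] = dist[cur] + 1
                queue.append(nb)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total > 0 else 0.0


# Three residues on a line, 5 units apart: the middle one touches both ends.
toy = contact_graph([(0, 0, 0), (5, 0, 0), (10, 0, 0)])
```

Features such as `degree(toy, 1)` and `closeness(toy, 1)` are the kind of per-site values that could then be passed to a classifier such as a random forest.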
Affiliation(s)
- Yizhou Li
- Key Laboratory of Green Chemistry and Technology, Ministry of Education, College of Chemistry, Sichuan University, Chengdu 610064, PR China
|
47
|
Tirrell R, Evani U, Berman AE, Mooney SD, Musen MA, Shah NH. An ontology-neutral framework for enrichment analysis. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2010; 2010:797-801. [PMID: 21347088 PMCID: PMC3041299] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/30/2023]
Abstract
Advanced statistical methods used to analyze high-throughput data (e.g. gene-expression assays) result in long lists of "significant genes." One way to gain insight into the significance of altered expression levels is to determine whether Gene Ontology (GO) terms associated with a particular biological process, molecular function, or cellular component are over- or under-represented in the set of genes deemed significant. This process, referred to as enrichment analysis, profiles a gene-set, and is relevant for and extensible to data analysis with other high-throughput measurement modalities such as proteomics, metabolomics, and tissue-microarray assays. With the availability of tools for automatic ontology-based annotation of datasets with terms from biomedical ontologies besides the GO, we need not restrict enrichment analysis to the GO. We describe RANSUM (Rich Annotation Summarizer), which performs enrichment analysis using any ontology in the National Center for Biomedical Ontology's (NCBO) BioPortal. We outline the methodology of enrichment analysis and the associated challenges, and discuss novel analyses enabled by RANSUM.
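Over-representation of an ontology term in a gene set is often scored with a one-sided hypergeometric test. The abstract does not state which test RANSUM uses, so the sketch below is an assumption that shows only the general technique; the function and parameter names are hypothetical.

```python
import math


def enrichment_p_value(population, annotated, selected, overlap):
    """Upper-tail hypergeometric p-value for term over-representation.

    population : number of genes in the background set
    annotated  : background genes annotated with the term
    selected   : genes in the "significant" list
    overlap    : significant genes annotated with the term
    Returns P(X >= overlap) when `selected` genes are drawn at random.
    """
    if overlap <= 0:
        return 1.0
    upper = min(annotated, selected)
    favourable = sum(
        math.comb(annotated, i) * math.comb(population - annotated, selected - i)
        for i in range(overlap, upper + 1)
    )
    return favourable / math.comb(population, selected)


# Toy case: 10 background genes, 5 carry the term, and all 5 selected genes carry it.
p_toy = enrichment_p_value(population=10, annotated=5, selected=5, overlap=5)
```

In practice one such p-value is computed per ontology term, so some correction for multiple testing is normally applied afterwards.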
Affiliation(s)
- Rob Tirrell
- Center for Biomedical Informatics Research, Stanford University, Stanford, CA 94305
|
48
|
Cooper DN, Chen JM, Ball EV, Howells K, Mort M, Phillips AD, Chuzhanova N, Krawczak M, Kehrer-Sawatzki H, Stenson PD. Genes, mutations, and human inherited disease at the dawn of the age of personalized genomics. Hum Mutat 2010; 31:631-55. [PMID: 20506564 DOI: 10.1002/humu.21260] [Citation(s) in RCA: 117] [Impact Index Per Article: 8.4] [Indexed: 12/12/2022]
Abstract
The number of reported germline mutations in human nuclear genes, either underlying or associated with inherited disease, has now exceeded 100,000 in more than 3,700 different genes. The availability of these data has both revolutionized the study of the morbid anatomy of the human genome and facilitated "personalized genomics." With approximately 300 new "inherited disease genes" (and approximately 10,000 new mutations) being identified annually, it is pertinent to ask how many "inherited disease genes" there are in the human genome, how many mutations reside within them, and where such lesions are likely to be located. To address these questions, it is necessary not only to reconsider how we define human genes but also to explore notions of gene "essentiality" and "dispensability." Answers to these questions are now emerging from recent novel insights into genome structure and function and through complete genome sequence information derived from multiple individual human genomes. However, a change in focus toward screening functional genomic elements as opposed to genes sensu stricto will be required if we are to capitalize fully on recent technical and conceptual advances and identify new types of disease-associated mutation within noncoding regions remote from the genes whose function they disrupt.
Affiliation(s)
- David N Cooper
- Institute of Medical Genetics, School of Medicine, Cardiff University, Heath Park, Cardiff CF14 4XN, United Kingdom.
|
49
|
Xin F, Myers S, Li YF, Cooper DN, Mooney SD, Radivojac P. Structure-based kernels for the prediction of catalytic residues and their involvement in human inherited disease. Bioinformatics 2010; 26:1975-82. [PMID: 20551136 DOI: 10.1093/bioinformatics/btq319] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Indexed: 01/07/2023]
Abstract
MOTIVATION: Enzyme catalysis is involved in numerous biological processes and the disruption of enzymatic activity has been implicated in human disease. Despite this, various aspects of catalytic reactions are not completely understood, such as the mechanics of reaction chemistry and the geometry of catalytic residues within active sites. As a result, the computational prediction of catalytic residues has the potential to identify novel catalytic pockets, aid in the design of more efficient enzymes and also predict the molecular basis of disease. RESULTS: We propose a new kernel-based algorithm for the prediction of catalytic residues based on protein sequence, structure and evolutionary information. The method relies upon explicit modeling of similarity between residue-centered neighborhoods in protein structures. We present evidence that this algorithm evaluates favorably against established approaches, and also provides insights into the relative importance of the geometry, physicochemical properties and evolutionary conservation of catalytic residue activity. The new algorithm was used to identify known mutations associated with inherited disease whose molecular mechanism might be predicted to operate specifically through the loss or gain of catalytic residues. It should, therefore, provide a viable approach to identifying the molecular basis of disease in which the loss or gain of function is not caused solely by the disruption of protein stability. Our analysis suggests that both mechanisms are actively involved in human inherited disease. AVAILABILITY AND IMPLEMENTATION: Source code for the structural kernel is available at www.informatics.indiana.edu/predrag/.
Affiliation(s)
- Fuxiao Xin
- School of Informatics and Computing, Indiana University, Bloomington, IN 47408, USA
|
50
|
Li S, Iakoucheva LM, Mooney SD, Radivojac P. Loss of post-translational modification sites in disease. PACIFIC SYMPOSIUM ON BIOCOMPUTING. PACIFIC SYMPOSIUM ON BIOCOMPUTING 2009:337-47. [PMID: 19908386 DOI: 10.1142/9789814295291_0036] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.2] [Indexed: 12/24/2022]
Abstract
Understanding and predicting the molecular cause of disease is one of the major challenges for biology and medicine. One particular area of interest continues to be computational analyses of disease-associated amino acid substitutions. To this end, various studies have been performed to identify molecular functions disrupted by disease-causing mutations. Here, we investigate the influence of disease-associated mutations on post-translational modifications. In particular, we study the loss of modification target sites as a consequence of disease mutation. We find that about 5% of disease-associated mutations may affect known modification sites, either partially (4%) or fully (1%), compared to about 2% of putatively neutral polymorphisms. Most of the fifteen post-translational modification types analyzed were found to be disrupted at levels higher than expected by chance. Molecular functions and physicochemical properties at sites of disease mutation were also compared to those of neutral polymorphisms involved in the process of post-translational modification site disruption. Disease-associated mutations in the neighborhood of post-translationally modified sites were found to be enriched in mutations that change polarity, charge, and hydrophobicity of the wild-type amino acids. Overall, these results further suggest that disruption of modification sites is an important but not the major cause of human genetic disease.
Affiliation(s)
- Shuyan Li
- School of Informatics and Computing, Indiana University, Bloomington, IN 47408, USA
|