1
Campitelli P, Kazan IC, Hamilton S, Ozkan SB. Dynamic Allostery: Evolution's Double-Edged Sword in Protein Function and Disease. J Mol Biol 2025:169175. [PMID: 40286867 DOI: 10.1016/j.jmb.2025.169175] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/05/2025] [Revised: 04/21/2025] [Accepted: 04/21/2025] [Indexed: 04/29/2025]
Abstract
Allostery is a core mechanism in biology that allows proteins to communicate and regulate activity over long structural distances. While classical models of allostery focus on conformational changes triggered by ligand binding, dynamic allostery, in which protein function is modulated through alterations in thermal fluctuations without major conformational shifts, has emerged as a critical evolutionary mechanism. This review explores how evolution leverages dynamic allostery to fine-tune protein function through subtle mutations at distal sites, preserving core structural architecture while dramatically altering functional properties. Using a combination of computational approaches including Dynamic Flexibility Index (DFI), Dynamic Coupling Index (DCI), and vibrational density of states (VDOS) analysis, we demonstrate that functional adaptations in proteins often involve "hinge-shift" mechanisms, where redistribution of rigid and flexible regions modulates collective motions without changing the overall fold. This evolutionary principle is a double-edged sword: the same mechanisms that enable functional innovation also create vulnerabilities that can be exploited in disease states. Disease-associated variants frequently occur at positions highly coupled to functional sites despite being physically distant, forming Dynamic Allosteric Residue Couples (DARC sites). We demonstrate applications of these principles in understanding viral evolution, drug resistance, and capsid assembly dynamics. Understanding dynamic allostery provides critical insights into protein evolution and offers new avenues for therapeutic interventions targeting allosteric regulation.
Affiliation(s)
- Paul Campitelli
  - Department of Physics, Arizona State University, Tempe, AZ, United States; Center for Biological Physics, Arizona State University, Tempe, AZ, United States
- I Can Kazan
  - Department of Physics, Arizona State University, Tempe, AZ, United States; Center for Biological Physics, Arizona State University, Tempe, AZ, United States
- Sean Hamilton
  - Department of Physics, Arizona State University, Tempe, AZ, United States; Center for Biological Physics, Arizona State University, Tempe, AZ, United States
- S Banu Ozkan
  - Department of Physics, Arizona State University, Tempe, AZ, United States; Center for Biological Physics, Arizona State University, Tempe, AZ, United States

2
Johnson R, Li MM, Noori A, Queen O, Zitnik M. Graph Artificial Intelligence in Medicine. Annu Rev Biomed Data Sci 2024; 7:345-368. [PMID: 38749465 PMCID: PMC11344018 DOI: 10.1146/annurev-biodatasci-110723-024625] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/23/2024]
Abstract
In clinical artificial intelligence (AI), graph representation learning, mainly through graph neural networks and graph transformer architectures, stands out for its capability to capture intricate relationships and structures within clinical datasets. With diverse data, from patient records to imaging, graph AI models process data holistically by viewing modalities and entities within them as nodes interconnected by their relationships. Graph AI facilitates model transfer across clinical tasks, enabling models to generalize across patient populations without additional parameters and with minimal to no retraining. However, the importance of human-centered design and model interpretability in clinical decision-making cannot be overstated. Since graph AI models capture information through localized neural transformations defined on relational datasets, they offer both an opportunity and a challenge in elucidating model rationale. Knowledge graphs can enhance interpretability by aligning model-driven insights with medical knowledge. Emerging graph AI models integrate diverse data modalities through pretraining, facilitate interactive feedback loops, and foster human-AI collaboration, paving the way toward clinically meaningful predictions.
Affiliation(s)
- Ruth Johnson
  - Berkowitz Family Living Laboratory, Harvard Medical School, Boston, Massachusetts, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
- Michelle M Li
  - Bioinformatics and Integrative Genomics Program, Harvard Medical School, Boston, Massachusetts, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
- Ayush Noori
  - Department of Computer Science, Harvard John A. Paulson School of Engineering and Applied Sciences, Allston, Massachusetts, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
- Owen Queen
  - Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
- Marinka Zitnik
  - Harvard Data Science Initiative, Cambridge, Massachusetts, USA
  - Kempner Institute for the Study of Natural and Artificial Intelligence, Harvard University, Allston, Massachusetts, USA
  - Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA

3
Rollo C, Pancotti C, Birolo G, Rossi I, Sanavia T, Fariselli P. Influence of Model Structures on Predictors of Protein Stability Changes from Single-Point Mutations. Genes (Basel) 2023; 14:2228. [PMID: 38137050 PMCID: PMC10742815 DOI: 10.3390/genes14122228] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/25/2023] [Revised: 12/14/2023] [Accepted: 12/15/2023] [Indexed: 12/24/2023] Open
Abstract
Missense variation in genomes can affect protein structural stability and, in turn, cell physiology. Predicting the impact of those variations is relevant, and the best-performing computational tools exploit the protein structure information. However, most current protein sequence variants lack an experimentally resolved structure, and comparative or ab initio tools can provide one. Here, we evaluate the impact of model structures, compared to experimental structures, on the predictors of protein stability changes upon single-point mutations, where no significant changes are expected between the original and the mutated structures. We show that there are substantial differences among the computational tools. Methods that rely on coarse-grained representation are less sensitive to the underlying protein structures. In contrast, tools that exploit more detailed molecular representations are sensitive to structures generated from comparative modeling, even on single-residue substitutions.
Affiliation(s)
- Cesare Rollo
  - Department of Medical Sciences, University of Torino, 10126 Torino, Italy (G.B.); (I.R.); (T.S.); (P.F.)

4
Gharemirshamloo FR, Majumder R, Kumar S U, Doss C GP, Bamdad K, Frootan F, Un C. Effects of the pathological E200K mutation on human prion protein: A computational screening and molecular dynamics approach. J Cell Biochem 2023; 124:254-265. [PMID: 36565210 DOI: 10.1002/jcb.30359] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2022] [Revised: 12/06/2022] [Accepted: 12/08/2022] [Indexed: 12/25/2022]
Abstract
The human prion protein gene (PRNP) is mapped to the short arm of chromosome 20 (20pter-12). Prion disease is associated with mutations in the prion protein-encoding gene sequence. Earlier studies found that the mutation G127V in the PRNP increases protein stability. In contrast, the mutation E200K, which has the highest mutation rate in the prion protein, causes Creutzfeldt-Jakob disease (CJD) in humans and induces protein aggregation. We aimed to identify the structural mechanisms of E200K and G127V mutations causing CJD. We used a variety of bioinformatic algorithms, including SIFT, PolyPhen, I-Mutant, PhD-SNP, and SNPs&GO, to predict the association of the E200K mutation with prion disease. MD simulation is performed, and graphs for root mean square deviation, root mean square fluctuation, radius of gyration, DSSP, principal component analysis, porcupine, and free energy landscape are generated to confirm and prove the stability of the wild-type and mutant protein structures. The protein is analyzed for aggregation, and the results indicate more fluctuations in the protein structure during the simulation owing to the E200K mutation; however, the G127V mutation makes the protein structure stable against aggregation during the simulation.
Affiliation(s)
- Ranabir Majumder
  - School of Medical Science and Technology, Indian Institute of Technology Kharagpur, Kharagpur, India
- Udhaya Kumar S
  - Department of Integrative Biology, Laboratory of Integrative Genomics, School of Bio Sciences and Technology, Vellore Institute of Technology (VIT), Vellore, Tamil Nadu, India
- George Priya Doss C
  - Department of Integrative Biology, Laboratory of Integrative Genomics, School of Bio Sciences and Technology, Vellore Institute of Technology (VIT), Vellore, Tamil Nadu, India
- Kourosh Bamdad
  - Department of Biology, Payame Noor University, Tehran, Iran
- Fateme Frootan
  - Institute of Agricultural Biotechnology, National Institute of Genetic Engineering & Biotechnology (NIGEB), Tehran, Iran
- Cemal Un
  - Department of Biology, Division of Molecular Biology, Ege University, Izmir, Turkey

5
Benevenuta S, Birolo G, Sanavia T, Capriotti E, Fariselli P. Challenges in predicting stabilizing variations: An exploration. Front Mol Biosci 2023; 9:1075570. [PMID: 36685278 PMCID: PMC9849384 DOI: 10.3389/fmolb.2022.1075570] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 10/20/2022] [Accepted: 12/15/2022] [Indexed: 01/06/2023] Open
Abstract
An open challenge of computational and experimental biology is understanding the impact of non-synonymous DNA variations on protein function and, subsequently, human health. The effects of these variants on protein stability can be measured as the difference in the free energy of unfolding (ΔΔG) between the mutated structure of the protein and its wild-type form. Throughout the years, bioinformaticians have developed a wide variety of tools and approaches to predict the ΔΔG. Although the performance of these tools is highly variable, overall they are less accurate in predicting stabilizing variations than destabilizing ones. Here, we analyze the possible reasons for this difference by focusing on the relationship between experimentally measured ΔΔG and seven protein properties on three widely used datasets (S2648, VariBench, Ssym) and a recently introduced one (S669). These properties include protein structural information, different physical properties and statistical potentials. We found that two highly used input features, i.e., hydrophobicity and the Blosum62 substitution matrix, show a performance close to random choice when trying to separate stabilizing variants from either neutral or destabilizing ones. We then speculate that, since destabilizing variations are the most abundant class in the available datasets, the overall performance of the methods is higher when including features that improve the prediction for the destabilizing variants at the expense of the stabilizing ones. These findings highlight the need to design predictive methods that can also exploit input features highly correlated with the stabilizing variants. New tools should also be tested on a dataset that is not artificially balanced, reporting the performance on all three classes (i.e., stabilizing, neutral and destabilizing variants) and not only the overall results.
Affiliation(s)
- Giovanni Birolo
  - Department of Medical Sciences, University of Torino, Torino, Italy
- Tiziana Sanavia
  - Department of Medical Sciences, University of Torino, Torino, Italy
- Emidio Capriotti
  - Department of Pharmacy and Biotechnology (FaBiT), University of Bologna, Bologna, Italy
- Piero Fariselli
  - Department of Medical Sciences, University of Torino, Torino, Italy (corresponding author)

6
Petrizzelli F, Biagini T, Bianco SD, Liorni N, Napoli A, Castellana S, Mazza T. Connecting the dots: A practical evaluation of web-tools for describing protein dynamics as networks. Frontiers in Bioinformatics 2022; 2:1045368. [PMID: 36438625 PMCID: PMC9689706 DOI: 10.3389/fbinf.2022.1045368] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2022] [Accepted: 10/05/2022] [Indexed: 01/25/2023] Open
Abstract
Protein Structure Networks (PSNs) are a well-known mathematical model for estimation and analysis of the three-dimensional protein structure. Investigating the topological architecture of PSNs may help identify the crucial amino acid residues for protein stability and protein-protein interactions, as well as deduce any possible mutational effects. But because proteins go through conformational changes to give rise to essential biological functions, this has to be done dynamically over time. The most effective method to describe protein dynamics is molecular dynamics simulation, with the most popular software programs for manipulating simulations to infer interaction networks being RING, MD-TASK, and NAPS. Here, we compare the computational approaches used by these three tools, all of which are accessible as web servers, to understand the pathogenicity of missense mutations, and we discuss their potential applications as well as their advantages and disadvantages.
Affiliation(s)
- Francesco Petrizzelli
  - Bioinformatics Laboratory, Fondazione IRCCS Casa Sollievo della Sofferenza, San Giovanni Rotondo, Italy
- Tommaso Biagini
  - Bioinformatics Laboratory, Fondazione IRCCS Casa Sollievo della Sofferenza, San Giovanni Rotondo, Italy
- Salvatore Daniele Bianco
  - Bioinformatics Laboratory, Fondazione IRCCS Casa Sollievo della Sofferenza, San Giovanni Rotondo, Italy; Department of Experimental Medicine, Sapienza University of Rome, Rome, Italy
- Niccolò Liorni
  - Bioinformatics Laboratory, Fondazione IRCCS Casa Sollievo della Sofferenza, San Giovanni Rotondo, Italy; Department of Experimental Medicine, Sapienza University of Rome, Rome, Italy
- Alessandro Napoli
  - Bioinformatics Laboratory, Fondazione IRCCS Casa Sollievo della Sofferenza, San Giovanni Rotondo, Italy
- Stefano Castellana
  - Bioinformatics Laboratory, Fondazione IRCCS Casa Sollievo della Sofferenza, San Giovanni Rotondo, Italy
- Tommaso Mazza
  - Bioinformatics Laboratory, Fondazione IRCCS Casa Sollievo della Sofferenza, San Giovanni Rotondo, Italy (corresponding author)

7
Clementel D, Del Conte A, Monzon AM, Camagni GF, Minervini G, Piovesan D, Tosatto SCE. RING 3.0: fast generation of probabilistic residue interaction networks from structural ensembles. Nucleic Acids Res 2022; 50:W651-W656. [PMID: 35554554 PMCID: PMC9252747 DOI: 10.1093/nar/gkac365] [Citation(s) in RCA: 92] [Impact Index Per Article: 30.7] [Received: 03/23/2022] [Revised: 04/15/2022] [Accepted: 04/30/2022] [Indexed: 12/18/2022] Open
Abstract
Residue interaction networks (RINs) are used to represent residue contacts in protein structures. Thanks to the advances in network theory, RINs have been proved effective as an alternative to coordinate data in the analysis of complex systems. The RING server calculates high quality and reliable non-covalent molecular interactions based on geometrical parameters. Here, we present the new RING 3.0 version extending the previous functionality in several ways. The underlying software library has been re-engineered to improve speed by an order of magnitude. RING now also supports the mmCIF format and provides typed interactions for the entire PDB chemical component dictionary, including nucleic acids. Moreover, RING now employs probabilistic graphs, where multiple conformations (e.g. NMR or molecular dynamics ensembles) are mapped as weighted edges, opening up new ways to analyze structural data. The web interface has been expanded to include a simultaneous view of the RIN alongside a structure viewer, with both synchronized and clickable. Contact evolution across models (or time) is displayed as a heatmap and can help in the discovery of correlating interaction patterns. The web server, together with an extensive help and tutorial, is available from URL: https://ring.biocomputingup.it/.
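The idea of mapping multiple conformations onto weighted edges can be illustrated with a minimal sketch. It is not RING's algorithm: RING derives typed interactions from geometrical parameters, whereas this toy version assumes one representative coordinate per residue and a single hypothetical distance cutoff (`cutoff=5.0`). The function name `contact_frequencies` and the toy ensemble are illustrative only.

```python
from itertools import combinations
from math import dist


def contact_frequencies(models, cutoff=5.0):
    """Return {(res_i, res_j): fraction of models in which the pair is in contact}.

    models: list of conformations, each a dict mapping a residue id to one
            representative (x, y, z) coordinate (an assumption of this sketch).
    cutoff: hypothetical distance threshold, in the same units as the coordinates.
    """
    counts = {}
    for coords in models:
        # Visit every unordered residue pair once per conformation.
        for a, b in combinations(sorted(coords), 2):
            if dist(coords[a], coords[b]) <= cutoff:
                counts[(a, b)] = counts.get((a, b), 0) + 1
    n_models = len(models)
    # Edge weight = how often the contact appears across the ensemble.
    return {pair: c / n_models for pair, c in counts.items()}


# Toy two-model ensemble: residue 2 moves away from residue 1 and toward residue 3.
models = [
    {1: (0.0, 0.0, 0.0), 2: (3.0, 0.0, 0.0), 3: (9.0, 0.0, 0.0)},
    {1: (0.0, 0.0, 0.0), 2: (6.0, 0.0, 0.0), 3: (9.0, 0.0, 0.0)},
]
edges = contact_frequencies(models)
```

In this toy ensemble each contact is present in one of the two models, so both edges receive a weight of 0.5. Per-model contact tables of this kind are also the raw material for a contact-versus-model heatmap.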
Affiliation(s)
- Damiano Clementel
  - Department of Biomedical Sciences, University of Padova, Padova 35131, Italy
- Alessio Del Conte
  - Department of Biomedical Sciences, University of Padova, Padova 35131, Italy
- Giorgia F Camagni
  - Department of Biomedical Sciences, University of Padova, Padova 35131, Italy
- Giovanni Minervini
  - Department of Biomedical Sciences, University of Padova, Padova 35131, Italy
- Damiano Piovesan
  - Department of Biomedical Sciences, University of Padova, Padova 35131, Italy
- Silvio C E Tosatto
  - Department of Biomedical Sciences, University of Padova, Padova 35131, Italy

8
Ose NJ, Butler BM, Kumar A, Kazan IC, Sanderford M, Kumar S, Ozkan SB. Dynamic coupling of residues within proteins as a mechanistic foundation of many enigmatic pathogenic missense variants. PLoS Comput Biol 2022; 18:e1010006. [PMID: 35389981 PMCID: PMC9017885 DOI: 10.1371/journal.pcbi.1010006] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 09/20/2021] [Revised: 04/19/2022] [Accepted: 03/09/2022] [Indexed: 01/07/2023] Open
Abstract
Many pathogenic missense mutations are found in protein positions that are neither well-conserved nor fall in any known functional domains. Consequently, we lack any mechanistic underpinning of dysfunction caused by such mutations. We explored the disruption of allosteric dynamic coupling between these positions and the known functional sites as a possible mechanism for pathogenesis. In this study, we present an analysis of 591 pathogenic missense variants in 144 human enzymes that suggests that allosteric dynamic coupling of mutated positions with known active sites is a plausible biophysical mechanism and evidence of their functional importance. We illustrate this mechanism in a case study of β-Glucocerebrosidase (GCase) in which a vast majority of 94 sites harboring Gaucher disease-associated missense variants are located some distance away from the active site. An analysis of the conformational dynamics of GCase suggests that mutations on these distal sites cause changes in the flexibility of active site residues despite their distance, indicating a dynamic communication network throughout the protein. The disruption of the long-distance dynamic coupling caused by missense mutations may provide a plausible general mechanistic explanation for biological dysfunction and disease.
Affiliation(s)
- Nicholas J. Ose
  - Department of Physics and Center for Biological Physics, Arizona State University, Tempe, Arizona, United States of America
- Brandon M. Butler
  - Department of Physics and Center for Biological Physics, Arizona State University, Tempe, Arizona, United States of America
- Avishek Kumar
  - Department of Physics and Center for Biological Physics, Arizona State University, Tempe, Arizona, United States of America
- I. Can Kazan
  - Department of Physics and Center for Biological Physics, Arizona State University, Tempe, Arizona, United States of America
- Maxwell Sanderford
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, Pennsylvania, United States of America
  - Department of Biology, Temple University, Philadelphia, Pennsylvania, United States of America
- Sudhir Kumar
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, Pennsylvania, United States of America
  - Department of Biology, Temple University, Philadelphia, Pennsylvania, United States of America
  - Center for Genomic Medicine Research, King Abdulaziz University, Jeddah, Saudi Arabia
- S. Banu Ozkan
  - Department of Physics and Center for Biological Physics, Arizona State University, Tempe, Arizona, United States of America

9
Pancotti C, Benevenuta S, Birolo G, Alberini V, Repetto V, Sanavia T, Capriotti E, Fariselli P. Predicting protein stability changes upon single-point mutation: a thorough comparison of the available tools on a new dataset. Brief Bioinform 2022; 23:6502552. [PMID: 35021190 PMCID: PMC8921618 DOI: 10.1093/bib/bbab555] [Citation(s) in RCA: 61] [Impact Index Per Article: 20.3] [Received: 07/28/2021] [Revised: 11/29/2021] [Accepted: 12/05/2021] [Indexed: 12/13/2022] Open
Abstract
Predicting the difference in thermodynamic stability between protein variants is crucial for protein design and understanding the genotype-phenotype relationships. So far, several computational tools have been created to address this task. Nevertheless, most of them have been trained or optimized on the same and ‘all’ available data, making a fair comparison unfeasible. Here, we introduce a novel dataset, collected and manually cleaned from the latest version of the ThermoMutDB database, consisting of 669 variants not included in the most widely used training datasets. The prediction performance and the ability to satisfy the antisymmetry property by considering both direct and reverse variants were evaluated across 21 different tools. The Pearson correlations of the tested tools were in the ranges of 0.21–0.5 and 0–0.45 for the direct and reverse variants, respectively. When both direct and reverse variants are considered, the antisymmetric methods perform better achieving a Pearson correlation in the range of 0.51–0.62. The tested methods seem relatively insensitive to the physiological conditions, performing well also on the variants measured with more extreme pH and temperature values. A common issue with all the tested methods is the compression of the ΔΔG predictions toward zero. Furthermore, the thermodynamic stability of the most significantly stabilizing variants was found to be more challenging to predict. This study is the most extensive comparison of prediction methods using an entirely novel set of variants never tested before.
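The evaluation described in this abstract (Pearson correlation computed separately on direct variants, on reverse variants, and on both together) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' benchmarking code. All numbers are invented toy values, and the reverse-variant measurements are obtained by negating the direct ones, which follows from the antisymmetry property.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


# Toy data (hypothetical values, not taken from any dataset).
measured_direct = [-1.2, 0.4, -2.5, 0.9]
predicted_direct = [-0.6, 0.1, -1.0, 0.2]

# By antisymmetry, the measured value for a reverse variant is the negated direct value.
measured_reverse = [-m for m in measured_direct]
# A hypothetical predictor that is not antisymmetric gives unrelated reverse scores.
predicted_reverse = [0.3, 0.0, 0.4, -0.1]

r_direct = pearson(measured_direct, predicted_direct)
r_reverse = pearson(measured_reverse, predicted_reverse)
r_combined = pearson(measured_direct + measured_reverse,
                     predicted_direct + predicted_reverse)
```

Reporting the three coefficients side by side makes it visible when a tool fits direct variants well but fails on their reverses.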
Affiliation(s)
- Corrado Pancotti
  - Department of Medical Sciences, University of Torino, Via Santena 19, 10126 Torino, Italy
- Silvia Benevenuta
  - Department of Medical Sciences, University of Torino, Via Santena 19, 10126 Torino, Italy
- Giovanni Birolo
  - Department of Medical Sciences, University of Torino, Via Santena 19, 10126 Torino, Italy
- Virginia Alberini
  - Department of Medical Sciences, University of Torino, Via Santena 19, 10126 Torino, Italy
- Valeria Repetto
  - Department of Medical Sciences, University of Torino, Via Santena 19, 10126 Torino, Italy
- Tiziana Sanavia
  - Department of Medical Sciences, University of Torino, Via Santena 19, 10126 Torino, Italy
- Emidio Capriotti
  - Department of Pharmacy and Biotechnology (FaBiT), University of Bologna, Bologna, Italy
- Piero Fariselli
  - Department of Medical Sciences, University of Torino, Via Santena 19, 10126 Torino, Italy

10
Lai J, Yang J, Gamsiz Uzun ED, Rubenstein BM, Sarkar IN. LYRUS: a machine learning model for predicting the pathogenicity of missense variants. Bioinformatics Advances 2021; 2:vbab045. [PMID: 35036922 PMCID: PMC8754197 DOI: 10.1093/bioadv/vbab045] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 07/08/2021] [Revised: 12/08/2021] [Accepted: 12/21/2021] [Indexed: 01/27/2023]
Abstract
SUMMARY: Single amino acid variations (SAVs) are a primary contributor to variations in the human genome. Identifying pathogenic SAVs can provide insights into the genetic architecture of complex diseases. Most approaches for predicting the functional effects or pathogenicity of SAVs rely on either sequence or structural information. This study presents 〈Lai Yang Rubenstein Uzun Sarkar〉 (LYRUS), a machine learning method that uses an XGBoost classifier to predict the pathogenicity of SAVs. LYRUS incorporates five sequence-based, six structure-based and four dynamics-based features. Uniquely, LYRUS includes a newly proposed sequence co-evolution feature called the variation number. LYRUS was trained using a dataset that contains 4363 protein structures corresponding to 22 639 SAVs from the ClinVar database, and tested using the VariBench testing dataset. Performance analysis showed that LYRUS achieved comparable performance to current variant effect predictors. LYRUS's performance was also benchmarked against six Deep Mutational Scanning datasets for PTEN and TP53. AVAILABILITY AND IMPLEMENTATION: LYRUS is freely available and the source code can be found at https://github.com/jiaying2508/LYRUS. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics Advances online.
Affiliation(s)
- Jiaying Lai
  - Center for Biomedical Informatics, Brown University, Providence, RI 02903, USA; Center for Computational Molecular Biology, Brown University, Providence, RI 02906, USA
- Jordan Yang
  - Department of Chemistry, Brown University, Providence, RI 02906, USA
- Ece D Gamsiz Uzun
  - Center for Computational Molecular Biology, Brown University, Providence, RI 02906, USA; Department of Pathology and Laboratory Medicine, Brown University Alpert Medical School, Providence, RI 02903, USA; Department of Pathology, Rhode Island Hospital, Providence, RI 02903, USA
- Brenda M Rubenstein
  - Center for Computational Molecular Biology, Brown University, Providence, RI 02906, USA; Department of Chemistry, Brown University, Providence, RI 02906, USA (corresponding author)
- Indra Neil Sarkar
  - Center for Biomedical Informatics, Brown University, Providence, RI 02903, USA; Rhode Island Quality Institute, Providence, RI 02908, USA (corresponding author)

11
A Deep-Learning Sequence-Based Method to Predict Protein Stability Changes Upon Genetic Variations. Genes (Basel) 2021; 12:genes12060911. [PMID: 34204764 PMCID: PMC8231498 DOI: 10.3390/genes12060911] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Received: 05/14/2021] [Revised: 06/08/2021] [Accepted: 06/09/2021] [Indexed: 01/17/2023] Open
Abstract
Several studies have linked disruptions of protein stability and its normal functions to disease. Therefore, during the last few decades, many tools have been developed to predict the free energy changes upon protein residue variations. Most of these methods require both sequence and structure information to obtain reliable predictions. However, the lower number of protein structures available with respect to their sequences, due to experimental issues, drastically limits the application of these tools. In addition, current methodologies ignore the antisymmetric property characterizing the thermodynamics of the protein stability: a variation from wild-type to a mutated form of the protein structure (X_W → X_M) and its reverse process (X_M → X_W) must have opposite values of the free energy difference (ΔΔG_WM = −ΔΔG_MW). Here we propose ACDC-NN-Seq, a deep neural network system that exploits the sequence information and is able to incorporate into its architecture the antisymmetry property. To our knowledge, this is the first convolutional neural network to predict protein stability changes relying solely on the protein sequence. We show that ACDC-NN-Seq compares favorably with the existing sequence-based methods.
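The antisymmetry property stated in this abstract can be made concrete with a short sketch. It shows a generic antisymmetrization wrapper, f_anti(W, M) = [f(W, M) − f(M, W)] / 2, which satisfies the property by construction. This is not the ACDC-NN-Seq architecture, whose internals the abstract does not describe. The scorer `toy_predictor`, the residue set, and the sequences are hypothetical and exist only to show a biased predictor being corrected.

```python
HYDROPHOBIC = set("AVILMFW")  # illustrative residue set, an assumption of this sketch


def toy_predictor(wild, mutant):
    """Hypothetical scorer with a constant offset, so it violates antisymmetry."""
    gain = sum(r in HYDROPHOBIC for r in mutant) - sum(r in HYDROPHOBIC for r in wild)
    return 0.5 * gain - 0.3


def antisymmetrize(predict):
    """Wrap any predictor so that wrapped(W, M) == -wrapped(M, W) always holds."""
    def wrapped(wild, mutant):
        return 0.5 * (predict(wild, mutant) - predict(mutant, wild))
    return wrapped


wild_seq, mutant_seq = "MKTA", "MKVA"  # toy sequences differing at one position

# The raw scorer: direct and reverse predictions do not cancel.
raw_violation = toy_predictor(wild_seq, mutant_seq) + toy_predictor(mutant_seq, wild_seq)

# The wrapped scorer: direct and reverse predictions cancel.
anti = antisymmetrize(toy_predictor)
anti_violation = anti(wild_seq, mutant_seq) + anti(mutant_seq, wild_seq)
```

The sum of the direct and reverse predictions is a simple diagnostic: it is zero for a perfectly antisymmetric method and grows with the bias of the predictor.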

12
Yazar M, Özbek P. In Silico Tools and Approaches for the Prediction of Functional and Structural Effects of Single-Nucleotide Polymorphisms on Proteins: An Expert Review. OMICS: A Journal of Integrative Biology 2020; 25:23-37. [PMID: 33058752 DOI: 10.1089/omi.2020.0141] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Indexed: 12/18/2022]
Abstract
Single-nucleotide polymorphisms (SNPs) are single-base variants that contribute to human biological variation and pathogenesis of many human diseases. Among all SNP types, nonsynonymous single-nucleotide polymorphisms (nsSNPs) can alter many structural, biochemical, and functional features of a protein such as folding characteristics, charge distribution, stability, dynamics, and interactions with other proteins/nucleotides. These modifications in the protein structure can lead nsSNPs to be closely associated with many multifactorial diseases such as cancer, diabetes, and neurodegenerative diseases. Predicting structural and functional effects of nsSNPs with experimental approaches can be time-consuming and costly; hence, computational prediction tools and algorithms are being widely and increasingly utilized in biology and medical research. This expert review examines the in silico tools and algorithms for the prediction of functional or structural effects of SNP variants, in addition to the description of the phenotypic effects of nsSNPs on protein structure, association between pathogenicity of variants, and functional or structural features of disease-associated variants. Finally, case studies investigating the functional and structural effects of nsSNPs on selected protein structures are highlighted. We conclude that creating a consistent workflow with a combination of in silico approaches or tools should be considered to increase the performance, accuracy, and precision of the biological and clinical predictions made in silico.
Affiliation(s)
- Metin Yazar
  - Department of Bioengineering, Marmara University, Göztepe, İstanbul, Turkey; Department of Genetics and Bioengineering, Istanbul Okan University, Tuzla, Istanbul, Turkey
- Pemra Özbek
  - Department of Bioengineering, Marmara University, Göztepe, İstanbul, Turkey

13
Munir A, Vedithi SC, Chaplin AK, Blundell TL. Genomics, Computational Biology and Drug Discovery for Mycobacterial Infections: Fighting the Emergence of Resistance. Front Genet 2020; 11:965. [PMID: 33101362 PMCID: PMC7498718 DOI: 10.3389/fgene.2020.00965] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 05/16/2020] [Accepted: 07/31/2020] [Indexed: 12/14/2022] Open
Abstract
Tuberculosis (TB) and leprosy are mycobacterial infections caused by Mycobacterium tuberculosis and Mycobacterium leprae respectively. These diseases continue to be endemic in developing countries where the cost of new medicines presents major challenges. The situation is further exacerbated by the emergence of resistance to many front-line antibiotics. A priority now is to design new antimycobacterials that are not only effective in combatting the diseases but are also less likely to give rise to resistance. In both these respects understanding the structure of drug targets in M. tuberculosis and M. leprae is crucial. In this review we describe structure-guided approaches to understanding the impacts of mutations that give rise to antimycobacterial resistance and the use of this information in the design of new medicines.
Affiliation(s)
- Asma Munir
- Department of Biochemistry, University of Cambridge, Cambridge, United Kingdom
- Amanda K Chaplin
- Department of Biochemistry, University of Cambridge, Cambridge, United Kingdom
- Tom L Blundell
- Department of Biochemistry, University of Cambridge, Cambridge, United Kingdom
14
Sanghera DK, Hopkins R, Malone-Perez MW, Bejar C, Tan C, Mussa H, Whitby P, Fowler B, Rao CV, Fung KA, Lightfoot S, Frazer JK. Targeted sequencing of candidate genes of dyslipidemia in Punjabi Sikhs: Population-specific rare variants in GCKR promote ectopic fat deposition. PLoS One 2019; 14:e0211661. [PMID: 31369557 PMCID: PMC6675050 DOI: 10.1371/journal.pone.0211661] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 01/17/2019] [Accepted: 05/28/2019] [Indexed: 12/18/2022] Open
Abstract
Dyslipidemia is a well-established risk factor for cardiovascular diseases. Although advances in genome-wide technologies have enabled the discovery of hundreds of genes associated with blood lipid phenotypes, most of the heritability remains unexplained. Here we performed targeted resequencing of 13 bona fide candidate genes of dyslipidemia to identify the underlying biological functions. We sequenced 940 Sikh subjects with extreme serum levels of hypertriglyceridemia (HTG) and 2,355 subjects were used for replication studies; all 3,295 participants were part of the Asian Indians Diabetic Heart Study. Gene-centric analysis revealed burden of variants for increasing HTG risk in GCKR (p = 2.1×10⁻⁵), LPL (p = 1.6×10⁻³) and MLXIPL (p = 1.6×10⁻²) genes. Of these, three missense and damaging variants within GCKR were further examined for functional consequences in vivo using a transgenic zebrafish model. All three mutations were South Asian population-specific and were largely absent in other multiethnic populations of Exome Aggregation Consortium. We built different transgenic models of human GCKR with and without mutations and analyzed the effects of dietary changes in vivo. Despite the short feeding period, profound phenotypic changes were apparent in hepatocyte histology and fat deposition associated with increased expression of GCKR in response to a high fat diet (HFD). Liver histology of the GCKRmut showed severe fatty metamorphosis which correlated with a ~7-fold increase in the mRNA expression in the GCKRmut fish even in the absence of a high fat diet. These findings suggest that functionally disruptive GCKR variants not only increase the risk of HTG but may enhance ectopic lipid/fat storage defects in the absence of obesity and HFD. To our knowledge, this is the first transgenic zebrafish model of a putative human disease gene built to accurately assess the influence of genetic changes and their phenotypic consequences in vivo.
Affiliation(s)
- Dharambir K. Sanghera
- Department of Pediatrics, Section of Genetics, College of Medicine, University of Oklahoma Health Sciences Center, Oklahoma City, Oklahoma, United States of America
- Department of Pharmaceutical Sciences, University of Oklahoma Health Sciences Center, Oklahoma City, Oklahoma, United States of America
- Oklahoma Center for Neuroscience, University of Oklahoma Health Sciences Center, Oklahoma City, Oklahoma, United States of America
- Harold Hamm Diabetes Center, University of Oklahoma Health Sciences Center, Oklahoma City, Oklahoma, United States of America
- Ruth Hopkins
- Department of Pediatrics, Section of Genetics, College of Medicine, University of Oklahoma Health Sciences Center, Oklahoma City, Oklahoma, United States of America
- Megan W. Malone-Perez
- Department of Pediatrics, Section of Pediatric Hematology-Oncology, College of Medicine, University of Oklahoma Health Sciences Center, Oklahoma City, Oklahoma, United States of America
- Cynthia Bejar
- Department of Pediatrics, Section of Genetics, College of Medicine, University of Oklahoma Health Sciences Center, Oklahoma City, Oklahoma, United States of America
- Chengcheng Tan
- Department of Pediatrics, Section of Genetics, College of Medicine, University of Oklahoma Health Sciences Center, Oklahoma City, Oklahoma, United States of America
- Huda Mussa
- Department of Pediatrics, Section of Infectious Diseases, College of Medicine, University of Oklahoma Health Sciences Center, Oklahoma City, Oklahoma, United States of America
- Paul Whitby
- Department of Pediatrics, Section of Infectious Diseases, College of Medicine, University of Oklahoma Health Sciences Center, Oklahoma City, Oklahoma, United States of America
- Ben Fowler
- Oklahoma Medical Research Foundation, Imaging Core Facility, Oklahoma City, Oklahoma, United States of America
- Chinthapally V. Rao
- Center for Cancer Prevention and Drug Development, Stephenson Cancer Center, University of Oklahoma Health Sciences Center, Oklahoma City, Oklahoma, United States of America
- KarMing A. Fung
- Department of Pathology, University of Oklahoma Health Sciences Center, Oklahoma City, Oklahoma, United States of America
- Stan Lightfoot
- Department of Surgery, University of Oklahoma Health Sciences Center, Oklahoma City, Oklahoma, United States of America
- J. Kimble Frazer
- Department of Pediatrics, Section of Pediatric Hematology-Oncology, College of Medicine, University of Oklahoma Health Sciences Center, Oklahoma City, Oklahoma, United States of America
15
Fassio AV, Martins PM, Guimarães SDS, Junior SSA, Ribeiro VS, de Melo-Minardi RC, Silveira SDA. Vermont: a multi-perspective visual interactive platform for mutational analysis. BMC Bioinformatics 2017; 18:403. [PMID: 28929973 PMCID: PMC5606220 DOI: 10.1186/s12859-017-1789-3] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND A huge amount of data about genomes and sequence variation is available and continues to grow on a large scale, which makes experimentally characterizing these mutations infeasible regarding disease association and effects on protein structure and function. Therefore, reliable computational approaches are needed to support the understanding of mutations and their impacts. Here, we present VERMONT 2.0, a visual interactive platform that combines sequence and structural parameters with interactive visualizations to make the impact of protein point mutations more understandable. RESULTS We aimed to contribute a novel visual analytics oriented method to analyze and gain insight on the impact of protein point mutations. To assess the ability of VERMONT to do this, we visually examined a set of mutations that were experimentally characterized to determine if VERMONT could identify damaging mutations and why they can be considered so. CONCLUSIONS VERMONT allowed us to understand mutations by interpreting position-specific structural and physicochemical properties. Additionally, we note some specific positions we believe have an impact on protein function/structure in the case of mutation.
Affiliation(s)
- Alexandre V Fassio
- Department of Computer Science, Universidade Federal de Minas Gerais, 6627, Antônio Carlos avenue, Pampulha, Belo Horizonte, 31270-901, Brazil; Department of Biochemistry and Immunology, Universidade Federal de Minas Gerais, 6627, Antônio Carlos avenue, Pampulha, Belo Horizonte, 31270-901, Brazil
- Pedro M Martins
- Department of Computer Science, Universidade Federal de Minas Gerais, 6627, Antônio Carlos avenue, Pampulha, Belo Horizonte, 31270-901, Brazil; Department of Biochemistry and Immunology, Universidade Federal de Minas Gerais, 6627, Antônio Carlos avenue, Pampulha, Belo Horizonte, 31270-901, Brazil
- Samuel da S Guimarães
- Department of Computer Science, Universidade Federal de Viçosa, Peter Henry Rolfs avenue, Campus Universitário, Viçosa, 36570-900, Brazil
- Sócrates S A Junior
- Department of Computer Science, Universidade Federal de Viçosa, Peter Henry Rolfs avenue, Campus Universitário, Viçosa, 36570-900, Brazil
- Vagner S Ribeiro
- Department of Computer Science, Universidade Federal de Viçosa, Peter Henry Rolfs avenue, Campus Universitário, Viçosa, 36570-900, Brazil
- Raquel C de Melo-Minardi
- Department of Computer Science, Universidade Federal de Minas Gerais, 6627, Antônio Carlos avenue, Pampulha, Belo Horizonte, 31270-901, Brazil
- Sabrina de A Silveira
- Department of Computer Science, Universidade Federal de Viçosa, Peter Henry Rolfs avenue, Campus Universitário, Viçosa, 36570-900, Brazil
16
Fokas AS, Cole DJ, Ahnert SE, Chin AW. Residue Geometry Networks: A Rigidity-Based Approach to the Amino Acid Network and Evolutionary Rate Analysis. Sci Rep 2016; 6:33213. [PMID: 27623708 PMCID: PMC5021933 DOI: 10.1038/srep33213] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 04/05/2016] [Accepted: 08/12/2016] [Indexed: 01/23/2023] Open
Abstract
Amino acid networks (AANs) abstract the protein structure by recording the amino acid contacts and can provide insight into protein function. Herein, we describe a novel AAN construction technique that employs the rigidity analysis tool, FIRST, to build the AAN, which we refer to as the residue geometry network (RGN). We show that this new construction can be combined with network theory methods to include the effects of allowed conformal motions and local chemical environments. Importantly, this is done without costly molecular dynamics simulations required by other AAN-related methods, which allows us to analyse large proteins and/or data sets. We have calculated the centrality of the residues belonging to 795 proteins. The results display a strong, negative correlation between residue centrality and the evolutionary rate. Furthermore, among residues with high closeness, those with low degree were particularly strongly conserved. Random walk simulations using the RGN were also successful in identifying allosteric residues in proteins involved in GPCR signalling. The dynamic function of these residues largely remains hidden in the traditional distance-cutoff construction technique. Despite being constructed from only the crystal structure, the results in this paper suggest that the RGN can identify residues that fulfil a dynamical function.
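As a rough illustration of the two centrality measures named in this abstract (closeness and degree), the sketch below computes both on a tiny residue network. The five-node edge list and the helper names `build_adjacency` and `closeness_centrality` are hypothetical and chosen only for illustration; the network is not built with FIRST and is not the authors' RGN construction.

```python
from collections import deque


def build_adjacency(edges):
    """Turn an undirected edge list into an adjacency dict."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def closeness_centrality(adj):
    """Closeness = (reachable nodes) / (sum of shortest-path lengths), via BFS."""
    scores = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total = sum(dist.values())
        scores[source] = (len(dist) - 1) / total if total else 0.0
    return scores


# Hypothetical toy network: residues A..E, edges are contacts (assumption).
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("C", "E")]
adj = build_adjacency(edges)
closeness = closeness_centrality(adj)
degree = {node: len(neighbours) for node, neighbours in adj.items()}

for node in sorted(adj):
    print(node, "degree =", degree[node], "closeness =", round(closeness[node], 3))
```

In this toy graph, residue C has the highest closeness (0.8) and the highest degree (3), while B combines relatively high closeness with a lower degree, which is the kind of position the abstract reports as particularly strongly conserved.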
Affiliation(s)
- Alexander S. Fokas
- Theory of Condensed Matter Group, Cavendish Laboratory, 19 JJ Thomson Avenue, CB3 0HE, Cambridge, U.K.
- Daniel J. Cole
- Theory of Condensed Matter Group, Cavendish Laboratory, 19 JJ Thomson Avenue, CB3 0HE, Cambridge, U.K.
- Sebastian E. Ahnert
- Theory of Condensed Matter Group, Cavendish Laboratory, 19 JJ Thomson Avenue, CB3 0HE, Cambridge, U.K.
- Alex W. Chin
- Theory of Condensed Matter Group, Cavendish Laboratory, 19 JJ Thomson Avenue, CB3 0HE, Cambridge, U.K.
17
Kumar A, Butler BM, Kumar S, Ozkan SB. Integration of structural dynamics and molecular evolution via protein interaction networks: a new era in genomic medicine. Curr Opin Struct Biol 2015; 35:135-42. [PMID: 26684487 PMCID: PMC4856467 DOI: 10.1016/j.sbi.2015.11.002] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Received: 08/18/2015] [Revised: 11/03/2015] [Accepted: 11/05/2015] [Indexed: 01/08/2023]
Abstract
Sequencing technologies are revealing many new non-synonymous single nucleotide variants (nsSNVs) in each personal exome. To assess their functional impacts, comparative genomics is frequently employed to predict if they are benign or not. However, evolutionary analysis alone is insufficient, because it misdiagnoses many disease-associated nsSNVs, such as those at positions involved in protein interfaces, and because evolutionary predictions do not provide mechanistic insights into functional change or loss. Structural analyses can aid in overcoming both of these problems by incorporating conformational dynamics and allostery in nsSNV diagnosis. Finally, protein-protein interaction networks using systems-level methodologies shed light onto disease etiology and pathogenesis. Bridging these network approaches with structurally resolved protein interactions and dynamics will advance genomic medicine.
Affiliation(s)
- Avishek Kumar
- Department of Physics and Center for Biological Physics, Arizona State University, Tempe, AZ 85281, United States
- Brandon M Butler
- Department of Physics and Center for Biological Physics, Arizona State University, Tempe, AZ 85281, United States
- Sudhir Kumar
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA 19122, United States; Department of Biology, Temple University, Philadelphia, PA 19122, United States; Center for Genomic Medicine and Research, King Abdulaziz University, Jeddah, Saudi Arabia
- S Banu Ozkan
- Department of Physics and Center for Biological Physics, Arizona State University, Tempe, AZ 85281, United States
18
Computational approaches to study the effects of small genomic variations. J Mol Model 2015; 21:251. [PMID: 26350246 DOI: 10.1007/s00894-015-2794-y] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Received: 04/08/2015] [Accepted: 08/23/2015] [Indexed: 10/23/2022]
Abstract
Advances in DNA sequencing technologies have led to an avalanche-like increase in the number of gene sequences deposited in public databases over the last decade as well as the detection of an enormous number of previously unseen nucleotide variants therein. Given the size and complex nature of the genome-wide sequence variation data, as well as the rate of data generation, experimental characterization of the disease association of each of these variations or their effects on protein structure/function would be costly, laborious, time-consuming, and essentially impossible. Thus, in silico methods to predict the functional effects of sequence variations are constantly being developed. In this review, we summarize the major computational approaches and tools that are aimed at the prediction of the functional effect of mutations, and describe the state-of-the-art databases that can be used to obtain information about mutation significance. We also discuss future directions in this highly competitive field.
19
Butler BM, Gerek ZN, Kumar S, Ozkan SB. Conformational dynamics of nonsynonymous variants at protein interfaces reveals disease association. Proteins 2015; 83:428-35. [PMID: 25546381 DOI: 10.1002/prot.24748] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.7] [Received: 09/17/2014] [Revised: 11/20/2014] [Accepted: 12/10/2014] [Indexed: 12/12/2022]
Abstract
Recent studies have shown that the protein interface sites between individual monomeric units in biological assemblies are enriched in disease-associated non-synonymous single nucleotide variants (nsSNVs). To elucidate the mechanistic underpinning of this observation, we investigated the conformational dynamic properties of protein interface sites through a site-specific structural dynamic flexibility metric (dfi) for 333 multimeric protein assemblies. dfi measures the dynamic resilience of a single residue to perturbations that occurred in the rest of the protein structure and identifies sites contributing the most to functionally critical dynamics. Analysis of dfi profiles of over a thousand positions harboring variation revealed that amino acid residues at interfaces have lower average dfi (31%) than those present at non-interfaces (50%), which means that protein interfaces have less dynamic flexibility. Interestingly, interface sites with disease-associated nsSNVs have significantly lower average dfi (23%) as compared to those of neutral nsSNVs (42%), which directly relates structural dynamics to functional importance. We found that less conserved interface positions show much lower dfi for disease nsSNVs as compared to neutral nsSNVs. In this case, dfi is better as compared to the accessible surface area metric, which is based on the static protein structure. Overall, our proteome-wide conformational dynamic analysis indicates that certain interface sites play a critical role in functionally related dynamics (i.e., those with low dfi values), therefore mutations at those sites are more likely to be associated with disease.
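The percentages quoted in this abstract read as rank-normalised dfi values. The sketch below shows one way such a percentile rank could be computed and then averaged per group. It assumes that a residue's percentile is the fraction of residues whose raw score is less than or equal to its own. That definition, the raw scores, the interface labels, and the function names are all assumptions for illustration, not values or code from the paper.

```python
def percentile_rank(scores):
    """Fraction of residues whose raw score is <= each residue's score (assumed definition)."""
    n = len(scores)
    return [sum(1 for other in scores if other <= s) / n for s in scores]


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


# Hypothetical raw dfi-like scores for five residues (assumption).
raw_dfi = [0.02, 0.10, 0.05, 0.40, 0.08]
# Hypothetical labels: True means the residue sits at a protein interface.
is_interface = [True, False, True, False, False]

pct = percentile_rank(raw_dfi)
interface_mean = mean([p for p, flag in zip(pct, is_interface) if flag])
other_mean = mean([p for p, flag in zip(pct, is_interface) if not flag])

print("percentile ranks:", pct)
print("mean at interface:", interface_mean)
print("mean elsewhere:", other_mean)
```

With these made-up inputs the interface residues have the lower mean percentile, mirroring the direction of the comparison reported in the abstract (lower dfi at interfaces), though the numbers themselves carry no biological meaning.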
20
Soundararajan V, Aravamudan M. Global connectivity of hub residues in Oncoprotein structures encodes genetic factors dictating personalized drug response to targeted Cancer therapy. Sci Rep 2014; 4:7294. [PMID: 25465236 PMCID: PMC4252896 DOI: 10.1038/srep07294] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 06/20/2014] [Accepted: 11/14/2014] [Indexed: 11/09/2022] Open
Abstract
The efficacy and mechanisms of therapeutic action are largely described by atomic bonds and interactions local to drug binding sites. Here we introduce global connectivity analysis as a high-throughput computational assay of therapeutic action--inspired by the Google page rank algorithm that unearths most "globally connected" websites from the information-dense world wide web (WWW). We execute short timescale (30 ps) molecular dynamics simulations with high sampling frequency (0.01 ps), to identify amino acid residue hubs whose global connectivity dynamics are characteristic of the ligand or mutation associated with the target protein. We find that unexpected allosteric hubs--up to 20 Å from the ATP binding site, but within 5 Å of the phosphorylation site--encode the Gibbs free energy of inhibition (ΔG(inhibition)) for select protein kinase-targeted cancer therapeutics. We further find that clinically relevant somatic cancer mutations implicated in both drug resistance and personalized drug sensitivity can be predicted in a high-throughput fashion. Our results establish global connectivity analysis as a potent assay of protein functional modulation. This sets the stage for unearthing disease-causal exome mutations and motivates forecast of clinical drug response on a patient-by-patient basis. We suggest incorporation of structure-guided genetic inference assays into pharmaceutical and healthcare Oncology workflows.
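The abstract describes its connectivity measure only as inspired by PageRank, so the following is a plain PageRank power iteration on a small undirected contact graph, offered as a sketch of the underlying idea rather than the authors' assay. The graph, the damping factor of 0.85, the iteration count, and the function name `pagerank` are assumptions for illustration.

```python
def pagerank(adj, damping=0.85, iterations=100):
    """Plain PageRank by power iteration on an adjacency dict."""
    nodes = list(adj)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Every node receives the same teleportation share.
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for u in nodes:
            neighbours = adj[u]
            if not neighbours:
                # A node with no contacts spreads its rank evenly.
                for v in nodes:
                    new_rank[v] += damping * rank[u] / n
            else:
                share = damping * rank[u] / len(neighbours)
                for v in neighbours:
                    new_rank[v] += share
        rank = new_rank
    return rank


# Hypothetical contact graph: one hub residue touching four others (assumption).
adj = {
    "hub": ["r1", "r2", "r3", "r4"],
    "r1": ["hub"],
    "r2": ["hub"],
    "r3": ["hub"],
    "r4": ["hub"],
}
rank = pagerank(adj)
top = max(rank, key=rank.get)
print("most globally connected residue:", top)
```

In a real application the graph would presumably be rebuilt from each simulation frame so that changes in rank over time can be tracked; that step is omitted here.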
21
Abstract
BACKGROUND We represent the protein structure of scTIM with a graph-theoretic model. We construct a hierarchical graph with three layers - a top level, a midlevel and a bottom level. The top level graph is a representation of the protein in which its vertices each represent a substructure of the protein. In turn, each substructure of the protein is represented by a graph whose vertices are amino acids. Finally, each amino acid is represented as a graph where the vertices are atoms. We use this representation to model the effects of a mutation on the protein. METHODS There are 19 vertices (substructures) in the top level graph and thus there are 19 distinct graphs at the midlevel. The vertices of each of the 19 graphs at the midlevel represent amino acids. Each amino acid is represented by a graph where the vertices are atoms in the residue structure. All edges are determined by proximity in the protein's 3D structure. The vertices in the bottom level are labelled by the corresponding molecular mass of the atom that it represents. We use graph-theoretic measures that incorporate vertex weights to assign graph based attributes to the amino acid graphs. The attributes of the corresponding amino acids are used as vertex weights for the substructure graphs at the midlevel. Graph-theoretic measures based on vertex weighted graphs are subsequently calculated for each of the midlevel graphs. Finally, the vertices of the top level graph are weighted with attributes of the corresponding substructure graph in the midlevel. RESULTS We can visualize which mutations are more influential than others by using properties such as vertex size to correspond with an increase or decrease in a graph-theoretic measure. Global graph-theoretic measures such as the number of triangles or the number of spanning trees can change as the result. Hence this method provides a way to visualize these global changes resulting from a small, seemingly inconsequential local change. 
CONCLUSIONS This modelling method provides a novel approach to the visualization of protein structures and the consequences of amino acid deletions, insertions or substitutions and provides a new way to gain insight on the consequences of diseases caused by genetic mutations.
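The two global measures named in the RESULTS, the number of triangles and the number of spanning trees, can be sketched on a single small graph to show how one local change shifts both. The four-vertex graph below is hypothetical, vertex weights are ignored, and the spanning-tree count uses Kirchhoff's matrix-tree theorem (a standard result, not necessarily the authors' implementation).

```python
from fractions import Fraction
from itertools import combinations


def count_triangles(vertices, edges):
    """Count vertex triples whose three connecting edges are all present."""
    edge_set = {frozenset(e) for e in edges}
    return sum(
        1
        for a, b, c in combinations(vertices, 3)
        if {frozenset((a, b)), frozenset((b, c)), frozenset((a, c))} <= edge_set
    )


def count_spanning_trees(vertices, edges):
    """Matrix-tree theorem: determinant of the Laplacian minus one row and column."""
    index = {v: i for i, v in enumerate(vertices)}
    n = len(vertices)
    lap = [[Fraction(0)] * n for _ in range(n)]
    for u, v in edges:
        i, j = index[u], index[v]
        lap[i][i] += 1
        lap[j][j] += 1
        lap[i][j] -= 1
        lap[j][i] -= 1
    m = [row[:-1] for row in lap[:-1]]  # drop last row and column
    size = n - 1
    det = Fraction(1)
    for col in range(size):  # exact Gaussian elimination
        pivot = next((r for r in range(col, size) if m[r][col] != 0), None)
        if pivot is None:
            return 0  # disconnected graph has no spanning tree
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size):
                m[r][c] -= factor * m[col][c]
    return int(det)


# Hypothetical graph: a square a-b-c-d with one diagonal a-c (assumption).
vertices = ["a", "b", "c", "d"]
wild_type = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a"), ("a", "c")]
# Model a local change by deleting the diagonal contact.
mutant = [e for e in wild_type if e != ("a", "c")]

print(count_triangles(vertices, wild_type), count_spanning_trees(vertices, wild_type))  # 2 8
print(count_triangles(vertices, mutant), count_spanning_trees(vertices, mutant))        # 0 4
```

Removing a single edge drops the triangle count from 2 to 0 and halves the number of spanning trees, which is the kind of global shift from a small local change that the abstract proposes to visualize.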
Affiliation(s)
- Debra J Knisley
- Department of Mathematics and Statistics, East Tennessee State University, Johnson City, TN 37614; Institute for Quantitative Biology, East Tennessee State University, Johnson City, TN 37614
- Jeff R Knisley
- Department of Mathematics and Statistics, East Tennessee State University, Johnson City, TN 37614; Institute for Quantitative Biology, East Tennessee State University, Johnson City, TN 37614
22
Computational and experimental approaches to reveal the effects of single nucleotide polymorphisms with respect to disease diagnostics. Int J Mol Sci 2014; 15:9670-717. [PMID: 24886813 PMCID: PMC4100115 DOI: 10.3390/ijms15069670] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.4] [Received: 04/08/2014] [Revised: 05/15/2014] [Accepted: 05/16/2014] [Indexed: 12/25/2022] Open
Abstract
DNA mutations are the cause of many human diseases and they are the reason for natural differences among individuals by affecting the structure, function, interactions, and other properties of DNA and expressed proteins. The ability to predict whether a given mutation is disease-causing or harmless is of great importance for the early detection of patients with a high risk of developing a particular disease and would pave the way for personalized medicine and diagnostics. Here we review existing methods and techniques to study and predict the effects of DNA mutations from three different perspectives: in silico, in vitro and in vivo. It is emphasized that the problem is complicated and successful detection of a pathogenic mutation frequently requires a combination of several methods and a knowledge of the biological phenomena associated with the corresponding macromolecules.
23
Giollo M, Martin AJM, Walsh I, Ferrari C, Tosatto SCE. NeEMO: a method using residue interaction networks to improve prediction of protein stability upon mutation. BMC Genomics 2014; 15 Suppl 4:S7. [PMID: 25057121 PMCID: PMC4083412 DOI: 10.1186/1471-2164-15-s4-s7] [Citation(s) in RCA: 84] [Impact Index Per Article: 7.6] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The rapid growth of un-annotated missense variants poses challenges requiring novel strategies for their interpretation. From the thermodynamic point of view, amino acid changes can lead to a change in the internal energy of a protein and induce structural rearrangements. This is of great relevance for the study of diseases and protein design, justifying the development of prediction methods for variant-induced stability changes. RESULTS Here we propose NeEMO, a tool for the evaluation of stability changes using an effective representation of proteins based on residue interaction networks (RINs). RINs are used to extract useful features describing interactions of the mutant amino acid with its structural environment. Benchmarking shows NeEMO to be very effective, allowing reliable predictions in different parts of the protein such as β-strands and buried residues. Validation on a previously published independent dataset shows that NeEMO has a Pearson correlation coefficient of 0.77 and a standard error of 1 kcal/mol, outperforming nine recent methods. The NeEMO web server can be freely accessed from URL: http://protein.bio.unipd.it/neemo/. CONCLUSIONS NeEMO offers an innovative and reliable tool for the annotation of amino acid changes. A key contribution is the use of RINs, which can be used for modeling proteins and their interactions effectively. Interestingly, the approach is very general, and can motivate the development of a new family of RIN-based protein structure analyzers. NeEMO may suggest innovative strategies for bioinformatics tools beyond protein stability prediction.
24
Yates CM, Filippis I, Kelley LA, Sternberg MJE. SuSPect: enhanced prediction of single amino acid variant (SAV) phenotype using network features. J Mol Biol 2014; 426:2692-701. [PMID: 24810707 PMCID: PMC4087249 DOI: 10.1016/j.jmb.2014.04.026] [Citation(s) in RCA: 180] [Impact Index Per Article: 16.4] [Received: 01/31/2014] [Revised: 04/23/2014] [Accepted: 04/28/2014] [Indexed: 11/16/2022]
Abstract
Whole-genome and exome sequencing studies reveal many genetic variants between individuals, some of which are linked to disease. Many of these variants lead to single amino acid variants (SAVs), and accurate prediction of their phenotypic impact is important. Incorporating sequence conservation and network-level features, we have developed a method, SuSPect (Disease-Susceptibility-based SAV Phenotype Prediction), for predicting how likely SAVs are to be associated with disease. SuSPect performs significantly better than other available batch methods on the VariBench benchmarking dataset, with a balanced accuracy of 82%. SuSPect is available at www.sbg.bio.ic.ac.uk/suspect. The Web site has been implemented in Perl and SQLite and is compatible with modern browsers. An SQLite database of possible missense variants in the human proteome is available to download at www.sbg.bio.ic.ac.uk/suspect/download.html. Bioinformatics approaches are key for identification of disease-causing variants. SAV phenotype prediction can be improved using network information. A method including these features, SuSPect, outperforms tested methods. SuSPect is available to use at www.sbg.bio.ic.ac.uk/suspect.
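The headline figure in this abstract is a balanced accuracy, which averages sensitivity and specificity so that a large neutral class cannot dominate the score. The sketch below computes it for a small, made-up set of labels; the data and the function name are hypothetical and unrelated to the VariBench results.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = disease, 0 = neutral)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Hypothetical imbalanced test set: 2 disease variants, 8 neutral variants.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
# Hypothetical predictions: one disease variant is missed, all neutrals are correct.
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

plain = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
balanced = balanced_accuracy(y_true, y_pred)
print("plain accuracy:", plain)        # 0.9
print("balanced accuracy:", balanced)  # 0.75
```

The gap between 0.9 and 0.75 shows why balanced accuracy is the more informative number when neutral variants greatly outnumber disease-associated ones.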
Affiliation(s)
- Christopher M Yates
- Centre for Integrative Systems Biology and Bioinformatics, Imperial College London, London SW7 2AZ, UK
- Ioannis Filippis
- Centre for Integrative Systems Biology and Bioinformatics, Imperial College London, London SW7 2AZ, UK
- Lawrence A Kelley
- Centre for Integrative Systems Biology and Bioinformatics, Imperial College London, London SW7 2AZ, UK
- Michael J E Sternberg
- Centre for Integrative Systems Biology and Bioinformatics, Imperial College London, London SW7 2AZ, UK
25
Abstract
Moving from a traditional medical model of treating pathologies to an individualized predictive and preventive model of personalized medicine promises to reduce the healthcare cost on an overburdened and overwhelmed system. Next-generation sequencing (NGS) has the potential to accelerate the early detection of disorders and the identification of pharmacogenetic markers to customize treatments. This review explains the historical facts that led to the development of NGS along with the strengths and weaknesses of NGS, with a special emphasis on the analytical aspects used to process NGS data. There are solutions to all the steps necessary for performing NGS in the clinical context where the majority of them are very efficient, but there are some crucial steps in the process that need immediate attention.
Affiliation(s)
- Manuel L. Gonzalez-Garay
- Center for Molecular Imaging, Division of Genomics & Bioinformatics, The Brown Foundation Institute of Molecular Medicine, University of Texas Health Science Center at Houston, Houston, TX 77030, USA
26
Pires DEV, Ascher DB, Blundell TL. mCSM: predicting the effects of mutations in proteins using graph-based signatures. Bioinformatics 2013; 30:335-42. [PMID: 24281696 PMCID: PMC3904523 DOI: 10.1093/bioinformatics/btt691] [Citation(s) in RCA: 733] [Impact Index Per Article: 61.1] [Indexed: 11/15/2022]
Abstract
Motivation: Mutations play fundamental roles in evolution by introducing diversity into genomes. Missense mutations in structural genes may become either selectively advantageous or disadvantageous to the organism by affecting protein stability and/or interfering with interactions between partners. Thus, the ability to predict the impact of mutations on protein stability and interactions is of significant value, particularly in understanding the effects of Mendelian and somatic mutations on the progression of disease. Here, we propose a novel approach to the study of missense mutations, called mCSM, which relies on graph-based signatures. These encode distance patterns between atoms and are used to represent the protein residue environment and to train predictive models. To understand the roles of mutations in disease, we have evaluated their impacts not only on protein stability but also on protein–protein and protein–nucleic acid interactions. Results: We show that mCSM performs as well as or better than other methods that are used widely. The mCSM signatures were successfully used in different tasks demonstrating that the impact of a mutation can be correlated with the atomic-distance patterns surrounding an amino acid residue. We showed that mCSM can predict stability changes of a wide range of mutations occurring in the tumour suppressor protein p53, demonstrating the applicability of the proposed method in a challenging disease scenario. Availability and implementation: A web server is available at http://structure.bioc.cam.ac.uk/mcsm. Contact: dpires@dcc.ufmg.br; tom@cryst.bioc.cam.ac.uk. Supplementary information: Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Douglas E V Pires
- Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, UK and ACRF Rational Drug Discovery Centre and Biota Structural Biology Laboratory, St Vincents Institute of Medical Research, Fitzroy, VIC, 3065, Australia
|
27
|
Nevin Gerek Z, Kumar S, Banu Ozkan S. Structural dynamics flexibility informs function and evolution at a proteome scale. Evol Appl 2013; 6:423-33. [PMID: 23745135 PMCID: PMC3673471 DOI: 10.1111/eva.12052] [Citation(s) in RCA: 73] [Impact Index Per Article: 6.1] [Received: 10/23/2012] [Accepted: 01/13/2013] [Indexed: 01/04/2023] Open
Abstract
Protein structures are dynamic entities with a myriad of atomic fluctuations, side-chain rotations, and collective domain movements. Although the importance of these dynamics to proper functioning of proteins is emerging in the studies of many protein families, there is a lack of broad evidence for the critical role of protein dynamics in shaping the biological functions of a substantial fraction of residues for a large number of proteins in the human proteome. Here, we propose a novel dynamic flexibility index (dfi) to quantify the dynamic properties of individual residues in any protein and use it to assess the importance of protein dynamics in 100 human proteins. Our analyses involving functionally critical positions, disease-associated and putatively neutral population variations, and the rate of interspecific substitutions per residue produce concordant patterns at a proteome scale. They establish that the preservation of dynamic properties of residues in a protein structure is critical for maintaining the protein/biological function. Therefore, structural dynamics needs to become a major component of the analysis of protein function and evolution. Such analyses will be facilitated by the dfi, which will also enable the integrative use of structural dynamics with evolutionary conservation in genomic medicine as well as functional genomics investigations.
Affiliation(s)
- Zeynep Nevin Gerek
- Center for Evolutionary Medicine and Informatics, Biodesign Institute, Arizona State University Tempe, AZ, USA; Department of Physics, Center for Biological Physics, Bateman Physical Sciences F-Wing, Arizona State University Tempe, AZ, USA
|
28
|
Udatha DBRKG, Rasmussen S, Sicheritz-Pontén T, Panagiotou G. Targeted metabolic engineering guided by computational analysis of single-nucleotide polymorphisms (SNPs). Methods Mol Biol 2013; 985:409-428. [PMID: 23417815 DOI: 10.1007/978-1-62703-299-5_20] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2023]
Abstract
Non-synonymous SNPs, the so-called non-silent SNPs, are single-nucleotide variations in coding regions that give rise to amino acid mutations and are often involved in the modulation of protein function. Understanding the effect of individual amino acid mutations on the function or stability of a protein or enzyme is useful for altering its properties in a wide variety of engineering studies. Because measuring the effects of amino acid mutations experimentally is laborious, a variety of computational methods that help extract direct genotype-to-phenotype information are discussed here.
Affiliation(s)
- D B R K Gupta Udatha
- Department of Chemical and Biological Engineering, Industrial Biotechnology, Chalmers University of Technology, Gothenburg, Sweden
|
29
|
Gültas M, Haubrock M, Tüysüz N, Waack S. Coupled mutation finder: a new entropy-based method quantifying phylogenetic noise for the detection of compensatory mutations. BMC Bioinformatics 2012; 13:225. [PMID: 22963049 PMCID: PMC3577461 DOI: 10.1186/1471-2105-13-225] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 05/18/2012] [Accepted: 08/23/2012] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: The detection of significant compensatory mutation signals in multiple sequence alignments (MSAs) is often complicated by noise. A challenging problem in bioinformatics remains the separation of significant signals between two or more non-conserved residue sites from the phylogenetic noise and unrelated pair signals. Determination of these non-conserved residue sites is as important as the recognition of strictly conserved positions for understanding the structural basis of protein functions and identifying functionally important residue regions. In this study, we developed a new method, the Coupled Mutation Finder (CMF), which quantifies the phylogenetic noise for the detection of compensatory mutations. RESULTS: To demonstrate the effectiveness of this method, we analyzed essential sites of two human proteins: epidermal growth factor receptor (EGFR) and glucokinase (GCK). Our results suggest that the CMF is able to separate significant compensatory mutation signals from the phylogenetic noise and unrelated pair signals. The vast majority of compensatory mutation sites found by the CMF are related to essential sites of both proteins and they are likely to affect protein stability or functionality. CONCLUSIONS: The CMF is a new method, which includes an MSA-specific statistical model based on multiple testing procedures that quantify the error made in terms of the false discovery rate and a novel entropy-based metric to upscale BLOSUM62 dissimilar compensatory mutations. Therefore, it is a helpful tool to predict and investigate compensatory mutation sites of structural or functional importance in proteins. We suggest that the CMF could be used as a novel automated function prediction tool that is required for a better understanding of the structural basis of proteins. The CMF server is freely accessible at http://cmf.bioinf.med.uni-goettingen.de.
Affiliation(s)
- Mehmet Gültas
- Institute of Computer Science, University of Göttingen, Goldschmidtstr. 7, Göttingen, 37077, Germany.
|
30
|
Analyzing effects of naturally occurring missense mutations. Computational and Mathematical Methods in Medicine 2012; 2012:805827. [PMID: 22577471 PMCID: PMC3346971 DOI: 10.1155/2012/805827] [Citation(s) in RCA: 90] [Impact Index Per Article: 6.9] [Received: 12/21/2011] [Revised: 02/01/2012] [Accepted: 02/01/2012] [Indexed: 11/17/2022]
Abstract
Single-point mutation in genome, for example, single-nucleotide polymorphism (SNP) or rare genetic mutation, is the change of a single nucleotide for another in the genome sequence. Some of them will produce an amino acid substitution in the corresponding protein sequence (missense mutations); others will not. This paper focuses on genetic mutations resulting in a change in the amino acid sequence of the corresponding protein and how to assess their effects on protein wild-type characteristics. The existing methods and approaches for predicting the effects of mutation on protein stability, structure, and dynamics are outlined and discussed with respect to their underlying principles. Available resources, either as stand-alone applications or webservers, are pointed out as well. It is emphasized that understanding the molecular mechanisms behind these effects due to these missense mutations is of critical importance for detecting disease-causing mutations. The paper provides several examples of the application of 3D structure-based methods to model the effects of protein stability and protein-protein interactions caused by missense mutations as well.
|
31
|
Pasquo A, Consalvi V, Knapp S, Alfano I, Ardini M, Stefanini S, Chiaraluce R. Structural stability of human protein tyrosine phosphatase ρ catalytic domain: effect of point mutations. PLoS One 2012; 7:e32555. [PMID: 22389709 PMCID: PMC3289658 DOI: 10.1371/journal.pone.0032555] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Received: 10/19/2011] [Accepted: 02/01/2012] [Indexed: 01/25/2023] Open
Abstract
Protein tyrosine phosphatase ρ (PTPρ) belongs to the classical receptor type IIB family of protein tyrosine phosphatase, the most frequently mutated tyrosine phosphatase in human cancer. There is evidence to suggest that PTPρ may act as a tumor suppressor gene, and dysregulation of Tyr phosphorylation can be observed in diverse diseases, such as diabetes, immune deficiencies and cancer. PTPρ variants in the catalytic domain have been identified in cancer tissues. These natural variants are nonsynonymous single nucleotide polymorphisms, variations of a single nucleotide occurring in the coding region and leading to amino acid substitutions. In this study we investigated the effect of amino acid substitution on the structural stability and on the activity of the membrane-proximal catalytic domain of PTPρ. We expressed and purified as soluble recombinant proteins some of the mutants of the membrane-proximal catalytic domain of PTPρ identified in colorectal cancer and in the single nucleotide polymorphisms database. The mutants show a decreased thermal and thermodynamic stability and decreased activation energy relative to phosphatase activity, when compared to wild-type. All the variants show three-state equilibrium unfolding transitions similar to that of the wild-type, with the accumulation of a folding intermediate populated at ~4.0 M urea.
Affiliation(s)
- Valerio Consalvi
- Department of Biochemical Sciences “A. Rossi Fanelli”, Sapienza University of Rome, Rome, Italy
- Stefan Knapp
- Structural Genomics Consortium, Oxford University, Oxford, England, United Kingdom
- Ivan Alfano
- Structural Genomics Consortium, Oxford University, Oxford, England, United Kingdom
- Matteo Ardini
- Department of Biochemical Sciences “A. Rossi Fanelli”, Sapienza University of Rome, Rome, Italy
- Simonetta Stefanini
- Department of Biochemical Sciences “A. Rossi Fanelli”, Sapienza University of Rome, Rome, Italy
- Roberta Chiaraluce
- Department of Biochemical Sciences “A. Rossi Fanelli”, Sapienza University of Rome, Rome, Italy
|
32
|
Lopes MC, Joyce C, Ritchie GR, John SL, Cunningham F, Asimit J, Zeggini E. A combined functional annotation score for non-synonymous variants. Hum Hered 2012; 73:47-51. [PMID: 22261837 PMCID: PMC3390741 DOI: 10.1159/000334984] [Citation(s) in RCA: 74] [Impact Index Per Article: 5.7] [Received: 06/17/2011] [Accepted: 11/10/2011] [Indexed: 11/19/2022] Open
Abstract
AIMS: Next-generation sequencing has opened the possibility of large-scale sequence-based disease association studies. A major challenge in interpreting whole-exome data is predicting which of the discovered variants are deleterious or neutral. To address this question in silico, we have developed a score called Combined Annotation scoRing toOL (CAROL), which combines information from 2 bioinformatics tools: PolyPhen-2 and SIFT, in order to improve the prediction of the effect of non-synonymous coding variants. METHODS: We used a weighted Z method that combines the probabilistic scores of PolyPhen-2 and SIFT. We defined 2 dataset pairs to train and test CAROL using information from the dbSNP: 'HGMD-PUBLIC' and 1000 Genomes Project databases. The training pair comprises a total of 980 positive control (disease-causing) and 4,845 negative control (non-disease-causing) variants. The test pair consists of 1,959 positive and 9,691 negative controls. RESULTS: CAROL has higher predictive power and accuracy for the effect of non-synonymous variants than each individual annotation tool (PolyPhen-2 and SIFT) and benefits from higher coverage. CONCLUSION: The combination of annotation tools can help improve automated prediction of whole-genome/exome non-synonymous variant functional consequences.
Affiliation(s)
- Margarida C. Lopes
- Wellcome Trust Sanger Institute, The Morgan Building, Wellcome Trust Genome Campus, Hinxton CB10 1HH (UK)
- Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK
- Chris Joyce
- Wellcome Trust Sanger Institute, The Morgan Building, Wellcome Trust Genome Campus, Hinxton CB10 1HH (UK)
- Jennifer Asimit
- Wellcome Trust Sanger Institute, The Morgan Building, Wellcome Trust Genome Campus, Hinxton CB10 1HH (UK)
- Eleftheria Zeggini
- Wellcome Trust Sanger Institute, The Morgan Building, Wellcome Trust Genome Campus, Hinxton CB10 1HH (UK)
|
33
|
Madsen KM, Udatha GDBRK, Semba S, Otero JM, Koetter P, Nielsen J, Ebizuka Y, Kushiro T, Panagiotou G. Linking genotype and phenotype of Saccharomyces cerevisiae strains reveals metabolic engineering targets and leads to triterpene hyper-producers. PLoS One 2011; 6:e14763. [PMID: 21445244 PMCID: PMC3060802 DOI: 10.1371/journal.pone.0014763] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.9] [Received: 06/29/2010] [Accepted: 02/16/2011] [Indexed: 01/06/2023] Open
Abstract
BACKGROUND: Metabolic engineering is an attractive approach for improving the microbial production of drugs. Triterpenes are a chemically diverse class of compounds, and many of them are of interest from a human health perspective. A systematic experimental or computational survey of all feasible gene modifications to determine the genotype yielding the optimal triterpene production phenotype is a laborious and time-consuming process. METHODOLOGY/PRINCIPAL FINDINGS: Based on the recent genome-wide sequencing of Saccharomyces cerevisiae CEN.PK 113-7D and its phenotypic differences with the S288C strain, we implemented a strategy for the construction of a β-amyrin production platform. The genes Erg8, Erg9 and HFA1 contained non-silent SNPs that were computationally analyzed to evaluate the changes that they cause in the respective protein structures. Subsequently, Erg8, Erg9 and HFA1 were correlated with the increased levels of ergosterol and fatty acids in CEN.PK 113-7D, and single, double, and triple gene over-expression strains were constructed. CONCLUSIONS: Six out of seven gene over-expression constructs had a considerable impact on both ergosterol and β-amyrin production. In the case of β-amyrin formation, the triple over-expression construct exhibited a nearly 500% increase over the control strain, making our metabolic engineering strategy the most successful design of triterpene microbial producers.
Affiliation(s)
- Karina M. Madsen
- Center for Microbial Biotechnology, Department of Systems Biology, Technical University of Denmark, Kgs. Lyngby, Denmark
- Gupta D. B. R. K. Udatha
- Department of Chemical and Biological Engineering, Industrial Biotechnology, Chalmers University of Technology, Gothenburg, Sweden
- Saori Semba
- Graduate School of Pharmaceutical Sciences, The University of Tokyo, Hongo, Bunkyo-ku, Tokyo, Japan
- Jose M. Otero
- Center for Microbial Biotechnology, Department of Systems Biology, Technical University of Denmark, Kgs. Lyngby, Denmark
- Department of Chemical and Biological Engineering, Systems Biology, Chalmers University of Technology, Gothenburg, Sweden
- Peter Koetter
- Institute for Microbiology, Johann Wolfgang Goethe-University of Frankfurt, Frankfurt, Germany
- Jens Nielsen
- Department of Chemical and Biological Engineering, Systems Biology, Chalmers University of Technology, Gothenburg, Sweden
- Yutaka Ebizuka
- Graduate School of Pharmaceutical Sciences, The University of Tokyo, Hongo, Bunkyo-ku, Tokyo, Japan
- Tetsuo Kushiro
- Graduate School of Pharmaceutical Sciences, The University of Tokyo, Hongo, Bunkyo-ku, Tokyo, Japan
- Gianni Panagiotou
- Graduate School of Pharmaceutical Sciences, The University of Tokyo, Hongo, Bunkyo-ku, Tokyo, Japan
- Center for Microbial Biotechnology, Department of Systems Biology, Technical University of Denmark, Kgs. Lyngby, Denmark
- Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Lyngby, Denmark
|
34
|
Gong S, Worth CL, Cheng TMK, Blundell TL. Meet Me Halfway: When Genomics Meets Structural Bioinformatics. J Cardiovasc Transl Res 2011; 4:281-303. [DOI: 10.1007/s12265-011-9259-1] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 11/30/2010] [Accepted: 02/08/2011] [Indexed: 01/08/2023]
|
35
|
Doncheva NT, Klein K, Domingues FS, Albrecht M. Analyzing and visualizing residue networks of protein structures. Trends Biochem Sci 2011; 36:179-82. [PMID: 21345680 DOI: 10.1016/j.tibs.2011.01.002] [Citation(s) in RCA: 194] [Impact Index Per Article: 13.9] [Received: 09/10/2010] [Revised: 01/19/2011] [Accepted: 01/21/2011] [Indexed: 11/27/2022]
Abstract
The study of individual amino acid residues and their molecular interactions in protein structures is crucial for understanding structure-function relationships. Recent work has indicated that residue networks derived from 3D protein structures provide additional insights into the structural and functional roles of interacting residues. Here, we present the new software tools RINerator and RINalyzer for the automatized generation, 2D visualization, and interactive analysis of residue interaction networks, and highlight their use in different application scenarios.
Affiliation(s)
- Nadezhda T Doncheva
- Max Planck Institute for Informatics, Campus E1.4, 66123 Saarbrücken, Germany
|
36
|
Li Y, Wen Z, Xiao J, Yin H, Yu L, Yang L, Li M. Predicting disease-associated substitution of a single amino acid by analyzing residue interactions. BMC Bioinformatics 2011; 12:14. [PMID: 21223604 PMCID: PMC3027113 DOI: 10.1186/1471-2105-12-14] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.1] [Received: 06/10/2010] [Accepted: 01/12/2011] [Indexed: 01/16/2023] Open
Abstract
BACKGROUND: The rapid accumulation of data on non-synonymous single nucleotide polymorphisms (nsSNPs, also called SAPs) should allow us to further our understanding of the underlying disease-associated mechanisms. Here, we use complex networks to study the role of an amino acid in both local and global structures and determine the extent to which disease-associated and polymorphic SAPs differ in terms of their interactions with other residues. RESULTS: We found that SAPs can be well characterized by network topological features. Mutations are probably disease-associated when they occur at a site with a high centrality value and/or high degree value in a protein structure network. We also discovered that study of the neighboring residues around a mutation site can help to determine whether the mutation is disease-related or not. We compiled a dataset from the Swiss-Prot variant pages and constructed a model to predict disease-associated SAPs based on the random forest algorithm. The values of total accuracy and MCC were 83.0% and 0.64, respectively, as determined by 5-fold cross-validation. With an independent dataset, our model achieved a total accuracy of 80.8% and an MCC of 0.59. CONCLUSIONS: The satisfactory performance suggests that network topological features can be used as quantification measures to determine the importance of a site on a protein, and this approach can complement existing methods for prediction of disease-associated SAPs. Moreover, the use of this method in SAP studies would help to determine the underlying linkage between SAPs and diseases through extensive investigation of mutual interactions between residues.
Affiliation(s)
- Yizhou Li
- Key Laboratory of Green Chemistry and Technology, Ministry of Education, College of Chemistry, Sichuan University, Chengdu 610064, PR China
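The degree and centrality features mentioned in the abstract above can be illustrated with a minimal sketch. It is not the authors' pipeline: the 7 Å cutoff, the use of one representative atom per residue, the choice of closeness as the centrality measure, and the toy coordinates are all assumptions made for illustration, and the function names are hypothetical.

```python
from collections import deque
from math import dist


def contact_graph(coords, cutoff=7.0):
    """Link residues whose representative atoms lie within `cutoff` angstroms.

    The cutoff and the one-point-per-residue representation are assumptions.
    """
    graph = {i: set() for i in range(len(coords))}
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if dist(coords[i], coords[j]) <= cutoff:
                graph[i].add(j)
                graph[j].add(i)
    return graph


def degree(graph):
    """Number of direct contacts per residue."""
    return {node: len(neighbours) for node, neighbours in graph.items()}


def closeness(graph):
    """Closeness centrality: reachable residues / sum of shortest-path lengths."""
    scores = {}
    for source in graph:
        hops = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search over the unweighted contact graph
            node = queue.popleft()
            for nxt in graph[node]:
                if nxt not in hops:
                    hops[nxt] = hops[node] + 1
                    queue.append(nxt)
        total = sum(hops.values())
        scores[source] = (len(hops) - 1) / total if total else 0.0
    return scores


# Hypothetical toy coordinates: residue 0 sits at the centre of four others.
toy_coords = [(0, 0, 0), (5, 0, 0), (-5, 0, 0), (0, 5, 0), (0, -5, 0)]
toy_graph = contact_graph(toy_coords)
toy_degree = degree(toy_graph)
toy_closeness = closeness(toy_graph)
```

In this toy network residue 0 has the highest degree and closeness, which is the kind of site the abstract describes as more likely to harbour disease-associated substitutions. A real analysis would read coordinates from a structure file and would likely use a dedicated graph library.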
|
37
|
Phenotype prediction of nonsynonymous single nucleotide polymorphisms in human phase II drug/xenobiotic metabolizing enzymes: perspectives on molecular evolution. Science China Life Sciences 2010; 53:1252-62. [DOI: 10.1007/s11427-010-4062-9] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Received: 04/18/2010] [Accepted: 05/27/2010] [Indexed: 12/18/2022]
|
38
|
A survey of proteins encoded by non-synonymous single nucleotide polymorphisms reveals a significant fraction with altered stability and activity. Biochem J 2009; 424:15-26. [DOI: 10.1042/bj20090723] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.5] [Indexed: 11/17/2022]
Abstract
On average, each human gene has approximately four SNPs (single nucleotide polymorphisms) in the coding region, half of which are nsSNPs (non-synonymous SNPs) or missense SNPs. Current attention is focused on those that are known to perturb function and are strongly linked to disease. However, the vast majority of SNPs have not been investigated for the possibility of causing disease. We set out to assess the fraction of nsSNPs that encode proteins that have altered stability and activity, for this class of variants would be candidates to perturb cellular function. We tested the thermostability and, where possible, the catalytic activity for the most common variant (wild-type) and minor variants (total of 46 SNPs) for 16 human enzymes for which the three-dimensional structures were known. There were significant differences in the stability of almost half of the variants (48%) compared with their wild-type counterparts. The catalytic efficiency of approx. 14 variants was significantly altered, including several variants of human PKM2 (pyruvate kinase muscle 2). Two PKM2 variants, S437Y and E28K, also exhibited changes in their allosteric regulation compared with the wild-type enzyme. The high proportion of nsSNPs that affect protein stability and function, albeit subtly, underscores the need for experimental analysis of the diverse human proteome.
|
39
|
Calabrese R, Capriotti E, Fariselli P, Martelli PL, Casadio R. Functional annotations improve the predictive score of human disease-related mutations in proteins. Hum Mutat 2009; 30:1237-44. [PMID: 19514061 DOI: 10.1002/humu.21047] [Citation(s) in RCA: 485] [Impact Index Per Article: 30.3] [Indexed: 11/09/2022]
Abstract
Single nucleotide polymorphisms (SNPs) are the simplest and most frequent form of human DNA variation, and are also valuable as genetic markers of disease susceptibility. The most investigated SNPs are missense mutations resulting in residue substitutions in the protein. Here we propose SNPs&GO, an accurate method that, starting from a protein sequence, can predict whether a mutation is disease related or not by exploiting the protein functional annotation. The scoring efficiency of SNPs&GO is as high as 82%, with a Matthews correlation coefficient equal to 0.63 over a wide set of annotated nonsynonymous mutations in proteins, including 16,330 disease-related and 17,432 neutral polymorphisms. SNPs&GO collects in a unique framework information derived from protein sequence, evolutionary information, and function as encoded in the Gene Ontology terms, and outperforms other available predictive methods.
Affiliation(s)
- Remo Calabrese
- Laboratory of Biocomputing, CIRB/Department of Biology, University of Bologna, Bologna 40126, Italy
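Several abstracts in this list report predictor quality as overall accuracy together with a Matthews correlation coefficient (MCC). As a reminder of how those two figures relate to a confusion matrix, here is a small sketch using the standard definitions. The counts in the example are hypothetical and are not taken from any of the cited studies.

```python
from math import sqrt


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when a marginal total is zero, a common convention for the
    otherwise undefined case.
    """
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts: 100 disease-related and 100 neutral variants.
example_accuracy = accuracy(tp=80, tn=85, fp=15, fn=20)
example_mcc = matthews_corrcoef(tp=80, tn=85, fp=15, fn=20)
```

Unlike accuracy, the MCC stays informative when the two classes are of very different sizes, which is one reason it is commonly reported alongside accuracy for variant-effect predictors.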
|
40
|
Kumar S, Suleski MP, Markov GJ, Lawrence S, Marco A, Filipski AJ. Positional conservation and amino acids shape the correct diagnosis and population frequencies of benign and damaging personal amino acid mutations. Genome Res 2009; 19:1562-9. [PMID: 19546171 PMCID: PMC2752122 DOI: 10.1101/gr.091991.109] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.1] [Received: 02/01/2009] [Accepted: 06/08/2009] [Indexed: 11/25/2022]
Abstract
As the cost of DNA sequencing drops, we are moving beyond one genome per species to one genome per individual to improve prevention, diagnosis, and treatment of disease by using personal genotypes. Computational methods are frequently applied to predict impairment of gene function by nonsynonymous mutations in individual genomes and single nucleotide polymorphisms (nSNPs) in populations. These computational tools are, however, known to fail 15%-40% of the time. We find that accurate discrimination between benign and deleterious mutations is strongly influenced by the long-term (among species) history of positions that harbor those mutations. Successful prediction of known disease-associated mutations (DAMs) is much higher for evolutionarily conserved positions and for original-mutant amino acid pairs that are rarely seen among species. Prediction accuracies for nSNPs show opposite patterns, forecasting impediments to building diagnostic tools aiming to simultaneously reduce both false-positive and false-negative errors. The relative allele frequencies of mutations diagnosed as benign and damaging are predicted by positional evolutionary rates. These allele frequencies are modulated by the relative preponderance of the mutant allele in the set of amino acids found at homologous sites in other species (evolutionarily permissible alleles [EPAs]). The nSNPs found in EPAs are biochemically less severe than those missing from EPAs across all allele frequency categories. Therefore, it is important to consider position evolutionary rates and EPAs when interpreting the consequences and population frequencies of human mutations. The impending sequencing of thousands of human and many more vertebrate genomes will lead to more accurate classifiers needed in real-world applications.
Affiliation(s)
- Sudhir Kumar
- Center for Evolutionary Functional Genomics, Biodesign Institute, Arizona State University, Tempe, Arizona 85287-5301, USA.
|
41
|
Bromberg Y, Rost B. Correlating protein function and stability through the analysis of single amino acid substitutions. BMC Bioinformatics 2009; 10 Suppl 8:S8. [PMID: 19758472 PMCID: PMC2745590 DOI: 10.1186/1471-2105-10-s8-s8] [Citation(s) in RCA: 81] [Impact Index Per Article: 5.1] [Indexed: 11/13/2022] Open
Abstract
Background: Mutations resulting in the disruption of protein function are the underlying causes of many genetic diseases. Some mutations affect the number of expressed proteins while others alter the activity on a per-molecule basis. Single amino acid substitutions as caused by non-synonymous Single Nucleotide Polymorphisms (nsSNPs) often disrupt function by altering protein structure and/or stability, but can also wreak havoc by directly impacting functional binding sites. Given the experimental three-dimensional (3D) structure of a protein, we can try to differentiate between the "effect on structure/stability" and the "effect on binding". However, experimental 3D structures are available for only 1% of all known proteins; the magnitude of stability change caused by a given mutation is more widely available. Results: Here, we analyze to which extent the functional effect of a mutation can be predicted from the effect on protein stability. We find that simple sequence-based methods succeed in predicting functional effects of nsSNPs. In fact, such methods consistently outperform approaches that predict functional change through the application of binary thresholds to stability change. We also observed that if stability is affected, functional change is easier to predict than when stability is not affected. Conclusion: Our results confirmed that stability change is somehow related to function change. However, we also show that the knowledge of stability changes in no way suffices to predict functional changes and that many function changing mutations have no effect on stability.
Affiliation(s)
- Yana Bromberg
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA.
|
42
|
Suárez M, Tortosa P, Jaramillo A. PROTDES: CHARMM toolbox for computational protein design. Systems and Synthetic Biology 2009; 2:105-13. [PMID: 19572216 PMCID: PMC2735645 DOI: 10.1007/s11693-009-9026-7] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 10/22/2008] [Revised: 05/17/2009] [Accepted: 05/30/2009] [Indexed: 12/13/2022]
Abstract
We present open-source software that can automatically mutate any residue positions and find the best amino acids in an arbitrary protein structure without requiring pairwise approximations. Our software, PROTDES, is based on CHARMM and searches automatically for mutations that optimize a protein folding free energy. PROTDES allows the integration of molecular dynamics within the protein design. We have implemented a heuristic optimization algorithm that iteratively searches for the best amino acids and their conformations for an arbitrary set of positions within a structure. Our software allows CHARMM users to perform protein design calculations and to create their own procedures for protein design using their own energy functions. We show this by implementing three different energy functions based on different solvent treatments: surface area accessibility, generalized Born using molecular volume and an effective energy function. PROTDES, a tutorial, parameter sets, configuration tools and examples are freely available at http://soft.synth-bio.org/protdes.html.
Affiliation(s)
- María Suárez
- Biochemistry Laboratory, CNRS—UMR 7654, Ecole Polytechnique, 91128 Palaiseau, France
- SYNTH-BIO group Epigenomics Project, Genopole Tour Evry2, etage 10, 523, Terrasses de l’Agora, 91034 Evry Cedex, France
- Pablo Tortosa
- Biochemistry Laboratory, CNRS—UMR 7654, Ecole Polytechnique, 91128 Palaiseau, France
- Alfonso Jaramillo
- Biochemistry Laboratory, CNRS—UMR 7654, Ecole Polytechnique, 91128 Palaiseau, France
- SYNTH-BIO group Epigenomics Project, Genopole Tour Evry2, etage 10, 523, Terrasses de l’Agora, 91034 Evry Cedex, France
|
43
|
Teng S, Madej T, Panchenko A, Alexov E. Modeling effects of human single nucleotide polymorphisms on protein-protein interactions. Biophys J 2009; 96:2178-88. [PMID: 19289044 DOI: 10.1016/j.bpj.2008.12.3904] [Citation(s) in RCA: 100] [Impact Index Per Article: 6.3] [Received: 09/16/2008] [Revised: 11/08/2008] [Accepted: 12/03/2008] [Indexed: 12/25/2022] Open
Abstract
A large set of three-dimensional structures of 264 protein-protein complexes with known nonsynonymous single nucleotide polymorphisms (nsSNPs) at the interface was built using homology-based methods. The nsSNPs were mapped on the proteins' structures and their effect on the binding energy was investigated with CHARMM force field and continuum electrostatic calculations. Two sets of nsSNPs were studied: disease annotated Online Mendelian Inheritance in Man (OMIM) and nonannotated (non-OMIM). It was demonstrated that OMIM nsSNPs tend to destabilize the electrostatic component of the binding energy, in contrast with the effect of non-OMIM nsSNPs. In addition, it was shown that the change of the binding energy upon amino acid substitutions is not related to the conservation of the net charge, hydrophobicity, or hydrogen bond network at the interface. The results indicate that, generally, the effect of nsSNPs on protein-protein interactions cannot be predicted from amino acids' physico-chemical properties alone, since in many cases a substitution of a particular residue with another amino acid having completely different polarity or hydrophobicity had little effect on the binding energy. Analysis of sequence conservation showed that nsSNPs at highly conserved positions resulted in a large variance of the binding energy changes. In contrast, amino acid substitutions corresponding to nsSNPs at nonconserved positions, on average, were not found to have a large effect on binding affinity. pKa calculations were performed and showed that amino acid substitutions could change the wild-type proton uptake/release, thus resulting in a different pH-dependence of the binding energy.
Affiliation(s)
- Shaolei Teng
- Computational Biophysics and Bioinformatics, Department of Physics, Clemson University, Clemson, South Carolina, USA
|
44
|
Pang GSY, Wang J, Wang Z, Lee CGL. Predicting potentially functional SNPs in drug-response genes. Pharmacogenomics 2009; 10:639-53. [DOI: 10.2217/pgs.09.12] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.0] [Indexed: 12/17/2022] Open
Abstract
SNPs are known to contribute to variations in drug response and there are more than 14 million polymorphisms spanning the human genome. However, not all of these SNPs are functional. It would be impractical and costly to evaluate every individual SNP for functionality experimentally. Consequently, one of the major challenges for researchers has been to seek out functional SNPs from all the SNPs in the human genome. In silico or bioinformatic methods are economical, less labor intensive, yet powerful approaches to filter out potentially functional SNPs in drug-response genes for further study. This allows researchers to prioritize which SNPs to subsequently evaluate experimentally for drug-response studies, as well as potentially providing insights into possible mechanisms underlying how SNPs may affect drug-response genes.
Affiliation(s)
- Grace SY Pang
- Division of Medical Sciences, National Cancer Center, Level 6, Lab 5, 11 Hospital Drive, Singapore 169610, Singapore
- Zihua Wang
- Division of Medical Sciences, National Cancer Center, Level 6, Lab 5, 11 Hospital Drive, Singapore 169610, Singapore
- National University of Singapore, Singapore
- Caroline GL Lee
- Division of Medical Sciences, National Cancer Center, Level 6, Lab 5, 11 Hospital Drive, Singapore 169610, Singapore
- National University of Singapore, Singapore
- DUKE-NUS Graduate Medical School, Singapore
|