1. Swint-Kruse L, Martin TA, Wu T, Dougherty LL, Fenton AW. Identification of positions in human aldolase A that are neutral for apparent KM. Arch Biochem Biophys 2024; 761:110183. [PMID: 39461494 PMCID: PMC11908651 DOI: 10.1016/j.abb.2024.110183] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/10/2024] [Revised: 10/16/2024] [Accepted: 10/20/2024] [Indexed: 10/29/2024]
Abstract
According to evolutionary theory, many naturally-occurring amino acid substitutions are expected to be neutral or near-neutral, with little effect on protein structure or function. Accordingly, most changes observed in human exomes are also expected to be neutral. As such, accurate algorithms for identifying medically-relevant changes must discriminate rare, non-neutral substitutions against a background of neutral substitutions. However, due to historical biases in biochemical experiments, the data available to train and validate prediction algorithms mostly contains non-neutral substitutions, with few examples of neutral substitutions. Thus, available training sets have the opposite composition of the desired test sets. Towards improving a dataset of these critical negative controls, we have concentrated on identifying neutral positions - those positions for which most of the possible 19 amino acid substitutions have little effect on protein structure or function. Here, we used a strategy based on multiple sequence alignments to identify putative neutral positions in human aldolase A, followed by biochemical assays for 147 aldolase substitutions. Results showed that most variants had little effect on either the apparent Michaelis constant for substrate fructose-1,6-bisphosphate or its apparent cooperativity. Thus, these data are useful for training and validating prediction algorithms. In addition, we created a database of these and other biochemically characterized aldolase variants along with aldolase sequences and characteristics derived from sequence and structure analyses. This database is publicly available at https://github.com/liskinsk/Aldolase-variant-and-sequence-database.
Affiliation(s)
- Liskin Swint-Kruse
  - Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, 3901 Rainbow Blvd, MSN 3030, Kansas City, KS, 66160, USA
- Tyler A Martin
  - Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, 3901 Rainbow Blvd, MSN 3030, Kansas City, KS, 66160, USA
- Tiffany Wu
  - Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, 3901 Rainbow Blvd, MSN 3030, Kansas City, KS, 66160, USA
- Larissa L Dougherty
  - Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, 3901 Rainbow Blvd, MSN 3030, Kansas City, KS, 66160, USA
- Aron W Fenton
  - Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, 3901 Rainbow Blvd, MSN 3030, Kansas City, KS, 66160, USA
2. Gardner S, Darrow MC, Lukoyanova N, Thalassinos K, Saibil HR. Structural basis of substrate progression through the bacterial chaperonin cycle. Proc Natl Acad Sci U S A 2023; 120:e2308933120. [PMID: 38064510 PMCID: PMC10723157 DOI: 10.1073/pnas.2308933120] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 05/31/2023] [Accepted: 10/20/2023] [Indexed: 12/17/2023] Open
Abstract
The bacterial chaperonin GroEL-GroES promotes protein folding through ATP-regulated cycles of substrate protein binding, encapsulation, and release. Here, we have used cryoEM to determine structures of GroEL, GroEL-ADP·BeF3, and GroEL-ADP·AlF3-GroES all complexed with the model substrate Rubisco. Our structures provide a series of snapshots that show how the conformation and interactions of non-native Rubisco change as it proceeds through the GroEL-GroES reaction cycle. We observe specific charged and hydrophobic GroEL residues forming strong initial contacts with non-native Rubisco. Binding of ATP or ADP·BeF3 to GroEL-Rubisco results in the formation of an intermediate GroEL complex displaying striking asymmetry in the ATP/ADP·BeF3-bound ring. In this ring, four GroEL subunits bind Rubisco and the other three are in the GroES-accepting conformation, suggesting how GroEL can recruit GroES without releasing bound substrate. Our cryoEM structures of stalled GroEL-ADP·AlF3-Rubisco-GroES complexes show Rubisco folding intermediates interacting with GroEL-GroES via different sets of residues.
Affiliation(s)
- Scott Gardner
  - Institute of Structural and Molecular Biology, Department of Biological Sciences, Birkbeck, University of London, London WC1E 7HX, United Kingdom
- Natalya Lukoyanova
  - Institute of Structural and Molecular Biology, Department of Biological Sciences, Birkbeck, University of London, London WC1E 7HX, United Kingdom
- Konstantinos Thalassinos
  - Institute of Structural and Molecular Biology, Department of Biological Sciences, Birkbeck, University of London, London WC1E 7HX, United Kingdom
  - Division of Biosciences, Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, United Kingdom
- Helen R. Saibil
  - Institute of Structural and Molecular Biology, Department of Biological Sciences, Birkbeck, University of London, London WC1E 7HX, United Kingdom
3. Adamoski D, Dias MM, Quesñay JEN, Yang Z, Zagoriy I, Steyer AM, Rodrigues CT, da Silva Bastos AC, da Silva BN, Costa RKE, de Abreu FMO, Islam Z, Cassago A, van Heel MG, Consonni SR, Mattei S, Mahamid J, Portugal RV, Ambrosio ALB, Dias SMG. Molecular mechanism of glutaminase activation through filamentation and the role of filaments in mitophagy protection. Nat Struct Mol Biol 2023; 30:1902-1912. [PMID: 37857822 DOI: 10.1038/s41594-023-01118-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 09/14/2022] [Accepted: 09/06/2023] [Indexed: 10/21/2023]
Abstract
Glutaminase (GLS), which deaminates glutamine to form glutamate, is a mitochondrial tetrameric protein complex. Although inorganic phosphate (Pi) is known to promote GLS filamentation and activation, the molecular basis of this mechanism is unknown. Here we aimed to determine the molecular mechanism of Pi-induced mouse GLS filamentation and its impact on mitochondrial physiology. Single-particle cryogenic electron microscopy revealed an allosteric mechanism in which Pi binding at the tetramer interface and the activation loop is coupled to direct nucleophile activation at the active site. The active conformation is prone to enzyme filamentation. Notably, human GLS filaments form inside tubulated mitochondria following glutamine withdrawal, as shown by in situ cryo-electron tomography of cells thinned by cryo-focused ion beam milling. Mitochondria with GLS filaments exhibit increased protection from mitophagy. We reveal roles of filamentous GLS in mitochondrial morphology and recycling.
Affiliation(s)
- Douglas Adamoski
  - Brazilian Biosciences National Laboratory, Brazilian Center for Research in Energy and Materials, Campinas, Brazil
- Marilia Meira Dias
  - Brazilian Biosciences National Laboratory, Brazilian Center for Research in Energy and Materials, Campinas, Brazil
  - Cancer Research UK Beatson Institute, Glasgow, UK
- Jose Edwin Neciosup Quesñay
  - Sao Carlos Institute of Physics, University of Sao Paulo, Sao Carlos, Brazil
  - Institute of Chemistry, University of São Paulo, São Paulo, Brazil
- Zhengyi Yang
  - EMBL Imaging Centre, European Molecular Biology Laboratory, Heidelberg, Germany
- Ievgeniia Zagoriy
  - Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- Anna M Steyer
  - EMBL Imaging Centre, European Molecular Biology Laboratory, Heidelberg, Germany
- Camila Tanimoto Rodrigues
  - Sao Carlos Institute of Physics, University of Sao Paulo, Sao Carlos, Brazil
  - Biological Sciences Department, School of Science, Purdue University, Lafayette, IN, USA
- Alliny Cristiny da Silva Bastos
  - Brazilian Biosciences National Laboratory, Brazilian Center for Research in Energy and Materials, Campinas, Brazil
  - Division of Medical Oncology, Department of Internal Medicine, Washington University School of Medicine, St. Louis, MO, USA
- Bianca Novaes da Silva
  - Brazilian Biosciences National Laboratory, Brazilian Center for Research in Energy and Materials, Campinas, Brazil
  - Graduate Program in Genetics and Molecular Biology, Institute of Biology, University of Campinas, Campinas, Brazil
- Renna Karoline Eloi Costa
  - Brazilian Biosciences National Laboratory, Brazilian Center for Research in Energy and Materials, Campinas, Brazil
  - Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY, USA
- Zeyaul Islam
  - Brazilian Biosciences National Laboratory, Brazilian Center for Research in Energy and Materials, Campinas, Brazil
  - Diabetes Research Center, Qatar Biomedical Research Institute, Hamad Bin Khalifa University, Qatar Foundation, Doha, Qatar
- Alexandre Cassago
  - Brazilian Nanotechnology National Laboratory, Brazilian Center for Research in Energy and Materials, Campinas, Brazil
  - SLAC National Accelerator Laboratory, Stanford University, Menlo Park, CA, USA
- Marin Gerard van Heel
  - Brazilian Nanotechnology National Laboratory, Brazilian Center for Research in Energy and Materials, Campinas, Brazil
- Sílvio Roberto Consonni
  - Department of Biochemistry and Tissue Biology, Institute of Biology, University of Campinas, Campinas, Brazil
- Simone Mattei
  - EMBL Imaging Centre, European Molecular Biology Laboratory, Heidelberg, Germany
  - Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- Julia Mahamid
  - Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
  - Cell Biology and Biophysics Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- Rodrigo Villares Portugal
  - Brazilian Nanotechnology National Laboratory, Brazilian Center for Research in Energy and Materials, Campinas, Brazil
- Sandra Martha Gomes Dias
  - Brazilian Biosciences National Laboratory, Brazilian Center for Research in Energy and Materials, Campinas, Brazil
4. Musil M, Jezik A, Horackova J, Borko S, Kabourek P, Damborsky J, Bednar D. FireProt 2.0: web-based platform for the fully automated design of thermostable proteins. Brief Bioinform 2023; 25:bbad425. [PMID: 38018911 PMCID: PMC10685400 DOI: 10.1093/bib/bbad425] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 08/31/2023] [Revised: 10/25/2023] [Accepted: 11/01/2023] [Indexed: 11/30/2023] Open
Abstract
Thermostable proteins find their use in numerous biomedical and biotechnological applications. However, the computational design of stable proteins often results in single-point mutations with a limited effect on protein stability. However, the construction of stable multiple-point mutants can prove difficult due to the possibility of antagonistic effects between individual mutations. FireProt protocol enables the automated computational design of highly stable multiple-point mutants. FireProt 2.0 builds on top of the previously published FireProt web, retaining the original functionality and expanding it with several new stabilization strategies. FireProt 2.0 integrates the AlphaFold database and the homology modeling for structure prediction, enabling calculations starting from a sequence. Multiple-point designs are constructed using the Bron-Kerbosch algorithm minimizing the antagonistic effect between the individual mutations. Users can newly limit the FireProt calculation to a set of user-defined mutations, run a saturation mutagenesis of the whole protein or select rigidifying mutations based on B-factors. Evolution-based back-to-consensus strategy is complemented by ancestral sequence reconstruction. FireProt 2.0 is significantly faster and a reworked graphical user interface broadens the tool's availability even to users with older hardware. FireProt 2.0 is freely available at http://loschmidt.chemi.muni.cz/fireprotweb.
Affiliation(s)
- Milos Musil
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Masaryk University, Brno, Czech Republic
  - Department of Information Systems, Faculty of Information Technology, Brno University of Technology, Brno, Czech Republic
  - International Clinical Research Centre, St. Anne’s University Hospital Brno, Brno, Czech Republic
- Andrej Jezik
  - Department of Information Systems, Faculty of Information Technology, Brno University of Technology, Brno, Czech Republic
- Jana Horackova
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Masaryk University, Brno, Czech Republic
- Simeon Borko
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Masaryk University, Brno, Czech Republic
  - Department of Information Systems, Faculty of Information Technology, Brno University of Technology, Brno, Czech Republic
  - International Clinical Research Centre, St. Anne’s University Hospital Brno, Brno, Czech Republic
- Petr Kabourek
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Masaryk University, Brno, Czech Republic
  - International Clinical Research Centre, St. Anne’s University Hospital Brno, Brno, Czech Republic
- Jiri Damborsky
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Masaryk University, Brno, Czech Republic
  - International Clinical Research Centre, St. Anne’s University Hospital Brno, Brno, Czech Republic
- David Bednar
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Masaryk University, Brno, Czech Republic
  - International Clinical Research Centre, St. Anne’s University Hospital Brno, Brno, Czech Republic
5. Manoussopoulos Y, Anastassopoulou C, Ioannidis JPA, Tsakris A. Paired associated SARS-CoV-2 spike variable positions: a network analysis approach to emerging variants. mSystems 2023; 8:e0044023. [PMID: 37432011 PMCID: PMC10469592 DOI: 10.1128/msystems.00440-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2023] [Accepted: 06/01/2023] [Indexed: 07/12/2023] Open
Abstract
Amino acids in variable positions of proteins may be correlated, with potential structural and functional implications. Here, we apply exact tests of independence in R × C contingency tables to examine noise-free associations between variable positions of the SARS-CoV-2 spike protein, using as a paradigm sequences from Greece deposited in GISAID (N = 6,683/1,078 full length) for the period 29 February 2020 to 26 April 2021 that essentially covers the first three pandemic waves. We examine the fate and complexity of these associations by network analysis, using associated positions (exact P ≤ 0.001 and Average Product Correction ≥ 2) as links and the corresponding positions as nodes. We found a temporal linear increase of positional differences and a gradual expansion of the number of position associations over time, represented by a temporally evolving intricate web, resulting in a non-random complex network of 69 nodes and 252 links. Overconnected nodes corresponded to the most adapted variant positions in the population, suggesting a direct relation between network degree and position functional importance. Modular analysis revealed 25 k-cliques comprising 3 to 11 nodes. At different k-clique resolutions, one to four communities were formed, capturing epistatic associations of circulating variants (Alpha, Beta, B.1.1.318), but also Delta, which dominated the evolutionary landscape later in the pandemic. Cliques of aminoacidic positional associations tended to occur in single sequences, enabling the recognition of epistatic positions in real-world virus populations. Our findings provide a novel way of understanding epistatic relationships in viral proteins with potential applications in the design of virus control procedures.
IMPORTANCE Paired positional associations of adapted amino acids in virus proteins may provide new insights for understanding virus evolution and variant formation. We investigated potential intramolecular relationships between variable SARS-CoV-2 spike positions by exact tests of independence in R × C contingency tables, having applied Average Product Correction (APC) to eliminate background noise. Associated positions (exact P ≤ 0.001 and APC ≥ 2) formed a non-random, epistatic network of 25 cliques and 1-4 communities at different clique resolutions, revealing evolutionary ties between variable positions of circulating variants and a predictive potential of previously unknown network positions. Cliques of different sizes represented theoretical combinations of changing residues in sequence space, allowing the identification of significant aminoacidic combinations in single sequences of real-world populations. Our analytic approach that links network structural aspects to mutational aminoacidic combinations in the spike sequence population offers a novel way to understand virus epidemiology and evolution.
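Average Product Correction, as commonly defined for pairwise association scores, subtracts from each score the product of the two positions' mean scores divided by the overall mean. The following is a minimal sketch of that general formula, assuming a symmetric matrix of raw pairwise scores with an ignored diagonal and a non-zero overall mean. The function name and the toy matrix are hypothetical, and the sketch does not reproduce the scoring or thresholds used in the study.

```python
def average_product_correction(scores):
    """Return APC-corrected scores for a symmetric matrix (diagonal ignored)."""
    n = len(scores)
    # Mean raw score of each position against every other position.
    row_mean = [
        sum(scores[i][j] for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]
    # Mean over all off-diagonal entries (equals the mean of the row means).
    overall = sum(row_mean) / n
    corrected = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                background = row_mean[i] * row_mean[j] / overall
                corrected[i][j] = scores[i][j] - background
    return corrected


# Toy 3-position example: positions 0 and 1 are strongly associated.
raw = [
    [0.0, 4.0, 1.0],
    [4.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
]
for row in average_product_correction(raw):
    print(row)
```

In this toy matrix the corrected score for the pair (0, 1) stays positive while the pairs involving position 2 fall below zero, which is the intended effect of removing a shared background level.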
Affiliation(s)
- Yiannis Manoussopoulos
  - Department of Microbiology, Medical School, National and Kapodistrian University of Athens, Athens, Greece
  - ELGO-Demeter, Plant Protection Division of Patras, Laboratory of Virology, Patras, Greece
- Cleo Anastassopoulou
  - Department of Microbiology, Medical School, National and Kapodistrian University of Athens, Athens, Greece
- John P. A. Ioannidis
  - Department of Medicine, Stanford University, Stanford, California, USA
  - Departments of Epidemiology and Population Health, Stanford University, Stanford, California, USA
  - Department of Biomedical Data Science, Stanford University, Stanford, California, USA
  - Department of Statistics, Stanford University, Stanford, California, USA
- Athanasios Tsakris
  - Department of Microbiology, Medical School, National and Kapodistrian University of Athens, Athens, Greece
6. Stan G, Lorimer GH, Thirumalai D. Friends in need: How chaperonins recognize and remodel proteins that require folding assistance. Front Mol Biosci 2022; 9:1071168. [PMID: 36479385 PMCID: PMC9720267 DOI: 10.3389/fmolb.2022.1071168] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 10/15/2022] [Accepted: 11/07/2022] [Indexed: 08/19/2023] Open
Abstract
Chaperonins are biological nanomachines that help newly translated proteins to fold by rescuing them from kinetically trapped misfolded states. Protein folding assistance by the chaperonin machinery is obligatory in vivo for a subset of proteins in the bacterial proteome. Chaperonins are large oligomeric complexes, with unusual seven fold symmetry (group I) or eight/nine fold symmetry (group II), that form double-ring constructs, enclosing a central cavity that serves as the folding chamber. Dramatic large-scale conformational changes, that take place during ATP-driven cycles, allow chaperonins to bind misfolded proteins, encapsulate them into the expanded cavity and release them back into the cellular environment, regardless of whether they are folded or not. The theory associated with the iterative annealing mechanism, which incorporated the conformational free energy landscape description of protein folding, quantitatively explains most, if not all, the available data. Misfolded conformations are associated with low energy minima in a rugged energy landscape. Random disruptions of these low energy conformations result in higher free energy, less folded, conformations that can stochastically partition into the native state. Two distinct mechanisms of annealing action have been described. Group I chaperonins (GroEL homologues in eubacteria and endosymbiotic organelles), recognize a large number of misfolded proteins non-specifically and operate through highly coordinated cooperative motions. By contrast, the less well understood group II chaperonins (CCT in Eukarya and thermosome/TF55 in Archaea), assist a selected set of substrate proteins. Sequential conformational changes within a CCT ring are observed, perhaps promoting domain-by-domain substrate folding. Chaperonins are implicated in bacterial infection, autoimmune disease, as well as protein aggregation and degradation diseases. 
Understanding the chaperonin mechanism and the specific proteins they rescue during the cell cycle is important not only for the fundamental aspect of protein folding in the cellular environment, but also for effective therapeutic strategies.
Affiliation(s)
- George Stan
  - Department of Chemistry, University of Cincinnati, Cincinnati, OH, United States
- George H. Lorimer
  - Center for Biomolecular Structure and Organization, Department of Chemistry and Biochemistry, University of Maryland, College Park, MD, United States
- D. Thirumalai
  - Department of Chemistry, University of Texas, Austin, TX, United States
  - Department of Physics, University of Texas, Austin, TX, United States
7. Swint-Kruse L, Martin TA, Page BM, Wu T, Gerhart PM, Dougherty LL, Tang Q, Parente DJ, Mosier BR, Bantis LE, Fenton AW. Rheostat functional outcomes occur when substitutions are introduced at nonconserved positions that diverge with speciation. Protein Sci 2021; 30:1833-1853. [PMID: 34076313 DOI: 10.1002/pro.4136] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 03/23/2021] [Revised: 05/25/2021] [Accepted: 05/28/2021] [Indexed: 12/14/2022]
Abstract
When amino acids vary during evolution, the outcome can be functionally neutral or biologically-important. We previously found that substituting a subset of nonconserved positions, "rheostat" positions, can have surprising effects on protein function. Since changes at rheostat positions can facilitate functional evolution or cause disease, more examples are needed to understand their unique biophysical characteristics. Here, we explored whether "phylogenetic" patterns of change in multiple sequence alignments (such as positions with subfamily specific conservation) predict the locations of functional rheostat positions. To that end, we experimentally tested eight phylogenetic positions in human liver pyruvate kinase (hLPYK), using 10-15 substitutions per position and biochemical assays that yielded five functional parameters. Five positions were strongly rheostatic and three were non-neutral. To test the corollary that positions with low phylogenetic scores were not rheostat positions, we combined these phylogenetic positions with previously-identified hLPYK rheostat, "toggle" (most substitutions abolished function), and "neutral" (all substitutions were like wild-type) positions. Despite representing 428 variants, this set of 33 positions was poorly statistically powered. Thus, we turned to the in vivo phenotypic dataset for E. coli lactose repressor protein (LacI), which comprised 12-13 substitutions at 329 positions and could be used to identify rheostat, toggle, and neutral positions. Combined hLPYK and LacI results show that positions with strong phylogenetic patterns of change are more likely to exhibit rheostat substitution outcomes than neutral or toggle outcomes. Furthermore, phylogenetic patterns were more successful at identifying rheostat positions than were co-evolutionary or eigenvector centrality measures of evolutionary change.
Affiliation(s)
- Liskin Swint-Kruse
  - Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, USA
- Tyler A Martin
  - Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, USA
- Braelyn M Page
  - Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, USA
- Tiffany Wu
  - Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, USA
- Paige M Gerhart
  - Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, USA
- Larissa L Dougherty
  - Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, USA
  - Department of Biochemistry and Cell Biology, Geisel School of Medicine at Dartmouth College, Hanover, New Hampshire, USA
- Qingling Tang
  - Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, USA
- Daniel J Parente
  - Department of Family Medicine and Community Health, The University of Kansas Medical Center, Kansas City, Kansas, USA
- Brian R Mosier
  - Department of Biostatistics and Data Science, The University of Kansas Medical Center, Kansas City, Kansas, USA
- Leonidas E Bantis
  - Department of Biostatistics and Data Science, The University of Kansas Medical Center, Kansas City, Kansas, USA
- Aron W Fenton
  - Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, USA
8. Pearce R, Zhang Y. Deep learning techniques have significantly impacted protein structure prediction and protein design. Curr Opin Struct Biol 2021; 68:194-207. [PMID: 33639355 PMCID: PMC8222070 DOI: 10.1016/j.sbi.2021.01.007] [Citation(s) in RCA: 68] [Impact Index Per Article: 17.0] [Received: 10/05/2020] [Revised: 01/09/2021] [Accepted: 01/18/2021] [Indexed: 12/26/2022]
Abstract
Protein structure prediction and design can be regarded as two inverse processes governed by the same folding principle. Although progress remained stagnant over the past two decades, the recent application of deep neural networks to spatial constraint prediction and end-to-end model training has significantly improved the accuracy of protein structure prediction, largely solving the problem at the fold level for single-domain proteins. The field of protein design has also witnessed dramatic improvement, where noticeable examples have shown that information stored in neural-network models can be used to advance functional protein design. Thus, incorporation of deep learning techniques into different steps of protein folding and design approaches represents an exciting future direction and should continue to have a transformative impact on both fields.
Affiliation(s)
- Robin Pearce
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Yang Zhang
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
  - Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA
9. Li Y, Hu J, Zhang C, Yu DJ, Zhang Y. ResPRE: high-accuracy protein contact prediction by coupling precision matrix with deep residual neural networks. Bioinformatics 2020; 35:4647-4655. [PMID: 31070716 DOI: 10.1093/bioinformatics/btz291] [Citation(s) in RCA: 109] [Impact Index Per Article: 21.8] [Received: 11/18/2018] [Revised: 03/18/2019] [Accepted: 04/17/2019] [Indexed: 12/20/2022] Open
Abstract
MOTIVATION Contact-map of a protein sequence dictates the global topology of structural fold. Accurate prediction of the contact-map is thus essential to protein 3D structure prediction, which is particularly useful for the protein sequences that do not have close homology templates in the Protein Data Bank. RESULTS We developed a new method, ResPRE, to predict residue-level protein contacts using inverse covariance matrix (or precision matrix) of multiple sequence alignments (MSAs) through deep residual convolutional neural network training. The approach was tested on a set of 158 non-homologous proteins collected from the CASP experiments and achieved an average accuracy of 50.6% in the top-L long-range contact prediction with L being the sequence length, which is 11.7% higher than the best of other state-of-the-art approaches ranging from coevolution coupling analysis to deep neural network training. Detailed data analyses show that the major advantage of ResPRE lies at the utilization of precision matrix that helps rule out transitional noises of contact-maps compared with the previously used covariance matrix. Meanwhile, the residual network with parallel shortcut layer connections increases the learning ability of deep neural network training. It was also found that appropriate collection of MSAs can further improve the accuracy of final contact-map predictions. The standalone package and online server of ResPRE are made freely available, which should bring important impact on protein structure and function modeling studies in particular for the distant- and non-homology protein targets. AVAILABILITY AND IMPLEMENTATION https://zhanglab.ccmb.med.umich.edu/ResPRE and https://github.com/leeyang/ResPRE. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
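The "top-L long-range" accuracy quoted in this abstract can be illustrated with a short sketch. It assumes that long-range means a sequence separation of at least 24 residues (a common convention that is treated here as an assumption, exposed as the `min_sep` parameter) and that accuracy is the fraction of the L highest-scoring long-range predictions that are true contacts. The function name and the toy data are hypothetical and are not taken from the ResPRE package.

```python
def top_l_long_range_precision(predicted, true_contacts, seq_len, min_sep=24):
    """Fraction of the top-L long-range predicted pairs that are true contacts.

    predicted: dict mapping (i, j) residue index pairs to a contact score.
    true_contacts: set of (i, j) pairs with i < j.
    """
    # Keep only pairs separated by at least min_sep residues in sequence.
    long_range = [
        (score, pair)
        for pair, score in predicted.items()
        if abs(pair[0] - pair[1]) >= min_sep
    ]
    long_range.sort(reverse=True)  # highest score first
    top = [pair for _, pair in long_range[:seq_len]]
    if not top:
        return 0.0
    hits = sum(1 for i, j in top if (min(i, j), max(i, j)) in true_contacts)
    return hits / len(top)


# Toy example with a lowered separation threshold so the data stay small.
scores = {(0, 5): 0.9, (1, 6): 0.8, (2, 7): 0.1, (0, 1): 0.99}
native = {(0, 5), (2, 7)}
print(top_l_long_range_precision(scores, native, seq_len=2, min_sep=3))  # 0.5
```

The short-range pair (0, 1) is filtered out despite its high score, so only one of the two selected pairs is a true contact.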
Affiliation(s)
- Yang Li
  - School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109-2218, USA
- Jun Hu
  - School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109-2218, USA
- Chengxin Zhang
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109-2218, USA
- Dong-Jun Yu
  - School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
- Yang Zhang
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109-2218, USA
10. Martin TA, Wu T, Tang Q, Dougherty LL, Parente DJ, Swint-Kruse L, Fenton AW. Identification of biochemically neutral positions in liver pyruvate kinase. Proteins 2020; 88:1340-1350. [PMID: 32449829 DOI: 10.1002/prot.25953] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 12/18/2019] [Revised: 03/10/2020] [Accepted: 05/16/2020] [Indexed: 01/08/2023]
Abstract
Understanding how each residue position contributes to protein function has been a long-standing goal in protein science. Substitution studies have historically focused on conserved protein positions. However, substitutions of nonconserved positions can also modify function. Indeed, we recently identified nonconserved positions that have large substitution effects in human liver pyruvate kinase (hLPYK), including altered allosteric coupling. To facilitate a comparison of which characteristics determine when a nonconserved position does vs does not contribute to function, the goal of the current work was to identify neutral positions in hLPYK. However, existing hLPYK data showed that three features commonly associated with neutral positions-high sequence entropy, high surface exposure, and alanine scanning-lacked the sensitivity needed to guide experimental studies. We used multiple evolutionary patterns identified in a sequence alignment of the PYK family to identify which positions were least patterned, reasoning that these were most likely to be neutral. Nine positions were tested with a total of 117 amino acid substitutions. Although exploring all potential functions is not feasible for any protein, five parameters associated with substrate/effector affinities and allosteric coupling were measured for hLPYK variants. For each position, the aggregate functional outcomes of all variants were used to quantify a "neutrality" score. Three positions showed perfect neutral scores for all five parameters. Furthermore, the nine positions showed larger neutral scores than 17 positions located near allosteric binding sites. Thus, our strategy successfully enriched the dataset for positions with neutral and modest substitutions.
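Sequence entropy, one of the three features this abstract lists as commonly associated with neutral positions, is usually computed per alignment column as Shannon entropy. The following is a minimal sketch under that assumption, with gaps ignored. The function name and the toy columns are hypothetical, and this is not the scoring used in the study.

```python
import math
from collections import Counter


def column_entropy(column, gap="-"):
    """Shannon entropy (in bits) of the residues in one alignment column."""
    residues = [c for c in column if c != gap]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    entropy = 0.0
    for n in counts.values():
        freq = n / total
        entropy -= freq * math.log2(freq)
    return entropy


# Toy columns: fully conserved, two residues, and four equally frequent residues.
for col in ("AAAA", "AACC", "ACDE"):
    print(col, column_entropy(col))
```

A fully conserved column scores 0 bits, and the value rises as more residue types share the column more evenly, up to log2(20) bits for the 20 standard amino acids.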
Affiliation(s)
- Tyler A Martin
- Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, USA
| | - Tiffany Wu
- Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, USA
| | - Qingling Tang
- Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, USA
| | - Larissa L Dougherty
- Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, USA
| | - Daniel J Parente
- Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, USA.,Department of Family and Community Medicine, The University of Kansas Medical Center, Kansas City, Kansas, USA
| | - Liskin Swint-Kruse
- Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, USA
| | - Aron W Fenton
- Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, USA
| |
|
11
|
Fang C, Jia Y, Hu L, Lu Y, Wang H. IMPContact: An Interhelical Residue Contact Prediction Method. BIOMED RESEARCH INTERNATIONAL 2020; 2020:4569037. [PMID: 32309431 PMCID: PMC7140131 DOI: 10.1155/2020/4569037] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 03/01/2020] [Accepted: 03/09/2020] [Indexed: 11/17/2022]
Abstract
As an important category of proteins, alpha-helix transmembrane proteins (αTMPs) play an important role in various biological activities. Because few αTMP structures have been solved, predicting the residue contacts among the transmembrane segments of an αTMP can reveal the basis of the protein fold, which can in turn be used to discover more protein functions. A few efforts have been devoted to predicting interhelical residue contacts (IHRCs) using machine learning methods based on prior knowledge of transmembrane protein structure. However, improving the prediction accuracy remains a challenge, and deep learning provides an opportunity to exploit this structural knowledge from a different perspective. For this purpose, we proposed IMPContact, a novel αTMP residue-residue contact prediction method in which a convolutional neural network (CNN) was applied to recognize interhelical contacts in a TMP using its specific structural features. Four sequence-based, TMP-specific features were selected to describe a pair of residues, namely, evolutionary covariation, predicted topology structure, relative residue position, and evolutionary conservation. An up-to-date dataset was used to train and test IMPContact; our method achieved better performance than peer methods. In the case studies, IHRCs in regular transmembrane helices were better predicted than those in irregular ones.
Affiliation(s)
- Chao Fang
- School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
| | - Yajie Jia
- School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
- Institute of Computational Biology, Northeast Normal University, Changchun 130117, China
| | - Lihong Hu
- School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
| | - Yinghua Lu
- School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
- Department of Computer Science, College of Humanities & Sciences of Northeast Normal University, Changchun 130117, China
| | - Han Wang
- School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
- Institute of Computational Biology, Northeast Normal University, Changchun 130117, China
- Department of Computer Science, College of Humanities & Sciences of Northeast Normal University, Changchun 130117, China
| |
|
12
|
Ivey G, Youker RT. Disease-relevant mutations alter amino acid co-evolution networks in the second nucleotide binding domain of CFTR. PLoS One 2020; 15:e0227668. [PMID: 31978131 PMCID: PMC6980524 DOI: 10.1371/journal.pone.0227668] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/18/2018] [Accepted: 12/25/2019] [Indexed: 01/23/2023] Open
Abstract
Cystic Fibrosis (CF) is an inherited disease caused by mutations in the cystic fibrosis transmembrane conductance regulator (CFTR) ion channel. Mutations in CFTR cause impaired chloride ion transport in the epithelial tissues of patients, leading to cardiopulmonary decline and pancreatic insufficiency in the most severely affected patients. CFTR is composed of twelve membrane-spanning domains, two nucleotide-binding domains (NBDs), and a regulatory domain. The most common mutation in CFTR is a deletion of phenylalanine at position 508 (ΔF508) in NBD1. Previous research has primarily concentrated on the structure and dynamics of the NBD1 domain; however, numerous pathological mutations have also been found in the lesser-studied NBD2 domain. We have investigated the amino acid co-evolved network of interactions in NBD2, and the changes that occur in that network upon the introduction of CF and CF-related mutations (S1251N(T), S1235R, D1270N, N1303K(T)). Extensive coupling between the α- and β-subdomains was identified, involving residues in or near the Walker A, Walker B, H-loop and C-loop motifs. Alterations in the predicted residue network varied from moderate for the S1251T perturbation to more severe for N1303T. The S1235R and D1270N networks varied greatly compared to the wildtype, but these CF mutations only affect ion transport preference and do not severely disrupt CFTR function, suggesting dynamic flexibility in the network of interactions in NBD2. Our results also suggest that inappropriate interactions between the β-subdomain and Q-loop could be detrimental. We also identified mutations predicted to stabilize the NBD2 residue network upon introduction of the CF and CF-related mutations, and these predicted mutations are scored as benign by the MUTPRED2 algorithm. Our results suggest that the level of disruption of the co-evolution predictions of the amino acid networks in NBD2 does not have a straightforward correlation with the severity of the CF phenotypes observed.
Affiliation(s)
- Gabrianne Ivey
- Kyder Christian Academy, Franklin, North Carolina, United States of America
- Southwestern Community College, Sylva, North Carolina, United States of America
| | - Robert T. Youker
- Department of Biology, Western Carolina University, Cullowhee, North Carolina, United States of America
| |
|
13
|
Role of protein-protein interactions in allosteric drug design for DNA methyltransferases. ADVANCES IN PROTEIN CHEMISTRY AND STRUCTURAL BIOLOGY 2020; 121:49-84. [PMID: 32312426 DOI: 10.1016/bs.apcsb.2019.12.005] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/22/2022]
Abstract
DNA methyltransferases (DNMTs) not only play key roles in epigenetic gene regulation, but also serve as emerging targets for several diseases, especially cancers. Because DNMTs have multi-domain structures, targeting allosteric sites of protein-protein interactions (PPIs) is becoming an attractive strategy in epigenetic drug discovery. This chapter reviews the major contemporary approaches used for PPI-based drug discovery at different levels, from the enumeration of allosteric mechanisms to the identification of allosteric pockets. These include the construction of protein structure networks (PSNs) based on molecular dynamics (MD) simulations, elastic network models (ENMs) and perturbation response scanning (PRS) calculations, sequence-based conservation and coupling analysis, and allosteric pocket identification. Furthermore, we complement this methodology by highlighting the role of computational approaches in promising practical applications of computer-aided drug design, with a special focus on two DNMTs, namely DNMT1 and DNMT3A.
|
14
|
Dimas RP, Jiang XL, Alberto de la Paz J, Morcos F, Chan CTY. Engineering repressors with coevolutionary cues facilitates toggle switches with a master reset. Nucleic Acids Res 2019; 47:5449-5463. [PMID: 31162606 PMCID: PMC6547410 DOI: 10.1093/nar/gkz280] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/02/2019] [Accepted: 04/08/2019] [Indexed: 12/17/2022] Open
Abstract
Engineering allosteric transcriptional repressors containing an environmental sensing module (ESM) and a DNA recognition module (DRM) has the potential to unlock a combinatorial set of rationally designed biological responses. We demonstrated that constructing hybrid repressors by fusing distinct ESMs and DRMs provides a means to flexibly rewire genetic networks for complex signal processing. We have used coevolutionary traits among LacI homologs to develop a model for predicting compatibility between ESMs and DRMs. Our predictions accurately agree with the performance of 40 engineered repressors. We have harnessed this framework to develop a system of multiple toggle switches with a master OFF signal that produces a unique behavior: each engineered biological activity is switched to a stable ON state by different chemicals and returned to OFF in response to a common signal. One promising application of this design is to develop living diagnostics for monitoring multiple parameters in complex physiological environments and it represents one of many circuit topologies that can be explored with modular repressors designed with coevolutionary information.
Affiliation(s)
- Rey P Dimas
- Department of Biology, The University of Texas at Tyler, Tyler, TX 75799, USA
| | - Xian-Li Jiang
- Department of Biological Sciences, The University of Texas at Dallas, Dallas, TX 75080, USA
| | - Jose Alberto de la Paz
- Department of Biological Sciences, The University of Texas at Dallas, Dallas, TX 75080, USA
| | - Faruck Morcos
- Department of Biological Sciences, The University of Texas at Dallas, Dallas, TX 75080, USA.,Department of Bioengineering, The University of Texas at Dallas, Dallas, TX 75080, USA.,Center for Systems Biology, The University of Texas at Dallas, Dallas, TX 75080, USA
| | - Clement T Y Chan
- Department of Biology, The University of Texas at Tyler, Tyler, TX 75799, USA.,Department of Chemistry and Biochemistry, The University of Texas at Tyler, Tyler, TX 75799, USA
| |
|
15
|
Goyal VD, Sullivan BJ, Magliery TJ. Phylogenetic spread of sequence data affects fitness of consensus enzymes: Insights from triosephosphate isomerase. Proteins 2019; 88:274-283. [PMID: 31407418 DOI: 10.1002/prot.25799] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/17/2018] [Revised: 07/26/2019] [Accepted: 08/08/2019] [Indexed: 11/08/2022]
Abstract
The concept of consensus in multiple sequence alignments (MSAs) has been used to design and engineer proteins previously with some success. However, consensus design implicitly assumes that all amino acid positions function independently, whereas in reality, the amino acids in a protein interact with each other and work cooperatively to produce the optimum structure required for its function. Correlation analysis is a tool that can capture the effect of such interactions. In a previously published study, we made consensus variants of the triosephosphate isomerase (TIM) protein using MSAs that included sequences from both prokaryotic and eukaryotic organisms. These variants were not completely native-like and were also surprisingly different from each other in terms of oligomeric state, structural dynamics, and activity. Extensive correlation analysis of the TIM database has revealed some clues about factors leading to the unusual behavior of the previously constructed consensus proteins. Among other things, we have found that the more ill-behaved consensus mutant had more broken correlations than the better-behaved consensus variant. Moreover, we report three correlation and phylogeny-based consensus variants of TIM. These variants were more native-like than the previous consensus mutants and considerably more stable than a wild-type TIM from a mesophilic organism. This study highlights the importance of choosing the appropriate diversity of MSA for consensus analysis and provides information that can be used to engineer stable enzymes.
Affiliation(s)
- Venuka Durani Goyal
- Department of Chemistry and Biochemistry, The Ohio State University, Columbus, Ohio
| | - Brandon J Sullivan
- Department of Chemistry and Biochemistry, The Ohio State University, Columbus, Ohio.,Ohio State Biochemistry Program, The Ohio State University, Columbus, Ohio
| | - Thomas J Magliery
- Department of Chemistry and Biochemistry, The Ohio State University, Columbus, Ohio
| |
|
16
|
Musil M, Stourac J, Bendl J, Brezovsky J, Prokop Z, Zendulka J, Martinek T, Bednar D, Damborsky J. FireProt: web server for automated design of thermostable proteins. Nucleic Acids Res 2017; 45:W393-W399. [PMID: 28449074 PMCID: PMC5570187 DOI: 10.1093/nar/gkx285] [Citation(s) in RCA: 114] [Impact Index Per Article: 19.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/11/2017] [Accepted: 04/11/2017] [Indexed: 01/07/2023] Open
Abstract
There is continuous interest in increasing protein stability to enhance the usability of proteins in numerous biomedical and biotechnological applications. A number of in silico tools for predicting the effect of mutations on protein stability have been developed recently. However, only single-point mutations with a small effect on protein stability are typically predicted with the existing tools, and they have to be followed by laborious protein expression, purification, and characterization. Here, we present FireProt, a web server for the automated design of multiple-point thermostable mutant proteins that combines structural and evolutionary information in its calculation core. FireProt utilizes sixteen tools and three protein engineering strategies for making reliable protein designs. The server is complemented with an interactive, easy-to-use interface that allows users to directly analyze and optionally modify designed thermostable mutants. FireProt is freely available at http://loschmidt.chemi.muni.cz/fireprot.
Affiliation(s)
- Milos Musil
- Loschmidt Laboratories, Department of Experimental Biology, Masaryk University, Brno, Czech Republic.,Department of Information Systems, Faculty of Information Technology, Brno University of Technology, Brno, Czech Republic.,International Centre for Clinical Research, St. Anne's University Hospital Brno, Brno, Czech Republic
| | - Jan Stourac
- Loschmidt Laboratories, Department of Experimental Biology, Masaryk University, Brno, Czech Republic.,International Centre for Clinical Research, St. Anne's University Hospital Brno, Brno, Czech Republic
| | - Jaroslav Bendl
- Loschmidt Laboratories, Department of Experimental Biology, Masaryk University, Brno, Czech Republic.,Department of Information Systems, Faculty of Information Technology, Brno University of Technology, Brno, Czech Republic.,International Centre for Clinical Research, St. Anne's University Hospital Brno, Brno, Czech Republic
| | - Jan Brezovsky
- Loschmidt Laboratories, Department of Experimental Biology, Masaryk University, Brno, Czech Republic.,International Centre for Clinical Research, St. Anne's University Hospital Brno, Brno, Czech Republic
| | - Zbynek Prokop
- Loschmidt Laboratories, Department of Experimental Biology, Masaryk University, Brno, Czech Republic.,International Centre for Clinical Research, St. Anne's University Hospital Brno, Brno, Czech Republic
| | - Jaroslav Zendulka
- Department of Information Systems, Faculty of Information Technology, Brno University of Technology, Brno, Czech Republic.,Centre of Excellence IT4Innovations, Technical University Ostrava, Ostrava
| | - Tomas Martinek
- Loschmidt Laboratories, Department of Experimental Biology, Masaryk University, Brno, Czech Republic.,Department of Information Systems, Faculty of Information Technology, Brno University of Technology, Brno, Czech Republic.,Centre of Excellence IT4Innovations, Technical University Ostrava, Ostrava
| | - David Bednar
- Loschmidt Laboratories, Department of Experimental Biology, Masaryk University, Brno, Czech Republic.,International Centre for Clinical Research, St. Anne's University Hospital Brno, Brno, Czech Republic
| | - Jiri Damborsky
- Loschmidt Laboratories, Department of Experimental Biology, Masaryk University, Brno, Czech Republic.,International Centre for Clinical Research, St. Anne's University Hospital Brno, Brno, Czech Republic
| |
|
17
|
Wodak SJ, Paci E, Dokholyan NV, Berezovsky IN, Horovitz A, Li J, Hilser VJ, Bahar I, Karanicolas J, Stock G, Hamm P, Stote RH, Eberhardt J, Chebaro Y, Dejaegere A, Cecchini M, Changeux JP, Bolhuis PG, Vreede J, Faccioli P, Orioli S, Ravasio R, Yan L, Brito C, Wyart M, Gkeka P, Rivalta I, Palermo G, McCammon JA, Panecka-Hofman J, Wade RC, Di Pizio A, Niv MY, Nussinov R, Tsai CJ, Jang H, Padhorny D, Kozakov D, McLeish T. Allostery in Its Many Disguises: From Theory to Applications. Structure 2019; 27:566-578. [PMID: 30744993 PMCID: PMC6688844 DOI: 10.1016/j.str.2019.01.003] [Citation(s) in RCA: 267] [Impact Index Per Article: 44.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/11/2018] [Revised: 11/29/2018] [Accepted: 01/02/2019] [Indexed: 12/19/2022]
Abstract
Allosteric regulation plays an important role in many biological processes, such as signal transduction, transcriptional regulation, and metabolism. Allostery is rooted in the fundamental physical properties of macromolecular systems, but its underlying mechanisms are still poorly understood. A collection of contributions to a recent interdisciplinary CECAM (Centre Européen de Calcul Atomique et Moléculaire) workshop is used here to provide an overview of the progress and remaining limitations in the understanding of the mechanistic foundations of allostery gained from computational and experimental analyses of real protein systems and model systems. The main conceptual frameworks instrumental in driving the field are discussed. We illustrate the role of these frameworks in illuminating molecular mechanisms and explaining cellular processes, and describe some of their promising practical applications in engineering molecular sensors and informing drug design efforts.
Affiliation(s)
| | | | - Nikolay V Dokholyan
- Department of Biochemistry & Biophysics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA; Departments of Pharmacology and Biochemistry & Molecular Biology, Penn State Medical Center, Hershey, PA, USA
| | - Igor N Berezovsky
- Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), and Department of Biological Sciences, National University of Singapore, Singapore, Singapore
| | - Amnon Horovitz
- Department of Structural Biology, Weizmann Institute of Science, Rehovot, Israel
| | - Jing Li
- Departments of Biology and T.C. Jenkins Department of Biophysics, Johns Hopkins University, Baltimore, USA
| | - Vincent J Hilser
- Departments of Biology and T.C. Jenkins Department of Biophysics, Johns Hopkins University, Baltimore, USA
| | - Ivet Bahar
- School of Medicine, University of Pittsburgh, Pittsburgh, USA
| | | | - Gerhard Stock
- Biomolecular Dynamics, Institute of Physics, Albert Ludwigs University, Freiburg, Germany
| | - Peter Hamm
- Department of Chemistry, University of Zurich, Zurich, Switzerland
| | - Roland H Stote
- Department of Integrative Structural Biology, Institut de Génétique et de Biologie Moléculaire et Cellulaire (IGBMC), Illkirch, France
| | - Jerome Eberhardt
- Department of Integrative Structural Biology, Institut de Génétique et de Biologie Moléculaire et Cellulaire (IGBMC), Illkirch, France
| | - Yassmine Chebaro
- Department of Integrative Structural Biology, Institut de Génétique et de Biologie Moléculaire et Cellulaire (IGBMC), Illkirch, France
| | - Annick Dejaegere
- Department of Integrative Structural Biology, Institut de Génétique et de Biologie Moléculaire et Cellulaire (IGBMC), Illkirch, France
| | - Marco Cecchini
- Institut de Chimie de Strasbourg, UMR7177 CNRS & Université de Strasbourg, Strasbourg, France
| | | | - Peter G Bolhuis
- van 't Hoff Institute for Molecular Sciences (HIMS), University of Amsterdam, Amsterdam, Netherlands
| | - Jocelyne Vreede
- van 't Hoff Institute for Molecular Sciences (HIMS), University of Amsterdam, Amsterdam, Netherlands
| | - Pietro Faccioli
- Physics Department, Università di Trento and INFN-TIFPA, Trento, Italy
| | - Simone Orioli
- Physics Department, Università di Trento and INFN-TIFPA, Trento, Italy
| | - Riccardo Ravasio
- Institute of Physics, Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
| | - Le Yan
- Kavli Institute for Theoretical Physics, University of California, Santa Barbara, CA 93106, USA
| | - Carolina Brito
- Instituto de Física, Universidade Federal do Rio Grande do Sul, Porto Alegre, RS 91501-970, Brazil
| | - Matthieu Wyart
- Institute of Physics, Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
| | - Paraskevi Gkeka
- Structure Design and Informatics, Sanofi R&D, Chilly-Mazarin, France
| | - Ivan Rivalta
- École Normale Supérieure de Lyon, Université de Lyon, CNRS, Université Claude Bernard Lyon 1, Lyon, France
| | - Giulia Palermo
- Department of Chemistry and Biochemistry, University of California, San Diego, USA; Department of Bioengineering, University of California Riverside, CA 92507, USA
| | - J Andrew McCammon
- Department of Chemistry and Biochemistry, University of California, San Diego, USA
| | - Joanna Panecka-Hofman
- Division of Biophysics, Institute of Experimental Physics, Faculty of Physics, University of Warsaw, Warsaw, Poland
| | - Rebecca C Wade
- Molecular and Cellular Modeling Group, Heidelberg Institute for Theoretical Studies (HITS) and Center for Molecular Biology (ZMBH), DKFZ-ZMBH Alliance, Heidelberg University, Heidelberg, Germany; Interdisciplinary Center for Scientific Computing (IWR), Heidelberg University, Heidelberg, Germany
| | - Antonella Di Pizio
- Leibniz-Institute for Food Systems Biology, Technical University of Munich, Munich, Germany
| | - Masha Y Niv
- Institute of Biochemistry, Food Science and Nutrition, Robert H Smith Faculty of Agriculture Food and Environment, The Hebrew University, Jerusalem, Israel
| | - Ruth Nussinov
- Frederick National Laboratory for Cancer Research, National Cancer Institute, Frederick, USA; Sackler Institute of Molecular Medicine, Department of Human Genetics and Molecular Medicine Sackler School of Medicine, Tel Aviv University, Tel Aviv, Israel
| | - Chung-Jung Tsai
- Frederick National Laboratory for Cancer Research, National Cancer Institute, Frederick, USA
| | - Hyunbum Jang
- Frederick National Laboratory for Cancer Research, National Cancer Institute, Frederick, USA
| | - Dzmitry Padhorny
- Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, NY 11794, USA
| | - Dima Kozakov
- Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, NY 11794, USA
| | - Tom McLeish
- Department of Physics, University of York, York, UK
| |
|
18
|
Jing X, Dong Q, Lu R, Dong Q. Protein Inter-Residue Contacts Prediction: Methods, Performances and Applications. Curr Bioinform 2019. [DOI: 10.2174/1574893613666181109130430] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022]
Abstract
Background: Protein inter-residue contacts prediction plays an important role in the field of protein structure and function research. As a low-dimensional representation of protein tertiary structure, protein inter-residue contacts could greatly help de novo protein structure prediction methods to reduce the conformational search space. Over the past two decades, various methods have been developed for protein inter-residue contacts prediction. Objective: We provide a comprehensive and systematic review of protein inter-residue contacts prediction methods. Results: Protein inter-residue contacts prediction methods are roughly classified into five categories: correlated mutations methods, machine-learning methods, fusion methods, template-based methods and 3D model-based methods. In this paper, we first describe the common definition of protein inter-residue contacts and show their typical applications. Then, we present a comprehensive review of the three main categories of protein inter-residue contacts prediction: correlated mutations methods, machine-learning methods and fusion methods. We also analyze the constraints of each category. Furthermore, we compare several representative methods on the CASP11 dataset and discuss the performances of these methods in detail. Conclusion: Correlated mutations methods achieve better performance for long-range contacts, while machine-learning methods perform well for short-range contacts. Fusion methods could take advantage of both the machine-learning and the correlated mutations methods. Employing more effective fusion strategies could help to further improve the performance of fusion methods.
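To make the "correlated mutations" category concrete, the sketch below scores two alignment columns by mutual information, one of the simplest covariation measures. It is offered as an illustrative assumption, not as any specific method compared in the review; the function name, the gap handling, and the toy columns are hypothetical.

```python
import math
from collections import Counter


def mutual_information(col_a, col_b, gap="-"):
    """Mutual information (bits) between two alignment columns of equal length."""
    # Keep only sequences that have a residue at both positions.
    pairs = [(a, b) for a, b in zip(col_a, col_b) if a != gap and b != gap]
    total = len(pairs)
    if total == 0:
        return 0.0
    joint = Counter(pairs)
    marg_a = Counter(a for a, _ in pairs)
    marg_b = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), n in joint.items():
        p_ab = n / total
        p_a = marg_a[a] / total
        p_b = marg_b[b] / total
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


# Hypothetical columns taken from a four-sequence alignment.
covarying = mutual_information("AAVV", "LLII")    # residues change together
independent = mutual_information("AAVV", "LILI")  # no association
```

Column pairs with high scores become candidate contacts. Practical correlated-mutation methods generally add corrections (for example for phylogenetic bias and indirect couplings) that this sketch omits.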
Affiliation(s)
- Xiaoyang Jing
- School of Computer Science, Fudan University, Shanghai, China
| | - Qimin Dong
- Vocational and Technical Education Center of Linxi County, Chifeng, Inner Mongolia, China
| | - Ruqian Lu
- School of Computer Science, Fudan University, Shanghai, China
| | - Qiwen Dong
- Faculty of Education, East China Normal University, Shanghai, China
| |
|
19
|
Abstract
Serine proteinase inhibitors (serpins), typically fold to a metastable native state and undergo a major conformational change in order to inhibit target proteases. However, conformational lability of the native serpin fold renders them susceptible to misfolding and aggregation, and underlies misfolding diseases such as α1-antitrypsin deficiency. Serpin specificity towards its protease target is dictated by its flexible and solvent exposed reactive centre loop (RCL), which forms the initial interaction with the target protease during inhibition. Previous studies have attempted to alter the specificity by mutating the RCL to that of a target serpin, but the rules governing specificity are not understood well enough yet to enable specificity to be engineered at will. In this paper, we use conserpin, a synthetic, thermostable serpin, as a model protein with which to investigate the determinants of serpin specificity by engineering its RCL. Replacing the RCL sequence with that from α1-antitrypsin fails to restore specificity against trypsin or human neutrophil elastase. Structural determination of the RCL-engineered conserpin and molecular dynamics simulations indicate that, although the RCL sequence may partially dictate specificity, local electrostatics and RCL dynamics may dictate the rate of insertion during protease inhibition, and thus whether it behaves as an inhibitor or a substrate. Engineering serpin specificity is therefore substantially more complex than solely manipulating the RCL sequence, and will require a more thorough understanding of how conformational dynamics achieves the delicate balance between stability, folding and function required by the exquisite serpin mechanism of action.
|
20
|
Butler BM, Kazan IC, Kumar A, Ozkan SB. Coevolving residues inform protein dynamics profiles and disease susceptibility of nSNVs. PLoS Comput Biol 2018; 14:e1006626. [PMID: 30496278 PMCID: PMC6289467 DOI: 10.1371/journal.pcbi.1006626] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/21/2018] [Revised: 12/11/2018] [Accepted: 11/09/2018] [Indexed: 11/18/2022] Open
Abstract
The conformational dynamics of proteins is rarely used in methodologies used to predict the impact of genetic mutations due to the paucity of three-dimensional protein structures as compared to the vast number of available sequences. Until now a three-dimensional (3D) structure has been required to predict the conformational dynamics of a protein. We introduce an approach that estimates the conformational dynamics of a protein, without relying on structural information. This de novo approach utilizes coevolving residues identified from a multiple sequence alignment (MSA) using Potts models. These coevolving residues are used as contacts in a Gaussian network model (GNM) to obtain protein dynamics. B-factors calculated using sequence-based GNM (Seq-GNM) are in agreement with crystallographic B-factors as well as theoretical B-factors from the original GNM that utilizes the 3D structure. Moreover, we demonstrate the ability of the calculated B-factors from the Seq-GNM approach to discriminate genomic variants according to their phenotypes for a wide range of proteins. These results suggest that protein dynamics can be approximated based on sequence information alone, making it possible to assess the phenotypes of nSNVs in cases where a 3D structure is unknown. We hope this work will promote the use of dynamics information in genetic disease prediction at scale by circumventing the need for 3D structures. Proteins are dynamic machines that undergo atomic fluctuations, side chain rotations, and collective domain movements that are required for biological function. There is, therefore, a need for quantitative metrics that capture the dynamic fluctuations per position to understand the critical role of protein dynamics in shaping biological functions. 
A limiting factor in incorporating structural dynamics information in the classification of non-synonymous single nucleotide variants (nSNVs) is the limited number of known 3D structures compared to the vast number of available sequences. We have developed a new sequence-based GNM method, termed Seq-GNM, which uses co-evolving amino acid positions based on the multiple sequence alignment of a given query sequence to estimate the thermal motions of C-alpha atoms. In this paper, we have demonstrated that the predicted thermal motions using Seq-GNM are in reasonable agreement with experimental B-factors as well as B-factors computed using 3D crystal structures. We also provide evidence that B-factors predicted by Seq-GNM are capable of distinguishing between disease-associated and neutral nSNVs.
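The idea of feeding predicted contacts into a Gaussian network model can be sketched as follows. This is a minimal illustration under stated assumptions, not the Seq-GNM implementation: the contact list is hypothetical, every contact gets unit spring strength, the contact graph is assumed to be connected, and the output is only proportional to B-factors (no physical scaling is applied). The pseudo-inverse of the Kirchhoff matrix is obtained with the identity Γ⁺ = (Γ + J/n)⁻¹ − J/n, where J is the all-ones matrix, which holds for a connected network.

```python
def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix; is the contact graph connected?")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def gnm_fluctuations(n_residues, contacts):
    """Relative mean-square fluctuations from a residue contact list (0-based pairs)."""
    n = n_residues
    kirchhoff = [[0.0] * n for _ in range(n)]
    for i, j in contacts:
        if i != j and kirchhoff[i][j] == 0.0:  # skip self-pairs and duplicates
            kirchhoff[i][j] = kirchhoff[j][i] = -1.0
            kirchhoff[i][i] += 1.0
            kirchhoff[j][j] += 1.0
    # Shift by J/n so the matrix becomes invertible, then undo the shift.
    shifted = [[kirchhoff[i][j] + 1.0 / n for j in range(n)] for i in range(n)]
    inverse = invert(shifted)
    return [inverse[i][i] - 1.0 / n for i in range(n)]


# Hypothetical three-residue chain with contacts 0-1 and 1-2.
fluctuations = gnm_fluctuations(3, [(0, 1), (1, 2)])
```

In this toy chain the two terminal residues, each with a single contact, fluctuate more than the central residue. In a sequence-based setting, the contact list would come from coevolving residue pairs rather than from a 3D structure.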
Affiliation(s)
- Brandon M. Butler
- Department of Physics and Center for Biological Physics, Arizona State University, Tempe, AZ, United States of America
| | - I. Can Kazan
- Department of Physics and Center for Biological Physics, Arizona State University, Tempe, AZ, United States of America
| | - Avishek Kumar
- Department of Physics and Center for Biological Physics, Arizona State University, Tempe, AZ, United States of America
- Harris School of Public Policy and Center for Data Science and Public Policy, University of Chicago, Chicago, IL, United States of America
| | - S. Banu Ozkan
- Department of Physics and Center for Biological Physics, Arizona State University, Tempe, AZ, United States of America
|
21
|
Zhou PY, Sze-To A, Wong AKC. Discovery and disentanglement of aligned residue associations from aligned pattern clusters to reveal subgroup characteristics. BMC Med Genomics 2018; 11:103. [PMID: 30453949 PMCID: PMC6245498 DOI: 10.1186/s12920-018-0417-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/23/2022] Open
Abstract
Background A protein family has similar and diverse functions locally conserved. An aligned pattern cluster (APC) can reflect the conserved functionality. Discovering aligned residue associations (ARAs) in APCs can reveal subtle inner working characteristics of conserved regions of protein families. However, ARAs corresponding to different functionalities/subgroups/classes could be entangled because of subtle multiple entwined factors. Methods To discover and disentangle patterns from mixed-mode datasets, such as APCs when the residues are replaced by their fundamental biochemical properties list, this paper presents a novel method, Extended Aligned Residue Association Discovery and Disentanglement (E-ARADD). E-ARADD discretizes the numerical dataset to transform the mixed-mode dataset into an event-value dataset, constructs an ARA Frequency Matrix and then converts it into an adjusted Statistical Residual (SR) Vector Space (SRV) capturing statistical deviation from randomness. By applying Principal Component (PC) Decomposition on SRV, PCs ranked by their variance are obtained. Finally, the disentangled ARAs are discovered when the projections on a PC are re-projected to a vector space with the same basis vectors as SRV. Results Experiments on synthetic, cytochrome c and class A scavenger data have shown that E-ARADD can a) disentangle the entwined ARAs in APCs (with residues or biochemical properties), b) reveal subtle AR clusters relating to classes, subtle subgroups or specific functionalities. Conclusions E-ARADD can discover and disentangle ARs and ARAs entangled in functionality and location of protein families to reveal functional subgroups and subgroup characteristics of biological conserved regions. Experimental results on synthetic data provide the proof-of-concept validation on the successful disentanglement that reveals class-associated ARAs with or without class labels as input.
Experiments on cytochrome c data proved the efficacy of E-ARADD in handling both types of residue data. Our novel methodology is not only able to discover and disentangle ARs and ARAs in specific statistical/functional (PCs and RSRVs) spaces, but also their locations in the protein family functional domains. The success of E-ARADD shows its great potential for proteomic research, drug discovery and precision and personalized genetic medicine.
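The two numerical steps named in the Methods (converting a frequency matrix into statistical residuals, then decomposing it into principal components) can be sketched generically. The sketch below assumes the standard adjusted standardized residual of a contingency table and a plain SVD-based PCA; the exact residual definition and re-projection used by E-ARADD are not given in the abstract, so the function names and formulas here are stand-ins rather than the published method. It also assumes every row and column has a nonzero count and that no single row or column holds all counts.

```python
import numpy as np

def adjusted_residuals(freq):
    """Adjusted standardized residuals of a co-occurrence frequency table.

    Each entry measures how far an observed count deviates from the count
    expected under independence, in approximate standard-normal units.
    """
    freq = np.asarray(freq, dtype=float)
    total = freq.sum()
    row = freq.sum(axis=1, keepdims=True)
    col = freq.sum(axis=0, keepdims=True)
    expected = row @ col / total
    variance = expected * (1.0 - row / total) * (1.0 - col / total)
    return (freq - expected) / np.sqrt(variance)

def principal_components(residual_vectors):
    """PCA via SVD; returns components (as rows) and their variances, largest first."""
    centered = residual_vectors - residual_vectors.mean(axis=0)
    _, singular, components = np.linalg.svd(centered, full_matrices=False)
    variances = singular ** 2 / (len(residual_vectors) - 1)
    return components, variances

# Hypothetical 2x2 co-occurrence counts for two residue values at two positions.
residuals = adjusted_residuals([[10, 2], [3, 9]])
components, variances = principal_components(residuals)
```

Positive residuals mark associations that occur more often than chance, negative ones mark associations that occur less often, and the leading components group associations that deviate together.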
Affiliation(s)
- Pei-Yuan Zhou
- Systems Design Engineering, University of Waterloo, Waterloo, ON, Canada
| | - Antonio Sze-To
- Systems Design Engineering, University of Waterloo, Waterloo, ON, Canada
| | - Andrew K C Wong
- Systems Design Engineering, University of Waterloo, Waterloo, ON, Canada.
|
22
|
Co-evolution networks of HIV/HCV are modular with direct association to structure and function. PLoS Comput Biol 2018; 14:e1006409. [PMID: 30192744 PMCID: PMC6145588 DOI: 10.1371/journal.pcbi.1006409] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 04/18/2018] [Revised: 09/19/2018] [Accepted: 07/31/2018] [Indexed: 01/09/2023] Open
Abstract
Mutational correlation patterns found in population-level sequence data for the Human Immunodeficiency Virus (HIV) and the Hepatitis C Virus (HCV) have been demonstrated to be informative of viral fitness. Such patterns can be seen as footprints of the intrinsic functional constraints placed on viral evolution under diverse selective pressures. Here, considering multiple HIV and HCV proteins, we demonstrate that these mutational correlations encode a modular co-evolutionary structure that is tightly linked to the structural and functional properties of the respective proteins. Specifically, by introducing a robust statistical method based on sparse principal component analysis, we identify near-disjoint sets of collectively-correlated residues (sectors) having mostly a one-to-one association to largely distinct structural or functional domains. This suggests that the distinct phenotypic properties of HIV/HCV proteins often give rise to quasi-independent modes of evolution, with each mode involving a sparse and localized network of mutational interactions. Moreover, individual inferred sectors of HIV are shown to carry immunological significance, providing insight for guiding targeted vaccine strategies.
|
23
|
Mao W, Wang T, Zhang W, Gong H. Identification of residue pairing in interacting β-strands from a predicted residue contact map. BMC Bioinformatics 2018; 19:146. [PMID: 29673311 PMCID: PMC5907701 DOI: 10.1186/s12859-018-2150-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 12/04/2017] [Accepted: 04/09/2018] [Indexed: 12/04/2022] Open
Abstract
Background Despite the rapid progress of protein residue contact prediction, predicted residue contact maps frequently contain many errors. However, information of residue pairing in β strands could be extracted from a noisy contact map, due to the presence of characteristic contact patterns in β-β interactions. This information may benefit the tertiary structure prediction of mainly β proteins. In this work, we propose a novel ridge-detection-based β-β contact predictor to identify residue pairing in β strands from any predicted residue contact map. Results Our algorithm RDb2C adopts ridge detection, a well-developed technique in computer image processing, to capture consecutive residue contacts, and then utilizes a novel multi-stage random forest framework to integrate the ridge information and additional features for prediction. Starting from the predicted contact map of CCMpred, RDb2C remarkably outperforms all state-of-the-art methods on two conventional test sets of β proteins (BetaSheet916 and BetaSheet1452), and achieves F1-scores of ~ 62% and ~ 76% at the residue level and strand level, respectively. Taking the prediction of the more advanced RaptorX-Contact as input, RDb2C achieves impressively higher performance, with F1-scores reaching ~ 76% and ~ 86% at the residue level and strand level, respectively. In a test of structural modeling using the top 1 L predicted contacts as constraints, for 61 mainly β proteins, the average TM-score achieves 0.442 when using the raw RaptorX-Contact prediction, but increases to 0.506 when using the improved prediction by RDb2C. Conclusion Our method can significantly improve the prediction of β-β contacts from any predicted residue contact maps. Prediction results of our algorithm could be directly applied to effectively facilitate the practical structure prediction of mainly β proteins. 
Availability All source data and codes are available at http://166.111.152.91/Downloads.html or the GitHub address of https://github.com/wzmao/RDb2C. Electronic supplementary material The online version of this article (10.1186/s12859-018-2150-1) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Wenzhi Mao
- MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing, China; Beijing Advanced Innovation Center for Structural Biology, Tsinghua University, Beijing, China
| | - Tong Wang
- MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing, China; Beijing Advanced Innovation Center for Structural Biology, Tsinghua University, Beijing, China
| | - Wenxuan Zhang
- MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing, China; Beijing Advanced Innovation Center for Structural Biology, Tsinghua University, Beijing, China
| | - Haipeng Gong
- MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing, China; Beijing Advanced Innovation Center for Structural Biology, Tsinghua University, Beijing, China
|
24
|
Zhou PY, Lee ESA, Sze-To A, Wong AKC. Revealing Subtle Functional Subgroups in Class A Scavenger Receptors by Pattern Discovery and Disentanglement of Aligned Pattern Clusters. Proteomes 2018; 6:proteomes6010010. [PMID: 29419792 PMCID: PMC5874769 DOI: 10.3390/proteomes6010010] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 11/24/2017] [Revised: 02/01/2018] [Accepted: 02/01/2018] [Indexed: 11/16/2022] Open
Abstract
A protein family has similar and diverse functions locally conserved as aligned sequence segments. Further discovering their association patterns could reveal subtle family subgroup characteristics. Since aligned residue associations (ARAs) in Aligned Pattern Clusters (APCs) are complex and intertwined due to entangled function, factors, and variance in the source environment, we have recently developed a novel method, Aligned Residue Association Discovery and Disentanglement (ARADD), to solve this problem. ARADD first obtains from an APC an ARA Frequency Matrix and converts it to an adjusted statistical residual vector space (SRV). It then disentangles the SRV into Principal Components (PCs) and re-projects their vectors to an SRV to reveal succinct orthogonal AR groups. In this study, we applied ARADD to class A scavenger receptors (SR-A), a subclass of a diverse protein family binding to modified lipoproteins with diverse biological functionalities not explicitly known. Our experimental results demonstrated that ARADD can unveil subtle subgroups in sequence segments with diverse functionality and highly variable sequence lengths. We also demonstrated that the ARAs captured in a Position Weight Matrix or an APC were entangled in biological function and domain location but disentangled by ARADD to reveal different subclasses without knowing their actual occurrence positions.
Affiliation(s)
- Pei-Yuan Zhou
- VaryWave Technology Co., Ltd., 538A, Core Building 2, Hong Kong Science Park, Shatin, NT, Hong Kong.
| | - En-Shiun Annie Lee
- VerticalScope Inc., 111 Peter Street, Suite 900, Toronto, ON M5V 2H1, Canada.
| | - Antonio Sze-To
- Systems Design Engineering, 5th, 6th Floor, 200 University Avenue West, University of Waterloo, Waterloo, ON N2L 3G1, Canada.
| | - Andrew K C Wong
- Systems Design Engineering, 5th, 6th Floor, 200 University Avenue West, University of Waterloo, Waterloo, ON N2L 3G1, Canada.
|
25
|
Computational and Experimental Approaches to Predict Host-Parasite Protein-Protein Interactions. Methods Mol Biol 2018; 1819:153-173. [PMID: 30421403 DOI: 10.1007/978-1-4939-8618-7_7] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 12/20/2022]
Abstract
In host-parasite systems, protein-protein interactions are key to allowing the pathogen to enter the host and persist within it. The study of host-parasite molecular communication improves the understanding of the mechanisms of infection, evasion of the host immune system and tropism across different tissues. Current trends in parasitology focus on unraveling host-parasite protein-protein interactions to aid the development of new strategies to combat pathogenic parasites with better treatments and prevention mechanisms. Due to the complexity of capturing these interactions experimentally, computational approaches integrating data from different sources (mainly "omics" data) become key to complement or support experimental approaches. Here, we focus on the application of experimental and computational methods in the prediction of host-parasite interactions and highlight the potential of each of these methods in specific contexts.
|
26
|
Jing X, Dong Q, Lu R. RRCRank: a fusion method using rank strategy for residue-residue contact prediction. BMC Bioinformatics 2017; 18:390. [PMID: 28865433 PMCID: PMC5581475 DOI: 10.1186/s12859-017-1811-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 05/04/2017] [Accepted: 08/28/2017] [Indexed: 11/10/2022] Open
Abstract
Background In structural biology, protein residue-residue contacts play a crucial role in protein structure prediction. Some researchers have found that predicted residue-residue contacts can effectively constrain the conformational search space, which is significant for de novo protein structure prediction. Over the last few decades, researchers have developed various methods to predict residue-residue contacts; in recent years, fusion methods in particular have achieved significant performance. In this work, a novel fusion method based on a rank strategy is proposed to predict contacts. Unlike the traditional regression or classification strategies, the contact prediction task is regarded as a ranking task. First, two kinds of features are extracted from correlated mutations methods and ensemble machine-learning classifiers, and then the proposed method uses the learning-to-rank algorithm to predict the contact probability of each residue pair. Results First, we perform two benchmark tests for the proposed fusion method (RRCRank) on the CASP11 and CASP12 datasets, respectively. The test results show that the RRCRank method outperforms other well-developed methods, especially for medium and short range contacts. Second, in order to verify the superiority of the ranking strategy, we predict contacts by using the traditional regression and classification strategies based on the same features as the ranking strategy. Compared with these two traditional strategies, the proposed ranking strategy shows better performance for three contact types, in particular for long range contacts. Third, the proposed RRCRank has been compared with several state-of-the-art methods in CASP11 and CASP12. The results show that RRCRank could achieve comparable prediction precisions and is better than three methods in most assessment metrics.
Conclusions The learning-to-rank algorithm is introduced to develop a novel rank-based method for the residue-residue contact prediction of proteins, which achieves state-of-the-art performance based on the extensive assessment. Electronic supplementary material The online version of this article (10.1186/s12859-017-1811-9) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Xiaoyang Jing
- School of Computer Science, Fudan University, Shanghai, 200433, People's Republic of China
| | - Qiwen Dong
- School of Data Science and Engineering, East China Normal University, Shanghai, 200062, People's Republic of China.
| | - Ruqian Lu
- School of Computer Science, Fudan University, Shanghai, 200433, People's Republic of China
|
27
|
Rawi R, Mall R, Kunji K, El Anbari M, Aupetit M, Ullah E, Bensmail H. COUSCOus: improved protein contact prediction using an empirical Bayes covariance estimator. BMC Bioinformatics 2016; 17:533. [PMID: 27978812 PMCID: PMC5159955 DOI: 10.1186/s12859-016-1400-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 06/17/2016] [Accepted: 12/01/2016] [Indexed: 11/13/2022] Open
Abstract
Background The post-genomic era with its wealth of sequences gave rise to a broad range of protein residue-residue contact detecting methods. Although various coevolution methods such as PSICOV, DCA and plmDCA provide correct contact predictions, their predictions do not completely overlap. Hence, new approaches and improvements of existing methods are needed to motivate further development and progress in the field. We present a new contact detecting method, COUSCOus, by combining the best shrinkage approach, the empirical Bayes covariance estimator and GLasso. Results Using the original PSICOV benchmark dataset, COUSCOus achieves mean accuracies of 0.74, 0.62 and 0.55 for the top L/10 predicted long, medium and short range contacts, respectively. In addition, COUSCOus attains mean areas under the precision-recall curves of 0.25, 0.29 and 0.30 for long, medium and short contacts and outperforms PSICOV. We also observed that COUSCOus outperforms PSICOV w.r.t. the Matthews correlation coefficient criterion on the full list of residue contacts. Furthermore, COUSCOus achieves on average 10% more gain in prediction accuracy compared to PSICOV on an independent test set composed of CASP11 protein targets. Finally, we showed that when using a simple random forest meta-classifier, by combining contact detecting techniques and sequence derived features, PSICOV predictions should be replaced by the more accurate COUSCOus predictions. Conclusion We conclude that the consideration of superior covariance shrinkage approaches will boost several research fields that apply the GLasso procedure, including the residue-residue contact prediction presented here as well as fields such as gene network reconstruction. Electronic supplementary material The online version of this article (doi:10.1186/s12859-016-1400-3) contains supplementary material, which is available to authorized users.
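The "top L/10" accuracy reported above is the fraction of true contacts among the L/10 highest-scoring residue pairs in a given sequence-separation range, where L is the protein length. The sketch below shows one way to compute it. The range boundaries (short 6 to 11, medium 12 to 23, long 24 or more residues apart) follow the commonly used CASP convention and are an assumption here, since the abstract does not state the thresholds it used; the function name `top_precision` is likewise hypothetical.

```python
import numpy as np

# Assumed CASP-style sequence-separation ranges (not stated in the abstract).
RANGES = {"short": (6, 11), "medium": (12, 23), "long": (24, None)}

def top_precision(scores, true_contacts, range_name, fraction=0.1):
    """Precision of the top (fraction * L) scored pairs within one range.

    scores: L x L array of predicted contact scores (upper triangle is used).
    true_contacts: L x L boolean array of contacts from a known structure.
    """
    scores = np.asarray(scores, dtype=float)
    truth = np.asarray(true_contacts, dtype=bool)
    length = scores.shape[0]
    low, high = RANGES[range_name]

    # Candidate pairs i < j whose separation falls inside the chosen range.
    i, j = np.triu_indices(length, k=1)
    separation = j - i
    keep = separation >= low
    if high is not None:
        keep &= separation <= high
    i, j = i[keep], j[keep]

    n_top = min(max(1, int(length * fraction)), len(i))
    if n_top == 0:
        return float("nan")

    # Highest-scoring pairs first, then count how many are real contacts.
    best = np.argsort(scores[i, j])[::-1][:n_top]
    return float(truth[i[best], j[best]].mean())
```

Restricting the evaluation to a fixed fraction of L keeps the metric comparable across proteins of different lengths, which is why benchmark tables are usually reported this way.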
Affiliation(s)
- Reda Rawi
- Computational Science and Engineering, Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar.
| | - Raghvendra Mall
- Computational Science and Engineering, Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar
| | - Khalid Kunji
- Computational Science and Engineering, Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar
| | - Mohammed El Anbari
- Division of Biomedical Informatics, Sidra Medical and Research Center, Doha, Qatar
| | - Michael Aupetit
- Computational Science and Engineering, Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar
| | - Ehsan Ullah
- Computational Science and Engineering, Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar
| | - Halima Bensmail
- Computational Science and Engineering, Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar
|
28
|
Schueler-Furman O, Wodak SJ. Computational approaches to investigating allostery. Curr Opin Struct Biol 2016; 41:159-171. [PMID: 27607077 DOI: 10.1016/j.sbi.2016.06.017] [Citation(s) in RCA: 54] [Impact Index Per Article: 6.0] [Received: 06/16/2016] [Accepted: 06/23/2016] [Indexed: 01/01/2023]
Abstract
Allosteric regulation plays a key role in many biological processes, such as signal transduction, transcriptional regulation, and many more. It is rooted in fundamental thermodynamic and dynamic properties of macromolecular systems that are still poorly understood and are moreover modulated by the cellular context. Here we review the computational approaches used in the investigation of allosteric processes in protein systems. We outline how the models of allostery have evolved from their initial formulation in the sixties to the current views, which more fully account for the roles of the thermodynamic and dynamic properties of the system. We then describe the major classes of computational approaches employed to elucidate the mechanisms of allostery, the insights they have provided, as well as their limitations. We complement this analysis by highlighting the role of computational approaches in promising practical applications, such as the engineering of regulatory modules and identifying allosteric binding sites.
Affiliation(s)
- Ora Schueler-Furman
- Department of Microbiology and Molecular Genetics, Institute for Medical Research Israel-Canada (IMRIC), Hebrew University, Hadassah Medical School, POB 12272, Jerusalem 91120, Israel
| | - Shoshana J Wodak
- VIB Structural Biology Research Center, VUB, Pleinlaan 2, 1050 Brussels, Belgium.
|
29
|
O'Rourke KF, Gorman SD, Boehr DD. Biophysical and computational methods to analyze amino acid interaction networks in proteins. Comput Struct Biotechnol J 2016; 14:245-51. [PMID: 27441044 PMCID: PMC4939391 DOI: 10.1016/j.csbj.2016.06.002] [Citation(s) in RCA: 51] [Impact Index Per Article: 5.7] [Received: 03/30/2016] [Revised: 06/04/2016] [Accepted: 06/13/2016] [Indexed: 12/20/2022] Open
Abstract
Globular proteins are held together by interacting networks of amino acid residues. A number of different structural and computational methods have been developed to interrogate these amino acid networks. In this review, we describe some of these methods, including analyses of X-ray crystallographic data and structures, computer simulations, NMR data, and covariation among protein sequences, and indicate the critical insights that such methods provide into protein function. This information can be leveraged towards the design of new allosteric drugs, and the engineering of new protein function and protein regulation strategies.
Affiliation(s)
- Kathleen F O'Rourke
- Department of Chemistry, The Pennsylvania State University, University Park, PA 16802, USA
| | - Scott D Gorman
- Department of Chemistry, The Pennsylvania State University, University Park, PA 16802, USA
| | - David D Boehr
- Department of Chemistry, The Pennsylvania State University, University Park, PA 16802, USA
|
30
|
Bendl J, Stourac J, Sebestova E, Vavra O, Musil M, Brezovsky J, Damborsky J. HotSpot Wizard 2.0: automated design of site-specific mutations and smart libraries in protein engineering. Nucleic Acids Res 2016; 44:W479-87. [PMID: 27174934 PMCID: PMC4987947 DOI: 10.1093/nar/gkw416] [Citation(s) in RCA: 61] [Impact Index Per Article: 6.8] [Received: 02/21/2016] [Accepted: 05/03/2016] [Indexed: 01/13/2023] Open
Abstract
HotSpot Wizard 2.0 is a web server for automated identification of hot spots and design of smart libraries for engineering proteins' stability, catalytic activity, substrate specificity and enantioselectivity. The server integrates sequence, structural and evolutionary information obtained from 3 databases and 20 computational tools. Users are guided through the processes of selecting hot spots using four different protein engineering strategies and optimizing the resulting library's size by narrowing down a set of substitutions at individual randomized positions. The only required input is a query protein structure. The results of the calculations are mapped onto the protein's structure and visualized with a JSmol applet. HotSpot Wizard lists annotated residues suitable for mutagenesis and can automatically design appropriate codons for each implemented strategy. Overall, HotSpot Wizard provides comprehensive annotations of protein structures and assists protein engineers with the rational design of site-specific mutations and focused libraries. It is freely available at http://loschmidt.chemi.muni.cz/hotspotwizard.
Affiliation(s)
- Jaroslav Bendl
- Loschmidt Laboratories, Department of Experimental Biology and Research Centre for Toxic Compounds in the Environment RECETOX, Masaryk University, 625 00 Brno, Czech Republic; Department of Information Systems, Faculty of Information Technology, Brno University of Technology, 612 66 Brno, Czech Republic; International Clinical Research Center, St. Anne's University Hospital Brno, Brno, Czech Republic
| | - Jan Stourac
- Loschmidt Laboratories, Department of Experimental Biology and Research Centre for Toxic Compounds in the Environment RECETOX, Masaryk University, 625 00 Brno, Czech Republic
| | - Eva Sebestova
- Loschmidt Laboratories, Department of Experimental Biology and Research Centre for Toxic Compounds in the Environment RECETOX, Masaryk University, 625 00 Brno, Czech Republic
| | - Ondrej Vavra
- Loschmidt Laboratories, Department of Experimental Biology and Research Centre for Toxic Compounds in the Environment RECETOX, Masaryk University, 625 00 Brno, Czech Republic
| | - Milos Musil
- Loschmidt Laboratories, Department of Experimental Biology and Research Centre for Toxic Compounds in the Environment RECETOX, Masaryk University, 625 00 Brno, Czech Republic; Department of Information Systems, Faculty of Information Technology, Brno University of Technology, 612 66 Brno, Czech Republic
| | - Jan Brezovsky
- Loschmidt Laboratories, Department of Experimental Biology and Research Centre for Toxic Compounds in the Environment RECETOX, Masaryk University, 625 00 Brno, Czech Republic; International Clinical Research Center, St. Anne's University Hospital Brno, Brno, Czech Republic
| | - Jiri Damborsky
- Loschmidt Laboratories, Department of Experimental Biology and Research Centre for Toxic Compounds in the Environment RECETOX, Masaryk University, 625 00 Brno, Czech Republic; International Clinical Research Center, St. Anne's University Hospital Brno, Brno, Czech Republic
|
31
|
Wagner JR, Lee CT, Durrant JD, Malmstrom RD, Feher VA, Amaro RE. Emerging Computational Methods for the Rational Discovery of Allosteric Drugs. Chem Rev 2016; 116:6370-90. [PMID: 27074285 PMCID: PMC4901368 DOI: 10.1021/acs.chemrev.5b00631] [Citation(s) in RCA: 176] [Impact Index Per Article: 19.6] [Indexed: 12/17/2022]
Abstract
Allosteric drug development holds promise for delivering medicines that are more selective and less toxic than those that target orthosteric sites. To date, the discovery of allosteric binding sites and lead compounds has been mostly serendipitous, achieved through high-throughput screening. Over the past decade, structural data has become more readily available for larger protein systems and more membrane protein classes (e.g., GPCRs and ion channels), which are common allosteric drug targets. In parallel, improved simulation methods now provide better atomistic understanding of the protein dynamics and cooperative motions that are critical to allosteric mechanisms. As a result of these advances, the field of predictive allosteric drug development is now on the cusp of a new era of rational structure-based computational methods. Here, we review algorithms that predict allosteric sites based on sequence data and molecular dynamics simulations, describe tools that assess the druggability of these pockets, and discuss how Markov state models and topology analyses provide insight into the relationship between protein dynamics and allosteric drug binding. In each section, we first provide an overview of the various method classes before describing relevant algorithms and software packages.
Affiliation(s)
- Jeffrey R Wagner
- Department of Chemistry & Biochemistry and National Biomedical Computation Resource, University of California, San Diego, La Jolla, California 92093, United States
| | - Christopher T Lee
- Department of Chemistry & Biochemistry and National Biomedical Computation Resource, University of California, San Diego, La Jolla, California 92093, United States
| | - Jacob D Durrant
- Department of Chemistry & Biochemistry and National Biomedical Computation Resource, University of California, San Diego, La Jolla, California 92093, United States
| | - Robert D Malmstrom
- Department of Chemistry & Biochemistry and National Biomedical Computation Resource, University of California, San Diego, La Jolla, California 92093, United States
| | - Victoria A Feher
- Department of Chemistry & Biochemistry and National Biomedical Computation Resource, University of California, San Diego, La Jolla, California 92093, United States
| | - Rommie E Amaro
- Department of Chemistry & Biochemistry and National Biomedical Computation Resource, University of California, San Diego, La Jolla, California 92093, United States
|
32
|
Striegel DA, Wojtowicz D, Przytycka TM, Periwal V. Correlated rigid modes in protein families. Phys Biol 2016; 13:025003. [PMID: 27063781 DOI: 10.1088/1478-3975/13/2/025003] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 02/06/2023]
Abstract
A great deal of evolutionarily conserved information is contained in genomes and proteins. Enormous effort has been put into understanding protein structure and developing computational tools for protein folding, and many sophisticated approaches take structure and sequence homology into account. Several groups have applied statistical physics approaches to extracting information about proteins from sequences alone. Here, we develop a new method for sequence analysis based on first principles, in information theory, in statistical physics and in Bayesian analysis. We provide a complete derivation of our approach and we apply it to a variety of systems, to demonstrate its utility and its limitations. We show in some examples that phylogenetic alignments of amino-acid sequences of families of proteins imply the existence of a small number of modes that appear to be associated with correlated global variation. These modes are uncovered efficiently in our approach by computing a non-perturbative effective potential directly from the alignment. We show that this effective potential approaches a limiting form inversely with the logarithm of the number of sequences. Mapping symbol entropy flows along modes to underlying physical structures shows that these modes arise due to correlated compensatory adjustments. In the protein examples, these occur around functional binding pockets.
|
33
|
Zhang H, Gao Y, Deng M, Wang C, Zhu J, Li SC, Zheng WM, Bu D. Improving residue-residue contact prediction via low-rank and sparse decomposition of residue correlation matrix. Biochem Biophys Res Commun 2016; 472:217-22. [PMID: 26920058 DOI: 10.1016/j.bbrc.2016.01.188] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 01/14/2016] [Accepted: 01/30/2016] [Indexed: 10/22/2022]
Abstract
Strategies for correlation analysis in protein contact prediction often encounter two challenges, namely, the indirect coupling among residues, and the background correlations mainly caused by phylogenetic biases. While various studies have been conducted on how to disentangle indirect coupling, the removal of background correlations remains unresolved. Here, we present an approach for removing background correlations via low-rank and sparse decomposition (LRS) of a residue correlation matrix. The correlation matrix can be constructed using either local inference strategies (e.g., mutual information, or MI) or global inference strategies (e.g., direct coupling analysis, or DCA). In our approach, a correlation matrix was decomposed into two components, i.e., a low-rank component representing background correlations, and a sparse component representing true correlations. Finally, the residue contacts were inferred from the sparse component of the correlation matrix. We trained our LRS-based method on the PSICOV dataset, and tested it on both GREMLIN and CASP11 datasets. Our experimental results suggested that LRS significantly improves the contact prediction precision. For example, when equipped with the LRS technique, the prediction precision of MI and mfDCA increased from 0.25 to 0.67 and from 0.58 to 0.70, respectively (Top L/10 predicted contacts, sequence separation: 5 AA, dataset: GREMLIN). In addition, our LRS technique also consistently outperforms the popular denoising technique APC (average product correction), on both local (MI_LRS: 0.67 vs MI_APC: 0.34) and global measures (mfDCA_LRS: 0.70 vs mfDCA_APC: 0.67). Interestingly, we found that when equipped with our LRS technique, local inference strategies performed comparably to global inference strategies, implying that the LRS technique narrowed the performance gap between local and global inference strategies. Overall, our LRS technique greatly facilitates protein contact prediction by removing background correlations. An implementation of the approach called COLORS (improving COntact prediction using LOw-Rank and Sparse matrix decomposition) is available from http://protein.ict.ac.cn/COLORS/.
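The APC baseline named in this abstract can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not the COLORS implementation: it computes mutual information for every column pair of a toy alignment and then applies the standard average product correction, APC(i,j) = MI(i,j) - mean_i * mean_j / mean_all. The function names and the four-sequence alignment are hypothetical, and gaps and sequence weighting are ignored.

```python
import math
from collections import Counter

import numpy as np


def mutual_information(col_a, col_b):
    """Mutual information (in nats) between two alignment columns."""
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        # p_ab * log(p_ab / (p_a * p_b)), written with raw counts
        mi += (n_ab / n) * math.log(n_ab * n / (count_a[a] * count_b[b]))
    return mi


def mi_matrix(msa):
    """Symmetric matrix of pairwise MI; the diagonal is left at zero."""
    length = len(msa[0])
    columns = [[seq[i] for seq in msa] for i in range(length)]
    mi = np.zeros((length, length))
    for i in range(length):
        for j in range(i + 1, length):
            mi[i, j] = mi[j, i] = mutual_information(columns[i], columns[j])
    return mi


def apc(score):
    """Average product correction of a symmetric score matrix."""
    length = score.shape[0]
    off_diag = ~np.eye(length, dtype=bool)
    row_mean = (score * off_diag).sum(axis=1) / (length - 1)
    overall_mean = score[off_diag].mean()
    if overall_mean == 0:
        # Nothing to correct when every pairwise score is zero.
        return score.copy()
    corrected = score - np.outer(row_mean, row_mean) / overall_mean
    np.fill_diagonal(corrected, 0.0)
    return corrected


# Hypothetical toy alignment: columns 0 and 1 covary perfectly,
# column 2 varies independently, column 3 is invariant.
toy_msa = ["ACDA", "ACEA", "GTDA", "GTEA"]
mi = mi_matrix(toy_msa)
corrected = apc(mi)
```

In this toy case the covarying pair (columns 0 and 1) keeps the top score after correction. On real alignments APC mainly suppresses positions whose scores are uniformly elevated by entropy or phylogeny, which is the background signal that the abstract's low-rank component is meant to capture.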
Affiliation(s)
- Haicang Zhang
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
- Yujuan Gao
- Center for Quantitative Biology, Peking University, Beijing, China
- Minghua Deng
- Center for Quantitative Biology, Peking University, Beijing, China; School of Mathematical Sciences, Peking University, Beijing, China; Center for Statistical Sciences, Peking University, Beijing, China
- Chao Wang
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
- Jianwei Zhu
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
- Shuai Cheng Li
- Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong
- Wei-Mou Zheng
- Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing, China.
- Dongbo Bu
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China.
|
34
|
Abstract
Allosteric transition, defined as conformational changes induced by ligand binding, is one of the fundamental properties of proteins. Allostery has been observed and characterized in many proteins, and has been recently utilized to control protein function via regulation of protein activity. Here, we review the physical and evolutionary origin of protein allostery, as well as its importance to protein regulation, drug discovery, and biological processes in living systems. We describe recently developed approaches to identify allosteric pathways, connected sets of pairwise interactions that are responsible for propagation of conformational change from the ligand-binding site to a distal functional site. We then present experimental and computational protein engineering approaches for control of protein function by modulation of allosteric sites. As an example of application of these approaches, we describe a synergistic computational and experimental approach to rescue the cystic-fibrosis-associated protein cystic fibrosis transmembrane conductance regulator, which upon deletion of a single residue misfolds and causes disease. This example demonstrates the power of allosteric manipulation in proteins to both elucidate mechanisms of molecular function and to develop therapeutic strategies that rescue those functions. Allosteric control of proteins provides a tool to shine a light on the complex cascades of cellular processes and facilitate unprecedented interrogation of biological systems.
Affiliation(s)
- Nikolay V Dokholyan
- Department of Biochemistry and Biophysics, University of North Carolina, Chapel Hill, North Carolina 27599, United States
|
35
|
Natural HCV variants with increased replicative fitness due to NS3 helicase mutations in the C-terminal helix α18. Sci Rep 2016; 6:19526. [PMID: 26787124 PMCID: PMC4726148 DOI: 10.1038/srep19526] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/24/2015] [Accepted: 12/14/2015] [Indexed: 12/12/2022] Open
Abstract
High replicative fitness is a general determinant of a multidrug resistance phenotype and may explain lower sensitivity to direct-acting antiviral agents (DAAs) in some hepatitis C virus genotypes. Genetic diversity in the molecular target site of peptidomimetic NS3 protease inhibitors could impact variant replicative fitness and potentially add to virologic treatment failure. We selected NS3 helicase residues near the protease natural substrate in the NS3 domain interface and identified natural variants from a public database. Sequence diversity among different genotypes was identified and subsequently analyzed for potential effects of helicase variants on protein structure and function, and phenotypic effects on RNA replication and DAA resistance. We found increased replicative fitness in particular for amino acid substitutions at the NS3 helicase C-terminal helix α18. A network of strongly coupled residue pairs is identified. Helix α18 is part of this regulatory network and connects several NS3 functional elements involved in RNA replication. Among all genotypes we found distinct sequence diversity at helix α18 in particular for the most difficult-to-treat genotype 3. Our data suggest sequence diversity with implications for virus replicative fitness due to natural variants in helicase helix α18.
|
36
|
Abstract
Chaperonins are nanomachines that facilitate protein folding by undergoing energy (ATP)-dependent movements that are coordinated in time and space owing to complex allosteric regulation. They consist of two back-to-back stacked oligomeric rings with a cavity at each end where protein substrate folding can take place. Here, we focus on the GroEL/GroES chaperonin system from Escherichia coli and, to a lesser extent, on the more poorly characterized eukaryotic chaperonin CCT/TRiC. We describe their various functional (allosteric) states and how they are affected by substrates and allosteric effectors that include ATP, ADP, nonfolded protein substrates, potassium ions, and GroES (in the case of GroEL). We also discuss the pathways of intra- and inter-ring allosteric communication by which they interconvert and the coupling between allosteric transitions and protein folding reactions.
Affiliation(s)
- Ranit Gruber
- Department of Structural Biology, Weizmann Institute of Science, Rehovot 76100, Israel
- Amnon Horovitz
- Department of Structural Biology, Weizmann Institute of Science, Rehovot 76100, Israel
|
37
|
Esmaielbeiki R, Krawczyk K, Knapp B, Nebel JC, Deane CM. Progress and challenges in predicting protein interfaces. Brief Bioinform 2016; 17:117-31. [PMID: 25971595 PMCID: PMC4719070 DOI: 10.1093/bib/bbv027] [Citation(s) in RCA: 85] [Impact Index Per Article: 9.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/29/2015] [Revised: 03/18/2015] [Indexed: 12/31/2022] Open
Abstract
The majority of biological processes are mediated via protein-protein interactions. Determination of residues participating in such interactions improves our understanding of molecular mechanisms and facilitates the development of therapeutics. Experimental approaches to identifying interacting residues, such as mutagenesis, are costly and time-consuming and thus, computational methods for this purpose could streamline conventional pipelines. Here we review the field of computational protein interface prediction. We make a distinction between methods which address proteins in general and those targeted at antibodies, owing to the radically different binding mechanism of antibodies. We organize the multitude of currently available methods hierarchically based on required input and prediction principles to provide an overview of the field.
|
38
|
Parente DJ, Ray JCJ, Swint-Kruse L. Amino acid positions subject to multiple coevolutionary constraints can be robustly identified by their eigenvector network centrality scores. Proteins 2015; 83:2293-306. [PMID: 26503808 DOI: 10.1002/prot.24948] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/02/2015] [Revised: 09/21/2015] [Accepted: 10/14/2015] [Indexed: 12/21/2022]
Abstract
As proteins evolve, amino acid positions key to protein structure or function are subject to mutational constraints. These positions can be detected by analyzing sequence families for amino acid conservation or for coevolution between pairs of positions. Coevolutionary scores are usually rank-ordered and thresholded to reveal the top pairwise scores, but they also can be treated as weighted networks. Here, we used network analyses to bypass a major complication of coevolution studies: For a given sequence alignment, alternative algorithms usually identify different, top pairwise scores. We reconciled results from five commonly-used, mathematically divergent algorithms (ELSC, McBASC, OMES, SCA, and ZNMI), using the LacI/GalR and 1,6-bisphosphate aldolase protein families as models. Calculations used unthresholded coevolution scores from which column-specific properties such as sequence entropy and random noise were subtracted; "central" positions were identified by calculating various network centrality scores. When compared among algorithms, network centrality methods, particularly eigenvector centrality, showed markedly better agreement than comparisons of the top pairwise scores. Positions with large centrality scores occurred at key structural locations and/or were functionally sensitive to mutations. Further, the top central positions often differed from those with top pairwise coevolution scores: instead of a few strong scores, central positions often had multiple, moderate scores. We conclude that eigenvector centrality calculations reveal a robust evolutionary pattern of constraints, detectable by divergent algorithms, that occur at key protein locations. Finally, we discuss the fact that multiple patterns coexist in evolutionary data that, together, give rise to emergent protein functions.
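To make the centrality idea concrete, here is a minimal sketch of eigenvector centrality computed by power iteration on a symmetric matrix of pairwise scores. It is an illustration only: the function name, the five-position weight matrix, and the convergence settings are assumptions for this example and are not taken from the paper, which also applies entropy and noise corrections that are omitted here.

```python
import numpy as np


def eigenvector_centrality(weights, iterations=1000, tol=1e-10):
    """Power iteration on a symmetric, non-negative weight matrix."""
    w = np.asarray(weights, dtype=float)
    n = w.shape[0]
    x = np.full(n, 1.0 / np.sqrt(n))
    for _ in range(iterations):
        # Iterating with (W + I) keeps the same eigenvectors as W but
        # avoids oscillation on bipartite-like networks.
        x_new = w @ x + x
        x_new /= np.linalg.norm(x_new)
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x


# Hypothetical coevolution scores for five positions: position 0 has
# moderate scores with every other position, and positions 3 and 4
# share one additional weak score.
weights = np.array([
    [0.0, 0.5, 0.5, 0.5, 0.5],
    [0.5, 0.0, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.0, 0.0, 0.3],
    [0.5, 0.0, 0.0, 0.3, 0.0],
])
centrality = eigenvector_centrality(weights)
```

Position 0 receives the largest centrality because it is linked to all other positions, which matches the abstract's description of central positions as those with multiple, moderate scores. The result can be cross-checked against the leading eigenvector returned by `numpy.linalg.eigh`.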
Affiliation(s)
- Daniel J Parente
- Department of Biochemistry and Molecular Biology, University of Kansas Medical Center, Kansas City, Kansas, 66160
- J Christian J Ray
- Center for Computational Biology and Department of Molecular Biosciences, University of Kansas, Lawrence, Kansas, 66047
- Liskin Swint-Kruse
- Department of Biochemistry and Molecular Biology, University of Kansas Medical Center, Kansas City, Kansas, 66160
|
39
|
Jacob E, Unger R, Horovitz A. Codon-level information improves predictions of inter-residue contacts in proteins by correlated mutation analysis. eLife 2015; 4:e08932. [PMID: 26371555 PMCID: PMC4602084 DOI: 10.7554/elife.08932] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/22/2015] [Accepted: 09/13/2015] [Indexed: 12/11/2022] Open
Abstract
Methods for analysing correlated mutations in proteins are becoming an increasingly powerful tool for predicting contacts within and between proteins. Nevertheless, limitations remain due to the requirement for large multiple sequence alignments (MSA) and the fact that, in general, only the relatively small number of top-ranking predictions are reliable. To date, methods for analysing correlated mutations have relied exclusively on amino acid MSAs as inputs. Here, we describe a new approach for analysing correlated mutations that is based on combined analysis of amino acid and codon MSAs. We show that a direct contact is more likely to be present when the correlation between the positions is strong at the amino acid level but weak at the codon level. The performance of different methods for analysing correlated mutations in predicting contacts is shown to be enhanced significantly when amino acid and codon data are combined.
Affiliation(s)
- Etai Jacob
- The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat-Gan, Israel
- Department of Structural Biology, Weizmann Institute of Science, Rehovot, Israel
- Ron Unger
- The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat-Gan, Israel
- Amnon Horovitz
- Department of Structural Biology, Weizmann Institute of Science, Rehovot, Israel
|
40
|
Xiong L, Liu Z. Molecular dynamics study on folding and allostery in RfaH. Proteins 2015; 83:1582-92. [DOI: 10.1002/prot.24839] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/09/2015] [Revised: 05/18/2015] [Accepted: 05/22/2015] [Indexed: 12/18/2022]
Affiliation(s)
- Liqin Xiong
- Department of Physics; Beijing Normal University; Beijing 100875 China
- Zhenxing Liu
- Department of Physics; Beijing Normal University; Beijing 100875 China
|
41
|
Meshram RJ, Gacche RN. Effective epitope identification employing phylogenetic, mutational variability, sequence entropy, and correlated mutation analysis targeting NS5B protein of hepatitis C virus: from bioinformatics to therapeutics. J Mol Recognit 2015; 28:492-505. [PMID: 25727409 DOI: 10.1002/jmr.2466] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/22/2014] [Revised: 11/21/2014] [Accepted: 01/16/2015] [Indexed: 12/13/2022]
Abstract
Hepatitis C virus (HCV) is considered a foremost cause of numerous human liver-related disorders. An effective immunoprophylactic measure (such as a stable vaccine) is still unavailable for HCV. We perform an in silico analysis of nonstructural protein 5B (NS5B)-based CD4 and CD8 epitopes that might help improve treatment strategies and vaccine development programs against HCV. Here, we report on the use of knowledge obtained from multiple sequence alignment and phylogenetic analysis to investigate and evaluate candidate epitopes that have considerable potential for formulating an effective vaccine covering multiple strains prevalent in major geographical locations. The mutational variability data discussed herein focus on discriminating regions under active evolutionary pressure from those with lower mutational potential in existing experimentally verified epitopes, thus providing a concrete framework for designing an effective peptide-based vaccine against HCV. Additionally, we measured the entropy distribution in NS5B residues and pinpointed the positions in epitopes that are more susceptible to mutations and thus account for the virus's strategy to evade the host immune system. Findings from this study are expected to add detail on the sequence and structural aspects of the NS5B protein, ultimately facilitating our understanding of the pathophysiology of HCV and assisting advanced studies of the function of the NS5B antigen at the epitope level. We also report on the mutational crosstalk between functionally important coevolving residues, using correlated mutation analysis, and identify networks of coupled mutations that represent pathways of allosteric communication inside and among the NS5B thumb, finger, and palm domains.
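The per-position entropy measurement mentioned in this abstract can be illustrated with a short sketch. This is a generic Shannon entropy calculation over alignment columns, not the authors' pipeline; the function names and the toy four-sequence alignment are hypothetical, and gap characters are treated like any other symbol.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of the residues observed in one alignment column."""
    n = len(column)
    return -sum((c / n) * math.log2(c / n) for c in Counter(column).values())


def entropy_profile(msa):
    """Entropy of every column of an alignment given as equal-length strings."""
    length = len(msa[0])
    return [column_entropy([seq[i] for seq in msa]) for i in range(length)]


# Hypothetical toy alignment: column 0 is invariant, column 1 has two
# equally frequent residues, column 2 has three residues.
toy_msa = ["MKV", "MRV", "MKI", "MRL"]
profile = entropy_profile(toy_msa)
```

Columns with higher entropy tolerate more substitutions, which is the sense in which the abstract describes some epitope positions as more susceptible to mutation.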
|
42
|
Mao W, Kaya C, Dutta A, Horovitz A, Bahar I. Comparative study of the effectiveness and limitations of current methods for detecting sequence coevolution. Bioinformatics 2015; 31:1929-37. [PMID: 25697822 PMCID: PMC4481699 DOI: 10.1093/bioinformatics/btv103] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/24/2014] [Accepted: 02/02/2015] [Indexed: 01/02/2023] Open
Abstract
Motivation: With rapid accumulation of sequence data on several species, extracting rational and systematic information from multiple sequence alignments (MSAs) is becoming increasingly important. Currently, there is a plethora of computational methods for investigating coupled evolutionary changes in pairs of positions along the amino acid sequence, and making inferences on structure and function. Yet, the significance of coevolution signals remains to be established. Also, a large number of false positives (FPs) arise from insufficient MSA size, phylogenetic background and indirect couplings. Results: Here, a set of 16 pairs of non-interacting proteins is thoroughly examined to assess the effectiveness and limitations of different methods. The analysis shows that recent computationally expensive methods designed to remove biases from indirect couplings outperform others in detecting tertiary structural contacts as well as eliminating intermolecular FPs; whereas traditional methods such as mutual information benefit from refinements such as shuffling, while being highly efficient. Computations repeated with 2,330 pairs of protein families from the Negatome database corroborated these results. Finally, using a training dataset of 162 families of proteins, we propose a combined method that outperforms existing individual methods. Overall, the study provides simple guidelines towards the choice of suitable methods and strategies based on available MSA size and computing resources. Availability and implementation: Software is freely available through the Evol component of ProDy API. Contact: bahar@pitt.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Wenzhi Mao
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15260, USA, Department of Pharmacology, School of Medicine, Tsinghua University, Beijing 100084, China and Department of Structural Biology, Weizmann Institute of Science, Rehovot 76100, Israel
- Cihan Kaya
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15260, USA, Department of Pharmacology, School of Medicine, Tsinghua University, Beijing 100084, China and Department of Structural Biology, Weizmann Institute of Science, Rehovot 76100, Israel
- Anindita Dutta
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15260, USA, Department of Pharmacology, School of Medicine, Tsinghua University, Beijing 100084, China and Department of Structural Biology, Weizmann Institute of Science, Rehovot 76100, Israel
- Amnon Horovitz
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15260, USA, Department of Pharmacology, School of Medicine, Tsinghua University, Beijing 100084, China and Department of Structural Biology, Weizmann Institute of Science, Rehovot 76100, Israel
- Ivet Bahar
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15260, USA, Department of Pharmacology, School of Medicine, Tsinghua University, Beijing 100084, China and Department of Structural Biology, Weizmann Institute of Science, Rehovot 76100, Israel
|
43
|
Ruvinsky AM, Vakser IA, Rivera M. Local packing modulates diversity of iron pathways and cooperative behavior in eukaryotic and prokaryotic ferritins. J Chem Phys 2014; 140:115104. [PMID: 24655206 DOI: 10.1063/1.4868229] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/31/2022] Open
Abstract
Ferritin-like molecules show a remarkable combination of the evolutionarily conserved activity of iron uptake and release that engage different pores in the conserved ferritin shell. It was hypothesized that pore selection and iron traffic depend on dynamic allostery with no conformational changes in the backbone. In this study, we detect the allosteric networks in Pseudomonas aeruginosa bacterioferritin (BfrB), bacterial ferritin (FtnA), and bullfrog M and L ferritins (Ftns) by a network-weaving algorithm (NWA) that passes threads of an allosteric network through highly correlated residues using hierarchical clustering. The residue-residue correlations are calculated in the packing-on elastic network model that introduces atom packing into the common packing-off model. Applying NWA revealed that each of the molecules has an extended allosteric network mostly buried inside the ferritin shell. The structure of the networks is consistent with experimental observations of iron transport: The allosteric networks in BfrB and FtnA connect the ferroxidase center with the 4-fold pores and B-pores, leaving the 3-fold pores unengaged. In contrast, the allosteric network directly links the 3-fold pores with the 4-fold pores in M and L Ftns. The majority of the network residues are either on the inner surface or buried inside the subunit fold or at the subunit interfaces. We hypothesize that the ferritin structures evolved in a way to limit the influence of functionally unrelated events in the cytoplasm on the allosteric network to maintain stability of the translocation mechanisms. We showed that the residue-residue correlations and the resultant long-range cooperativity depend on the ferritin shell packing, which, in turn, depends on protein sequence composition. Switching from the packing-on to the packing-off model reduces correlations by 35%-38% so that no allosteric network can be found. The influence of the side-chain packing on the allosteric networks explains the diversity in mechanisms of iron traffic suggested by experimental approaches.
Affiliation(s)
- Anatoly M Ruvinsky
- Infection Innovative Medicine, AstraZeneca R&D Boston, 35 Gatehouse Drive, Waltham, Massachusetts 02451, USA
- Ilya A Vakser
- Center for Bioinformatics, The University of Kansas, Lawrence, Kansas 66047, USA
- Mario Rivera
- Department of Chemistry, The University of Kansas, Lawrence, Kansas 66047, USA
|
44
|
Conservation weighting functions enable covariance analyses to detect functionally important amino acids. PLoS One 2014; 9:e107723. [PMID: 25379728 PMCID: PMC4224327 DOI: 10.1371/journal.pone.0107723] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/23/2014] [Accepted: 07/31/2014] [Indexed: 01/22/2023] Open
Abstract
The explosive growth in the number of protein sequences gives rise to the possibility of using the natural variation in sequences of homologous proteins to find residues that control different protein phenotypes. Because in many cases different phenotypes are each controlled by a group of residues, the mutations that separate one version of a phenotype from another will be correlated. Here we incorporate biological knowledge about protein phenotypes and their variability in the sequence alignment of interest into algorithms that detect correlated mutations, improving their ability to detect the residues that control those phenotypes. We demonstrate the power of this approach using simulations and recent experimental data. Applying these principles to the protein families encoded by Dscam and Protocadherin allows us to make testable predictions about the residues that dictate the specificity of molecular interactions.
|
45
|
Junier I. Conserved patterns in bacterial genomes: a conundrum physically tailored by evolutionary tinkering. Comput Biol Chem 2014; 53 Pt A:125-33. [PMID: 25239779 DOI: 10.1016/j.compbiolchem.2014.08.017] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 07/11/2014] [Indexed: 11/17/2022]
Abstract
The proper functioning of bacteria is encoded in their genome at multiple levels or scales, each of which is constrained by specific physical forces. At the smallest spatial scales, interatomic forces dictate the folding and function of proteins and nucleic acids. On longer length scales, stochastic forces emerging from the thermal jiggling of proteins and RNAs impose strong constraints on the organization of genes along chromosomes, more particularly in the context of the building of nucleoprotein complexes and the operational mode of regulatory agents. At the cellular level, transcription, replication and cell division activities generate forces that act on both the internal structure and cellular location of chromosomes. The overall result is a complex multi-scale organization of genomes that reflects the evolutionary tinkering of bacteria. The goal of this review is to highlight avenues for deciphering this complexity by focusing on patterns that are conserved among evolutionarily distant bacteria. To this end, I discuss three different organizational scales: the protein structures, the chromosomal organization of genes and the global structure of chromosomes.
Affiliation(s)
- Ivan Junier
- Centre for Genomic Regulation (CRG), Dr. Aiguader 88, 08003 Barcelona, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain.
|
46
|
A bioinformatics pipeline for the analyses of viral escape dynamics and host immune responses during an infection. BIOMED RESEARCH INTERNATIONAL 2014; 2014:264519. [PMID: 25013771 PMCID: PMC4072169 DOI: 10.1155/2014/264519] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 03/31/2014] [Accepted: 05/08/2014] [Indexed: 01/21/2023]
Abstract
Rapidly mutating viruses, such as hepatitis C virus (HCV) and HIV, have adopted evolutionary strategies that allow escape from the host immune response via genomic mutations. Recent advances in high-throughput sequencing are reshaping the field of immuno-virology of viral infections, as these allow fast and cheap generation of genomic data. However, due to the large volumes of data generated, a thorough understanding of the biological and immunological significance of such information is often difficult. This paper proposes a pipeline that allows visualization and statistical analysis of viral mutations that are associated with immune escape. Taking next generation sequencing data from longitudinal analysis of HCV viral genomes during a single HCV infection, along with antigen specific T-cell responses detected from the same subject, we demonstrate the applicability of these tools in the context of primary HCV infection. We provide a statistical and visual explanation of the relationship between cooccurring mutations on the viral genome and the parallel adaptive immune response against HCV.
|
47
|
Bakan A, Dutta A, Mao W, Liu Y, Chennubhotla C, Lezon TR, Bahar I. Evol and ProDy for bridging protein sequence evolution and structural dynamics. Bioinformatics 2014; 30:2681-3. [PMID: 24849577 DOI: 10.1093/bioinformatics/btu336] [Citation(s) in RCA: 157] [Impact Index Per Article: 14.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open
Abstract
Correlations between sequence evolution and structural dynamics are of utmost importance in understanding the molecular mechanisms of function and their evolution. We have integrated Evol, a new package for fast and efficient comparative analysis of evolutionary patterns and conformational dynamics, into ProDy, a computational toolbox designed for inferring protein dynamics from experimental and theoretical data. Using information-theoretic approaches, Evol coanalyzes conservation and coevolution profiles extracted from multiple sequence alignments of protein families with their inferred dynamics. Availability and implementation: ProDy and Evol are open-source and freely available under MIT License from http://prody.csb.pitt.edu/.
Affiliation(s)
- Ahmet Bakan
- Department of Computational and Systems Biology, and Clinical & Translational Science Institute, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Anindita Dutta
- Department of Computational and Systems Biology, and Clinical & Translational Science Institute, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Wenzhi Mao
- Department of Computational and Systems Biology, and Clinical & Translational Science Institute, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Ying Liu
- Department of Computational and Systems Biology, and Clinical & Translational Science Institute, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Chakra Chennubhotla
- Department of Computational and Systems Biology, and Clinical & Translational Science Institute, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Timothy R Lezon
- Department of Computational and Systems Biology, and Clinical & Translational Science Institute, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Ivet Bahar
- Department of Computational and Systems Biology, and Clinical & Translational Science Institute, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
|
48
|
Janda JO, Popal A, Bauer J, Busch M, Klocke M, Spitzer W, Keller J, Merkl R. H2rs: deducing evolutionary and functionally important residue positions by means of an entropy and similarity based analysis of multiple sequence alignments. BMC Bioinformatics 2014; 15:118. [PMID: 24766829 PMCID: PMC4021312 DOI: 10.1186/1471-2105-15-118] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/13/2014] [Accepted: 04/17/2014] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: The identification of functionally important residue positions is an important task of computational biology. Methods of correlation analysis allow for the identification of pairs of residue positions, whose occupancy is mutually dependent due to constraints imposed by protein structure or function. A common measure assessing these dependencies is the mutual information, which is based on Shannon's information theory that utilizes probabilities only. Consequently, such approaches do not consider the similarity of residue pairs, which may degrade the algorithm's performance. One typical algorithm is H2r, which characterizes each individual residue position k by the conn(k)-value, which is the number of significantly correlated pairs it belongs to.
RESULTS: To improve specificity of H2r, we developed a revised algorithm, named H2rs, which is based on the von Neumann entropy (vNE). To compute the corresponding mutual information, a matrix A is required, which assesses the similarity of residue pairs. We determined A by deducing substitution frequencies from contacting residue pairs observed in the homologs of 35 809 proteins, whose structure is known. In analogy to H2r, the enhanced algorithm computes a normalized conn(k)-value. Within the framework of H2rs, only statistically significant vNE values were considered. To decide on significance, the algorithm calculates a p-value by performing a randomization test for each individual pair of residue positions. The analysis of a large in silico testbed demonstrated that specificity and precision were higher for H2rs than for H2r and two other methods of correlation analysis. The gain in prediction quality is further confirmed by a detailed assessment of five well-studied enzymes. The outcome of H2rs and of a method that predicts contacting residue positions (PSICOV) overlapped only marginally. H2rs can be downloaded from http://www-bioinf.uni-regensburg.de.
CONCLUSIONS: Considering substitution frequencies for residue pairs by means of the von Neumann entropy and a p-value improved the success rate in identifying important residue positions. The integration of proven statistical concepts and normalization allows for an easier comparison of results obtained with different proteins. Comparing the outcome of the local method H2rs and of the global method PSICOV indicates that such methods supplement each other and have different scopes of application.
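The counting idea behind a conn(k)-value can be illustrated with a minimal Python sketch. This is not H2rs: it uses plain Shannon mutual information between alignment columns rather than the von Neumann entropy with the similarity matrix A, and it reports raw rather than normalized counts. The function names, the significance threshold `alpha`, and the number of shuffles are assumptions made for illustration only.

```python
import math
import random
from collections import Counter


def mutual_information(col_a, col_b):
    """Shannon mutual information (in bits) between two alignment columns."""
    n = len(col_a)
    count_a, count_b = Counter(col_a), Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), c in count_ab.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (count_a[a] * count_b[b]))
    return mi


def permutation_p_value(col_a, col_b, n_shuffles=200, seed=0):
    """Randomization test: how often does a shuffled column reach the observed MI?"""
    rng = random.Random(seed)
    observed = mutual_information(col_a, col_b)
    shuffled = list(col_b)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        # small tolerance guards against floating-point noise in the comparison
        if mutual_information(col_a, shuffled) >= observed - 1e-12:
            hits += 1
    # add-one correction keeps the estimated p-value strictly positive
    return (hits + 1) / (n_shuffles + 1)


def conn_values(msa, alpha=0.05, n_shuffles=200):
    """For each column k, count the significantly correlated pairs it belongs to."""
    columns = list(zip(*msa))  # transpose: one tuple of residues per position
    conn = [0] * len(columns)
    for i in range(len(columns)):
        for j in range(i + 1, len(columns)):
            if permutation_p_value(columns[i], columns[j], n_shuffles) < alpha:
                conn[i] += 1
                conn[j] += 1
    return conn


# Toy alignment (hypothetical data): positions 0 and 1 co-vary, position 2 is invariant.
toy_msa = ["AKG"] * 6 + ["DEG"] * 6
print(conn_values(toy_msa))
```

An invariant column carries no mutual information with any other column, so its shuffled values always match the observed value and it is never called significant. That is the behaviour a per-pair randomization test is meant to provide.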
Affiliation(s)
- Rainer Merkl
- Institute of Biophysics and Physical Biochemistry, University of Regensburg, D-93040 Regensburg, Germany.
49
Pelé J, Moreau M, Abdi H, Rodien P, Castel H, Chabbert M. Comparative analysis of sequence covariation methods to mine evolutionary hubs: Examples from selected GPCR families. Proteins 2014; 82:2141-56. [DOI: 10.1002/prot.24570] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Received: 07/26/2013] [Revised: 03/11/2014] [Accepted: 03/19/2014] [Indexed: 01/26/2023]
Affiliation(s)
- Julien Pelé
- UMR CNRS 6214-INSERM 1083, Laboratory of Integrated Neurovascular and Mitochondrial Biology; University of Angers; 49045 Angers France
- Matthieu Moreau
- UMR CNRS 6214-INSERM 1083, Laboratory of Integrated Neurovascular and Mitochondrial Biology; University of Angers; 49045 Angers France
- Hervé Abdi
- The University of Texas at Dallas; School of Behavioral and Brain Sciences; Richardson, TX 75080-3021 USA
- Patrice Rodien
- UMR CNRS 6214-INSERM 1083, Laboratory of Integrated Neurovascular and Mitochondrial Biology; University of Angers; 49045 Angers France
- Department of Endocrinology, Reference Centre for the pathologies of hormonal receptivity; Centre Hospitalier Universitaire of Angers; 4 rue Larrey 49933 Angers France
- Hélène Castel
- INSERM U982, Laboratory of Neuronal and Neuroendocrine Communication and Differentiation, DC2N; University of Rouen; 76821 Mont-Saint-Aignan France
- Marie Chabbert
- UMR CNRS 6214-INSERM 1083, Laboratory of Integrated Neurovascular and Mitochondrial Biology; University of Angers; 49045 Angers France
50
Li Z, Huang Y, Ouyang Y, Jiao Y, Xing H, Liao L, Jiang S, Shao Y, Ma L. CorMut: an R/Bioconductor package for computing correlated mutations based on selection pressure. Bioinformatics 2014; 30:2073-5. [PMID: 24681904 DOI: 10.1093/bioinformatics/btu154] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 01/21/2023] Open
Abstract
Correlated mutations constitute a fundamental idea in evolutionary biology, and understanding correlated mutations will, in turn, facilitate an understanding of the genetic mechanisms governing evolution. CorMut is an R package designed to compute correlated mutations at the level of codon or amino acid mutations. Three classical methods were incorporated, and the computation results can be represented as correlated mutation networks. CorMut also enables the comparison of correlated mutations between two different evolutionary conditions.
AVAILABILITY AND IMPLEMENTATION: CorMut is released under the GNU General Public License within the Bioconductor project and is freely available at http://bioconductor.org/packages/release/bioc/html/CorMut.html.
Affiliation(s)
- Zhenpeng Li, Yang Huang, Yabo Ouyang, Yang Jiao, Hui Xing, Lingjie Liao, Shibo Jiang, Yiming Shao, Liying Ma
- State Key Laboratory for Infectious Disease Prevention and Control, National Center for AIDS/STD Control and Prevention, Chinese Center for Disease Control and Prevention, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, Beijing 102206 and Key Laboratory of Medical Molecular Virology (Ministries of Education and Health), Shanghai Medical College and Institute of Medical Microbiology, Fudan University, Shanghai 200032, China