401
Yamada KD, Tomii K, Katoh K. Application of the MAFFT sequence alignment program to large data-reexamination of the usefulness of chained guide trees. Bioinformatics 2016; 32:3246-3251. [PMID: 27378296 PMCID: PMC5079479 DOI: 10.1093/bioinformatics/btw412] [Citation(s) in RCA: 222] [Impact Index Per Article: 24.7] [Received: 03/28/2016] [Accepted: 06/20/2016] [Indexed: 11/26/2022] Open
Abstract
Motivation: Large multiple sequence alignments (MSAs), consisting of thousands of sequences, are becoming more and more common, due to advances in sequencing technologies. The MAFFT MSA program has several options for building large MSAs, but their performance has not yet been sufficiently assessed, because realistic benchmarking of large MSAs has been difficult. Recently, such assessments have been made possible through the HomFam and ContTest benchmark protein datasets. Along with the development of these datasets, an interesting theory was proposed: chained guide trees increase the accuracy of MSAs of structurally conserved regions. This theory challenges the basis of progressive alignment methods and needs to be examined by comparison with other known methods, including computationally intensive ones. Results: We used HomFam, ContTest and OXFam (an extended version of OXBench) to evaluate several methods enabled in MAFFT: (1) a progressive method with approximate guide trees, (2) a progressive method with chained guide trees, (3) a combination of an iterative refinement method and a progressive method and (4) a less approximate progressive method that uses a rigorous guide tree and consistency score. Other programs available for large MSAs, Clustal Omega and UPP, were also included in the comparison. The effect of method 2 (chained guide trees) was positive in ContTest but negative in HomFam and OXFam. Methods 3 and 4 increased the benchmark scores more consistently than method 2 for the three datasets, suggesting that they are safer to use. Availability and Implementation: http://mafft.cbrc.jp/alignment/software/ Contact: katoh@ifrec.osaka-u.ac.jp Supplementary information: Supplementary data are available at Bioinformatics online.
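To make the contrast between a chained guide tree and a more balanced one concrete, the following is a minimal sketch in Python. It only builds Newick strings for the two tree shapes from a list of sequence names; the function names `chained_tree` and `balanced_tree` are hypothetical, the input order is assumed to be given, and nothing here reflects how MAFFT itself constructs or orders its guide trees.

```python
def chained_tree(names):
    """Newick string for a fully chained (caterpillar) guide tree.

    Sequences are joined one at a time in the given order, so each
    internal node has one leaf child and one child holding everything
    added so far. (Illustrative only; not MAFFT's implementation.)
    """
    if not names:
        raise ValueError("need at least one sequence name")
    tree = names[0]
    for name in names[1:]:
        tree = f"({tree},{name})"
    return tree + ";"


def _balanced(names):
    # Recursively split the list in half to get a balanced topology.
    if len(names) == 1:
        return names[0]
    mid = len(names) // 2
    return f"({_balanced(names[:mid])},{_balanced(names[mid:])})"


def balanced_tree(names):
    """Newick string for a balanced guide tree over the same names."""
    if not names:
        raise ValueError("need at least one sequence name")
    return _balanced(names) + ";"


if __name__ == "__main__":
    seqs = ["s1", "s2", "s3", "s4"]
    print(chained_tree(seqs))   # (((s1,s2),s3),s4);
    print(balanced_tree(seqs))  # ((s1,s2),(s3,s4));
```

In a progressive alignment, the chained shape means every step aligns one new sequence to the growing alignment, whereas the balanced shape aligns sub-alignments of similar size to each other.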
Affiliation(s)
- Kazunori D Yamada
  - Graduate School of Information Sciences, Tohoku University, Sendai 980-8579, Japan
  - Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), Tokyo 135-0064, Japan
- Kentaro Tomii
  - Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), Tokyo 135-0064, Japan
  - Biotechnology Research Institute for Drug Discovery, National Institute of Advanced Industrial Science and Technology (AIST), Tokyo 135-0064, Japan
- Kazutaka Katoh
  - Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), Tokyo 135-0064, Japan
  - Immunology Frontier Research Center, Osaka University, Suita 565-0871, Japan
402
Simkovic F, Thomas JMH, Keegan RM, Winn MD, Mayans O, Rigden DJ. Residue contacts predicted by evolutionary covariance extend the application of ab initio molecular replacement to larger and more challenging protein folds. IUCRJ 2016; 3:259-70. [PMID: 27437113 PMCID: PMC4937781 DOI: 10.1107/s2052252516008113] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 02/10/2016] [Accepted: 05/18/2016] [Indexed: 05/05/2023]
Abstract
For many protein families, the deluge of new sequence information together with new statistical protocols now allow the accurate prediction of contacting residues from sequence information alone. This offers the possibility of more accurate ab initio (non-homology-based) structure prediction. Such models can be used in structure solution by molecular replacement (MR) where the target fold is novel or is only distantly related to known structures. Here, AMPLE, an MR pipeline that assembles search-model ensembles from ab initio structure predictions ('decoys'), is employed to assess the value of contact-assisted ab initio models to the crystallographer. It is demonstrated that evolutionary covariance-derived residue-residue contact predictions improve the quality of ab initio models and, consequently, the success rate of MR using search models derived from them. For targets containing β-structure, decoy quality and MR performance were further improved by the use of a β-strand contact-filtering protocol. Such contact-guided decoys achieved 14 structure solutions from 21 attempted protein targets, compared with nine for simple Rosetta decoys. Previously encountered limitations were superseded in two key respects. Firstly, much larger targets of up to 221 residues in length were solved, which is far larger than the previously benchmarked threshold of 120 residues. Secondly, contact-guided decoys significantly improved success with β-sheet-rich proteins. Overall, the improved performance of contact-guided decoys suggests that MR is now applicable to a significantly wider range of protein targets than were previously tractable, and points to a direct benefit to structural biology from the recent remarkable advances in sequencing.
Affiliation(s)
- Felix Simkovic
  - Institute of Integrative Biology, University of Liverpool, Liverpool L69 7ZB, England
- Jens M. H. Thomas
  - Institute of Integrative Biology, University of Liverpool, Liverpool L69 7ZB, England
- Ronan M. Keegan
  - Research Complex at Harwell, STFC Rutherford Appleton Laboratory, Didcot OX11 0FA, England
- Martyn D. Winn
  - Science and Technology Facilities Council, Daresbury Laboratory, Warrington WA4 4AD, England
- Olga Mayans
  - Institute of Integrative Biology, University of Liverpool, Liverpool L69 7ZB, England
- Daniel J. Rigden
  - Institute of Integrative Biology, University of Liverpool, Liverpool L69 7ZB, England
403
Taylor WR. An algorithm to parse segment packing in predicted protein contact maps. Algorithms Mol Biol 2016; 11:17. [PMID: 27330543 PMCID: PMC4912788 DOI: 10.1186/s13015-016-0080-x] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 04/01/2016] [Accepted: 05/24/2016] [Indexed: 11/30/2022] Open
Abstract
Background: The analysis of correlation in alignments generates a matrix of predicted contacts between positions in the structure and, while these can arise for many reasons, the simplest explanation is that the pair of residues are in contact in a three-dimensional structure and are affecting each other's selection pressure. To analyse these data, a dynamic programming algorithm was developed for parsing secondary structure interactions in predicted contact maps. Results: The non-local nature of the constraints required an iterated approach (using a "frozen approximation") but, with good starting definitions, a single pass was usually sufficient. The method was shown to be effective when applied to the transmembrane class of protein and error tolerant even when the signal becomes degraded. In the globular class of protein, where the extent of interactions is more limited and more complex, the algorithm still behaved well, classifying most of the important interactions correctly in both a small and a large test case. For the larger protein, this involved examples of the algorithm apportioning parts of a single large secondary structure element between two different interactions. Conclusions: It is expected that the method will be useful as a pre-processor to coarse-grained modelling methods to extend the range of protein tertiary structure prediction to larger proteins or to data that are currently too 'noisy' to be used by current residue-based methods.
404
Abriata LA, Bovigny C, Dal Peraro M. Detection and sequence/structure mapping of biophysical constraints to protein variation in saturated mutational libraries and protein sequence alignments with a dedicated server. BMC Bioinformatics 2016; 17:242. [PMID: 27315797 PMCID: PMC4912743 DOI: 10.1186/s12859-016-1124-4] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 03/01/2016] [Accepted: 06/07/2016] [Indexed: 11/21/2022] Open
Abstract
Background: Protein variability can now be studied by measuring high-resolution tolerance-to-substitution maps and fitness landscapes in saturated mutational libraries. But these rich and expensive datasets are typically interpreted coarsely, restricting detailed analyses to positions of extremely high or low variability or to positions dubbed important beforehand based on existing knowledge about active sites, interaction surfaces, (de)stabilizing mutations, etc. Results: Our new webserver PsychoProt (freely available without registration at http://psychoprot.epfl.ch or at http://lucianoabriata.altervista.org/psychoprot/index.html) helps to detect, quantify, and sequence/structure map the biophysical and biochemical traits that shape amino acid preferences throughout a protein as determined by deep-sequencing of saturated mutational libraries or from large alignments of naturally occurring variants. Discussion: We exemplify how PsychoProt helps to (i) unveil protein structure-function relationships from experiments and from alignments that are consistent with structures according to coevolution analysis, (ii) recall global information about structural and functional features and identify hitherto unknown constraints to variation in alignments, and (iii) point at different sources of variation among related experimental datasets or between experimental and alignment-based data. Remarkably, metabolic costs of the amino acids pose strong constraints to variability at protein surfaces in nature but not in the laboratory. This and other differences call for caution when extrapolating results from in vitro experiments to natural scenarios in, for example, studies of protein evolution. Conclusion: We show through examples how PsychoProt can be a useful tool for the broad communities of structural biology and molecular evolution, particularly for studies about protein modeling, evolution and design.
Electronic supplementary material The online version of this article (doi:10.1186/s12859-016-1124-4) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Luciano A Abriata
  - Laboratory for Biomolecular Modeling, Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne, and Swiss Institute of Bioinformatics, AAB014 Station 19, Lausanne, 1015, Switzerland.
- Christophe Bovigny
  - Laboratory for Biomolecular Modeling, Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne, and Swiss Institute of Bioinformatics, AAB014 Station 19, Lausanne, 1015, Switzerland. Present address: Molecular Modeling Group, Swiss Institute of Bioinformatics, UNIL, Bâtiment Génopode, Lausanne, 1015, Switzerland
- Matteo Dal Peraro
  - Laboratory for Biomolecular Modeling, Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne, and Swiss Institute of Bioinformatics, AAB014 Station 19, Lausanne, 1015, Switzerland
405
Sikosek T, Krobath H, Chan HS. Theoretical Insights into the Biophysics of Protein Bi-stability and Evolutionary Switches. PLoS Comput Biol 2016; 12:e1004960. [PMID: 27253392 PMCID: PMC4890782 DOI: 10.1371/journal.pcbi.1004960] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 01/25/2016] [Accepted: 05/04/2016] [Indexed: 11/18/2022] Open
Abstract
Deciphering the effects of nonsynonymous mutations on protein structure is central to many areas of biomedical research and is of fundamental importance to the study of molecular evolution. Much of the investigation of protein evolution has focused on mutations that leave a protein’s folded structure essentially unchanged. However, to evolve novel folds of proteins, mutations that lead to large conformational modifications have to be involved. Unraveling the basic biophysics of such mutations is a challenge to theory, especially when only one or two amino acid substitutions cause a large-scale conformational switch. Among the few such mutational switches identified experimentally, the one between the GA all-α and GB α+β folds is extensively characterized; but all-atom simulations using fully transferrable potentials have not been able to account for this striking switching behavior. Here we introduce an explicit-chain model that combines structure-based native biases for multiple alternative structures with a general physical atomic force field, and apply this construct to twelve mutants spanning the sequence variation between GA and GB. In agreement with experiment, we observe conformational switching from GA to GB upon a single L45Y substitution in the GA98 mutant. In line with the latent evolutionary potential concept, our model shows a gradual sequence-dependent change in fold preference in the mutants before this switch. Our analysis also indicates that a sharp GA/GB switch may arise from the orientation dependence of aromatic π-interactions. These findings provide physical insights toward rationalizing, predicting and designing evolutionary conformational switches.
The biological functions of globular proteins are intimately related to their folded structures and their associated conformational fluctuations. Evolution of new structures is an important avenue to new functions. Although many mutations do not change the folded state, experiments indicate that a single amino acid substitution can lead to a drastic change in the folded structure. The physics of this switch-like behavior remains to be elucidated. Here we develop a computational model for the relevant physical forces, showing that mutations can lead to new folds by passing through intermediate sequences where the old and new folds occur with varying probabilities. Our approach helps provide a general physical account of conformational switching in evolution and mutational effects on conformational dynamics.
Affiliation(s)
- Tobias Sikosek
  - Departments of Biochemistry and Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
- Heinrich Krobath
  - Departments of Biochemistry and Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
- Hue Sun Chan
  - Departments of Biochemistry and Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
406
Rivoire O, Reynolds KA, Ranganathan R. Evolution-Based Functional Decomposition of Proteins. PLoS Comput Biol 2016; 12:e1004817. [PMID: 27254668 PMCID: PMC4890866 DOI: 10.1371/journal.pcbi.1004817] [Citation(s) in RCA: 101] [Impact Index Per Article: 11.2] [Received: 07/13/2015] [Accepted: 02/15/2016] [Indexed: 11/24/2022] Open
Abstract
The essential biological properties of proteins (folding, biochemical activities, and the capacity to adapt) arise from the global pattern of interactions between amino acid residues. The statistical coupling analysis (SCA) is an approach to defining this pattern that involves the study of amino acid coevolution in an ensemble of sequences comprising a protein family. This approach indicates a functional architecture within proteins in which the basic units are coupled networks of amino acids termed sectors. This evolution-based decomposition has potential for new understandings of the structural basis for protein function. To facilitate its usage, we present here the principles and practice of the SCA and introduce new methods for sector analysis in a python-based software package (pySCA). We show that the pattern of amino acid interactions within sectors is linked to the divergence of functional lineages in a multiple sequence alignment, a model for how sector properties might be differentially tuned in members of a protein family. This work provides new tools for studying proteins and for generally testing the concept of sectors as the principal units of function and adaptive variation.
Proteins display the ability to fold, to carry out complex biochemical reactions, and to be adaptive to changing conditions of selection, the essential characteristics contributing to organismal fitness. A major goal is to understand how these properties emerge from the global pattern of interactions between amino acid residues. Here, we describe the principles and implementation of the statistical coupling analysis (SCA), a method to reveal this pattern through analysis of coevolution between amino acids in an ensemble of homologous sequences. The basic result is a decomposition of protein structures into groups of contiguous amino acids called “sectors” which have been linked to conserved functional properties. This work provides conceptual and practical tools for sector analysis in any sufficiently well-represented protein family, and represents a necessary basis for broadly testing the concept of protein sectors.
Affiliation(s)
- Olivier Rivoire
  - Laboratoire Interdisciplinaire de Physique, CNRS and Université Grenoble Alpes, Grenoble, France
- Kimberly A. Reynolds
  - The Green Center for Systems Biology, and Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
- Rama Ranganathan
  - The Green Center for Systems Biology, and Departments of Biophysics and Pharmacology, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
407
Moult J, Fidelis K, Kryshtafovych A, Schwede T, Tramontano A. Critical assessment of methods of protein structure prediction: Progress and new directions in round XI. Proteins 2016; 84 Suppl 1:4-14. [PMID: 27171127 DOI: 10.1002/prot.25064] [Citation(s) in RCA: 149] [Impact Index Per Article: 16.6] [Received: 02/29/2016] [Revised: 04/29/2016] [Accepted: 05/08/2016] [Indexed: 12/15/2022]
Abstract
Modeling of protein structure from amino acid sequence now plays a major role in structural biology. Here we report new developments and progress from the CASP11 community experiment, assessing the state of the art in structure modeling. Notable points include the following: (1) New methods for predicting three dimensional contacts resulted in a few spectacular template free models in this CASP, whereas models based on sequence homology to proteins with experimental structure continue to be the most accurate. (2) Refinement of initial protein models, primarily using molecular dynamics related approaches, has now advanced to the point where the best methods can consistently (though slightly) improve nearly all models. (3) The use of relatively sparse NMR constraints dramatically improves the accuracy of models, and another type of sparse data, chemical crosslinking, introduced in this CASP, also shows promise for producing better models. (4) A new emphasis on modeling protein complexes, in collaboration with CAPRI, has produced interesting results, but also shows the need for more focus on this area. (5) Methods for estimating the accuracy of models have advanced to the point where they are of considerable practical use. (6) A first assessment demonstrates that models can sometimes successfully address biological questions that motivate experimental structure determination. (7) There is continuing progress in accuracy of modeling regions of structure not directly available by comparative modeling, while there is marginal or no progress in some other areas. Proteins 2016; 84(Suppl 1):4-14. © 2016 Wiley Periodicals, Inc.
Affiliation(s)
- John Moult
  - Institute for Bioscience and Biotechnology Research and Department of Cell Biology and Molecular Genetics, University of Maryland, Rockville, Maryland, 20850.
- Krzysztof Fidelis
  - Genome Center, University of California, Davis, Davis, California, 95616
- Torsten Schwede
  - Biozentrum & SIB Swiss Institute of Bioinformatics, University of Basel, Basel, Switzerland
- Anna Tramontano
  - Department of Physics and Istituto Pasteur - Fondazione Cenci Bolognetti, Sapienza University of Rome, Rome, Italy
408
Jiménez-Sánchez A. Coevolution of RAC Small GTPases and their Regulators GEF Proteins. Evol Bioinform Online 2016; 12:121-31. [PMID: 27226705 PMCID: PMC4872645 DOI: 10.4137/ebo.s38031] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 11/26/2015] [Revised: 03/31/2016] [Accepted: 04/03/2016] [Indexed: 01/16/2023] Open
Abstract
RAC proteins are small GTPases involved in important cellular processes in eukaryotes, and their deregulation may contribute to cancer. Activation of RAC proteins is regulated by the DOCK and DBL protein families of guanine nucleotide exchange factors (GEFs). Although DOCK and DBL proteins both act as GEFs on RAC proteins, DOCK and DBL family members are evolutionarily unrelated. To understand how the DBL and DOCK families perform the same function on RAC proteins despite their unrelated primary structures, phylogenetic analyses of the RAC, DBL, and DOCK families were performed, and interaction patterns that may suggest a coevolutionary process were sought. Interestingly, while RAC and DOCK proteins are very well conserved in humans and among eukaryotes, DBL proteins are highly divergent. Moreover, correlation analyses of the phylogenetic distances of RAC and GEF proteins and covariation analyses between residues in the interacting domains showed significant coevolution rates for both RAC–DOCK and RAC–DBL interactions.
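One common way to correlate phylogenetic distances between two protein families is to compare their pairwise distance matrices entry by entry. The sketch below shows that idea with a plain Pearson correlation over the upper triangles. It is an illustration under stated assumptions, not the procedure used in the paper: the function names are hypothetical, the toy matrices are invented for the example, and both matrices are assumed to list the same species in the same order.

```python
import math


def upper_triangle(matrix):
    """Flatten the strict upper triangle of a square distance matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / math.sqrt(vx * vy)


def distance_matrix_correlation(d_a, d_b):
    """Correlate two distance matrices over matching species pairs.

    Assumes d_a and d_b are square, symmetric, and indexed by the same
    species in the same order (an assumption of this sketch).
    """
    return pearson(upper_triangle(d_a), upper_triangle(d_b))


if __name__ == "__main__":
    # Toy distances for three species; values are made up for illustration.
    d_family_a = [[0, 1, 2], [1, 0, 3], [2, 3, 0]]
    d_family_b = [[0, 2, 4], [2, 0, 6], [4, 6, 0]]
    print(distance_matrix_correlation(d_family_a, d_family_b))
```

A high correlation only indicates that the two families diverged in a similar pattern across species; shared phylogeny alone can produce it, so such a score is usually read alongside residue-level covariation evidence.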
Affiliation(s)
- Alejandro Jiménez-Sánchez
  - Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Cambridge, UK. Previously at Department of Biology, University of York, York, UK
409
Lee JY, Kinch LN, Borek DM, Wang J, Wang J, Urbatsch IL, Xie XS, Grishin NV, Cohen JC, Otwinowski Z, Hobbs HH, Rosenbaum DM. Crystal structure of the human sterol transporter ABCG5/ABCG8. Nature 2016; 533:561-4. [PMID: 27144356 PMCID: PMC4964963 DOI: 10.1038/nature17666] [Citation(s) in RCA: 216] [Impact Index Per Article: 24.0] [Received: 09/14/2015] [Accepted: 03/15/2016] [Indexed: 02/08/2023]
Abstract
ATP binding cassette (ABC) transporters play critical roles in maintaining sterol balance in higher eukaryotes. The ABCG5/ABCG8 heterodimer (G5G8) mediates excretion of neutral sterols in liver and intestines. Mutations disrupting G5G8 cause sitosterolaemia, a disorder characterized by sterol accumulation and premature atherosclerosis. Here we use crystallization in lipid bilayers to determine the X-ray structure of human G5G8 in a nucleotide-free state at 3.9 Å resolution, generating the first atomic model of an ABC sterol transporter. The structure reveals a new transmembrane fold that is present in a large and functionally diverse superfamily of ABC transporters. The transmembrane domains are coupled to the nucleotide-binding sites by networks of interactions that differ between the active and inactive ATPases, reflecting the catalytic asymmetry of the transporter. The G5G8 structure provides a mechanistic framework for understanding sterol transport and the disruptive effects of mutations causing sitosterolaemia.
Affiliation(s)
- Jyh-Yeuan Lee
  - Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center at Dallas, Dallas, Texas 75390, USA
- Lisa N Kinch
  - Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, Texas 75390, USA
  - Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, Texas 75390, USA
  - Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, Texas 75390, USA
- Dominika M Borek
  - Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, Texas 75390, USA
  - Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, Texas 75390, USA
- Jin Wang
  - Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center at Dallas, Dallas, Texas 75390, USA
- Junmei Wang
  - Cecil & Ida Green Center for Molecular, Computational and Systems Biology, University of Texas Southwestern Medical Center, Dallas, Texas 75390, USA
- Ina L Urbatsch
  - Department of Cell Biology and Biochemistry, Texas Tech University Health Sciences Center, Lubbock, Texas 79430, USA
- Xiao-Song Xie
  - Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center at Dallas, Dallas, Texas 75390, USA
- Nikolai V Grishin
  - Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, Texas 75390, USA
  - Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, Texas 75390, USA
  - Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, Texas 75390, USA
- Jonathan C Cohen
  - Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center at Dallas, Dallas, Texas 75390, USA
- Zbyszek Otwinowski
  - Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, Texas 75390, USA
  - Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, Texas 75390, USA
- Helen H Hobbs
  - Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center at Dallas, Dallas, Texas 75390, USA
  - Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, Texas 75390, USA
- Daniel M Rosenbaum
  - Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, Texas 75390, USA
  - Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, Texas 75390, USA
410
Champeimont R, Laine E, Hu SW, Penin F, Carbone A. Coevolution analysis of Hepatitis C virus genome to identify the structural and functional dependency network of viral proteins. Sci Rep 2016; 6:26401. [PMID: 27198619 PMCID: PMC4873791 DOI: 10.1038/srep26401] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Received: 10/02/2015] [Accepted: 05/03/2016] [Indexed: 12/20/2022] Open
Abstract
A novel computational approach to coevolution analysis allowed us to reconstruct the protein-protein interaction network of the Hepatitis C Virus (HCV) at residue resolution. For the first time, coevolution analysis of an entire viral genome was carried out, based on a limited set of protein sequences with high sequence identity within genotypes. The identified coevolving residues constitute highly relevant predictions of protein-protein interactions for further experimental identification of HCV protein complexes. The method can be used to analyse other viral genomes and to predict the associated protein interaction networks.
Affiliation(s)
- Raphaël Champeimont
  - Sorbonne Universités, UPMC-Univ P6, CNRS, Laboratoire de Biologie Computationnelle et Quantitative - UMR 7238, 15 rue de l’Ecole de Médecine, 75006 Paris, France
- Elodie Laine
  - Sorbonne Universités, UPMC-Univ P6, CNRS, Laboratoire de Biologie Computationnelle et Quantitative - UMR 7238, 15 rue de l’Ecole de Médecine, 75006 Paris, France
- Shuang-Wei Hu
  - Sorbonne Universités, UPMC-Univ P6, CNRS, Laboratoire de Biologie Computationnelle et Quantitative - UMR 7238, 15 rue de l’Ecole de Médecine, 75006 Paris, France
- Francois Penin
  - CNRS, UMR5086, Bases Moléculaires et Structurales des Systèmes Infectieux, Institut de Biologie et Chimie des Protéines, 7 Passage du Vercors, Cedex 07, F-69367 Lyon, France
  - LABEX Ecofect, Université de Lyon, Lyon, France
- Alessandra Carbone
  - Sorbonne Universités, UPMC-Univ P6, CNRS, Laboratoire de Biologie Computationnelle et Quantitative - UMR 7238, 15 rue de l’Ecole de Médecine, 75006 Paris, France
  - Institut Universitaire de France, 75005, Paris, France
411
Neuwald AF. Gleaning structural and functional information from correlations in protein multiple sequence alignments. Curr Opin Struct Biol 2016; 38:1-8. [PMID: 27179293 DOI: 10.1016/j.sbi.2016.04.006] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 12/18/2015] [Revised: 04/28/2016] [Accepted: 04/29/2016] [Indexed: 10/24/2022]
Abstract
The availability of vast amounts of protein sequence data facilitates detection of subtle statistical correlations due to imposed structural and functional constraints. Recent breakthroughs using Direct Coupling Analysis (DCA) and related approaches have tapped into correlations believed to be due to compensatory mutations. This has yielded some remarkable results, including substantially improved prediction of protein intra- and inter-domain 3D contacts, of membrane and globular protein structures, of substrate binding sites, and of protein conformational heterogeneity. A complementary approach is Bayesian Partitioning with Pattern Selection (BPPS), which partitions related proteins into hierarchically-arranged subgroups based on correlated residue patterns. These correlated patterns are presumably due to structural and functional constraints associated with evolutionary divergence rather than to compensatory mutations. Hence joint application of DCA- and BPPS-based approaches should help sort out the structural and functional constraints contributing to sequence correlations.
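As a concrete picture of what a "statistical correlation" between alignment positions looks like, the sketch below computes the mutual information between two columns of a toy alignment. This is deliberately the simplest pairwise measure and is neither DCA nor BPPS, both of which go well beyond it (DCA in particular aims to separate direct from indirect correlations). The function name and the toy alignment are assumptions made for illustration.

```python
import math
from collections import Counter


def mutual_information(msa, i, j):
    """Mutual information (in bits) between columns i and j of an MSA.

    `msa` is a list of equal-length aligned sequences. Frequencies are
    plain counts, with no sequence weighting or pseudocounts, which a
    real analysis would normally add.
    """
    n = len(msa)
    if n == 0:
        raise ValueError("alignment is empty")
    counts_i = Counter(seq[i] for seq in msa)
    counts_j = Counter(seq[j] for seq in msa)
    counts_ij = Counter((seq[i], seq[j]) for seq in msa)
    mi = 0.0
    for (a, b), n_ab in counts_ij.items():
        p_ab = n_ab / n
        p_a = counts_i[a] / n
        p_b = counts_j[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


if __name__ == "__main__":
    # Toy alignment: columns 0 and 1 always change together,
    # column 2 is fully conserved.
    toy_msa = ["AKL", "AKL", "GEL", "GEL"]
    print(mutual_information(toy_msa, 0, 1))  # covarying pair
    print(mutual_information(toy_msa, 0, 2))  # conserved column carries no signal
```

In the toy alignment the covarying pair scores one bit and the pair involving the conserved column scores zero, which is the basic signal that more elaborate methods refine.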
Affiliation(s)
- Andrew F Neuwald
  - Institute for Genome Sciences and Department of Biochemistry & Molecular Biology, University of Maryland School of Medicine, 801 West Baltimore St., BioPark II, Room 617, Baltimore, MD 21201, United States.
412
Wang S, Li W, Zhang R, Liu S, Xu J. CoinFold: a web server for protein contact prediction and contact-assisted protein folding. Nucleic Acids Res 2016; 44:W361-6. [PMID: 27112569 PMCID: PMC4987891 DOI: 10.1093/nar/gkw307] [Citation(s) in RCA: 43] [Impact Index Per Article: 4.8] [Received: 03/06/2016] [Accepted: 04/12/2016] [Indexed: 12/14/2022] Open
Abstract
CoinFold (http://raptorx2.uchicago.edu/ContactMap/) is a web server for protein contact prediction and contact-assisted de novo structure prediction. CoinFold predicts contacts by integrating joint multi-family evolutionary coupling (EC) analysis and supervised machine learning. This joint EC analysis is unique in that it not only uses residue coevolution information in the target protein family, but also that in the related families which may have divergent sequences but similar folds. The supervised learning further improves contact prediction accuracy by making use of sequence profile, contact (distance) potential and other information. Finally, this server predicts tertiary structure of a sequence by feeding its predicted contacts and secondary structure to the CNS suite. Tested on the CASP and CAMEO targets, this server shows significant advantages over existing ones of similar category in both contact and tertiary structure prediction.
Affiliation(s)
- Sheng Wang
- Toyota Technological Institute at Chicago, Chicago, IL, USA; Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Wei Li
- School of Biological and Chemical Engineering, Zhejiang University of Science and Technology, Zhejiang, China
- Renyu Zhang
- Toyota Technological Institute at Chicago, Chicago, IL, USA
- Shiwang Liu
- School of Biological and Chemical Engineering, Zhejiang University of Science and Technology, Zhejiang, China
- Jinbo Xu
- Toyota Technological Institute at Chicago, Chicago, IL, USA
413
Kinjo AR. A unified statistical model of protein multiple sequence alignment integrating direct coupling and insertions. Biophys Physicobiol 2016; 13:45-62. [PMID: 27924257 PMCID: PMC5042171 DOI: 10.2142/biophysico.13.0_45] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 10/21/2015] [Accepted: 03/18/2016] [Indexed: 12/01/2022] Open
Abstract
The multiple sequence alignment (MSA) of a protein family provides a wealth of information in terms of the conservation pattern of amino acid residues not only at each alignment site but also between distant sites. In order to statistically model the MSA incorporating both short-range and long-range correlations as well as insertions, I have derived a lattice gas model of the MSA based on the principle of maximum entropy. The partition function, obtained by the transfer matrix method with a mean-field approximation, accounts for all possible alignments with all possible sequences. The model parameters for short-range and long-range interactions were determined by a self-consistent condition and by a Gaussian approximation, respectively. Using this model with and without long-range interactions, I analyzed the globin and V-set domains by increasing the “temperature” and by “mutating” a site. The correlations between residue conservation and various measures of the system’s stability indicate that the long-range interactions make the conservation pattern more specific to the structure, and increasingly stabilize better conserved residues.
Affiliation(s)
- Akira R Kinjo
- Institute for Protein Research, Osaka University, Suita, Osaka 565-0871, Japan
414
Livi L, Maiorino E, Giuliani A, Rizzi A, Sadeghian A. A generative model for protein contact networks. J Biomol Struct Dyn 2016; 34:1441-54. [DOI: 10.1080/07391102.2015.1077736] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Indexed: 10/22/2022]
Affiliation(s)
- Lorenzo Livi
- Department of Computer Science, Ryerson University, 350 Victoria Street, Toronto, ON, M5B 2K3, Canada
- Enrico Maiorino
- Department of Information Engineering, Electronics, and Telecommunications, SAPIENZA University of Rome, Via Eudossiana 18, 00184 Rome, Italy
- Alessandro Giuliani
- Department of Environment and Health, Istituto Superiore di Sanità, Viale Regina Elena 299, 00161 Rome, Italy
- Antonello Rizzi
- Department of Information Engineering, Electronics, and Telecommunications, SAPIENZA University of Rome, Via Eudossiana 18, 00184 Rome, Italy
- Alireza Sadeghian
- Department of Computer Science, Ryerson University, 350 Victoria Street, Toronto, ON, M5B 2K3, Canada
415
Yang J, Jin QY, Zhang B, Shen HB. R2C: improving ab initio residue contact map prediction using dynamic fusion strategy and Gaussian noise filter. Bioinformatics 2016; 32:2435-43. [PMID: 27153618 DOI: 10.1093/bioinformatics/btw181] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Received: 11/10/2015] [Accepted: 04/03/2016] [Indexed: 11/12/2022]
Abstract
Motivation: Inter-residue contacts in proteins dictate the topology of protein structures. They are crucial for protein folding and structural stability. Accurate prediction of residue contacts, especially long-range contacts, is important to the quality of ab initio structure modeling, since they can enforce strong restraints on structure assembly. Results: In this paper, we present a new Residue-Residue Contact predictor called R2C that combines machine learning-based and correlated mutation analysis-based methods, together with a two-dimensional Gaussian noise filter to enhance long-range residue contact prediction. Our results show that the outputs from the machine learning-based method are concentrated, with better performance on short-range contacts, while the predictions of the correlated mutation analysis-based approach are widespread, with higher accuracy on long-range contacts. An effective query-driven dynamic fusion strategy proposed here takes full advantage of the two different methods, resulting in an impressive overall accuracy improvement. We also show that the contact map obtained directly from the prediction model contains Gaussian noise, which has not been reported before. Unlike recent studies that tried to further enhance the quality of the contact map by removing its transitive noise, we designed a new two-dimensional Gaussian noise filter, which was especially helpful for reinforcing long-range residue contact prediction. Tested on recent CASP10/11 datasets, the overall top L/5 accuracy of our final R2C predictor is 17.6%/15.5% higher than the pure machine learning-based method and 7.8%/8.3% higher than the correlated mutation analysis-based approach for long-range residue contact prediction. Availability and Implementation: http://www.csbio.sjtu.edu.cn/bioinf/R2C/ Contact: hbshen@sjtu.edu.cn Supplementary information: Supplementary data are available at Bioinformatics online.
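The abstract does not give the form of the two-dimensional Gaussian noise filter, so the following is only a minimal sketch of the general idea: smoothing an L x L contact-score map with a normalized Gaussian kernel. The function names, the kernel radius and the `sigma` value are assumptions made for illustration and are not taken from R2C.

```python
import math


def gaussian_kernel(radius, sigma):
    """Build a normalized (2*radius+1) x (2*radius+1) Gaussian kernel."""
    weights = [
        [math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))
         for dx in range(-radius, radius + 1)]
        for dy in range(-radius, radius + 1)
    ]
    total = sum(sum(row) for row in weights)
    return [[w / total for w in row] for row in weights]


def smooth_contact_map(scores, radius=1, sigma=1.0):
    """Smooth a square contact-score map with a Gaussian kernel.

    Near the borders the kernel is renormalized over the cells that
    actually exist, so a constant map stays constant (an assumption of
    this sketch, not a detail from the paper).
    """
    n = len(scores)
    kernel = gaussian_kernel(radius, sigma)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = 0.0
            norm = 0.0
            for di in range(-radius, radius + 1):
                for dj in range(-radius, radius + 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < n and 0 <= jj < n:
                        w = kernel[di + radius][dj + radius]
                        acc += w * scores[ii][jj]
                        norm += w
            out[i][j] = acc / norm
    return out
```

With such a filter, an isolated high score is damped and spread over its neighbours, whereas a cluster of neighbouring high scores reinforces itself. This matches the intuition that true contacts tend to appear in local patterns in a contact map.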
Affiliation(s)
- Jing Yang
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
- Qi-Yu Jin
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
- Biao Zhang
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
- Hong-Bin Shen
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
416
Schneider M, Belsom A, Rappsilber J, Brock O. Blind testing of cross-linking/mass spectrometry hybrid methods in CASP11. Proteins 2016; 84 Suppl 1:152-63. [PMID: 26945814 PMCID: PMC5042049 DOI: 10.1002/prot.25028] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Received: 10/21/2015] [Revised: 02/09/2016] [Accepted: 02/27/2016] [Indexed: 12/26/2022]
Abstract
Hybrid approaches combine computational methods with experimental data. The information contained in the experimental data can be leveraged to probe the structure of proteins otherwise elusive to computational methods. Compared with computational methods, the structures produced by hybrid methods exhibit some degree of experimental validation. In spite of these advantages, most hybrid methods have not yet been validated in blind tests, hampering their development. Here, we describe the first blind test of a specific cross-link based hybrid method in CASP. This blind test was coordinated by the CASP organizers and utilized a novel, high-density cross-linking/mass-spectrometry (CLMS) approach that is able to collect high-density CLMS data in a matter of days. This experimental protocol was developed in the Rappsilber laboratory. This approach exploits the chemistry of a highly reactive, photoactivatable cross-linker to produce an order of magnitude more cross-links than homobifunctional cross-linkers. The Rappsilber laboratory generated experimental CLMS data based on this protocol and submitted the data to the CASP organizers, who then released the data to the CASP11 prediction groups in a separate, CLMS-assisted modeling experiment. We did not observe a clear improvement of assisted models, presumably because the properties of the CLMS data (uncertainty in cross-link identification and residue-residue assignment, and uneven distribution over the protein) were largely unknown to the prediction groups and their approaches were not yet tailored to this kind of data. We also suggest modifications to the CLMS-CASP experiment and discuss the importance of rigorous blind testing in the development of hybrid methods. Proteins 2016; 84(Suppl 1):152-163. © 2016 The Authors Proteins: Structure, Function, and Bioinformatics Published by Wiley Periodicals, Inc.
Affiliation(s)
- Michael Schneider
- Robotics and Biology Laboratory, Technische Universität Berlin, 10587, Berlin, Germany
- Adam Belsom
- Wellcome Trust Centre for Cell Biology, University of Edinburgh, Edinburgh, EH9 3BF, United Kingdom
- Juri Rappsilber
- Wellcome Trust Centre for Cell Biology, University of Edinburgh, Edinburgh, EH9 3BF, United Kingdom; Department of Bioanalytics, Institute of Biotechnology, Technische Universität Berlin, 13355, Berlin, Germany.
- Oliver Brock
- Robotics and Biology Laboratory, Technische Universität Berlin, 10587, Berlin, Germany.
417
Bywater RP. Comparison of Algorithms for Prediction of Protein Structural Features from Evolutionary Data. PLoS One 2016; 11:e0150769. [PMID: 26963911 PMCID: PMC4786192 DOI: 10.1371/journal.pone.0150769] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 10/26/2015] [Accepted: 02/17/2016] [Indexed: 11/18/2022] Open
Abstract
Proteins have many functions and predicting these is still one of the major challenges in theoretical biophysics and bioinformatics. Foremost amongst these functions is the need to fold correctly thereby allowing the other genetically dictated tasks that the protein has to carry out to proceed efficiently. In this work, some earlier algorithms for predicting protein domain folds are revisited and they are compared with more recently developed methods. In dealing with intractable problems such as fold prediction, when different algorithms show convergence onto the same result there is every reason to take all algorithms into account such that a consensus result can be arrived at. In this work it is shown that the application of different algorithms in protein structure prediction leads to results that do not converge as such but rather they collude in a striking and useful way that has never been considered before.
418
Baker FN, Porollo A. CoeViz: a web-based tool for coevolution analysis of protein residues. BMC Bioinformatics 2016; 17:119. [PMID: 26956673 PMCID: PMC4782369 DOI: 10.1186/s12859-016-0975-z] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.0] [Received: 10/23/2015] [Accepted: 03/01/2016] [Indexed: 11/30/2022] Open
Abstract
Background: Proteins generally perform their function in a folded state. Residues forming an active site, whether it is a catalytic center or interaction interface, are frequently distant in a protein sequence. Hence, traditional sequence-based prediction methods focusing on a single residue (or a short window of residues) at a time may have difficulties in identifying and clustering the residues constituting a functional site, especially when a protein has multiple functions. Evolutionary information encoded in multiple sequence alignments is known to greatly improve sequence-based predictions. Identification of coevolving residues further advances the protein structure and function annotation by revealing cooperative pairs and higher order groupings of residues. Results: We present a new web-based tool (CoeViz) that provides a versatile analysis and visualization of pairwise coevolution of amino acid residues. The tool computes three covariance metrics (mutual information, chi-square statistic and Pearson correlation) and one conservation metric (joint Shannon entropy). Implemented adjustments of covariance scores include phylogeny correction, corrections for sequence dissimilarity and alignment gaps, and the average product correction. Visualization of residue relationships is enhanced by hierarchical cluster trees, heat maps, circular diagrams, and residue highlighting in protein sequence and 3D structure. Unlike other existing tools, CoeViz is not limited to analyzing conserved domains or protein families and can process long, unstructured and multi-domain proteins thousands of residues long. Three examples are provided to illustrate the use of the tool for identification of residues (1) involved in enzymatic function, (2) forming short linear functional motifs, and (3) constituting a structural domain.
Conclusions: CoeViz represents a practical resource for a quick sequence-based protein annotation for molecular biologists, e.g., for identifying putative functional clusters of residues and structural domains. CoeViz also can serve computational biologists as a resource of coevolution matrices, e.g., for developing machine learning-based prediction models. The presented tool is integrated in the POLYVIEW-2D server (http://polyview.cchmc.org/) and available from resulting pages of POLYVIEW-2D.
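Two of the quantities named in the abstract, mutual information between alignment columns and the average product correction (APC), can be sketched in a few lines. This is an illustrative sketch using the standard textbook definitions, not CoeViz's implementation. The function names and the toy alignment are hypothetical, and the phylogeny, gap and sequence-dissimilarity corrections are omitted.

```python
import math
from collections import Counter


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns."""
    n = len(col_a)
    freq_a, freq_b = Counter(col_a), Counter(col_b)
    freq_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), count in freq_ab.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((freq_a[a] / n) * (freq_b[b] / n)))
    return mi


def coevolution_matrix(msa):
    """Symmetric matrix of pairwise MI over all columns of an alignment."""
    length = len(msa[0])
    cols = [[seq[i] for seq in msa] for i in range(length)]
    mi = [[0.0] * length for _ in range(length)]
    for i in range(length):
        for j in range(i + 1, length):
            mi[i][j] = mi[j][i] = mutual_information(cols[i], cols[j])
    return mi


def apc_corrected(mi):
    """Average product correction: MI(i,j) - mean(i) * mean(j) / overall mean."""
    length = len(mi)
    row_mean = [
        sum(mi[i][j] for j in range(length) if j != i) / (length - 1)
        for i in range(length)
    ]
    overall = sum(
        mi[i][j] for i in range(length) for j in range(length) if i != j
    ) / (length * (length - 1))
    out = [[0.0] * length for _ in range(length)]
    for i in range(length):
        for j in range(length):
            if i != j and overall > 0:
                out[i][j] = mi[i][j] - row_mean[i] * row_mean[j] / overall
    return out


# Hypothetical toy alignment: columns 0 and 1 covary, column 3 is invariant.
toy_msa = ["ACDA", "ACEA", "GTDA", "GTEA"]
mi = coevolution_matrix(toy_msa)
apc = apc_corrected(mi)
```

In the toy alignment, columns 0 and 1 change together and so carry 1 bit of mutual information, while the invariant column 3 carries none with any other column.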
Affiliation(s)
- Frazier N Baker
- Department of Electrical Engineering and Computing Systems, University of Cincinnati, 2901 Woodside Drive, Cincinnati, OH, 45221, USA; Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center, 3333 Burnet Avenue, Cincinnati, OH, 45229, USA.
- Aleksey Porollo
- Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center, 3333 Burnet Avenue, Cincinnati, OH, 45229, USA; Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, 3333 Burnet Avenue, Cincinnati, OH, 45229, USA.
419
Belsom A, Schneider M, Fischer L, Brock O, Rappsilber J. Serum Albumin Domain Structures in Human Blood Serum by Mass Spectrometry and Computational Biology. Mol Cell Proteomics 2016; 15:1105-16. [PMID: 26385339 PMCID: PMC4813692 DOI: 10.1074/mcp.m115.048504] [Citation(s) in RCA: 73] [Impact Index Per Article: 8.1] [Received: 01/29/2015] [Revised: 09/16/2015] [Indexed: 01/12/2023] Open
Abstract
Chemical cross-linking combined with mass spectrometry has proven useful for studying protein-protein interactions and protein structure; however, the low density of cross-link data has so far precluded its use in determining structures de novo. Cross-linking density has been typically limited by the chemical selectivity of the standard cross-linking reagents that are commonly used for protein cross-linking. We have implemented the use of a heterobifunctional cross-linking reagent, sulfosuccinimidyl 4,4'-azipentanoate (sulfo-SDA), combining a traditional sulfo-N-hydroxysuccinimide (sulfo-NHS) ester and a UV photoactivatable diazirine group. This diazirine yields a highly reactive and promiscuous carbene species, the net result being a greatly increased number of cross-links compared with homobifunctional, NHS-based cross-linkers. We present a novel methodology that combines the use of this high density photo-cross-linking data with conformational space search to investigate the structure of human serum albumin domains, from purified samples, and in its native environment, human blood serum. Our approach is able to determine human serum albumin domain structures with good accuracy: root-mean-square deviations from the crystal structure are 2.8/5.6/2.9 Å (purified samples) and 4.5/5.9/4.8 Å (serum samples) for domains A/B/C for the first selected structure; 2.5/4.9/2.9 Å (purified samples) and 3.5/5.2/3.8 Å (serum samples) for the best out of top five selected structures. Our proof-of-concept study on human serum albumin demonstrates initial potential of our approach for determining the structures of more proteins in the complex biological contexts in which they function and which they may require for correct folding. Data are available via ProteomeXchange with identifier PXD001692.
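The accuracy figures above are root-mean-square deviations (RMSD) between model and crystal-structure coordinates. As a minimal sketch of that measure, the function below computes RMSD for two equal-length lists of 3D coordinates that are assumed to be already optimally superposed and matched atom by atom. The superposition step (for example a Kabsch fit) is deliberately left out, and the function name is hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two pre-superposed, atom-matched coordinate lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


# Hypothetical coordinates in angstroms: every atom is displaced by (3, 4, 0).
model = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
reference = [(3.0, 4.0, 0.0), (4.0, 5.0, 1.0)]
deviation = rmsd(model, reference)
```

Because each atom in the example is displaced by a vector of length 5, the deviation is 5 Å. Reported values are normally computed after superposition, which can only lower the number.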
Affiliation(s)
- Adam Belsom
- Wellcome Trust Centre for Cell Biology, University of Edinburgh, Edinburgh EH9 3BF, United Kingdom
- Michael Schneider
- Robotics and Biology Laboratory, Technische Universität Berlin, 10587 Berlin, Germany
- Lutz Fischer
- Wellcome Trust Centre for Cell Biology, University of Edinburgh, Edinburgh EH9 3BF, United Kingdom
- Oliver Brock
- Robotics and Biology Laboratory, Technische Universität Berlin, 10587 Berlin, Germany
- Juri Rappsilber
- Wellcome Trust Centre for Cell Biology, University of Edinburgh, Edinburgh EH9 3BF, United Kingdom; Department of Bioanalytics, Institute of Biotechnology, Technische Universität Berlin, 13355 Berlin, Germany.
420
Ovchinnikov S, Kim DE, Wang RYR, Liu Y, DiMaio F, Baker D. Improved de novo structure prediction in CASP11 by incorporating coevolution information into Rosetta. Proteins 2016; 84 Suppl 1:67-75. [PMID: 26677056 PMCID: PMC5490371 DOI: 10.1002/prot.24974] [Citation(s) in RCA: 78] [Impact Index Per Article: 8.7] [Received: 07/06/2015] [Revised: 11/27/2015] [Accepted: 12/12/2015] [Indexed: 12/19/2022]
Abstract
We describe CASP11 de novo blind structure predictions made using the Rosetta structure prediction methodology with both automatic and human assisted protocols. Model accuracy was generally improved using coevolution derived residue-residue contact information as restraints during Rosetta conformational sampling and refinement, particularly when the number of sequences in the family was more than three times the length of the protein. The highlight was the human assisted prediction of T0806, a large and topologically complex target with no homologs of known structure, which had unprecedented accuracy: <3.0 Å root-mean-square deviation (RMSD) from the crystal structure over 223 residues. For this target, we increased the amount of conformational sampling over our fully automated method by employing an iterative hybridization protocol. Our results clearly demonstrate, in a blind prediction scenario, that coevolution derived contacts can considerably increase the accuracy of template-free structure modeling. Proteins 2016; 84(Suppl 1):67-75. © 2015 Wiley Periodicals, Inc.
Affiliation(s)
- Sergey Ovchinnikov
- Department of Biochemistry, University of Washington, Washington, Seattle 98195; Institute for Protein Design, University of Washington, Washington, Seattle 98195
- David E Kim
- Institute for Protein Design, University of Washington, Washington, Seattle 98195; Howard Hughes Medical Institute, University of Washington, Washington, Seattle 98195
- Ray Yu-Ruei Wang
- Department of Biochemistry, University of Washington, Washington, Seattle 98195; Institute for Protein Design, University of Washington, Washington, Seattle 98195
- Yuan Liu
- Department of Biochemistry, University of Washington, Washington, Seattle 98195; Institute for Protein Design, University of Washington, Washington, Seattle 98195
- Frank DiMaio
- Department of Biochemistry, University of Washington, Washington, Seattle 98195; Institute for Protein Design, University of Washington, Washington, Seattle 98195
- David Baker
- Department of Biochemistry, University of Washington, Washington, Seattle 98195; Institute for Protein Design, University of Washington, Washington, Seattle 98195; Howard Hughes Medical Institute, University of Washington, Washington, Seattle 98195.
421
Jeong CS, Kim D. Structure-based Markov random field model for representing evolutionary constraints on functional sites. BMC Bioinformatics 2016; 17:99. [PMID: 26911566 PMCID: PMC4765150 DOI: 10.1186/s12859-016-0948-2] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 10/31/2015] [Accepted: 02/15/2016] [Indexed: 11/10/2022] Open
Abstract
Background: Elucidating the cooperative mechanism of interconnected residues is an important component toward understanding the biological function of a protein. Coevolution analysis has been developed to model the coevolutionary information reflecting structural and functional constraints. Recently, several methods have been developed based on a probabilistic graphical model called the Markov random field (MRF), which have led to significant improvements for coevolution analysis; however, thus far, the performance of these models has mainly been assessed by focusing on the aspect of protein structure. Results: In this study, we built an MRF model whose graphical topology is determined by the residue proximity in the protein structure, and derived a novel positional coevolution estimate utilizing the node weight of the MRF model. This structure-based MRF method was evaluated for three data sets, each of which annotates catalytic site, allosteric site, and comprehensively determined functional site information. We demonstrate that the structure-based MRF architecture can encode the evolutionary information associated with biological function. Furthermore, we show that the node weight can more accurately represent positional coevolution information compared to the edge weight. Lastly, we demonstrate that the structure-based MRF model can be reliably built with only a few aligned sequences in linear time. Conclusions: The results show that adoption of a structure-based architecture could be an acceptable approximation for coevolution modeling with efficient computation complexity.
Affiliation(s)
- Chan-Seok Jeong
- Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), 291 Daehak-ro, Yuseong-gu, Daejeon, 34141, Republic of Korea
- Dongsup Kim
- Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), 291 Daehak-ro, Yuseong-gu, Daejeon, 34141, Republic of Korea.
422
Zhang H, Gao Y, Deng M, Wang C, Zhu J, Li SC, Zheng WM, Bu D. Improving residue-residue contact prediction via low-rank and sparse decomposition of residue correlation matrix. Biochem Biophys Res Commun 2016; 472:217-22. [PMID: 26920058 DOI: 10.1016/j.bbrc.2016.01.188] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 01/14/2016] [Accepted: 01/30/2016] [Indexed: 10/22/2022]
Abstract
Strategies for correlation analysis in protein contact prediction often encounter two challenges, namely, the indirect coupling among residues, and the background correlations mainly caused by phylogenetic biases. While various studies have been conducted on how to disentangle indirect coupling, the removal of background correlations still remains unresolved. Here, we present an approach for removing background correlations via low-rank and sparse decomposition (LRS) of a residue correlation matrix. The correlation matrix can be constructed using either local inference strategies (e.g., mutual information, or MI) or global inference strategies (e.g., direct coupling analysis, or DCA). In our approach, a correlation matrix was decomposed into two components, i.e., a low-rank component representing background correlations, and a sparse component representing true correlations. Finally the residue contacts were inferred from the sparse component of correlation matrix. We trained our LRS-based method on the PSICOV dataset, and tested it on both GREMLIN and CASP11 datasets. Our experimental results suggested that LRS significantly improves the contact prediction precision. For example, when equipped with the LRS technique, the prediction precision of MI and mfDCA increased from 0.25 to 0.67 and from 0.58 to 0.70, respectively (Top L/10 predicted contacts, sequence separation: 5 AA, dataset: GREMLIN). In addition, our LRS technique also consistently outperforms the popular denoising technique APC (average product correction), on both local (MI_LRS: 0.67 vs MI_APC: 0.34) and global measures (mfDCA_LRS: 0.70 vs mfDCA_APC: 0.67). Interestingly, we found out that when equipped with our LRS technique, local inference strategies performed in a comparable manner to that of global inference strategies, implying that the application of LRS technique narrowed down the performance gap between local and global inference strategies. 
Overall, our LRS technique greatly facilitates protein contact prediction by removing background correlations. An implementation of the approach called COLORS (improving COntact prediction using LOw-Rank and Sparse matrix decomposition) is available from http://protein.ict.ac.cn/COLORS/.
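The precision values quoted above are computed over the top L/10 predicted contacts at a given sequence separation. The sketch below illustrates that kind of evaluation metric only, not the LRS decomposition itself. The function name is hypothetical, and reading "sequence separation: 5 AA" as a minimum separation of |i - j| >= 5 is an assumption of this sketch.

```python
def top_precision(scores, true_contacts, k_div=10, min_sep=5):
    """Fraction of the top L/k_div scored residue pairs that are true contacts.

    scores        -- L x L matrix of predicted contact scores
    true_contacts -- set of (i, j) pairs with i < j that are real contacts
    min_sep       -- assumed minimum sequence separation |i - j|
    """
    length = len(scores)
    pairs = [
        (scores[i][j], i, j)
        for i in range(length)
        for j in range(i + min_sep, length)
    ]
    pairs.sort(key=lambda item: item[0], reverse=True)
    top = pairs[: max(1, length // k_div)]
    if not top:
        return 0.0
    hits = sum((i, j) in true_contacts for _, i, j in top)
    return hits / len(top)
```

For a protein of length L = 20 the function scores the two best-ranked pairs. A high-scoring pair closer than `min_sep` residues apart is ignored, so the metric rewards only non-local predictions.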
Affiliation(s)
- Haicang Zhang
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
- Yujuan Gao
- Center for Quantitative Biology, Peking University, Beijing, China
- Minghua Deng
- Center for Quantitative Biology, Peking University, Beijing, China; School of Mathematical Sciences, Peking University, Beijing, China; Center for Statistical Sciences, Peking University, Beijing, China
- Chao Wang
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
- Jianwei Zhu
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
- Shuai Cheng Li
- Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong
- Wei-Mou Zheng
- Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing, China.
- Dongbo Bu
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China.
423
Ferreira DC, van der Linden MG, de Oliveira LC, Onuchic JN, de Araújo AFP. Information and redundancy in the burial folding code of globular proteins within a wide range of shapes and sizes. Proteins 2016; 84:515-31. [PMID: 26815167 DOI: 10.1002/prot.24998] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 10/22/2015] [Revised: 12/28/2015] [Accepted: 01/19/2016] [Indexed: 11/09/2022]
Abstract
Recent ab initio folding simulations for a limited number of small proteins have corroborated a previous suggestion that atomic burial information obtainable from sequence could be sufficient for tertiary structure determination when combined with sequence-independent geometrical constraints. Here, we use simulations parameterized by native burials to investigate the required amount of information in a diverse set of globular proteins comprising different structural classes and a wide size range. Burial information is provided by a potential term pushing each atom towards one among a small number L of equiprobable concentric layers. An upper bound for the required information is provided by the minimal number of layers L(min) still compatible with correct folding behavior. We obtain L(min) between 3 and 5 for seven small to medium proteins with 50 ≤ Nr ≤ 110 residues while for a larger protein with Nr = 141 we find that L ≥ 6 is required to maintain native stability. We additionally estimate the usable redundancy for a given L ≥ L(min) from the burial entropy associated with the largest folding-compatible fraction of "superfluous" atoms, for which the burial term can be turned off or target layers can be chosen randomly. The estimated redundancy for small proteins with L = 4 is close to 0.8. Our results are consistent with the above-average quality of burial predictions used in previous simulations and indicate that the fraction of approachable proteins could increase significantly with even a mild, plausible, improvement on sequence-dependent burial prediction or on sequence-independent constraints that augment the detectable redundancy during simulations.
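As a rough illustration of why the number of layers L bounds the burial information, the sketch below assigns atoms to L equally populated layers by rank of their distance from the protein centre and counts log2(L) bits per atom. Both functions are hypothetical helpers. The rank-based layer assignment is an assumption of this sketch, and the paper's own definitions of burial entropy and redundancy are not reproduced here.

```python
import math


def assign_layers(distances, num_layers):
    """Assign each atom to one of num_layers equally populated layers,
    ordered by increasing distance from the centre (layer 0 = innermost)."""
    order = sorted(range(len(distances)), key=lambda k: distances[k])
    layers = [0] * len(distances)
    for rank, atom in enumerate(order):
        layers[atom] = rank * num_layers // len(distances)
    return layers


def burial_information_bits(num_layers, num_atoms):
    """Information needed to name one of num_layers equiprobable layers
    for every atom: num_atoms * log2(num_layers) bits."""
    return num_atoms * math.log2(num_layers)


# Hypothetical distances (in angstroms) of four atoms from the centre.
layers = assign_layers([0.5, 3.0, 1.0, 2.0], num_layers=2)
```

With L = 4 equiprobable layers each atom carries 2 bits, so lowering L directly lowers the amount of burial information that has to be supplied.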
Affiliation(s)
- Diogo C Ferreira
- Laboratório de Biofísica Teórica e Computacional, Departamento de Biologia Celular, Universidade de Brasília, Brasília, DF, 70910-900, Brazil
- Marx G van der Linden
- Laboratório de Biofísica Teórica e Computacional, Departamento de Biologia Celular, Universidade de Brasília, Brasília, DF, 70910-900, Brazil
- Leandro C de Oliveira
- Departamento de Física, IBILCE, Universidade Estadual Paulista - UNESP, São José do Rio Preto, SP, 15054-000, Brazil
- José N Onuchic
- Center for Theoretical Biological Physics and Departments of Physics and Astronomy, Chemistry and Biosciences, Rice University, 6100 Main Street, Houston, Texas, 77005
- Antônio F Pereira de Araújo
- Laboratório de Biofísica Teórica e Computacional, Departamento de Biologia Celular, Universidade de Brasília, Brasília, DF, 70910-900, Brazil
424
Noel JK, Morcos F, Onuchic JN. Sequence co-evolutionary information is a natural partner to minimally-frustrated models of biomolecular dynamics. F1000Res 2016; 5. [PMID: 26918164 PMCID: PMC4755392 DOI: 10.12688/f1000research.7186.1] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Accepted: 01/21/2016] [Indexed: 11/25/2022] Open
Abstract
Experimentally derived structural constraints have been crucial to the implementation of computational models of biomolecular dynamics. For example, not only does crystallography provide essential starting points for molecular simulations, but high-resolution structures also permit parameterization of simplified models. Since the energy landscapes for proteins and other biomolecules have been shown to be minimally frustrated and therefore funneled, these structure-based models have played a major role in understanding the mechanisms governing folding and many functions of these systems. Structural information, however, may be limited in many interesting cases. Recently, the statistical analysis of residue co-evolution in families of protein sequences has provided a complementary method of discovering residue-residue contact interactions involved in functional configurations. These functional configurations are often transient and difficult to capture experimentally. Thus, co-evolutionary information can be merged with that available for experimentally characterized low free-energy structures, in order to more fully capture the true underlying biomolecular energy landscape.
Affiliation(s)
- Jeffrey K Noel
- Center for Theoretical Biological Physics, Rice University, Houston, TX, USA; Kristallographie, Max-Delbrück-Centrum für Molekulare Medizin, Berlin, Germany
| | - Faruck Morcos
- Department of Biological Sciences, University of Texas at Dallas, Richardson, TX, USA
| | - Jose N Onuchic
- Center for Theoretical Biological Physics, Rice University, Houston, TX, USA
| |
|
425
|
Nicoludis JM, Vogt BE, Green AG, Schärfe CPI, Marks DS, Gaudet R. Antiparallel protocadherin homodimers use distinct affinity- and specificity-mediating regions in cadherin repeats 1-4. eLife 2016; 5:e18449. [PMID: 27472898 PMCID: PMC5001838 DOI: 10.7554/elife.18449] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.3] [Received: 06/06/2016] [Accepted: 07/28/2016] [Indexed: 01/07/2023] Open
Abstract
Protocadherins (Pcdhs) are cell adhesion and signaling proteins used by neurons to develop and maintain neuronal networks, relying on trans homophilic interactions between their extracellular cadherin (EC) repeat domains. We present the structure of the antiparallel EC1-4 homodimer of human PcdhγB3, a member of the γ subfamily of clustered Pcdhs. Structure and sequence comparisons of α, β, and γ clustered Pcdh isoforms illustrate that subfamilies encode specificity in distinct ways through diversification of loop region structure and composition in EC2 and EC3, which contains isoform-specific conservation of primarily polar residues. In contrast, the EC1/EC4 interface comprises hydrophobic interactions that provide non-selective dimerization affinity. Using sequence coevolution analysis, we found evidence for a similar antiparallel EC1-4 interaction in non-clustered Pcdh families. We thus deduce that the EC1-4 antiparallel homodimer is a general interaction strategy that evolved before the divergence of these distinct protocadherin families.
Affiliation(s)
- John M Nicoludis
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, United States; Department of Molecular and Cellular Biology, Harvard University, Cambridge, United States
| | - Bennett E Vogt
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, United States
| | - Anna G Green
- Department of Systems Biology, Harvard Medical School, Boston, United States
| | - Charlotta PI Schärfe
- Department of Systems Biology, Harvard Medical School, Boston, United States; Applied Bioinformatics, Department of Computer Science, University of Tübingen, Tübingen, Germany
| | - Debora S Marks
- Department of Systems Biology, Harvard Medical School, Boston, United States
| | - Rachelle Gaudet
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, United States
| |
|
426
|
Sfriso P, Duran-Frigola M, Mosca R, Emperador A, Aloy P, Orozco M. Residues Coevolution Guides the Systematic Identification of Alternative Functional Conformations in Proteins. Structure 2016; 24:116-126. [DOI: 10.1016/j.str.2015.10.025] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.9] [Received: 06/24/2015] [Revised: 10/13/2015] [Accepted: 10/17/2015] [Indexed: 12/12/2022]
|
427
|
Abstract
In the field of computational structural proteomics, contact predictions have shown new prospects of solving the longstanding problem of ab initio protein structure prediction. In the last few years, the application of deep learning algorithms and the availability of large protein sequence databases, combined with improvements in methods that derive contacts from multiple sequence alignments, have produced a large increase in the precision of contact prediction. In addition, these predicted contacts have also been used to build three-dimensional models from scratch. In this chapter, we briefly discuss many elements of protein residue-residue contacts and the methods available for prediction, focusing on a state-of-the-art contact prediction tool, DNcon. Illustrating with a case study, we describe how DNcon can be used to make ab initio contact predictions for a given protein sequence and discuss how the predicted contacts may be analyzed and evaluated.
Affiliation(s)
- Badri Adhikari
- Department of Computer Science, University of Missouri, 201 Engineering Building West, Columbia, MO, 65211, USA
| | - Jianlin Cheng
- Department of Computer Science, University of Missouri, 201 Engineering Building West, Columbia, MO, 65211, USA.
| |
|
428
|
Sahoo A, Khare S, Devanarayanan S, Jain PC, Varadarajan R. Residue proximity information and protein model discrimination using saturation-suppressor mutagenesis. eLife 2015; 4. [PMID: 26716404 PMCID: PMC4758949 DOI: 10.7554/elife.09532] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.3] [Received: 06/18/2015] [Accepted: 12/29/2015] [Indexed: 12/16/2022] Open
Abstract
Identification of residue-residue contacts from primary sequence can be used to guide protein structure prediction. Using Escherichia coli CcdB as the test case, we describe an experimental method termed saturation-suppressor mutagenesis to acquire residue contact information. In this methodology, for each of five inactive CcdB mutants, exhaustive screens for suppressors were performed. Proximal suppressors were accurately discriminated from distal suppressors based on their phenotypes when present as single mutants. Experimentally identified putative proximal pairs formed spatial constraints to recover >98% of native-like models of CcdB from a decoy dataset. Suppressor methodology was also applied to the integral membrane protein, diacylglycerol kinase A where the structures determined by X-ray crystallography and NMR were significantly different. Suppressor as well as sequence co-variation data clearly point to the X-ray structure being the functional one adopted in vivo. The methodology is applicable to any macromolecular system for which a convenient phenotypic assay exists. DOI: http://dx.doi.org/10.7554/eLife.09532.001
Common techniques to determine the three-dimensional structures of proteins can help researchers to understand these molecules’ activities, but are often time-consuming and do not work for all proteins. Proteins are made of chains of amino acids. When a protein chain folds, some of these amino acids interact with other amino acids and these contacts dictate the overall shape of the protein. This means that identifying the pairs of contacting amino acids could make it possible to predict the protein’s structure. Interactions between pairs of contacting amino acids tend to remain conserved throughout evolution, and if a mutation alters one of the amino acids in a pair then a 'compensatory' change often occurs to alter the second amino acid as well.
Compensatory mutations can suggest that two amino acids are close to each other in the three-dimensional shape of a protein, but the computational methods used to identify such amino acid pairs can sometimes be inaccurate. In 2012, researchers generated mutants of a bacterial protein called CcdB with changes to single amino acids that caused the protein to fail to fold correctly. Now, Sahoo et al. – who include two of the researchers involved in the 2012 work – have developed an experimental method to identify contacting amino acids and use the CcdB protein as a test case. The approach involved searching for additional mutations that could restore the activity of five of the original mutant proteins when the proteins were produced in yeast cells. The rationale was that any secondary mutations that restored the activity must have corrected the folding defect caused by the original mutation. Sahoo et al. then predicted how close the amino acids affected by the secondary mutations were to the amino acids altered by the original mutations. This information was used to select reliable three-dimensional models of CcdB from a large set of possible structures that had been generated previously using computer models. Next, the technique was applied to a protein called diacylglycerol kinase A. The structure of this protein had previously been inferred using techniques such as X-ray crystallography and nuclear magnetic resonance, but there was a mismatch between the two methods. Sahoo et al. found that the amino acid contacts derived from their experimental method matched those found in the crystal structure, suggesting that the functional protein structure in living cells is similar to the crystal structure. In the future, the experimental approach developed in this work could be combined with existing methods to reliably guide protein structure prediction. DOI:http://dx.doi.org/10.7554/eLife.09532.002
Affiliation(s)
- Anusmita Sahoo
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India
| | - Shruti Khare
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India
| | | | - Pankaj C Jain
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India
| | - Raghavan Varadarajan
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India; Jawaharlal Nehru Center for Advanced Scientific Research, Bangalore, India
| |
|
429
|
Braun T, Koehler Leman J, Lange OF. Combining Evolutionary Information and an Iterative Sampling Strategy for Accurate Protein Structure Prediction. PLoS Comput Biol 2015; 11:e1004661. [PMID: 26713437 PMCID: PMC4694711 DOI: 10.1371/journal.pcbi.1004661] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Received: 03/15/2015] [Accepted: 11/17/2015] [Indexed: 12/18/2022] Open
Abstract
Recent work has shown that the accuracy of ab initio structure prediction can be significantly improved by integrating evolutionary information in the form of intra-protein residue-residue contacts. Following this seminal result, much effort is put into the improvement of contact predictions. However, there is also a substantial need to develop structure prediction protocols tailored to the type of restraints gained by contact predictions. Here, we present a structure prediction protocol that combines evolutionary information with the resolution-adapted structural recombination approach of Rosetta, called RASREC. Compared to the classic Rosetta ab initio protocol, RASREC achieves improved sampling, better convergence and higher robustness against incorrect distance restraints, making it the ideal sampling strategy for the stated problem. To demonstrate the accuracy of our protocol, we tested the approach on a diverse set of 28 globular proteins. Our method is able to converge for 26 out of the 28 targets and improves the average TM-score of the entire benchmark set from 0.55 to 0.72 when compared to the top ranked models obtained by the EVFold web server using identical contact predictions. Using a smaller benchmark, we furthermore show that the prediction accuracy of our method is only slightly reduced when the contact prediction accuracy is comparatively low. This observation is of special interest for protein sequences that only have a limited number of homologs.
Recently, a breakthrough has been achieved in modeling the atomic 3D structures of proteins from their sequence alone without requiring any experimental work on the protein itself. To achieve this goal, a database of evolutionarily related sequences is analyzed to find co-evolving residues, giving insight into which residues are in close proximity to each other.
These residue-residue contacts can help to drive a computer simulation with an atomic-scale physical model of the protein structure from a random starting conformation to a native-like 3D conformation. Although much effort is being put into the improvement of residue-residue contact predictions, their accuracy will always be limited. Therefore, structure prediction protocols with a high tolerance against incorrect distance restraints are needed. Here, we present a structure prediction protocol that combines evolutionary information with the iterative sampling approach of the molecular modeling suite Rosetta, called RASREC. RASREC has been shown to converge faster to near-native models and to be more robust against incorrect distance restraints than standard prediction protocols. It is therefore perfectly suited for restraints obtained from predicted residue-residue contacts with limited accuracy. We show that our protocol outperforms other currently published structure prediction methods and is able to achieve accurate structures, even if the accuracy of predicted contacts is low.
Affiliation(s)
- Tatjana Braun
- Biomolecular NMR and Munich Center for Integrated Protein Science, Department Chemie, Technische Universität München, Garching, Germany
| | - Julia Koehler Leman
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, Maryland, United States of America
| | - Oliver F. Lange
- Biomolecular NMR and Munich Center for Integrated Protein Science, Department Chemie, Technische Universität München, Garching, Germany
| |
|
430
|
Goldsmith M, Eckstein S, Ashani Y, Greisen P, Leader H, Sussman JL, Aggarwal N, Ovchinnikov S, Tawfik DS, Baker D, Thiermann H, Worek F. Catalytic efficiencies of directly evolved phosphotriesterase variants with structurally different organophosphorus compounds in vitro. Arch Toxicol 2015; 90:2711-2724. [DOI: 10.1007/s00204-015-1626-2] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.5] [Received: 07/27/2015] [Accepted: 10/22/2015] [Indexed: 11/29/2022]
|
431
|
Coevolution Analysis of HIV-1 Envelope Glycoprotein Complex. PLoS One 2015; 10:e0143245. [PMID: 26579711 PMCID: PMC4651434 DOI: 10.1371/journal.pone.0143245] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 08/25/2015] [Accepted: 11/02/2015] [Indexed: 11/19/2022] Open
Abstract
The HIV-1 Env spike is the main protein complex that facilitates HIV-1 entry into CD4+ host cells. HIV-1 entry is a multistep process that is not yet completely understood. This process involves several protein-protein interactions between HIV-1 Env and a variety of host cell receptors along with many conformational changes within the spike. Owing to its high mutation rates and plasticity, HIV-1 Env has developed strategies to escape immense immune pressure and entry inhibitors. We applied a coevolution and residue-residue contact detecting method to identify coevolution patterns within HIV-1 Env protein sequences representing all group M subtypes. We identified 424 coevolving residue pairs within HIV-1 Env. The majority of predicted pairs are residue-residue contacts and are proximal in 3D structure. Furthermore, many of the detected pairs have functional implications due to contributions in either CD4 or coreceptor binding, or variable loop, gp120-gp41, and interdomain interactions. This study provides a new dimension of information in HIV research. The identified residue couplings may not only be important in assisting gp120 and gp41 coordinate structure prediction, but also in designing new and effective entry inhibitors that incorporate mutation patterns of HIV-1 Env.
|
432
|
Kryshtafovych A, Moult J, Baslé A, Burgin A, Craig TK, Edwards RA, Fass D, Hartmann MD, Korycinski M, Lewis RJ, Lorimer D, Lupas AN, Newman J, Peat TS, Piepenbrink KH, Prahlad J, van Raaij MJ, Rohwer F, Segall AM, Seguritan V, Sundberg EJ, Singh AK, Wilson MA, Schwede T. Some of the most interesting CASP11 targets through the eyes of their authors. Proteins 2015; 84 Suppl 1:34-50. [PMID: 26473983 PMCID: PMC4834066 DOI: 10.1002/prot.24942] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 06/23/2015] [Revised: 09/17/2015] [Accepted: 10/11/2015] [Indexed: 11/17/2022]
Abstract
The Critical Assessment of protein Structure Prediction (CASP) experiment would not have been possible without the prediction targets provided by the experimental structural biology community. In this article, selected crystallographers providing targets for the CASP11 experiment discuss the functional and biological significance of the target proteins, highlight their most interesting structural features, and assess whether these features were correctly reproduced in the predictions submitted to CASP11. Proteins 2016; 84(Suppl 1):34–50. © 2015 The Authors. Proteins: Structure, Function, and Bioinformatics Published by Wiley Periodicals, Inc.
Affiliation(s)
| | - John Moult
- Department of Cell Biology and Molecular Genetics, Institute for Bioscience and Biotechnology Research, University of Maryland, Rockville, Maryland, 20850
| | - Arnaud Baslé
- Institute for Cell and Molecular Biosciences, University of Newcastle, Newcastle upon Tyne, NE2 4HH, United Kingdom
| | - Alex Burgin
- Broad Institute, Cambridge, Massachusetts, 02142
| | | | - Robert A Edwards
- Department of Biology, San Diego State University, San Diego, California, 92182; Department of Computer Science, San Diego State University, San Diego, California, 92182
| | - Deborah Fass
- Department of Structural Biology, Weizmann Institute of Science, Rehovot, 76100, Israel
| | - Marcus D Hartmann
- Department of Protein Evolution, Max Planck Institute for Developmental Biology, Tübingen, 72076, Germany
| | - Mateusz Korycinski
- Department of Protein Evolution, Max Planck Institute for Developmental Biology, Tübingen, 72076, Germany
| | - Richard J Lewis
- Institute for Cell and Molecular Biosciences, University of Newcastle, Newcastle upon Tyne, NE2 4HH, United Kingdom
| | | | - Andrei N Lupas
- Department of Protein Evolution, Max Planck Institute for Developmental Biology, Tübingen, 72076, Germany
| | - Janet Newman
- Biomedical Manufacturing Program, CSIRO, Parkville, VIC, Australia
| | - Thomas S Peat
- Biomedical Manufacturing Program, CSIRO, Parkville, VIC, Australia
| | - Kurt H Piepenbrink
- Institute of Human Virology, University of Maryland School of Medicine, Baltimore, Maryland, 21201
| | - Janani Prahlad
- Department of Biochemistry and Redox Biology Center, University of Nebraska-Lincoln, Lincoln, Nebraska, 68588
| | - Mark J van Raaij
- Centro Nacional de Biotecnología (CNB-CSIC), Madrid, E-28049, Spain
| | - Forest Rohwer
- Department of Biology and Viral Information Institute, San Diego State University, San Diego, California, 92182
| | - Anca M Segall
- Department of Biology and Viral Information Institute, San Diego State University, San Diego, California, 92182
| | | | - Eric J Sundberg
- Institute of Human Virology, University of Maryland School of Medicine, Baltimore, Maryland, 21201; Department of Medicine, University of Maryland School of Medicine, Baltimore, Maryland, 21201; Department of Microbiology and Immunology, University of Maryland School of Medicine, Baltimore, Maryland, 21201
| | - Abhimanyu K Singh
- School of Biosciences, University of Kent, Canterbury, Kent, United Kingdom
| | - Mark A Wilson
- Department of Biochemistry and Redox Biology Center, University of Nebraska-Lincoln, Lincoln, Nebraska, 68588
| | - Torsten Schwede
- Biozentrum, University of Basel, Basel, 4056, Switzerland; SIB Swiss Institute of Bioinformatics, Basel, 4056, Switzerland
| |
|
433
|
Nicoludis JM, Lau SY, Schärfe CPI, Marks DS, Weihofen WA, Gaudet R. Structure and Sequence Analyses of Clustered Protocadherins Reveal Antiparallel Interactions that Mediate Homophilic Specificity. Structure 2015; 23:2087-98. [PMID: 26481813 PMCID: PMC4635037 DOI: 10.1016/j.str.2015.09.005] [Citation(s) in RCA: 54] [Impact Index Per Article: 5.4] [Received: 04/29/2015] [Revised: 09/14/2015] [Accepted: 09/15/2015] [Indexed: 01/07/2023]
Abstract
Clustered protocadherin (Pcdh) proteins mediate dendritic self-avoidance in neurons via specific homophilic interactions in their extracellular cadherin (EC) domains. We determined crystal structures of EC1-EC3, containing the homophilic specificity-determining region, of two mouse clustered Pcdh isoforms (PcdhγA1 and PcdhγC3) to investigate the nature of the homophilic interaction. Within the crystal lattices, we observe antiparallel interfaces consistent with a role in trans cell-cell contact. Antiparallel dimerization is supported by evolutionary correlations. Two interfaces, located primarily on EC2-EC3, involve distinctive clustered Pcdh structure and sequence motifs, lack predicted glycosylation sites, and contain residues highly conserved in orthologs but not paralogs, pointing toward their biological significance as homophilic interaction interfaces. These two interfaces are similar yet distinct, reflecting a possible difference in interaction architecture between clustered Pcdh subfamilies. These structures initiate a molecular understanding of clustered Pcdh assemblies that are required to produce functional neuronal networks.
Affiliation(s)
- John M. Nicoludis
- Department of Chemistry and Chemical Biology, Harvard University, 12 Oxford Street, Cambridge, MA, 02138, USA
| | - Sze-Yi Lau
- Department of Molecular and Cellular Biology, Harvard University, 7 Divinity Avenue, Cambridge, MA, 02138, USA
| | - Charlotta P. I. Schärfe
- Department of Systems Biology, Harvard Medical School, Boston, MA, 02115, USA; Applied Bioinformatics, Department of Computer Science, University of Tübingen, Tübingen, Germany
| | - Debora S. Marks
- Department of Systems Biology, Harvard Medical School, Boston, MA, 02115, USA
| | - Wilhelm A. Weihofen
- Department of Molecular and Cellular Biology, Harvard University, 7 Divinity Avenue, Cambridge, MA, 02138, USA. Correspondence: R. G., W. A. W.
| | - Rachelle Gaudet
- Department of Molecular and Cellular Biology, Harvard University, 7 Divinity Avenue, Cambridge, MA, 02138, USA. Correspondence: R. G., W. A. W.
| |
|
434
|
Aakre CD, Herrou J, Phung TN, Perchuk BS, Crosson S, Laub MT. Evolving new protein-protein interaction specificity through promiscuous intermediates. Cell 2015; 163:594-606. [PMID: 26478181 DOI: 10.1016/j.cell.2015.09.055] [Citation(s) in RCA: 137] [Impact Index Per Article: 13.7] [Received: 07/23/2015] [Revised: 09/11/2015] [Accepted: 09/22/2015] [Indexed: 10/22/2022]
Abstract
Interacting proteins typically coevolve, and the identification of coevolving amino acids can pinpoint residues required for interaction specificity. This approach often assumes that an interface-disrupting mutation in one protein drives selection of a compensatory mutation in its partner during evolution. However, this model requires a non-functional intermediate state prior to the compensatory change. Alternatively, a mutation in one protein could first broaden its specificity, allowing changes in its partner, followed by a specificity-restricting mutation. Using bacterial toxin-antitoxin systems, we demonstrate the plausibility of this second, promiscuity-based model. By screening large libraries of interface mutants, we show that toxins and antitoxins with high specificity are frequently connected in sequence space to more promiscuous variants that can serve as intermediates during a reprogramming of interaction specificity. We propose that the abundance of promiscuous variants promotes the expansion and diversification of toxin-antitoxin systems and other paralogous protein families during evolution.
Affiliation(s)
- Christopher D Aakre
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
| | - Julien Herrou
- Department of Biochemistry and Molecular Biology, University of Chicago, Chicago, IL 60637, USA
| | - Tuyen N Phung
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
| | - Barrett S Perchuk
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
| | - Sean Crosson
- Department of Biochemistry and Molecular Biology, University of Chicago, Chicago, IL 60637, USA
| | - Michael T Laub
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Howard Hughes Medical Institute, Massachusetts Institute of Technology, Cambridge, MA 02139, USA.
| |
|
435
|
Hou Q, Dutilh BE, Huynen MA, Heringa J, Feenstra KA. Sequence specificity between interacting and non-interacting homologs identifies interface residues--a homodimer and monomer use case. BMC Bioinformatics 2015; 16:325. [PMID: 26449222 PMCID: PMC4599308 DOI: 10.1186/s12859-015-0758-y] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 03/30/2015] [Accepted: 09/30/2015] [Indexed: 11/17/2022] Open
Abstract
Background: Protein families participating in protein-protein interactions may contain sub-families that have different binding characteristics, ranging from tight binding to showing no interaction at all. Composition differences at the sequence level in these sub-families are often decisive to their differential functional interaction. Methods to predict interface sites from protein sequences typically exploit conservation as a signal. Here, instead, we provide proof of concept that the sequence specificity between interacting versus non-interacting groups can be exploited to recognise interaction sites. Results: We collected homodimeric and monomeric proteins and formed homologous groups, each having an interacting (homodimer) subgroup and a non-interacting (monomer) subgroup. We then compiled multiple sequence alignments of the proteins in the homologous groups and identified compositional differences between the homodimeric and monomeric subgroups for each of the alignment positions. Our results show that this specificity signal distinguishes interface and other surface residues with 40.9% recall and up to 25.1% precision. Conclusions: To the best of our knowledge, this is the first large-scale study that exploits sequence specificity between interacting and non-interacting homologs to predict interaction sites from sequence information only. The performance obtained indicates that this signal contains valuable information to identify protein-protein interaction sites. Electronic supplementary material: The online version of this article (doi:10.1186/s12859-015-0758-y) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Qingzhen Hou
- Center for Integrative Bioinformatics VU (IBIVU), Vrije University Amsterdam, De Boelelaan 1081A, 1081 HV, Amsterdam, The Netherlands.
| | - Bas E Dutilh
- Theoretical Biology and Bioinformatics, Utrecht University, Padualaan 8, 3584 CH, Utrecht, The Netherlands; Centre for Molecular and Biomolecular Informatics, Radboud Institute for Molecular Life Sciences, Radboud University Medical Centre, Geert Grooteplein 28, 6525 GA, Nijmegen, The Netherlands; Department of Marine Biology, Institute of Biology, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil.
| | - Martijn A Huynen
- Centre for Molecular and Biomolecular Informatics, Radboud Institute for Molecular Life Sciences, Radboud University Medical Centre, Geert Grooteplein 28, 6525 GA, Nijmegen, The Netherlands.
| | - Jaap Heringa
- Center for Integrative Bioinformatics VU (IBIVU), Vrije University Amsterdam, De Boelelaan 1081A, 1081 HV, Amsterdam, The Netherlands.
| | - K Anton Feenstra
- Center for Integrative Bioinformatics VU (IBIVU), Vrije University Amsterdam, De Boelelaan 1081A, 1081 HV, Amsterdam, The Netherlands.
| |
|
436
|
De Leonardis E, Lutz B, Ratz S, Cocco S, Monasson R, Schug A, Weigt M. Direct-Coupling Analysis of nucleotide coevolution facilitates RNA secondary and tertiary structure prediction. Nucleic Acids Res 2015; 43:10444-55. [PMID: 26420827 PMCID: PMC4666395 DOI: 10.1093/nar/gkv932] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.4] [Received: 06/02/2015] [Accepted: 09/07/2015] [Indexed: 12/16/2022] Open
Abstract
Despite the biological importance of non-coding RNA, their structural characterization remains challenging. Making use of the rapidly growing sequence databases, we analyze nucleotide coevolution across homologous sequences via Direct-Coupling Analysis to detect nucleotide-nucleotide contacts. For a representative set of riboswitches, we show that the results of Direct-Coupling Analysis in combination with a generalized Nussinov algorithm systematically improve the results of RNA secondary structure prediction beyond traditional covariance approaches based on mutual information. Even more importantly, we show that the results of Direct-Coupling Analysis are enriched in tertiary structure contacts. By integrating these predictions into molecular modeling tools, systematically improved tertiary structure predictions can be obtained, as compared to using secondary structure information alone.
Affiliation(s)
- Eleonora De Leonardis
- Computational and Quantitative Biology, Sorbonne Universités, Université Pierre et Marie Curie, UMR 7238, 75006 Paris, France; Computational and Quantitative Biology, CNRS, UMR 7238, 75006 Paris, France; Laboratoire de Physique Statistique de l'Ecole Normale Supérieure, associé au CNRS et à l'Université Pierre et Marie Curie, 75005 Paris, France
| | - Benjamin Lutz
- Steinbuch Centre for Computing, Karlsruher Institut für Technologie, 76133 Karlsruhe, Germany; Fakultät für Physik, Karlsruher Institut für Technologie, 76133 Karlsruhe, Germany
| | - Sebastian Ratz
- Steinbuch Centre for Computing, Karlsruher Institut für Technologie, 76133 Karlsruhe, Germany; Fakultät für Physik, Karlsruher Institut für Technologie, 76133 Karlsruhe, Germany
| | - Simona Cocco
- Laboratoire de Physique Statistique de l'Ecole Normale Supérieure, associé au CNRS et à l'Université Pierre et Marie Curie, 75005 Paris, France
| | - Rémi Monasson
- Laboratoire de Physique Théorique de l'Ecole Normale Supérieure, associé au CNRS et à l'Université Pierre et Marie Curie, 75005 Paris, France
| | - Alexander Schug
- Steinbuch Centre for Computing, Karlsruher Institut für Technologie, 76133 Karlsruhe, Germany
| | - Martin Weigt
- Computational and Quantitative Biology, Sorbonne Universités, Université Pierre et Marie Curie, UMR 7238, 75006 Paris, France; Computational and Quantitative Biology, CNRS, UMR 7238, 75006 Paris, France
| |
|
437
|
Zhang W, Yang J, He B, Walker SE, Zhang H, Govindarajoo B, Virtanen J, Xue Z, Shen HB, Zhang Y. Integration of QUARK and I-TASSER for Ab Initio Protein Structure Prediction in CASP11. Proteins 2015; 84 Suppl 1:76-86. [PMID: 26370505 DOI: 10.1002/prot.24930] [Citation(s) in RCA: 54] [Impact Index Per Article: 5.4] [Received: 05/28/2015] [Revised: 08/26/2015] [Accepted: 09/10/2015] [Indexed: 11/12/2022]
Abstract
We tested two pipelines developed for template-free protein structure prediction in the CASP11 experiment. First, the QUARK pipeline constructs structure models by reassembling fragments of continuously distributed lengths excised from unrelated proteins. Five free-modeling (FM) targets have the model successfully constructed by QUARK with a TM-score above 0.4, including the first model of T0837-D1, which has a TM-score = 0.736 and RMSD = 2.9 Å to the native. Detailed analysis showed that the success is partly attributed to the high-resolution contact map prediction derived from fragment-based distance-profiles, which are mainly located between regular secondary structure elements and loops/turns and help guide the orientation of secondary structure assembly. In the Zhang-Server pipeline, weakly scoring threading templates are re-ordered by the structural similarity to the ab initio folding models, which are then reassembled by I-TASSER based structure assembly simulations; 60% more domains with length up to 204 residues, compared to the QUARK pipeline, were successfully modeled by the I-TASSER pipeline with a TM-score above 0.4. The robustness of the I-TASSER pipeline can stem from the composite fragment-assembly simulations that combine structures from both ab initio folding and threading template refinements. Despite the promising cases, challenges still exist in long-range beta-strand folding, domain parsing, and the uncertainty of secondary structure prediction; the latter of which was found to affect nearly all aspects of FM structure predictions, from fragment identification, target classification, structure assembly, to final model selection. Significant efforts are needed to solve these problems before real progress on FM could be made. Proteins 2016; 84(Suppl 1):76-86. © 2015 Wiley Periodicals, Inc.
Affiliation(s)
- Wenxuan Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, 48109.,Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, 48109
| | - Jianyi Yang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, 48109.,Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, 48109
| | - Baoji He
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, 48109.,Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, 48109
| | - Sara Elizabeth Walker
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, 48109.,Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, 48109
| | - Hongjiu Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, 48109.,Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, 48109
| | - Brandon Govindarajoo
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, 48109.,Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, 48109
| | - Jouko Virtanen
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, 48109.,Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, 48109
| | - Zhidong Xue
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, 48109.,Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, 48109
| | - Hong-Bin Shen
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, 48109.,Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, 48109
| | - Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, 48109. .,Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, 48109.
| |
|
438
|
Jacob E, Unger R, Horovitz A. Codon-level information improves predictions of inter-residue contacts in proteins by correlated mutation analysis. eLife 2015; 4:e08932. [PMID: 26371555 PMCID: PMC4602084 DOI: 10.7554/elife.08932] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/22/2015] [Accepted: 09/13/2015] [Indexed: 12/11/2022] Open
Abstract
Methods for analysing correlated mutations in proteins are becoming an increasingly powerful tool for predicting contacts within and between proteins. Nevertheless, limitations remain due to the requirement for large multiple sequence alignments (MSA) and the fact that, in general, only the relatively small number of top-ranking predictions are reliable. To date, methods for analysing correlated mutations have relied exclusively on amino acid MSAs as inputs. Here, we describe a new approach for analysing correlated mutations that is based on combined analysis of amino acid and codon MSAs. We show that a direct contact is more likely to be present when the correlation between the positions is strong at the amino acid level but weak at the codon level. The performance of different methods for analysing correlated mutations in predicting contacts is shown to be enhanced significantly when amino acid and codon data are combined.
Affiliation(s)
- Etai Jacob
- The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat-Gan, Israel
- Department of Structural Biology, Weizmann Institute of Science, Rehovot, Israel
| | - Ron Unger
- The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat-Gan, Israel
| | - Amnon Horovitz
- Department of Structural Biology, Weizmann Institute of Science, Rehovot, Israel
| |
|
439
|
dos Santos RN, Morcos F, Jana B, Andricopulo AD, Onuchic JN. Dimeric interactions and complex formation using direct coevolutionary couplings. Sci Rep 2015; 5:13652. [PMID: 26338201 PMCID: PMC4559900 DOI: 10.1038/srep13652] [Citation(s) in RCA: 62] [Impact Index Per Article: 6.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/05/2015] [Accepted: 07/13/2015] [Indexed: 11/09/2022] Open
Abstract
We develop a procedure to characterize the association of protein structures into homodimers using coevolutionary couplings extracted from Direct Coupling Analysis (DCA) in combination with Structure Based Models (SBM). Identification of dimerization contacts using DCA is more challenging than intradomain contacts since direct couplings are mixed with monomeric contacts. Therefore a systematic way to extract dimerization signals has been elusive. We provide evidence that the prediction of homodimeric complexes is possible with high accuracy for all the cases we studied which have rich sequence information. For the most accurate conformations of the structurally diverse dimeric complexes studied the mean and interfacial RMSDs are 1.95Å and 1.44Å, respectively. This methodology is also able to identify distinct dimerization conformations as for the case of the family of response regulators, which dimerize upon activation. The identification of dimeric complexes can provide interesting molecular insights in the construction of large oligomeric complexes and be useful in the study of aggregation related diseases like Alzheimer's or Parkinson's.
Affiliation(s)
- Ricardo N. dos Santos
- Center for Theoretical Biological Physics, Rice University, Houston, TX 77005-1827
- Laboratório de Química Medicinal e Computacional, Instituto de Física de São Carlos, Universidade de São Paulo, São Paulo, São Carlos, 13563-120, Brazil
| | - Faruck Morcos
- Center for Theoretical Biological Physics, Rice University, Houston, TX 77005-1827
| | - Biman Jana
- Department of Physical Chemistry, Indian Association for the Cultivation of Science, Jadavpur, Kolkata-700032, India
| | - Adriano D. Andricopulo
- Laboratório de Química Medicinal e Computacional, Instituto de Física de São Carlos, Universidade de São Paulo, São Paulo, São Carlos, 13563-120, Brazil
| | - José N. Onuchic
- Center for Theoretical Biological Physics, Rice University, Houston, TX 77005-1827
| |
|
440
|
Ovchinnikov S, Kinch L, Park H, Liao Y, Pei J, Kim DE, Kamisetty H, Grishin NV, Baker D. Large-scale determination of previously unsolved protein structures using evolutionary information. eLife 2015; 4:e09248. [PMID: 26335199 PMCID: PMC4602095 DOI: 10.7554/elife.09248] [Citation(s) in RCA: 177] [Impact Index Per Article: 17.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/06/2015] [Accepted: 08/30/2015] [Indexed: 12/18/2022] Open
Abstract
The prediction of the structures of proteins without detectable sequence similarity to any protein of known structure remains an outstanding scientific challenge. Here we report significant progress in this area. We first describe de novo blind structure predictions of unprecedented accuracy we made for two proteins in large families in the recent CASP11 blind test of protein structure prediction methods by incorporating residue-residue co-evolution information in the Rosetta structure prediction program. We then describe the use of this method to generate structure models for 58 of the 121 large protein families in prokaryotes for which three-dimensional structures are not available. These models, which are posted online for public access, provide structural information for the over 400,000 proteins belonging to the 58 families and suggest hypotheses about mechanism for the subset for which the function is known, and hypotheses about function for the remainder.
Affiliation(s)
- Sergey Ovchinnikov
- Department of Biochemistry, University of Washington, Seattle, United States
| | - Lisa Kinch
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, United States
| | - Hahnbeom Park
- Department of Biochemistry, University of Washington, Seattle, United States
| | - Yuxing Liao
- Department of Biophysics, Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, United States
| | - Jimin Pei
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, United States
| | - David E Kim
- Department of Biochemistry, University of Washington, Seattle, United States
| | | | - Nick V Grishin
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, United States
- Department of Biophysics, Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, United States
| | - David Baker
- Department of Biochemistry, University of Washington, Seattle, United States
- Howard Hughes Medical Institute, University of Washington, Seattle, United States
| |
|
441
|
Alford RF, Koehler Leman J, Weitzner BD, Duran AM, Tilley DC, Elazar A, Gray JJ. An Integrated Framework Advancing Membrane Protein Modeling and Design. PLoS Comput Biol 2015; 11:e1004398. [PMID: 26325167 PMCID: PMC4556676 DOI: 10.1371/journal.pcbi.1004398] [Citation(s) in RCA: 117] [Impact Index Per Article: 11.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/20/2015] [Accepted: 06/09/2015] [Indexed: 11/19/2022] Open
Abstract
Membrane proteins are critical functional molecules in the human body, constituting more than 30% of open reading frames in the human genome. Unfortunately, a myriad of difficulties in overexpression and reconstitution into membrane mimetics severely limit our ability to determine their structures. Computational tools are therefore instrumental to membrane protein structure prediction, consequently increasing our understanding of membrane protein function and their role in disease. Here, we describe a general framework facilitating membrane protein modeling and design that combines the scientific principles for membrane protein modeling with the flexible software architecture of Rosetta3. This new framework, called RosettaMP, provides a general membrane representation that interfaces with scoring, conformational sampling, and mutation routines that can be easily combined to create new protocols. To demonstrate the capabilities of this implementation, we developed four proof-of-concept applications for (1) prediction of free energy changes upon mutation; (2) high-resolution structural refinement; (3) protein-protein docking; and (4) assembly of symmetric protein complexes, all in the membrane environment. Preliminary data show that these algorithms can produce meaningful scores and structures. The data also suggest needed improvements to both sampling routines and score functions. Importantly, the applications collectively demonstrate the potential of combining the flexible nature of RosettaMP with the power of Rosetta algorithms to facilitate membrane protein modeling and design.
Affiliation(s)
- Rebecca F. Alford
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, Maryland, United States of America
- Department of Chemistry, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
| | - Julia Koehler Leman
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, Maryland, United States of America
| | - Brian D. Weitzner
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, Maryland, United States of America
| | - Amanda M. Duran
- Center for Structural Biology, Department of Chemistry, Vanderbilt University, Nashville, Tennessee, United States of America
| | - Drew C. Tilley
- Department of Physiology and Membrane Biology, University of California, Davis, Davis, California, United States of America
| | - Assaf Elazar
- Department of Biological Chemistry, Weizmann Institute of Science, Rehovot, Israel
| | - Jeffrey J. Gray
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, Maryland, United States of America
| |
|
442
|
Determination of specificity influencing residues for key transcription factor families. QUANTITATIVE BIOLOGY 2015; 3:115-123. [PMID: 26753103 DOI: 10.1007/s40484-015-0045-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/31/2022]
Abstract
Transcription factors (TFs) are major modulators of transcription and subsequent cellular processes. The binding of TFs to specific regulatory elements is governed by their specificity. Considering the gap between known TF sequences and specificities, specificity prediction frameworks are highly desired. Key inputs to such frameworks are the protein residues that modulate the specificity of the TF under consideration. Simple measures like mutual information (MI) fail to delineate specificity influencing residues (SIRs) from an alignment because of structural constraints imposed by the three-dimensional structure of the protein. Structural restraints on the evolution of the amino-acid sequence lead to the identification of false SIRs. In this manuscript we extended three methods (Direct Information, PSICOV and adjusted mutual information) that have been used to disentangle spurious indirect protein residue-residue contacts from direct contacts, to identify SIRs from joint alignments of amino acids and specificity. We predicted SIRs for the homeodomain (HD), helix-loop-helix, LacI and GntR families of TFs using these methods and compared them to MI. Using various measures, we show that the performance of these three methods is comparable to one another but better than that of MI. The implications of these methods for specificity prediction frameworks are discussed. The methods are implemented as an R package and available along with the alignments at stormo.wustl.edu/SpecPred.
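The baseline measure named in this abstract, mutual information between two alignment columns, can be sketched in a few lines. The following is a minimal illustration in Python, not the authors' R package: the function name `mutual_information` and the toy columns are hypothetical, and no gap handling, pseudocounts, or sequence weighting is applied.

```python
import math
from collections import Counter


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two equally long alignment columns.

    Each column is a sequence of symbols, one per aligned sequence, e.g. a
    residue column and a specificity (bound base) column of a joint alignment.
    """
    if len(col_a) != len(col_b):
        raise ValueError("columns must have the same number of sequences")
    n = len(col_a)
    count_a = Counter(col_a)               # marginal counts for column A
    count_b = Counter(col_b)               # marginal counts for column B
    count_ab = Counter(zip(col_a, col_b))  # joint counts for symbol pairs

    mi = 0.0
    for (a, b), c in count_ab.items():
        p_ab = c / n
        p_a = count_a[a] / n
        p_b = count_b[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


# Toy example: the residue perfectly tracks the bound base in the first pair
# of columns and is independent of it in the second.
coupled = mutual_information("AACC", "GGTT")
independent = mutual_information("AACC", "GTGT")
```

A high MI value only says that two columns co-vary; as the abstract notes, it cannot by itself separate direct effects from indirect ones induced by structural constraints, which is the motivation for the three extended methods.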
|
443
|
Avila-Herrera A, Pollard KS. Coevolutionary analyses require phylogenetically deep alignments and better null models to accurately detect inter-protein contacts within and between species. BMC Bioinformatics 2015; 16:268. [PMID: 26303588 PMCID: PMC4549020 DOI: 10.1186/s12859-015-0677-y] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/20/2015] [Accepted: 07/17/2015] [Indexed: 01/09/2023] Open
Abstract
Background When biomolecules physically interact, natural selection operates on them jointly. Contacting positions in protein and RNA structures exhibit correlated patterns of sequence evolution due to constraints imposed by the interaction, and molecular arms races can develop between interacting proteins in pathogens and their hosts. To evaluate how well methods developed to detect coevolving residues within proteins can be adapted for cross-species, inter-protein analysis, we used statistical criteria to quantify the performance of these methods in detecting inter-protein residues within 8 angstroms of each other in the co-crystal structures of 33 bacterial protein interactions. We also evaluated their performance for detecting known residues at the interface of a host-virus protein complex with a partially solved structure. Results Our quantitative benchmarking showed that all coevolutionary methods clearly benefit from alignments with many sequences. Methods that aim to detect direct correlations generally outperform other approaches. However, faster mutual information based methods are occasionally competitive in small alignments and with relaxed false positive rates. Two commonly used null distributions are anti-conservative and have high false positive rates in some scenarios, although the empirical distribution of scores performs reasonably well with deep alignments. Conclusions We conclude that coevolutionary analysis of cross-species protein interactions holds great promise but requires sequencing many more species pairs. Electronic supplementary material The online version of this article (doi:10.1186/s12859-015-0677-y) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Aram Avila-Herrera
- Bioinformatics Graduate Program, University of California, San Francisco, USA. .,Gladstone Institute of Cardiovascular Disease, University of California, San Francisco, USA.
| | - Katherine S Pollard
- Bioinformatics Graduate Program, University of California, San Francisco, USA. .,Gladstone Institute of Cardiovascular Disease, University of California, San Francisco, USA. .,Department of Epidemiology and Biostatistics, University of California, San Francisco, USA. .,Institute for Human Genetics, University of California, San Francisco, 94158, CA, USA.
| |
|
444
|
Abstract
Here we present the results of residue-residue contact predictions achieved in CASP11 by the CONSIP2 server, which is based around our MetaPSICOV contact prediction method. On a set of 40 target domains with a median family size of around 40 effective sequences, our server achieved an average top-L/5 long-range contact precision of 27%. The MetaPSICOV method is based on a combination of classical contact prediction features, enhanced with three distinct covariation methods embedded in a two-stage neural network predictor. Some unique features of our approach are (1) the tuning between the classical and covariation features depending on the depth of the input alignment and (2) a hybrid approach to generate the deepest possible multiple-sequence alignments by combining jackHMMer and HHblits. We discuss the CONSIP2 pipeline and our results, and show that where the method underperformed, the major factors were relying on a fixed set of parameters for the initial sequence alignments and not attempting to perform domain splitting as a preprocessing step. Proteins 2016; 84(Suppl 1):145-151. © 2015 The Authors. Proteins: Structure, Function, and Bioinformatics Published by Wiley Periodicals, Inc.
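The top-L/5 long-range precision quoted above can be illustrated with a short sketch. This is not the CASP evaluation code: the function name, the dictionary-based input format, and the default minimum sequence separation of 24 residues for "long-range" are assumptions made for the example.

```python
def top_l_over_k_precision(scores, true_contacts, seq_len, k=5, min_sep=24):
    """Precision of the top L/k highest-scoring long-range predictions.

    scores:        dict mapping residue pairs (i, j), i < j, to a predicted score
    true_contacts: set of residue pairs (i, j), i < j, in contact in the structure
    seq_len:       L, the length of the target sequence or domain
    min_sep:       minimum |i - j| for a pair to count as long-range (assumed 24)
    """
    # Keep only long-range pairs, then rank them by decreasing score.
    long_range = [(pair, s) for pair, s in scores.items()
                  if pair[1] - pair[0] >= min_sep]
    long_range.sort(key=lambda item: item[1], reverse=True)

    n_top = max(1, seq_len // k)
    top_pairs = [pair for pair, _ in long_range[:n_top]]
    if not top_pairs:
        return 0.0
    hits = sum(1 for pair in top_pairs if pair in true_contacts)
    return hits / len(top_pairs)


# Toy example with a short sequence and a relaxed separation threshold.
toy_scores = {(0, 5): 0.9, (1, 7): 0.8, (2, 9): 0.7, (0, 1): 1.0}
toy_truth = {(0, 5), (2, 9)}
toy_precision = top_l_over_k_precision(toy_scores, toy_truth, seq_len=10, k=5, min_sep=3)
```

In the toy call, L/5 = 2, the pair (0, 1) is excluded as short-range, and one of the two remaining top-ranked pairs is a true contact.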
Affiliation(s)
- Tomasz Kosciolek
- Department of Computer Science, Bioinformatics Group, University College London, Gower Street, London, WC1E 6BT, United Kingdom
| | - David T Jones
- Department of Computer Science, Bioinformatics Group, University College London, Gower Street, London, WC1E 6BT, United Kingdom.
| |
|
445
|
Ma J, Wang S, Wang Z, Xu J. Protein contact prediction by integrating joint evolutionary coupling analysis and supervised learning. Bioinformatics 2015; 31:3506-13. [PMID: 26275894 DOI: 10.1093/bioinformatics/btv472] [Citation(s) in RCA: 80] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/18/2014] [Accepted: 08/08/2015] [Indexed: 02/07/2023] Open
Abstract
MOTIVATION Protein contact prediction is important for protein structure and functional study. Both evolutionary coupling (EC) analysis and supervised machine learning methods have been developed, making use of different information sources. However, contact prediction is still challenging especially for proteins without a large number of sequence homologs. RESULTS This article presents a group graphical lasso (GGL) method for contact prediction that integrates joint multi-family EC analysis and supervised learning to improve accuracy on proteins without many sequence homologs. Different from existing single-family EC analysis that uses residue coevolution information in only the target protein family, our joint EC analysis uses residue coevolution in both the target family and its related families, which may have divergent sequences but similar folds. To implement this, we model a set of related protein families using Gaussian graphical models and then coestimate their parameters by maximum-likelihood, subject to the constraint that these parameters shall be similar to some degree. Our GGL method can also integrate supervised learning methods to further improve accuracy. Experiments show that our method outperforms existing methods on proteins without thousands of sequence homologs, and that our method performs better on both conserved and family-specific contacts. AVAILABILITY AND IMPLEMENTATION See http://raptorx.uchicago.edu/ContactMap/ for a web server implementing the method. CONTACT j3xu@ttic.edu SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jianzhu Ma
- Toyota Technological Institute at Chicago, 6045 S. Kenwood Ave. Chicago, Illinois 60637 USA
| | - Sheng Wang
- Toyota Technological Institute at Chicago, 6045 S. Kenwood Ave. Chicago, Illinois 60637 USA
| | - Zhiyong Wang
- Toyota Technological Institute at Chicago, 6045 S. Kenwood Ave. Chicago, Illinois 60637 USA
| | - Jinbo Xu
- Toyota Technological Institute at Chicago, 6045 S. Kenwood Ave. Chicago, Illinois 60637 USA
| |
|
446
|
Yang J, He BJ, Jang R, Zhang Y, Shen HB. Accurate disulfide-bonding network predictions improve ab initio structure prediction of cysteine-rich proteins. Bioinformatics 2015; 31:3773-81. [PMID: 26254435 DOI: 10.1093/bioinformatics/btv459] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/11/2015] [Accepted: 08/02/2015] [Indexed: 01/19/2023] Open
Abstract
MOTIVATION Cysteine-rich proteins cover many important families in nature but there are currently no methods specifically designed for modeling the structure of these proteins. The accuracy of disulfide connectivity pattern prediction, particularly for the proteins of higher-order connections, e.g., >3 bonds, is too low to effectively assist structure assembly simulations. RESULTS We propose a new hierarchical order reduction protocol called Cyscon for disulfide-bonding prediction. The most confident disulfide bonds are first identified and bonding prediction is then focused on the remaining cysteine residues based on SVR training. Compared with purely machine learning-based approaches, Cyscon improved the average accuracy of connectivity pattern prediction by 21.9%. For proteins with more than 5 disulfide bonds, Cyscon improved the accuracy by 585% on the benchmark set of PDBCYS. When applied to 158 non-redundant cysteine-rich proteins, Cyscon predictions helped increase (or decrease) the TM-score (or RMSD) of the ab initio QUARK modeling by 12.1% (or 14.4%). This result demonstrates a new avenue to improve the ab initio structure modeling for cysteine-rich proteins. AVAILABILITY AND IMPLEMENTATION http://www.csbio.sjtu.edu.cn/bioinf/Cyscon/ CONTACT zhng@umich.edu or hbshen@sjtu.edu.cn. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jing Yang
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
| | - Bao-Ji He
- State Key Laboratory of Theoretical Physics, Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing 100190, China, Department of Computational Medicine and Bioinformatics and
| | - Richard Jang
- Department of Computational Medicine and Bioinformatics and
| | - Yang Zhang
- Department of Computational Medicine and Bioinformatics and Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA
| | - Hong-Bin Shen
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China, Department of Computational Medicine and Bioinformatics and
| |
|
447
|
Stein RR, Marks DS, Sander C. Inferring Pairwise Interactions from Biological Data Using Maximum-Entropy Probability Models. PLoS Comput Biol 2015. [PMID: 26225866 PMCID: PMC4520494 DOI: 10.1371/journal.pcbi.1004182] [Citation(s) in RCA: 69] [Impact Index Per Article: 6.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/02/2022] Open
Abstract
Maximum entropy-based inference methods have been successfully used to infer direct interactions from biological datasets such as gene expression data or sequence ensembles. Here, we review undirected pairwise maximum-entropy probability models in two categories of data types, those with continuous and categorical random variables. As a concrete example, we present recently developed inference methods from the field of protein contact prediction and show that a basic set of assumptions leads to similar solution strategies for inferring the model parameters in both variable types. These parameters reflect interactive couplings between observables, which can be used to predict global properties of the biological system. Such methods are applicable to the important problems of protein 3-D structure prediction and association of gene–gene networks, and they enable potential applications to the analysis of gene alteration patterns and to protein design.
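For orientation, the pairwise maximum-entropy model for categorical variables that this kind of review covers is commonly written in the following form. The notation here (fields $h_i$, couplings $e_{ij}$, partition function $Z$) is the usual convention in the contact-prediction literature and is an assumption of this note, not a quotation from the article.

```latex
% Pairwise maximum-entropy (Potts-type) model over a sequence
% \sigma = (\sigma_1, \dots, \sigma_L) of categorical variables:
P(\sigma) = \frac{1}{Z}
  \exp\!\Bigl( \sum_{i=1}^{L} h_i(\sigma_i)
             + \sum_{1 \le i < j \le L} e_{ij}(\sigma_i, \sigma_j) \Bigr),
\qquad
Z = \sum_{\sigma'} \exp\!\Bigl( \sum_{i=1}^{L} h_i(\sigma'_i)
             + \sum_{1 \le i < j \le L} e_{ij}(\sigma'_i, \sigma'_j) \Bigr).
```

The parameters are chosen so that the model reproduces the observed single-site and pairwise frequencies of the data; the inferred couplings $e_{ij}$ are the quantities used to rank candidate direct interactions.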
Affiliation(s)
- Richard R. Stein
- Computational Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, New York, United States of America
| | - Debora S. Marks
- Department of Systems Biology, Harvard Medical School, Boston, Massachusetts, United States of America
| | - Chris Sander
- Computational Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, New York, United States of America
| |
|
448
|
Abstract
In mammals and fungi, Nod-like receptors (NLR) activate downstream cell death execution proteins by a prion-like mechanism. In Podospora anserina, the NWD2 NLR activates the HET-S Helo-domain pore-forming protein by converting its prion-forming domain into a characteristic β-solenoid amyloid fold. The amyloid forming region of HET-S/s comprises two repetitions of a 21 amino acid motif. Herein, we systematically analyze the sequences of C-terminal regions of fungal HeLo and HeLo-like domain proteins to identify HET-s-related amyloid motifs (HRAM). We now identify four novel HRAM subfamilies in addition to the canonical HET-S/s subfamily. These novel motifs share the pseudo-repeat structure of HET-S/s and a specific pattern of distribution of hydrophobic and polar residues. Sequence co-variance analyses predict parallel in-register β-stacking of the two repeats and residue-residue interactions compatible with the β-solenoid fold. As described for HET-S, most genes encoding the HeLo proteins are adjacent to genes encoding NLRs also displaying HRAMs. The motifs of the NLRs are similar to those of their cognate HeLo-domain protein, indicating concerted evolution between repeats. This study shows that HET-s-related amyloid motifs are more common than anticipated and that they have diversified into discrete subfamilies that apparently share a common overall fold.
|
449
|
Ahmad A, Cai Y, Chen X, Shuai J, Han A. Conformational Dynamics of Response Regulator RegX3 from Mycobacterium tuberculosis. PLoS One 2015. [PMID: 26201027 PMCID: PMC4511772 DOI: 10.1371/journal.pone.0133389] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/12/2023] Open
Abstract
Two-component signal transduction systems (TCS) are vital for adaptive responses to various environmental stresses in bacteria, fungi and even plants. A TCS typically comprises a sensor histidine kinase (SK) and its cognate response regulator (RR), which often has two domains: an N-terminal receiver domain (RD) and a C-terminal effector domain (ED). The histidine kinase phosphorylates the RD to activate the ED by promoting dimerization. However, despite significant progress in structural studies, how the RR transmits the activation signal from the RD to the ED remains elusive. Here we analyzed the active-to-inactive transition process of the OmpR/PhoB family by computational approaches, using an active conformation of RegX3 from Mycobacterium tuberculosis as a model system. An inactive state of RegX3 generated from a 150 ns molecular dynamics simulation has rotameric conformations of Thr79 and Tyr98 that are generally conserved in inactive RRs. Arg81 in loop β4α4 acts synergistically with loop β1α1 to change its interaction partners during the active-to-inactive transition, potentially leading to the N-terminal movement of RegX3 helix α1. The global conformational dynamics of RegX3 depend mainly on the α4β5 region, in particular seven ‘hot-spot’ residues (Tyr98 to Ser104), adjacent to which several coevolved residues at the dimeric interface, including the Ile76-Asp96, Asp97-Arg111 and Glu24-Arg113 pairs, are critical for signal transduction. Taken together, our computational analyses suggest a molecular linkage between Asp phosphorylation, proximal loops and the α4β5α5 dimeric interface during the RR active-to-inactive state transition, which is often not evident from static crystal structures.
Affiliation(s)
- Ashfaq Ahmad
  - State Key Laboratory for Cellular Stress Biology, School of Life Sciences, Xiamen University, Xiangan, Xiamen, China
- Yongfei Cai
  - State Key Laboratory for Cellular Stress Biology, School of Life Sciences, Xiamen University, Xiangan, Xiamen, China
- Xingqiang Chen
  - Department of Physics, Xiamen University, Siming, Xiamen, China
- Jianwei Shuai
  - Department of Physics, Xiamen University, Siming, Xiamen, China
- Aidong Han
  - State Key Laboratory for Cellular Stress Biology, School of Life Sciences, Xiamen University, Xiangan, Xiamen, China
|
450
|
Antala S, Ovchinnikov S, Kamisetty H, Baker D, Dempski RE. Computation and Functional Studies Provide a Model for the Structure of the Zinc Transporter hZIP4. J Biol Chem 2015; 290:17796-17805. [PMID: 25971965 PMCID: PMC4505028 DOI: 10.1074/jbc.m114.617613] [Citation(s) in RCA: 57] [Impact Index Per Article: 5.7] [Received: 10/08/2014] [Revised: 04/24/2015] [Indexed: 11/06/2022] Open
Abstract
Members of the Zrt and Irt protein (ZIP) family are central participants in transition metal homeostasis, as they function to increase the cytosolic concentration of zinc and/or iron. However, the lack of a crystal structure hinders elucidation of the molecular mechanism of ZIP proteins. Here, we employed GREMLIN, a co-evolution-based contact prediction approach, in conjunction with the Rosetta structure prediction program to construct a structural model of the human (h) ZIP4 transporter. The predicted contact data are best fit by modeling hZIP4 as a dimer. Mutagenesis of residues that comprise a central putative hZIP4 transmembrane transition metal coordination site in the structural model alters the kinetics and specificity of hZIP4. Comparison of the hZIP4 dimer model to all known membrane protein structures identifies the 12-transmembrane monomeric Piriformospora indica phosphate transporter (PiPT), a member of the major facilitator superfamily (MFS), as a likely structural homolog.
Affiliation(s)
- Sagar Antala
  - Department of Chemistry and Biochemistry, Worcester Polytechnic Institute, Worcester, Massachusetts 01609
- Sergey Ovchinnikov
  - Howard Hughes Medical Institute, University of Washington, Seattle, Washington 98195; Department of Biochemistry, University of Washington, Seattle, Washington 98195; Molecular and Cellular Biology Program, University of Washington, Seattle, Washington 98195
- Hetunandan Kamisetty
  - Howard Hughes Medical Institute, University of Washington, Seattle, Washington 98195; Department of Biochemistry, University of Washington, Seattle, Washington 98195
- David Baker
  - Howard Hughes Medical Institute, University of Washington, Seattle, Washington 98195; Department of Biochemistry, University of Washington, Seattle, Washington 98195
- Robert E Dempski
  - Department of Chemistry and Biochemistry, Worcester Polytechnic Institute, Worcester, Massachusetts 01609
|