401
Jeong CS, Kim D. Structure-based Markov random field model for representing evolutionary constraints on functional sites. BMC Bioinformatics 2016; 17:99. [PMID: 26911566 PMCID: PMC4765150 DOI: 10.1186/s12859-016-0948-2] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 10/31/2015] [Accepted: 02/15/2016] [Indexed: 11/10/2022] Open
Abstract
Background: Elucidating the cooperative mechanism of interconnected residues is an important component toward understanding the biological function of a protein. Coevolution analysis has been developed to model the coevolutionary information reflecting structural and functional constraints. Recently, several methods have been developed based on a probabilistic graphical model called the Markov random field (MRF), which have led to significant improvements for coevolution analysis; however, thus far, the performance of these models has mainly been assessed by focusing on the aspect of protein structure. Results: In this study, we built an MRF model whose graphical topology is determined by the residue proximity in the protein structure, and derived a novel positional coevolution estimate utilizing the node weight of the MRF model. This structure-based MRF method was evaluated for three data sets, each of which annotates catalytic site, allosteric site, and comprehensively determined functional site information. We demonstrate that the structure-based MRF architecture can encode the evolutionary information associated with biological function. Furthermore, we show that the node weight can more accurately represent positional coevolution information compared to the edge weight. Lastly, we demonstrate that the structure-based MRF model can be reliably built with only a few aligned sequences in linear time. Conclusions: The results show that adoption of a structure-based architecture could be an acceptable approximation for coevolution modeling with efficient computational complexity.
Affiliation(s)
- Chan-Seok Jeong: Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), 291 Daehak-ro, Yuseong-gu, Daejeon, 34141, Republic of Korea
- Dongsup Kim: Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), 291 Daehak-ro, Yuseong-gu, Daejeon, 34141, Republic of Korea
402
Zhang H, Gao Y, Deng M, Wang C, Zhu J, Li SC, Zheng WM, Bu D. Improving residue-residue contact prediction via low-rank and sparse decomposition of residue correlation matrix. Biochem Biophys Res Commun 2016; 472:217-22. [PMID: 26920058 DOI: 10.1016/j.bbrc.2016.01.188] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 01/14/2016] [Accepted: 01/30/2016] [Indexed: 10/22/2022]
Abstract
Strategies for correlation analysis in protein contact prediction often encounter two challenges, namely, the indirect coupling among residues, and the background correlations mainly caused by phylogenetic biases. While various studies have been conducted on how to disentangle indirect coupling, the removal of background correlations still remains unresolved. Here, we present an approach for removing background correlations via low-rank and sparse decomposition (LRS) of a residue correlation matrix. The correlation matrix can be constructed using either local inference strategies (e.g., mutual information, or MI) or global inference strategies (e.g., direct coupling analysis, or DCA). In our approach, a correlation matrix was decomposed into two components, i.e., a low-rank component representing background correlations, and a sparse component representing true correlations. Finally the residue contacts were inferred from the sparse component of correlation matrix. We trained our LRS-based method on the PSICOV dataset, and tested it on both GREMLIN and CASP11 datasets. Our experimental results suggested that LRS significantly improves the contact prediction precision. For example, when equipped with the LRS technique, the prediction precision of MI and mfDCA increased from 0.25 to 0.67 and from 0.58 to 0.70, respectively (Top L/10 predicted contacts, sequence separation: 5 AA, dataset: GREMLIN). In addition, our LRS technique also consistently outperforms the popular denoising technique APC (average product correction), on both local (MI_LRS: 0.67 vs MI_APC: 0.34) and global measures (mfDCA_LRS: 0.70 vs mfDCA_APC: 0.67). Interestingly, we found out that when equipped with our LRS technique, local inference strategies performed in a comparable manner to that of global inference strategies, implying that the application of LRS technique narrowed down the performance gap between local and global inference strategies. 
Overall, our LRS technique greatly facilitates protein contact prediction by removing background correlations. An implementation of the approach called COLORS (improving COntact prediction using LOw-Rank and Sparse matrix decomposition) is available from http://protein.ict.ac.cn/COLORS/.
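For orientation, the APC baseline that the abstract compares against can be sketched in a few lines. This is a minimal illustration of average product correction only, not of the LRS decomposition or the COLORS implementation; the function name and the choice to average over off-diagonal entries are assumptions made for this example.

```python
from typing import List

Matrix = List[List[float]]


def average_product_correction(scores: Matrix) -> Matrix:
    """Apply APC to a symmetric residue-residue score matrix (e.g. MI).

    corrected[i][j] = scores[i][j] - mean_i * mean_j / mean_all,
    with all means taken over off-diagonal entries (an assumption here).
    """
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two positions")
    # Mean score of each position against every other position.
    row_mean = [
        sum(scores[i][j] for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]
    # Mean over all off-diagonal pairs.
    overall = sum(row_mean) / n
    if overall == 0.0:
        raise ValueError("mean score is zero; APC is undefined")
    corrected = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                # Subtract the background expected from the two position means.
                corrected[i][j] = scores[i][j] - row_mean[i] * row_mean[j] / overall
    return corrected
```

A matrix whose off-diagonal entries are all equal carries only background signal under this model, so every corrected entry comes out as zero.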
Affiliation(s)
- Haicang Zhang: Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
- Yujuan Gao: Center for Quantitative Biology, Peking University, Beijing, China
- Minghua Deng: Center for Quantitative Biology, Peking University, Beijing, China; School of Mathematical Sciences, Peking University, Beijing, China; Center for Statistical Sciences, Peking University, Beijing, China
- Chao Wang: Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
- Jianwei Zhu: Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
- Shuai Cheng Li: Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong
- Wei-Mou Zheng: Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing, China
- Dongbo Bu: Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
403
Hönigschmid P, Frishman D. Accurate prediction of helix interactions and residue contacts in membrane proteins. J Struct Biol 2016; 194:112-23. [PMID: 26851352 DOI: 10.1016/j.jsb.2016.02.005] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Received: 08/12/2015] [Revised: 02/01/2016] [Accepted: 02/02/2016] [Indexed: 11/16/2022]
Abstract
Accurate prediction of intra-molecular interactions from amino acid sequence is an important pre-requisite for obtaining high-quality protein models. Over the recent years, remarkable progress in this area has been achieved through the application of novel co-variation algorithms, which eliminate transitive evolutionary connections between residues. In this work we present a new contact prediction method for α-helical transmembrane proteins, MemConP, in which evolutionary couplings are combined with a machine learning approach. MemConP achieves a substantially improved accuracy (precision: 56.0%, recall: 17.5%, MCC: 0.288) compared to the use of either machine learning or co-evolution methods alone. The method also achieves 91.4% precision, 42.1% recall and a MCC of 0.490 in predicting helix-helix interactions based on predicted contacts. The approach was trained and rigorously benchmarked by cross-validation and independent testing on up-to-date non-redundant datasets of 90 and 30 experimental three dimensional structures, respectively. MemConP is a standalone tool that can be downloaded together with the associated training data from http://webclu.bio.wzw.tum.de/MemConP.
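The three figures quoted above (precision, recall, MCC) all derive from the same confusion matrix of predicted versus observed contacts. The sketch below shows the standard definitions; the function name and the counts in the usage note are hypothetical and are not taken from the MemConP benchmark.

```python
import math
from typing import Tuple


def contact_metrics(tp: int, fp: int, fn: int, tn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, MCC) from confusion-matrix counts.

    tp: predicted contacts that are true contacts
    fp: predicted contacts that are not contacts
    fn: true contacts that were not predicted
    tn: non-contacts correctly left unpredicted
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Matthews correlation coefficient; defined as 0 when a margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return precision, recall, mcc
```

With made-up counts `contact_metrics(50, 50, 50, 850)`, half of the predictions are correct and half of the true contacts are found, and the MCC is 40000 / 90000, roughly 0.44. Because non-contacts vastly outnumber contacts in real proteins, MCC is usually the more informative single number.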
Affiliation(s)
- Peter Hönigschmid: Department of Bioinformatics, Technische Universität München, Wissenschaftszentrum Weihenstephan, Maximus-von-Imhof Forum 3, 85354 Freising, Germany
- Dmitrij Frishman: Department of Bioinformatics, Technische Universität München, Wissenschaftszentrum Weihenstephan, Maximus-von-Imhof Forum 3, 85354 Freising, Germany; Helmholtz Zentrum Munich, German Research Center for Environmental Health (GmbH), Institute of Bioinformatics and Systems Biology, 85764 Neuherberg, Germany; Laboratory of Bioinformatics, RASA Research Center, St Petersburg State Polytechnical University, St Petersburg 195251, Russia
404
Noel JK, Morcos F, Onuchic JN. Sequence co-evolutionary information is a natural partner to minimally-frustrated models of biomolecular dynamics. F1000Res 2016; 5. [PMID: 26918164 PMCID: PMC4755392 DOI: 10.12688/f1000research.7186.1] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Accepted: 01/21/2016] [Indexed: 11/25/2022] Open
Abstract
Experimentally derived structural constraints have been crucial to the implementation of computational models of biomolecular dynamics. For example, not only does crystallography provide essential starting points for molecular simulations, but high-resolution structures also permit the parameterization of simplified models. Since the energy landscapes for proteins and other biomolecules have been shown to be minimally frustrated and therefore funneled, these structure-based models have played a major role in understanding the mechanisms governing folding and many functions of these systems. Structural information, however, may be limited in many interesting cases. Recently, the statistical analysis of residue co-evolution in families of protein sequences has provided a complementary method of discovering residue-residue contact interactions involved in functional configurations. These functional configurations are often transient and difficult to capture experimentally. Thus, co-evolutionary information can be merged with that available for experimentally characterized low free-energy structures, in order to more fully capture the true underlying biomolecular energy landscape.
Affiliation(s)
- Jeffrey K Noel: Center for Theoretical Biological Physics, Rice University, Houston, TX, USA; Kristallographie, Max-Delbrück-Centrum für Molekulare Medizin, Berlin, Germany
- Faruck Morcos: Department of Biological Sciences, University of Texas at Dallas, Richardson, TX, USA
- Jose N Onuchic: Center for Theoretical Biological Physics, Rice University, Houston, TX, USA
405
Zhang H, Huang Q, Bei Z, Wei Y, Floudas CA. COMSAT: Residue contact prediction of transmembrane proteins based on support vector machines and mixed integer linear programming. Proteins 2016; 84:332-48. [PMID: 26756402 DOI: 10.1002/prot.24979] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 06/25/2015] [Revised: 11/19/2015] [Accepted: 12/10/2015] [Indexed: 12/28/2022]
Abstract
In this article, we present COMSAT, a hybrid framework for residue contact prediction of transmembrane (TM) proteins, integrating a support vector machine (SVM) method and a mixed integer linear programming (MILP) method. COMSAT consists of two modules: COMSAT_SVM which is trained mainly on position-specific scoring matrix features, and COMSAT_MILP which is an ab initio method based on optimization models. Contacts predicted by the SVM model are ranked by SVM confidence scores, and a threshold is trained to improve the reliability of the predicted contacts. For TM proteins with no contacts above the threshold, COMSAT_MILP is used. The proposed hybrid contact prediction scheme was tested on two independent TM protein sets based on the contact definition of 14 Å between Cα-Cα atoms. First, using a rigorous leave-one-protein-out cross validation on the training set of 90 TM proteins, an accuracy of 66.8%, a coverage of 12.3%, a specificity of 99.3% and a Matthews' correlation coefficient (MCC) of 0.184 were obtained for residue pairs that are at least six amino acids apart. Second, when tested on a test set of 87 TM proteins, the proposed method showed a prediction accuracy of 64.5%, a coverage of 5.3%, a specificity of 99.4% and a MCC of 0.106. COMSAT shows satisfactory results when compared with 12 other state-of-the-art predictors, and is more robust in terms of prediction accuracy as the length and complexity of TM protein increase. COMSAT is freely accessible at http://hpcc.siat.ac.cn/COMSAT/.
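The contact definition used in this evaluation (Cα-Cα distance of 14 Å, residue pairs at least six positions apart) is simple to state in code. The sketch below is an illustration of that definition only, not part of COMSAT; the function name is hypothetical, and treating the cutoff as inclusive is an assumption, since the abstract does not say.

```python
import math
from typing import List, Tuple

Coord = Tuple[float, float, float]


def contact_pairs(
    ca_coords: List[Coord],
    cutoff: float = 14.0,      # Å, Cα-Cα distance threshold from the abstract
    min_separation: int = 6,   # minimum sequence separation from the abstract
) -> List[Tuple[int, int]]:
    """List residue index pairs (i, j), i < j, that count as contacts."""
    pairs = []
    n = len(ca_coords)
    for i in range(n):
        # Skip pairs closer in sequence than the minimum separation.
        for j in range(i + min_separation, n):
            # Assumption: a distance exactly at the cutoff counts as a contact.
            if math.dist(ca_coords[i], ca_coords[j]) <= cutoff:
                pairs.append((i, j))
    return pairs
```

For ten residues spaced 2.2 Å apart along a straight line, only pairs exactly six positions apart (13.2 Å) qualify, giving `(0, 6)`, `(1, 7)`, `(2, 8)` and `(3, 9)`.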
Affiliation(s)
- Huiling Zhang: Centre for High Performance Computing, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- Qingsheng Huang: Centre for High Performance Computing, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- Zhendong Bei: Center for Cloud Computing, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- Yanjie Wei: Centre for High Performance Computing, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- Christodoulos A Floudas: Department of Chemical Engineering, Texas A&M University, College Station, Texas, 77843; Texas A&M Energy Institute, Texas A&M University, College Station, Texas, 77843
406
Echave J, Spielman SJ, Wilke CO. Causes of evolutionary rate variation among protein sites. Nat Rev Genet 2016; 17:109-21. [PMID: 26781812 DOI: 10.1038/nrg.2015.18] [Citation(s) in RCA: 189] [Impact Index Per Article: 21.0] [Indexed: 12/13/2022]
Abstract
It has long been recognized that certain sites within a protein, such as sites in the protein core or catalytic residues in enzymes, are evolutionarily more conserved than other sites. However, our understanding of rate variation among sites remains surprisingly limited. Recent progress to address this includes the development of a wide array of reliable methods to estimate site-specific substitution rates from sequence alignments. In addition, several molecular traits have been identified that correlate with site-specific mutation rates, and novel mechanistic biophysical models have been proposed to explain the observed correlations. Nonetheless, current models explain, at best, approximately 60% of the observed variance, highlighting the limitations of current methods and models and the need for new research directions.
Affiliation(s)
- Julian Echave: Escuela de Ciencia y Tecnología, Universidad Nacional de San Martín, 1650 San Martín, Buenos Aires, Argentina
- Stephanie J Spielman: Department of Integrative Biology, Center for Computational Biology and Bioinformatics, and Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, Texas 78712, USA
- Claus O Wilke: Department of Integrative Biology, Center for Computational Biology and Bioinformatics, and Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, Texas 78712, USA
407
Esmaielbeiki R, Krawczyk K, Knapp B, Nebel JC, Deane CM. Progress and challenges in predicting protein interfaces. Brief Bioinform 2016; 17:117-31. [PMID: 25971595 PMCID: PMC4719070 DOI: 10.1093/bib/bbv027] [Citation(s) in RCA: 85] [Impact Index Per Article: 9.4] [Received: 01/29/2015] [Revised: 03/18/2015] [Indexed: 12/31/2022] Open
Abstract
The majority of biological processes are mediated via protein-protein interactions. Determination of residues participating in such interactions improves our understanding of molecular mechanisms and facilitates the development of therapeutics. Experimental approaches to identifying interacting residues, such as mutagenesis, are costly and time-consuming and thus, computational methods for this purpose could streamline conventional pipelines. Here we review the field of computational protein interface prediction. We make a distinction between methods which address proteins in general and those targeted at antibodies, owing to the radically different binding mechanism of antibodies. We organize the multitude of currently available methods hierarchically based on required input and prediction principles to provide an overview of the field.
408
Abstract
In the field of computational structural proteomics, contact predictions have shown new prospects of solving the longstanding problem of ab initio protein structure prediction. In the last few years, application of deep learning algorithms and availability of large protein sequence databases, combined with improvement in methods that derive contacts from multiple sequence alignments, have shown a huge increase in the precision of contact prediction. In addition, these predicted contacts have also been used to build three-dimensional models from scratch. In this chapter, we briefly discuss many elements of protein residue-residue contacts and the methods available for prediction, focusing on a state-of-the-art contact prediction tool, DNcon. Illustrating with a case study, we describe how DNcon can be used to make ab initio contact predictions for a given protein sequence and discuss how the predicted contacts may be analyzed and evaluated.
Affiliation(s)
- Badri Adhikari: Department of Computer Science, University of Missouri, 201 Engineering Building West, Columbia, MO, 65211, USA
- Jianlin Cheng: Department of Computer Science, University of Missouri, 201 Engineering Building West, Columbia, MO, 65211, USA
409
Sahoo A, Khare S, Devanarayanan S, Jain PC, Varadarajan R. Residue proximity information and protein model discrimination using saturation-suppressor mutagenesis. eLife 2015; 4. [PMID: 26716404 PMCID: PMC4758949 DOI: 10.7554/elife.09532] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.3] [Received: 06/18/2015] [Accepted: 12/29/2015] [Indexed: 12/16/2022] Open
Abstract
Identification of residue-residue contacts from primary sequence can be used to guide protein structure prediction. Using Escherichia coli CcdB as the test case, we describe an experimental method termed saturation-suppressor mutagenesis to acquire residue contact information. In this methodology, for each of five inactive CcdB mutants, exhaustive screens for suppressors were performed. Proximal suppressors were accurately discriminated from distal suppressors based on their phenotypes when present as single mutants. Experimentally identified putative proximal pairs formed spatial constraints to recover >98% of native-like models of CcdB from a decoy dataset. Suppressor methodology was also applied to the integral membrane protein, diacylglycerol kinase A where the structures determined by X-ray crystallography and NMR were significantly different. Suppressor as well as sequence co-variation data clearly point to the X-ray structure being the functional one adopted in vivo. The methodology is applicable to any macromolecular system for which a convenient phenotypic assay exists. DOI: http://dx.doi.org/10.7554/eLife.09532.001
Common techniques to determine the three-dimensional structures of proteins can help researchers to understand these molecules’ activities, but are often time-consuming and do not work for all proteins. Proteins are made of chains of amino acids. When a protein chain folds, some of these amino acids interact with other amino acids and these contacts dictate the overall shape of the protein. This means that identifying the pairs of contacting amino acids could make it possible to predict the protein’s structure. Interactions between pairs of contacting amino acids tend to remain conserved throughout evolution, and if a mutation alters one of the amino acids in a pair then a 'compensatory' change often occurs to alter the second amino acid as well.
Compensatory mutations can suggest that two amino acids are close to each other in the three-dimensional shape of a protein, but the computational methods used to identify such amino acid pairs can sometimes be inaccurate. In 2012, researchers generated mutants of a bacterial protein called CcdB with changes to single amino acids that caused the protein to fail to fold correctly. Now, Sahoo et al. – who include two of the researchers involved in the 2012 work – have developed an experimental method to identify contacting amino acids and use the CcdB protein as a test case. The approach involved searching for additional mutations that could restore the activity of five of the original mutant proteins when the proteins were produced in yeast cells. The rationale was that any secondary mutations that restored the activity must have corrected the folding defect caused by the original mutation. Sahoo et al. then predicted how close the amino acids affected by the secondary mutations were to the amino acids altered by the original mutations. This information was used to select reliable three-dimensional models of CcdB from a large set of possible structures that had been generated previously using computer models. Next, the technique was applied to a protein called diacylglycerol kinase A. The structure of this protein had previously been inferred using techniques such as X-ray crystallography and nuclear magnetic resonance, but there was a mismatch between the two methods. Sahoo et al. found that the amino acid contacts derived from their experimental method matched those found in the crystal structure, suggesting that the functional protein structure in living cells is similar to the crystal structure. In the future, the experimental approach developed in this work could be combined with existing methods to reliably guide protein structure prediction. DOI:http://dx.doi.org/10.7554/eLife.09532.002
Affiliation(s)
- Anusmita Sahoo: Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India
- Shruti Khare: Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India
- Pankaj C Jain: Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India
- Raghavan Varadarajan: Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India; Jawaharlal Nehru Center for Advanced Scientific Research, Bangalore, India
410
Braun T, Koehler Leman J, Lange OF. Combining Evolutionary Information and an Iterative Sampling Strategy for Accurate Protein Structure Prediction. PLoS Comput Biol 2015; 11:e1004661. [PMID: 26713437 PMCID: PMC4694711 DOI: 10.1371/journal.pcbi.1004661] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Received: 03/15/2015] [Accepted: 11/17/2015] [Indexed: 12/18/2022] Open
Abstract
Recent work has shown that the accuracy of ab initio structure prediction can be significantly improved by integrating evolutionary information in form of intra-protein residue-residue contacts. Following this seminal result, much effort is put into the improvement of contact predictions. However, there is also a substantial need to develop structure prediction protocols tailored to the type of restraints gained by contact predictions. Here, we present a structure prediction protocol that combines evolutionary information with the resolution-adapted structural recombination approach of Rosetta, called RASREC. Compared to the classic Rosetta ab initio protocol, RASREC achieves improved sampling, better convergence and higher robustness against incorrect distance restraints, making it the ideal sampling strategy for the stated problem. To demonstrate the accuracy of our protocol, we tested the approach on a diverse set of 28 globular proteins. Our method is able to converge for 26 out of the 28 targets and improves the average TM-score of the entire benchmark set from 0.55 to 0.72 when compared to the top ranked models obtained by the EVFold web server using identical contact predictions. Using a smaller benchmark, we furthermore show that the prediction accuracy of our method is only slightly reduced when the contact prediction accuracy is comparatively low. This observation is of special interest for protein sequences that only have a limited number of homologs.
Recently, a breakthrough has been achieved in modeling the atomic 3D structures of proteins from their sequence alone without requiring any experimental work on the protein itself. To achieve this goal, a database of evolutionary related sequences is analyzed to find co-evolving residues, giving insight into which residues are in close proximity to each other.
These residue-residue contacts can help to drive a computer simulation with an atomic-scale physical model of the protein structure from a random starting conformation to a native-like 3D conformation. Although much effort is being put into the improvement of residue-residue contact predictions, their accuracy will always be limited. Therefore, structure prediction protocols with a high tolerance against incorrect distance restraints are needed. Here, we present a structure prediction protocol that combines evolutionary information with the iterative sampling approach of the molecular modeling suite Rosetta, called RASREC. RASREC has been shown to converge faster to near-native models and to be more robust against incorrect distance restraints than standard prediction protocols. It is therefore perfectly suited for restraints obtained from predicted residue-residue contacts with limited accuracy. We show that our protocol outperforms other currently published structure prediction methods and is able to achieve accurate structures, even if the accuracy of predicted contacts is low.
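Since the benchmark above is reported as TM-scores, a compact reminder of how that score is formed may help. The sketch evaluates the standard TM-score sum for one fixed superposition; the full measure takes the maximum over superpositions, which is omitted here. The function names are hypothetical, and the 0.5 Å floor on d0 for short chains is an assumption of this example.

```python
from typing import Sequence


def tm_score_d0(target_length: int) -> float:
    """Length-dependent distance scale d0 (Å): 1.24 * (L - 15)^(1/3) - 1.8."""
    if target_length <= 15:
        # Assumption: use a small floor where the formula is not usable.
        return 0.5
    return max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score(distances: Sequence[float], target_length: int) -> float:
    """TM-score contribution for one fixed superposition.

    distances: Cα-Cα distances (Å) for the aligned residue pairs only;
    unaligned target residues contribute zero because the sum is
    normalised by the full target length.
    """
    d0 = tm_score_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length
```

A perfect model of a 100-residue target scores 1.0, while a model in which every residue deviates by exactly d0 (about 3.65 Å at that length) scores 0.5, which gives a feel for the reported improvement from 0.55 to 0.72.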
Affiliation(s)
- Tatjana Braun: Biomolecular NMR and Munich Center for Integrated Protein Science, Department Chemie, Technische Universität München, Garching, Germany
- Julia Koehler Leman: Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, Maryland, United States of America
- Oliver F. Lange: Biomolecular NMR and Munich Center for Integrated Protein Science, Department Chemie, Technische Universität München, Garching, Germany
411
Perez A, MacCallum JL, Coutsias EA, Dill KA. Constraint methods that accelerate free-energy simulations of biomolecules. J Chem Phys 2015; 143:243143. [PMID: 26723628 PMCID: PMC4684272 DOI: 10.1063/1.4936911] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/04/2015] [Accepted: 11/18/2015] [Indexed: 01/07/2023] Open
Abstract
Atomistic molecular dynamics simulations of biomolecules are critical for generating narratives about biological mechanisms. The power of atomistic simulations is that these are physics-based methods that satisfy Boltzmann's law, so they can be used to compute populations, dynamics, and mechanisms. But physical simulations are computationally intensive and do not scale well to the sizes of many important biomolecules. One way to speed up physical simulations is by coarse-graining the potential function. Another way is to harness structural knowledge, often by imposing spring-like restraints. But harnessing external knowledge in physical simulations is problematic because knowledge, data, or hunches have errors, noise, and combinatoric uncertainties. Here, we review recent principled methods for imposing restraints to speed up physics-based molecular simulations that promise to scale to larger biomolecules and motions.
Affiliation(s)
- Alberto Perez: Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, New York 11794, USA
- Justin L MacCallum: Department of Chemistry, University of Calgary, Calgary, Alberta T2N 1N4, Canada
- Evangelos A Coutsias: Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, New York 11794, USA
- Ken A Dill: Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, New York 11794, USA
412
Kinjo AR. Liquid-theory analogy of direct-coupling analysis of multiple-sequence alignment and its implications for protein structure prediction. Biophys Physicobiol 2015; 12:117-9. [PMID: 27493860 PMCID: PMC4736835 DOI: 10.2142/biophysico.12.0_117] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 10/23/2015] [Accepted: 11/10/2015] [Indexed: 12/01/2022] Open
Abstract
The direct-coupling analysis is a powerful method for protein contact prediction, and enables us to extract “direct” correlations between distant sites that are latent in “indirect” correlations observed in a protein multiple-sequence alignment. I show that the direct correlation can be obtained by using a formulation analogous to the Ornstein-Zernike integral equation in liquid theory. This formulation intuitively illustrates how the indirect or apparent correlation arises from an infinite series of direct correlations, and provides interesting insights into protein structure prediction.
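The "infinite series of direct correlations" can be made concrete with the textbook Ornstein-Zernike relation and a schematic matrix analogue. The notation below is an assumption for illustration and is not taken from the paper: h and H denote the total (observed) correlation, c and C the direct correlation, and the density factor is absorbed into C in the discrete form.

```latex
% Ornstein-Zernike relation for a homogeneous liquid of number density \rho
h(\mathbf{r}_1,\mathbf{r}_2) = c(\mathbf{r}_1,\mathbf{r}_2)
  + \rho \int c(\mathbf{r}_1,\mathbf{r}_3)\, h(\mathbf{r}_3,\mathbf{r}_2)\, \mathrm{d}\mathbf{r}_3

% Schematic discrete analogue over alignment sites (assumed notation)
H = C + C H
\quad\Longrightarrow\quad
H = C + C^2 + C^3 + \cdots = C\,(I - C)^{-1},
\qquad
C = H\,(I + H)^{-1}
```

Each power of C in the series corresponds to a chain of direct couplings through intermediate sites, which is how apparent correlation between distant sites can arise. The series form holds only when it converges, that is, when the spectral radius of C is below one.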
Affiliation(s)
- Akira R Kinjo: Institute for Protein Research, Osaka University, Suita, Osaka 565-0871, Japan
413
Coevolution Analysis of HIV-1 Envelope Glycoprotein Complex. PLoS One 2015; 10:e0143245. [PMID: 26579711 PMCID: PMC4651434 DOI: 10.1371/journal.pone.0143245] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 08/25/2015] [Accepted: 11/02/2015] [Indexed: 11/19/2022] Open
Abstract
The HIV-1 Env spike is the main protein complex that facilitates HIV-1 entry into CD4+ host cells. HIV-1 entry is a multistep process that is not yet completely understood. This process involves several protein-protein interactions between HIV-1 Env and a variety of host cell receptors along with many conformational changes within the spike. Owing to its high mutation rate and plasticity, HIV-1 Env has developed strategies to escape immense immune pressure and entry inhibitors. We applied a coevolution and residue-residue contact detecting method to identify coevolution patterns within HIV-1 Env protein sequences representing all group M subtypes. We identified 424 coevolving residue pairs within HIV-1 Env. The majority of predicted pairs are residue-residue contacts and are proximal in 3D structure. Furthermore, many of the detected pairs have functional implications due to contributions in either CD4 or coreceptor binding, or variable loop, gp120-gp41, and interdomain interactions. This study provides a new dimension of information in HIV research. The identified residue couplings may not only be important in assisting gp120 and gp41 coordinate structure prediction, but also in designing new and effective entry inhibitors that incorporate mutation patterns of HIV-1 Env.
|
414
|
Fox G, Sievers F, Higgins DG. Using de novo protein structure predictions to measure the quality of very large multiple sequence alignments. Bioinformatics 2015; 32:814-20. [PMID: 26568625 PMCID: PMC5939968 DOI: 10.1093/bioinformatics/btv592] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 04/01/2015] [Accepted: 10/10/2015] [Indexed: 01/03/2023]
Abstract
Motivation: Multiple sequence alignments (MSAs) with large numbers of sequences are now commonplace. However, current multiple alignment benchmarks are ill-suited for testing these types of alignments, as test cases either contain a very small number of sequences or are based purely on simulation rather than empirical data. Results: We take advantage of recent developments in protein structure prediction methods to create a benchmark (ContTest) for protein MSAs containing many thousands of sequences in each test case and which is based on empirical biological data. We rank popular MSA methods using this benchmark and verify a recent result showing that chained guide trees increase the accuracy of progressive alignment packages on datasets with thousands of proteins. Availability and implementation: Benchmark data and scripts are available for download at http://www.bioinf.ucd.ie/download/ContTest.tar.gz. Contact: des.higgins@ucd.ie Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Gearóid Fox
- Conway Institute of Biomolecular and Biomedical Research, and UCD School of Medicine and Medical Science, University College Dublin, Dublin 4, Ireland
- Fabian Sievers
- Conway Institute of Biomolecular and Biomedical Research, and UCD School of Medicine and Medical Science, University College Dublin, Dublin 4, Ireland
- Desmond G Higgins
- Conway Institute of Biomolecular and Biomedical Research, and UCD School of Medicine and Medical Science, University College Dublin, Dublin 4, Ireland
|
415
|
Pandini A, Kleinjung J, Rasool S, Khan S. Coevolved Mutations Reveal Distinct Architectures for Two Core Proteins in the Bacterial Flagellar Motor. PLoS One 2015; 10:e0142407. [PMID: 26561852 PMCID: PMC4642947 DOI: 10.1371/journal.pone.0142407] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 09/07/2015] [Accepted: 10/21/2015] [Indexed: 02/08/2023] Open
Abstract
Switching of bacterial flagellar rotation is caused by large domain movements of the FliG protein triggered by binding of the signal protein CheY to FliM. FliG and FliM form adjacent multi-subunit arrays within the basal body C-ring. The movements alter the interaction of the FliG C-terminal (FliGC) “torque” helix with the stator complexes. Atomic models based on the Salmonella entrovar C-ring electron microscopy reconstruction have implications for switching, but lack consensus on the relative locations of the FliG armadillo (ARM) domains (amino-terminal (FliGN), middle (FliGM) and FliGC) as well as changes during chemotaxis. The generality of the Salmonella model is challenged by the variation in motor morphology and response between species. We studied coevolved residue mutations to determine the unifying elements of switch architecture. Residue interactions, measured by their coevolution, were formalized as a network, guided by structural data. Our measurements reveal a common design with dedicated switch and motor modules. The FliM middle domain (FliMM) has extensive connectivity most simply explained by conserved intra- and inter-subunit contacts. In contrast, FliG has a patchy, complex architecture. Conserved structural motifs form interacting nodes in the coevolution network that wire FliMM to the FliGC C-terminal, four-helix motor module (C3-6). FliG C3-6 coevolution is organized around the torque helix, differently from other ARM domains. The nodes form separated, surface-proximal patches that are targeted by deleterious mutations as in other allosteric systems. The dominant node is formed by the EHPQ motif at the FliMM-FliGM contact interface and adjacent helix residues at a central location within FliGM. The node interacts with nodes in the N-terminal FliGC α-helix triad (ARM-C) and FliGN. ARM-C, separated from C3-6 by the MFVF motif, has poor intra-network connectivity consistent with its variable orientation revealed by structural data. ARM-C could be the convertor element that provides mechanistic and species diversity.
Affiliation(s)
- Alessandro Pandini
- Department of Computer Science and Synthetic Biology Theme, Brunel University London, Uxbridge UB8 3PH, United Kingdom
- Jens Kleinjung
- Mathematical Biology, Francis Crick Institute, Ridgeway, Mill Hill, London NW7 1AA, United Kingdom
- Shafqat Rasool
- Department of Biochemistry, McGill University, Montreal, QC H3G 1Y6, Canada
- Shahid Khan
- Molecular Biology Consortium, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, United States of America
|
416
|
Terashi G, Takeda-Shitaka M. CAB-Align: A Flexible Protein Structure Alignment Method Based on the Residue-Residue Contact Area. PLoS One 2015; 10:e0141440. [PMID: 26502070 PMCID: PMC4621035 DOI: 10.1371/journal.pone.0141440] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 01/19/2015] [Accepted: 10/08/2015] [Indexed: 12/26/2022] Open
Abstract
Proteins are flexible, and this flexibility has an essential functional role. Flexibility can be observed in loop regions, rearrangements between secondary structure elements, and conformational changes between entire domains. However, most protein structure alignment methods treat protein structures as rigid bodies. Thus, these methods fail to identify the equivalences of residue pairs in regions with flexibility. In this study, we considered that the evolutionary relationship between proteins corresponds directly to the residue–residue physical contacts rather than the three-dimensional (3D) coordinates of proteins. Thus, we developed a new protein structure alignment method, contact area-based alignment (CAB-align), which uses the residue–residue contact area to identify regions of similarity. The main purpose of CAB-align is to identify homologous relationships at the residue level between related protein structures. The CAB-align procedure comprises two main steps: First, a rigid-body alignment method based on local and global 3D structure superposition is employed to generate a sufficient number of initial alignments. Then, iterative dynamic programming is executed to find the optimal alignment. We evaluated the performance and advantages of CAB-align based on four main points: (1) agreement with the gold standard alignment, (2) alignment quality based on an evolutionary relationship without 3D coordinate superposition, (3) consistency of the multiple alignments, and (4) classification agreement with the gold standard classification. 
Comparisons of CAB-align with other state-of-the-art protein structure alignment methods (TM-align, FATCAT, and DaliLite) using our benchmark dataset showed that CAB-align performed robustly in obtaining high-quality alignments and generating consistent multiple alignments with high coverage and accuracy rates, and it performed extremely well when discriminating between homologous and nonhomologous pairs of proteins in both single and multi-domain comparisons. The CAB-align software is freely available to academic users as stand-alone software at http://www.pharm.kitasato-u.ac.jp/bmd/bmd/Publications.html.
Affiliation(s)
- Genki Terashi
- School of Pharmacy, Kitasato University, Tokyo, Japan
|
417
|
From residue coevolution to protein conformational ensembles and functional dynamics. Proc Natl Acad Sci U S A 2015; 112:13567-72. [PMID: 26487681 DOI: 10.1073/pnas.1508584112] [Citation(s) in RCA: 92] [Impact Index Per Article: 9.2] [Indexed: 12/21/2022] Open
Abstract
The analysis of evolutionary amino acid correlations has recently attracted a surge of renewed interest, also due to their successful use in de novo protein native structure prediction. However, many aspects of protein function, such as substrate binding and product release in enzymatic activity, can be fully understood only in terms of an equilibrium ensemble of alternative structures, rather than a single static structure. In this paper we combine coevolutionary data and molecular dynamics simulations to study protein conformational heterogeneity. To that end, we adapt the Boltzmann-learning algorithm to the analysis of homologous protein sequences and develop a coarse-grained protein model specifically tailored to convert the resulting contact predictions to a protein structural ensemble. By means of exhaustive sampling simulations, we analyze the set of conformations that are consistent with the observed residue correlations for a set of representative protein domains, showing that (i) the most representative structure is consistent with the experimental fold and (ii) the various regions of the sequence display different stability, related to multiple biologically relevant conformations and to the cooperativity of the coevolving pairs. Moreover, we show that the proposed protocol is able to reproduce the essential features of a protein folding mechanism as well as to account for regions involved in conformational transitions through the correct sampling of the involved conformers.
|
418
|
Hou Q, Dutilh BE, Huynen MA, Heringa J, Feenstra KA. Sequence specificity between interacting and non-interacting homologs identifies interface residues--a homodimer and monomer use case. BMC Bioinformatics 2015; 16:325. [PMID: 26449222 PMCID: PMC4599308 DOI: 10.1186/s12859-015-0758-y] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 03/30/2015] [Accepted: 09/30/2015] [Indexed: 11/17/2022] Open
Abstract
Background: Protein families participating in protein-protein interactions may contain sub-families that have different binding characteristics, ranging from tight binding to showing no interaction at all. Composition differences at the sequence level in these sub-families are often decisive to their differential functional interaction. Methods to predict interface sites from protein sequences typically exploit conservation as a signal. Here, instead, we provide proof of concept that the sequence specificity between interacting versus non-interacting groups can be exploited to recognise interaction sites. Results: We collected homodimeric and monomeric proteins and formed homologous groups, each having an interacting (homodimer) subgroup and a non-interacting (monomer) subgroup. We then compiled multiple sequence alignments of the proteins in the homologous groups and identified compositional differences between the homodimeric and monomeric subgroups for each of the alignment positions. Our results show that this specificity signal distinguishes interface and other surface residues with 40.9% recall and up to 25.1% precision. Conclusions: To the best of our knowledge, this is the first large-scale study that exploits sequence specificity between interacting and non-interacting homologs to predict interaction sites from sequence information only. The performance obtained indicates that this signal contains valuable information to identify protein-protein interaction sites.
Affiliation(s)
- Qingzhen Hou
- Center for Integrative Bioinformatics VU (IBIVU), Vrije University Amsterdam, De Boelelaan 1081A, 1081 HV, Amsterdam, The Netherlands.
- Bas E Dutilh
- Theoretical Biology and Bioinformatics, Utrecht University, Padualaan 8, 3584 CH, Utrecht, The Netherlands; Centre for Molecular and Biomolecular Informatics, Radboud Institute for Molecular Life Sciences, Radboud University Medical Centre, Geert Grooteplein 28, 6525 GA, Nijmegen, The Netherlands; Department of Marine Biology, Institute of Biology, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil.
- Martijn A Huynen
- Centre for Molecular and Biomolecular Informatics, Radboud Institute for Molecular Life Sciences, Radboud University Medical Centre, Geert Grooteplein 28, 6525 GA, Nijmegen, The Netherlands.
- Jaap Heringa
- Center for Integrative Bioinformatics VU (IBIVU), Vrije University Amsterdam, De Boelelaan 1081A, 1081 HV, Amsterdam, The Netherlands.
- K Anton Feenstra
- Center for Integrative Bioinformatics VU (IBIVU), Vrije University Amsterdam, De Boelelaan 1081A, 1081 HV, Amsterdam, The Netherlands.
|
419
|
Márquez-Chamorro AE, Asencio-Cortés G, Santiesteban-Toca CE, Aguilar-Ruiz JS. Soft computing methods for the prediction of protein tertiary structures: A survey. Appl Soft Comput 2015. [DOI: 10.1016/j.asoc.2015.06.024] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Indexed: 01/01/2023]
|
420
|
De Leonardis E, Lutz B, Ratz S, Cocco S, Monasson R, Schug A, Weigt M. Direct-Coupling Analysis of nucleotide coevolution facilitates RNA secondary and tertiary structure prediction. Nucleic Acids Res 2015; 43:10444-55. [PMID: 26420827 PMCID: PMC4666395 DOI: 10.1093/nar/gkv932] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.4] [Received: 06/02/2015] [Accepted: 09/07/2015] [Indexed: 12/16/2022] Open
Abstract
Despite the biological importance of non-coding RNA, their structural characterization remains challenging. Making use of the rapidly growing sequence databases, we analyze nucleotide coevolution across homologous sequences via Direct-Coupling Analysis to detect nucleotide-nucleotide contacts. For a representative set of riboswitches, we show that the results of Direct-Coupling Analysis in combination with a generalized Nussinov algorithm systematically improve the results of RNA secondary structure prediction beyond traditional covariance approaches based on mutual information. Even more importantly, we show that the results of Direct-Coupling Analysis are enriched in tertiary structure contacts. By integrating these predictions into molecular modeling tools, systematically improved tertiary structure predictions can be obtained, as compared to using secondary structure information alone.
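The combination of coupling scores with a Nussinov-style recursion can be sketched as follows. This is a minimal illustration, not the authors' generalized algorithm: it assumes a hypothetical matrix `scores[k][j]` of pairwise coupling strengths (for example, derived from Direct-Coupling Analysis) and finds the non-crossing set of pairs with maximum total score, subject to an assumed minimum hairpin loop length `min_loop`.

```python
def nussinov(scores, min_loop=3):
    """Maximum-score non-crossing pairing; returns (total score, sorted pairs)."""
    n = len(scores)
    if n == 0:
        return 0.0, []
    best = [[0.0] * n for _ in range(n)]
    choice = [[None] * n for _ in range(n)]  # partner k chosen for j on interval [i, j]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best[i][j] = best[i][j - 1]  # case 1: position j stays unpaired
            for k in range(i, j - min_loop):  # case 2: j pairs with k
                left = best[i][k - 1] if k > i else 0.0
                cand = left + best[k + 1][j - 1] + scores[k][j]
                if cand > best[i][j]:
                    best[i][j] = cand
                    choice[i][j] = k
    # Trace back the chosen pairs without recursion.
    pairs = []
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        k = choice[i][j]
        if k is None:
            stack.append((i, j - 1))
        else:
            pairs.append((k, j))
            stack.append((i, k - 1))
            stack.append((k + 1, j - 1))
    return best[0][n - 1], sorted(pairs)

# Hypothetical coupling scores for an 8-nucleotide sequence.
example = [[0.0] * 8 for _ in range(8)]
example[0][7] = 1.0
example[1][6] = 0.8
example[2][7] = 0.9  # competes with (0, 7) and would cross (1, 6)
total, pairs = nussinov(example)
print(total, pairs)
```

Because pairs may not cross, the recursion can only recover nested (secondary structure) contacts; high-scoring pairs it rejects are candidates for tertiary contacts or pseudoknots, which would need separate handling.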
Affiliation(s)
- Eleonora De Leonardis
- Computational and Quantitative Biology, Sorbonne Universités, Université Pierre et Marie Curie, UMR 7238, 75006 Paris, France; Computational and Quantitative Biology, CNRS, UMR 7238, 75006 Paris, France; Laboratoire de Physique Statistique de l'Ecole Normale Supérieure, associé au CNRS et à l'Université Pierre et Marie Curie, 75005 Paris, France
- Benjamin Lutz
- Steinbuch Centre for Computing, Karlsruher Institut für Technologie, 76133 Karlsruhe, Germany; Fakultät für Physik, Karlsruher Institut für Technologie, 76133 Karlsruhe, Germany
- Sebastian Ratz
- Steinbuch Centre for Computing, Karlsruher Institut für Technologie, 76133 Karlsruhe, Germany; Fakultät für Physik, Karlsruher Institut für Technologie, 76133 Karlsruhe, Germany
- Simona Cocco
- Laboratoire de Physique Statistique de l'Ecole Normale Supérieure, associé au CNRS et à l'Université Pierre et Marie Curie, 75005 Paris, France
- Rémi Monasson
- Laboratoire de Physique Théorique de l'Ecole Normale Supérieure, associé au CNRS et à l'Université Pierre et Marie Curie, 75005 Paris, France
- Alexander Schug
- Steinbuch Centre for Computing, Karlsruher Institut für Technologie, 76133 Karlsruhe, Germany
- Martin Weigt
- Computational and Quantitative Biology, Sorbonne Universités, Université Pierre et Marie Curie, UMR 7238, 75006 Paris, France; Computational and Quantitative Biology, CNRS, UMR 7238, 75006 Paris, France
|
421
|
Zhang W, Yang J, He B, Walker SE, Zhang H, Govindarajoo B, Virtanen J, Xue Z, Shen HB, Zhang Y. Integration of QUARK and I-TASSER for Ab Initio Protein Structure Prediction in CASP11. Proteins 2015; 84 Suppl 1:76-86. [PMID: 26370505 DOI: 10.1002/prot.24930] [Citation(s) in RCA: 54] [Impact Index Per Article: 5.4] [Received: 05/28/2015] [Revised: 08/26/2015] [Accepted: 09/10/2015] [Indexed: 11/12/2022]
Abstract
We tested two pipelines developed for template-free protein structure prediction in the CASP11 experiment. First, the QUARK pipeline constructs structure models by reassembling fragments of continuously distributed lengths excised from unrelated proteins. Five free-modeling (FM) targets have the model successfully constructed by QUARK with a TM-score above 0.4, including the first model of T0837-D1, which has a TM-score = 0.736 and RMSD = 2.9 Å to the native. Detailed analysis showed that the success is partly attributed to the high-resolution contact map prediction derived from fragment-based distance-profiles, which are mainly located between regular secondary structure elements and loops/turns and help guide the orientation of secondary structure assembly. In the Zhang-Server pipeline, weakly scoring threading templates are re-ordered by the structural similarity to the ab initio folding models, which are then reassembled by I-TASSER based structure assembly simulations; 60% more domains with length up to 204 residues, compared to the QUARK pipeline, were successfully modeled by the I-TASSER pipeline with a TM-score above 0.4. The robustness of the I-TASSER pipeline can stem from the composite fragment-assembly simulations that combine structures from both ab initio folding and threading template refinements. Despite the promising cases, challenges still exist in long-range beta-strand folding, domain parsing, and the uncertainty of secondary structure prediction; the latter of which was found to affect nearly all aspects of FM structure predictions, from fragment identification, target classification, structure assembly, to final model selection. Significant efforts are needed to solve these problems before real progress on FM could be made. Proteins 2016; 84(Suppl 1):76-86. © 2015 Wiley Periodicals, Inc.
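The TM-score threshold of 0.4 quoted above refers to the standard length-normalised similarity measure. As a rough sketch, the score for one fixed superposition can be computed from per-residue distances between aligned model and native positions. The function names are illustrative, the full TM-score additionally maximises over superpositions (not done here), and the lower clamp on d0 for short targets is an assumption of this sketch.

```python
def tm_d0(target_length):
    """Length-dependent distance scale d0 = 1.24 * (L - 15)^(1/3) - 1.8 (in Å)."""
    if target_length <= 15:
        return 0.5  # assumed floor for very short targets
    return max(1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8, 0.5)

def tm_score(distances, target_length):
    """TM-score for one fixed superposition.

    distances: distances (Å) between aligned residue pairs.
    target_length: number of residues in the native (target) structure;
    unaligned residues contribute nothing to the sum.
    """
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length

# A perfect model of a 100-residue target scores 1.0;
# a model with every aligned residue exactly d0 away scores 0.5.
print(tm_score([0.0] * 100, 100), tm_score([tm_d0(100)] * 100, 100))
```

Normalising by the target length rather than the number of aligned residues is what makes a partial but accurate model score lower than a complete one.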
Affiliation(s)
- Wenxuan Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, 48109; Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, 48109
- Jianyi Yang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, 48109; Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, 48109
- Baoji He
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, 48109; Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, 48109
- Sara Elizabeth Walker
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, 48109; Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, 48109
- Hongjiu Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, 48109; Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, 48109
- Brandon Govindarajoo
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, 48109; Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, 48109
- Jouko Virtanen
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, 48109; Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, 48109
- Zhidong Xue
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, 48109; Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, 48109
- Hong-Bin Shen
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, 48109; Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, 48109
- Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, 48109; Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, 48109
|
422
|
Jacob E, Unger R, Horovitz A. Codon-level information improves predictions of inter-residue contacts in proteins by correlated mutation analysis. eLife 2015; 4:e08932. [PMID: 26371555 PMCID: PMC4602084 DOI: 10.7554/elife.08932] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 05/22/2015] [Accepted: 09/13/2015] [Indexed: 12/11/2022] Open
Abstract
Methods for analysing correlated mutations in proteins are becoming an increasingly powerful tool for predicting contacts within and between proteins. Nevertheless, limitations remain due to the requirement for large multiple sequence alignments (MSA) and the fact that, in general, only the relatively small number of top-ranking predictions are reliable. To date, methods for analysing correlated mutations have relied exclusively on amino acid MSAs as inputs. Here, we describe a new approach for analysing correlated mutations that is based on combined analysis of amino acid and codon MSAs. We show that a direct contact is more likely to be present when the correlation between the positions is strong at the amino acid level but weak at the codon level. The performance of different methods for analysing correlated mutations in predicting contacts is shown to be enhanced significantly when amino acid and codon data are combined.
Affiliation(s)
- Etai Jacob
- The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat-Gan, Israel
- Department of Structural Biology, Weizmann Institute of Science, Rehovot, Israel
- Ron Unger
- The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat-Gan, Israel
- Amnon Horovitz
- Department of Structural Biology, Weizmann Institute of Science, Rehovot, Israel
|
423
|
Kell DB, Pretorius E. The simultaneous occurrence of both hypercoagulability and hypofibrinolysis in blood and serum during systemic inflammation, and the roles of iron and fibrin(ogen). Integr Biol (Camb) 2015; 7:24-52. [PMID: 25335120 DOI: 10.1039/c4ib00173g] [Citation(s) in RCA: 64] [Impact Index Per Article: 6.4] [Indexed: 12/19/2022]
Abstract
Although the two phenomena are usually studied separately, we summarise a considerable body of literature to the effect that a great many diseases involve (or are accompanied by) both an increased tendency for blood to clot (hypercoagulability) and the resistance of the clots so formed (hypofibrinolysis) to the typical, 'healthy' or physiological lysis. We concentrate here on the terminal stages of fibrin formation from fibrinogen, as catalysed by thrombin. Hypercoagulability goes hand in hand with inflammation, and is strongly influenced by the fibrinogen concentration (and vice versa); this can be mediated via interleukin-6. Poorly liganded iron is a significant feature of inflammatory diseases, and hypofibrinolysis may change as a result of changes in the structure and morphology of the clot, which may be mimicked in vitro, and may be caused in vivo, by the presence of unliganded iron interacting with fibrin(ogen) during clot formation. Many of these phenomena are probably caused by electrostatic changes in the iron-fibrinogen system, though hydroxyl radical (OH˙) formation can also contribute under both acute and (more especially) chronic conditions. Many substances are known to affect the nature of fibrin polymerised from fibrinogen, such that this might be seen as a kind of bellwether for human or plasma health. Overall, our analysis demonstrates the commonalities underpinning a variety of pathologies as seen in both hypercoagulability and hypofibrinolysis, and offers opportunities for both diagnostics and therapies.
Affiliation(s)
- Douglas B Kell
- School of Chemistry and The Manchester Institute of Biotechnology, The University of Manchester, 131, Princess St, Manchester M1 7DN, Lancs, UK.
|
424
|
Ovchinnikov S, Kinch L, Park H, Liao Y, Pei J, Kim DE, Kamisetty H, Grishin NV, Baker D. Large-scale determination of previously unsolved protein structures using evolutionary information. eLife 2015; 4:e09248. [PMID: 26335199 PMCID: PMC4602095 DOI: 10.7554/elife.09248] [Citation(s) in RCA: 177] [Impact Index Per Article: 17.7] [Received: 06/06/2015] [Accepted: 08/30/2015] [Indexed: 12/18/2022] Open
Abstract
The prediction of the structures of proteins without detectable sequence similarity to any protein of known structure remains an outstanding scientific challenge. Here we report significant progress in this area. We first describe de novo blind structure predictions of unprecedented accuracy we made for two proteins in large families in the recent CASP11 blind test of protein structure prediction methods by incorporating residue-residue co-evolution information in the Rosetta structure prediction program. We then describe the use of this method to generate structure models for 58 of the 121 large protein families in prokaryotes for which three-dimensional structures are not available. These models, which are posted online for public access, provide structural information for the over 400,000 proteins belonging to the 58 families and suggest hypotheses about mechanism for the subset for which the function is known, and hypotheses about function for the remainder.
Affiliation(s)
- Sergey Ovchinnikov
- Department of Biochemistry, University of Washington, Seattle, United States
- Lisa Kinch
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, United States
- Hahnbeom Park
- Department of Biochemistry, University of Washington, Seattle, United States
- Yuxing Liao
- Department of Biophysics, Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, United States
- Jimin Pei
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, United States
- David E Kim
- Department of Biochemistry, University of Washington, Seattle, United States
- Nick V Grishin
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, United States
- Department of Biophysics, Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, United States
- David Baker
- Department of Biochemistry, University of Washington, Seattle, United States
- Howard Hughes Medical Institute, University of Washington, Seattle, United States
|
425
|
Alford RF, Koehler Leman J, Weitzner BD, Duran AM, Tilley DC, Elazar A, Gray JJ. An Integrated Framework Advancing Membrane Protein Modeling and Design. PLoS Comput Biol 2015; 11:e1004398. [PMID: 26325167 PMCID: PMC4556676 DOI: 10.1371/journal.pcbi.1004398] [Citation(s) in RCA: 117] [Impact Index Per Article: 11.7] [Received: 02/20/2015] [Accepted: 06/09/2015] [Indexed: 11/19/2022] Open
Abstract
Membrane proteins are critical functional molecules in the human body, constituting more than 30% of open reading frames in the human genome. Unfortunately, a myriad of difficulties in overexpression and reconstitution into membrane mimetics severely limit our ability to determine their structures. Computational tools are therefore instrumental to membrane protein structure prediction, consequently increasing our understanding of membrane protein function and their role in disease. Here, we describe a general framework facilitating membrane protein modeling and design that combines the scientific principles for membrane protein modeling with the flexible software architecture of Rosetta3. This new framework, called RosettaMP, provides a general membrane representation that interfaces with scoring, conformational sampling, and mutation routines that can be easily combined to create new protocols. To demonstrate the capabilities of this implementation, we developed four proof-of-concept applications for (1) prediction of free energy changes upon mutation; (2) high-resolution structural refinement; (3) protein-protein docking; and (4) assembly of symmetric protein complexes, all in the membrane environment. Preliminary data show that these algorithms can produce meaningful scores and structures. The data also suggest needed improvements to both sampling routines and score functions. Importantly, the applications collectively demonstrate the potential of combining the flexible nature of RosettaMP with the power of Rosetta algorithms to facilitate membrane protein modeling and design.
Affiliation(s)
- Rebecca F. Alford
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, Maryland, United States of America
- Department of Chemistry, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
- Julia Koehler Leman
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, Maryland, United States of America
- Brian D. Weitzner
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, Maryland, United States of America
- Amanda M. Duran
- Center for Structural Biology, Department of Chemistry, Vanderbilt University, Nashville, Tennessee, United States of America
- Drew C. Tilley
- Department of Physiology and Membrane Biology, University of California, Davis, Davis, California, United States of America
- Assaf Elazar
- Department of Biological Chemistry, Weizmann Institute of Science, Rehovot, Israel
- Jeffrey J. Gray
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, Maryland, United States of America
|
426
|
Determination of specificity influencing residues for key transcription factor families. QUANTITATIVE BIOLOGY 2015; 3:115-123. [PMID: 26753103 DOI: 10.1007/s40484-015-0045-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 12/31/2022]
Abstract
Transcription factors (TFs) are major modulators of transcription and subsequent cellular processes. The binding of TFs to specific regulatory elements is governed by their specificity. Considering the gap between known TF sequences and known specificities, specificity prediction frameworks are highly desired. Key inputs to such frameworks are the protein residues that modulate the specificity of the TF under consideration. Simple measures like mutual information (MI) fail to delineate specificity influencing residues (SIRs) from alignments because of the constraints imposed by the three-dimensional structure of the protein: structural restraints on the evolution of the amino-acid sequence lead to the identification of false SIRs. In this manuscript we extended three methods (Direct Information, PSICOV and adjusted mutual information) that have been used to disentangle spurious indirect protein residue-residue contacts from direct contacts, to identify SIRs from joint alignments of amino acids and specificity. We predicted SIRs for the homeodomain (HD), helix-loop-helix, LacI and GntR families of TFs using these methods and compared them to MI. Using various measures, we show that the performance of these three methods is comparable to one another but better than that of MI. The implications of these methods for specificity prediction frameworks are discussed. The methods are implemented as an R package and available along with the alignments at stormo.wustl.edu/SpecPred.
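For orientation, the mutual information baseline that the three methods are compared against can be sketched as below. This is a generic plug-in MI estimate between two alignment columns, not the R package mentioned in the abstract; the function name and the toy columns are illustrative assumptions, and no pseudocounts or sequence weighting are applied.

```python
from collections import Counter
from math import log2

def mutual_information(col_a, col_b):
    """Plug-in mutual information (bits) between two equally long alignment columns."""
    if len(col_a) != len(col_b):
        raise ValueError("columns must have the same number of sequences")
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), c in count_ab.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts
        mi += (c / n) * log2(c * n / (count_a[a] * count_b[b]))
    return mi

# Toy columns: a residue position and a (hypothetical) specificity label per sequence.
residues = "KKRR"
covarying_label = "AAGG"    # changes together with the residue
independent_label = "AGAG"  # unrelated to the residue
print(mutual_information(residues, covarying_label))
print(mutual_information(residues, independent_label))
```

MI scores each pair of columns in isolation, so a column that covaries with specificity only through a third, structurally coupled position still scores highly; this is the kind of indirect signal the abstract's global methods aim to remove.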
|
427
|
Raval A, Piana S, Eastwood MP, Shaw DE. Assessment of the utility of contact-based restraints in accelerating the prediction of protein structure using molecular dynamics simulations. Protein Sci 2015; 25:19-29. [PMID: 26266489 PMCID: PMC4815320 DOI: 10.1002/pro.2770] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Received: 04/30/2015] [Revised: 08/07/2015] [Accepted: 08/11/2015] [Indexed: 12/15/2022]
Abstract
Molecular dynamics (MD) simulation is a well-established tool for the computational study of protein structure and dynamics, but its application to the important problem of protein structure prediction remains challenging, in part because extremely long timescales can be required to reach the native structure. Here, we examine the extent to which the use of low-resolution information in the form of residue-residue contacts, which can often be inferred from bioinformatics or experimental studies, can accelerate the determination of protein structure in simulation. We incorporated sets of 62, 31, or 15 contact-based restraints in MD simulations of ubiquitin, a benchmark system known to fold to the native state on the millisecond timescale in unrestrained simulations. One-third of the restrained simulations folded to the native state within a few tens of microseconds, a speedup of over an order of magnitude compared with unrestrained simulations and a demonstration of the potential for limited amounts of structural information to accelerate structure determination. Almost all of the remaining ubiquitin simulations reached near-native conformations within a few tens of microseconds, but remained trapped there, apparently due to the restraints. We discuss potential methodological improvements that would facilitate escape from these near-native traps and allow more simulations to quickly reach the native state. Finally, using a target from the Critical Assessment of protein Structure Prediction (CASP) experiment, we show that distance restraints can improve simulation accuracy: In our simulations, restraints stabilized the native state of the protein, enabling a reasonable structural model to be inferred.
Affiliation(s)
- Alpan Raval
- D. E. Shaw Research, New York, New York, 10036
| | | | | | - David E Shaw
- D. E. Shaw Research, New York, New York, 10036.,Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York, 10032
| |
|
428
|
Identification of Protein–Protein Interactions by Detecting Correlated Mutation at the Interface. J Chem Inf Model 2015; 55:2042-9. [DOI: 10.1021/acs.jcim.5b00320] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Indexed: 02/01/2023]
|
429
|
Avila-Herrera A, Pollard KS. Coevolutionary analyses require phylogenetically deep alignments and better null models to accurately detect inter-protein contacts within and between species. BMC Bioinformatics 2015; 16:268. [PMID: 26303588 PMCID: PMC4549020 DOI: 10.1186/s12859-015-0677-y] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 02/20/2015] [Accepted: 07/17/2015] [Indexed: 01/09/2023] Open
Abstract
Background When biomolecules physically interact, natural selection operates on them jointly. Contacting positions in protein and RNA structures exhibit correlated patterns of sequence evolution due to constraints imposed by the interaction, and molecular arms races can develop between interacting proteins in pathogens and their hosts. To evaluate how well methods developed to detect coevolving residues within proteins can be adapted for cross-species, inter-protein analysis, we used statistical criteria to quantify the performance of these methods in detecting inter-protein residues within 8 angstroms of each other in the co-crystal structures of 33 bacterial protein interactions. We also evaluated their performance for detecting known residues at the interface of a host-virus protein complex with a partially solved structure. Results Our quantitative benchmarking showed that all coevolutionary methods clearly benefit from alignments with many sequences. Methods that aim to detect direct correlations generally outperform other approaches. However, faster mutual information based methods are occasionally competitive in small alignments and with relaxed false positive rates. Two commonly used null distributions are anti-conservative and have high false positive rates in some scenarios, although the empirical distribution of scores performs reasonably well with deep alignments. Conclusions We conclude that coevolutionary analysis of cross-species protein interactions holds great promise but requires sequencing many more species pairs. Electronic supplementary material The online version of this article (doi:10.1186/s12859-015-0677-y) contains supplementary material, which is available to authorized users.
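The benchmark criterion described here, inter-protein residue pairs within 8 angstroms of each other in a co-crystal structure, can be sketched as follows. This is an illustration only: it assumes one representative coordinate per residue and an inclusive cutoff, neither of which is specified in the abstract, and the function name and toy coordinates are hypothetical.

```python
import math


def interprotein_contacts(coords_a, coords_b, cutoff=8.0):
    """Return index pairs (i, j) where residue i of protein A lies within
    `cutoff` angstroms of residue j of protein B.

    Assumption: each residue is represented by a single (x, y, z) point,
    for example its C-alpha or C-beta atom.
    """
    contacts = []
    for i, a in enumerate(coords_a):
        for j, b in enumerate(coords_b):
            if math.dist(a, b) <= cutoff:
                contacts.append((i, j))
    return contacts


# Hypothetical toy coordinates (angstroms) for two short chains.
chain_a = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
chain_b = [(0.0, 0.0, 5.0), (30.0, 0.0, 0.0)]

print(interprotein_contacts(chain_a, chain_b))  # only the first residues are in contact
```

A set of true contacts built this way is what predicted coevolving pairs would be scored against when estimating false positive rates.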
Affiliation(s)
- Aram Avila-Herrera
- Bioinformatics Graduate Program, University of California, San Francisco, USA.,Gladstone Institute of Cardiovascular Disease, University of California, San Francisco, USA.
| | - Katherine S Pollard
- Bioinformatics Graduate Program, University of California, San Francisco, USA.,Gladstone Institute of Cardiovascular Disease, University of California, San Francisco, USA.,Department of Epidemiology and Biostatistics, University of California, San Francisco, USA.,Institute for Human Genetics, University of California, San Francisco, 94158, CA, USA.
| |
|
430
|
Dietzen M, Kalinina OV, Taškova K, Kneissl B, Hildebrandt AK, Jaenicke E, Decker H, Lengauer T, Hildebrandt A. Large oligomeric complex structures can be computationally assembled by efficiently combining docked interfaces. Proteins 2015; 83:1887-99. [PMID: 26248608 PMCID: PMC5049452 DOI: 10.1002/prot.24873] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 03/11/2015] [Revised: 07/20/2015] [Accepted: 07/29/2015] [Indexed: 11/06/2022]
Abstract
Macromolecular oligomeric assemblies are involved in many biochemical processes of living organisms. The benefits of such assemblies in crowded cellular environments include increased reaction rates, efficient feedback regulation, cooperativity and protective functions. However, an atom-level structural determination of large assemblies is challenging due to the size of the complex and the difference in binding affinities of the involved proteins. In this study, we propose a novel combinatorial greedy algorithm for assembling large oligomeric complexes from information on the approximate position of interaction interfaces of pairs of monomers in the complex. Prior information on complex symmetry is not required but rather the symmetry is inferred during assembly. We implement an efficient geometric score, the transformation match score, that bypasses the model ranking problems of state-of-the-art scoring functions by scoring the similarity between the inferred dimers of the same monomer simultaneously with different binding partners in a (sub)complex with a set of pregenerated docking poses. We compiled a diverse benchmark set of 308 homo and heteromeric complexes containing 6 to 60 monomers. To explore the applicability of the method, we considered 48 sets of parameters and selected the three sets of parameters for which the algorithm can correctly reconstruct the maximum number of complexes, namely 252 complexes (81.8%), in at least one of the respective three runs. The crossvalidation coverage, that is, the mean fraction of correctly reconstructed benchmark complexes during crossvalidation, was 78.1%, which demonstrates the ability of the presented method to correctly reconstruct the topology of a large variety of biological complexes.
Affiliation(s)
- Matthias Dietzen
- Max Planck Institute for Informatics, Campus E1 4, Saarbrücken, 66123, Germany
| | - Olga V Kalinina
- Max Planck Institute for Informatics, Campus E1 4, Saarbrücken, 66123, Germany
| | - Katerina Taškova
- Institute of Computer Science, Johannes Gutenberg University, Staudingerweg 9, Mainz, 55128, Germany.,Institute for Molecular Biology, Johannes Gutenberg University, Ackermannweg 4, Mainz, 55128, Germany
| | - Benny Kneissl
- Institute of Computer Science, Johannes Gutenberg University, Staudingerweg 9, Mainz, 55128, Germany.,Roche Pharma Research and Early Development, pRED Informatics, Roche Innovation Center Penzberg, Nonnenwald 2, Penzberg, 82377, Germany
| | | | - Elmar Jaenicke
- Institute of Molecular Biophysics, Johannes Gutenberg University, Jakob-Welder-Weg 26, Mainz, 55128, Germany
| | - Heinz Decker
- Institute of Molecular Biophysics, Johannes Gutenberg University, Jakob-Welder-Weg 26, Mainz, 55128, Germany
| | - Thomas Lengauer
- Max Planck Institute for Informatics, Campus E1 4, Saarbrücken, 66123, Germany
| | - Andreas Hildebrandt
- Institute of Computer Science, Johannes Gutenberg University, Staudingerweg 9, Mainz, 55128, Germany
| |
|
431
|
Abstract
Here we present the results of residue-residue contact predictions achieved in CASP11 by the CONSIP2 server, which is based around our MetaPSICOV contact prediction method. On a set of 40 target domains with a median family size of around 40 effective sequences, our server achieved an average top-L/5 long-range contact precision of 27%. The MetaPSICOV method is based on a combination of classical contact prediction features, enhanced with three distinct covariation methods embedded in a two-stage neural network predictor. Some unique features of our approach are (1) the tuning between the classical and covariation features depending on the depth of the input alignment and (2) a hybrid approach to generate the deepest possible multiple-sequence alignments by combining jackHMMer and HHblits. We discuss the CONSIP2 pipeline and our results, and show that where the method underperformed, the major factor was relying on a fixed set of parameters for the initial sequence alignments and not attempting to perform domain splitting as a preprocessing step. Proteins 2016; 84(Suppl 1):145-151. © 2015 The Authors. Proteins: Structure, Function, and Bioinformatics Published by Wiley Periodicals, Inc.
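The "top-L/5 long-range contact precision" quoted above can be made concrete with a small sketch. The abstract does not define the terms, so the code assumes the convention commonly used in contact-prediction assessment: keep the L/5 highest-scoring predictions (L being the sequence length) among pairs separated by at least 24 residues, and report the fraction that are true contacts. The function name, the separation parameter and the toy data are illustrative assumptions.

```python
def top_l_over_5_precision(predictions, true_contacts, length, min_separation=24):
    """Precision of the top L/5 long-range predictions.

    predictions   : iterable of (i, j, score) tuples
    true_contacts : iterable of (i, j) pairs observed in the structure
    length        : sequence length L
    min_separation: assumed threshold for "long-range" (|i - j| >= 24 by convention)
    """
    long_range = [(i, j, s) for i, j, s in predictions if abs(i - j) >= min_separation]
    long_range.sort(key=lambda p: p[2], reverse=True)   # highest score first
    top = long_range[: max(1, length // 5)]
    if not top:
        return 0.0
    truth = {frozenset(pair) for pair in true_contacts}  # order-independent lookup
    hits = sum(1 for i, j, _ in top if frozenset((i, j)) in truth)
    return hits / len(top)


# Hypothetical toy case: L = 10, so the top 2 predictions are scored.
# The separation threshold is lowered to 6 only because the toy chain is short.
toy_predictions = [(0, 9, 0.9), (1, 8, 0.7), (0, 7, 0.5), (2, 4, 0.99)]
toy_truth = [(0, 9), (0, 7)]
print(top_l_over_5_precision(toy_predictions, toy_truth, length=10, min_separation=6))
```

Here the short-range pair (2, 4) is ignored despite its high score, and one of the two retained predictions is correct.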
Affiliation(s)
- Tomasz Kosciolek
- Department of Computer Science, Bioinformatics Group, University College London, Gower Street, London, WC1E 6BT, United Kingdom
| | - David T Jones
- Department of Computer Science, Bioinformatics Group, University College London, Gower Street, London, WC1E 6BT, United Kingdom.
| |
|
432
|
Ma J, Wang S, Wang Z, Xu J. Protein contact prediction by integrating joint evolutionary coupling analysis and supervised learning. Bioinformatics 2015; 31:3506-13. [PMID: 26275894 DOI: 10.1093/bioinformatics/btv472] [Citation(s) in RCA: 80] [Impact Index Per Article: 8.0] [Received: 12/18/2014] [Accepted: 08/08/2015] [Indexed: 02/07/2023] Open
Abstract
MOTIVATION Protein contact prediction is important for protein structure and functional study. Both evolutionary coupling (EC) analysis and supervised machine learning methods have been developed, making use of different information sources. However, contact prediction is still challenging especially for proteins without a large number of sequence homologs. RESULTS This article presents a group graphical lasso (GGL) method for contact prediction that integrates joint multi-family EC analysis and supervised learning to improve accuracy on proteins without many sequence homologs. Different from existing single-family EC analysis that uses residue coevolution information in only the target protein family, our joint EC analysis uses residue coevolution in both the target family and its related families, which may have divergent sequences but similar folds. To implement this, we model a set of related protein families using Gaussian graphical models and then coestimate their parameters by maximum-likelihood, subject to the constraint that these parameters shall be similar to some degree. Our GGL method can also integrate supervised learning methods to further improve accuracy. Experiments show that our method outperforms existing methods on proteins without thousands of sequence homologs, and that our method performs better on both conserved and family-specific contacts. AVAILABILITY AND IMPLEMENTATION See http://raptorx.uchicago.edu/ContactMap/ for a web server implementing the method. CONTACT j3xu@ttic.edu SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jianzhu Ma
- Toyota Technological Institute at Chicago, 6045 S. Kenwood Ave. Chicago, Illinois 60637 USA
| | - Sheng Wang
- Toyota Technological Institute at Chicago, 6045 S. Kenwood Ave. Chicago, Illinois 60637 USA
| | - Zhiyong Wang
- Toyota Technological Institute at Chicago, 6045 S. Kenwood Ave. Chicago, Illinois 60637 USA
| | - Jinbo Xu
- Toyota Technological Institute at Chicago, 6045 S. Kenwood Ave. Chicago, Illinois 60637 USA
| |
|
433
|
Yang J, He BJ, Jang R, Zhang Y, Shen HB. Accurate disulfide-bonding network predictions improve ab initio structure prediction of cysteine-rich proteins. Bioinformatics 2015; 31:3773-81. [PMID: 26254435 DOI: 10.1093/bioinformatics/btv459] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Received: 02/11/2015] [Accepted: 08/02/2015] [Indexed: 01/19/2023] Open
Abstract
MOTIVATION Cysteine-rich proteins cover many important families in nature but there are currently no methods specifically designed for modeling the structure of these proteins. The accuracy of disulfide connectivity pattern prediction, particularly for the proteins of higher-order connections, e.g., >3 bonds, is too low to effectively assist structure assembly simulations. RESULTS We propose a new hierarchical order reduction protocol called Cyscon for disulfide-bonding prediction. The most confident disulfide bonds are first identified and bonding prediction is then focused on the remaining cysteine residues based on SVR training. Compared with purely machine learning-based approaches, Cyscon improved the average accuracy of connectivity pattern prediction by 21.9%. For proteins with more than 5 disulfide bonds, Cyscon improved the accuracy by 585% on the benchmark set of PDBCYS. When applied to 158 non-redundant cysteine-rich proteins, Cyscon predictions helped increase (or decrease) the TM-score (or RMSD) of the ab initio QUARK modeling by 12.1% (or 14.4%). This result demonstrates a new avenue to improve the ab initio structure modeling for cysteine-rich proteins. AVAILABILITY AND IMPLEMENTATION http://www.csbio.sjtu.edu.cn/bioinf/Cyscon/ CONTACT zhng@umich.edu or hbshen@sjtu.edu.cn. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jing Yang
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
| | - Bao-Ji He
- State Key Laboratory of Theoretical Physics, Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing 100190, China, Department of Computational Medicine and Bioinformatics and
| | - Richard Jang
- Department of Computational Medicine and Bioinformatics and
| | - Yang Zhang
- Department of Computational Medicine and Bioinformatics and Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA
| | - Hong-Bin Shen
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China, Department of Computational Medicine and Bioinformatics and
| |
|
434
|
Haliloglu T, Bahar I. Adaptability of protein structures to enable functional interactions and evolutionary implications. Curr Opin Struct Biol 2015; 35:17-23. [PMID: 26254902 DOI: 10.1016/j.sbi.2015.07.007] [Citation(s) in RCA: 75] [Impact Index Per Article: 7.5] [Received: 06/08/2015] [Revised: 07/15/2015] [Accepted: 07/20/2015] [Indexed: 12/21/2022]
Abstract
Several studies in recent years have drawn attention to the ability of proteins to adapt to intermolecular interactions by conformational changes along structure-encoded collective modes of motions. These so-called soft modes, primarily driven by entropic effects, facilitate, if not enable, functional interactions. They represent excursions on the conformational space along principal low-ascent directions/paths away from the original free energy minimum, and they are accessible to the protein even before protein-protein/ligand interactions. An emerging concept from these studies is the evolution of structures or modular domains to favor such modes of motion that will be recruited or integrated for enabling functional interactions. Structural dynamics, including the allosteric switches in conformation that are often stabilized upon formation of complexes and multimeric assemblies, emerge as key properties that are evolutionarily maintained to accomplish biological activities, consistent with the paradigm sequence→structure→dynamics→function where 'dynamics' bridges structure and function.
Affiliation(s)
- Turkan Haliloglu
- Department of Chemical Engineering and Polymer Research Center, and Center for Life Sciences and Technologies, Bogazici University, 34342 Istanbul, Turkey; Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15260, USA
| | - Ivet Bahar
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15260, USA.
| |
|
435
|
AcconPred: Predicting Solvent Accessibility and Contact Number Simultaneously by a Multitask Learning Framework under the Conditional Neural Fields Model. BIOMED RESEARCH INTERNATIONAL 2015; 2015:678764. [PMID: 26339631 PMCID: PMC4538422 DOI: 10.1155/2015/678764] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Received: 12/27/2014] [Accepted: 03/11/2015] [Indexed: 12/14/2022]
Abstract
Motivation. The solvent accessibility of protein residues is one of the driving forces of protein folding, while the contact number of protein residues limits the possibilities of protein conformations. The de novo prediction of these properties from protein sequence is important for the study of protein structure and function. Although these two properties are certainly related with each other, it is challenging to exploit this dependency for the prediction. Method. We present a method AcconPred for predicting solvent accessibility and contact number simultaneously, which is based on a shared weight multitask learning framework under the CNF (conditional neural fields) model. The multitask learning framework on a collection of related tasks provides more accurate prediction than the framework trained only on a single task. The CNF method not only models the complex relationship between the input features and the predicted labels, but also exploits the interdependency among adjacent labels. Results. Trained on 5729 monomeric soluble globular protein datasets, AcconPred could reach 0.68 three-state accuracy for solvent accessibility and 0.75 correlation for contact number. Tested on the 105 CASP11 domain datasets for solvent accessibility, AcconPred could reach 0.64 accuracy, which outperforms existing methods.
|
436
|
Adhikari B, Bhattacharya D, Cao R, Cheng J. CONFOLD: Residue-residue contact-guided ab initio protein folding. Proteins 2015; 83:1436-49. [PMID: 25974172 PMCID: PMC4509844 DOI: 10.1002/prot.24829] [Citation(s) in RCA: 101] [Impact Index Per Article: 10.1] [Received: 01/29/2015] [Revised: 04/11/2015] [Accepted: 05/02/2015] [Indexed: 12/20/2022]
Abstract
Predicted protein residue-residue contacts can be used to build three-dimensional models and consequently to predict protein folds from scratch. A considerable amount of effort is currently being spent to improve contact prediction accuracy, whereas few methods are available to construct protein tertiary structures from predicted contacts. Here, we present an ab initio protein folding method to build three-dimensional models using predicted contacts and secondary structures. Our method first translates contacts and secondary structures into distance, dihedral angle, and hydrogen bond restraints according to a set of new conversion rules, and then provides these restraints as input for a distance geometry algorithm to build tertiary structure models. The initially reconstructed models are used to regenerate a set of physically realistic contact restraints and detect secondary structure patterns, which are then used to reconstruct final structural models. This unique two-stage modeling approach of integrating contacts and secondary structures improves the quality and accuracy of structural models and in particular generates better β-sheets than other algorithms. We validate our method on two standard benchmark datasets using true contacts and secondary structures. Our method improves TM-score of reconstructed protein models by 45% and 42% over the existing method on the two datasets, respectively. On the dataset for benchmarking reconstructions methods with predicted contacts and secondary structures, the average TM-score of best models reconstructed by our method is 0.59, 5.5% higher than the existing method. The CONFOLD web server is available at http://protein.rnet.missouri.edu/confold/.
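The TM-score figures quoted in this abstract refer to a standard model-quality measure. As a reading aid, the sketch below evaluates the usual TM-score expression for one fixed superposition, given per-residue distances between model and native structure. It is not part of CONFOLD: the function names are hypothetical, the full TM-score additionally maximizes over superpositions, and the 0.5 angstrom floor on d0 is an assumption made here to keep the scale positive for short chains.

```python
def tm_d0(target_length):
    """Length-dependent distance scale d0 (angstroms) of the TM-score."""
    if target_length <= 15:
        raise ValueError("the d0 formula is only meaningful for longer chains")
    # Assumed floor of 0.5 so that d0 stays positive for short targets.
    return max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score_from_distances(distances, target_length):
    """TM-score contribution of one superposition.

    distances     : distance (angstroms) for each aligned residue pair
    target_length : number of residues in the native (target) structure
    """
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


# A perfect model of a 100-residue target, and one covering only half of it.
print(tm_score_from_distances([0.0] * 100, 100))
print(tm_score_from_distances([0.0] * 50, 100))
```

Because the sum is divided by the target length, unaligned residues lower the score, so partial models are penalized even when the aligned part is exact.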
Affiliation(s)
- Badri Adhikari
- Department of Computer Science, University of Missouri, Columbia, MO 65211 USA
| | | | - Renzhi Cao
- Department of Computer Science, University of Missouri, Columbia, MO 65211 USA
| | - Jianlin Cheng
- Department of Computer Science, University of Missouri, Columbia, MO 65211 USA
| |
|
437
|
Stein RR, Marks DS, Sander C. Inferring Pairwise Interactions from Biological Data Using Maximum-Entropy Probability Models. PLoS Comput Biol 2015. [PMID: 26225866 PMCID: PMC4520494 DOI: 10.1371/journal.pcbi.1004182] [Citation(s) in RCA: 69] [Impact Index Per Article: 6.9] [Indexed: 12/02/2022] Open
Abstract
Maximum entropy-based inference methods have been successfully used to infer direct interactions from biological datasets such as gene expression data or sequence ensembles. Here, we review undirected pairwise maximum-entropy probability models in two categories of data types, those with continuous and categorical random variables. As a concrete example, we present recently developed inference methods from the field of protein contact prediction and show that a basic set of assumptions leads to similar solution strategies for inferring the model parameters in both variable types. These parameters reflect interactive couplings between observables, which can be used to predict global properties of the biological system. Such methods are applicable to the important problems of protein 3-D structure prediction and association of gene–gene networks, and they enable potential applications to the analysis of gene alteration patterns and to protein design.
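For orientation, the two model families this review contrasts are usually written as below. The notation is a common textbook form and an assumption here, not copied from the article: \sigma_i is the residue at alignment column i, h_i and e_{ij} are single-site and pairwise parameters, Z is the normalizing constant, and \mu and C are the mean and covariance of the continuous variables.

```latex
% Categorical variables (e.g. aligned sequences of length L): pairwise Potts-type model
P(\sigma_1,\dots,\sigma_L) = \frac{1}{Z}
  \exp\!\Big( \sum_{i=1}^{L} h_i(\sigma_i)
            + \sum_{1 \le i < j \le L} e_{ij}(\sigma_i,\sigma_j) \Big)

% Continuous variables (e.g. expression levels): multivariate Gaussian
P(\mathbf{x}) \propto
  \exp\!\Big( -\tfrac{1}{2}\,(\mathbf{x}-\boldsymbol{\mu})^{\top} C^{-1} (\mathbf{x}-\boldsymbol{\mu}) \Big)
```

In both cases the pairwise terms (e_{ij}, or the off-diagonal entries of C^{-1}) play the role of the "interactive couplings between observables" mentioned in the abstract.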
Affiliation(s)
- Richard R. Stein
- Computational Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, New York, United States of America
- * E-mail: (RRS); (CS)
| | - Debora S. Marks
- Department of Systems Biology, Harvard Medical School, Boston, Massachusetts, United States of America
| | - Chris Sander
- Computational Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, New York, United States of America
- * E-mail: (RRS); (CS)
| |
|
438
|
Ahmad A, Cai Y, Chen X, Shuai J, Han A. Conformational Dynamics of Response Regulator RegX3 from Mycobacterium tuberculosis. PLoS One 2015. [PMID: 26201027 PMCID: PMC4511772 DOI: 10.1371/journal.pone.0133389] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 01/12/2023] Open
Abstract
Two-component signal transduction systems (TCS) are vital for adaptive responses to various environmental stresses in bacteria, fungi and even plants. A TCS typically comprises a sensor histidine kinase (SK) with its cognate response regulator (RR), which often has two domains: an N-terminal receiver domain (RD) and a C-terminal effector domain (ED). The histidine kinase phosphorylates the RD to activate the ED by promoting dimerization. However, despite significant progress in structural studies, how the RR transmits the activation signal from RD to ED remains elusive. Here we analyzed the active-to-inactive transition process of the OmpR/PhoB family by computational approaches, using an active conformation of RegX3 from Mycobacterium tuberculosis as a model system. An inactive state of RegX3 generated from a 150 ns molecular dynamics simulation has rotameric conformations of Thr79 and Tyr98 that are generally conserved in inactive RRs. Arg81 in loop β4α4 acts synergistically with loop β1α1 to change its interaction partners during the active-to-inactive transition, potentially leading to the N-terminal movement of RegX3 helix α1. The global conformational dynamics of RegX3 depends mainly on the α4β5 region, in particular seven ‘hot-spot’ residues (Tyr98 to Ser104), adjacent to which several coevolved residues at the dimeric interface, including the Ile76-Asp96, Asp97-Arg111 and Glu24-Arg113 pairs, are critical for signal transduction. Taken together, our computational analyses suggest a molecular linkage between Asp phosphorylation, proximal loops and the α4β5α5 dimeric interface during the RR active-to-inactive state transition, which is often not evident from static crystal structures.
Affiliation(s)
- Ashfaq Ahmad
- State Key Laboratory for Cellular Stress Biology, School of Life Sciences, Xiamen University, Xiangan, Xiamen, China
| | - Yongfei Cai
- State Key Laboratory for Cellular Stress Biology, School of Life Sciences, Xiamen University, Xiangan, Xiamen, China
| | - Xingqiang Chen
- Department of Physics, Xiamen University, Siming, Xiamen, China
| | - Jianwei Shuai
- Department of Physics, Xiamen University, Siming, Xiamen, China
| | - Aidong Han
- State Key Laboratory for Cellular Stress Biology, School of Life Sciences, Xiamen University, Xiangan, Xiamen, China
| |
|
439
|
Raimondi D, Orlando G, Vranken WF. An Evolutionary View on Disulfide Bond Connectivities Prediction Using Phylogenetic Trees and a Simple Cysteine Mutation Model. PLoS One 2015; 10:e0131792. [PMID: 26161671 PMCID: PMC4498770 DOI: 10.1371/journal.pone.0131792] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 03/13/2015] [Accepted: 06/07/2015] [Indexed: 01/09/2023] Open
Abstract
Disulfide bonds are crucial for many structural and functional aspects of proteins. They have a stabilizing role during folding, can regulate enzymatic activity and can trigger allosteric changes in the protein structure. Moreover, knowledge of the topology of the disulfide connectivity can be relevant in genomic annotation tasks and can provide long range constraints for ab-initio protein structure predictors. In this paper we describe PhyloCys, a novel unsupervised predictor of disulfide bond connectivity from known cysteine oxidation states. For each query protein, PhyloCys retrieves and aligns homologs with HHblits and builds a phylogenetic tree using ClustalW. A simplified model of cysteine co-evolution is then applied to the tree in order to hypothesize the presence of oxidized cysteines in the inner nodes of the tree, which represent ancestral protein sequences. The tree is then traversed from the leaves to the root and the putative disulfide connectivity is inferred by observing repeated patterns of tandem mutations between a sequence and its ancestors. A final correction is applied using the Edmonds-Gabow maximum weight perfect matching algorithm. The evolutionary approach applied in PhyloCys results in disulfide bond predictions equivalent to Sephiroth, another approach that takes whole sequence information into account, and is 26-29% better than state of the art methods based on cysteine covariance patterns in multiple sequence alignments, while requiring one order of magnitude fewer homologous sequences (10^3 instead of 10^4), thus extending its range of applicability. The software described in this article and the datasets used are available at http://ibsquare.be/phylocys.
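The final correction step, choosing the disulfide pairing with maximum total weight, can be illustrated on a small case. The sketch below deliberately swaps in exhaustive recursion for the Edmonds-Gabow algorithm named in the abstract: it returns the same optimum for a handful of cysteines but scales factorially. The function name, the residue numbers and the pair weights are all hypothetical.

```python
def best_perfect_matching(cysteines, score):
    """Return (total_weight, pairs) of the maximum-weight perfect matching.

    cysteines : list of residue identifiers (even length)
    score     : function (a, b) -> weight for bonding a with b
    Exhaustive search: fine for a few cysteines, not a replacement for
    a polynomial-time matching algorithm on larger inputs.
    """
    if len(cysteines) % 2:
        raise ValueError("need an even number of cysteines")
    if not cysteines:
        return 0.0, []
    first, rest = cysteines[0], cysteines[1:]
    best_total, best_pairs = float("-inf"), []
    for k, partner in enumerate(rest):
        remaining = rest[:k] + rest[k + 1:]          # everything not yet paired
        sub_total, sub_pairs = best_perfect_matching(remaining, score)
        total = score(first, partner) + sub_total
        if total > best_total:
            best_total, best_pairs = total, [(first, partner)] + sub_pairs
    return best_total, best_pairs


# Hypothetical pair weights for four oxidized cysteines.
weights = {(3, 8): 0.2, (3, 20): 0.9, (3, 31): 0.1,
           (8, 20): 0.3, (8, 31): 0.8, (20, 31): 0.4}
pair_score = lambda a, b: weights[(min(a, b), max(a, b))]

total, pairs = best_perfect_matching([3, 8, 20, 31], pair_score)
print(pairs, round(total, 3))
```

With these weights the pairing 3-20 and 8-31 wins over the two alternatives, which is the kind of global consistency a per-pair ranking alone would not guarantee.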
Affiliation(s)
- Daniele Raimondi
- Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, Brussels, Belgium
- Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, Belgium
- Department of Structural Biology, VIB, Brussels, Belgium
- Machine Learning Group, ULB, Brussels, Belgium
| | - Gabriele Orlando
- Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, Brussels, Belgium
- Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, Belgium
- Department of Structural Biology, VIB, Brussels, Belgium
- Machine Learning Group, ULB, Brussels, Belgium
| | - Wim F. Vranken
- Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, Brussels, Belgium
- Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, Belgium
- Department of Structural Biology, VIB, Brussels, Belgium
| |
|
440
|
Espada R, Parra RG, Mora T, Walczak AM, Ferreiro DU. Capturing coevolutionary signals in repeat proteins. BMC Bioinformatics 2015; 16:207. [PMID: 26134293 PMCID: PMC4489039 DOI: 10.1186/s12859-015-0648-3] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Received: 03/17/2015] [Accepted: 06/16/2015] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The analysis of correlations of amino acid occurrences in globular domains has led to the development of statistical tools that can identify native contacts - portions of the chains that come to close distance in folded structural ensembles. Here we introduce a direct coupling analysis for repeat proteins - natural systems for which the identification of folding domains remains challenging. RESULTS We show that the inherent translational symmetry of repeat protein sequences introduces a strong bias in the pair correlations at precisely the length scale of the repeat-unit. Equalizing for this bias in an objective way reveals true co-evolutionary signals from which local native contacts can be identified. Importantly, parameter values obtained for all other interactions are not significantly affected by the equalization. We quantify the robustness of the procedure and assign confidence levels to the interactions, identifying the minimum number of sequences needed to extract evolutionary information in several repeat protein families. CONCLUSIONS The overall procedure can be used to reconstruct the interactions at distances larger than repeat-pairs, identifying the characteristics of the strongest couplings in each family, and can be applied to any system that appears translationally symmetric.
Affiliation(s)
- Rocío Espada
- Protein Physiology Lab, Dep de Química Biológica, Facultad de Ciencias Exactas y Naturales, UBA-CONICET-IQUIBICEN, Buenos Aires, Argentina; Departamento de Física, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Buenos Aires, Argentina
- R Gonzalo Parra
- Protein Physiology Lab, Dep de Química Biológica, Facultad de Ciencias Exactas y Naturales, UBA-CONICET-IQUIBICEN, Buenos Aires, Argentina
- Thierry Mora
- Laboratoire de physique statistique, CNRS, UPMC and École normale supérieure, 24 rue Lhomond, Paris, 75005, France
- Diego U Ferreiro
- Protein Physiology Lab, Dep de Química Biológica, Facultad de Ciencias Exactas y Naturales, UBA-CONICET-IQUIBICEN, Buenos Aires, Argentina
|
441
|
Bahar I, Cheng MH, Lee JY, Kaya C, Zhang S. Structure-Encoded Global Motions and Their Role in Mediating Protein-Substrate Interactions. Biophys J 2015; 109:1101-9. [PMID: 26143655 DOI: 10.1016/j.bpj.2015.06.004] [Citation(s) in RCA: 46] [Impact Index Per Article: 4.6] [Received: 03/17/2015] [Revised: 06/02/2015] [Accepted: 06/03/2015] [Indexed: 12/22/2022]
Abstract
Recent structure-based computational studies suggest that, in contrast to the classical description of equilibrium fluctuations as wigglings and jigglings, proteins have access to well-defined spectra of collective motions, called intrinsic dynamics, encoded by their structure under native state conditions. In particular, the global modes of motions (at the low frequency end of the spectrum) are shown by multiple studies to be highly robust to minor differences in the structure or to detailed interactions at the atomic level. These modes, encoded by the overall fold, usually define the mechanisms of interactions with substrates. They can be estimated by low-resolution models such as the elastic network models (ENMs) exclusively based on interresidue contact topology. The ability of ENMs to efficiently assess the global motions intrinsically favored by the overall fold as well as the relevance of these predictions to the dominant changes in structure experimentally observed for a given protein in the presence of different substrates suggest that the intrinsic dynamics plays a role in mediating protein-substrate interactions. These observations underscore the functional significance of structure-encoded dynamics, or the importance of the predisposition to favor functional global modes in the evolutionary selection of structures.
Affiliation(s)
- Ivet Bahar
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania
- Mary Hongying Cheng
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania
- Ji Young Lee
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania
- Cihan Kaya
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania
- She Zhang
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania
|
442
|
Pietal MJ, Bujnicki JM, Kozlowski LP. GDFuzz3D: a method for protein 3D structure reconstruction from contact maps, based on a non-Euclidean distance function. Bioinformatics 2015; 31:3499-505. [PMID: 26130575 DOI: 10.1093/bioinformatics/btv390] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Received: 10/12/2014] [Accepted: 06/23/2015] [Indexed: 11/13/2022]
Abstract
MOTIVATION To date, only a few distinct successful approaches have been introduced to reconstruct a protein 3D structure from a map of contacts between its amino acid residues (a 2D contact map). Current algorithms can infer structures from information-rich contact maps that contain a limited fraction of erroneous predictions. However, it is difficult to reconstruct 3D structures from predicted contact maps that usually contain a high fraction of false contacts. RESULTS We describe a new, multi-step protocol that predicts protein 3D structures from the predicted contact maps. The method is based on a novel distance function acting on a fuzzy residue proximity graph, which predicts a 2D distance map from a 2D predicted contact map. The application of a Multi-Dimensional Scaling algorithm transforms that predicted 2D distance map into a coarse 3D model, which is further refined by typical modeling programs into an all-atom representation. We tested our approach on contact maps predicted de novo by MULTICOM, the top contact map predictor according to CASP10. We show that our method outperforms FT-COMAR, the state-of-the-art method for 3D structure reconstruction from 2D maps. For all predicted 2D contact maps of relatively low sensitivity (60-84%), GDFuzz3D generates more accurate 3D models, with the average improvement of 4.87 Å in terms of RMSD. AVAILABILITY AND IMPLEMENTATION GDFuzz3D server and standalone version are freely available at http://iimcb.genesilico.pl/gdserver/GDFuzz3D/. CONTACT iamb@genesilico.pl SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Michal J Pietal
- Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology in Warsaw, Warsaw, Poland; Laboratory of Functional and Structural Genomics, Centre of New Technologies, University of Warsaw, Warsaw, Poland
- Janusz M Bujnicki
- Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology in Warsaw, Warsaw, Poland; Bioinformatics Laboratory, Institute of Molecular Biology and Biotechnology, Adam Mickiewicz University, Poznan, Poland
- Lukasz P Kozlowski
- Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology in Warsaw, Warsaw, Poland
|
443
|
Tang Y, Huang YJ, Hopf TA, Sander C, Marks DS, Montelione GT. Protein structure determination by combining sparse NMR data with evolutionary couplings. Nat Methods 2015; 12:751-4. [PMID: 26121406 PMCID: PMC4521990 DOI: 10.1038/nmeth.3455] [Citation(s) in RCA: 54] [Impact Index Per Article: 5.4] [Received: 01/06/2015] [Accepted: 05/26/2015] [Indexed: 11/13/2022]
Abstract
Accurate protein structure determination by NMR is challenging for larger proteins, for which experimental data is often incomplete and ambiguous. Fortunately, the upsurge in evolutionary sequence information and advances in maximum entropy statistical methods now provide a rich complementary source of structural constraints. We have developed a hybrid approach (EC-NMR) combining sparse NMR data with evolutionary residue-residue couplings, and demonstrate accurate structure determination for several 6 to 41 kDa proteins.
Affiliation(s)
- Yuefeng Tang
- [1] Center for Advanced Biotechnology and Medicine, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA. [2] Department of Molecular Biology and Biochemistry, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Yuanpeng Janet Huang
- [1] Center for Advanced Biotechnology and Medicine, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA. [2] Department of Molecular Biology and Biochemistry, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Thomas A Hopf
- [1] Department of Systems Biology, Harvard Medical School, Boston, Massachusetts, USA. [2] Department of Informatics, Technische Universität München, Garching, Germany
- Chris Sander
- Computational Biology Center, Memorial Sloan Kettering Cancer Center, New York, New York, USA
- Debora S Marks
- Department of Systems Biology, Harvard Medical School, Boston, Massachusetts, USA
- Gaetano T Montelione
- [1] Center for Advanced Biotechnology and Medicine, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA. [2] Department of Molecular Biology and Biochemistry, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA. [3] Department of Biochemistry and Molecular Biology, Robert Wood Johnson Medical School, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
|
444
|
Soltan Ghoraie L, Burkowski F, Zhu M. Using kernelized partial canonical correlation analysis to study directly coupled side chains and allostery in small G proteins. Bioinformatics 2015; 31:i124-32. [PMID: 26072474 PMCID: PMC4765857 DOI: 10.1093/bioinformatics/btv241] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/21/2022]
Abstract
Motivation: Inferring structural dependencies among a protein’s side chains helps us understand their coupled motions. It is known that coupled fluctuations can reveal pathways of communication used for information propagation in a molecule. Side-chain conformations are commonly represented by multivariate angular variables, but existing partial correlation methods that can be applied to this inference task are not capable of handling multivariate angular data. We propose a novel method to infer direct couplings from this type of data, and show that this method is useful for identifying functional regions and their interactions in allosteric proteins. Results: We developed a novel extension of canonical correlation analysis (CCA), which we call ‘kernelized partial CCA’ (or simply KPCCA), and used it to infer direct couplings between side chains, while disentangling these couplings from indirect ones. Using the conformational information and fluctuations of the inactive structure alone for allosteric proteins in the Ras and other Ras-like families, our method identified allosterically important residues not only as strongly coupled ones but also in densely connected regions of the interaction graph formed by the inferred couplings. Our results were in good agreement with other empirical findings. By studying distinct members of the Ras, Rho and Rab sub-families, we show further that KPCCA was capable of inferring common allosteric characteristics in the small G protein super-family. Availability and implementation: https://github.com/lsgh/ismb15 Contact: lsoltang@uwaterloo.ca
Affiliation(s)
- Laleh Soltan Ghoraie
- Department of Computer Science and Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, ON, Canada
- Forbes Burkowski
- Department of Computer Science and Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, ON, Canada
- Mu Zhu
- Department of Computer Science and Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, ON, Canada
|
445
|
MacCallum JL, Perez A, Dill KA. Determining protein structures by combining semireliable data with atomistic physical models by Bayesian inference. Proc Natl Acad Sci U S A 2015; 112:6985-90. [PMID: 26038552 PMCID: PMC4460504 DOI: 10.1073/pnas.1506788112] [Citation(s) in RCA: 120] [Impact Index Per Article: 12.0] [Indexed: 11/18/2022]
Abstract
More than 100,000 protein structures are now known at atomic detail. However, far more are not yet known, particularly among large or complex proteins. Often, experimental information is only semireliable because it is uncertain, limited, or confusing in important ways. Some experiments give sparse information, some give ambiguous or nonspecific information, and others give uncertain information, where some is right, some is wrong, but we don't know which. We describe a method called Modeling Employing Limited Data (MELD) that can harness such problematic information in a physics-based, Bayesian framework for improved structure determination. We apply MELD to eight proteins of known structure for which such problematic structural data are available, including a sparse NMR dataset, two ambiguous EPR datasets, and four uncertain datasets taken from sequence evolution data. MELD gives excellent structures, indicating its promise for experimental biomolecule structure determination where only semireliable data are available.
Affiliation(s)
- Justin L MacCallum
- Department of Chemistry, University of Calgary, Calgary, AB, Canada T2N 1N4
- Alberto Perez
- Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, NY 11794
- Ken A Dill
- Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, NY 11794; Departments of Chemistry and Physics, Stony Brook University, Stony Brook, NY 11794
|
446
|
Iserte J, Simonetti FL, Zea DJ, Teppa E, Marino-Buslje C. I-COMS: Interprotein-COrrelated Mutations Server. Nucleic Acids Res 2015; 43:W320-5. [PMID: 26032772 PMCID: PMC4489276 DOI: 10.1093/nar/gkv572] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Received: 02/27/2015] [Accepted: 05/21/2015] [Indexed: 12/31/2022]
Abstract
Interprotein contact prediction using multiple sequence alignments (MSAs) is a useful approach to help detect protein–protein interfaces. Different computational methods have been developed in recent years as an approximation to solve this problem. However, as there are discrepancies in the results provided by them, there is still no consensus on which is the best performing methodology. To address this problem, I-COMS (interprotein COrrelated Mutations Server) is presented. I-COMS estimates covariation between residues of different proteins using four different covariation methods. It provides a graphical and interactive output that helps compare results obtained using different methods. I-COMS automatically builds the required MSA for the calculation and produces a rich visualization of intraprotein and/or interprotein covarying positions in a circos representation. Furthermore, comparison between any two methods is available as well as the overlap between any or all four methodologies. In addition, as a complementary source of information, a matrix visualization of the corresponding scores is made available and the density plot distributions of the inter, intra and inter+intra scores are calculated. Finally, all the results can be downloaded (including MSAs, scores and graphics) for comparison and visualization and/or for further analysis.
Affiliation(s)
- Javier Iserte
- Fundación Instituto Leloir. Av. Patricias Argentinas 435, C1405BWE, Buenos Aires, Argentina
- Franco L Simonetti
- Fundación Instituto Leloir. Av. Patricias Argentinas 435, C1405BWE, Buenos Aires, Argentina
- Diego J Zea
- Fundación Instituto Leloir. Av. Patricias Argentinas 435, C1405BWE, Buenos Aires, Argentina
- Elin Teppa
- Fundación Instituto Leloir. Av. Patricias Argentinas 435, C1405BWE, Buenos Aires, Argentina
- Cristina Marino-Buslje
- Fundación Instituto Leloir. Av. Patricias Argentinas 435, C1405BWE, Buenos Aires, Argentina
|
447
|
Parisi G, Zea DJ, Monzon AM, Marino-Buslje C. Conformational diversity and the emergence of sequence signatures during evolution. Curr Opin Struct Biol 2015; 32:58-65. [DOI: 10.1016/j.sbi.2015.02.005] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.7] [Received: 11/24/2014] [Revised: 02/02/2015] [Accepted: 02/09/2015] [Indexed: 02/03/2023]
|
448
|
Wang Y, Barth P. Evolutionary-guided de novo structure prediction of self-associated transmembrane helical proteins with near-atomic accuracy. Nat Commun 2015; 6:7196. [PMID: 25995083 DOI: 10.1038/ncomms8196] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.0] [Received: 09/15/2014] [Accepted: 04/15/2015] [Indexed: 11/09/2022]
Abstract
How specific protein associations regulate the function of membrane receptors remains poorly understood. Conformational flexibility currently hinders the structure determination of several classes of membrane receptors and associated oligomers. Here we develop EFDOCK-TM, a general method to predict self-associated transmembrane protein helical (TMH) structures from sequence guided by co-evolutionary information. We show that accurate intermolecular contacts can be identified using a combination of protein sequence covariation and TMH binding surfaces predicted from sequence. When applied to diverse TMH oligomers, including receptors characterized in multiple conformational and functional states, the method reaches unprecedented near-atomic accuracy for most targets. Blind predictions of structurally uncharacterized receptor tyrosine kinase TMH oligomers provide a plausible hypothesis on the molecular mechanisms of disease-associated point mutations and binding surfaces for the rational design of selective inhibitors. The method sets the stage for uncovering novel determinants of molecular recognition and signalling in single-spanning eukaryotic membrane receptors.
Affiliation(s)
- Y Wang
- Structural and Computational Biology and Molecular Biophysics Graduate Program, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA
- P Barth
- [1] Structural and Computational Biology and Molecular Biophysics Graduate Program, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA [2] Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA [3] Department of Pharmacology, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA
|
449
|
Kamisetty H, Ghosh B, Langmead CJ, Bailey-Kellogg C. Learning sequence determinants of protein:protein interaction specificity with sparse graphical models. J Comput Biol 2015; 22:474-86. [PMID: 25973864 DOI: 10.1089/cmb.2014.0289] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 11/13/2022]
Abstract
In studying the strength and specificity of interaction between members of two protein families, key questions center on which pairs of possible partners actually interact, how well they interact, and why they interact while others do not. The advent of large-scale experimental studies of interactions between members of a target family and a diverse set of possible interaction partners offers the opportunity to address these questions. We develop here a method, DgSpi (data-driven graphical models of specificity in protein:protein interactions), for learning and using graphical models that explicitly represent the amino acid basis for interaction specificity (why) and extend earlier classification-oriented approaches (which) to predict the ΔG of binding (how well). We demonstrate the effectiveness of our approach in analyzing and predicting interactions between a set of 82 PDZ recognition modules against a panel of 217 possible peptide partners, based on data from MacBeath and colleagues. Our predicted ΔG values are highly predictive of the experimentally measured ones, reaching correlation coefficients of 0.69 in 10-fold cross-validation and 0.63 in leave-one-PDZ-out cross-validation. Furthermore, the model serves as a compact representation of amino acid constraints underlying the interactions, enabling protein-level ΔG predictions to be naturally understood in terms of residue-level constraints. Finally, the model DgSpi readily enables the design of new interacting partners, and we demonstrate that designed ligands are novel and diverse.
Affiliation(s)
- Bornika Ghosh
- Department of Computer Science, Dartmouth, Hanover, New Hampshire
|
450
|
Abstract
Recent developments in the analysis of amino acid covariation are leading to breakthroughs in protein structure prediction, protein design, and prediction of the interactome. It is assumed that observed patterns of covariation are caused by molecular coevolution, where substitutions at one site affect the evolutionary forces acting at neighboring sites. Our theoretical and empirical results cast doubt on this assumption. We demonstrate that the strongest coevolutionary signal is a decrease in evolutionary rate and that unfeasibly long times are required to produce coordinated substitutions. We find that covarying substitutions are mostly found on different branches of the phylogenetic tree, indicating that they are independent events that may or may not be attributable to coevolution. These observations undermine the hypothesis that molecular coevolution is the primary cause of the covariation signal. In contrast, we find that the pairs of residues with the strongest covariation signal tend to have low evolutionary rates, and that it is this low rate that gives rise to the covariation signal. Slowly evolving residue pairs are disproportionately located in the protein’s core, which explains covariation methods’ ability to detect pairs of residues that are close in three dimensions. These observations lead us to propose the “coevolution paradox”: The strength of coevolution required to cause coordinated changes means the evolutionary rate is so low that such changes are highly unlikely to occur. As modern covariation methods may lead to breakthroughs in structural genomics, it is critical to recognize their biases and limitations.
Affiliation(s)
- David Talavera
- Faculty of Life Sciences, University of Manchester, Manchester, United Kingdom
- Simon C Lovell
- Faculty of Life Sciences, University of Manchester, Manchester, United Kingdom
- Simon Whelan
- Faculty of Life Sciences, University of Manchester, Manchester, United Kingdom; Evolutionary Biology Centre, Department of Ecology and Genetics, Uppsala University, Uppsala, Sweden
|