301
Li B, Fooksa M, Heinze S, Meiler J. Finding the needle in the haystack: towards solving the protein-folding problem computationally. Crit Rev Biochem Mol Biol 2018; 53:1-28. [PMID: 28976219 PMCID: PMC6790072 DOI: 10.1080/10409238.2017.1380596] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 05/16/2017] [Revised: 08/22/2017] [Accepted: 09/13/2017] [Indexed: 12/22/2022]
Abstract
Prediction of protein tertiary structures from amino acid sequence and understanding the mechanisms of how proteins fold, collectively known as "the protein folding problem," has been a grand challenge in molecular biology for over half a century. Theories have been developed that provide us with an unprecedented understanding of protein folding mechanisms. However, computational simulation of protein folding is still difficult, and prediction of protein tertiary structure from amino acid sequence is an unsolved problem. Progress toward a satisfying solution has been slow due to challenges in sampling the vast conformational space and deriving sufficiently accurate energy functions. Nevertheless, several techniques and algorithms have been adopted to overcome these challenges, and the last two decades have seen exciting advances in enhanced sampling algorithms, computational power and tertiary structure prediction methodologies. This review aims at summarizing these computational techniques, specifically conformational sampling algorithms and energy approximations that have been frequently used to study protein-folding mechanisms or to de novo predict protein tertiary structures. We hope that this review can serve as an overview on how the protein-folding problem can be studied computationally and, in cases where experimental approaches are prohibitive, help the researcher choose the most relevant computational approach for the problem at hand. We conclude with a summary of current challenges faced and an outlook on potential future directions.
Affiliation(s)
- Bian Li
  - Department of Chemistry, Vanderbilt University, Nashville, TN, USA
  - Center for Structural Biology, Vanderbilt University, Nashville, TN, USA
- Michaela Fooksa
  - Center for Structural Biology, Vanderbilt University, Nashville, TN, USA
  - Chemical and Physical Biology Graduate Program, Vanderbilt University, Nashville, TN, USA
- Sten Heinze
  - Department of Chemistry, Vanderbilt University, Nashville, TN, USA
  - Center for Structural Biology, Vanderbilt University, Nashville, TN, USA
- Jens Meiler
  - Department of Chemistry, Vanderbilt University, Nashville, TN, USA
  - Center for Structural Biology, Vanderbilt University, Nashville, TN, USA
302
Kinjo AR. Cooperative "folding transition" in the sequence space facilitates function-driven evolution of protein families. J Theor Biol 2018; 443:18-27. [PMID: 29355538 DOI: 10.1016/j.jtbi.2018.01.019] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 08/27/2017] [Revised: 01/16/2018] [Accepted: 01/17/2018] [Indexed: 12/23/2022]
Abstract
In the protein sequence space, natural proteins form clusters of families which are characterized by their unique native folds whereas the great majority of random polypeptides are neither clustered nor foldable to unique structures. Since a given polypeptide can be either foldable or unfoldable, a kind of "folding transition" is expected at the boundary of a protein family in the sequence space. By Monte Carlo simulations of a statistical mechanical model of protein sequence alignment that coherently incorporates both short-range and long-range interactions as well as variable-length insertions to reproduce the statistics of the multiple sequence alignment of a given protein family, we demonstrate the existence of such transition between natural-like sequences and random sequences in the sequence subspaces for 15 domain families of various folds. The transition was found to be highly cooperative and two-state-like. Furthermore, enforcing or suppressing consensus residues on a few of the well-conserved sites enhanced or diminished, respectively, the natural-like pattern formation over the entire sequence. In most families, the key sites included ligand binding sites. These results suggest some selective pressure on the key residues, such as ligand binding activity, may cooperatively facilitate the emergence of a protein family during evolution. From a more practical aspect, the present results highlight an essential role of long-range effects in precisely defining protein families, which are absent in conventional sequence models.
Affiliation(s)
- Akira R Kinjo
  - Institute for Protein Research, Osaka University, 3-2 Yamadaoka, Suita, Osaka 565-0871, Japan
303
Abstract
β-Barrel membrane proteins (βMPs) play important roles, but knowledge of their structures is limited. We have developed a method to predict their 3D structures. We predict strand registers and construct transmembrane (TM) domains of βMPs accurately, including proteins for which no prediction has been attempted before. Our method also accurately predicts structures from protein families with a limited number of sequences and proteins with novel folds. An average main-chain rmsd of 3.48 Å is achieved between predicted and experimentally resolved structures of TM domains, which is a significant improvement ([Formula: see text]3 Å) over a recent study. For βMPs with NMR structures, the deviation between predictions and experimentally solved structures is similar to the difference among the NMR structures, indicating excellent prediction accuracy. Moreover, we can now accurately model the extended β-barrels and loops in non-TM domains, increasing the overall coverage of structure prediction by [Formula: see text]%. Our method is general and can be applied to genome-wide structural prediction of βMPs.
304
Adhikari B, Cheng J. CONFOLD2: improved contact-driven ab initio protein structure modeling. BMC Bioinformatics 2018; 19:22. [PMID: 29370750 PMCID: PMC5784681 DOI: 10.1186/s12859-018-2032-6] [Citation(s) in RCA: 47] [Impact Index Per Article: 6.7] [Received: 11/19/2017] [Accepted: 01/17/2018] [Indexed: 12/31/2022] Open
Abstract
Background: Contact-guided protein structure prediction methods are becoming more and more successful because of the latest advances in residue-residue contact prediction. To support contact-driven structure prediction, effective tools that can quickly build tertiary structural models of good quality from predicted contacts need to be developed. Results: We develop an improved contact-driven protein modelling method, CONFOLD2, and study how it may be effectively used for ab initio protein structure prediction with predicted contacts as input. It builds models using various subsets of input contacts to explore the fold space under the guidance of a soft square energy function, and then clusters the models to obtain the top five models. CONFOLD2 obtains an average reconstruction accuracy of 0.57 TM-score for the 150 proteins in the PSICOV contact prediction dataset. When benchmarked on the CASP11 contacts predicted using CONSIP2 and CASP12 contacts predicted using Raptor-X, CONFOLD2 achieves a mean TM-score of 0.41 on both datasets. Conclusion: CONFOLD2 quickly generates the top five structural models for a protein sequence when its predicted secondary structure and predicted contacts are at hand. The source code of CONFOLD2 is publicly available at https://github.com/multicom-toolbox/CONFOLD2/.
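The accuracies above are reported as TM-scores. As a rough illustration only (this is not CONFOLD2 code), the standard TM-score formula can be sketched as follows, assuming the per-residue distances come from an already fixed superposition of model and native structure; the real score is the maximum over superpositions. The function names and the 0.5 Å floor on the distance scale for very short chains are assumptions of this sketch.

```python
def d0_for_length(target_length):
    """Length-dependent distance scale (in angstroms) used by the TM-score."""
    if target_length <= 15:
        # Assumption: fall back to a small floor where the cube-root formula is undefined.
        return 0.5
    return max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score(distances, target_length):
    """TM-score for one fixed superposition.

    distances: distances (angstroms) between aligned residue pairs.
    target_length: number of residues in the native (target) structure.
    """
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    d0 = d0_for_length(target_length)
    # Each aligned pair contributes a value in (0, 1]; unaligned residues contribute 0.
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length


# A perfect model scores 1.0; aligning only half of the residues perfectly scores 0.5.
perfect = tm_score([0.0] * 150, 150)
half_aligned = tm_score([0.0] * 75, 150)
```

Because the sum is normalised by the target length, a model that covers only part of the target is penalised even if the covered part is exact.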
Affiliation(s)
- Badri Adhikari
  - Department of Mathematics and Computer Science, University of Missouri-St. Louis, St. Louis, 63121, MO, USA
- Jianlin Cheng
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, 65211, MO, USA
305
Liu Y, Palmedo P, Ye Q, Berger B, Peng J. Enhancing Evolutionary Couplings with Deep Convolutional Neural Networks. Cell Syst 2018; 6:65-74.e3. [PMID: 29275173 PMCID: PMC5808454 DOI: 10.1016/j.cels.2017.11.014] [Citation(s) in RCA: 82] [Impact Index Per Article: 11.7] [Received: 06/21/2017] [Revised: 10/04/2017] [Accepted: 11/22/2017] [Indexed: 12/21/2022]
Abstract
While genes are defined by sequence, in biological systems a protein's function is largely determined by its three-dimensional structure. Evolutionary information embedded within multiple sequence alignments provides a rich source of data for inferring structural constraints on macromolecules. Still, many proteins of interest lack sufficient numbers of related sequences, leading to noisy, error-prone residue-residue contact predictions. Here we introduce DeepContact, a convolutional neural network (CNN)-based approach that discovers co-evolutionary motifs and leverages these patterns to enable accurate inference of contact probabilities, particularly when few related sequences are available. DeepContact significantly improves performance over previous methods, including in the CASP12 blind contact prediction task where we achieved top performance with another CNN-based approach. Moreover, our tool converts hard-to-interpret coupling scores into probabilities, moving the field toward a consistent metric to assess contact prediction across diverse proteins. Through substantially improving the precision-recall behavior of contact prediction, DeepContact suggests we are near a paradigm shift in template-free modeling for protein structure prediction.
Affiliation(s)
- Yang Liu
  - Department of Computer Science, University of Illinois at Urbana-Champaign, Champaign, IL 61801, USA
- Perry Palmedo
  - Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA 02139, USA
  - Department of Mathematics, MIT, Cambridge, MA 02139, USA
  - Division of Medical Sciences, Harvard University, Cambridge, MA 02138, USA
- Qing Ye
  - Department of Computer Science, University of Illinois at Urbana-Champaign, Champaign, IL 61801, USA
- Bonnie Berger
  - Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA 02139, USA
  - Department of Mathematics, MIT, Cambridge, MA 02139, USA
- Jian Peng
  - Department of Computer Science, University of Illinois at Urbana-Champaign, Champaign, IL 61801, USA
306
Abstract
Covariance analysis of protein sequence alignments uses coevolving pairs of sequence positions to predict features of protein structure and function. However, current methods ignore the phylogenetic relationships between sequences, potentially corrupting the identification of covarying positions. Here, we use random matrix theory to demonstrate the existence of a power law tail that distinguishes the spectrum of covariance caused by phylogeny from that caused by structural interactions. The power law is essentially independent of the phylogenetic tree topology, depending on just two parameters: the sequence length and the average branch length. We demonstrate that these power law tails are ubiquitous in the large protein sequence alignments used to predict contacts in 3D structure, as predicted by our theory. This suggests that to decouple phylogenetic effects from the interactions between sequence-distal sites that control biological function, it is necessary to remove or down-weight the eigenvectors of the covariance matrix with the largest eigenvalues. We confirm that truncating these eigenvectors improves contact prediction.
307
Yin X, Yang J, Xiao F, Yang Y, Shen HB. MemBrain: An Easy-to-Use Online Webserver for Transmembrane Protein Structure Prediction. Nano-Micro Letters 2018; 10:2. [PMID: 30393651 PMCID: PMC6199043 DOI: 10.1007/s40820-017-0156-2] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Received: 07/11/2017] [Accepted: 08/26/2017] [Indexed: 05/12/2023]
Abstract
Membrane proteins are an important class of proteins embedded in cell membranes and play crucial roles in living organisms, for example as ion channels, transporters, and receptors. Because it is difficult to determine a membrane protein's structure by wet-lab experiments, accurate and fast amino acid sequence-based computational methods are highly desired. In this paper, we report an online prediction tool called MemBrain, whose input is the amino acid sequence. MemBrain consists of specialized modules for predicting transmembrane helices, residue-residue contacts and relative accessible surface area of α-helical membrane proteins. MemBrain achieves prediction accuracies of 97.9% (A_TMH) and 87.1% (A_P), with an N-score of 3.2 ± 3.0 and a C-score of 3.1 ± 2.8. MemBrain-Contact obtains 62% and 64.1% prediction accuracy for the top L/5 predicted contacts on the training and independent datasets, respectively. MemBrain-Rasa achieves a Pearson correlation coefficient of 0.733 and a mean absolute error of 13.593. These prediction results provide valuable hints for revealing the structure and function of membrane proteins. The MemBrain web server is free for academic use and available at www.csbio.sjtu.edu.cn/bioinf/MemBrain/.
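The "top L/5" contact accuracy quoted above is a ranking metric: the L/5 highest-scoring predicted residue pairs are checked against the true contacts, where L is the sequence length. A minimal sketch of that calculation, not taken from MemBrain, is given below; the function name, the input layout (pairs with scores) and the toy data are all assumptions for illustration.

```python
def top_l_over_5_precision(scored_pairs, true_contacts, seq_len):
    """Fraction of the top L/5 predicted pairs that are true contacts.

    scored_pairs: list of ((i, j), score) tuples, higher score = more confident.
    true_contacts: set of (i, j) pairs observed in the reference structure.
    seq_len: sequence length L.
    """
    k = max(1, seq_len // 5)
    ranked = sorted(scored_pairs, key=lambda item: item[1], reverse=True)[:k]
    if not ranked:
        return 0.0
    hits = sum(1 for pair, _score in ranked if pair in true_contacts)
    return hits / len(ranked)


# Toy case (hypothetical data): L = 10, so the top 2 predictions are evaluated.
predictions = [((1, 8), 0.9), ((2, 7), 0.8), ((3, 6), 0.1)]
reference = {(1, 8), (3, 6)}
precision = top_l_over_5_precision(predictions, reference, seq_len=10)
```

Only the ranking of the scores matters here, so the metric can be compared across methods whose raw scores are on different scales.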
Affiliation(s)
- Xi Yin
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai, 200240, People's Republic of China
  - Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, 200240, People's Republic of China
- Jing Yang
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai, 200240, People's Republic of China
  - Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, 200240, People's Republic of China
- Feng Xiao
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai, 200240, People's Republic of China
  - Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, 200240, People's Republic of China
- Yang Yang
  - Department of Computer Science, Shanghai Jiao Tong University, Shanghai, 200240, People's Republic of China
  - Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai, 200240, People's Republic of China
- Hong-Bin Shen
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai, 200240, People's Republic of China
  - Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, 200240, People's Republic of China
308
Prediction of Structures and Interactions from Genome Information. Advances in Experimental Medicine and Biology 2018; 1105:123-152. [DOI: 10.1007/978-981-13-2200-6_9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/27/2022]
309
Huang YJ, Brock KP, Sander C, Marks DS, Montelione GT. A Hybrid Approach for Protein Structure Determination Combining Sparse NMR with Evolutionary Coupling Sequence Data. Advances in Experimental Medicine and Biology 2018; 1105:153-169. [PMID: 30617828 DOI: 10.1007/978-981-13-2200-6_10] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 12/04/2022]
Abstract
While 3D structure determination of small (<15 kDa) proteins by solution NMR is largely automated and routine, structural analysis of larger proteins is more challenging. An emerging hybrid strategy for modeling protein structures combines sparse NMR data that can be obtained for larger proteins with sequence co-variation data, called evolutionary couplings (ECs), obtained from multiple sequence alignments of protein families. This hybrid "EC-NMR" method can be used to accurately model larger (15-60 kDa) proteins, and more rapidly determine structures of smaller (5-15 kDa) proteins using only backbone NMR data. The resulting structures have accuracies relative to reference structures comparable to those obtained with full backbone and sidechain NMR resonance assignments. The requirement that evolutionary couplings (ECs) are consistent with NMR data recorded on a specific member of a protein family, under specific conditions, potentially also allows identification of ECs that reflect alternative allosteric or excited states of the protein structure.
Affiliation(s)
- Yuanpeng Janet Huang
  - Center for Advanced Biotechnology and Medicine, Department of Molecular Biology and Biochemistry, Rutgers, The State University of New Jersey, Piscataway, NJ, USA
- Kelly P Brock
  - cBio Center, Dana-Farber Cancer Institute, Boston, MA, USA
- Chris Sander
  - Department of Cell Biology, Harvard Medical School, Boston, MA, USA
  - cBio Center, Dana-Farber Cancer Institute, Boston, MA, USA
- Debora S Marks
  - Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Gaetano T Montelione
  - Center for Advanced Biotechnology and Medicine, Department of Molecular Biology and Biochemistry, Rutgers, The State University of New Jersey, Piscataway, NJ, USA
310
Suplatov D, Sharapova Y, Timonina D, Kopylov K, Švedas V. The visualCMAT: A web-server to select and interpret correlated mutations/co-evolving residues in protein families. J Bioinform Comput Biol 2017; 16:1840005. [PMID: 29361894 DOI: 10.1142/s021972001840005x] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Indexed: 11/18/2022]
Abstract
The visualCMAT web-server was designed to assist experimental research in the fields of protein/enzyme biochemistry, protein engineering, and drug discovery by providing an intuitive and easy-to-use interface to the analysis of correlated mutations/co-evolving residues. Sequence and structural information describing homologous proteins is used to predict correlated substitutions with the mutual information-based CMAT approach; to classify them into spatially close co-evolving pairs (which either form a direct physical contact or interact with the same ligand, e.g. a substrate or a crystallographic water molecule) and long-range correlations; and to annotate and rank binding sites on the protein surface by the presence of statistically significant co-evolving positions. The results of visualCMAT are organized for convenient visual analysis and can be downloaded to a local computer as a content-rich all-in-one PyMol session file with multiple layers of annotation corresponding to bioinformatic, statistical and structural analyses of the predicted co-evolution, or further studied online using the built-in interactive analysis tools. The online interactivity is implemented in HTML5 and therefore neither plugins nor Java are required. The visualCMAT web-server is integrated with the Mustguseal web-server, which is capable of constructing large structure-guided sequence alignments of protein families and superfamilies using all available information about their structures and sequences in public databases. The visualCMAT web-server can be used to understand the relationship between structure and function in proteins, to select hotspots and compensatory mutations for rational design and directed evolution experiments aimed at producing novel enzymes with improved properties, and to study the mechanisms of selective ligand binding and allosteric communication between topologically independent sites in protein structures. The web-server is freely available at https://biokinet.belozersky.msu.ru/visualcmat and there are no login requirements.
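The abstract describes the CMAT approach as mutual information-based. As a hedged illustration of the underlying quantity only, the sketch below computes plain mutual information between two alignment columns; it is not the CMAT implementation, which (per the abstract) also involves statistical significance and structural classification. The function name, the log base (bits) and the toy columns are assumptions.

```python
import math
from collections import Counter


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two equal-length alignment columns."""
    if len(col_a) != len(col_b) or not col_a:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(col_a)
    freq_a = Counter(col_a)
    freq_b = Counter(col_b)
    freq_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), count in freq_ab.items():
        p_ab = count / n
        # Compare the joint frequency with what independence would predict.
        mi += p_ab * math.log2(p_ab / ((freq_a[a] / n) * (freq_b[b] / n)))
    return mi


# Toy columns (hypothetical): perfectly coupled versus independent positions.
coupled = mutual_information("AAGG", "LLVV")
independent = mutual_information("AAGG", "LVLV")
```

In the coupled pair every A co-occurs with L and every G with V, so knowing one column fully determines the other; in the independent pair the joint frequencies equal the product of the marginals.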
Affiliation(s)
- Dmitry Suplatov
  - Belozersky Institute of Physicochemical Biology, Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Leninskiye Gory 1-73, Moscow 119991, Russia
- Yana Sharapova
  - Belozersky Institute of Physicochemical Biology, Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Leninskiye Gory 1-73, Moscow 119991, Russia
- Daria Timonina
  - Belozersky Institute of Physicochemical Biology, Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Leninskiye Gory 1-73, Moscow 119991, Russia
- Kirill Kopylov
  - Belozersky Institute of Physicochemical Biology, Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Leninskiye Gory 1-73, Moscow 119991, Russia
- Vytas Švedas
  - Belozersky Institute of Physicochemical Biology, Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Leninskiye Gory 1-73, Moscow 119991, Russia
311
Kazlauskas D, Sezonov G, Charpin N, Venclovas Č, Forterre P, Krupovic M. Novel Families of Archaeo-Eukaryotic Primases Associated with Mobile Genetic Elements of Bacteria and Archaea. J Mol Biol 2017; 430:737-750. [PMID: 29198957 PMCID: PMC5862659 DOI: 10.1016/j.jmb.2017.11.014] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.9] [Received: 09/18/2017] [Revised: 11/22/2017] [Accepted: 11/23/2017] [Indexed: 11/15/2022]
Abstract
Cellular organisms in different domains of life employ structurally unrelated, non-homologous DNA primases for synthesis of a primer for DNA replication. Archaea and eukaryotes encode enzymes of the archaeo-eukaryotic primase (AEP) superfamily, whereas bacteria uniformly use primases of the DnaG family. However, AEP genes are widespread in bacterial genomes, raising questions regarding their provenance and function. Here, using an archaeal primase–polymerase PolpTN2 encoded by pTN2 plasmid as a seed for sequence similarity searches, we recovered over 800 AEP homologs from bacteria belonging to 12 highly diverse phyla. These sequences formed a supergroup, PrimPol-PV1, and could be classified into five novel AEP families which are characterized by a conserved motif containing an arginine residue likely to be involved in nucleotide binding. Functional assays confirm the essentiality of this motif for catalytic activity of the PolpTN2 primase–polymerase. Further analyses showed that bacterial AEPs display a range of domain organizations and uncovered several candidates for novel families of helicases. Furthermore, sequence and structure comparisons suggest that PriCT-1 and PriCT-2 domains frequently fused to the AEP domains are related to each other as well as to the non-catalytic, large subunit of archaeal and eukaryotic primases, and to the recently discovered PriX subunit of archaeal primases. Finally, genomic neighborhood analysis indicates that the identified AEPs encoded in bacterial genomes are nearly exclusively associated with highly diverse integrated mobile genetic elements, including integrative conjugative plasmids and prophages.
Highlights
- Primases of the archaeo-eukaryotic primase (AEP) superfamily are widespread in bacteria.
- We describe five new AEP families in bacteria belonging to 12 diverse phyla.
- The new AEP families display a conserved signature motif likely involved in nucleotide binding.
- The primase domains are fused to diverse functional domains, revealing new families of putative helicases.
- The novel primases are encoded within highly diverse integrated mobile genetic elements.
Affiliation(s)
- Darius Kazlauskas
  - Institute of Biotechnology, Vilnius University, Saulėtekio av. 7, Vilnius 10257, Lithuania
- Guennadi Sezonov
  - Sorbonne Universités, UPMC Université Paris 06, CNRS, UMR 7138 Evolution Paris Seine-Institut de Biologie Paris Seine, Paris 75005, France
- Nicole Charpin
  - Unité Biologie Moléculaire du Gène chez les Extrêmophiles, Department of Microbiology, Institut Pasteur, 25 rue du Docteur Roux, Paris 75015, France
- Česlovas Venclovas
  - Institute of Biotechnology, Vilnius University, Saulėtekio av. 7, Vilnius 10257, Lithuania
- Patrick Forterre
  - Unité Biologie Moléculaire du Gène chez les Extrêmophiles, Department of Microbiology, Institut Pasteur, 25 rue du Docteur Roux, Paris 75015, France
- Mart Krupovic
  - Unité Biologie Moléculaire du Gène chez les Extrêmophiles, Department of Microbiology, Institut Pasteur, 25 rue du Docteur Roux, Paris 75015, France
312
Hong SH, Joung I, Flores-Canales JC, Manavalan B, Cheng Q, Heo S, Kim JY, Lee SY, Nam M, Joo K, Lee IH, Lee SJ, Lee J. Protein structure modeling and refinement by global optimization in CASP12. Proteins 2017; 86 Suppl 1:122-135. [PMID: 29159837 DOI: 10.1002/prot.25426] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 07/18/2017] [Revised: 11/10/2017] [Accepted: 11/16/2017] [Indexed: 11/09/2022]
Abstract
For protein structure modeling in the CASP12 experiment, we have developed a new protocol based on our previous CASP11 approach. The global optimization method of conformational space annealing (CSA) was applied to 3 stages of modeling: multiple sequence-structure alignment, three-dimensional (3D) chain building, and side-chain re-modeling. For better template selection and model selection, we updated our model quality assessment (QA) method with the newly developed SVMQA (support vector machine for quality assessment). For 3D chain building, we updated our energy function by including restraints generated from predicted residue-residue contacts. New energy terms for the predicted secondary structure and predicted solvent accessible surface area were also introduced. For difficult targets, we proposed a new method, LEEab, where the template term played a less significant role than it did in LEE, complemented by increased contributions from other terms such as the predicted contact term. For TBM (template-based modeling) targets, LEE performed better than LEEab, but for FM targets, LEEab was better. For model refinement, we modified our CASP11 molecular dynamics (MD) based protocol by using explicit solvents and tuning down restraint weights. Refinement results from MD simulations that used a new augmented statistical energy term in the force field were quite promising. Finally, when using inaccurate information (such as the predicted contacts), it was important to use the Lorentzian function for which the maximal penalty arising from wrong information is always bounded.
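The closing point of this abstract, that a Lorentzian penalty stays bounded when restraint information is wrong, can be illustrated with a small comparison against a harmonic penalty. The abstract does not give the exact functional form used by the authors, so the form below (a penalty that saturates at `weight`), the parameter names and the 8 Å contact cutoff are assumptions made purely for illustration.

```python
def lorentzian_penalty(distance, cutoff=8.0, width=1.0, weight=1.0):
    """Bounded restraint penalty: zero inside the cutoff, saturating at `weight`.

    Assumed form: weight * x^2 / (1 + x^2), with x the scaled violation.
    """
    excess = max(0.0, distance - cutoff) / width
    return weight * excess ** 2 / (1.0 + excess ** 2)


def harmonic_penalty(distance, cutoff=8.0, width=1.0, weight=1.0):
    """Unbounded quadratic penalty on the same violation, for comparison."""
    excess = max(0.0, distance - cutoff) / width
    return weight * excess ** 2


# A predicted contact that is badly wrong (residues 100 angstroms apart):
# the Lorentzian penalty stays below `weight`, the harmonic one grows without bound.
bounded = lorentzian_penalty(100.0)
unbounded = harmonic_penalty(100.0)
```

With a bounded penalty, a single false-positive contact can cost at most `weight`, so it cannot dominate the energy and drag the model toward a wrong fold the way a quadratic term can.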
Affiliation(s)
- Seung Hwan Hong
  - Center for In Silico Protein Science, Korea Institute for Advanced Study, Seoul, South Korea
  - School of Computational Sciences, Korea Institute for Advanced Study, Seoul, South Korea
- InSuk Joung
  - Center for In Silico Protein Science, Korea Institute for Advanced Study, Seoul, South Korea
  - School of Computational Sciences, Korea Institute for Advanced Study, Seoul, South Korea
- Jose C Flores-Canales
  - Center for In Silico Protein Science, Korea Institute for Advanced Study, Seoul, South Korea
  - School of Computational Sciences, Korea Institute for Advanced Study, Seoul, South Korea
- Balachandran Manavalan
  - Center for In Silico Protein Science, Korea Institute for Advanced Study, Seoul, South Korea
  - School of Computational Sciences, Korea Institute for Advanced Study, Seoul, South Korea
- Qianyi Cheng
  - Center for In Silico Protein Science, Korea Institute for Advanced Study, Seoul, South Korea
  - School of Computational Sciences, Korea Institute for Advanced Study, Seoul, South Korea
- Seungryong Heo
  - Center for In Silico Protein Science, Korea Institute for Advanced Study, Seoul, South Korea
- Jong Yun Kim
  - Center for In Silico Protein Science, Korea Institute for Advanced Study, Seoul, South Korea
- Sun Young Lee
  - Center for In Silico Protein Science, Korea Institute for Advanced Study, Seoul, South Korea
- Mikyung Nam
  - Center for In Silico Protein Science, Korea Institute for Advanced Study, Seoul, South Korea
- Keehyoung Joo
  - Center for In Silico Protein Science, Korea Institute for Advanced Study, Seoul, South Korea
  - Center for Advanced Computation, Korea Institute for Advanced Study, Seoul, South Korea
- In-Ho Lee
  - Center for In Silico Protein Science, Korea Institute for Advanced Study, Seoul, South Korea
  - Korea Research Institute of Standards and Science (KRISS), Daejeon, South Korea
- Sung Jong Lee
  - Center for In Silico Protein Science, Korea Institute for Advanced Study, Seoul, South Korea
  - The Research Institute for Basic Sciences, Changwon National University, Changwon-Si, Gyeongsangnam-do, South Korea
- Jooyoung Lee
  - Center for In Silico Protein Science, Korea Institute for Advanced Study, Seoul, South Korea
  - School of Computational Sciences, Korea Institute for Advanced Study, Seoul, South Korea
  - Center for Advanced Computation, Korea Institute for Advanced Study, Seoul, South Korea
313
Thomas JMH, Simkovic F, Keegan R, Mayans O, Zhang C, Zhang Y, Rigden DJ. Approaches to ab initio molecular replacement of α-helical transmembrane proteins. Acta Crystallogr D Struct Biol 2017; 73:985-996. [PMID: 29199978 PMCID: PMC5713875 DOI: 10.1107/s2059798317016436] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 06/14/2017] [Accepted: 11/15/2017] [Indexed: 02/06/2023] Open
Abstract
α-Helical transmembrane proteins are a ubiquitous and important class of proteins, but present difficulties for crystallographic structure solution. Here, the effectiveness of the AMPLE molecular replacement pipeline in solving α-helical transmembrane-protein structures is assessed using a small library of eight ideal helices, as well as search models derived from ab initio models generated both with and without evolutionary contact information. The ideal helices prove to be surprisingly effective at solving higher resolution structures, but ab initio-derived search models are able to solve structures that could not be solved with the ideal helices. The addition of evolutionary contact information results in a marked improvement in the modelling and makes additional solutions possible.
Collapse
Affiliation(s)
- Jens M. H. Thomas
- Institute of Integrative Biology, University of Liverpool, Liverpool L69 7ZB, England
| | - Felix Simkovic
- Institute of Integrative Biology, University of Liverpool, Liverpool L69 7ZB, England
| | - Ronan Keegan
- Research Complex at Harwell, STFC Rutherford Appleton Laboratory, Didcot OX11 0FA, England
| | - Olga Mayans
- Fachbereich Biologie, Universität Konstanz, D-78457 Konstanz, Germany
| | - Chengxin Zhang
- Department of Computational Medicine and Bioinformatics, Department of Biological Chemistry, Medical School, University of Michigan, 100 Washtenaw Avenue, Ann Arbor, MI 48109-2218, USA
| | - Yang Zhang
- Department of Computational Medicine and Bioinformatics, Department of Biological Chemistry, Medical School, University of Michigan, 100 Washtenaw Avenue, Ann Arbor, MI 48109-2218, USA
| | - Daniel J. Rigden
- Institute of Integrative Biology, University of Liverpool, Liverpool L69 7ZB, England
| |
|
314
|
Meixenberger K, Yousef KP, Smith MR, Somogyi S, Fiedler S, Bartmeyer B, Hamouda O, Bannert N, von Kleist M, Kücherer C. Molecular evolution of HIV-1 integrase during the 20 years prior to the first approval of integrase inhibitors. Virol J 2017; 14:223. [PMID: 29137637 PMCID: PMC5686839 DOI: 10.1186/s12985-017-0887-1] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 04/12/2017] [Accepted: 10/31/2017] [Indexed: 12/12/2022] Open
Abstract
BACKGROUND: Detailed knowledge of the evolutionary potential of polymorphic sites in a viral protein is important for understanding the development of drug resistance in the presence of an inhibitor. We therefore set out to analyse the molecular evolution of the HIV-1 subtype B integrase at the inter-patient level in Germany during a 20-year period prior to the first introduction of integrase strand transfer inhibitors (INSTIs). METHODS: We determined 337 HIV-1 integrase subtype B sequences (amino acids 1-278) from stored plasma samples of antiretroviral treatment-naïve individuals newly diagnosed with HIV-1 between 1986 and 2006. Shannon entropy was calculated to determine the variability at each amino acid position. Time trends in the frequency of amino acid variants were identified by linear regression. Direct coupling analysis was applied to detect covarying sites. RESULTS: Twenty-two time trends in the frequency of amino acid variants demonstrated either single amino acid exchanges or variation in the degree of polymorphy. Covariation was observed for 17 amino acid variants with a temporal trend. Some minor INSTI resistance mutations (T124A, V151I, K156N, T206S, S230N) and some INSTI-selected mutations (M50I, L101I, T122I, T124N, T125A, M154I, G193E, V201I) were identified at overall frequencies >5%. Among these, the frequencies of L101I, T122I, and V201I increased over time, whereas the frequency of M154I decreased. Moreover, L101I, T122I, T124A, T125A, M154I, and V201I covaried with non-resistance-associated variants. CONCLUSIONS: Time-trending, covarying polymorphisms indicate that long-term evolutionary changes of the HIV-1 integrase involve defined clusters of possibly structurally or functionally associated sites independent of selective pressure through INSTIs at the inter-patient level. Linkage between polymorphic resistance- and non-resistance-associated sites can impact the selection of INSTI resistance mutations in complex ways. Identification of these sites can help in improving genotypic resistance assays, resistance prediction algorithms, and the development of new integrase inhibitors.
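The per-position Shannon entropy mentioned in the methods can be illustrated with a minimal sketch. The function names, the use of base-2 logarithms, the gap handling, and the toy alignment are all assumptions made for illustration; they are not taken from the study.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of one alignment column; gap characters are ignored."""
    residues = [aa for aa in column if aa != "-"]
    if not residues:
        return 0.0
    counts = Counter(residues)
    total = len(residues)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def alignment_entropy(sequences):
    """Per-position entropy for a list of equal-length aligned sequences."""
    return [column_entropy(col) for col in zip(*sequences)]


# Hypothetical toy alignment (not data from the paper).
toy_alignment = ["MTA", "MTA", "MSA", "MSV"]
profile = alignment_entropy(toy_alignment)
```

A fully conserved column gives zero entropy, and a column split evenly between two residues gives one bit, so higher values flag more variable positions.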
Affiliation(s)
| | - Kaveh Pouran Yousef
- Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin, Germany
| | - Maureen Rebecca Smith
- Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin, Germany
| | - Sybille Somogyi
- HIV and other Retroviruses, Robert Koch Institute, Berlin, Germany
| | - Stefan Fiedler
- HIV and other Retroviruses, Robert Koch Institute, Berlin, Germany
| | - Barbara Bartmeyer
- HIV/AIDS, STI and Blood-borne Infections, Robert Koch Institute, Berlin, Germany
| | - Osamah Hamouda
- HIV/AIDS, STI and Blood-borne Infections, Robert Koch Institute, Berlin, Germany
| | - Norbert Bannert
- HIV and other Retroviruses, Robert Koch Institute, Berlin, Germany
| | - Max von Kleist
- Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin, Germany
| | - Claudia Kücherer
- HIV and other Retroviruses, Robert Koch Institute, Berlin, Germany
| |
|
315
|
Zhang C, Mortuza SM, He B, Wang Y, Zhang Y. Template-based and free modeling of I-TASSER and QUARK pipelines using predicted contact maps in CASP12. Proteins 2017; 86 Suppl 1:136-151. [PMID: 29082551 DOI: 10.1002/prot.25414] [Citation(s) in RCA: 64] [Impact Index Per Article: 8.0] [Received: 07/10/2017] [Revised: 10/09/2017] [Accepted: 10/27/2017] [Indexed: 12/26/2022]
Abstract
We develop two complementary pipelines, "Zhang-Server" and "QUARK", based on the I-TASSER and QUARK pipelines for template-based modeling (TBM) and free modeling (FM), and test them in the CASP12 experiment. The combination of I-TASSER and QUARK successfully folds three medium-size FM targets that have more than 150 residues, even though the interplay between the two pipelines still awaits further optimization. Newly developed sequence-based contact prediction by NeBcon plays a critical role in enhancing the quality of the models produced by the new pipelines, particularly for FM targets. The inclusion of NeBcon-predicted contacts as restraints in the QUARK simulations results in an average TM-score of 0.41 for the best of the top five predicted models, which is 37% higher than that of the QUARK simulations without contacts. In particular, seven targets are converted from non-foldable to foldable (TM-score >0.5) by the use of contact restraints in the simulations. An additional feature of the current pipelines is the local structure quality prediction by ResQ, which provides a robust residue-level estimate of modeling error. Despite this success, significant challenges remain in ab initio modeling of multi-domain proteins and in folding β-proteins with complicated topologies bound by long-range strand-strand interactions. Improvements in domain boundary and long-range contact prediction, as well as optimal use of the predicted contacts and multiple threading alignments, are critical to address these issues seen in the CASP12 experiment.
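For readers unfamiliar with the TM-score thresholds quoted above, the standard definition can be written as follows. The symbol names are chosen here for illustration: $L_{\mathrm{target}}$ is the length of the target protein, $L_{\mathrm{ali}}$ the number of aligned residue pairs, and $d_i$ the distance between the $i$-th aligned pair after superposition.

```latex
\mathrm{TM\text{-}score} = \max\left[ \frac{1}{L_{\mathrm{target}}}
  \sum_{i=1}^{L_{\mathrm{ali}}} \frac{1}{1 + \left( d_i / d_0 \right)^2} \right],
\qquad
d_0 = 1.24\,\sqrt[3]{L_{\mathrm{target}} - 15} - 1.8
```

The maximum is taken over superpositions of the model onto the native structure. The score lies in (0, 1], and values above 0.5 are commonly read as indicating the same overall fold, which is the "foldable" cutoff used in the abstract.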
Affiliation(s)
- Chengxin Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan
| | - S M Mortuza
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan
| | - Baoji He
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan.,Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing, China
| | - Yanting Wang
- Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing, China
| | - Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan.,Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan
| |
|
316
|
Schmidt M, Hamacher K. Three-body interactions improve contact prediction within direct-coupling analysis. Phys Rev E 2017; 96:052405. [PMID: 29347718 DOI: 10.1103/physreve.96.052405] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 08/09/2017] [Indexed: 11/07/2022]
Abstract
The prediction of residue contacts in a protein solely from sequence information is a promising approach to computational structure prediction. Recent developments use statistical or information theoretic methods to extract contact information from a multiple sequence alignment. Despite good results, accuracy is limited due to usage of two-body interactions within a Potts model. In this paper we generalize this approach and propose a Hamiltonian with an additional three-body interaction term. We derive a mean-field approximation for inference of three-body couplings within a Potts model which is fast enough on modern computers. Finally, we show that our model has a higher accuracy in predicting residue contacts in comparison with the plain two-body-interaction model.
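As a rough sketch of the model class described above, a Potts Hamiltonian extended by a three-body term can be written in the following generic form. The notation ($h_i$, $J_{ij}$, $K_{ijk}$) and the sign convention are assumptions made here for illustration and may differ from those used in the paper.

```latex
H(\sigma) = -\sum_{i} h_i(\sigma_i)
            - \sum_{i<j} J_{ij}(\sigma_i, \sigma_j)
            - \sum_{i<j<k} K_{ijk}(\sigma_i, \sigma_j, \sigma_k),
\qquad
P(\sigma) = \frac{e^{-H(\sigma)}}{Z}
```

Here $\sigma = (\sigma_1, \dots, \sigma_L)$ is an aligned sequence of length $L$ and $Z$ is the partition function. Setting $K_{ijk} = 0$ recovers the plain two-body model, in which residue contacts are typically ranked by the strength of the inferred couplings $J_{ij}$.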
Affiliation(s)
- Michael Schmidt
- Department of Physics, TU Darmstadt, Karolinenpl. 5, 64289 Darmstadt, Germany
| | - Kay Hamacher
- Department of Biology and Department of Computer Science and Department of Physics, TU Darmstadt, Karolinenpl. 5, 64289 Darmstadt, Germany
| |
|
317
|
Tramontano A. The computational prediction of protein assemblies. Curr Opin Struct Biol 2017; 46:170-175. [PMID: 29102305 DOI: 10.1016/j.sbi.2017.10.006] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 01/03/2017] [Revised: 10/04/2017] [Accepted: 10/05/2017] [Indexed: 10/18/2022]
Abstract
The function of proteins in the cell is almost always mediated by their interaction with different partners, including other proteins, nucleic acids or small organic molecules. The ability to identify all of them is an essential step in our quest to understand life at the molecular level. Inferring the composition of a protein complex and its molecular details can also provide relevant clues for the development and design of drugs. In this short review, I discuss the computational aspects of the analysis and prediction of protein-protein assemblies and some of the most recent developments seen in the last Critical Assessment of Techniques for Protein Structure Prediction (CASP) experiment.
Affiliation(s)
- Anna Tramontano
- Physics Department, Sapienza University of Rome, Piazzale Aldo Moro, 5 I-00185 Roma, Italy; Istituto Pasteur - Fondazione Cenci Bolognetti, Sapienza University of Rome, Piazzale Aldo Moro, 5 I-00185 Roma, Italy
| |
|
318
|
Wozniak PP, Konopka BM, Xu J, Vriend G, Kotulska M. Forecasting residue-residue contact prediction accuracy. Bioinformatics 2017; 33:3405-3414. [PMID: 29036497 DOI: 10.1093/bioinformatics/btx416] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 02/27/2017] [Accepted: 06/22/2017] [Indexed: 11/14/2022] Open
Abstract
Motivation: Apart from meta-predictors, most of today's methods for residue-residue contact prediction are based entirely on Direct Coupling Analysis (DCA) of correlated mutations in multiple sequence alignments (MSAs). These methods are on average ∼40% correct for the 100 strongest predicted contacts in each protein. The end-user who works on a single protein of interest will not know if predictions are either much more or much less correct than 40%, which is especially a problem if contacts are predicted to steer experimental research on that protein. Results: We designed a regression model that forecasts the accuracy of residue-residue contact prediction for individual proteins with an average error of 7 percentage points. Contacts were predicted with two DCA methods (gplmDCA and PSICOV). The models were built on parameters that describe the MSA, the predicted secondary structure, the predicted solvent accessibility and the contact prediction scores for the target protein. Results show that our models can also be applied to the meta-methods, which was tested on RaptorX. Availability and implementation: All data and scripts are available from http://comprec-lin.iiar.pwr.edu.pl/dcaQ/. Contact: malgorzata.kotulska@pwr.edu.pl. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- P P Wozniak
- Department of Biomedical Engineering, Faculty of Fundamental Problems of Technology, Wroclaw University of Science and Technology, Wroclaw, Poland
| | - B M Konopka
- Department of Biomedical Engineering, Faculty of Fundamental Problems of Technology, Wroclaw University of Science and Technology, Wroclaw, Poland
| | - J Xu
- Toyota Technological Institute at Chicago, Chicago, IL 60637, USA
| | - G Vriend
- Centre for Molecular and Biomolecular Informatics, Radboud University Medical Centre, GA 6525, Nijmegen, The Netherlands
| | - M Kotulska
- Department of Biomedical Engineering, Faculty of Fundamental Problems of Technology, Wroclaw University of Science and Technology, Wroclaw, Poland
| |
|
319
|
Adhikari B, Hou J, Cheng J. Protein contact prediction by integrating deep multiple sequence alignments, coevolution and machine learning. Proteins 2017; 86 Suppl 1:84-96. [PMID: 29047157 DOI: 10.1002/prot.25405] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 06/23/2017] [Revised: 09/08/2017] [Accepted: 10/16/2017] [Indexed: 12/14/2022]
Abstract
In this study, we report the evaluation of the residue-residue contacts predicted by our three different methods in the CASP12 experiment, focusing on studying the impact of multiple sequence alignment, residue coevolution, and machine learning on contact prediction. The first method (MULTICOM-NOVEL) uses only traditional features (sequence profile, secondary structure, and solvent accessibility) with deep learning to predict contacts and serves as a baseline. The second method (MULTICOM-CONSTRUCT) uses our new alignment algorithm to generate deep multiple sequence alignment to derive coevolution-based features, which are integrated by a neural network method to predict contacts. The third method (MULTICOM-CLUSTER) is a consensus combination of the predictions of the first two methods. We evaluated our methods on 94 CASP12 domains. On a subset of 38 free-modeling domains, our methods achieved an average precision of up to 41.7% for top L/5 long-range contact predictions. The comparison of the three methods shows that the quality and effective depth of multiple sequence alignments, coevolution-based features, and machine learning integration of coevolution-based features and traditional features drive the quality of predicted protein contacts. On the full CASP12 dataset, the coevolution-based features alone can improve the average precision from 28.4% to 41.6%, and the machine learning integration of all the features further raises the precision to 56.3%, when top L/5 predicted long-range contacts are evaluated. And the correlation between the precision of contact prediction and the logarithm of the number of effective sequences in alignments is 0.66.
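The "top L/5 long-range precision" metric quoted above can be sketched as follows. This is an illustrative implementation under stated assumptions, not the assessors' code: the function name and data layout are hypothetical, and the default sequence-separation cutoff of 24 residues follows the usual CASP convention for long-range contacts.

```python
def top_l5_long_range_precision(scores, native_contacts, length, min_separation=24):
    """Precision of the top L/5 highest-scoring long-range predictions.

    scores: dict mapping residue pairs (i, j), with i < j, to a predicted score.
    native_contacts: set of (i, j) pairs that are true contacts.
    length: sequence length L of the protein or domain.
    """
    # Keep only pairs separated by at least `min_separation` positions.
    long_range = [(pair, s) for pair, s in scores.items()
                  if pair[1] - pair[0] >= min_separation]
    # Rank by predicted score, highest first.
    long_range.sort(key=lambda item: item[1], reverse=True)
    k = max(1, length // 5)
    top = [pair for pair, _ in long_range[:k]]
    if not top:
        return 0.0
    hits = sum(1 for pair in top if pair in native_contacts)
    return hits / len(top)


# Hypothetical toy example: L = 15, so the top 3 predictions are evaluated.
# A small separation cutoff is passed so the toy protein has long-range pairs.
toy_scores = {(0, 10): 0.9, (1, 9): 0.8, (2, 14): 0.7, (3, 12): 0.6, (4, 5): 0.99}
toy_native = {(0, 10), (2, 14), (3, 12), (4, 5)}
precision = top_l5_long_range_precision(toy_scores, toy_native, length=15, min_separation=6)
```

In the toy case the short-range pair (4, 5) is excluded despite its high score, and two of the three top-ranked long-range pairs are true contacts. When fewer than L/5 long-range predictions exist, this sketch divides by the number actually evaluated, which is one of several conventions.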
Affiliation(s)
- Badri Adhikari
- Department of Mathematics and Computer Science, University of Missouri-St. Louis, St. Louis, Missouri
| | - Jie Hou
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri
| | - Jianlin Cheng
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri
| |
|
320
|
Biomolecular coevolution and its applications: Going from structure prediction toward signaling, epistasis, and function. Biochem Soc Trans 2017; 45:1253-1261. [DOI: 10.1042/bst20170063] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 06/23/2017] [Revised: 08/30/2017] [Accepted: 09/04/2017] [Indexed: 01/01/2023]
Abstract
Evolution leads to considerable changes in the sequence of biomolecules, while their overall structure and function remain quite conserved. The wealth of genomic sequences that modern sequencing techniques provide, the 'Biological Big Data', allows us to investigate biomolecular evolution in unprecedented detail. Sophisticated statistical models can infer residue pair mutations resulting from spatial proximity. The introduction of predicted spatial adjacencies as constraints in biomolecular structure prediction workflows has transformed the field of protein and RNA structure prediction toward accuracies approaching the experimental resolution limit. Going beyond structure prediction, the same mathematical framework allows mimicking evolutionary fitness landscapes to infer signaling interactions, epistasis, or mutational landscapes.
|
321
|
Motoyama T, Nakano S, Yamamoto Y, Tokiwa H, Asano Y, Ito S. Product Release Mechanism Associated with Structural Changes in Monomeric l-Threonine 3-Dehydrogenase. Biochemistry 2017; 56:5758-5770. [DOI: 10.1021/acs.biochem.7b00832] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Indexed: 11/28/2022]
Affiliation(s)
- Tomoharu Motoyama
- Graduate Division of Nutritional and Environmental Sciences, University of Shizuoka, 52-1 Yada, Suruga-ku, Shizuoka 422-8526, Japan
| | - Shogo Nakano
- Graduate Division of Nutritional and Environmental Sciences, University of Shizuoka, 52-1 Yada, Suruga-ku, Shizuoka 422-8526, Japan
- Asano Active Enzyme Molecule Project, ERATO, JST, 5180 Kurokawa, Imizu, Toyama 939-0398, Japan
| | - Yuta Yamamoto
- Department of Chemistry, Rikkyo University, Nishi-ikebukuro, Toshimaku, Tokyo 171-8501, Japan
| | - Hiroaki Tokiwa
- Department of Chemistry, Rikkyo University, Nishi-ikebukuro, Toshimaku, Tokyo 171-8501, Japan
- Research Center of Smart Molecules, Rikkyo University, Nishi-ikebukuro, Toshimaku, Tokyo 171-8501, Japan
| | - Yasuhisa Asano
- Biotechnology Research Center and Department of Biotechnology, Toyama Prefectural University, 5180 Kurokawa, Imizu, Toyama 939-0398, Japan
- Asano Active Enzyme Molecule Project, ERATO, JST, 5180 Kurokawa, Imizu, Toyama 939-0398, Japan
| | - Sohei Ito
- Graduate Division of Nutritional and Environmental Sciences, University of Shizuoka, 52-1 Yada, Suruga-ku, Shizuoka 422-8526, Japan
- Asano Active Enzyme Molecule Project, ERATO, JST, 5180 Kurokawa, Imizu, Toyama 939-0398, Japan
| |
|
322
|
Avagyan V, Alonso AM, Nogales FJ. Improving the Graphical Lasso Estimation for the Precision Matrix Through Roots of the Sample Covariance Matrix. J Comput Graph Stat 2017. [DOI: 10.1080/10618600.2017.1340890] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
Affiliation(s)
- Vahe Avagyan
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Krijgslaan, Ghent, Belgium
| | - Andrés M. Alonso
- Department of Statistics and Director of Institute Flores de Lemus, Universidad Carlos III de Madrid, Madrid, Spain
| | - Francisco J. Nogales
- Department of Statistics and UC3M-BS Institute of Financial Big Data, Universidad Carlos III de Madrid, Madrid, Spain
| |
|
323
|
|
324
|
Drew K, Müller CL, Bonneau R, Marcotte EM. Identifying direct contacts between protein complex subunits from their conditional dependence in proteomics datasets. PLoS Comput Biol 2017; 13:e1005625. [PMID: 29023445 PMCID: PMC5638211 DOI: 10.1371/journal.pcbi.1005625] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 03/03/2017] [Accepted: 06/06/2017] [Indexed: 12/21/2022] Open
Abstract
Determining the three dimensional arrangement of proteins in a complex is highly beneficial for uncovering mechanistic function and interpreting genetic variation in coding genes comprising protein complexes. There are several methods for determining co-complex interactions between proteins, among them co-fractionation / mass spectrometry (CF-MS), but it remains difficult to identify directly contacting subunits within a multi-protein complex. Correlation analysis of CF-MS profiles shows promise in detecting protein complexes as a whole but is limited in its ability to infer direct physical contacts among proteins in sub-complexes. To identify direct protein-protein contacts within human protein complexes we learn a sparse conditional dependency graph from approximately 3,000 CF-MS experiments on human cell lines. We show substantial performance gains in estimating direct interactions compared to correlation analysis on a benchmark of large protein complexes with solved three-dimensional structures. We demonstrate the method’s value in determining the three dimensional arrangement of proteins by making predictions for complexes without known structure (the exocyst and tRNA multi-synthetase complex) and by establishing evidence for the structural position of a recently discovered component of the core human EKC/KEOPS complex, GON7/C14ORF142, providing a more complete 3D model of the complex. Direct contact prediction provides easily calculable additional structural information for large-scale protein complex mapping studies and should be broadly applicable across organisms as more CF-MS datasets become available.
Proteins physically associate into complexes in order to carry out the essential functions of life. Knowing how proteins are physically arranged three dimensionally in these complexes provides clues towards how they work. In principle, the associations between proteins in large-scale proteomics datasets should often reflect direct physical contacts between proteins in each complex. Here, we describe a statistical method to discover which subunits within complexes directly contact each other based on their co-purification behavior in published co-fractionation mass spectrometry datasets. Within our predictions, we recover many known protein-protein contacts, serving to validate our method, as well as unknown contacts that can inform future studies of these complexes. Specifically, we observe confident contacts between subunits within the exocyst and tRNA multi-synthetase complexes, two complexes that have incomplete structural information. Using our method, we further provide structural information for a previously missing subunit of the EKC/KEOPS complex. We anticipate that this method and the associated predictions will help to better inform our understanding of the functions and structures of diverse protein complexes.
Affiliation(s)
- Kevin Drew
- Center for Systems and Synthetic Biology, Department of Molecular Biosciences, University of Texas at Austin, Austin, TX, United States of America
- * E-mail: (KD); (CLM); (EMM)
| | - Christian L. Müller
- Flatiron Institute, Center for Computational Biology, Simons Foundation, New York, NY, United States of America
- * E-mail: (KD); (CLM); (EMM)
| | - Richard Bonneau
- Flatiron Institute, Center for Computational Biology, Simons Foundation, New York, NY, United States of America
- New York University Center for Genomics and Systems Biology, New York University, New York, NY, United States of America
| | - Edward M. Marcotte
- Center for Systems and Synthetic Biology, Department of Molecular Biosciences, University of Texas at Austin, Austin, TX, United States of America
- * E-mail: (KD); (CLM); (EMM)
| |
|
325
|
Buchan DWA, Jones DT. Improved protein contact predictions with the MetaPSICOV2 server in CASP12. Proteins 2017; 86 Suppl 1:78-83. [PMID: 28901583 PMCID: PMC5836854 DOI: 10.1002/prot.25379] [Citation(s) in RCA: 50] [Impact Index Per Article: 6.3] [Received: 07/04/2017] [Revised: 08/18/2017] [Accepted: 09/10/2017] [Indexed: 12/26/2022]
Abstract
In this paper, we present the results for the MetaPSICOV2 contact prediction server in the CASP12 community experiment (http://predictioncenter.org). Over the 35 assessed Free Modelling target domains the MetaPSICOV2 server achieved a mean precision of 43.27%, a substantial increase relative to the server's performance in the CASP11 experiment. In the following paper, we discuss improvements to the MetaPSICOV2 server, covering both changes to the neural network and attempts to integrate contact predictions on a domain basis into the prediction pipeline. We also discuss some limitations in the CASP12 assessment which may have overestimated the performance of our method.
Affiliation(s)
- Daniel W A Buchan
- Department of Computer Science, University College London, London, UK
| | - David T Jones
- Department of Computer Science, University College London, London, UK
| |
|
326
|
Wang S, Li Z, Yu Y, Xu J. Folding Membrane Proteins by Deep Transfer Learning. Cell Syst 2017; 5:202-211.e3. [PMID: 28957654 PMCID: PMC5637520 DOI: 10.1016/j.cels.2017.09.001] [Citation(s) in RCA: 45] [Impact Index Per Article: 5.6] [Received: 02/22/2017] [Revised: 06/01/2017] [Accepted: 08/29/2017] [Indexed: 01/02/2023]
Abstract
Computational elucidation of membrane protein (MP) structures is challenging, partly because too few solved structures are available for homology modeling. Here, we describe a high-throughput deep transfer learning method that first predicts MP contacts by learning from non-MPs and then predicts 3D structure models using the predicted contacts as distance restraints. Tested on 510 non-redundant MPs, our method has contact prediction accuracy at least 0.18 better than existing methods, predicts correct folds for 218 MPs, and generates 3D models with root-mean-square deviation (RMSD) less than 4 and 5 Å for 57 and 108 MPs, respectively. A rigorous blind test in the continuous automated model evaluation project shows that our method predicted high-resolution 3D models for two recent test MPs of 210 residues with RMSD ∼2 Å. We estimated that our method could predict correct folds for 1,345-1,871 reviewed human multi-pass MPs, including a few hundred new folds, which should facilitate the discovery of drugs targeting MPs.
Affiliation(s)
- Sheng Wang
- Toyota Technological Institute at Chicago, Chicago, IL 60637, USA; Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA; Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
| | - Zhen Li
- Toyota Technological Institute at Chicago, Chicago, IL 60637, USA; Department of Computer Science, University of Hong Kong, Hong Kong
| | - Yizhou Yu
- Department of Computer Science, University of Hong Kong, Hong Kong
| | - Jinbo Xu
- Toyota Technological Institute at Chicago, Chicago, IL 60637, USA.
| |
|
327
|
Sabzekar M, Naghibzadeh M, Eghdami M, Aydin Z. Protein β-sheet prediction using an efficient dynamic programming algorithm. Comput Biol Chem 2017; 70:142-155. [PMID: 28881217 DOI: 10.1016/j.compbiolchem.2017.08.011] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 03/05/2017] [Revised: 07/25/2017] [Accepted: 08/18/2017] [Indexed: 11/28/2022]
Abstract
Predicting the β-sheet structure of a protein is one of the most important intermediate steps towards the identification of its tertiary structure. However, it is regarded as the primary bottleneck due to the presence of non-local interactions between several discontinuous regions in β-sheets. To achieve reliable long-range interactions, a promising approach is to enumerate and rank all β-sheet conformations for a given protein and find the one with the highest score. The problem with this solution is that the search space grows exponentially with the number of β-strands. Additionally, brute-force calculation in this conformational space leads to a combinatorial explosion with intractable computational complexity. The main contribution of this paper is to generate and search the problem space efficiently in order to reduce the time complexity. To achieve this, two tree structures, called sheet-tree and grouping-tree, are proposed. They model the search space by breaking it into sub-problems. Then, an advanced dynamic programming algorithm is proposed that stores the intermediate results, avoids repeated calculation by reusing them efficiently in successive steps, and reduces the problem space by removing intermediate results that will no longer be required in later steps. As a consequence, the following contributions have been made. Firstly, more accurate β-sheet structures are found by searching all possible conformations, and secondly, the time complexity is reduced by searching the problem space efficiently, which makes the proposed method applicable to predicting β-sheet structures with a high number of β-strands. Experimental results on the BetaSheet916 dataset showed significant improvements of the proposed method in both execution time and prediction accuracy in comparison with state-of-the-art β-sheet structure prediction methods. Moreover, we investigate the effect of different contact map predictors on the performance of the proposed method using the BetaSheet1452 dataset. The source code is available at http://www.conceptsgate.com/BetaTop.rar.
Affiliation(s)
- Mostafa Sabzekar
- Department of Computer Engineering, Ferdowsi University of Mashhad, Mashhad, Iran
| | - Mahmoud Naghibzadeh
- Department of Computer Engineering, Ferdowsi University of Mashhad, Mashhad, Iran.
| | - Mahdie Eghdami
- Department of Computer Engineering, Ferdowsi University of Mashhad, Mashhad, Iran
| | - Zafer Aydin
- Department of Computer Engineering, Abdullah Gul University, Kayseri, Turkey
| |
|
328
|
Wang S, Sun S, Xu J. Analysis of deep learning methods for blind protein contact prediction in CASP12. Proteins 2017; 86 Suppl 1:67-77. [PMID: 28845538 DOI: 10.1002/prot.25377] [Citation(s) in RCA: 61] [Impact Index Per Article: 7.6] [Received: 06/14/2017] [Revised: 08/18/2017] [Accepted: 08/25/2017] [Indexed: 11/08/2022]
Abstract
Here we present the results of protein contact prediction achieved in CASP12 by our RaptorX-Contact server, which is an early implementation of our deep learning method for contact prediction. On a set of 38 free-modeling target domains with a median family size of around 58 effective sequences, our server obtained an average top L/5 long- and medium-range contact accuracy of 47% and 44%, respectively (L = length). A complete implementation has an average accuracy of 59% and 57%, respectively. Our deep learning method formulates contact prediction as a pixel-level image labeling problem and simultaneously predicts all residue pairs of a protein using a combination of two deep residual neural networks, taking as input the residue conservation information, predicted secondary structure and solvent accessibility, contact potential, and coevolution information. Our approach differs from existing methods mainly in (1) formulating contact prediction as a pixel-level image labeling problem instead of an image-level classification problem; (2) simultaneously predicting all contacts of an individual protein to make effective use of contact occurrence patterns; and (3) integrating both one-dimensional and two-dimensional deep convolutional neural networks to effectively learn complex sequence-structure relationship including high-order residue correlation. This paper discusses the RaptorX-Contact pipeline, both contact prediction and contact-based folding results, and finally the strength and weakness of our method.
Affiliation(s)
- Sheng Wang
- Toyota Technological Institute at Chicago, Chicago, Illinois
| | - Siqi Sun
- Toyota Technological Institute at Chicago, Chicago, Illinois
| | - Jinbo Xu
- Toyota Technological Institute at Chicago, Chicago, Illinois
| |
Collapse
|
329
|
Jing X, Dong Q, Lu R. RRCRank: a fusion method using rank strategy for residue-residue contact prediction. BMC Bioinformatics 2017; 18:390. [PMID: 28865433 PMCID: PMC5581475 DOI: 10.1186/s12859-017-1811-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 05/04/2017] [Accepted: 08/28/2017] [Indexed: 11/10/2022]
Abstract
Background In structural biology area, protein residue-residue contacts play a crucial role in protein structure prediction. Some researchers have found that the predicted residue-residue contacts could effectively constrain the conformational search space, which is significant for de novo protein structure prediction. In the last few decades, related researchers have developed various methods to predict residue-residue contacts, especially, significant performance has been achieved by using fusion methods in recent years. In this work, a novel fusion method based on rank strategy has been proposed to predict contacts. Unlike the traditional regression or classification strategies, the contact prediction task is regarded as a ranking task. First, two kinds of features are extracted from correlated mutations methods and ensemble machine-learning classifiers, and then the proposed method uses the learning-to-rank algorithm to predict contact probability of each residue pair. Results First, we perform two benchmark tests for the proposed fusion method (RRCRank) on CASP11 dataset and CASP12 dataset respectively. The test results show that the RRCRank method outperforms other well-developed methods, especially for medium and short range contacts. Second, in order to verify the superiority of ranking strategy, we predict contacts by using the traditional regression and classification strategies based on the same features as ranking strategy. Compared with these two traditional strategies, the proposed ranking strategy shows better performance for three contact types, in particular for long range contacts. Third, the proposed RRCRank has been compared with several state-of-the-art methods in CASP11 and CASP12. The results show that the RRCRank could achieve comparable prediction precisions and is better than three methods in most assessment metrics. 
Conclusions The learning-to-rank algorithm is introduced to develop a novel rank-based method for the residue-residue contact prediction of proteins, which achieves state-of-the-art performance based on the extensive assessment.
Affiliation(s)
- Xiaoyang Jing
  - School of Computer Science, Fudan University, Shanghai, 200433, People's Republic of China
- Qiwen Dong
  - School of Data Science and Engineering, East China Normal University, Shanghai, 200062, People's Republic of China
- Ruqian Lu
  - School of Computer Science, Fudan University, Shanghai, 200433, People's Republic of China
|
330
|
Buchan DWA, Jones DT. EigenTHREADER: analogous protein fold recognition by efficient contact map threading. Bioinformatics 2017; 33:2684-2690. [PMID: 28419258 PMCID: PMC5860056 DOI: 10.1093/bioinformatics/btx217] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.0] [Received: 10/18/2016] [Revised: 01/18/2017] [Accepted: 04/12/2017] [Indexed: 11/13/2022]
Abstract
MOTIVATION Protein fold recognition when appropriate, evolutionarily-related, structural templates can be identified is often trivial and may even be viewed as a solved problem. However, in cases where no homologous structural templates can be detected, fold recognition is a notoriously difficult problem (Moult et al., 2014). Here we present EigenTHREADER, a novel fold recognition method capable of identifying folds where no homologous structures can be identified. EigenTHREADER takes a query amino acid sequence, generates a map of intra-residue contacts, and then searches a library of contact maps of known structures. To allow the contact maps to be compared, we use eigenvector decomposition to resolve the principal eigenvectors; these can then be aligned using standard dynamic programming algorithms. The approach is similar to the Al-Eigen approach of Di Lena et al. (2010), but with improvements made both to speed and accuracy. With this search strategy, EigenTHREADER does not depend directly on sequence homology between the target protein and entries in the fold library to generate models. This in turn enables EigenTHREADER to correctly identify analogous folds where little or no sequence homology information is available. RESULTS EigenTHREADER outperforms well-established fold recognition methods such as pGenTHREADER and HHSearch in terms of True Positive Rate in the difficult task of analogous fold recognition. This should allow template-based modelling to be extended to many new protein families that were previously intractable to homology based fold recognition methods. AVAILABILITY AND IMPLEMENTATION All code used to generate these results and the computational protocol can be downloaded from https://github.com/DanBuchan/eigen_scripts. EigenTHREADER, the benchmark code and the data this paper is based on can be downloaded from: http://bioinfadmin.cs.ucl.ac.uk/downloads/eigenTHREADER/. CONTACT d.t.jones@ucl.ac.uk.
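The eigenvector-decomposition step described in this abstract can be sketched roughly as follows. This is not the EigenTHREADER code: the function name `principal_eigenvectors`, the default of three eigenvectors, and the choice to rank eigenvectors by absolute eigenvalue are all assumptions made for illustration, and the subsequent dynamic-programming alignment is not shown.

```python
import numpy as np

def principal_eigenvectors(contact_map, k=3):
    """Return the k eigenpairs of a contact map with the largest |eigenvalue|.

    contact_map : (L, L) array, 1 where residues i and j are in contact
    k           : number of eigenvectors to keep (assumed, not from the paper)
    """
    c = np.asarray(contact_map, dtype=float)
    c = (c + c.T) / 2.0                      # enforce symmetry before eigh
    vals, vecs = np.linalg.eigh(c)           # eigh handles symmetric matrices
    order = np.argsort(-np.abs(vals))[:k]    # strongest components first
    return vals[order], vecs[:, order]
```

Each retained eigenvector assigns one value per residue, so two proteins can be compared position by position with an ordinary sequence-style alignment over these per-residue profiles.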
Affiliation(s)
- Daniel W A Buchan
  - Department of Computer Science, University College London, Gower Street, London, UK
- David T Jones
  - Department of Computer Science, University College London, Gower Street, London, UK
|
331
|
Lopez T, Dalton K, Tomlinson A, Pande V, Frydman J. An information theoretic framework reveals a tunable allosteric network in group II chaperonins. Nat Struct Mol Biol 2017; 24:726-733. [PMID: 28741612 PMCID: PMC5986071 DOI: 10.1038/nsmb.3440] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 10/05/2016] [Accepted: 06/22/2017] [Indexed: 12/19/2022]
Abstract
ATP-dependent allosteric regulation of the ring-shaped group II chaperonins remains ill defined, in part because their complex oligomeric topology has limited the success of structural techniques in suggesting allosteric determinants. Further, their high sequence conservation has hindered the prediction of allosteric networks using mathematical covariation approaches. Here, we develop an information theoretic strategy that is robust to residue conservation and apply it to group II chaperonins. We identify a contiguous network of covarying residues that connects all nucleotide-binding pockets within each chaperonin ring. An interfacial residue between the networks of neighboring subunits controls positive cooperativity by communicating nucleotide occupancy within each ring. Strikingly, chaperonin allostery is tunable through single mutations at this position. Naturally occurring variants at this position that double the extent of positive cooperativity are less prevalent in nature. We propose that being less cooperative than attainable allows chaperonins to support robust folding over a wider range of metabolic conditions.
Affiliation(s)
- Tom Lopez
  - Department of Biology, Stanford University, Stanford, California, USA
- Kevin Dalton
  - Biophysics Program, Stanford University, Stanford, California, USA
- Anthony Tomlinson
  - Department of Biology, Stanford University, Stanford, California, USA
- Vijay Pande
  - Biophysics Program, Stanford University, Stanford, California, USA
  - Department of Chemistry, Stanford University, Stanford, California, USA
- Judith Frydman
  - Department of Biology, Stanford University, Stanford, California, USA
  - Biophysics Program, Stanford University, Stanford, California, USA
|
332
|
Adhikari B, Cheng J. Improved protein structure reconstruction using secondary structures, contacts at higher distance thresholds, and non-contacts. BMC Bioinformatics 2017; 18:380. [PMID: 28851269 PMCID: PMC5576353 DOI: 10.1186/s12859-017-1807-5] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 05/01/2017] [Accepted: 08/22/2017] [Indexed: 11/12/2022]
Abstract
Background Residue-residue contacts are key features for accurate de novo protein structure prediction. For the optimal utilization of these predicted contacts in folding proteins accurately, it is important to study the challenges of reconstructing protein structures using true contacts. Because the contact-guided protein modeling approach is valuable for predicting the folds of proteins that do not have structural templates, it is necessary for reconstruction studies to focus on hard-to-predict protein structures. Results Using a data set consisting of 496 structural domains released in recent CASP experiments and a dataset of 150 representative protein structures, in this work, we discuss three techniques to improve the reconstruction accuracy using true contacts: adding secondary structures, increasing contact distance thresholds, and adding non-contacts. We find that reconstruction using secondary structures and contacts can deliver accuracy higher than using full contact maps. Similarly, we demonstrate that non-contacts can improve reconstruction accuracy not only when the used non-contacts are true but also when they are predicted. On the dataset consisting of 150 proteins, we find that simply using low-ranked predicted contacts as non-contacts and adding them as additional restraints can increase the reconstruction accuracy by 5% when the reconstructed models are evaluated using TM-score. Conclusions Our findings suggest that secondary structures are invaluable companions of contacts for accurate reconstruction. Confirming some earlier findings, we also find that larger distance thresholds are useful for folding many protein structures which cannot be folded using the standard definition of contacts. Our findings also suggest that for more accurate reconstruction using predicted contacts it is useful to predict contacts at higher distance thresholds (beyond 8 Å) and predict non-contacts.
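Two of the ideas in this abstract, contact definitions at different distance thresholds and the reuse of low-ranked predictions as non-contacts, can be sketched as below. The helper names `contact_map` and `split_restraints`, the use of one representative atom per residue, and the way the cut-offs are passed in are assumptions for illustration, not the authors' pipeline.

```python
import numpy as np

def contact_map(coords, threshold=8.0, min_sep=1):
    """Boolean contact map from one representative atom per residue.

    coords    : (L, 3) array of coordinates in angstroms
    threshold : distance cut-off; 8.0 is the standard contact definition
    min_sep   : minimum sequence separation for a pair to count
    """
    coords = np.asarray(coords, dtype=float)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    idx = np.arange(len(coords))
    sep = np.abs(idx[:, None] - idx[None, :])
    return (dist < threshold) & (sep >= min_sep)

def split_restraints(ranked_pairs, n_contacts, n_noncontacts):
    """Split score-sorted predictions into contact and non-contact restraints.

    ranked_pairs : list of (i, j, score), sorted by descending score
    Returns the top n_contacts pairs as contacts and the bottom
    n_noncontacts pairs as non-contacts (hypothetical cut-offs).
    """
    contacts = [(i, j) for i, j, _ in ranked_pairs[:n_contacts]]
    tail_start = len(ranked_pairs) - n_noncontacts
    non_contacts = [(i, j) for i, j, _ in ranked_pairs[tail_start:]]
    return contacts, non_contacts
```

Raising `threshold` above 8.0 yields the denser maps the abstract refers to; the caller is responsible for keeping `n_contacts + n_noncontacts` below the number of ranked pairs so that the two restraint sets do not overlap.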
Affiliation(s)
- Badri Adhikari
  - Department of Mathematics and Computer Science, University of Missouri-St. Louis, St. Louis, MO, 63121, USA
- Jianlin Cheng
  - Department of Electrical Engineering & Computer Science, Informatics Institute, University of Missouri, Columbia, MO, 65211, USA
|
333
|
Exploring the Sequence-based Prediction of Folding Initiation Sites in Proteins. Sci Rep 2017; 7:8826. [PMID: 28821744 PMCID: PMC5562875 DOI: 10.1038/s41598-017-08366-3] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.3] [Received: 04/28/2017] [Accepted: 07/10/2017] [Indexed: 11/23/2022]
Abstract
Protein folding is a complex process that can lead to disease when it fails. Especially poorly understood are the very early stages of protein folding, which are likely defined by intrinsic local interactions between amino acids close to each other in the protein sequence. We here present EFoldMine, a method that predicts, from the primary amino acid sequence of a protein, which amino acids are likely involved in early folding events. The method is based on early folding data from hydrogen deuterium exchange (HDX) data from NMR pulsed labelling experiments, and uses backbone and sidechain dynamics as well as secondary structure propensities as features. The EFoldMine predictions give insights into the folding process, as illustrated by a qualitative comparison with independent experimental observations. Furthermore, on a quantitative proteome scale, the predicted early folding residues tend to become the residues that interact the most in the folded structure, and they are often residues that display evolutionary covariation. The connection of the EFoldMine predictions with both folding pathway data and the folded protein structure suggests that the initial statistical behavior of the protein chain with respect to local structure formation has a lasting effect on its subsequent states.
|
334
|
Zhu J, Zhang H, Li SC, Wang C, Kong L, Sun S, Zheng WM, Bu D. Improving protein fold recognition by extracting fold-specific features from predicted residue–residue contacts. Bioinformatics 2017; 33:3749-3757. [DOI: 10.1093/bioinformatics/btx514] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.9] [Received: 01/23/2017] [Accepted: 08/09/2017] [Indexed: 01/05/2023]
Affiliation(s)
- Jianwei Zhu
  - Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
  - University of Chinese Academy of Sciences, Beijing, China
- Haicang Zhang
  - Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
- Shuai Cheng Li
  - Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong
- Chao Wang
  - Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
- Lupeng Kong
  - Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
  - University of Chinese Academy of Sciences, Beijing, China
- Shiwei Sun
  - Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
- Wei-Mou Zheng
  - Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing, China
- Dongbo Bu
  - Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
|
335
|
Origins of coevolution between residues distant in protein 3D structures. Proc Natl Acad Sci U S A 2017; 114:9122-9127. [PMID: 28784799 DOI: 10.1073/pnas.1702664114] [Citation(s) in RCA: 129] [Impact Index Per Article: 16.1] [Indexed: 11/18/2022]
Abstract
Residue pairs that directly coevolve in protein families are generally close in protein 3D structures. Here we study the exceptions to this general trend (directly coevolving residue pairs that are distant in protein structures) to determine the origins of evolutionary pressure on spatially distant residues and to understand the sources of error in contact-based structure prediction. Over a set of 4,000 protein families, we find that 25% of directly coevolving residue pairs are separated by more than 5 Å in protein structures and 3% by more than 15 Å. The majority (91%) of directly coevolving residue pairs in the 5-15 Å range are found to be in contact in at least one homologous structure; these exceptions arise from structural variation in the family in the region containing the residues. Thirty-five percent of the exceptions greater than 15 Å are at homo-oligomeric interfaces, 19% arise from family structural variation, and 27% are in repeat proteins, likely reflecting alignment errors. Of the remaining long-range exceptions (<1% of the total number of coupled pairs), many can be attributed to close interactions in an oligomeric state. Overall, the results suggest that directly coevolving residue pairs not in repeat proteins are spatially proximal in at least one biologically relevant protein conformation within the family; we find little evidence for direct coupling between residues at spatially separated allosteric and functional sites or for increased direct coupling between residue pairs on putative allosteric pathways connecting them.
|
336
|
Computational studies of membrane proteins: from sequence to structure to simulation. Curr Opin Struct Biol 2017; 45:133-141. [DOI: 10.1016/j.sbi.2017.04.004] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 03/22/2017] [Revised: 04/07/2017] [Accepted: 04/07/2017] [Indexed: 11/19/2022]
|
337
|
Lam SD, Das S, Sillitoe I, Orengo C. An overview of comparative modelling and resources dedicated to large-scale modelling of genome sequences. Acta Crystallogr D Struct Biol 2017; 73:628-640. [PMID: 28777078 PMCID: PMC5571743 DOI: 10.1107/s2059798317008920] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.3] [Received: 11/17/2016] [Accepted: 06/14/2017] [Indexed: 12/02/2022]
Abstract
Computational modelling of proteins has been a major catalyst in structural biology. Bioinformatics groups have exploited the repositories of known structures to predict high-quality structural models with high efficiency at low cost. This article provides an overview of comparative modelling, reviews recent developments and describes resources dedicated to large-scale comparative modelling of genome sequences. The value of subclustering protein domain superfamilies to guide the template-selection process is investigated. Some recent cases in which structural modelling has aided experimental work to determine very large macromolecular complexes are also cited.
Affiliation(s)
- Su Datt Lam
  - Institute of Structural and Molecular Biology, UCL, Darwin Building, Gower Street, London WC1E 6BT, England
  - School of Biosciences and Biotechnology, Faculty of Science and Technology, University Kebangsaan Malaysia, 43600 Bangi, Selangor, Malaysia
- Sayoni Das
  - Institute of Structural and Molecular Biology, UCL, Darwin Building, Gower Street, London WC1E 6BT, England
- Ian Sillitoe
  - Institute of Structural and Molecular Biology, UCL, Darwin Building, Gower Street, London WC1E 6BT, England
- Christine Orengo
  - Institute of Structural and Molecular Biology, UCL, Darwin Building, Gower Street, London WC1E 6BT, England
|
338
|
Aledo JC. Inferring Methionine Sulfoxidation and serine Phosphorylation crosstalk from Phylogenetic analyses. BMC Evol Biol 2017; 17:171. [PMID: 28750604 PMCID: PMC5530960 DOI: 10.1186/s12862-017-1017-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 03/10/2017] [Accepted: 07/19/2017] [Indexed: 11/10/2022]
Abstract
Background The sulfoxidation of methionine residues within the phosphorylation motif of protein kinase substrates, may provide a mechanism to couple oxidative signals to changes in protein phosphorylation. Herein, we hypothesize that if the residues within a pair of phosphorylatable-sulfoxidable sites are functionally linked, then they might have been coevolving. To test this hypothesis a number of site pairs previously detected on human stress-related proteins has been subjected to analysis using eukaryote ortholog sequences and a phylogenetic approach. Results Overall, the results support the conclusion that in the eIF2α protein, serine phosphorylation at position 218 and methionine oxidation at position 222, belong to the same functional network. First, the observed data were much better fitted by Markovian models that assumed coevolution of both sites, with respect to their counterparts assuming independent evolution (p-value = 0.003). Second, this conclusion was robust with respect to the methods used to reconstruct the phylogenetic relationship between the 233 eukaryotic species analyzed. Third, the co-distribution of phosphorylatable and sulfoxidable residues at these positions showed multiple origins throughout the evolution of eukaryotes, which further supports the view of an adaptive value for this co-occurrence. Fourth, the possibility that the coevolution of these two sites might be due to structure-driven compensatory mutations was evaluated. The results suggested that factors other than those merely structural were behind the observed coevolution. Finally, the relationship detected between other modifiable site pairs from ataxin-2 (S814-M815), ataxin-2-like (S211-M215) and Pumilio homolog 1 (S124-M125), reinforce the view of a role for phosphorylation-sulfoxidation crosstalk. 
Conclusions For the four stress-related proteins analyzed herein, their respective pairs of PTM sites (phosphorylatable serine and sulfoxidable methionine) were found to be evolving in a correlated fashion, which suggests a relevant role for methionine sulfoxidation and serine phosphorylation crosstalk in the control of protein translation under stress conditions.
Affiliation(s)
- Juan Carlos Aledo
  - Departamento de Biología Molecular y Bioquímica, Facultad de Ciencias, Universidad de Málaga, 29071, Málaga, Spain
|
339
|
Abstract
Co-evolution techniques were originally conceived to assist in protein structure prediction by inferring pairs of residues that share spatial proximity. However, the functional relationships that can be extrapolated from co-evolution have also proven to be useful in a wide array of structural bioinformatics applications. These techniques are a powerful way to extract structural and functional information in a sequence-rich world.
|
340
|
Kinjo AR. Monte Carlo simulation of a statistical mechanical model of multiple protein sequence alignment. Biophys Physicobiol 2017; 14:99-110. [PMID: 28828285 PMCID: PMC5551269 DOI: 10.2142/biophysico.14.0_99] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 04/17/2017] [Accepted: 06/18/2017] [Indexed: 12/01/2022]
Abstract
A grand canonical Monte Carlo (MC) algorithm is presented for studying the lattice gas model (LGM) of multiple protein sequence alignment, which coherently combines long-range interactions and variable-length insertions. MC simulations are used for both parameter optimization of the model and production runs to explore the sequence subspace around a given protein family. In this Note, I describe the details of the MC algorithm as well as some preliminary results of MC simulations with various temperatures and chemical potentials, and compare them with the mean-field approximation. The existence of a two-state transition in the sequence space is suggested for the SH3 domain family, and inappropriateness of the mean-field approximation for the LGM is demonstrated.
Affiliation(s)
- Akira R Kinjo
  - Institute for Protein Research, Osaka University, Suita, Osaka 565-0871, Japan
|
341
|
Wang Y, Wang J, Wu S, Zhu H. The unexpected structures of hepatitis C virus envelope proteins. Exp Ther Med 2017; 14:1859-1865. [PMID: 28962094 PMCID: PMC5609170 DOI: 10.3892/etm.2017.4745] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 11/01/2015] [Accepted: 11/18/2016] [Indexed: 12/13/2022]
Abstract
Hepatitis C virus (HCV) envelope proteins are essential not only for maintaining the viral life cycle, but also for evading the host's immune response and in clinical intervention. A thorough understanding of HCV envelope proteins depends on the availability of detailed structural information. Two crystal structures of the E2 core portion and of the E2 ectodomain, and one structure of the N-terminus of E1 ectodomain have shed new light on the complexity of HCV envelope proteins. In addition, the full-length E1-E2 complex has recently been modeled. The present review focuses on these advancements, introduces the recently solved structures and their biological implications and proposes novel ideas for studying the full-length E1-E2 complex.
Affiliation(s)
- Yunyun Wang
  - State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Disease, School of Medicine, The First Affiliated Hospital of Zhejiang University, Hangzhou, Zhejiang 310003, P.R. China
- Jing Wang
  - State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Disease, School of Medicine, The First Affiliated Hospital of Zhejiang University, Hangzhou, Zhejiang 310003, P.R. China
- Shanshan Wu
  - State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Disease, School of Medicine, The First Affiliated Hospital of Zhejiang University, Hangzhou, Zhejiang 310003, P.R. China
- Haihong Zhu
  - State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Disease, School of Medicine, The First Affiliated Hospital of Zhejiang University, Hangzhou, Zhejiang 310003, P.R. China
|
342
|
Kassem MM, Wang Y, Boomsma W, Lindorff-Larsen K. Structure of the Bacterial Cytoskeleton Protein Bactofilin by NMR Chemical Shifts and Sequence Variation. Biophys J 2016; 110:2342-2348. [PMID: 27276252 DOI: 10.1016/j.bpj.2016.04.039] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 10/07/2015] [Revised: 04/19/2016] [Accepted: 04/21/2016] [Indexed: 12/28/2022]
Abstract
Bactofilins constitute a recently discovered class of bacterial proteins that form cytoskeletal filaments. They share a highly conserved domain (DUF583) of which the structure remains unknown, in part due to the large size and noncrystalline nature of the filaments. Here, we describe the atomic structure of a bactofilin domain from Caulobacter crescentus. To determine the structure, we developed an approach that combines a biophysical model for proteins with recently obtained solid-state NMR spectroscopy data and amino acid contacts predicted from a detailed analysis of the evolutionary history of bactofilins. Our structure reveals a triangular β-helical (solenoid) conformation with conserved residues forming the tightly packed core and polar residues lining the surface. The repetitive structure explains the presence of internal repeats as well as strongly conserved positions, and is reminiscent of other fibrillar proteins. Our work provides a structural basis for future studies of bactofilin biology and for designing molecules that target them, as well as a starting point for determining the organization of the entire bactofilin filament. Finally, our approach presents new avenues for determining structures that are difficult to obtain by traditional means.
Affiliation(s)
- Maher M Kassem
  - Structural Biology and NMR Laboratory, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Yong Wang
  - Structural Biology and NMR Laboratory, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Wouter Boomsma
  - Structural Biology and NMR Laboratory, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Kresten Lindorff-Larsen
  - Structural Biology and NMR Laboratory, Department of Biology, University of Copenhagen, Copenhagen, Denmark
|
343
|
Stahl K, Schneider M, Brock O. EPSILON-CP: using deep learning to combine information from multiple sources for protein contact prediction. BMC Bioinformatics 2017; 18:303. [PMID: 28623886 PMCID: PMC5474060 DOI: 10.1186/s12859-017-1713-x] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Received: 09/27/2016] [Accepted: 05/30/2017] [Indexed: 01/12/2023]
Abstract
BACKGROUND Accurately predicted contacts make it possible to compute the 3D structure of a protein. Since the solution space of native residue-residue contact pairs is very large, it is necessary to leverage information to identify relevant regions of the solution space, i.e. correct contacts. Every additional source of information can contribute to narrowing down candidate regions. Therefore, recent methods combined evolutionary and sequence-based information as well as evolutionary and physicochemical information. We develop a new contact predictor (EPSILON-CP) that goes beyond current methods by combining evolutionary, physicochemical, and sequence-based information. The problems resulting from the increased dimensionality and complexity of the learning problem are combated with a careful feature analysis, which results in a drastically reduced feature set. The different information sources are combined using deep neural networks. RESULTS On 21 hard CASP11 FM targets, EPSILON-CP achieves a mean precision of 35.7% for top-L/10 predicted long-range contacts, which is 11% better than the CASP11 winning version of MetaPSICOV. The improvement on 1.5L is 17%. Furthermore, in this study we find that the amino acid composition, a commonly used feature, is rendered ineffective in the context of meta approaches. The size of the refined feature set decreased by 75%, enabling a significant increase in training data for machine learning, contributing significantly to the observed improvements. CONCLUSIONS Exploiting as much and as diverse information as possible is key to accurate contact prediction. Simply merging the information introduces new challenges. Our study suggests that critical feature analysis can improve the performance of contact prediction methods that combine multiple information sources. EPSILON-CP is available as a webservice: http://compbio.robotics.tu-berlin.de/epsilon/.
Affiliation(s)
- Kolja Stahl
  - Robotics and Biology Laboratory, Department of Electrical Engineering and Computer Science, Technische Universität Berlin, Marchstraße 23, Berlin, 10587 Germany
- Michael Schneider
  - Robotics and Biology Laboratory, Department of Electrical Engineering and Computer Science, Technische Universität Berlin, Marchstraße 23, Berlin, 10587 Germany
- Oliver Brock
  - Robotics and Biology Laboratory, Department of Electrical Engineering and Computer Science, Technische Universität Berlin, Marchstraße 23, Berlin, 10587 Germany
|
344
|
Burnley T, Palmer CM, Winn M. Recent developments in the CCP-EM software suite. Acta Crystallogr D Struct Biol 2017; 73:469-477. [PMID: 28580908 PMCID: PMC5458488 DOI: 10.1107/s2059798317007859] [Citation(s) in RCA: 243] [Impact Index Per Article: 30.4] [Received: 05/14/2017] [Accepted: 05/26/2017] [Indexed: 11/13/2023]
Abstract
As part of its remit to provide computational support to the cryo-EM community, the Collaborative Computational Project for Electron cryo-Microscopy (CCP-EM) has produced a software framework which enables easy access to a range of programs and utilities. The resulting software suite incorporates contributions from different collaborators by encapsulating them in Python task wrappers, which are then made accessible via a user-friendly graphical user interface as well as a command-line interface suitable for scripting. The framework includes tools for project and data management. An overview of the design of the framework is given, together with a survey of the functionality at different levels. The current CCP-EM suite has particular strength in the building and refinement of atomic models into cryo-EM reconstructions, which is described in detail.
Affiliation(s)
- Tom Burnley
- Scientific Computing Department, Science and Technology Facilities Council, Research Complex at Harwell, Didcot OX11 0FA, England
| | - Colin M Palmer
- Scientific Computing Department, Science and Technology Facilities Council, Research Complex at Harwell, Didcot OX11 0FA, England
| | - Martyn Winn
- Scientific Computing Department, Science and Technology Facilities Council, Research Complex at Harwell, Didcot OX11 0FA, England
|
345
|
Flynn WF, Haldane A, Torbett BE, Levy RM. Inference of Epistatic Effects Leading to Entrenchment and Drug Resistance in HIV-1 Protease. Mol Biol Evol 2017; 34:1291-1306. [PMID: 28369521 PMCID: PMC5435099 DOI: 10.1093/molbev/msx095] [Citation(s) in RCA: 41] [Impact Index Per Article: 5.1] [Indexed: 12/27/2022] Open
Abstract
Understanding the complex mutation patterns that give rise to drug resistant viral strains provides a foundation for developing more effective treatment strategies for HIV/AIDS. Multiple sequence alignments of drug-experienced HIV-1 protease sequences contain networks of many pair correlations which can be used to build a (Potts) Hamiltonian model of these mutation patterns. Using this Hamiltonian model, we translate HIV-1 protease sequence covariation data into quantitative predictions for the probability of observing specific mutation patterns which are in agreement with the observed sequence statistics. We find that the statistical energies of the Potts model are correlated with the fitness of individual proteins containing therapy-associated mutations as estimated by in vitro measurements of protein stability and viral infectivity. We show that the penalty for acquiring primary resistance mutations depends on the epistatic interactions with the sequence background. Primary mutations which lead to drug resistance can become highly advantageous (or entrenched) by the complex mutation patterns which arise in response to drug therapy despite being destabilizing in the wildtype background. Anticipating epistatic effects is important for the design of future protease inhibitor therapies.
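As a rough illustration of the "statistical energy" of a sequence under a Potts model, the sketch below sums per-site field terms and pairwise coupling terms. The sign convention (lower energy corresponds to higher probability, with P(S) proportional to exp(-E(S))), the nested-list parameter layout, and all names are assumptions for illustration only; the abstract does not specify the authors' parameterization or inference procedure.

```python
def potts_energy(seq, h, J):
    """Statistical energy E(S) = -sum_i h[i][s_i] - sum_{i<j} J[i][j][s_i][s_j].

    seq : list of integer residue codes, one per alignment position
    h   : h[i][a], field for residue a at position i (hypothetical layout)
    J   : J[i][j][a][b], coupling between residue a at i and b at j,
          read only for i < j (hypothetical layout)
    """
    n = len(seq)
    energy = 0.0
    for i in range(n):
        # Single-site contribution.
        energy -= h[i][seq[i]]
        for j in range(i + 1, n):
            # Pairwise (epistatic) contribution.
            energy -= J[i][j][seq[i]][seq[j]]
    return energy
```

Under this convention, the energy difference between a mutant and its background sequence depends on the coupling terms, which is one way to see how the cost of a mutation can vary with the sequence background.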
Affiliation(s)
- William F. Flynn
- Department of Physics and Astronomy, Rutgers University, New Brunswick, NJ
- Center for Biophysics and Computational Biology, Temple University, Philadelphia, PA
| | - Allan Haldane
- Center for Biophysics and Computational Biology, Temple University, Philadelphia, PA
- Department of Chemistry, Temple University, Philadelphia, PA
| | - Bruce E. Torbett
- Department of Molecular and Experimental Medicine, The Scripps Research Institute, La Jolla, CA
| | - Ronald M. Levy
- Center for Biophysics and Computational Biology, Temple University, Philadelphia, PA
- Department of Chemistry, Temple University, Philadelphia, PA
|
346
|
Teixeira PL, Mendenhall JL, Heinze S, Weiner B, Skwark MJ, Meiler J. Membrane protein contact and structure prediction using co-evolution in conjunction with machine learning. PLoS One 2017; 12:e0177866. [PMID: 28542325 PMCID: PMC5443516 DOI: 10.1371/journal.pone.0177866] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 06/26/2016] [Accepted: 05/04/2017] [Indexed: 11/18/2022] Open
Abstract
De novo membrane protein structure prediction is limited to small proteins because the conformational search space expands quickly with length. Long-range contacts (24+ amino acid separation), that is, residue positions distant in sequence but in close proximity in the structure, are arguably the most effective way to restrict this conformational space. Inverse methods for co-evolutionary analysis predict a global set of position-pair couplings that best explain the observed amino acid co-occurrences, thus distinguishing between evolutionarily explained co-variances and those arising from spurious transitive effects. Here, we show that applying machine learning approaches and custom descriptors improves evolutionary contact prediction accuracy, resulting in improvement of average precision by 6 percentage points for the top 1L non-local contacts. Further, we demonstrate that predicted contacts improve protein folding with BCL::Fold. The mean RMSD100 metric for the top 10 models folded was reduced by an average of 2 Å for a benchmark of 25 membrane proteins.
Affiliation(s)
- Pedro L. Teixeira
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America
| | - Jeff L. Mendenhall
- Department of Chemistry, Center for Structural Biology, Vanderbilt University, Nashville Tennessee, United States of America
| | - Sten Heinze
- Department of Chemistry, Center for Structural Biology, Vanderbilt University, Nashville Tennessee, United States of America
| | - Brian Weiner
- Department of Chemistry, Center for Structural Biology, Vanderbilt University, Nashville Tennessee, United States of America
| | - Marcin J. Skwark
- Department of Chemistry, Center for Structural Biology, Vanderbilt University, Nashville Tennessee, United States of America
| | - Jens Meiler
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America
- Department of Chemistry, Center for Structural Biology, Vanderbilt University, Nashville Tennessee, United States of America
|
347
|
van den Bergh T, Tamo G, Nobili A, Tao Y, Tan T, Bornscheuer UT, Kuipers RKP, Vroling B, de Jong RM, Subramanian K, Schaap PJ, Desmet T, Nidetzky B, Vriend G, Joosten HJ. CorNet: Assigning function to networks of co-evolving residues by automated literature mining. PLoS One 2017; 12:e0176427. [PMID: 28545124 PMCID: PMC5436653 DOI: 10.1371/journal.pone.0176427] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 05/15/2016] [Accepted: 12/12/2016] [Indexed: 12/30/2022] Open
Abstract
CorNet is a web-based tool for the analysis of co-evolving residue positions in protein super-family sequence alignments. CorNet projects external information, such as mutation data extracted from the literature, onto interactively displayed groups of co-evolving residue positions to shed light on the functions associated with these groups and the residues in them. We used CorNet to analyse six enzyme super-families and found that groups of strongly co-evolving residues tend to consist of residues involved in the same function, such as activity, specificity, co-factor binding, or enantioselectivity. This finding allows a function to be assigned to residues for which no data are yet available in the literature. A mutant library was designed to mutate residues observed in a group of co-evolving residues predicted to be involved in enantioselectivity, but for which no literature data are available yet. The resulting set of mutations indeed showed many instances of increased enantioselectivity.
Affiliation(s)
- Tom van den Bergh
- Bio-Prodict, Nijmegen, The Netherlands
- Laboratory of Systems and Synthetic Biology, Wageningen University, Wageningen, The Netherlands
| | | | - Alberto Nobili
- Institute of Biochemistry, Department of Biotechnology & Enzyme Catalysis, Greifswald University, Greifswald, Germany
| | - Yifeng Tao
- Institute of Biochemistry, Department of Biotechnology & Enzyme Catalysis, Greifswald University, Greifswald, Germany
- Beijing Key Lab of Bioprocess, Beijing University of Chemical Technology, Chaoyang, Beijing, China
| | - Tianwei Tan
- Beijing Key Lab of Bioprocess, Beijing University of Chemical Technology, Chaoyang, Beijing, China
| | - Uwe T. Bornscheuer
- Institute of Biochemistry, Department of Biotechnology & Enzyme Catalysis, Greifswald University, Greifswald, Germany
| | | | | | | | | | - Peter J. Schaap
- Laboratory of Systems and Synthetic Biology, Wageningen University, Wageningen, The Netherlands
| | - Tom Desmet
- Centre for Industrial Biotechnology and Biocatalysis, Ghent University, Ghent, Belgium
| | - Bernd Nidetzky
- Institute of Biotechnology and Biochemical Engineering, Graz University of Technology, Graz, Austria
| | | | - Henk-Jan Joosten
- Bio-Prodict, Nijmegen, The Netherlands
- CMBI, Radboudumc, Nijmegen, The Netherlands
|
348
|
Xiong D, Zeng J, Gong H. A deep learning framework for improving long-range residue–residue contact prediction using a hierarchical strategy. Bioinformatics 2017; 33:2675-2683. [DOI: 10.1093/bioinformatics/btx296] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.5] [Received: 10/18/2016] [Accepted: 05/02/2017] [Indexed: 12/31/2022] Open
Affiliation(s)
- Dapeng Xiong
- MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing, China
- Beijing Innovation Center of Structural Biology, Tsinghua University, Beijing, China
| | - Jianyang Zeng
- Beijing Innovation Center of Structural Biology, Tsinghua University, Beijing, China
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
| | - Haipeng Gong
- MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing, China
- Beijing Innovation Center of Structural Biology, Tsinghua University, Beijing, China
|
349
|
Simkovic F, Ovchinnikov S, Baker D, Rigden DJ. Applications of contact predictions to structural biology. IUCRJ 2017; 4:291-300. [PMID: 28512576 PMCID: PMC5414403 DOI: 10.1107/s2052252517005115] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Received: 12/12/2016] [Accepted: 04/03/2017] [Indexed: 06/07/2023]
Abstract
Evolutionary pressure on residue interactions, intramolecular or intermolecular, that are important for protein structure or function can lead to covariance between the two positions. Recent methodological advances allow much more accurate contact predictions to be derived from this evolutionary covariance signal. The practical application of contact predictions has largely been confined to structural bioinformatics, yet, as this work seeks to demonstrate, the data can be of enormous value to the structural biologist working in X-ray crystallography, cryo-EM or NMR. Integrative structural bioinformatics packages such as Rosetta can already exploit contact predictions in a variety of ways. The contribution of contact predictions begins at construct design, where structural domains may need to be expressed separately and contact predictions can help to predict domain limits. Structure solution by molecular replacement (MR) benefits from contact predictions in diverse ways: in difficult cases, more accurate search models can be constructed using ab initio modelling when predictions are available, while intermolecular contact predictions can allow the construction of larger, oligomeric search models. Furthermore, MR using supersecondary motifs or large-scale screens against the PDB can exploit information, such as the parallel or antiparallel nature of any β-strand pairing in the target, that can be inferred from contact predictions. Contact information will be particularly valuable in the determination of lower resolution structures by helping to assign sequence register. In large complexes, contact information may allow the identity of a protein responsible for a certain region of density to be determined and then assist in the orientation of an available model within that density. In NMR, predicted contacts can provide long-range information to extend the upper size limit of the technique in a manner analogous but complementary to experimental methods. Finally, predicted contacts can distinguish between biologically relevant interfaces and mere lattice contacts in a final crystal structure, and have potential in the identification of functionally important regions and in foreseeing the consequences of mutations.
Affiliation(s)
- Felix Simkovic
- Institute of Integrative Biology, University of Liverpool, Liverpool L69 7ZB, England
| | - Sergey Ovchinnikov
- Department of Biochemistry, University of Washington, Seattle, WA 98195, USA
- Institute for Protein Design, University of Washington, Seattle, WA 98195, USA
- Howard Hughes Medical Institute, University of Washington, Box 357370, Seattle, WA 98195, USA
| | - David Baker
- Department of Biochemistry, University of Washington, Seattle, WA 98195, USA
- Institute for Protein Design, University of Washington, Seattle, WA 98195, USA
- Howard Hughes Medical Institute, University of Washington, Box 357370, Seattle, WA 98195, USA
| | - Daniel J. Rigden
- Institute of Integrative Biology, University of Liverpool, Liverpool L69 7ZB, England
|
350
|
Chapman SD, Adami C, Wilke CO, B Kc D. The evolution of logic circuits for the purpose of protein contact map prediction. PeerJ 2017; 5:e3139. [PMID: 28439455 PMCID: PMC5398280 DOI: 10.7717/peerj.3139] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 06/29/2016] [Accepted: 03/02/2017] [Indexed: 11/20/2022] Open
Abstract
Predicting protein structure from sequence remains a major open problem in protein biochemistry. One component of predicting complete structures is the prediction of inter-residue contact patterns (contact maps). Here, we discuss protein contact map prediction by machine learning. We describe a novel method for contact map prediction that uses the evolution of logic circuits. These logic circuits operate on feature data and output whether or not two amino acids in a protein are in contact. We show that such a method is feasible and, in addition, that evolution allows the logic circuits to be trained on the dataset in an unbiased manner, so that the method can be used both for contact map prediction and for the selection of relevant features in a dataset.
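For readers unfamiliar with the term, a contact map is a symmetric binary matrix marking which residue pairs lie close together in space. The sketch below derives one from 3D coordinates; the 8 Å cutoff and the use of one representative atom per residue are common conventions assumed here, not details taken from the abstract, and the function name is hypothetical.

```python
import math

def contact_map(coords, threshold=8.0):
    """Binary contact map from one representative 3D point per residue.

    coords    : list of (x, y, z) tuples, one per residue (in ångströms)
    threshold : assumed distance cutoff below which two residues are "in contact"
    """
    n = len(coords)
    cmap = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Mark both (i, j) and (j, i) so the map stays symmetric.
            if math.dist(coords[i], coords[j]) < threshold:
                cmap[i][j] = cmap[j][i] = True
    return cmap
```

A predictor of the kind described above would output such a matrix (or a score for each entry) from sequence-derived features rather than from known coordinates.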
Affiliation(s)
- Samuel D Chapman
- Department of Computational Science and Engineering, North Carolina A&T State University, Greensboro, NC, USA
| | - Christoph Adami
- Department of Microbiology and Molecular Genetics and Department of Physics and Astronomy, Michigan State University, East Lansing, MI, USA
| | - Claus O Wilke
- Department of Integrative Biology, The University of Texas at Austin, Austin, TX, USA
| | - Dukka B Kc
- Department of Computational Science and Engineering, North Carolina A&T State University, Greensboro, NC, USA
|