Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Jones DT, Buchan DWA, Cozzetto D, Pontil M. PSICOV: precise structural contact prediction using sparse inverse covariance estimation on large multiple sequence alignments. Bioinformatics 2011;28:184-90. [PMID: 22101153 DOI: 10.1093/bioinformatics/btr638] [Citation(s) in RCA: 535] [Impact Index Per Article: 38.2] [Indexed: 11/14/2022]

Number

Cited by Other Article(s)

451

Kurtz ZD, Müller CL, Miraldi ER, Littman DR, Blaser MJ, Bonneau RA. Sparse and compositionally robust inference of microbial ecological networks. PLoS Comput Biol 2015;11:e1004226. [PMID: 25950956 PMCID: PMC4423992 DOI: 10.1371/journal.pcbi.1004226] [Citation(s) in RCA: 882] [Impact Index Per Article: 88.2] [Received: 08/19/2014] [Accepted: 03/02/2015] [Indexed: 11/19/2022] Open

Abstract

16S ribosomal RNA (rRNA) gene and other environmental sequencing techniques provide snapshots of microbial communities, revealing phylogeny and the abundances of microbial populations across diverse ecosystems. While changes in microbial community structure are demonstrably associated with certain environmental conditions (from metabolic and immunological health in mammals to ecological stability in soils and oceans), identification of underlying mechanisms requires new statistical tools, as these datasets present several technical challenges. First, the abundances of microbial operational taxonomic units (OTUs) from amplicon-based datasets are compositional. Counts are normalized to the total number of counts in the sample. Thus, microbial abundances are not independent, and traditional statistical metrics (e.g., correlation) for the detection of OTU-OTU relationships can lead to spurious results. Secondly, microbial sequencing-based studies typically measure hundreds of OTUs on only tens to hundreds of samples; thus, inference of OTU-OTU association networks is severely under-powered, and additional information (or assumptions) are required for accurate inference. Here, we present SPIEC-EASI (SParse InversE Covariance Estimation for Ecological Association Inference), a statistical method for the inference of microbial ecological networks from amplicon sequencing datasets that addresses both of these issues. SPIEC-EASI combines data transformations developed for compositional data analysis with a graphical model inference framework that assumes the underlying ecological association network is sparse. To reconstruct the network, SPIEC-EASI relies on algorithms for sparse neighborhood and inverse covariance selection. To provide a synthetic benchmark in the absence of an experimentally validated gold-standard network, SPIEC-EASI is accompanied by a set of computational tools to generate OTU count data from a set of diverse underlying network topologies. 
SPIEC-EASI outperforms state-of-the-art methods to recover edges and network properties on synthetic data under a variety of scenarios. SPIEC-EASI also reproducibly predicts previously unknown microbial associations using data from the American Gut project.
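
As a rough illustration of the compositional-data transformation step mentioned in this abstract, the sketch below applies a centered log-ratio (CLR) transform to one sample of OTU counts. The abstract does not name the transform; CLR is assumed here as a common choice from compositional data analysis, and the function name `clr` and the pseudocount used for zero counts are hypothetical. The sparse inverse covariance step is not shown.

```python
import math


def clr(counts, pseudocount=1.0):
    """Centered log-ratio transform of one sample's OTU counts.

    Assumption: a pseudocount is added so that zero counts can be
    log-transformed; pass pseudocount=0.0 for strictly positive data.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean
    return [value - mean_log for value in logs]


# Hypothetical sample: raw counts for four OTUs.
sample = [120, 30, 0, 850]
transformed = clr(sample)
print([round(v, 3) for v in transformed])
```

The transformed values of a sample sum to zero, and without a pseudocount the result does not change when all counts in the sample are multiplied by the same factor, which is why the transform removes the dependence on sequencing depth.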

452

Banach M, Prudhomme N, Carpentier M, Duprat E, Papandreou N, Kalinowska B, Chomilier J, Roterman I. Contribution to the prediction of the fold code: application to immunoglobulin and flavodoxin cases. PLoS One 2015;10:e0125098. [PMID: 25915049 PMCID: PMC4411048 DOI: 10.1371/journal.pone.0125098] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 12/16/2014] [Accepted: 03/20/2015] [Indexed: 12/19/2022] Open

Abstract

Background

Formation of the folding nucleus of globular proteins starts with the mutual interaction of a group of hydrophobic amino acids whose close contacts allow subsequent formation and stability of the 3D structure. These early steps can be predicted by simulation of the folding process through a Monte Carlo (MC) coarse-grained model in a discrete space. We previously defined MIRs (Most Interacting Residues) as the set of residues presenting a large number of non-covalent neighbour interactions during such simulation. MIRs are good candidates to define the minimal number of residues giving rise to a given fold instead of another one, although their proportion is rather high, typically 15-20% of the sequences. With experiments on two sequences of very high sequence identity (up to 90%) but different folds in mind, we combined the MIR method, which takes sequence as single input, with the “fuzzy oil drop” (FOD) model that requires a 3D structure, in order to estimate the residues coding for the fold. FOD assumes that a globular protein follows an idealised 3D Gaussian distribution of hydrophobicity density, with the maximum in the centre and minima at the surface of the “drop”. If the actual local density of hydrophobicity around a given amino acid is as high as the ideal one, then this amino acid is assigned to the core of the globular protein, and it is assumed to follow the FOD model. Therefore one obtains a distribution of the amino acids of a protein according to their agreement or disagreement with the FOD model.

Results

We compared and combined MIR and FOD methods to define the minimal nucleus, or keystone, of two populated folds: immunoglobulin-like (Ig) and flavodoxins (Flav). The combination of these two approaches defines some positions both predicted as a MIR and assigned as accordant with the FOD model. It is shown here that for these two folds, the intersection of the predicted sets of residues significantly differs from random selection. It reduces the number of selected residues by each individual method and allows a reasonable agreement with experimentally determined key residues coding for the particular fold. In addition, the intersection of the two methods significantly increases the specificity of the prediction, providing a robust set of residues that constitute the folding nucleus.

453

de Oliveira SHP, Shi J, Deane CM. Building a better fragment library for de novo protein structure prediction. PLoS One 2015;10:e0123998. [PMID: 25901595 PMCID: PMC4406757 DOI: 10.1371/journal.pone.0123998] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Received: 11/17/2014] [Accepted: 02/25/2015] [Indexed: 01/11/2023] Open

454

Li F, Liu J, Garavito RM, Ferguson-Miller S. Evolving understanding of translocator protein 18 kDa (TSPO). Pharmacol Res 2015;99:404-9. [PMID: 25882248 DOI: 10.1016/j.phrs.2015.03.022] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Received: 03/02/2015] [Revised: 03/25/2015] [Accepted: 03/27/2015] [Indexed: 02/01/2023]

455

Currin A, Swainston N, Day PJ, Kell DB. Synthetic biology for the directed evolution of protein biocatalysts: navigating sequence space intelligently. Chem Soc Rev 2015;44:1172-239. [PMID: 25503938 PMCID: PMC4349129 DOI: 10.1039/c4cs00351a] [Citation(s) in RCA: 258] [Impact Index Per Article: 25.8] [Received: 10/20/2014] [Indexed: 12/21/2022]

Abstract

The amino acid sequence of a protein affects both its structure and its function. Thus, the ability to modify the sequence, and hence the structure and activity, of individual proteins in a systematic way, opens up many opportunities, both scientifically and (as we focus on here) for exploitation in biocatalysis. Modern methods of synthetic biology, whereby increasingly large sequences of DNA can be synthesised de novo, allow an unprecedented ability to engineer proteins with novel functions. However, the number of possible proteins is far too large to test individually, so we need means for navigating the 'search space' of possible protein sequences efficiently and reliably in order to find desirable activities and other properties. Enzymologists distinguish binding (Kd) and catalytic (kcat) steps. In a similar way, judicious strategies have blended design (for binding, specificity and active site modelling) with the more empirical methods of classical directed evolution (DE) for improving kcat (where natural evolution rarely seeks the highest values), especially with regard to residues distant from the active site and where the functional linkages underpinning enzyme dynamics are both unknown and hard to predict. Epistasis (where the 'best' amino acid at one site depends on that or those at others) is a notable feature of directed evolution. The aim of this review is to highlight some of the approaches that are being developed to allow us to use directed evolution to improve enzyme properties, often dramatically. We note that directed evolution differs in a number of ways from natural evolution, including in particular the available mechanisms and the likely selection pressures. Thus, we stress the opportunities afforded by techniques that enable one to map sequence to (structure and) activity in silico, as an effective means of modelling and exploring protein landscapes. 
Because known landscapes may be assessed and reasoned about as a whole, simultaneously, this offers opportunities for protein improvement not readily available to natural evolution on rapid timescales. Intelligent landscape navigation, informed by sequence-activity relationships and coupled to the emerging methods of synthetic biology, offers scope for the development of novel biocatalysts that are both highly active and robust.

456

Ochoa D, Juan D, Valencia A, Pazos F. Detection of significant protein coevolution. Bioinformatics 2015;31:2166-73. [PMID: 25717190 DOI: 10.1093/bioinformatics/btv102] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Received: 06/18/2014] [Accepted: 02/11/2015] [Indexed: 11/14/2022]

457

Mao W, Kaya C, Dutta A, Horovitz A, Bahar I. Comparative study of the effectiveness and limitations of current methods for detecting sequence coevolution. Bioinformatics 2015;31:1929-37. [PMID: 25697822 PMCID: PMC4481699 DOI: 10.1093/bioinformatics/btv103] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Received: 08/24/2014] [Accepted: 02/02/2015] [Indexed: 01/02/2023] Open

Affiliation(s)

Wenzhi Mao Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15260, USA, Department of Pharmacology, School of Medicine, Tsinghua University, Beijing 100084, China and Department of Structural Biology, Weizmann Institute of Science, Rehovot 76100, Israel
Cihan Kaya Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15260, USA, Department of Pharmacology, School of Medicine, Tsinghua University, Beijing 100084, China and Department of Structural Biology, Weizmann Institute of Science, Rehovot 76100, Israel
Anindita Dutta Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15260, USA, Department of Pharmacology, School of Medicine, Tsinghua University, Beijing 100084, China and Department of Structural Biology, Weizmann Institute of Science, Rehovot 76100, Israel
Amnon Horovitz Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15260, USA, Department of Pharmacology, School of Medicine, Tsinghua University, Beijing 100084, China and Department of Structural Biology, Weizmann Institute of Science, Rehovot 76100, Israel
Ivet Bahar Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15260, USA, Department of Pharmacology, School of Medicine, Tsinghua University, Beijing 100084, China and Department of Structural Biology, Weizmann Institute of Science, Rehovot 76100, Israel

458

Soltan Ghoraie L, Burkowski F, Zhu M. Sparse networks of directly coupled, polymorphic, and functional side chains in allosteric proteins. Proteins 2015;83:497-516. [DOI: 10.1002/prot.24752] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 08/16/2014] [Revised: 12/05/2014] [Accepted: 12/13/2014] [Indexed: 02/05/2023]

459

Sun HP, Huang Y, Wang XF, Zhang Y, Shen HB. Improving accuracy of protein contact prediction using balanced network deconvolution. Proteins 2015;83:485-96. [PMID: 25524593 DOI: 10.1002/prot.24744] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Received: 08/31/2014] [Revised: 11/20/2014] [Accepted: 12/02/2014] [Indexed: 12/28/2022]

460

Andreani J, Söding J. bbcontacts: prediction of β-strand pairing from direct coupling patterns. Bioinformatics 2015;31:1729-37. [PMID: 25618863 DOI: 10.1093/bioinformatics/btv041] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.1] [Received: 10/07/2014] [Accepted: 01/17/2015] [Indexed: 01/08/2023]

461

Li G, Theys K, Verheyen J, Pineda-Peña AC, Khouri R, Piampongsant S, Eusébio M, Ramon J, Vandamme AM. A new ensemble coevolution system for detecting HIV-1 protein coevolution. Biol Direct 2015;10:1. [PMID: 25564011 PMCID: PMC4332441 DOI: 10.1186/s13062-014-0031-8] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.1] [Received: 11/28/2014] [Accepted: 12/02/2014] [Indexed: 12/31/2022] Open

Abstract

BACKGROUND

A key challenge in the field of HIV-1 protein evolution is the identification of coevolving amino acids at the molecular level. In the past decades, many sequence-based methods have been designed to detect position-specific coevolution within and between different proteins. However, an ensemble coevolution system that integrates different methods to improve the detection of HIV-1 protein coevolution has not been developed.

RESULTS

We integrated 27 sequence-based prediction methods published between 2004 and 2013 into an ensemble coevolution system. This system allowed combinations of different sequence-based methods for coevolution predictions. Using HIV-1 protein structures and experimental data, we evaluated the performance of individual and combined sequence-based methods in the prediction of HIV-1 intra- and inter-protein coevolution. We showed that sequence-based methods clustered according to their methodology, and a combination of four methods outperformed any of the 27 individual methods. This four-method combination estimated that HIV-1 intra-protein coevolving positions were mainly located in functional domains and were in physical contact with each other in the protein tertiary structures. In the analysis of HIV-1 inter-protein coevolving positions between Gag and protease, protease drug resistance positions near the active site mostly coevolved with Gag cleavage positions (V128, S373-T375, A431, F448-P453) and Gag C-terminal positions (S489-Q500) under selective pressure of protease inhibitors.

CONCLUSIONS

This study presents a new ensemble coevolution system which detects position-specific coevolution using combinations of 27 different sequence-based methods. Our findings highlight key coevolving residues within HIV-1 structural proteins and between Gag and protease, shedding light on HIV-1 intra- and inter-protein coevolution.

462

Nugent T. De novo membrane protein structure prediction. Methods Mol Biol 2015;1215:331-50. [PMID: 25330970 DOI: 10.1007/978-1-4939-1465-4_15] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Indexed: 01/09/2023]

463

Tian P, Boomsma W, Wang Y, Otzen DE, Jensen MH, Lindorff-Larsen K. Structure of a Functional Amyloid Protein Subunit Computed Using Sequence Variation. J Am Chem Soc 2014;137:22-5. [DOI: 10.1021/ja5093634] [Citation(s) in RCA: 82] [Impact Index Per Article: 7.5] [Indexed: 11/30/2022]

464

Raimondi D, Orlando G, Vranken WF. Clustering-based model of cysteine co-evolution improves disulfide bond connectivity prediction and reduces homologous sequence requirements. Bioinformatics 2014;31:1219-25. [DOI: 10.1093/bioinformatics/btu794] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 06/18/2014] [Accepted: 11/18/2014] [Indexed: 12/23/2022] Open

465

Jones DT, Singh T, Kosciolek T, Tetchner S. MetaPSICOV: combining coevolution methods for accurate prediction of contacts and long range hydrogen bonding in proteins. Bioinformatics 2014;31:999-1006. [PMID: 25431331 PMCID: PMC4382908 DOI: 10.1093/bioinformatics/btu791] [Citation(s) in RCA: 237] [Impact Index Per Article: 21.5] [Received: 09/25/2014] [Accepted: 11/22/2014] [Indexed: 12/13/2022]

Abstract

Motivation: Recent developments of statistical techniques to infer direct evolutionary couplings between residue pairs have rendered covariation-based contact prediction a viable means for accurate 3D modelling of proteins, with no information other than the sequence required. To extend the usefulness of contact prediction, we have designed a new meta-predictor (MetaPSICOV) which combines three distinct approaches for inferring covariation signals from multiple sequence alignments, considers a broad range of other sequence-derived features and, uniquely, a range of metrics which describe both the local and global quality of the input multiple sequence alignment. Finally, we use a two-stage predictor, where the second stage filters the output of the first stage. This two-stage predictor is additionally evaluated on its ability to accurately predict the long range network of hydrogen bonds, including correctly assigning the donor and acceptor residues.

Results: Using the original PSICOV benchmark set of 150 protein families, MetaPSICOV achieves a mean precision of 0.54 for top-L predicted long range contacts—around 60% higher than PSICOV, and around 40% better than CCMpred. In de novo protein structure prediction using FRAGFOLD, MetaPSICOV is able to improve the TM-scores of models by a median of 0.05 compared with PSICOV. Lastly, for predicting long range hydrogen bonding, MetaPSICOV-HB achieves a precision of 0.69 for the top-L/10 hydrogen bonds compared with just 0.26 for the baseline MetaPSICOV.
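
To make the "precision for top-L predicted long range contacts" metric concrete, here is a minimal sketch of how such a number could be computed. The function name `top_l_precision`, the data layout, and the sequence-separation cutoff of 24 residues for "long range" are assumptions for illustration; the abstract does not state the exact evaluation protocol.

```python
def top_l_precision(scores, true_contacts, seq_len, divisor=1, min_separation=24):
    """Precision of the top L/divisor predicted long-range contacts.

    scores: dict mapping residue pairs (i, j) with i < j to a predicted score.
    true_contacts: set of residue pairs (i, j) observed in the structure.
    Assumption: "long range" means |i - j| >= min_separation.
    """
    long_range = [
        (pair, score)
        for pair, score in scores.items()
        if abs(pair[1] - pair[0]) >= min_separation
    ]
    # Highest-scoring predictions first.
    long_range.sort(key=lambda item: item[1], reverse=True)
    n_top = max(1, seq_len // divisor)
    selected = [pair for pair, _ in long_range[:n_top]]
    if not selected:
        return 0.0
    hits = sum(1 for pair in selected if pair in true_contacts)
    return hits / len(selected)


# Hypothetical toy data for a 30-residue protein.
scores = {(0, 25): 0.9, (1, 28): 0.8, (2, 29): 0.1, (3, 5): 0.99}
true_contacts = {(0, 25), (3, 5)}
print(top_l_precision(scores, true_contacts, seq_len=30, divisor=15))
```

In this toy case the short-range pair (3, 5) is ignored despite its high score, so the metric only rewards contacts between residues far apart in sequence.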

Availability and implementation: MetaPSICOV is freely available as a web server at http://bioinf.cs.ucl.ac.uk/MetaPSICOV. Raw data (predicted contact lists and 3D models) and source code can be downloaded from http://bioinf.cs.ucl.ac.uk/downloads/MetaPSICOV.

Contact: d.t.jones@ucl.ac.uk

Supplementary information: Supplementary data are available at Bioinformatics online.

466

Skwark MJ, Raimondi D, Michel M, Elofsson A. Improved contact predictions using the recognition of protein like contact patterns. PLoS Comput Biol 2014;10:e1003889. [PMID: 25375897 PMCID: PMC4222596 DOI: 10.1371/journal.pcbi.1003889] [Citation(s) in RCA: 132] [Impact Index Per Article: 12.0] [Received: 05/01/2014] [Accepted: 09/03/2014] [Indexed: 11/23/2022] Open

Abstract

Given sufficiently large protein families, and using a global statistical inference approach, it is possible to obtain sufficient accuracy in protein residue contact predictions to predict the structure of many proteins. However, these approaches do not consider the fact that the contacts in a protein are neither randomly nor independently distributed, but actually follow precise rules governed by the structure of the protein and thus are interdependent. Here, we present PconsC2, a novel method that uses a deep learning approach to identify protein-like contact patterns to improve contact predictions. A substantial enhancement can be seen for all contacts independently of the number of aligned sequences, residue separation or secondary structure type, but is largest for β-sheet containing proteins. In addition to being superior to earlier methods based on statistical inferences, in comparison to state-of-the-art methods using machine learning, PconsC2 is superior for families with more than 100 effective sequence homologs. The improved contact prediction enables improved structure prediction.

Here, we introduce a novel protein contact prediction method, PconsC2, that, to the best of our knowledge, outperforms earlier methods. PconsC2 is based on our earlier method, PconsC, as it utilizes the same set of contact predictions from plmDCA and PSICOV. However, in contrast to PconsC, where each residue pair is analysed independently, the initial predictions are analysed in the context of neighbouring residue pairs using a deep learning approach, inspired by earlier work. We find that each layer of the deep learning procedure improves the predictions. In the end, five layers of deep learning together with a few extra features provide the best performance. An improvement can be seen for all types of proteins, independent of length, number of homologous sequences and structural class. However, the improvement is largest for β-sheet containing proteins. Most importantly, the improvement brings, for the first time, sufficiently accurate predictions to some protein families with fewer than 1000 homologous sequences. PconsC2 also outperforms state-of-the-art machine-learning-based predictors for protein families larger than 100 effective sequences. PconsC2 is licensed under the GNU General Public License v3 and freely available from http://c2.pcons.net/.

467

Touw WG, Baakman C, Black J, te Beek TAH, Krieger E, Joosten RP, Vriend G. A series of PDB-related databanks for everyday needs. Nucleic Acids Res 2014;43:D364-8. [PMID: 25352545 PMCID: PMC4383885 DOI: 10.1093/nar/gku1028] [Citation(s) in RCA: 676] [Impact Index Per Article: 61.5] [Indexed: 12/02/2022] Open

468

Hinsen K, Vaitinadapoule A, Ostuni MA, Etchebest C, Lacapere JJ. Construction and validation of an atomic model for bacterial TSPO from electron microscopy density, evolutionary constraints, and biochemical and biophysical data. Biochimica et Biophysica Acta - Biomembranes 2014;1848:568-80. [PMID: 25450341 DOI: 10.1016/j.bbamem.2014.10.028] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Received: 08/01/2014] [Revised: 10/01/2014] [Accepted: 10/20/2014] [Indexed: 11/30/2022]

469

Schneider M, Brock O. Combining physicochemical and evolutionary information for protein contact prediction. PLoS One 2014;9:e108438. [PMID: 25338092 PMCID: PMC4206277 DOI: 10.1371/journal.pone.0108438] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Received: 04/11/2014] [Accepted: 07/28/2014] [Indexed: 11/18/2022] Open

470

Campeotto I, Percy MG, MacDonald JT, Förster A, Freemont PS, Gründling A. Structural and mechanistic insight into the Listeria monocytogenes two-enzyme lipoteichoic acid synthesis system. J Biol Chem 2014;289:28054-69. [PMID: 25128528 PMCID: PMC4192460 DOI: 10.1074/jbc.m114.590570] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Received: 06/23/2014] [Revised: 08/12/2014] [Indexed: 11/07/2022] Open

471

Feinauer C, Skwark MJ, Pagnani A, Aurell E. Improving contact prediction along three dimensions. PLoS Comput Biol 2014;10:e1003847. [PMID: 25299132 PMCID: PMC4191875 DOI: 10.1371/journal.pcbi.1003847] [Citation(s) in RCA: 66] [Impact Index Per Article: 6.0] [Received: 02/28/2014] [Accepted: 08/07/2014] [Indexed: 11/18/2022] Open

Abstract

Correlation patterns in multiple sequence alignments of homologous proteins can be exploited to infer information on the three-dimensional structure of their members. The typical pipeline to address this task, which we in this paper refer to as the three dimensions of contact prediction, is to (i) filter and align the raw sequence data representing the evolutionarily related proteins; (ii) choose a predictive model to describe a sequence alignment; (iii) infer the model parameters and interpret them in terms of structural properties, such as an accurate contact map. We show here that all three dimensions are important for overall prediction success. In particular, we show that it is possible to improve significantly along the second dimension by going beyond the pair-wise Potts models from statistical physics, which have hitherto been the focus of the field. These (simple) extensions are motivated by multiple sequence alignments often containing long stretches of gaps which, as a data feature, would be rather untypical for independent samples drawn from a Potts model. Using a large test set of proteins we show that the combined improvements along the three dimensions are as large as any reported to date.

Proteins are large molecules that living cells make by stringing together building blocks called amino acids or peptides, following their blue-prints in the DNA. Freshly made proteins are typically long, structure-less chains of peptides, but shortly afterwards most of them fold into characteristic structures. Proteins execute many functions in the cell, for which they need to have the right structure, which is therefore very important in determining what the proteins can do. The structure of a protein can be determined by X-ray diffraction and other experimental approaches which are all, to this day, somewhat labor-intensive and difficult. On the other hand, the order of the peptides in a protein can be read off from the DNA blue-print, and such protein sequences are today routinely produced in large numbers. In this paper we show that many similar protein sequences can be used to find information about the structure. The basic approach is to construct a probabilistic model for sequence variability, and then to use the parameters of that model to predict structure in three-dimensional space. The main technical novelty compared to previous contributions in the same general direction is that we use models more directly matched to the data.
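
For readers unfamiliar with the pairwise Potts model that this abstract takes as its starting point, its usual form for an aligned sequence of length N can be written as below. The symbols are standard notation assumed here rather than taken from the abstract: \sigma_i is the residue (or gap) at alignment column i, h_i are single-site fields, J_{ij} are pairwise couplings, and Z is the normalising constant. The paper's extensions beyond this model are not shown.

```latex
P(\sigma_1, \dots, \sigma_N) = \frac{1}{Z}
  \exp\left( \sum_{i=1}^{N} h_i(\sigma_i)
  + \sum_{1 \le i < j \le N} J_{ij}(\sigma_i, \sigma_j) \right)
```

In contact prediction, strong inferred couplings J_{ij} are typically read as evidence that columns i and j are in contact in the folded structure.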

472

Shahmoradi A, Sydykova DK, Spielman SJ, Jackson EL, Dawson ET, Meyer AG, Wilke CO. Predicting evolutionary site variability from structure in viral proteins: buriedness, packing, flexibility, and design. J Mol Evol 2014;79:130-42. [PMID: 25217382 PMCID: PMC4216736 DOI: 10.1007/s00239-014-9644-x] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.4] [Received: 04/23/2014] [Accepted: 08/31/2014] [Indexed: 12/27/2022]

Abstract

Several recent works have shown that protein structure can predict site-specific evolutionary sequence variation. In particular, sites that are buried and/or have many contacts with other sites in a structure have been shown to evolve more slowly, on average, than surface sites with few contacts. Here, we present a comprehensive study of the extent to which numerous structural properties can predict sequence variation. The quantities we considered include buriedness (as measured by relative solvent accessibility), packing density (as measured by contact number), structural flexibility (as measured by B factors, root-mean-square fluctuations, and variation in dihedral angles), and variability in designed structures. We obtained structural flexibility measures both from molecular dynamics simulations performed on nine non-homologous viral protein structures and from variation in homologous variants of those proteins, where they were available. We obtained measures of variability in designed structures from flexible-backbone design in the Rosetta software. We found that most of the structural properties correlate with site variation in the majority of structures, though the correlations are generally weak (correlation coefficients of 0.1-0.4). Moreover, we found that buriedness and packing density were better predictors of evolutionary variation than structural flexibility. Finally, variability in designed structures was a weaker predictor of evolutionary variability than buriedness or packing density, but it was comparable in its predictive power to the best structural flexibility measures. We conclude that simple measures of buriedness and packing density are better predictors of evolutionary variation than the more complicated predictors obtained from dynamic simulations, ensembles of homologous structures, or computational protein design.
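
The comparisons in this abstract rest on correlating a per-site structural property with a per-site measure of evolutionary variation. The sketch below computes a Spearman rank correlation for one such pair of vectors. The abstract does not say which correlation coefficient was used, so Spearman is an assumption, and the helper names and toy values are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation as the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-site values: relative solvent accessibility and rate.
rsa = [0.05, 0.10, 0.40, 0.75, 0.90]
rate = [0.2, 0.1, 0.6, 0.5, 1.3]
print(round(spearman(rsa, rate), 3))
```

A rank-based coefficient is used in this sketch because it does not assume a linear relationship between the structural property and the evolutionary rate.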

473

Hopf TA, Schärfe CPI, Rodrigues JPGLM, Green AG, Kohlbacher O, Sander C, Bonvin AMJJ, Marks DS. Sequence co-evolution gives 3D contacts and structures of protein complexes. eLife 2014;3. [PMID: 25255213 PMCID: PMC4360534 DOI: 10.7554/elife.03430] [Citation(s) in RCA: 351] [Impact Index Per Article: 31.9] [Received: 05/21/2014] [Accepted: 09/23/2014] [Indexed: 12/24/2022] Open

474

Michel M, Hayat S, Skwark MJ, Sander C, Marks DS, Elofsson A. PconsFold: improved contact predictions improve protein models. Bioinformatics 2014;30:i482-8. [PMID: 25161237 PMCID: PMC4147911 DOI: 10.1093/bioinformatics/btu458] [Citation(s) in RCA: 85] [Impact Index Per Article: 7.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/25/2023] Open

Affiliation(s)

Mirco Michel Department of Biochemistry and Biophysics, Stockholm University, 10691 Stockholm, Sweden, Science for Life Laboratory, Stockholm University, Box 1031, 17121 Solna, Sweden, Department of Systems Biology, Harvard Medical School, Boston, MA, USA, Department of Information and Computer Science, Aalto University, PO Box 15400, FI-00076 Aalto, Finland and Computational Biology, Memorial Sloan-Kettering Cancer Center, New York, NY, USA
Sikander Hayat Department of Biochemistry and Biophysics, Stockholm University, 10691 Stockholm, Sweden, Science for Life Laboratory, Stockholm University, Box 1031, 17121 Solna, Sweden, Department of Systems Biology, Harvard Medical School, Boston, MA, USA, Department of Information and Computer Science, Aalto University, PO Box 15400, FI-00076 Aalto, Finland and Computational Biology, Memorial Sloan-Kettering Cancer Center, New York, NY, USA
Marcin J Skwark Department of Biochemistry and Biophysics, Stockholm University, 10691 Stockholm, Sweden, Science for Life Laboratory, Stockholm University, Box 1031, 17121 Solna, Sweden, Department of Systems Biology, Harvard Medical School, Boston, MA, USA, Department of Information and Computer Science, Aalto University, PO Box 15400, FI-00076 Aalto, Finland and Computational Biology, Memorial Sloan-Kettering Cancer Center, New York, NY, USA
Chris Sander Department of Biochemistry and Biophysics, Stockholm University, 10691 Stockholm, Sweden, Science for Life Laboratory, Stockholm University, Box 1031, 17121 Solna, Sweden, Department of Systems Biology, Harvard Medical School, Boston, MA, USA, Department of Information and Computer Science, Aalto University, PO Box 15400, FI-00076 Aalto, Finland and Computational Biology, Memorial Sloan-Kettering Cancer Center, New York, NY, USA
Debora S Marks Department of Biochemistry and Biophysics, Stockholm University, 10691 Stockholm, Sweden, Science for Life Laboratory, Stockholm University, Box 1031, 17121 Solna, Sweden, Department of Systems Biology, Harvard Medical School, Boston, MA, USA, Department of Information and Computer Science, Aalto University, PO Box 15400, FI-00076 Aalto, Finland and Computational Biology, Memorial Sloan-Kettering Cancer Center, New York, NY, USA
Arne Elofsson Department of Biochemistry and Biophysics, Stockholm University, 10691 Stockholm, Sweden, Science for Life Laboratory, Stockholm University, Box 1031, 17121 Solna, Sweden, Department of Systems Biology, Harvard Medical School, Boston, MA, USA, Department of Information and Computer Science, Aalto University, PO Box 15400, FI-00076 Aalto, Finland and Computational Biology, Memorial Sloan-Kettering Cancer Center, New York, NY, USA

475

HCV E2 core structures and mAbs: something is still missing. Drug Discov Today 2014;19:1964-70. [PMID: 25172800 DOI: 10.1016/j.drudis.2014.08.011] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/04/2014] [Revised: 07/17/2014] [Accepted: 08/21/2014] [Indexed: 02/07/2023]

476

Seemayer S, Gruber M, Söding J. CCMpred--fast and precise prediction of protein residue-residue contacts from correlated mutations. Bioinformatics 2014;30:3128-30. [PMID: 25064567 PMCID: PMC4201158 DOI: 10.1093/bioinformatics/btu500] [Citation(s) in RCA: 281] [Impact Index Per Article: 25.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/18/2022]

477

Andreani J, Guerois R. Evolution of protein interactions: From interactomes to interfaces. Arch Biochem Biophys 2014;554:65-75. [DOI: 10.1016/j.abb.2014.05.010] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/10/2014] [Revised: 04/28/2014] [Accepted: 05/12/2014] [Indexed: 12/16/2022]

478

Ivankov DN, Finkelstein AV, Kondrashov FA. A structural perspective of compensatory evolution. Curr Opin Struct Biol 2014;26:104-12. [PMID: 24981969 PMCID: PMC4141909 DOI: 10.1016/j.sbi.2014.05.004] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/03/2014] [Revised: 04/11/2014] [Accepted: 05/16/2014] [Indexed: 11/25/2022]

479

Gotoh O, Morita M, Nelson DR. Assessment and refinement of eukaryotic gene structure prediction with gene-structure-aware multiple protein sequence alignment. BMC Bioinformatics 2014;15:189. [PMID: 24927652 PMCID: PMC4065584 DOI: 10.1186/1471-2105-15-189] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/19/2014] [Accepted: 06/09/2014] [Indexed: 03/29/2024] Open

480

Clark GW, Ackerman SH, Tillier ER, Gatti DL. Multidimensional mutual information methods for the analysis of covariation in multiple sequence alignments. BMC Bioinformatics 2014;15:157. [PMID: 24886131 PMCID: PMC4046016 DOI: 10.1186/1471-2105-15-157] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/02/2013] [Accepted: 05/06/2014] [Indexed: 11/10/2022] Open

Abstract

Background

Several methods are available for the detection of covarying positions from a multiple sequence alignment (MSA). If the MSA contains a large number of sequences, information about the proximities between residues derived from covariation maps can be sufficient to predict a protein fold. However, in many cases the structure is already known, and information on the covarying positions can be valuable to understand the protein mechanism and dynamic properties.

Results

In this study we have sought to determine whether a multivariate (multidimensional) extension of traditional mutual information (MI) can be an additional tool to study covariation. The performance of two multidimensional MI (mdMI) methods, designed to remove the effect of ternary/quaternary interdependencies, was tested with a set of 9 MSAs each containing <400 sequences, and was shown to be comparable to that of the newest methods based on maximum entropy/pseudolikelihood statistical models of protein sequences. However, while all the methods tested detected a similar number of covarying pairs among the residues separated by <8 Å in the reference X-ray structures, there was on average less than 65% overlap between the top scoring pairs detected by methods that are based on different principles.

Conclusions

Given the large variety of structure and evolutionary history of different proteins it is possible that a single best method to detect covariation in all proteins does not exist, and that for each protein family the best information can be derived by merging/comparing results obtained with different methods. This approach may be particularly valuable in those cases in which the size of the MSA is small or the quality of the alignment is low, leading to significant differences in the pairs detected by different methods.

481

General IJ, Liu Y, Blackburn ME, Mao W, Gierasch LM, Bahar I. ATPase subdomain IA is a mediator of interdomain allostery in Hsp70 molecular chaperones. PLoS Comput Biol 2014;10:e1003624. [PMID: 24831085 PMCID: PMC4022485 DOI: 10.1371/journal.pcbi.1003624] [Citation(s) in RCA: 79] [Impact Index Per Article: 7.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/03/2013] [Accepted: 03/31/2014] [Indexed: 11/18/2022] Open

Abstract

The versatile functions of the heat shock protein 70 (Hsp70) family of molecular chaperones rely on allosteric interactions between their nucleotide-binding and substrate-binding domains, NBD and SBD. Understanding the mechanism of interdomain allostery is essential to rational design of Hsp70 modulators. Yet, despite significant progress in recent years, how the two Hsp70 domains regulate each other's activity remains elusive. Covariance data from experiments and computations emerged in recent years as valuable sources of information towards gaining insights into the molecular events that mediate allostery. In the present study, conservation and covariance properties derived from both sequence and structural dynamics data are integrated with results from Perturbation Response Scanning and in vivo functional assays, so as to establish the dynamical basis of interdomain signal transduction in Hsp70s. Our study highlights the critical roles of SBD residues D481 and T417 in mediating the coupled motions of the two domains, as well as that of G506 in enabling the movements of the α-helical lid with respect to the β-sandwich. It also draws attention to the distinctive role of the NBD subdomains: Subdomain IA acts as a key mediator of signal transduction between the ATP- and substrate-binding sites, this function being achieved by a cascade of interactions predominantly involving conserved residues such as V139, D148, R167 and K155. Subdomain IIA, on the other hand, is distinguished by strong coevolutionary signals (with the SBD) exhibited by a series of residues (D211, E217, L219, T383) implicated in DnaJ recognition. The occurrence of coevolving residues at the DnaJ recognition region parallels the behavior recently observed at the nucleotide-exchange-factor recognition region of subdomain IIB. 
These findings suggest that Hsp70 tends to adapt to co-chaperone recognition and activity via coevolving residues, whereas interdomain allostery, critical to chaperoning, is robustly enabled by conserved interactions.

482

Ovchinnikov S, Kamisetty H, Baker D. Robust and accurate prediction of residue-residue interactions across protein interfaces using evolutionary information. eLife 2014;3:e02030. [PMID: 24842992 PMCID: PMC4034769 DOI: 10.7554/elife.02030] [Citation(s) in RCA: 461] [Impact Index Per Article: 41.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/24/2022] Open

Abstract

Do the amino acid sequence identities of residues that make contact across protein interfaces covary during evolution? If so, such covariance could be used to predict contacts across interfaces and assemble models of biological complexes. We find that residue pairs identified using a pseudo-likelihood-based method to covary across protein–protein interfaces in the 50S ribosomal unit and 28 additional bacterial protein complexes with known structure are almost always in contact in the complex, provided that the number of aligned sequences is greater than the average length of the two proteins. We use this method to make subunit contact predictions for an additional 36 protein complexes with unknown structures, and present models based on these predictions for the tripartite ATP-independent periplasmic (TRAP) transporter, the tripartite efflux system, the pyruvate formate lyase-activating enzyme complex, and the methionine ABC transporter.

DOI:http://dx.doi.org/10.7554/eLife.02030.001

Proteins are considered the ‘workhorse molecules’ of life and they are involved in virtually everything that cells do. Proteins are strings of amino acids that have folded into a specific three-dimensional shape. Proteins must have the correct shape to function properly, as they often work by binding to other proteins or molecules—much like a key fitting into a lock. Working out the structure of a protein can, therefore, provide major insights into how the protein does its job.

Two or more proteins can bind together and form a complex to perform various tasks; and solving the structures of these complexes can be challenging, even if the structures of the protein subunits are known. Now, Ovchinnikov, Kamisetty, and Baker have developed a method for predicting which parts of the proteins make contact with each other in a two-protein complex.

Different species can have copies of the same proteins, but a copy from one species might have different amino acids at certain positions when compared to a related copy from another species. As such, when pairs of interacting proteins from different species are compared, there will be many positions in the two proteins that vary. However, if the amino acid at a position in one protein (let's call it ‘X’) varies, and the amino acid at, say, position ‘Y’ in the other protein also varies such that for any given amino acid at position Y there is often a specific amino acid at position X, then positions X and Y are said to ‘co-vary’. Ovchinnikov et al. noticed that when a pair of amino acids (one from each protein in a two-protein complex) co-varied, these two amino acids tended to make contact with each other at the protein–protein interface.

Ovchinnikov et al. used the new method to make predictions about the protein–protein interfaces in 28 protein complexes found in bacteria, and also to make a prediction about the interface between protein subunits in the bacterial ribosome. When these predictions were checked against the actual structures, which were all known beforehand, they were found to be accurate if the number of copies of each protein being compared is greater than the average length of the two proteins.

Ovchinnikov et al. went on to predict the amino acids on the protein–protein interfaces for another 36 bacterial protein complexes with unknown structures, and to present models for four larger complexes. The next challenge is to extend the method to protein complexes that are found only in eukaryotes (i.e., not in bacteria). Since the number of related copies for eukaryotic proteins tends to be smaller, there are fewer proteins to compare and it is therefore harder to detect ‘covariation’ when it occurs.

DOI:http://dx.doi.org/10.7554/eLife.02030.002

483

Janda JO, Popal A, Bauer J, Busch M, Klocke M, Spitzer W, Keller J, Merkl R. H2rs: deducing evolutionary and functionally important residue positions by means of an entropy and similarity based analysis of multiple sequence alignments. BMC Bioinformatics 2014;15:118. [PMID: 24766829 PMCID: PMC4021312 DOI: 10.1186/1471-2105-15-118] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/13/2014] [Accepted: 04/17/2014] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

The identification of functionally important residue positions is an important task of computational biology. Methods of correlation analysis allow for the identification of pairs of residue positions, whose occupancy is mutually dependent due to constraints imposed by protein structure or function. A common measure assessing these dependencies is the mutual information, which is based on Shannon's information theory that utilizes probabilities only. Consequently, such approaches do not consider the similarity of residue pairs, which may degrade the algorithm's performance. One typical algorithm is H2r, which characterizes each individual residue position k by the conn(k)-value, which is the number of significantly correlated pairs it belongs to.

RESULTS

To improve specificity of H2r, we developed a revised algorithm, named H2rs, which is based on the von Neumann entropy (vNE). To compute the corresponding mutual information, a matrix A is required, which assesses the similarity of residue pairs. We determined A by deducing substitution frequencies from contacting residue pairs observed in the homologs of 35 809 proteins, whose structure is known. In analogy to H2r, the enhanced algorithm computes a normalized conn(k)-value. Within the framework of H2rs, only statistically significant vNE values were considered. To decide on significance, the algorithm calculates a p-value by performing a randomization test for each individual pair of residue positions. The analysis of a large in silico testbed demonstrated that specificity and precision were higher for H2rs than for H2r and two other methods of correlation analysis. The gain in prediction quality is further confirmed by a detailed assessment of five well-studied enzymes. The outcome of H2rs and of a method that predicts contacting residue positions (PSICOV) overlapped only marginally. H2rs can be downloaded from http://www-bioinf.uni-regensburg.de.

CONCLUSIONS

Considering substitution frequencies for residue pairs by means of the von Neumann entropy and a p-value improved the success rate in identifying important residue positions. The integration of proven statistical concepts and normalization allows for an easier comparison of results obtained with different proteins. Comparing the outcome of the local method H2rs and of the global method PSICOV indicates that such methods supplement each other and have different scopes of application.
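The Shannon mutual information that the background of this abstract refers to can be illustrated with a minimal sketch. This is not the H2r or H2rs implementation: the function names and the toy alignment are assumptions made for illustration, and no significance test or similarity matrix is included. It only shows that classical MI is computed from residue frequencies alone, which is the limitation the abstract describes.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the symbol frequencies in one column."""
    counts = Counter(column)
    n = len(column)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mutual_information(col_i, col_j):
    """MI(i, j) = H(i) + H(j) - H(i, j), from observed frequencies only."""
    if len(col_i) != len(col_j):
        raise ValueError("columns must contain the same number of sequences")
    joint = list(zip(col_i, col_j))  # paired residues, one pair per sequence
    return column_entropy(col_i) + column_entropy(col_j) - column_entropy(joint)


# Hypothetical toy alignment: rows are sequences, columns are positions.
msa = ["AKLE", "AKLD", "GRLE", "GRLD"]
columns = list(zip(*msa))

mi_coupled = mutual_information(columns[0], columns[1])      # A/G tracks K/R
mi_independent = mutual_information(columns[0], columns[3])  # no dependence
```

In this toy case the first two columns vary together and share one bit of information, while the first and last columns are independent. Because only frequencies enter the calculation, swapping K/R for any other pair of residues would give the same score, regardless of how chemically similar the residues are.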

484

Gültas M, Düzgün G, Herzog S, Jäger SJ, Meckbach C, Wingender E, Waack S. Quantum coupled mutation finder: predicting functionally or structurally important sites in proteins using quantum Jensen-Shannon divergence and CUDA programming. BMC Bioinformatics 2014;15:96. [PMID: 24694117 PMCID: PMC4098773 DOI: 10.1186/1471-2105-15-96] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/10/2013] [Accepted: 03/26/2014] [Indexed: 11/29/2022] Open

Abstract

Background

The identification of functionally or structurally important non-conserved residue sites in protein MSAs is an important challenge for understanding the structural basis and molecular mechanism of protein functions. Despite the rich literature on compensatory mutations as well as sequence conservation analysis for the detection of those important residues, previous methods often rely on classical information-theoretic measures. However, these measures usually do not take into account dis/similarities of amino acids which are likely to be crucial for those residues. In this study, we present a new method, the Quantum Coupled Mutation Finder (QCMF) that incorporates significant dis/similar amino acid pair signals in the prediction of functionally or structurally important sites.

Results

The result of this study is twofold. First, using the essential sites of two human proteins, namely epidermal growth factor receptor (EGFR) and glucokinase (GCK), we tested the QCMF-method. The QCMF includes two metrics based on quantum Jensen-Shannon divergence to measure both sequence conservation and compensatory mutations. We found that the QCMF reaches an improved performance in identifying essential sites from MSAs of both proteins with a significantly higher Matthews correlation coefficient (MCC) value in comparison to previous methods. Second, using a data set of 153 proteins, we made a pairwise comparison between QCMF and three conventional methods. This comparison study strongly suggests that QCMF complements the conventional methods for the identification of correlated mutations in MSAs.

Conclusions

QCMF utilizes the notion of entanglement, which is a major resource of quantum information, to model significant dissimilar and similar amino acid pair signals in the detection of functionally or structurally important sites. Our results suggest that on the one hand QCMF significantly outperforms the previous method, which mainly focuses on dissimilar amino acid signals, to detect essential sites in proteins. On the other hand, it is complementary to the existing methods for the identification of correlated mutations. The method of QCMF is computationally intensive. To ensure a feasible computation time of the QCMF’s algorithm, we leveraged Compute Unified Device Architecture (CUDA).

The QCMF server is freely accessible at http://qcmf.informatik.uni-goettingen.de/.
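The Matthews correlation coefficient (MCC) used in the results above to compare site predictions can be sketched as follows. This is a generic illustration of the standard MCC formula, not code from QCMF; the helper names and the example site numbers are hypothetical.

```python
import math


def confusion_counts(predicted, essential, n_sites):
    """Count TP, FP, TN, FN for predicted vs. known essential site indices."""
    predicted, essential = set(predicted), set(essential)
    tp = len(predicted & essential)
    fp = len(predicted - essential)
    fn = len(essential - predicted)
    tn = n_sites - tp - fp - fn
    return tp, fp, tn, fn


def matthews_corrcoef(tp, fp, tn, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column of the table is empty
    return (tp * tn - fp * fn) / denom


# Hypothetical example: 10 alignment positions, 3 known essential sites.
counts = confusion_counts(predicted=[1, 2, 7], essential=[1, 2, 5], n_sites=10)
mcc = matthews_corrcoef(*counts)
```

MCC ranges from -1 to 1 and stays informative when essential sites are a small minority of all positions, which is typical for this kind of benchmark.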

485

Konopka BM, Ciombor M, Kurczynska M, Kotulska M. Automated procedure for contact-map-based protein structure reconstruction. J Membr Biol 2014;247:409-20. [PMID: 24682239 PMCID: PMC3983884 DOI: 10.1007/s00232-014-9648-x] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/10/2014] [Accepted: 03/04/2014] [Indexed: 11/25/2022]

Abstract

Knowledge of the three-dimensional structures of ion channels allows for modeling their conductivity characteristics using biophysical models and can lead to discovering their cellular functionality. Recent studies show that quality of structure predictions can be significantly improved using protein contact site information. Therefore, a number of procedures for protein structure prediction based on their contact-map have been proposed. Their comparison is difficult due to different methodologies used for validation. In this work, a Contact Map-to-Structure pipeline (C2S_pipeline) for contact-based protein structure reconstruction is designed and validated. The C2S_pipeline can be used to reconstruct monomeric and multimeric proteins. The median RMSD of structures obtained during validation on a representative set of protein structures equaled 5.27 Å, and the best structure was reconstructed with RMSD of 1.59 Å. The validation is followed by a detailed case study on the KcsA ion channel. Models of KcsA are reconstructed based on different portions of contact site information. Structural feature analysis of acquired KcsA models is supported by a thorough analysis of electrostatic potential distributions inside the channels. The study shows that electrostatic parameters are correlated with structural quality of models. Therefore, they can be used to discriminate between high and low quality structures. We show that 30% of contact information is needed to obtain accurate structures of KcsA, if contacts are selected randomly. This number increases to 70% in case of erroneous maps in which the remaining contacts or non-contacts are changed to the opposite. Furthermore, the study reveals that local reconstruction accuracy is correlated with the number of contacts in which amino acids are involved. This results in higher reconstruction accuracy in the structure core than in peripheral regions.
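For readers unfamiliar with contact maps, the following is a minimal sketch of how one can be derived from residue coordinates and how the per-residue contact count mentioned above can be read from it. It is not part of the C2S_pipeline. The 8 Å cutoff, the one-point-per-residue representation, the function names and the toy coordinates are all assumptions made for illustration.

```python
import math


def contact_map(coords, cutoff=8.0, min_separation=1):
    """Symmetric boolean matrix: True where two residues lie closer than cutoff.

    coords holds one (x, y, z) point per residue, e.g. a C-alpha atom.
    The cutoff (in angstroms) and min_separation are illustrative defaults.
    """
    n = len(coords)
    cmap = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) < cutoff:
                cmap[i][j] = cmap[j][i] = True
    return cmap


def contacts_per_residue(cmap):
    """Number of contacts each residue takes part in."""
    return [sum(row) for row in cmap]


# Hypothetical toy chain: four residues spaced 3.8 angstroms apart on a line.
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
toy_map = contact_map(toy_coords)
toy_counts = contacts_per_residue(toy_map)
```

In the toy chain the two inner residues have more contacts than the two terminal ones, mirroring the core versus periphery distinction the abstract draws.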

486

Ma J, Wang S, Wang Z, Xu J. MRFalign: protein homology detection through alignment of Markov random fields. PLoS Comput Biol 2014;10:e1003500. [PMID: 24675572 PMCID: PMC3967925 DOI: 10.1371/journal.pcbi.1003500] [Citation(s) in RCA: 52] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/27/2013] [Accepted: 01/08/2014] [Indexed: 11/24/2022] Open

Abstract

Sequence-based protein homology detection has been extensively studied and so far the most sensitive method is based upon comparison of protein sequence profiles, which are derived from multiple sequence alignment (MSA) of sequence homologs in a protein family. A sequence profile is usually represented as a position-specific scoring matrix (PSSM) or an HMM (Hidden Markov Model) and accordingly PSSM-PSSM or HMM-HMM comparison is used for homolog detection. This paper presents a new homology detection method MRFalign, consisting of three key components: 1) a Markov Random Fields (MRF) representation of a protein family; 2) a scoring function measuring similarity of two MRFs; and 3) an efficient ADMM (Alternating Direction Method of Multipliers) algorithm aligning two MRFs. Compared to HMM that can only model very short-range residue correlation, MRFs can model long-range residue interaction pattern and thus, encode information for the global 3D structure of a protein family. Consequently, MRF-MRF comparison for remote homology detection shall be much more sensitive than HMM-HMM or PSSM-PSSM comparison. Experiments confirm that MRFalign outperforms several popular HMM or PSSM-based methods in terms of both alignment accuracy and remote homology detection and that MRFalign works particularly well for mainly beta proteins. For example, tested on the benchmark SCOP40 (8353 proteins) for homology detection, PSSM-PSSM and HMM-HMM succeed on 48% and 52% of proteins, respectively, at superfamily level, and on 15% and 27% of proteins, respectively, at fold level. In contrast, MRFalign succeeds on 57.3% and 42.5% of proteins at superfamily and fold level, respectively. This study implies that long-range residue interaction patterns are very helpful for sequence-based homology detection. The software is available for download at http://raptorx.uchicago.edu/download/. A summary of this paper appears in the proceedings of the RECOMB 2014 conference, April 2–5.

Sequence-based protein homology detection has been extensively studied, but it remains very challenging for remote homologs with divergent sequences. So far the most sensitive methods employ HMM-HMM comparison, which models a protein family using an HMM (Hidden Markov Model) and then detects homologs using HMM-HMM alignment. An HMM cannot model long-range residue interaction patterns and thus carries very little information regarding the global 3D structure of a protein family. As such, HMM comparison is not sensitive enough for distantly-related homologs. In this paper, we present an MRF-MRF comparison method for homology detection. In particular, we model a protein family using Markov Random Fields (MRF) and then detect homologs by MRF-MRF alignment. Compared to HMMs, MRFs are able to model long-range residue interaction patterns and thus contain information for the overall 3D structure of a protein family. Consequently, MRF-MRF comparison is much more sensitive than HMM-HMM comparison. To implement MRF-MRF comparison, we have developed a new scoring function to measure the similarity of two MRFs and also an efficient ADMM algorithm to optimize the scoring function. Experiments confirm that MRF-MRF comparison indeed outperforms HMM-HMM comparison in terms of both alignment accuracy and remote homology detection, especially for mainly beta proteins.

487

Kaján L, Hopf TA, Kalaš M, Marks DS, Rost B. FreeContact: fast and free software for protein contact prediction from residue co-evolution. BMC Bioinformatics 2014;15:85. [PMID: 24669753 PMCID: PMC3987048 DOI: 10.1186/1471-2105-15-85] [Citation(s) in RCA: 128] [Impact Index Per Article: 11.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/30/2013] [Accepted: 03/18/2014] [Indexed: 11/10/2022] Open

488

Baldassi C, Zamparo M, Feinauer C, Procaccini A, Zecchina R, Weigt M, Pagnani A. Fast and accurate multivariate Gaussian modeling of protein families: predicting residue contacts and protein-interaction partners. PLoS One 2014;9:e92721. [PMID: 24663061 PMCID: PMC3963956 DOI: 10.1371/journal.pone.0092721] [Citation(s) in RCA: 89] [Impact Index Per Article: 8.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/05/2013] [Accepted: 02/24/2014] [Indexed: 11/18/2022] Open

489

Kosciolek T, Jones DT. De novo structure prediction of globular proteins aided by sequence variation-derived contacts. PLoS One 2014;9:e92197. [PMID: 24637808 PMCID: PMC3956894 DOI: 10.1371/journal.pone.0092197] [Citation(s) in RCA: 81] [Impact Index Per Article: 7.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/06/2013] [Accepted: 02/19/2014] [Indexed: 12/21/2022] Open

Abstract

The advent of high accuracy residue-residue intra-protein contact prediction methods enabled a significant boost in the quality of de novo structure predictions. Here, we investigate the potential benefits of combining a well-established fragment-based folding algorithm, FRAGFOLD, with PSICOV, a contact prediction method which uses sparse inverse covariance estimation to identify co-varying sites in multiple sequence alignments. Using a comprehensive set of 150 diverse globular target proteins, up to 266 amino acids in length, we are able to address the effectiveness and some limitations of such approaches to globular proteins in practice. Overall we find that using fragment assembly with both statistical potentials and predicted contacts is significantly better than either statistical potentials or contacts alone. Results show up to nearly 80% of correct predictions (TM-score ≥0.5) within the analysed dataset and a mean TM-score of 0.54. Unsuccessful modelling cases emerged either from conformational sampling problems, or insufficient contact prediction accuracy. Nevertheless, a strong dependency of the quality of final models on the fraction of satisfied predicted long-range contacts was observed. This not only highlights the importance of these contacts on determining the protein fold, but also (combined with other ensemble-derived qualities) provides a powerful guide as to the choice of correct models and the global quality of the selected model. A proposed quality assessment scoring function achieves 0.93 precision and 0.77 recall for the discrimination of correct folds on our dataset of decoys. These findings suggest the approach is well-suited for blind predictions on a variety of globular proteins of unknown 3D structure, provided that enough homologous sequences are available to construct a large and accurate multiple sequence alignment for the initial contact prediction step.

490

Jana B, Morcos F, Onuchic JN. From structure to function: the convergence of structure based models and co-evolutionary information. Phys Chem Chem Phys 2014;16:6496-507. [PMID: 24603809 DOI: 10.1039/c3cp55275f] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/21/2022]

Abstract

Understanding protein folding and function is one of the most important problems in biological research. Energy landscape theory and the folding funnel concept have provided a framework to investigate the mechanisms associated with these processes. Since protein energy landscapes are in most cases minimally frustrated, structure based models (SBMs) have successfully determined the geometrical features associated with folding and functional transitions. However, structural information is limited, particularly with respect to different functional configurations. This is a major limitation for SBMs. Alternatively, statistical methods to study amino acid co-evolution provide information on residue-residue interactions useful for the study of structure and function. Here, we show how the combination of these two methods gives rise to a novel way to investigate the mechanisms associated with folding and function. We use this methodology to explore the mechanistic aspects of protein translocation in the integral membrane protease FtsH. Dual basin-SBM simulations using the open and closed states of this hexameric motor reveal a functionally important paddling motion in the catalytic cycle. We also find that Direct Coupling Analysis (DCA) predicts physical contacts between AAA and peptidase domains of the motor, which are crucial for the open to close transition. Our combined method, which uses structural information from the open state experimental structure and co-evolutionary couplings, suggests that this methodology can be used to explore the functional landscape of complex biological macromolecules previously inaccessible to methods dependent on experimental structural information. This efficient way to sample the conformational space of large systems creates a theoretical/computational framework capable of better characterizing the functional landscape in large biomolecular assemblies.


491

Reconstructing protein structures by neural network pairwise interaction fields and iterative decoy set construction. Biomolecules 2014;4:160-80. [PMID: 24970210 PMCID: PMC4030983 DOI: 10.3390/biom4010160] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/24/2013] [Revised: 01/22/2014] [Accepted: 01/30/2014] [Indexed: 11/17/2022] Open

492

Kell DB, Goodacre R. Metabolomics and systems pharmacology: why and how to model the human metabolic network for drug discovery. Drug Discov Today 2014;19:171-82. [PMID: 23892182 PMCID: PMC3989035 DOI: 10.1016/j.drudis.2013.07.014] [Citation(s) in RCA: 111] [Impact Index Per Article: 10.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/25/2013] [Revised: 07/03/2013] [Accepted: 07/16/2013] [Indexed: 02/06/2023]

493

Toward rationally redesigning bacterial two-component signaling systems using coevolutionary information. Proc Natl Acad Sci U S A 2014;111:E563-71. [PMID: 24449878 DOI: 10.1073/pnas.1323734111] [Citation(s) in RCA: 94] [Impact Index Per Article: 8.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/09/2023] Open

494

Tetchner S, Kosciolek T, Jones DT. Opportunities and limitations in applying coevolution-derived contacts to protein structure prediction. BIO-ALGORITHMS AND MED-SYSTEMS 2014. [DOI: 10.1515/bams-2014-0013] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/25/2022]

495

Kukic P, Mirabello C, Tradigo G, Walsh I, Veltri P, Pollastri G. Toward an accurate prediction of inter-residue distances in proteins using 2D recursive neural networks. BMC Bioinformatics 2014;15:6. [PMID: 24410833 PMCID: PMC3893389 DOI: 10.1186/1471-2105-15-6] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/28/2013] [Accepted: 12/20/2013] [Indexed: 11/21/2022] Open

Abstract

Background

Protein inter-residue contact maps provide a translation and rotation invariant topological representation of a protein. They can be used as an intermediary step in protein structure predictions. However, the prediction of contact maps represents an unbalanced problem as far fewer examples of contacts than non-contacts exist in a protein structure.

In this study we explore the possibility of completely eliminating the unbalanced nature of the contact map prediction problem by predicting real-value distances between residues. Predicting full inter-residue distance maps and applying them in protein structure predictions has been relatively unexplored in the past.

Results

We initially demonstrate that the use of native-like distance maps is able to reproduce 3D structures almost identical to the targets, giving an average RMSD of 0.5Å. In addition, the corrupted physical maps with an introduced random error of ±6Å are able to reconstruct the targets within an average RMSD of 2Å.

After demonstrating the reconstruction potential of distance maps, we develop two classes of predictors using two-dimensional recursive neural networks: an ab initio predictor that relies only on the protein sequence and evolutionary information, and a template-based predictor in which additional structural homology information is provided. We find that the ab initio predictor is able to reproduce distances with an RMSD of 6Å, regardless of the evolutionary content provided. Furthermore, we show that the template-based predictor exploits both sequence and structure information even in cases of dubious homology and outperforms the best template hit by a clear margin of up to 3.7Å.

Lastly, we demonstrate the ability of the two predictors to reconstruct the CASP9 targets shorter than 200 residues, producing results similar to those of the state-of-the-art machine learning approach implemented in the Distill server.

Conclusions

The methodology presented here, if complemented by more complex reconstruction protocols, can represent a possible path to improve machine learning algorithms for 3D protein structure prediction. Moreover, it can be used as an intermediary step in protein structure predictions either on its own or complemented by NMR restraints.
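The contrast the abstract draws between binary contact maps and real-valued distance maps can be illustrated with a minimal sketch. It is not the authors' method: the function names, the idealized straight-chain coordinates, and the 8 Å contact cutoff are all assumptions chosen for illustration. It shows that a distance map is unchanged by translation and that thresholding it into contacts leaves far fewer contacts than non-contacts.

```python
import math

def distance_map(coords):
    """Pairwise Euclidean distances (in Å) between residue coordinates."""
    return [[math.dist(a, b) for b in coords] for a in coords]

def count_contacts(dmap, threshold=8.0):
    """Count residue pairs (i < j) closer than the threshold, and all pairs.

    The 8 Å cutoff is an assumed convention, not a value from the abstract.
    """
    n = len(dmap)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    contacts = sum(1 for i, j in pairs if dmap[i][j] < threshold)
    return contacts, len(pairs)

# Hypothetical toy chain: 20 residues spaced 3.8 Å apart on a straight line.
coords = [(3.8 * i, 0.0, 0.0) for i in range(20)]
dmap = distance_map(coords)

# Translating every residue by the same vector leaves the distance map unchanged
# (up to floating-point rounding).
shifted = [(x + 10.0, y - 5.0, z + 2.0) for x, y, z in coords]
dmap_shifted = distance_map(shifted)
max_diff = max(
    abs(dmap[i][j] - dmap_shifted[i][j])
    for i in range(len(coords))
    for j in range(len(coords))
)

n_contacts, n_pairs = count_contacts(dmap)
print(f"contacts: {n_contacts} of {n_pairs} pairs, max map difference: {max_diff:.2e}")
```

In this toy chain only pairs one or two positions apart fall under the cutoff, so contacts are a small minority of all pairs, whereas the distance map assigns every pair a real value to predict.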


496

Morcos F, Hwa T, Onuchic JN, Weigt M. Direct coupling analysis for protein contact prediction. Methods Mol Biol 2014;1137:55-70. [PMID: 24573474 DOI: 10.1007/978-1-4939-0366-5_5] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/12/2022]

497

Kamisetty H, Ghosh B, Langmead CJ, Bailey-Kellogg C. Learning Sequence Determinants of Protein:protein Interaction Specificity with Sparse Graphical Models. RESEARCH IN COMPUTATIONAL MOLECULAR BIOLOGY : ... ANNUAL INTERNATIONAL CONFERENCE, RECOMB ... : PROCEEDINGS. RECOMB (CONFERENCE : 2005- ) 2014;8394:129-143. [PMID: 25414914 DOI: 10.1007/978-3-319-05269-4_10] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/05/2023]

498

Terashi G, Nakamura Y, Shimoyama H, Takeda-Shitaka M. Quality Assessment Methods for 3D Protein Structure Models Based on a Residue–Residue Distance Matrix Prediction. Chem Pharm Bull (Tokyo) 2014;62:744-53. [PMID: 25087626 DOI: 10.1248/cpb.c13-00973] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022]

499

Probabilistic grammatical model for helix-helix contact site classification. Algorithms Mol Biol 2013;8:31. [PMID: 24350601 PMCID: PMC3892132 DOI: 10.1186/1748-7188-8-31] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/09/2013] [Accepted: 11/28/2013] [Indexed: 11/25/2022] Open

500

Wang Z, Xu J. Predicting protein contact map using evolutionary and physical constraints by integer programming. Bioinformatics 2013;29:i266-73. [PMID: 23812992 PMCID: PMC3694661 DOI: 10.1093/bioinformatics/btt211] [Citation(s) in RCA: 72] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open