501
Nemoto W, Saito A, Oikawa H. Recent advances in functional region prediction by using structural and evolutionary information - Remaining problems and future extensions. Comput Struct Biotechnol J 2013; 8:e201308007. [PMID: 24688747 PMCID: PMC3962155 DOI: 10.5936/csbj.201308007] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 08/27/2013] [Revised: 11/12/2013] [Accepted: 11/13/2013] [Indexed: 11/22/2022] Open
Abstract
Structural genomics projects have solved many new structures with unknown functions. One strategy to investigate the function of a structure is to computationally find the functionally important residues or regions on it. Therefore, the development of functional region prediction methods has become an important research subject. An effective approach is to use a method employing structural and evolutionary information, such as the evolutionary trace (ET) method. ET ranks the residues of a protein structure by calculating the scores for relative evolutionary importance, and locates functionally important sites by identifying spatial clusters of highly ranked residues. After ET was developed, numerous ET-like methods were subsequently reported, and many of them are in practical use, although they require certain conditions. In this mini review, we first introduce the remaining problems and the recent improvements in the methods using structural and evolutionary information. We then summarize the recent developments of the methods. Finally, we conclude by describing possible extensions of the evolution- and structure-based methods.
Affiliation(s)
- Wataru Nemoto
- Division of Life Science and Engineering, School of Science and Engineering, Tokyo Denki University (TDU), Ishizaka, Hatoyama-cho, Hiki-gun, Saitama, 350-0394, Japan
- Akira Saito
- Division of Life Science and Engineering, School of Science and Engineering, Tokyo Denki University (TDU), Ishizaka, Hatoyama-cho, Hiki-gun, Saitama, 350-0394, Japan
- Hayato Oikawa
- Division of Life Science and Engineering, School of Science and Engineering, Tokyo Denki University (TDU), Ishizaka, Hatoyama-cho, Hiki-gun, Saitama, 350-0394, Japan
502
Coevolutionary signals across protein lineages help capture multiple protein conformations. Proc Natl Acad Sci U S A 2013; 110:20533-8. [PMID: 24297889 DOI: 10.1073/pnas.1315625110] [Citation(s) in RCA: 135] [Impact Index Per Article: 11.3] [Indexed: 11/18/2022] Open
Abstract
A long-standing problem in molecular biology is the determination of a complete functional conformational landscape of proteins. This includes not only proteins' native structures, but also all their respective functional states, including functionally important intermediates. Here, we reveal a signature of functionally important states in several protein families, using direct coupling analysis, which detects residue pair coevolution of protein sequence composition. This signature is exploited in a protein structure-based model to uncover conformational diversity, including hidden functional configurations. We uncovered, with high resolution (mean ~1.9 Å rmsd for nonapo structures), different functional structural states for medium to large proteins (200-450 aa) belonging to several distinct families. The combination of direct coupling analysis and the structure-based model also predicts several intermediates or hidden states that are of functional importance. This enhanced sampling is broadly applicable and has direct implications in protein structure determination and the design of ligands or drugs to trap intermediate states.
503
Khoury GA, Smadbeck J, Kieslich CA, Floudas CA. Protein folding and de novo protein design for biotechnological applications. Trends Biotechnol 2013; 32:99-109. [PMID: 24268901 DOI: 10.1016/j.tibtech.2013.10.008] [Citation(s) in RCA: 105] [Impact Index Per Article: 8.8] [Received: 09/05/2013] [Revised: 10/10/2013] [Accepted: 10/18/2013] [Indexed: 11/19/2022]
Abstract
In the postgenomic era, the medical/biological fields are advancing faster than ever. However, before the power of full-genome sequencing can be fully realized, the connection between amino acid sequence and protein structure, known as the protein folding problem, needs to be elucidated. The protein folding problem remains elusive, with significant difficulties still arising when modeling amino acid sequences lacking an identifiable template. Understanding protein folding will allow for unforeseen advances in protein design; often referred to as the inverse protein folding problem. Despite challenges in protein folding, de novo protein design has recently demonstrated significant success via computational techniques. We review advances and challenges in protein structure prediction and de novo protein design, and highlight their interplay in successful biotechnological applications.
Affiliation(s)
- George A Khoury
- Department of Chemical and Biological Engineering, Princeton University, Princeton, NJ 08544, USA
- James Smadbeck
- Department of Chemical and Biological Engineering, Princeton University, Princeton, NJ 08544, USA
- Chris A Kieslich
- Department of Chemical and Biological Engineering, Princeton University, Princeton, NJ 08544, USA
- Christodoulos A Floudas
- Department of Chemical and Biological Engineering, Princeton University, Princeton, NJ 08544, USA
504
Abergel C. Molecular replacement: tricks and treats. Acta Crystallographica. Section D, Biological Crystallography 2013; 69:2167-73. [PMID: 24189227 PMCID: PMC3817689 DOI: 10.1107/s0907444913015291] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Received: 04/02/2013] [Accepted: 06/02/2013] [Indexed: 11/16/2022]
Abstract
Molecular replacement is the method of choice for X-ray crystallographic structure determination provided that suitable structural homologues are available in the PDB. Presently, there are ~80,000 structures in the PDB (8074 were deposited in the year 2012 alone), of which ~70% have been solved by molecular replacement. For successful molecular replacement the model must cover at least 50% of the total structure and the Cα r.m.s.d. between the core model and the structure to be solved must be less than 2 Å. Here, an approach originally implemented in the CaspR server (http://www.igs.cnrs-mrs.fr/Caspr2/index.cgi) based on homology modelling to search for a molecular-replacement solution is discussed. How the use of as much information as possible from different sources can improve the model(s) is briefly described. The combination of structural information with distantly related sequences is crucial to optimize the multiple alignment that will define the boundaries of the core domains. PDB clusters (sequences with ≥30% identical residues) can also provide information on the eventual changes in conformation and will help to explore the relative orientations assumed by protein subdomains. Normal-mode analysis can also help in generating series of conformational models in the search for a molecular-replacement solution. Of course, finding a correct solution is only the first step and the accuracy of the identified solution is as important as the data quality to proceed through refinement. Here, some possible reasons for failure are discussed and solutions are proposed using a set of successful examples.
Affiliation(s)
- Chantal Abergel
- Information Génomique et Structurale, IGS UMR 7256, CNRS, Aix-Marseille Université, IMM, FR3479, 163 Avenue de Luminy – case 934, 13288 Marseille CEDEX 09, France
505
Valentin JB, Andreetta C, Boomsma W, Bottaro S, Ferkinghoff-Borg J, Frellsen J, Mardia KV, Tian P, Hamelryck T. Formulation of probabilistic models of protein structure in atomic detail using the reference ratio method. Proteins 2013; 82:288-99. [PMID: 23934827 DOI: 10.1002/prot.24386] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 05/28/2013] [Revised: 07/02/2013] [Accepted: 07/18/2013] [Indexed: 01/10/2023]
Abstract
We propose a method to formulate probabilistic models of protein structure in atomic detail, for a given amino acid sequence, based on Bayesian principles, while retaining a close link to physics. We start from two previously developed probabilistic models of protein structure on a local length scale, which concern the dihedral angles in main chain and side chains, respectively. Conceptually, this constitutes a probabilistic and continuous alternative to the use of discrete fragment and rotamer libraries. The local model is combined with a nonlocal model that involves a small number of energy terms according to a physical force field, and some information on the overall secondary structure content. In this initial study we focus on the formulation of the joint model and the evaluation of the use of an energy vector as a descriptor of a protein's nonlocal structure; hence, we derive the parameters of the nonlocal model from the native structure without loss of generality. The local and nonlocal models are combined using the reference ratio method, which is a well-justified probabilistic construction. For evaluation, we use the resulting joint models to predict the structure of four proteins. The results indicate that the proposed method and the probabilistic models show considerable promise for probabilistic protein structure prediction and related applications.
Affiliation(s)
- Jan B Valentin
- The Bioinformatics Centre, Department of Biology, University of Copenhagen, Copenhagen, Denmark
506
Eickholt J, Cheng J. A study and benchmark of DNcon: a method for protein residue-residue contact prediction using deep networks. BMC Bioinformatics 2013; 14 Suppl 14:S12. [PMID: 24267585 PMCID: PMC3850995 DOI: 10.1186/1471-2105-14-s14-s12] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.8] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND In recent years, the use and importance of predicted protein residue-residue contacts has grown considerably with demonstrated applications such as drug design, protein tertiary structure prediction and model quality assessment. Nevertheless, reported accuracies in the range of 25-35% stubbornly remain the norm for sequence based, long range contact predictions on hard targets. This is in spite of a prolonged effort on behalf of the community to improve the performance of residue-residue contact prediction. A thorough study of the quality of current residue-residue contact predictions and the evaluation metrics used as well as an analysis of current methods is needed to stimulate further advancement in contact prediction and its application. Such a study will better explain the quality and nature of residue-residue contact predictions generated by current methods and as a result lead to better use of this contact information. RESULTS We evaluated several sequence based residue-residue contact predictors that participated in the tenth Critical Assessment of protein Structure Prediction (CASP) experiment. The evaluation was performed using standard assessment techniques such as those used by the official CASP assessors as well as two novel evaluation metrics (i.e., cluster accuracy and cluster count). An in-depth analysis revealed that while most residue-residue contact predictions generated are not accurate at the residue level, there is quite a strong contact signal present when allowing for less than residue level precision. Our residue-residue contact predictor, DNcon, performed particularly well achieving an accuracy of 66% for the top L/10 long range contacts when evaluated in a neighbourhood of size 2. The coverage of residue-residue contact areas was also greater with DNcon when compared to other methods. We also provide an analysis of DNcon with respect to its underlying architecture and features used for classification. 
CONCLUSIONS Our novel evaluation metrics demonstrate that current residue-residue contact predictions do contain a strong contact signal and are of better quality than standard evaluation metrics indicate. Our method, DNcon, is a robust, state-of-the-art residue-residue sequence based contact predictor and excelled under a number of evaluation schemes. It is available as a web service at http://iris.rnet.missouri.edu/dncon/.
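Note: the abstract above reports accuracy "in a neighbourhood of size 2" without defining it. The sketch below shows one plausible reading, assumed here for illustration only: a predicted pair counts as correct if a true contact lies within ±delta residues of both predicted positions. The function name and the example data are hypothetical and are not taken from DNcon.

```python
def neighbourhood_accuracy(predicted, true_contacts, delta=2):
    """Fraction of predicted pairs that have a true contact within
    +/- delta residues of both positions (assumed definition)."""
    # Store true contacts as order-independent (low, high) index pairs.
    true_set = {tuple(sorted(pair)) for pair in true_contacts}
    if not predicted:
        return 0.0
    offsets = range(-delta, delta + 1)
    hits = 0
    for i, j in predicted:
        # A hit is any true contact inside the (2*delta+1)^2 window around (i, j).
        if any(tuple(sorted((i + di, j + dj))) in true_set
               for di in offsets for dj in offsets):
            hits += 1
    return hits / len(predicted)


# Hypothetical toy data: two true contacts, four predictions.
true_pairs = [(10, 50), (20, 80)]
predictions = [(11, 52), (10, 50), (30, 60), (23, 80)]
print(neighbourhood_accuracy(predictions, true_pairs, delta=2))
print(neighbourhood_accuracy(predictions, true_pairs, delta=0))
```

Setting delta to 0 recovers strict residue-level accuracy, which makes it easy to see how much of the reported signal comes from near misses.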
507
Savojardo C, Fariselli P, Martelli PL, Casadio R. BCov: a method for predicting β-sheet topology using sparse inverse covariance estimation and integer programming. Bioinformatics 2013; 29:3151-7. [PMID: 24064422 DOI: 10.1093/bioinformatics/btt555] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Indexed: 02/06/2023]
Abstract
MOTIVATION Prediction of protein residue contacts, even at the coarse-grain level, can help in finding solutions to the protein structure prediction problem. Unlike α-helices that are locally stabilized, β-sheets result from pairwise hydrogen bonding of two or more disjoint regions of the protein backbone. The problem of predicting contacts among β-strands in proteins has been addressed by several supervised computational approaches. Recently, prediction of residue contacts based on correlated mutations has been greatly improved and finally allows the prediction of 3D structures of the proteins. RESULTS In this article, we describe BCov, which is the first unsupervised method to predict the β-sheet topology starting from the protein sequence and its secondary structure. BCov takes advantage of the sparse inverse covariance estimation to define β-strand partner scores. Then an optimization based on integer programming is carried out to predict the β-sheet connectivity. When tested on the prediction of β-strand pairing, BCov achieves average Matthews Correlation Coefficient (MCC) and F1 values of 0.56 and 0.61, respectively, on a non-redundant dataset of 916 protein chains known with atomic resolution. Our approach compares well with the state-of-the-art methods trained so far for this specific task. AVAILABILITY AND IMPLEMENTATION The method is freely available under General Public License at http://biocomp.unibo.it/savojard/bcov/bcov-1.0.tar.gz. The new dataset BetaSheet1452 can be downloaded at http://biocomp.unibo.it/savojard/bcov/BetaSheet1452.dat.
Affiliation(s)
- Castrense Savojardo
- Biocomputing Group, CIRI-Health Science and Technology/Department of Biology, University of Bologna, 40126 Bologna, Italy and Department of Computer Science and Engineering, Via Mura Anteo Zamboni 7, 40127 Bologna, Italy
508
Nugent T, Jones DT. Membrane protein orientation and refinement using a knowledge-based statistical potential. BMC Bioinformatics 2013; 14:276. [PMID: 24047460 PMCID: PMC3852961 DOI: 10.1186/1471-2105-14-276] [Citation(s) in RCA: 66] [Impact Index Per Article: 5.5] [Received: 05/03/2013] [Accepted: 09/05/2013] [Indexed: 01/09/2023] Open
Abstract
BACKGROUND Recent increases in the number of deposited membrane protein crystal structures necessitate the use of automated computational tools to position them within the lipid bilayer. Identifying the correct orientation allows us to study the complex relationship between sequence, structure and the lipid environment, which is otherwise challenging to investigate using experimental techniques due to the difficulty in crystallising membrane proteins embedded within intact membranes. RESULTS We have developed a knowledge-based membrane potential, calculated by the statistical analysis of transmembrane protein structures, coupled with a combination of genetic and direct search algorithms, and demonstrate its use in positioning proteins in membranes, refinement of membrane protein models and in decoy discrimination. CONCLUSIONS Our method is able to quickly and accurately orientate both alpha-helical and beta-barrel membrane proteins within the lipid bilayer, showing closer agreement with experimentally determined values than existing approaches. We also demonstrate both consistent and significant refinement of membrane protein models and the effective discrimination between native and decoy structures. Source code is available under an open source license from http://bioinf.cs.ucl.ac.uk/downloads/memembed/.
Affiliation(s)
- Timothy Nugent
- Bioinformatics Group, Department of Computer Science, University College London, Gower Street, London WC1E 6BT, UK
- David T Jones
- Bioinformatics Group, Department of Computer Science, University College London, Gower Street, London WC1E 6BT, UK
509
Kim DE, Dimaio F, Yu-Ruei Wang R, Song Y, Baker D. One contact for every twelve residues allows robust and accurate topology-level protein structure modeling. Proteins 2013; 82 Suppl 2:208-18. [PMID: 23900763 DOI: 10.1002/prot.24374] [Citation(s) in RCA: 64] [Impact Index Per Article: 5.3] [Received: 04/09/2013] [Revised: 06/12/2013] [Accepted: 06/21/2013] [Indexed: 12/19/2022]
Abstract
A number of methods have been described for identifying pairs of contacting residues in protein three-dimensional structures, but it is unclear how many contacts are required for accurate structure modeling. The CASP10 assisted contact experiment provided a blind test of contact guided protein structure modeling. We describe the models generated for these contact guided prediction challenges using the Rosetta structure modeling methodology. For nearly all cases, the submitted models had the correct overall topology, and in some cases, they had near atomic-level accuracy; for example the model of the 384 residue homo-oligomeric tetramer (Tc680o) had only 2.9 Å root-mean-square deviation (RMSD) from the crystal structure. Our results suggest that experimental and bioinformatic methods for obtaining contact information may need to generate only one correct contact for every 12 residues in the protein to allow accurate topology level modeling.
Affiliation(s)
- David E Kim
- Department of Biochemistry, University of Washington, Seattle, 98195, Washington
510
Proton-dependent gating and proton uptake by Wzx support O-antigen-subunit antiport across the bacterial inner membrane. mBio 2013; 4:e00678-13. [PMID: 24023388 PMCID: PMC3774195 DOI: 10.1128/mbio.00678-13] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.3] [Indexed: 12/16/2022] Open
Abstract
Wzx flippases are crucial for bacterial cell surface polysaccharide assembly as they transport undecaprenyl pyrophosphate-linked sugar repeat units from the cytoplasmic to the periplasmic leaflets of the inner membrane (IM) for final assembly. Our recently reported three-dimensional (3D) model structure of Wzx from Pseudomonas aeruginosa PAO1 (WzxPa) displayed a cationic internal vestibule and functionally essential acidic amino acids within transmembrane segment bundles. Herein, we examined the intrinsic transport function of WzxPa following its purification and reconstitution in phospholipid liposomes. WzxPa was capable of mediating anion flux, consistent with its cationic interior. This flux was electrogenic and modified by extraliposomal pH. Mutation of the above-mentioned acidic residues (E61, D269, and D359) reduced proton (H+)-modified anion flux, showing the role of these amino acid side chains in H+-dependent transport. Wzx also mediated acidification of the proteoliposome interior in the presence of an outward anion gradient. These results indicate H+-dependent gating and H+ uptake by WzxPa and allow for the first H+-dependent antiport mechanism to be proposed for lipid-linked oligosaccharide translocation across the bacterial IM. Many bacterial cell surface polysaccharides that are important for survival and virulence are synthesized at the periplasmic leaflet of the inner membrane (IM) using precursors produced in the cytoplasm. Wzx flippases are responsible for translocation of lipid-linked sugar repeat units across the IM and had been previously suggested to simply facilitate passive substrate diffusion. Through our characterization of purified Wzx in a reconstitution system described herein, we have observed protein-dependent intrinsic transport producing a change in the electrical potential of the system, with H+ identified as the coupling ion. 
These results provide the first evidence for coupled (i.e., secondary active) transport by these proteins and, in conjunction with structural data, allow for an antiport mechanism to be proposed for the directed transport of lipid-linked sugar substrates across the IM. These findings bring our understanding of lipid-linked polysaccharide transporter proteins more in line with the efflux pumps to which they are evolutionarily related.
511
Assessing the utility of coevolution-based residue-residue contact predictions in a sequence- and structure-rich era. Proc Natl Acad Sci U S A 2013; 110:15674-9. [PMID: 24009338 DOI: 10.1073/pnas.1314045110] [Citation(s) in RCA: 485] [Impact Index Per Article: 40.4] [Indexed: 11/18/2022] Open
Abstract
Recently developed methods have shown considerable promise in predicting residue-residue contacts in protein 3D structures using evolutionary covariance information. However, these methods require large numbers of evolutionarily related sequences to robustly assess the extent of residue covariation, and the larger the protein family, the more likely that contact information is unnecessary because a reasonable model can be built based on the structure of a homolog. Here we describe a method that integrates sequence coevolution and structural context information using a pseudolikelihood approach, allowing more accurate contact predictions from fewer homologous sequences. We rigorously assess the utility of predicted contacts for protein structure prediction using large and representative sequence and structure databases from recent structure prediction experiments. We find that contact predictions are likely to be accurate when the number of aligned sequences (with sequence redundancy reduced to 90%) is greater than five times the length of the protein, and that accurate predictions are likely to be useful for structure modeling if the aligned sequences are more similar to the protein of interest than to the closest homolog of known structure. These conditions are currently met by 422 of the protein families collected in the Pfam database.
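Note: the depth criterion stated in the abstract above (more aligned sequences, after redundancy reduction at 90%, than five times the protein length) can be checked with a few lines of code. The sketch below is illustrative only. The greedy filter, the function names, and the treatment of gaps as ordinary characters are assumptions, not the authors' procedure.

```python
def identity(a, b):
    """Fraction of aligned columns at which two equal-length rows match.
    Gap characters are compared like any other symbol (a simplification)."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    if not a:
        raise ValueError("aligned sequences must not be empty")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def reduce_redundancy(alignment, max_identity=0.9):
    """Greedy filter: keep a row only if it is at most max_identity
    identical to every row already kept (assumed stand-in for a
    dedicated clustering tool)."""
    kept = []
    for seq in alignment:
        if all(identity(seq, k) <= max_identity for k in kept):
            kept.append(seq)
    return kept


def deep_enough(alignment, factor=5.0, max_identity=0.9):
    """True if the non-redundant sequence count exceeds factor * length."""
    if not alignment:
        return False
    length = len(alignment[0])
    return len(reduce_redundancy(alignment, max_identity)) > factor * length


# Hypothetical toy alignment of length 2: the threshold is 5 * 2 = 10 rows.
rows = ["A" + c for c in "ACDEFGHIKLM"]   # 11 distinct rows
print(deep_enough(rows))                  # 11 > 10
print(deep_enough(rows[:10] + ["AA"]))    # the duplicate row is filtered out
```

The pairwise filter is quadratic in the number of sequences, so it is suitable only for small illustrations; real alignments would need a purpose-built clustering step.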
512
Gleichmann T, Diensthuber RP, Möglich A. Charting the signal trajectory in a light-oxygen-voltage photoreceptor by random mutagenesis and covariance analysis. J Biol Chem 2013; 288:29345-55. [PMID: 24003219 DOI: 10.1074/jbc.m113.506139] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.2] [Indexed: 11/06/2022] Open
Abstract
Modular signal receptors empower organisms to process environmental stimuli into adequate physiological responses. At the molecular level, a sensor module receives signals and processes the inherent information into changes of biological activity of an effector module. To better understand the molecular bases underpinning these processes, we analyzed signal reception and processing in the dimeric light-oxygen-voltage (LOV) blue light receptor YF1 that serves as a paradigm for the widespread Per-ARNT-Sim (PAS) signal receptors. Random mutagenesis identifies numerous YF1 variants in which biological activity is retained but where light regulation is abolished or inverted. One group of variants carries mutations within the LOV photosensor that disrupt proper coupling of the flavin-nucleotide chromophore to the protein scaffold. Another larger group bears mutations that cluster at the dyad interface and disrupt signal transmission to two coaxial coiled-coils that connect to the effector. Sequence covariation implies wide conservation of structural and mechanistic motifs, as also borne out by comparison to several PAS domains in which mutations leading to disruption of signal transduction consistently map to confined regions broadly equivalent to those identified in YF1. Not only do these data provide insight into general mechanisms of signal transduction, but also they establish concrete means for customized reprogramming of signal receptors.
Affiliation(s)
- Tobias Gleichmann
- From the Humboldt-Universität zu Berlin, Institut für Biologie, Biophysikalische Chemie, Invalidenstraße 42, 10115 Berlin, Germany
513
Monastyrskyy B, D'Andrea D, Fidelis K, Tramontano A, Kryshtafovych A. Evaluation of residue-residue contact prediction in CASP10. Proteins 2013; 82 Suppl 2:138-53. [PMID: 23760879 DOI: 10.1002/prot.24340] [Citation(s) in RCA: 68] [Impact Index Per Article: 5.7] [Received: 04/18/2013] [Revised: 05/14/2013] [Accepted: 05/21/2013] [Indexed: 12/13/2022]
Abstract
We present the results of the assessment of the intramolecular residue-residue contact predictions from 26 prediction groups participating in the 10th round of the CASP experiment. The most recently developed direct coupling analysis methods did not take part in the experiment likely because they require a very deep sequence alignment not available for any of the 114 CASP10 targets. The performance of contact prediction methods was evaluated with the measures used in previous CASPs (i.e., prediction accuracy and the difference between the distribution of the predicted contacts and that of all pairs of residues in the target protein), as well as new measures, such as the Matthews correlation coefficient, the area under the precision-recall curve and the ranks of the first correctly and incorrectly predicted contact. We also evaluated the ability to detect interdomain contacts and tested whether the difficulty of predicting contacts depends upon the protein length and the depth of the family sequence alignment. The analyses were carried out on the target domains for which structural homologs did not exist or were difficult to identify. The evaluation was performed for all types of contacts (short, medium, and long-range), with emphasis placed on long-range contacts, i.e. those involving residues separated by at least 24 residues along the sequence. The assessment suggests that the best CASP10 contact prediction methods perform at approximately the same level, and comparably to those participating in CASP9.
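Note: the abstract above defines long-range contacts as pairs separated by at least 24 residues along the sequence and evaluates prediction accuracy on them. A minimal sketch of such an accuracy measure follows, assuming the common convention of scoring the top L/k highest-confidence long-range predictions for a protein of length L. The function name, the default k, and the toy data are hypothetical and are not the assessors' code.

```python
def long_range_precision(scored_pairs, true_contacts, length, k=5, min_sep=24):
    """Precision of the top length//k long-range predictions.

    scored_pairs: iterable of (i, j, score) with residue indices i, j.
    A pair is long-range when |i - j| >= min_sep.
    """
    true_set = {tuple(sorted(pair)) for pair in true_contacts}
    # Keep only long-range predictions, stored as (score, low, high).
    candidates = [(s, min(i, j), max(i, j))
                  for i, j, s in scored_pairs if abs(i - j) >= min_sep]
    # Highest-confidence predictions first.
    candidates.sort(key=lambda t: t[0], reverse=True)
    top = candidates[:max(1, length // k)]
    if not top:
        return 0.0
    return sum((i, j) in true_set for _, i, j in top) / len(top)


# Hypothetical toy example: a 10-residue "protein" is used only to keep L/k small.
scored = [(1, 30, 0.9), (2, 40, 0.8), (3, 50, 0.7), (5, 10, 0.99)]
native = [(1, 30), (3, 50)]
print(long_range_precision(scored, native, length=10, k=5))  # top 2 pairs
```

The short-range pair (5, 10) has the highest score but is excluded by the separation filter, which is the point of restricting the evaluation to long-range contacts.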
514
From principal component to direct coupling analysis of coevolution in proteins: low-eigenvalue modes are needed for structure prediction. PLoS Comput Biol 2013; 9:e1003176. [PMID: 23990764 PMCID: PMC3749948 DOI: 10.1371/journal.pcbi.1003176] [Citation(s) in RCA: 100] [Impact Index Per Article: 8.3] [Received: 12/07/2012] [Accepted: 06/24/2013] [Indexed: 11/19/2022] Open
Abstract
Various approaches have explored the covariation of residues in multiple-sequence alignments of homologous proteins to extract functional and structural information. Among those are principal component analysis (PCA), which identifies the most correlated groups of residues, and direct coupling analysis (DCA), a global inference method based on the maximum entropy principle, which aims at predicting residue-residue contacts. In this paper, inspired by the statistical physics of disordered systems, we introduce the Hopfield-Potts model to naturally interpolate between these two approaches. The Hopfield-Potts model allows us to identify relevant ‘patterns’ of residues from the knowledge of the eigenmodes and eigenvalues of the residue-residue correlation matrix. We show how the computation of such statistical patterns makes it possible to accurately predict residue-residue contacts with a much smaller number of parameters than DCA. This dimensional reduction allows us to avoid overfitting and to extract contact information from multiple-sequence alignments of reduced size. In addition, we show that low-eigenvalue correlation modes, discarded by PCA, are important to recover structural information: the corresponding patterns are highly localized, that is, they are concentrated in few sites, which we find to be in close contact in the three-dimensional protein fold. Extracting functional and structural information about protein families from the covariation of residues in multiple sequence alignments is an important challenge in computational biology. Here we propose a statistical-physics inspired framework to analyze those covariations, which naturally unifies existing methods in the literature. Our approach allows us to identify statistically relevant ‘patterns’ of residues, specific to a protein family. We show that many patterns correspond to a small number of sites on the protein sequence, in close contact on the 3D fold. 
Hence, those patterns allow us to make accurate predictions about the contact map from sequence data only. Furthermore, we show that the dimensional reduction, which is achieved by considering only the statistically most significant patterns, avoids overfitting in small sequence alignments, and improves our capacity to extract residue contacts in this case.
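Note: the abstract above describes low-eigenvalue patterns as "highly localized", that is, concentrated on few sites. One standard way to quantify localization of a vector is the inverse participation ratio (IPR). The sketch below uses that measure as an illustration; whether it matches the exact normalization used in the paper is an assumption, and the example vectors are hypothetical.

```python
def inverse_participation_ratio(pattern):
    """IPR = sum(x^4) / (sum(x^2))^2.

    Equals 1 for a pattern concentrated on a single site and 1/N for a
    pattern spread uniformly over N sites.
    """
    norm2 = sum(x * x for x in pattern)
    if norm2 == 0:
        raise ValueError("pattern must contain at least one non-zero entry")
    return sum(x ** 4 for x in pattern) / norm2 ** 2


# Hypothetical patterns over four sites.
print(inverse_participation_ratio([1.0, 0.0, 0.0, 0.0]))  # fully localized
print(inverse_participation_ratio([1.0, 1.0, 1.0, 1.0]))  # fully delocalized
```

Because the ratio is scale-invariant, the pattern does not need to be normalized before it is passed in.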
515
Yang J, Jang R, Zhang Y, Shen HB. High-accuracy prediction of transmembrane inter-helix contacts and application to GPCR 3D structure modeling. Bioinformatics 2013; 29:2579-87. [PMID: 23946502 DOI: 10.1093/bioinformatics/btt440] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.4] [Indexed: 02/05/2023]
Abstract
MOTIVATION Residue-residue contacts across the transmembrane helices dictate the three-dimensional topology of alpha-helical membrane proteins. However, contact determination through experiments is difficult because most transmembrane proteins are hard to crystallize. RESULTS We present a novel method (MemBrain) to derive transmembrane inter-helix contacts from amino acid sequences by combining correlated mutations and multiple machine learning classifiers. Tested on 60 non-redundant polytopic proteins using a strict leave-one-out cross-validation protocol, MemBrain achieves an average accuracy of 62%, which is 12.5% higher than the current best method from the literature. When applied to 13 recently solved G protein-coupled receptors, the MemBrain contact predictions helped increase the TM-score of the I-TASSER models by 37% in the transmembrane region. The number of foldable cases (TM-score >0.5) increased by 100%, where all G protein-coupled receptor templates and homologous templates with sequence identity >30% were excluded. These results demonstrate significant progress in contact prediction and a potential for contact-driven structure modeling of transmembrane proteins. AVAILABILITY www.csbio.sjtu.edu.cn/bioinf/MemBrain/
Affiliation(s)
- Jing Yang
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China, Department of Computational Medicine and Bioinformatics and Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA
|
516
|
Predicting functionally informative mutations in Escherichia coli BamA using evolutionary covariance analysis. Genetics 2013; 195:443-55. [PMID: 23934888 DOI: 10.1534/genetics.113.155861] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.3] [Indexed: 12/20/2022] Open
Abstract
The essential outer membrane β-barrel protein BamA forms a complex with four lipoprotein partners BamBCDE that assembles β-barrel proteins into the outer membrane of Escherichia coli. Detailed genetic studies have shown that BamA cycles through multiple conformations during substrate assembly, suggesting that a complex network of residues may be involved in coordinating conformational changes and lipoprotein partner function. While genetic analysis of BamA has been informative, it has also been slow in the absence of a straightforward selection for mutants. Here we take a bioinformatic approach to identify candidate residues for mutagenesis using direct coupling analysis. Starting with the BamA paralog FhaC, we show that direct coupling analysis works well for large β-barrel proteins, identifying pairs of residues in close proximity in tertiary structure with a true positive rate of 0.64 over the top 50 predictions. To reduce the effects of noise, we designed and incorporated a novel structured prior into the empirical correlation matrix, dramatically increasing the FhaC true positive rate from 0.64 to 0.88 over the top 50 predictions. Our direct coupling analysis of BamA implicates residues R661 and D740 in a functional interaction. We find that the substitutions R661G and D740G each confer OM permeability defects and destabilize the BamA β-barrel. We also identify synthetic phenotypes and cross-suppressors that suggest R661 and D740 function in a similar process and may interact directly. We expect that the direct coupling analysis approach to informed mutagenesis will be particularly useful in systems lacking adequate selections and for dynamic proteins with multiple conformations.
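The "true positive rate over the top 50 predictions" reported above can be computed along the following lines. This is a minimal sketch, not the authors' evaluation code: the 8 Å contact cutoff, the minimum sequence separation of 5, and the dictionary-based inputs are all assumptions made for illustration.

```python
def top_n_true_positive_rate(scores, distances, n=50, cutoff=8.0, min_separation=5):
    """Fraction of the n highest-scoring residue pairs that are true contacts.

    scores:    dict mapping (i, j) with i < j to a predicted coupling score
    distances: dict mapping (i, j) to the observed distance in angstroms
    """
    # Ignore pairs that are close in sequence; they are trivially close in space.
    candidates = [(pair, s) for pair, s in scores.items()
                  if pair[1] - pair[0] >= min_separation]
    # Rank by score, highest first, and keep the top n.
    ranked = sorted(candidates, key=lambda item: item[1], reverse=True)[:n]
    if not ranked:
        return 0.0
    hits = sum(1 for pair, _ in ranked if distances[pair] < cutoff)
    return hits / len(ranked)


# Toy data: four long-range pairs plus one sequence neighbour that is filtered out.
scores = {(0, 6): 0.9, (1, 8): 0.8, (2, 9): 0.7, (3, 10): 0.1, (0, 1): 1.0}
distances = {(0, 6): 5.0, (1, 8): 12.0, (2, 9): 7.5, (3, 10): 20.0, (0, 1): 3.8}
rate_top3 = top_n_true_positive_rate(scores, distances, n=3)
```

In the toy data, two of the three top-ranked long-range pairs lie within the cutoff.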
|
517
|
Feizi S, Marbach D, Médard M, Kellis M. Network deconvolution as a general method to distinguish direct dependencies in networks. Nat Biotechnol 2013; 31:726-33. [PMID: 23851448 PMCID: PMC3773370 DOI: 10.1038/nbt.2635] [Citation(s) in RCA: 145] [Impact Index Per Article: 12.1] [Received: 09/12/2012] [Accepted: 06/11/2013] [Indexed: 01/08/2023]
Abstract
Recognizing direct relationships between variables connected in a network is a pervasive problem in biological, social and information sciences as correlation-based networks contain numerous indirect relationships. Here we present a general method for inferring direct effects from an observed correlation matrix containing both direct and indirect effects. We formulate the problem as the inverse of network convolution, and introduce an algorithm that removes the combined effect of all indirect paths of arbitrary length in a closed-form solution by exploiting eigen-decomposition and infinite-series sums. We demonstrate the effectiveness of our approach in several network applications: distinguishing direct targets in gene expression regulatory networks; recognizing directly-interacting amino-acid residues for protein structure prediction from sequence alignments; and distinguishing strong collaborations in co-authorship social networks using connectivity information alone.
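The closed-form idea in this abstract can be illustrated on a toy network. The sketch assumes the observed matrix is the sum of all powers of the direct matrix, G_obs = G_dir + G_dir^2 + ..., which gives G_dir = G_obs (I + G_obs)^-1. For simplicity it uses a plain matrix inverse instead of the eigen-decomposition mentioned in the abstract, and it omits any rescaling of the input; the helper names are illustrative.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inverse(m):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(m)
    # Augment m with the identity matrix.
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def deconvolve(g_obs):
    """Recover direct weights as G_dir = G_obs (I + G_obs)^-1."""
    n = len(g_obs)
    i_plus_g = [[g_obs[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
                for i in range(n)]
    return matmul(g_obs, inverse(i_plus_g))


# Toy chain 0 - 1 - 2: only neighbours interact directly.
g_dir = [[0.0, 0.4, 0.0],
         [0.4, 0.0, 0.4],
         [0.0, 0.4, 0.0]]

# Observed matrix: sum over paths of every length (truncated power series).
g_obs = [[0.0] * 3 for _ in range(3)]
power = g_dir
for _ in range(200):
    g_obs = [[g_obs[i][j] + power[i][j] for j in range(3)] for i in range(3)]
    power = matmul(power, g_dir)

recovered = deconvolve(g_obs)
```

In the toy chain, nodes 0 and 2 show a nonzero observed weight purely through node 1, and deconvolution returns that entry to approximately zero.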
Affiliation(s)
- Soheil Feizi
- Computer Science and Artificial Intelligence Laboratory (CSAIL), Massachusetts Institute of Technology (MIT), Cambridge, Massachusetts, USA
|
518
|
Taylor WR, Hamilton RS, Sadowski MI. Prediction of contacts from correlated sequence substitutions. Curr Opin Struct Biol 2013; 23:473-9. [PMID: 23680395 DOI: 10.1016/j.sbi.2013.04.001] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.7] [Received: 01/08/2013] [Revised: 03/12/2013] [Accepted: 04/02/2013] [Indexed: 11/26/2022]
Abstract
Recent work has led to a substantial improvement in the accuracy of predictions of contacts between amino acids using evolutionary information derived from multiple sequence alignments. Where large numbers of diverse sequence relatives are available and can be aligned to the sequence of a protein of unknown structure it is now possible to generate high-resolution models without recourse to the structure of a template. In this review we describe these exciting new techniques and critically assess the state-of-the-art in contact prediction in the light of these. While concentrating on methods, we also discuss applications to protein and RNA structure prediction as well as potential future developments.
Affiliation(s)
- William R Taylor
- Division of Mathematical Biology, MRC National Institute for Medical Research, The Ridgeway, Mill Hill, London NW7 1AA, UK.
|
519
|
Skwark MJ, Abdel-Rehim A, Elofsson A. PconsC: combination of direct information methods and alignments improves contact prediction. Bioinformatics 2013; 29:1815-6. [PMID: 23658418 DOI: 10.1093/bioinformatics/btt259] [Citation(s) in RCA: 65] [Impact Index Per Article: 5.4] [Indexed: 11/14/2022]
Abstract
SUMMARY Recently, several new contact prediction methods have been published. They (i) use large sets of multiple aligned sequences and (ii) assume that correlations between columns in these alignments can be the results of indirect interaction. These methods are clearly superior to earlier methods when it comes to predicting contacts in proteins. Here, we demonstrate that combining predictions from two prediction methods, PSICOV and plmDCA, and two alignment methods, HHblits and jackhmmer, at four different e-value cut-offs provides a relative improvement of 20% in comparison with the best single method, exceeding 70% correct predictions for one contact prediction per residue. AVAILABILITY The source code for PconsC along with supplementary data is freely available at http://c.pcons.net/ CONTACT arne@bioinfo.se SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Marcin J Skwark
- Department of Biochemistry and Biophysics, Stockholm University, 10691 Stockholm, Sweden
|
520
|
Protein structure prediction from sequence variation. Nat Biotechnol 2013; 30:1072-80. [PMID: 23138306 DOI: 10.1038/nbt.2419] [Citation(s) in RCA: 450] [Impact Index Per Article: 37.5] [Received: 08/28/2012] [Accepted: 10/15/2012] [Indexed: 02/07/2023]
Abstract
Genomic sequences contain rich evolutionary information about functional constraints on macromolecules such as proteins. This information can be efficiently mined to detect evolutionary couplings between residues in proteins and address the long-standing challenge to compute protein three-dimensional structures from amino acid sequences. Substantial progress has recently been made on this problem owing to the explosive growth in available sequences and the application of global statistical methods. In addition to three-dimensional structure, the improved understanding of covariation may help identify functional residues involved in ligand binding, protein-complex formation and conformational changes. We expect computation of covariation patterns to complement experimental structural biology in elucidating the full spectrum of protein structures, their functional interactions and evolutionary dynamics.
|
521
|
Abstract
Co-evolution is a fundamental component of the theory of evolution and is essential for understanding the relationships between species in complex ecological networks. A wide range of co-evolution-inspired computational methods has been designed to predict molecular interactions, but it is only recently that important advances have been made. Breakthroughs in the handling of phylogenetic information and in disentangling indirect relationships have resulted in an improved capacity to predict interactions between proteins and contacts between different protein residues. Here, we review the main co-evolution-based computational approaches, their theoretical basis, potential applications and foreseeable developments.
Affiliation(s)
- David de Juan
- Structural Biology and Biocomputing Programme, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
|
522
|
Yu DJ, Hu J, Tang ZM, Shen HB, Yang J, Yang JY. Improving protein-ATP binding residues prediction by boosting SVMs with random under-sampling. Neurocomputing 2013. [DOI: 10.1016/j.neucom.2012.10.012] [Citation(s) in RCA: 53] [Impact Index Per Article: 4.4] [Indexed: 01/07/2023]
|
523
|
Sadowski MI. Prediction of protein domain boundaries from inverse covariances. Proteins 2013; 81:253-60. [PMID: 22987736 PMCID: PMC3563215 DOI: 10.1002/prot.24181] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Received: 05/23/2012] [Revised: 08/10/2012] [Accepted: 09/04/2012] [Indexed: 01/04/2023]
Abstract
It has been known even since relatively few structures had been solved that longer protein chains often contain multiple domains, which may fold separately and play the role of reusable functional modules found in many contexts. In many structural biology tasks, in particular structure prediction, it is of great use to be able to identify domains within the structure and analyze these regions separately. However, when using sequence data alone this task has proven exceptionally difficult, with relatively little improvement over the naive method of choosing boundaries based on size distributions of observed domains. The recent significant improvement in contact prediction provides a new source of information for domain prediction. We test several methods for using this information including a kernel smoothing-based approach and methods based on building alpha-carbon models and compare performance with a length-based predictor, a homology search method and four published sequence-based predictors: DOMCUT, DomPRO, DLP-SVM, and SCOOBY-DOmain. We show that the kernel-smoothing method is significantly better than the other ab initio predictors when both single-domain and multidomain targets are considered and is not significantly different to the homology-based method. Considering only multidomain targets the kernel-smoothing method outperforms all of the published methods except DLP-SVM. The kernel smoothing method therefore represents a potentially useful improvement to ab initio domain prediction.
Affiliation(s)
- Michael I Sadowski
- MRC National Institute for Medical Research, The Ridgeway, Mill Hill, London, United Kingdom.
|
524
|
Savojardo C, Fariselli P, Martelli PL, Casadio R. Prediction of disulfide connectivity in proteins with machine-learning methods and correlated mutations. BMC Bioinformatics 2013; 14 Suppl 1:S10. [PMID: 23368835 PMCID: PMC3548674 DOI: 10.1186/1471-2105-14-s1-s10] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 11/19/2022] Open
Abstract
Background Recently, information derived from correlated mutations in proteins has regained relevance for predicting protein contacts. This is due to new forms of mutual information analysis that have been proven to be more suitable to highlight direct coupling between pairs of residues in protein structures and to the large number of protein chains that are currently available for statistical validation. It was previously discussed that disulfide bond topology in proteins is also constrained by correlated mutations. Results In this paper we exploit information derived from a corrected mutual information analysis and from the inverse of the covariance matrix to address the problem of the prediction of the topology of disulfide bonds in Eukaryotes. Recently, we have shown that Support Vector Regression (SVR) can improve the prediction for the disulfide connectivity patterns. Here we show that the inclusion of the correlated mutation information increases the SVR performance by 5 percentage points (from 54% to 59%). When this approach is used in combination with a method previously developed by us and scoring at the state of the art in predicting both location and topology of disulfide bonds in Eukaryotes (DisLocate), the per-protein accuracy is 38%, 2 percentage points higher than that previously obtained. Conclusions In this paper we show that the inclusion of information derived from correlated mutations can improve the performance of the state of the art methods for predicting disulfide connectivity patterns in Eukaryotic proteins. Our analysis also provides support to the notion that improving methods to extract evolutionary information from multiple sequence alignments greatly contributes to the scoring performance of predictors suited to detect relevant features from protein chains.
Affiliation(s)
- Castrense Savojardo
- Department of Computer Science and Engineering, University of Bologna, Via Mura Anteo Zamboni 7, 41029 Bologna, Italy
|
525
|
Ekeberg M, Lövkvist C, Lan Y, Weigt M, Aurell E. Improved contact prediction in proteins: using pseudolikelihoods to infer Potts models. Phys Rev E Stat Nonlin Soft Matter Phys 2013; 87:012707. [PMID: 23410359 DOI: 10.1103/physreve.87.012707] [Citation(s) in RCA: 411] [Impact Index Per Article: 34.3] [Received: 10/23/2012] [Indexed: 05/24/2023]
Abstract
Spatially proximate amino acids in a protein tend to coevolve. A protein's three-dimensional (3D) structure hence leaves an echo of correlations in the evolutionary record. Reverse engineering 3D structures from such correlations is an open problem in structural biology, pursued with increasing vigor as more and more protein sequences continue to fill the data banks. Within this task lies a statistical inference problem, rooted in the following: correlation between two sites in a protein sequence can arise from firsthand interaction but can also be network-propagated via intermediate sites; observed correlation is not enough to guarantee proximity. To separate direct from indirect interactions is an instance of the general problem of inverse statistical mechanics, where the task is to learn model parameters (fields, couplings) from observables (magnetizations, correlations, samples) in large systems. In the context of protein sequences, the approach has been referred to as direct-coupling analysis. Here we show that the pseudolikelihood method, applied to 21-state Potts models describing the statistical properties of families of evolutionarily related proteins, significantly outperforms existing approaches to the direct-coupling analysis, the latter being based on standard mean-field techniques. This improved performance also relies on a modified score for the coupling strength. The results are verified using known crystal structures of specific sequence instances of various protein families. Code implementing the new method can be found at http://plmdca.csc.kth.se/.
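To make the pseudolikelihood idea concrete, the sketch below evaluates the negative log-pseudolikelihood of a single sequence under a Potts model with fields h and couplings J: each site is scored by its conditional probability given all other sites, so no global partition function is needed. This is an illustrative sketch, not the plmDCA code; it leaves out regularization, sequence reweighting, the optimization itself, and the modified coupling score mentioned in the abstract. The toy example uses q = 2 states instead of the 21 states used for proteins.

```python
import math


def neg_log_pseudolikelihood(seq, h, J, q):
    """Negative log-pseudolikelihood of one sequence under a q-state Potts model.

    seq: list of integer states, one per alignment column (0 <= state < q)
    h:   h[i][a], the field for state a at site i
    J:   J[i][j][a][b], the coupling between state a at site i and b at site j
    """
    total = 0.0
    n = len(seq)
    for i in range(n):
        # Energy of each candidate state at site i with all other sites held fixed.
        energies = [h[i][a] + sum(J[i][j][a][seq[j]] for j in range(n) if j != i)
                    for a in range(q)]
        # Log-sum-exp over the q candidates, shifted by the maximum for stability.
        top = max(energies)
        log_z = top + math.log(sum(math.exp(e - top) for e in energies))
        total -= energies[seq[i]] - log_z
    return total


# Two sites, two states, no fields.
q = 2
h = [[0.0, 0.0], [0.0, 0.0]]
zero = [[0.0, 0.0], [0.0, 0.0]]
agree = [[1.0, 0.0], [0.0, 1.0]]  # rewards equal states at the two sites

J_free = [[zero, zero], [zero, zero]]
J_coupled = [[zero, agree], [agree, zero]]

nll_free = neg_log_pseudolikelihood([0, 0], h, J_free, q)
nll_coupled = neg_log_pseudolikelihood([0, 0], h, J_coupled, q)
```

With the coupling that rewards agreement, the sequence `[0, 0]` becomes more probable, so its negative log-pseudolikelihood drops below the uncoupled value of 2 ln 2.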
Affiliation(s)
- Magnus Ekeberg
- Engineering Physics Program, KTH Royal Institute of Technology, 100 44 Stockholm, Sweden
|
526
|
Abstract
Recent work has led to a substantial improvement in the accuracy of predictions of contacts between amino acids using evolutionary information derived from multiple sequence alignments. Where large numbers of diverse sequence relatives are available and can be aligned to the sequence of a protein of unknown structure, it is now possible to generate high-resolution models without recourse to the structure of a template. In this review, we describe these exciting new techniques and critically assess the state of the art in contact prediction in light of them. We discuss areas for immediate research and development as well as potential future developments.
|
527
|
Affiliation(s)
- Ken A Dill
- Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, NY 11794-5252, USA.
|
528
|
Accurate simulation and detection of coevolution signals in multiple sequence alignments. PLoS One 2012; 7:e47108. [PMID: 23091608 PMCID: PMC3473043 DOI: 10.1371/journal.pone.0047108] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 04/24/2012] [Accepted: 09/10/2012] [Indexed: 11/19/2022] Open
Abstract
BACKGROUND While the conserved positions of a multiple sequence alignment (MSA) are clearly of interest, non-conserved positions can also be important because, for example, destabilizing effects at one position can be compensated by stabilizing effects at another position. Different methods have been developed to recognize the evolutionary relationship between amino acid sites, and to disentangle functional/structural dependencies from historical/phylogenetic ones. METHODOLOGY/PRINCIPAL FINDINGS We have used two complementary approaches to test the efficacy of these methods. In the first approach, we have used a new program, MSAvolve, for the in silico evolution of MSAs, which records a detailed history of all covarying positions, and builds a global coevolution matrix as the accumulated sum of individual matrices for the positions forced to co-vary, the recombinant coevolution, and the stochastic coevolution. We have simulated over 1600 MSAs for 8 protein families, which reflect sequences of different sizes and proteins with widely different functions. The calculated coevolution matrices were compared with the coevolution matrices obtained for the same evolved MSAs with different coevolution detection methods. In a second approach we have evaluated the capacity of the different methods to predict close contacts in the representative X-ray structures of an additional 150 protein families using only experimental MSAs. CONCLUSIONS/SIGNIFICANCE Methods based on the identification of global correlations between pairs were found to be generally superior to methods based only on local correlations in their capacity to identify coevolving residues using either simulated or experimental MSAs. 
However, the significant variability in the performance of different methods with different proteins suggests that the simulation of MSAs that replicate the statistical properties of the experimental MSA can be a valuable tool to identify the coevolution detection method that is most effective in each case.
|
529
|
Eickholt J, Cheng J. Predicting protein residue-residue contacts using deep networks and boosting. Bioinformatics 2012; 28:3066-72. [PMID: 23047561 DOI: 10.1093/bioinformatics/bts598] [Citation(s) in RCA: 104] [Impact Index Per Article: 8.0] [Indexed: 02/04/2023] Open
Abstract
MOTIVATION Protein residue-residue contacts continue to play a larger and larger role in protein tertiary structure modeling and evaluation. Yet, while the importance of contact information increases, the performance of sequence-based contact predictors has improved slowly. New approaches and methods are needed to spur further development and progress in the field. RESULTS Here we present DNCON, a new sequence-based residue-residue contact predictor using deep networks and boosting techniques. Making use of graphical processing units and CUDA parallel computing technology, we are able to train large boosted ensembles of residue-residue contact predictors achieving state-of-the-art performance. AVAILABILITY The web server of the prediction method (DNCON) is available at http://iris.rnet.missouri.edu/dncon/. CONTACT chengji@missouri.edu SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jesse Eickholt
- Department of Computer Science, University of Missouri, Columbia, MO 65211, USA
|
530
|
|
531
|
Kalinina OV, Oberwinkler H, Glass B, Kräusslich HG, Russell RB, Briggs JAG. Computational identification of novel amino-acid interactions in HIV Gag via correlated evolution. PLoS One 2012; 7:e42468. [PMID: 22879995 PMCID: PMC3411748 DOI: 10.1371/journal.pone.0042468] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 04/04/2012] [Accepted: 07/09/2012] [Indexed: 12/31/2022] Open
Abstract
Pairs of amino acid positions that evolve in a correlated manner are proposed to play important roles in protein structure or function. Methods to detect them might fare better with families for which sequences of thousands of closely related homologs are available than families with only a few distant relatives. We applied co-evolution analysis to thousands of sequences of HIV Gag, finding that the most significantly co-evolving positions are proximal in the quaternary structures of the viral capsid. A reduction in infectivity caused by mutating one member of a significant pair could be rescued by a compensatory mutation of the other.
Affiliation(s)
- Olga V. Kalinina
- CellNetworks, Bioquant, University of Heidelberg, Heidelberg, Germany
- Heike Oberwinkler
- Department of Infectious Diseases, Virology, Universitätsklinikum Heidelberg, Heidelberg, Germany
- Bärbel Glass
- Department of Infectious Diseases, Virology, Universitätsklinikum Heidelberg, Heidelberg, Germany
- Hans-Georg Kräusslich
- CellNetworks, Bioquant, University of Heidelberg, Heidelberg, Germany
- Department of Infectious Diseases, Virology, Universitätsklinikum Heidelberg, Heidelberg, Germany
- Robert B. Russell
- CellNetworks, Bioquant, University of Heidelberg, Heidelberg, Germany
- John A. G. Briggs
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
|
532
|
Abstract
MOTIVATION Residue-residue contact prediction is important for protein structure prediction and other applications. However, the accuracy of current contact predictors often barely exceeds 20% on long-range contacts, falling short of the level required for ab initio structure prediction. RESULTS Here, we develop a novel machine learning approach for contact map prediction using three steps of increasing resolution. First, we use 2D recursive neural networks to predict coarse contacts and orientations between secondary structure elements. Second, we use an energy-based method to align secondary structure elements and predict contact probabilities between residues in contacting alpha-helices or strands. Third, we use a deep neural network architecture to organize and progressively refine the prediction of contacts, integrating information over both space and time. We train the architecture on a large set of non-redundant proteins and test it on a large set of non-homologous domains, as well as on the set of protein domains used for contact prediction in the two most recent CASP8 and CASP9 experiments. For long-range contacts, the accuracy of the new CMAPpro predictor is close to 30%, a significant increase over existing approaches. AVAILABILITY CMAPpro is available as part of the SCRATCH suite at http://scratch.proteomics.ics.uci.edu/. CONTACT pfbaldi@uci.edu SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Pietro Di Lena
- Department of Computer Science, University of California, Irvine, CA 92697, USA
|
533
|
Abstract
Co-evolving positions within protein sequences have been used as spatial constraints to develop a computational approach for modeling membrane protein structures.
|
534
|
Shang L, Xu W, Ozer S, Gutell RR. Structural constraints identified with covariation analysis in ribosomal RNA. PLoS One 2012; 7:e39383. [PMID: 22724009 PMCID: PMC3378556 DOI: 10.1371/journal.pone.0039383] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.6] [Received: 12/13/2011] [Accepted: 05/24/2012] [Indexed: 11/19/2022] Open
Abstract
Covariation analysis is used to identify those positions with similar patterns of sequence variation in an alignment of RNA sequences. These constraints on the evolution of two positions are usually associated with a base pair in a helix. While mutual information (MI) has been used to accurately predict an RNA secondary structure and a few of its tertiary interactions, early studies revealed that phylogenetic event counting methods are more sensitive and provide extra confidence in the prediction of base pairs. We developed a novel and powerful phylogenetic events counting method (PEC) for quantifying positional covariation with the Gutell lab’s new RNA Comparative Analysis Database (rCAD). The PEC and MI-based methods each identify unique base pairs, and jointly identify many other base pairs. In total, both methods in combination with an N-best and helix-extension strategy identify the maximal number of base pairs. While covariation methods have effectively and accurately predicted RNA secondary structure, only a few tertiary structure base pairs have been identified. Analysis presented herein and at the Gutell lab’s Comparative RNA Web (CRW) Site reveals that the majority of these latter base pairs do not covary with one another. However, covariation analysis does reveal a weaker although significant covariation between sets of nucleotides that are in proximity in the three-dimensional RNA structure. This reveals that covariation analysis identifies other types of structural constraints beyond the two nucleotides that form a base pair.
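The mutual information (MI) measure referred to in this abstract can be sketched for a single pair of alignment columns as follows. This is a plain textbook MI in bits, offered only as an illustration: it has no gap handling, no sequence weighting, and no phylogenetic correction, and it is not the PEC method or the rCAD implementation. The example columns are invented.

```python
import math
from collections import Counter


def mutual_information(col_x, col_y):
    """Mutual information (in bits) between two alignment columns of equal length."""
    n = len(col_x)
    px = Counter(col_x)                 # single-column counts
    py = Counter(col_y)
    pxy = Counter(zip(col_x, col_y))    # joint counts of co-occurring symbols
    mi = 0.0
    for (x, y), count in pxy.items():
        p_joint = count / n
        # p_joint / (p_x * p_y), written with counts to avoid extra divisions.
        mi += p_joint * math.log2(p_joint * n * n / (px[x] * py[y]))
    return mi


# Two columns that change together like Watson-Crick partners in a helix,
# and one fully conserved column.
col_a = "GGCCAAUU"
col_b = "CCGGUUAA"
conserved = "AAAAAAAA"

mi_paired = mutual_information(col_a, col_b)
mi_conserved = mutual_information(col_a, conserved)
```

The perfectly covarying pair reaches the maximum of 2 bits for four equally frequent symbols, while a conserved column carries no covariation signal at all, which is one reason conserved base pairs are invisible to MI.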
MESH Headings
- Algorithms
- Base Pairing
- Computational Biology/methods
- Nucleic Acid Conformation
- RNA, Bacterial/chemistry
- RNA, Bacterial/genetics
- RNA, Ribosomal/chemistry
- RNA, Ribosomal/genetics
- RNA, Ribosomal, 16S/chemistry
- RNA, Ribosomal, 16S/genetics
- RNA, Ribosomal, 23S/chemistry
- RNA, Ribosomal, 23S/genetics
- RNA, Ribosomal, 5S/chemistry
- RNA, Ribosomal, 5S/genetics
Affiliation(s)
- Lei Shang
- Institute for Cellular and Molecular Biology, Center for Computational Biology and Bioinformatics, The University of Texas at Austin, Austin, Texas, United States of America
- Weijia Xu
- Texas Advanced Computing Center, The University of Texas at Austin, Austin, Texas, United States of America
- Stuart Ozer
- Microsoft Corporation, Redmond, Washington, United States of America
- Robin R. Gutell
- Institute for Cellular and Molecular Biology, Center for Computational Biology and Bioinformatics, The University of Texas at Austin, Austin, Texas, United States of America
|
535
|
Kozma D, Simon I, Tusnády GE. CMWeb: an interactive on-line tool for analysing residue-residue contacts and contact prediction methods. Nucleic Acids Res 2012; 40:W329-33. [PMID: 22669913 PMCID: PMC3394325 DOI: 10.1093/nar/gks488] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.7] [Indexed: 11/18/2022] Open
Abstract
A contact map is a 2D derivative of the 3D structure of proteins, containing various residue–residue (RR) contacts within the structure. Contact maps can be used for the reconstruction of structure with high accuracy and can be predicted from the amino acid sequence. Therefore, understanding the various properties of contact maps is an important step in protein structure prediction. For investigating basic properties of contact formation and contact clusters we set up an integrated system called Contact Map Web Viewer, or CMWeb for short. The server can be used to visualize contact maps, to link contacts and to show them both in 3D structures and in multiple sequence alignments and to calculate various statistics on contacts. Moreover, we have implemented five contact prediction methods in the CMWeb server to visualize the predicted and real RR contacts in one contact map. The results of other RR contact prediction methods can be uploaded as a benchmark test onto the server as well. All of this functionality is behind a web server; thus, only a Java-capable web browser is needed to use our application, and no further program installation is required. The CMWeb is freely accessible at http://cmweb.enzim.hu.
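The notion of a contact map as a 2D derivative of a 3D structure can be sketched as below. The code assumes one representative point per residue (for example a C-alpha or C-beta atom) and an 8 Å distance cutoff; both are common conventions chosen here for illustration and are not stated in the abstract, and the function is not part of CMWeb.

```python
import math


def contact_map(coords, cutoff=8.0, min_separation=1):
    """Boolean residue-residue contact map from one 3D point per residue.

    coords:         list of (x, y, z) tuples in angstroms, one per residue
    cutoff:         two residues are in contact if closer than this distance
    min_separation: smallest sequence separation |i - j| that is considered
    """
    n = len(coords)
    cmap = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) < cutoff:
                # The map is symmetric, so fill both triangles.
                cmap[i][j] = cmap[j][i] = True
    return cmap


# Five pseudo-residues: four on a straight line 3.8 angstroms apart, one off-axis.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0),
          (11.4, 0.0, 0.0), (3.8, 3.8, 0.0)]
cmap = contact_map(coords)
```

Raising `min_separation` restricts the map to medium- or long-range contacts, which are the ones usually scored in contact prediction benchmarks.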
Affiliation(s)
- Dániel Kozma
- Institute of Enzymology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, PO Box 7, H-1518 Budapest, Hungary
|
536
|
Accurate de novo structure prediction of large transmembrane protein domains using fragment-assembly and correlated mutation analysis. Proc Natl Acad Sci U S A 2012; 109:E1540-7. [PMID: 22645369 DOI: 10.1073/pnas.1120036109] [Citation(s) in RCA: 140] [Impact Index Per Article: 10.8] [Indexed: 11/18/2022] Open
Abstract
A new de novo protein structure prediction method for transmembrane proteins (FILM3) is described that is able to accurately predict the structures of large membrane protein domains using an ensemble of two secondary structure prediction methods to guide fragment selection in combination with a scoring function based solely on correlated mutations detected in multiple sequence alignments. This approach has been validated by generating models for 28 membrane proteins with a diverse range of complex topologies and an average length of over 300 residues with results showing that TM-scores > 0.5 can be achieved in almost every case following refinement using MODELLER. In one of the most impressive results, a model of mitochondrial cytochrome c oxidase polypeptide I was obtained with a TM-score > 0.75 and an rmsd of only 5.7 Å over all 514 residues. These results suggest that FILM3 could be applicable to a wide range of transmembrane proteins of as-yet-unknown 3D structure given sufficient homologous sequences.
|
537
|
Hopf TA, Colwell LJ, Sheridan R, Rost B, Sander C, Marks DS. Three-dimensional structures of membrane proteins from genomic sequencing. Cell 2012; 149:1607-21. [PMID: 22579045 DOI: 10.1016/j.cell.2012.04.012] [Citation(s) in RCA: 389] [Impact Index Per Article: 29.9] [Received: 03/09/2012] [Revised: 04/12/2012] [Accepted: 04/23/2012] [Indexed: 01/21/2023]
Abstract
We show that amino acid covariation in proteins, extracted from the evolutionary sequence record, can be used to fold transmembrane proteins. We use this technique to predict previously unknown 3D structures for 11 transmembrane proteins (with up to 14 helices) from their sequences alone. The prediction method (EVfold_membrane) applies a maximum entropy approach to infer evolutionary covariation in pairs of sequence positions within a protein family and then generates all-atom models with the derived pairwise distance constraints. We benchmark the approach with blinded de novo computation of known transmembrane protein structures from 23 families, demonstrating unprecedented accuracy of the method for large transmembrane proteins. We show how the method can predict oligomerization, functional sites, and conformational changes in transmembrane proteins. With the rapid rise in large-scale sequencing, more accurate and more comprehensive information on evolutionary constraints can be decoded from genetic variation, greatly expanding the repertoire of transmembrane proteins amenable to modeling by this method.
Affiliation(s)
- Thomas A Hopf
- Department of Systems Biology, Harvard Medical School, Boston, MA 02115, USA
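As a rough illustration of the covariation signal described in the abstract of this entry, the sketch below scores pairs of alignment columns by mutual information. This is a deliberately simpler, local statistic swapped in for illustration; it is not the maximum entropy model that EVfold_membrane applies, and it does not separate direct from indirect correlations. The toy alignment and all function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def column_mutual_information(msa, i, j):
    """Mutual information (in nats) between alignment columns i and j."""
    n = len(msa)
    pair_counts = Counter((seq[i], seq[j]) for seq in msa)
    counts_i = Counter(seq[i] for seq in msa)
    counts_j = Counter(seq[j] for seq in msa)
    mi = 0.0
    for (a, b), count in pair_counts.items():
        p_ab = count / n
        p_a = counts_i[a] / n
        p_b = counts_j[b] / n
        mi += p_ab * math.log(p_ab / (p_a * p_b))
    return mi


def rank_column_pairs(msa, min_separation=1):
    """Return (score, i, j) tuples for all column pairs, highest score first."""
    length = len(msa[0])
    scored = [
        (column_mutual_information(msa, i, j), i, j)
        for i, j in combinations(range(length), 2)
        if j - i >= min_separation
    ]
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    # Hypothetical toy alignment: columns 0 and 1 always change together.
    toy_msa = ["AKLE", "AKLD", "GRLE", "GRLD"]
    for score, i, j in rank_column_pairs(toy_msa):
        print(f"columns {i}-{j}: MI = {score:.3f}")
```

In a structure prediction setting the top-ranked pairs would be treated as candidate contacts and converted into distance constraints; with real alignments, sequence weighting and a global model matter a great deal, which is the gap the maximum entropy approach is meant to close.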
|
538
|
Abstract
The process of amino acid replacement in proteins is context-dependent, with substitution rates influenced by local structure, functional role, and amino acids at other locations. Predicting how these differences affect replacement processes is difficult. To make such inference easier, it is often assumed that the acceptabilities of different amino acids at a position are constant. However, evolutionary interactions among residue positions will tend to invalidate this assumption. Here, we use simulations of purple acid phosphatase evolution to show that amino acid propensities at a position undergo predictable change after an amino acid replacement at that position. After a replacement, the new amino acid and similar amino acids tend to become gradually more acceptable over time at that position. In other words, proteins tend to equilibrate to the presence of an amino acid at a position through replacements at other positions. Such a shift is reminiscent of the spectroscopy effect known as the Stokes shift, where molecules receiving a quantum of energy and moving to a higher electronic state will adjust to the new state and emit a smaller quantum of energy whenever they shift back down to the original ground state. Predictions of changes in stability in real proteins show that mutation reversals become less favorable over time, and thus, broadly support our results. The observation of an evolutionary Stokes shift has profound implications for the study of protein evolution and the modeling of evolutionary processes.
|
539
|
Taylor WR, Jones DT, Sadowski MI. Protein topology from predicted residue contacts. Protein Sci 2011; 21:299-305. [PMID: 22102360 DOI: 10.1002/pro.2002] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.9] [Received: 10/09/2011] [Revised: 11/08/2011] [Accepted: 11/10/2011] [Indexed: 11/12/2022]
Abstract
Residue contacts predicted from correlated positions in a multiple sequence alignment are often sparse and uncertain. To some extent, these limitations in the data can be overcome by grouping the contacts by secondary structure elements and enumerating the possible packing arrangements of these elements in a combinatorial manner. Strong interactions appear frequently, but inconsistent interactions are down-weighted and missing interactions up-weighted. The resulting improved consistency in the predicted interactions has allowed the method to be successfully applied to proteins up to 200 residues in length, which is larger than any structure previously predicted using sequence data alone.
Affiliation(s)
- William R Taylor
- Division of Mathematical Biology, MRC National Institute for Medical Research, The Ridgeway, Mill Hill, London, United Kingdom.
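The first step named in the abstract of this entry, grouping predicted contacts by secondary structure elements, might look roughly like the sketch below. It only pools residue-level contact scores into element-pair totals; the combinatorial enumeration of packing arrangements and the re-weighting of inconsistent or missing interactions are not attempted. The segment names, residue ranges, scores, and function names are all hypothetical.

```python
from collections import defaultdict


def residue_to_element(segments):
    """Map each residue number to the name of the element containing it."""
    lookup = {}
    for name, start, end in segments:
        for residue in range(start, end + 1):
            lookup[residue] = name
    return lookup


def group_contacts_by_element(contacts, segments):
    """Sum predicted contact scores for every pair of distinct elements.

    contacts: iterable of (residue_i, residue_j, score).
    segments: iterable of (name, first_residue, last_residue), inclusive.
    Contacts touching a loop residue or lying within one element are skipped.
    """
    lookup = residue_to_element(segments)
    totals = defaultdict(float)
    for i, j, score in contacts:
        a, b = lookup.get(i), lookup.get(j)
        if a is None or b is None or a == b:
            continue
        totals[tuple(sorted((a, b)))] += score
    return dict(totals)


if __name__ == "__main__":
    segments = [("H1", 1, 10), ("H2", 15, 25), ("E1", 30, 35)]
    contacts = [(3, 20, 0.9), (5, 18, 0.7), (8, 32, 0.4), (2, 6, 0.8), (12, 20, 0.5)]
    for pair, total in sorted(group_contacts_by_element(contacts, segments).items()):
        print(pair, round(total, 2))
```

Pooling at the element level is what lets several weak, individually uncertain residue contacts add up to one reasonably confident statement that two helices or strands pack together.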
|