301
Cheung NJ, Yu W. De novo protein structure prediction using ultra-fast molecular dynamics simulation. PLoS One 2018; 13:e0205819. [PMID: 30458007 PMCID: PMC6245515 DOI: 10.1371/journal.pone.0205819] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 06/02/2018] [Accepted: 10/02/2018] [Indexed: 11/19/2022] Open
Abstract
Modern genomic sequencing techniques have provided a massive number of protein sequences, but experimental determination of protein structures lags far behind this vast and unexplored sequence space. Computational biology is therefore playing a more important role in protein structure prediction than ever. Here, we present a de novo predictor, termed NiDelta, that builds on a deep convolutional neural network and a statistical potential enabling molecular dynamics simulation for modeling protein tertiary structure. Combined with evolution-based residue contacts, the presented predictor can predict the tertiary structures of a number of target proteins with remarkable accuracy. The proposed approach is demonstrated by calculations on a set of eighteen large proteins from different fold classes. The results show that the ultra-fast molecular dynamics simulation could dramatically reduce the gap between a sequence and its structure at the atomic level, and that it could also be highly efficient in protein structure determination if sparse experimental data are available.
Affiliation(s)
- Ngaam J. Cheung
  - Department of Brain and Cognitive Science, DGIST, Daegu, South Korea
  - Cavendish Laboratory, Department of Physics, University of Cambridge, Cambridge, United Kingdom
- Wookyung Yu
  - Department of Brain and Cognitive Science, DGIST, Daegu, South Korea
  - Core Protein Resources Center, DGIST, Daegu, South Korea
302
Ding W, Mao W, Shao D, Zhang W, Gong H. DeepConPred2: An Improved Method for the Prediction of Protein Residue Contacts. Comput Struct Biotechnol J 2018; 16:503-510. [PMID: 30505403 PMCID: PMC6247404 DOI: 10.1016/j.csbj.2018.10.009] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 09/14/2018] [Revised: 10/16/2018] [Accepted: 10/18/2018] [Indexed: 12/18/2022] Open
Abstract
Information on residue-residue contacts is essential for understanding the mechanism of protein folding, and has been successfully applied as special topological restraints to simplify the conformational sampling in de novo protein structure prediction. Prediction of protein residue contacts has experienced amazingly rapid progress recently, with prediction accuracy approaching impressively high levels in the past two years. In this work, we introduce a second version of our residue contact predictor, DeepConPred2, which exhibits substantially improved performance and sufficiently reduced running time after model re-optimization and feature updates. When tested on the CASP12 free modeling targets, our program reaches at least the same level of prediction accuracy as the best contact predictors so far and provides information complementary to other state-of-the-art methods in contact-assisted folding.
Affiliation(s)
- Wenze Ding
  - MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing 100084, China
  - Beijing Innovation Center of Structural Biology, Tsinghua University, Beijing 100084, China
- Wenzhi Mao
  - MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing 100084, China
  - Beijing Innovation Center of Structural Biology, Tsinghua University, Beijing 100084, China
- Di Shao
  - MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing 100084, China
  - Beijing Innovation Center of Structural Biology, Tsinghua University, Beijing 100084, China
- Wenxuan Zhang
  - MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing 100084, China
  - Beijing Innovation Center of Structural Biology, Tsinghua University, Beijing 100084, China
- Haipeng Gong
  - MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing 100084, China
  - Beijing Innovation Center of Structural Biology, Tsinghua University, Beijing 100084, China
303
Vorberg S, Seemayer S, Söding J. Synthetic protein alignments by CCMgen quantify noise in residue-residue contact prediction. PLoS Comput Biol 2018; 14:e1006526. [PMID: 30395601 PMCID: PMC6237422 DOI: 10.1371/journal.pcbi.1006526] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 06/27/2018] [Revised: 11/15/2018] [Accepted: 09/24/2018] [Indexed: 12/01/2022] Open
Abstract
Compensatory mutations between protein residues in physical contact can manifest themselves as statistical couplings between the corresponding columns in a multiple sequence alignment (MSA) of the protein family. Conversely, large coupling coefficients predict residue contacts. Methods for de-novo protein structure prediction based on this approach are becoming increasingly reliable. Their main limitation is the strong systematic and statistical noise in the estimation of coupling coefficients, which has so far limited their application to very large protein families. While most research has focused on improving predictions by adding external information, little progress has been made to improve the statistical procedure at the core, because our lack of understanding of the sources of noise poses a major obstacle. First, we show theoretically that the expectation value of the coupling score assuming no coupling is proportional to the product of the square roots of the column entropies, and we propose a simple entropy bias correction (EntC) that subtracts out this expectation value. Second, we show that the average product correction (APC) includes the correction of the entropy bias, partly explaining its success. Third, we have developed CCMgen, the first method for simulating protein evolution and generating realistic synthetic MSAs with pairwise statistical residue couplings. Fourth, to learn exact statistical models that reliably reproduce observed alignment statistics, we developed CCMpredPy, an implementation of the persistent contrastive divergence (PCD) method for exact inference. Fifth, we demonstrate how CCMgen and CCMpredPy can facilitate the development of contact prediction methods by analysing the systematic noise contributions from phylogeny and entropy. Using the entropy bias correction, we can disentangle both sources of noise and find that entropy contributes roughly twice as much noise as phylogeny. 
Knowledge about the three-dimensional structure of proteins is key to understanding their function and role in biological processes and diseases. The experimental structure determination techniques, such as X-ray crystallography or electron cryo-microscopy, are labour intensive, time-consuming and expensive. Therefore, complementary computational methods to predict a protein’s structure have become indispensable. Over the last years, immense progress has been made in predicting protein structures from their amino acid sequence by utilizing highly accurate predictions of spatial contacts between amino acid residues as constraints in folding simulations. However, contact prediction methods require large numbers of homologous protein sequences in order to discriminate between signal and noise. A major obstacle preventing progress on the statistical methodology is our limited understanding of the different components of noise that are known to affect the predictions. We provide two tools, CCMpredPy and CCMgen, that can be used to learn highly accurate statistical models for contact prediction and to simulate protein evolution according to the statistical constraints between positions of residues as specified by these models, respectively. We showcase their usefulness by quantifying the relative contribution of noise arising from entropy and phylogeny on the predicted contacts, which will facilitate the improvement of the statistical methodology.
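The two corrections named in this abstract, the average product correction (APC) and the entropy bias correction, can be sketched in a few lines. This is only an illustration under stated assumptions, not the CCMpredPy implementation: the function names are hypothetical, APC means are taken over the full score matrix (diagonal included), entropies ignore gaps and pseudocounts, and the proportionality constant `alpha` is a free parameter whose estimation the abstract does not describe.

```python
import math


def average_product_correction(scores):
    """Subtract row_mean[i] * row_mean[j] / overall_mean from a symmetric
    L x L matrix of raw coupling scores (assumes a non-zero overall mean)."""
    n = len(scores)
    row_mean = [sum(row) / n for row in scores]
    overall_mean = sum(row_mean) / n
    return [
        [scores[i][j] - row_mean[i] * row_mean[j] / overall_mean for j in range(n)]
        for i in range(n)
    ]


def column_entropy(column):
    """Shannon entropy (in nats) of one alignment column given as a string."""
    counts = {}
    for residue in column:
        counts[residue] = counts.get(residue, 0) + 1
    n = len(column)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def entropy_bias_correction(scores, entropies, alpha):
    """Subtract alpha * sqrt(H_i) * sqrt(H_j) from each score.

    alpha is a hypothetical scaling factor supplied by the caller.
    """
    n = len(scores)
    return [
        [
            scores[i][j] - alpha * math.sqrt(entropies[i]) * math.sqrt(entropies[j])
            for j in range(n)
        ]
        for i in range(n)
    ]


# A rank-one score matrix carries only per-column background signal,
# so APC should remove it entirely.
a = [1.0, 2.0, 3.0]
raw = [[x * y for y in a] for x in a]
apc_corrected = average_product_correction(raw)
```

In this toy case every entry of `apc_corrected` is zero up to rounding, which shows why APC removes any score component that factorizes into per-column terms, including a term of the form sqrt(H_i) * sqrt(H_j).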
Affiliation(s)
- Susann Vorberg
  - Quantitative and Computational Biology Group, Max-Planck Institute for Biophysical Chemistry, Göttingen, Germany
- Stefan Seemayer
  - Quantitative and Computational Biology Group, Max-Planck Institute for Biophysical Chemistry, Göttingen, Germany
- Johannes Söding
  - Quantitative and Computational Biology Group, Max-Planck Institute for Biophysical Chemistry, Göttingen, Germany
304
Co-Evolution of Intrinsically Disordered Proteins with Folded Partners Witnessed by Evolutionary Couplings. Int J Mol Sci 2018; 19:ijms19113315. [PMID: 30366362 PMCID: PMC6274761 DOI: 10.3390/ijms19113315] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 09/30/2018] [Revised: 10/19/2018] [Accepted: 10/22/2018] [Indexed: 12/22/2022] Open
Abstract
Although improved strategies for the detection and analysis of evolutionary couplings (ECs) between protein residues already enable the prediction of protein structures and interactions, they are mostly restricted to conserved and well-folded proteins. Whereas intrinsically disordered proteins (IDPs) are central to cellular interaction networks, due to the lack of strict structural constraints, they undergo faster evolutionary changes than folded domains. This makes the reliable identification and alignment of IDP homologs difficult, which has led to IDPs being omitted from most large-scale residue co-variation analyses. By performing a dedicated analysis of phylogenetically widespread bacterial IDP–partner interactions, here we demonstrate that partner binding imposes constraints on IDP sequences that manifest in detectable interprotein ECs. These ECs were not detected for interactions mediated by short motifs, but rather for those with larger IDP–partner interfaces. Most identified coupled residue pairs reside close (<10 Å) to each other on the interface, with a third of them forming multiple direct atomic contacts. EC-carrying interfaces of IDPs are enriched in negatively charged residues, and the EC residues of both IDPs and partners preferentially reside in helices. Our analysis brings hope that IDP–partner interactions that are difficult to study could soon be successfully dissected through residue co-variation analysis.
305
Cirri E, Brier S, Assal R, Canul-Tec JC, Chamot-Rooke J, Reyes N. Consensus designs and thermal stability determinants of a human glutamate transporter. eLife 2018; 7:40110. [PMID: 30334738 PMCID: PMC6209432 DOI: 10.7554/elife.40110] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Received: 07/14/2018] [Accepted: 10/17/2018] [Indexed: 11/25/2022] Open
Abstract
Human excitatory amino acid transporters (EAATs) take up the neurotransmitter glutamate in the brain and are essential to maintain excitatory neurotransmission. Our understanding of the EAATs’ molecular mechanisms has been hampered by the lack of stability of purified protein samples for biophysical analyses. Here, we present approaches based on consensus mutagenesis to obtain thermostable EAAT1 variants that share up to ~95% amino acid identity with the wild type transporters, and remain natively folded and functional. Structural analyses of EAAT1 and the consensus designs using hydrogen-deuterium exchange linked to mass spectrometry show that small and highly cooperative unfolding events at the inter-subunit interface rate-limit their thermal denaturation, while the transport domain unfolds at a later stage in the unfolding pathway. Our findings provide structural insights into the kinetic stability of human glutamate transporters, and introduce general approaches to extend the lifetime of human membrane proteins for biophysical analyses.
Affiliation(s)
- Erica Cirri
  - Molecular Mechanisms of Membrane Transport Laboratory, Institut Pasteur, Paris, France
  - UMR 3528, CNRS, Institut Pasteur, Paris, France
- Sébastien Brier
  - Mass Spectrometry for Biology Unit, Institut Pasteur, Paris, France
  - USR 2000, CNRS, Institut Pasteur, Paris, France
- Reda Assal
  - Molecular Mechanisms of Membrane Transport Laboratory, Institut Pasteur, Paris, France
  - UMR 3528, CNRS, Institut Pasteur, Paris, France
- Juan Carlos Canul-Tec
  - Molecular Mechanisms of Membrane Transport Laboratory, Institut Pasteur, Paris, France
  - UMR 3528, CNRS, Institut Pasteur, Paris, France
- Julia Chamot-Rooke
  - Mass Spectrometry for Biology Unit, Institut Pasteur, Paris, France
  - USR 2000, CNRS, Institut Pasteur, Paris, France
- Nicolas Reyes
  - Molecular Mechanisms of Membrane Transport Laboratory, Institut Pasteur, Paris, France
  - UMR 3528, CNRS, Institut Pasteur, Paris, France
306
Rouse SL, Matthews SJ, Dueholm MS. Ecology and Biogenesis of Functional Amyloids in Pseudomonas. J Mol Biol 2018; 430:3685-3695. [PMID: 29753779 PMCID: PMC6173800 DOI: 10.1016/j.jmb.2018.05.004] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.7] [Received: 03/09/2018] [Revised: 05/03/2018] [Accepted: 05/04/2018] [Indexed: 12/02/2022]
Abstract
Functional amyloids can be found in the extracellular matrix produced by many bacteria during biofilm growth. They mediate the initial attachment of bacteria to surfaces and provide stability and functionality to mature biofilms. Efficient amyloid biogenesis requires a highly coordinated system of amyloid subunits, molecular chaperones and transport systems. The functional amyloid of Pseudomonas (Fap) represents such a system. Here, we review the phylogenetic diversification of the Fap system, its potential ecological role and the dedicated machinery required for Fap biogenesis, with a particular focus on the amyloid exporter FapF, the structure of which has been recently resolved. We also present a sequence covariance-based in silico model of the FapC fiber-forming subunit. Finally, we highlight key questions that remain unanswered and that we believe deserve further attention from the scientific community.
Affiliation(s)
- Sarah L Rouse
  - Department of Life Sciences, Imperial College London, South Kensington Campus, London, SW72AZ, UK
- Stephen J Matthews
  - Department of Life Sciences, Imperial College London, South Kensington Campus, London, SW72AZ, UK
- Morten S Dueholm
  - Center for Microbial Communities, Department of Chemistry and Bioscience, Aalborg University, Aalborg, Denmark
307
Hjortness MK, Riccardi L, Hongdusit A, Zwart PH, Sankaran B, De Vivo M, Fox JM. Evolutionarily Conserved Allosteric Communication in Protein Tyrosine Phosphatases. Biochemistry 2018; 57:6443-6451. [DOI: 10.1021/acs.biochem.8b00656] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Indexed: 01/05/2023]
Affiliation(s)
- Michael K. Hjortness
  - Department of Chemical and Biological Engineering, University of Colorado, 3415 Colorado Avenue, Boulder, Colorado 80303, United States
- Laura Riccardi
  - Laboratory of Molecular Modeling and Drug Discovery, Istituto Italiano di Tecnologia, Via Morego 30, 16163 Genova, Italy
- Akarawin Hongdusit
  - Department of Chemical and Biological Engineering, University of Colorado, 3415 Colorado Avenue, Boulder, Colorado 80303, United States
- Peter H. Zwart
  - Molecular Biophysics and Integrated Bioimaging, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
- Banumathi Sankaran
  - Molecular Biophysics and Integrated Bioimaging, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
- Marco De Vivo
  - Laboratory of Molecular Modeling and Drug Discovery, Istituto Italiano di Tecnologia, Via Morego 30, 16163 Genova, Italy
- Jerome M. Fox
  - Department of Chemical and Biological Engineering, University of Colorado, 3415 Colorado Avenue, Boulder, Colorado 80303, United States
308
Jones DT, Kandathil SM. High precision in protein contact prediction using fully convolutional neural networks and minimal sequence features. Bioinformatics 2018; 34:3308-3315. [PMID: 29718112 PMCID: PMC6157083 DOI: 10.1093/bioinformatics/bty341] [Citation(s) in RCA: 118] [Impact Index Per Article: 16.9] [Received: 10/12/2017] [Revised: 03/06/2018] [Accepted: 04/25/2018] [Indexed: 12/22/2022] Open
Abstract
Motivation: In addition to substitution frequency data from protein sequence alignments, many state-of-the-art methods for contact prediction rely on additional sources of information, or features, of protein sequences in order to predict residue-residue contacts, such as solvent accessibility, predicted secondary structure, and scores from other contact prediction methods. It is unclear how much of this information is needed to achieve state-of-the-art results. Here, we show that using deep neural network models, simple alignment statistics contain sufficient information to achieve state-of-the-art precision. Our prediction method, DeepCov, uses fully convolutional neural networks operating on amino-acid pair frequency or covariance data derived directly from sequence alignments, without using global statistical methods such as sparse inverse covariance or pseudolikelihood estimation.
Results: Comparisons against CCMpred and MetaPSICOV2 show that using pairwise covariance data calculated from raw alignments as input allows us to match or exceed the performance of both of these methods. Almost all of the achieved precision is obtained when considering relatively local windows (around 15 residues) around any member of a given residue pairing; larger window sizes have comparable performance. Assessment on a set of shallow sequence alignments (fewer than 160 effective sequences) indicates that the new method is substantially more precise than CCMpred and MetaPSICOV2 in this regime, suggesting that improved precision is attainable on smaller sequence families. Overall, the performance of DeepCov is competitive with the state of the art, and our results demonstrate that global models, which employ features from all parts of the input alignment when predicting individual contacts, are not strictly needed in order to attain precise contact predictions.
Availability and implementation: DeepCov is freely available at https://github.com/psipred/DeepCov.
Supplementary information: Supplementary data are available at Bioinformatics online.
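The "amino-acid pair frequency or covariance data" mentioned in this abstract can be illustrated with a small sketch. It computes, for one pair of alignment columns, the covariance between residue indicators, f_ij(a, b) - f_i(a) f_j(b). This is an assumption-laden illustration rather than the DeepCov code: the function name, the 21-letter alphabet with a gap symbol, and the absence of sequence weighting are choices made for the example.

```python
from collections import Counter
from itertools import product

# Assumed encoding: 20 standard amino acids plus a gap symbol.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"


def pair_covariance(alignment, i, j):
    """Return {(a, b): f_ij(a, b) - f_i(a) * f_j(b)} for columns i and j.

    alignment is a list of equal-length aligned sequences (strings) whose
    characters all belong to ALPHABET. Sequences are weighted equally.
    """
    n = len(alignment)
    f_i = Counter(seq[i] for seq in alignment)
    f_j = Counter(seq[j] for seq in alignment)
    f_ij = Counter((seq[i], seq[j]) for seq in alignment)
    return {
        (a, b): f_ij[(a, b)] / n - (f_i[a] / n) * (f_j[b] / n)
        for a, b in product(ALPHABET, repeat=2)
    }


# Toy alignment in which the two columns co-vary perfectly.
toy_alignment = ["AC", "AC", "GT", "GT"]
cov = pair_covariance(toy_alignment, 0, 1)
```

Each column pair yields 21 x 21 = 441 values under this encoding, so a protein of length L gives an L x L x 441 array that a convolutional network could take as input. In the toy alignment the co-occurring pair (A, C) has positive covariance and the never-observed pair (A, T) has negative covariance.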
Affiliation(s)
- David T Jones
  - Department of Computer Science, University College London, London, UK
  - Biomedical Data Science Laboratory, The Francis Crick Institute, London, UK
- Shaun M Kandathil
  - Department of Computer Science, University College London, London, UK
  - Biomedical Data Science Laboratory, The Francis Crick Institute, London, UK
309
Abstract
Chemical Shift-Rosetta (CS-Rosetta) is an automated method that employs NMR chemical shifts to model protein structures de novo. In this chapter, we introduce the terminology and central concepts of CS-Rosetta. We describe the architecture and functionality of automatic NOESY assignment (AutoNOE) and structure determination protocols (Abrelax and RASREC) within the CS-Rosetta framework. We further demonstrate how CS-Rosetta can discriminate near-native structures against a large conformational search space using restraints obtained from NMR data, and/or sequence and structure homology. We highlight how CS-Rosetta can be combined with alternative automated approaches to (i) model oligomeric systems and (ii) create an NMR-based structure determination pipeline. To show its practical applicability, we emphasize the computational requirements and performance of CS-Rosetta for protein targets of varying molecular weight and complexity. Finally, we discuss the current Python interface, which enables easy execution of protocols for rapid and accurate high-resolution structure determination.
Affiliation(s)
- Santrupti Nerli
  - Department of Chemistry and Biochemistry, University of California Santa Cruz, Santa Cruz, CA, United States
  - Department of Computer Science, University of California Santa Cruz, Santa Cruz, CA, United States
- Nikolaos G Sgourakis
  - Department of Chemistry and Biochemistry, University of California Santa Cruz, Santa Cruz, CA, United States
310
Travers T, Wang KJ, López CA, Gnanakaran S. Sequence- and structure-based computational analyses of Gram-negative tripartite efflux pumps in the context of bacterial membranes. Res Microbiol 2018; 169:414-424. [DOI: 10.1016/j.resmic.2018.01.002] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 10/18/2017] [Revised: 12/28/2017] [Accepted: 01/21/2018] [Indexed: 01/12/2023]
311
Wu H, Cao C, Xia X, Lu Q. Unified Deep Learning Architecture for Modeling Biology Sequence. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2018; 15:1445-1452. [PMID: 28991751 DOI: 10.1109/tcbb.2017.2760832] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 06/07/2023]
Abstract
Prediction of the spatial structure or function of biological macromolecules based on their sequences remains an important challenge in bioinformatics. When modeling biological sequences using traditional sequencing models, long-range interaction, complicated and variable output of labeled structures, and variable length of biological sequences usually lead to different solutions on a case-by-case basis. This study proposed a unified deep learning architecture based on long short-term memory or a gated recurrent unit to capture long-range interactions. The architecture includes an optional reshape operator to adapt to the diversity of the output labels and implements a training algorithm to support the training of sequence models capable of processing variable-length sequences. The merging and pooling operators enhance the ability to capture short-range interactions between basic units of biological sequences. The proposed deep-learning architecture and its training algorithm might be capable of solving currently variable biological sequence-modeling problems under a unified framework. We validated the model on one of the most difficult biological sequence-modeling problems, protein residue interaction prediction. The results indicate that the accuracy of the model in predicting residue interactions exceeded that of popular approaches by 10 percent on multiple widely used benchmarks.
312
Jakubec D, Kratochvíl M, Vymĕtal J, Vondrášek J. Widespread evolutionary crosstalk among protein domains in the context of multi-domain proteins. PLoS One 2018; 13:e0203085. [PMID: 30169546 PMCID: PMC6118372 DOI: 10.1371/journal.pone.0203085] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2018] [Accepted: 08/14/2018] [Indexed: 11/20/2022] Open
Abstract
Domains are distinct units within proteins that typically can fold independently into recognizable three-dimensional structures to facilitate their functions. The structural and functional independence of protein domains is reflected by their apparent modularity in the context of multi-domain proteins. In this work, we examined the coupling of evolution of domain sequences co-occurring within multi-domain proteins to see if it proceeds independently, or in a coordinated manner. We used continuous information theory measures to assess the extent of correlated mutations among domains in multi-domain proteins from organisms across the tree of life. In all multi-domain architectures we examined, domains co-occurring within protein sequences had to some degree undergone concerted evolution. This finding challenges the notion of complete modularity and independence of protein domains, providing new perspective on the evolution of protein sequence and function.
Affiliation(s)
- David Jakubec
  - Department of Bioinformatics, Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, 166 10 Prague 6, Czech Republic
  - Department of Physical and Macromolecular Chemistry, Faculty of Science, Charles University, 128 43 Prague 2, Czech Republic
- Miroslav Kratochvíl
  - Department of Bioinformatics, Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, 166 10 Prague 6, Czech Republic
  - Department of Software Engineering, Faculty of Mathematics and Physics, Charles University, 118 00 Prague 1, Czech Republic
- Jiří Vymĕtal
  - Department of Bioinformatics, Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, 166 10 Prague 6, Czech Republic
- Jiří Vondrášek
  - Department of Bioinformatics, Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, 166 10 Prague 6, Czech Republic
313
Kc DB. Recent advances in sequence-based protein structure prediction. Brief Bioinform 2018; 18:1021-1032. [PMID: 27562963 DOI: 10.1093/bib/bbw070] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 03/24/2016] [Indexed: 11/13/2022] Open
Abstract
The most accurate characterizations of the structure of proteins are provided by structural biology experiments. However, because of the high cost and labor-intensive nature of the structural experiments, the gap between the number of protein sequences and solved structures is widening rapidly. Development of computational methods to accurately model protein structures from sequences is becoming increasingly important to the biological community. In this article, we highlight some important progress in the field of protein structure prediction, especially progress related to free modeling (FM) methods that generate structure models without using homologous templates. We also provide a short synopsis of some of the recent advances in FM approaches as demonstrated in the recent Critical Assessment of Structure Prediction (CASP) competition, as well as recent trends and the outlook for FM approaches in protein structure prediction.
314
Koetle MJ, Lloyd Evans D, Singh V, Snyman SJ, Rutherford RS, Watt MP. Agronomic evaluation and molecular characterisation of the acetolactate synthase gene in imazapyr tolerant sugarcane (Saccharum hybrid) genotypes. Plant Cell Reports 2018; 37:1201-1213. [PMID: 29868986 DOI: 10.1007/s00299-018-2306-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2018] [Accepted: 05/23/2018] [Indexed: 06/08/2023]
Abstract
Mutagenesis had no effect on number of stalks/plot, stalk height, fibre and sucrose content of mutants. Imazapyr tolerance is likely due to a S622N mutation in the acetolactate synthase gene. The herbicidal compound imazapyr is effective against weeds such as Cynodon and Rottboellia species that constrain sugarcane production. This study aimed to compare agronomic characteristics of three imazapyr tolerant mutants (Mut 1, Mut 6 and Mut 7) with the non-mutated N12 control after 18 months of growth, and to sequence the acetolactate synthase (ALS) gene to identify any point mutations conferring imazapyr tolerance. There were no significant differences in the number of stalks/plot, stalk height, fibre and sucrose contents of the mutants compared with the N12 control. However, Mut 1 genotype was more susceptible to the Lepidopteran stalk borer, Eldana saccharina when compared with the non-mutated N12 (11.14 ± 1.37 and 3.89 ± 0.52% internodes bored, respectively), making Mut 1 less desirable for commercial cultivation. Molecular characterisation of the ALS gene revealed non-synonymous mutations in Mut 6. An A to G change at nucleotide position 1857 resulted in a N513D mutation, while a G to A change at nucleotide position 2184 imposed a S622N mutation. Molecular dynamics simulations revealed that the S622N mutation renders an asparagine side chain clash with imazapyr, hence this mutation is effective in conferring imazapyr tolerance.
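The link between the single-nucleotide changes and the amino-acid substitutions reported in this abstract (A to G giving N to D, G to A giving S to N) can be checked with the standard genetic code. The sketch below is illustrative only: the abstract does not state the actual codons in the ALS gene, so `AAC` (Asn) and `AGC` (Ser) are assumed examples, and the function name is hypothetical.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def point_mutation_effect(codon, offset, new_base):
    """Return (original_residue, mutated_residue) after replacing the base
    at position offset (0, 1 or 2) of a DNA codon with new_base."""
    mutated = codon[:offset] + new_base + codon[offset + 1:]
    return CODON_TABLE[codon], CODON_TABLE[mutated]


# Assumed codons: an A->G change at the first position of an Asn codon,
# and a G->A change at the second position of a Ser codon.
asn_to_asp = point_mutation_effect("AAC", 0, "G")
ser_to_asn = point_mutation_effect("AGC", 1, "A")
```

Under these assumed codons, `asn_to_asp` is `("N", "D")` and `ser_to_asn` is `("S", "N")`, consistent with the N513D and S622N substitutions described above.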
Affiliation(s)
- Motselisi J Koetle
  - South African Sugarcane Research Institute, Private Bag X02, Mount Edgecombe, Durban, 4300, South Africa
  - School of Life Sciences, University of KwaZulu-Natal, Private Bag X54001, Durban, South Africa
- Dyfed Lloyd Evans
  - South African Sugarcane Research Institute, Private Bag X02, Mount Edgecombe, Durban, 4300, South Africa
  - School of Life Sciences, University of KwaZulu-Natal, Private Bag X54001, Durban, South Africa
- Varnika Singh
  - South African Sugarcane Research Institute, Private Bag X02, Mount Edgecombe, Durban, 4300, South Africa
  - School of Life Sciences, University of KwaZulu-Natal, Private Bag X54001, Durban, South Africa
- Sandy J Snyman
  - South African Sugarcane Research Institute, Private Bag X02, Mount Edgecombe, Durban, 4300, South Africa
  - School of Life Sciences, University of KwaZulu-Natal, Private Bag X54001, Durban, South Africa
- R Stuart Rutherford
  - South African Sugarcane Research Institute, Private Bag X02, Mount Edgecombe, Durban, 4300, South Africa
  - School of Life Sciences, University of KwaZulu-Natal, Private Bag X54001, Durban, South Africa
- M Paula Watt
  - School of Life Sciences, University of KwaZulu-Natal, Private Bag X54001, Durban, South Africa
315
Kassem MM, Christoffersen LB, Cavalli A, Lindorff-Larsen K. Enhancing coevolution-based contact prediction by imposing structural self-consistency of the contacts. Sci Rep 2018; 8:11112. [PMID: 30042380 PMCID: PMC6057941 DOI: 10.1038/s41598-018-29357-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 03/26/2018] [Accepted: 07/10/2018] [Indexed: 11/29/2022] Open
Abstract
Based on the development of new algorithms and growth of sequence databases, it has recently become possible to build robust higher-order sequence models based on sets of aligned protein sequences. Such models have proven useful in de novo structure prediction, where the sequence models are used to find pairs of residues that co-vary during evolution, and hence are likely to be in spatial proximity in the native protein. The accuracy of these algorithms, however, drops dramatically when the number of sequences in the alignment is small. We have developed a method, termed CE-YAPP (CoEvolution-YAPP), that is based on YAPP (Yet Another Peak Processor), which has been shown to solve a similar problem in NMR spectroscopy. By simultaneously performing structure prediction and contact assignment, CE-YAPP uses structural self-consistency as a filter to remove false positive contacts. Furthermore, CE-YAPP solves another problem, namely how many contacts to choose from the ordered list of covarying amino acid pairs. We show that CE-YAPP consistently improves contact prediction from multiple sequence alignments, in particular for proteins that are difficult targets. We further show that the structures determined from CE-YAPP are also in better agreement with those determined using traditional methods in structural biology.
Affiliation(s)
- Maher M Kassem
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, Copenhagen, DK, 2200, Denmark
| | - Lars B Christoffersen
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, Copenhagen, DK, 2200, Denmark
| | - Andrea Cavalli
- Institute for Research in Biomedicine, Università della Svizzera italiana (USI), Via Vincenzo Vela 6, 6500, Bellinzona, Switzerland.
| | - Kresten Lindorff-Larsen
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, Copenhagen, DK, 2200, Denmark.
| |
|
316
|
de Oliveira SHP, Shi J, Deane CM. Comparing co-evolution methods and their application to template-free protein structure prediction. Bioinformatics 2018; 33:373-381. [PMID: 28171606 PMCID: PMC5860252 DOI: 10.1093/bioinformatics/btw618] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 05/28/2016] [Revised: 09/19/2016] [Accepted: 09/22/2016] [Indexed: 02/01/2023] Open
Abstract
Motivation: Co-evolution methods have been used as contact predictors to identify pairs of residues that share spatial proximity. Such contact predictors have been compared in terms of the precision of their predictions, but there is no study that compares their usefulness to model generation. Results: We compared eight different co-evolution methods for a set of ∼3500 proteins and found that metaPSICOV stage 2 produces, on average, the most precise predictions. Precision of all the methods is dependent on SCOP class, with most methods predicting contacts in all α and membrane proteins poorly. The contact predictions were then used to assist in de novo model generation. We found that it was not the method with the highest average precision, but rather metaPSICOV stage 1 predictions that consistently led to the best models being produced. Our modelling results show a correlation between the proportion of predicted long range contacts that are satisfied on a model and its quality. We used this proportion to effectively classify models as correct/incorrect; discarding decoys classified as incorrect led to an enrichment in the proportion of good decoys in our final ensemble by a factor of seven. For 17 out of the 18 cases where correct answers were generated, the best models were not discarded by this approach. We were also able to identify eight cases where no correct decoy had been generated. Availability and Implementation: Data are available for download from http://opig.stats.ox.ac.uk/resources. Contact: saulo.deoliveira@dtc.ox.ac.uk Supplementary Information: Supplementary data are available at Bioinformatics online.
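The decoy-filtering idea in this abstract, scoring each model by the proportion of predicted long-range contacts it satisfies, can be sketched roughly as follows. This is an illustration only, not the authors' implementation: the 8 Å distance cutoff, the sequence separation of 24 residues used to define "long range", the 0.5 acceptance threshold, and the names `satisfied_fraction` and `keep_decoy` are all assumptions made for the example.

```python
import math

def satisfied_fraction(coords, contacts, cutoff=8.0, min_sep=24):
    """Fraction of predicted long-range contacts that a model satisfies.

    coords   -- list of (x, y, z) tuples, one representative atom per residue
    contacts -- iterable of (i, j) residue index pairs (0-based)
    cutoff   -- distance in angstroms counted as "in contact" (assumed value)
    min_sep  -- minimum sequence separation for "long range" (assumed value)
    """
    long_range = [(i, j) for i, j in contacts if abs(i - j) >= min_sep]
    if not long_range:
        return 0.0
    hits = sum(1 for i, j in long_range
               if math.dist(coords[i], coords[j]) <= cutoff)
    return hits / len(long_range)

def keep_decoy(coords, contacts, threshold=0.5, cutoff=8.0, min_sep=24):
    """Classify a decoy as plausibly correct (hypothetical threshold)."""
    return satisfied_fraction(coords, contacts, cutoff, min_sep) >= threshold

# Made-up toy model: 40 residues on a line, 1 angstrom apart,
# with residue 30 moved next to residue 0.
toy_coords = [(float(i), 0.0, 0.0) for i in range(40)]
toy_coords[30] = (1.0, 3.0, 0.0)
toy_contacts = [(0, 30), (2, 38), (0, 5)]  # (0, 5) is not long range
toy_fraction = satisfied_fraction(toy_coords, toy_contacts)
```

In a real pipeline the coordinates would come from the decoy's Cβ (or Cα) atoms and the threshold would need to be calibrated on models of known quality.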
Affiliation(s)
| | - Jiye Shi
- Department of Informatics, UCB Pharma, Slough SL1 3WE, UK
- Shanghai Institute of Applied Physics, Chinese Academy of Sciences, Shanghai 201800, China
| | | |
|
317
|
Keasar C, McGuffin LJ, Wallner B, Chopra G, Adhikari B, Bhattacharya D, Blake L, Bortot LO, Cao R, Dhanasekaran BK, Dimas I, Faccioli RA, Faraggi E, Ganzynkowicz R, Ghosh S, Ghosh S, Giełdoń A, Golon L, He Y, Heo L, Hou J, Khan M, Khatib F, Khoury GA, Kieslich C, Kim DE, Krupa P, Lee GR, Li H, Li J, Lipska A, Liwo A, Maghrabi AHA, Mirdita M, Mirzaei S, Mozolewska MA, Onel M, Ovchinnikov S, Shah A, Shah U, Sidi T, Sieradzan AK, Ślusarz M, Ślusarz R, Smadbeck J, Tamamis P, Trieber N, Wirecki T, Yin Y, Zhang Y, Bacardit J, Baranowski M, Chapman N, Cooper S, Defelicibus A, Flatten J, Koepnick B, Popović Z, Zaborowski B, Baker D, Cheng J, Czaplewski C, Delbem ACB, Floudas C, Kloczkowski A, Ołdziej S, Levitt M, Scheraga H, Seok C, Söding J, Vishveshwara S, Xu D, Crivelli SN. An analysis and evaluation of the WeFold collaborative for protein structure prediction and its pipelines in CASP11 and CASP12. Sci Rep 2018; 8:9939. [PMID: 29967418 PMCID: PMC6028396 DOI: 10.1038/s41598-018-26812-8] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 11/21/2017] [Accepted: 05/17/2018] [Indexed: 01/14/2023] Open
Abstract
Every two years, groups worldwide participate in the Critical Assessment of Protein Structure Prediction (CASP) experiment to blindly test the strengths and weaknesses of their computational methods. CASP has significantly advanced the field, but many hurdles remain, which may require new ideas and collaborations. In 2012, a web-based effort called WeFold was initiated to promote collaboration within the CASP community and attract researchers from other fields to contribute new ideas to CASP. Members of the WeFold coopetition (cooperation and competition) participated in CASP as individual teams, but also shared components of their methods to create hybrid pipelines and actively contributed to this effort. We assert that the scale and diversity of integrative prediction pipelines could not have been achieved by any individual lab or even by any collaboration among a few partners. The models contributed by the participating groups and generated by the pipelines are publicly available at the WeFold website, providing a wealth of data that remains to be tapped. Here, we analyze the results of the 2014 and 2016 pipelines, showing improvements according to the CASP assessment as well as areas that require further adjustments and research.
Affiliation(s)
- Chen Keasar
- Department of Computer Science, Ben Gurion University of the Negev, Be'er Sheva, Israel
| | - Liam J McGuffin
- Biomedical Sciences Division, School of Biological Sciences, University of Reading, Reading, RG6 6AS, UK
| | - Björn Wallner
- Division of Bioinformatics, Department of Physics, Chemistry, and Biology, Linköping University, Linköping, Sweden
| | - Gaurav Chopra
- Department of Chemistry, College of Science, Purdue University, West Lafayette, IN, USA
- Purdue Institute for Drug Discovery, Purdue University, West Lafayette, IN, USA
- Purdue Center for Cancer Research, Purdue University, West Lafayette, IN, USA
- Purdue Institute for Inflammation, Immunology and Infectious Disease, Purdue University, West Lafayette, IN, USA
- Purdue Institute for Integrative Neuroscience, Purdue University, West Lafayette, IN, USA
| | - Badri Adhikari
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, USA
| | - Debswapna Bhattacharya
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, USA
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, USA
| | - Lauren Blake
- Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - Leandro Oliveira Bortot
- Laboratory of Biological Physics, Faculty of Pharmaceutical Sciences at Ribeirão Preto, University of São Paulo, São Paulo, Brazil
| | - Renzhi Cao
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, USA
| | - B K Dhanasekaran
- Molecular Biophysics Unit and IISC Mathematics Initiative, Indian Institute of Science, Bangalore, India
| | - Itzhel Dimas
- Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | | | - Eshel Faraggi
- Research and Information Systems, LLC, Carmel, IN, USA
- Department of Biochemistry and Molecular Biology, IU School of Medicine, Indianapolis, IN, USA
- Battelle Center for Mathematical Medicine, The Research Institute at Nationwide Children's Hospital, Columbus, OH, USA
| | | | - Sambit Ghosh
- Molecular Biophysics Unit and IISC Mathematics Initiative, Indian Institute of Science, Bangalore, India
| | - Soma Ghosh
- Molecular Biophysics Unit and IISC Mathematics Initiative, Indian Institute of Science, Bangalore, India
| | - Artur Giełdoń
- Faculty of Chemistry, University of Gdansk, Gdańsk, Poland
| | - Lukasz Golon
- Faculty of Chemistry, University of Gdansk, Gdańsk, Poland
| | - Yi He
- School of Engineering, University of California, Merced, CA, USA
| | - Lim Heo
- Department of Chemistry, Seoul National University, Seoul, Republic of Korea
| | - Jie Hou
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, USA
| | - Main Khan
- Department of Computer and Information Science, University of Massachusetts Dartmouth, MA, USA
| | - Firas Khatib
- Department of Computer and Information Science, University of Massachusetts Dartmouth, MA, USA
| | - George A Khoury
- Department of Chemical and Biological Engineering, Princeton University, Princeton, NJ, USA
| | - Chris Kieslich
- Texas A&M Energy Institute, Texas A&M University, College Station, TX, USA
| | - David E Kim
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| | - Pawel Krupa
- Faculty of Chemistry, University of Gdansk, Gdańsk, Poland
| | - Gyu Rie Lee
- Department of Chemistry, Seoul National University, Seoul, Republic of Korea
| | - Hongbo Li
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, USA
- School of Computer Science and Information Technology, Northeast Normal University, Changchun, China
- Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO, USA
| | - Jilong Li
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, USA
| | | | - Adam Liwo
- Faculty of Chemistry, University of Gdansk, Gdańsk, Poland
| | - Ali Hassan A Maghrabi
- Biomedical Sciences Division, School of Biological Sciences, University of Reading, Reading, RG6 6AS, UK
| | - Milot Mirdita
- Max Planck Institute for Biophysical Chemistry, Göttingen, Germany
| | - Shokoufeh Mirzaei
- Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- California State Polytechnic University, Pomona, CA, USA
| | | | - Melis Onel
- Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, TX, USA
| | - Sergey Ovchinnikov
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
| | - Anand Shah
- Department of Computer and Information Science, University of Massachusetts Dartmouth, MA, USA
| | - Utkarsh Shah
- Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, TX, USA
| | - Tomer Sidi
- Department of Computer Science, Ben Gurion University of the Negev, Be'er Sheva, Israel
| | | | | | - Rafal Ślusarz
- Faculty of Chemistry, University of Gdansk, Gdańsk, Poland
| | - James Smadbeck
- Department of Chemical and Biological Engineering, Princeton University, Princeton, NJ, USA
| | - Phanourios Tamamis
- Texas A&M Energy Institute, Texas A&M University, College Station, TX, USA
- Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, TX, USA
| | - Nicholas Trieber
- Department of Computer and Information Science, University of Massachusetts Dartmouth, MA, USA
| | - Tomasz Wirecki
- Faculty of Chemistry, University of Gdansk, Gdańsk, Poland
| | - Yanping Yin
- Baker Laboratory of Chemistry and Chemical Biology, Cornell University, Ithaca, NY, USA
| | - Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
| | - Jaume Bacardit
- Interdisciplinary Computing and Complex BioSystems (ICOS) research group, School of Computing, Newcastle University, Newcastle-upon-Tyne, UK
| | - Maciej Baranowski
- Intercollegiate Faculty of Biotechnology, University of Gdańsk and Medical University of Gdańsk, Gdańsk, Poland
| | - Nicholas Chapman
- Center for Game Science, Department of Computer Science & Engineering, University of Washington, Seattle, WA, USA
| | - Seth Cooper
- College of Computer and Information Science, Northeastern University, Boston, MA, USA
| | - Alexandre Defelicibus
- Institute of Mathematical and Computer Sciences, University of São Paulo, São Paulo, Brazil
| | - Jeff Flatten
- Center for Game Science, Department of Computer Science & Engineering, University of Washington, Seattle, WA, USA
| | - Brian Koepnick
- Department of Biochemistry, University of Washington, Seattle, WA, USA
| | - Zoran Popović
- Center for Game Science, Department of Computer Science & Engineering, University of Washington, Seattle, WA, USA
| | | | - David Baker
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
- Center for Game Science, Department of Computer Science & Engineering, University of Washington, Seattle, WA, USA
| | - Jianlin Cheng
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, USA
| | | | | | | | | | - Stanislaw Ołdziej
- Intercollegiate Faculty of Biotechnology, University of Gdańsk and Medical University of Gdańsk, Gdańsk, Poland
| | - Michael Levitt
- Department of Structural Biology, School of Medicine, Stanford University, Stanford, CA, USA
| | - Harold Scheraga
- Baker Laboratory of Chemistry and Chemical Biology, Cornell University, Ithaca, NY, USA
| | - Chaok Seok
- Department of Chemistry, Seoul National University, Seoul, Republic of Korea
| | - Johannes Söding
- Max Planck Institute for Biophysical Chemistry, Göttingen, Germany
| | - Saraswathi Vishveshwara
- Molecular Biophysics Unit and IISC Mathematics Initiative, Indian Institute of Science, Bangalore, India
| | - Dong Xu
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, USA
- Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO, USA
| | - Silvia N Crivelli
- Lawrence Berkeley National Laboratory, Berkeley, CA, USA.
- Department of Computer Science, University of California, Davis, CA, USA.
| |
|
318
|
Holland J, Pan Q, Grigoryan G. Contact prediction is hardest for the most informative contacts, but improves with the incorporation of contact potentials. PLoS One 2018; 13:e0199585. [PMID: 29953468 PMCID: PMC6023208 DOI: 10.1371/journal.pone.0199585] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 11/21/2017] [Accepted: 06/11/2018] [Indexed: 11/18/2022] Open
Abstract
Co-evolution between pairs of residues in a multiple sequence alignment (MSA) of homologous proteins has long been proposed as an indicator of structural contacts. Recently, several methods, such as direct-coupling analysis (DCA) and MetaPSICOV, have been shown to achieve impressive rates of contact prediction by taking advantage of considerable sequence data. In this paper, we show that prediction success rates are highly sensitive to the structural definition of a contact, with more permissive definitions (i.e., those classifying more pairs as true contacts) naturally leading to higher positive predictive rates, but at the expense of the amount of structural information contributed by each contact. Thus, the remaining limitations of contact prediction algorithms are most noticeable in conjunction with geometrically restrictive contacts—precisely those that contribute more information in structure prediction. We suggest that to improve prediction rates for such “informative” contacts one could combine co-evolution scores with additional indicators of contact likelihood. Specifically, we find that when a pair of co-varying positions in an MSA is occupied by residue pairs with favorable statistical contact energies, that pair is more likely to represent a true contact. We show that combining a contact potential metric with DCA or MetaPSICOV performs considerably better than DCA or MetaPSICOV alone, respectively. This is true regardless of contact definition, but especially true for stricter and more informative contact definitions. In summary, this work outlines some remaining challenges to be addressed in contact prediction and proposes and validates a promising direction towards improvement.
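The abstract's point that more permissive contact definitions yield higher positive predictive rates can be illustrated with a small sketch. It assumes, purely for illustration, that a contact is defined by a single distance cutoff; the paper's actual contact definitions may differ, the distances below are made-up toy values rather than data from the study, and `positive_predictive_value` is a hypothetical helper name.

```python
def positive_predictive_value(predicted, distances, cutoff):
    """Share of predicted pairs whose distance is within the cutoff."""
    if not predicted:
        return 0.0
    true_pos = sum(1 for pair in predicted if distances[pair] <= cutoff)
    return true_pos / len(predicted)

# Made-up inter-residue distances in angstroms for four predicted pairs.
toy_distances = {(3, 40): 5.2, (10, 55): 7.5, (12, 70): 9.8, (20, 90): 14.0}
toy_predicted = list(toy_distances)

# The same predictions scored under increasingly permissive definitions.
ppv_by_cutoff = {
    cutoff: positive_predictive_value(toy_predicted, toy_distances, cutoff)
    for cutoff in (6.0, 8.0, 10.0)
}
```

Because the set of predictions is fixed, widening the cutoff can only add true positives, so the rate never decreases, while each individual contact constrains the structure less tightly.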
Affiliation(s)
- Jack Holland
- Department of Computer Science, Dartmouth College, Hanover, NH 03755, United States of America
| | - Qinxin Pan
- Department of Computer Science, Dartmouth College, Hanover, NH 03755, United States of America
| | - Gevorg Grigoryan
- Department of Computer Science, Dartmouth College, Hanover, NH 03755, United States of America
- Department of Biological Sciences, Dartmouth College, Hanover, NH 03755, United States of America
- * E-mail:
| |
|
319
|
Nerli S, McShan AC, Sgourakis NG. Chemical shift-based methods in NMR structure determination. Prog Nucl Magn Reson Spectrosc 2018; 106-107:1-25. [PMID: 31047599 PMCID: PMC6788782 DOI: 10.1016/j.pnmrs.2018.03.002] [Citation(s) in RCA: 38] [Impact Index Per Article: 5.4] [Received: 01/25/2018] [Revised: 03/09/2018] [Accepted: 03/09/2018] [Indexed: 05/08/2023]
Abstract
Chemical shifts are highly sensitive probes harnessed by NMR spectroscopists and structural biologists as conformational parameters to characterize a range of biological molecules. Traditionally, assignment of chemical shifts has been a labor-intensive process requiring numerous samples and a suite of multidimensional experiments. Over the past two decades, the development of complementary computational approaches has bolstered the analysis, interpretation and utilization of chemical shifts for elucidation of high resolution protein and nucleic acid structures. Here, we review the development and application of chemical shift-based methods for structure determination with a focus on ab initio fragment assembly, comparative modeling, oligomeric systems, and automated assignment methods. Throughout our discussion, we point out practical uses, as well as advantages and caveats, of using chemical shifts in structure modeling. We additionally highlight (i) hybrid methods that employ chemical shifts with other types of NMR restraints (residual dipolar couplings, paramagnetic relaxation enhancements and pseudocontact shifts) that allow for improved accuracy and resolution of generated 3D structures, (ii) the utilization of chemical shifts to model the structures of sparsely populated excited states, and (iii) modeling of sidechain conformations. Finally, we briefly discuss the advantages of contemporary methods that employ sparse NMR data recorded using site-specific isotope labeling schemes for chemical shift-driven structure determination of larger molecules. With this review, we aim to emphasize the accessibility and versatility of chemical shifts for structure determination of challenging biological systems, and to point out emerging areas of development that lead us towards the next generation of tools.
Affiliation(s)
- Santrupti Nerli
- Department of Chemistry and Biochemistry, University of California Santa Cruz, Santa Cruz, CA 95064, United States; Department of Computer Science, University of California Santa Cruz, Santa Cruz, CA 95064, United States
| | - Andrew C McShan
- Department of Chemistry and Biochemistry, University of California Santa Cruz, Santa Cruz, CA 95064, United States
| | - Nikolaos G Sgourakis
- Department of Chemistry and Biochemistry, University of California Santa Cruz, Santa Cruz, CA 95064, United States.
| |
|
320
|
Szurmant H, Weigt M. Inter-residue, inter-protein and inter-family coevolution: bridging the scales. Curr Opin Struct Biol 2018; 50:26-32. [PMID: 29101847 PMCID: PMC5940578 DOI: 10.1016/j.sbi.2017.10.014] [Citation(s) in RCA: 56] [Impact Index Per Article: 8.0] [Received: 09/05/2017] [Revised: 10/12/2017] [Accepted: 10/13/2017] [Indexed: 10/18/2022]
Abstract
Interacting proteins coevolve at multiple but interconnected scales, from the residue-residue over the protein-protein up to the family-family level. The recent accumulation of enormous amounts of sequence data allows for the development of novel, data-driven computational approaches. Notably, these approaches can bridge scales within a single statistical framework. Although currently applied mostly to isolated problems on single scales, their immense potential for an evolutionarily informed, structural systems biology is steadily emerging.
Affiliation(s)
- Hendrik Szurmant
- Department of Basic Medical Sciences, College of Osteopathic Medicine of the Pacific, Western University of Health Sciences, Pomona, CA 91766, USA.
| | - Martin Weigt
- Sorbonne Universités, UPMC Université Paris 06, CNRS, Biologie Computationnelle et Quantitative - Institut de Biologie Paris Seine, 75005 Paris, France.
| |
|
321
|
Puranen S, Pesonen M, Pensar J, Xu YY, Lees JA, Bentley SD, Croucher NJ, Corander J. SuperDCA for genome-wide epistasis analysis. Microb Genom 2018; 4. [PMID: 29813016 PMCID: PMC6096938 DOI: 10.1099/mgen.0.000184] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Indexed: 11/18/2022] Open
Abstract
The potential for genome-wide modelling of epistasis has recently surfaced given the possibility of sequencing densely sampled populations and the emerging families of statistical interaction models. Direct coupling analysis (DCA) has previously been shown to yield valuable predictions for single protein structures, and has recently been extended to genome-wide analysis of bacteria, identifying novel interactions in the co-evolution between resistance, virulence and core genome elements. However, earlier computational DCA methods have not been scalable to enable model fitting simultaneously to 10^4-10^5 polymorphisms, representing the amount of core genomic variation observed in analyses of many bacterial species. Here, we introduce a novel inference method (SuperDCA) that employs a new scoring principle, efficient parallelization, optimization and filtering on phylogenetic information to achieve scalability for up to 10^5 polymorphisms. Using two large population samples of Streptococcus pneumoniae, we demonstrate the ability of SuperDCA to make additional significant biological findings about this major human pathogen. We also show that our method can uncover signals of selection that are not detectable by genome-wide association analysis, even though our analysis does not require phenotypic measurements. SuperDCA, thus, holds considerable potential in building understanding about numerous organisms at a systems biological level.
Affiliation(s)
- Santeri Puranen
- Department of Mathematics and Statistics, Helsinki Institute of Information Technology (HIIT), FI-00014 University of Helsinki, Finland
- Department of Computer Science, Aalto University, FI-00076 Espoo, Finland
| | - Maiju Pesonen
- Department of Computer Science, Aalto University, FI-00076 Espoo, Finland
- Department of Mathematics and Statistics, Helsinki Institute of Information Technology (HIIT), FI-00014 University of Helsinki, Finland
| | - Johan Pensar
- Department of Mathematics and Statistics, Helsinki Institute of Information Technology (HIIT), FI-00014 University of Helsinki, Finland
| | - Ying Ying Xu
- Department of Computer Science, Aalto University, FI-00076 Espoo, Finland
- Department of Mathematics and Statistics, Helsinki Institute of Information Technology (HIIT), FI-00014 University of Helsinki, Finland
| | - John A Lees
- Pathogen Genomics, Wellcome Trust Sanger Institute, Cambridge CB10 1SA, UK
| | - Stephen D Bentley
- Pathogen Genomics, Wellcome Trust Sanger Institute, Cambridge CB10 1SA, UK
| | - Nicholas J Croucher
- Department of Infectious Disease Epidemiology, Imperial College London, London W2 1PG, UK
| | - Jukka Corander
- Department of Biostatistics, University of Oslo, 0317 Oslo, Norway
- Department of Mathematics and Statistics, Helsinki Institute of Information Technology (HIIT), FI-00014 University of Helsinki, Finland
- Pathogen Genomics, Wellcome Trust Sanger Institute, Cambridge CB10 1SA, UK
| |
|
322
|
Zhao J, Krystofiak ES, Ballesteros A, Cui R, Van Itallie CM, Anderson JM, Fenollar-Ferrer C, Kachar B. Multiple claudin-claudin cis interfaces are required for tight junction strand formation and inherent flexibility. Commun Biol 2018; 1:50. [PMID: 30271933 PMCID: PMC6123731 DOI: 10.1038/s42003-018-0051-5] [Citation(s) in RCA: 49] [Impact Index Per Article: 7.0] [Received: 10/23/2017] [Accepted: 04/03/2018] [Indexed: 02/07/2023] Open
Abstract
Tight junctions consist of a network of sealing strands that create selective ion permeability barriers between adjoining epithelial or endothelial cells. The current model for tight junction strands consists of paired rows of claudins (Cldn) coupled by a cis interface (X-1) derived from crystalline Cldn15. Here we show that tight junction strands exhibit a broad range of lateral bending, indicating diversity in cis interactions. By combining protein–protein docking, coevolutionary analysis, molecular dynamics, and a mutagenesis screen, we identify a new Cldn–Cldn cis interface (Cis-1) that shares interacting residues with X-1 but has a ~17° lateral rotation between monomers. In addition, we found that a missense mutation in Cldn14 that causes deafness and contributes more strongly to Cis-1 than to X-1 prevents strand formation in cultured cells. Our results suggest that Cis-1 contributes to the inherent structural flexibility of tight junction strands and is required for maintaining permeability barrier function and hearing. Jun Zhao, Evan S. Krystofiak, and colleagues identified a new cis interface (Cis-1) essential for the formation of normal tight junctions. This study suggests that Cis-1 contributes to maintaining structural flexibility of tight junction strands for proper ion balance and hearing.
Affiliation(s)
- Jun Zhao
- Laboratory of Cell Structure and Dynamics, National Institute on Deafness and Other Communication Disorders, 35A Convent Drive, Bethesda, MD, 20892, USA
- Cancer and Inflammation Program, National Cancer Institute, Frederick, MD, 21702, USA
| | - Evan S Krystofiak
- Laboratory of Cell Structure and Dynamics, National Institute on Deafness and Other Communication Disorders, 35A Convent Drive, Bethesda, MD, 20892, USA
| | - Angela Ballesteros
- Laboratory of Cell Structure and Dynamics, National Institute on Deafness and Other Communication Disorders, 35A Convent Drive, Bethesda, MD, 20892, USA
- Molecular Physiology and Biophysics Section, National Institute of Neurological Disorders and Stroke, Bethesda, MD, 20892, USA
| | - Runjia Cui
- Laboratory of Cell Structure and Dynamics, National Institute on Deafness and Other Communication Disorders, 35A Convent Drive, Bethesda, MD, 20892, USA
| | - Christina M Van Itallie
- Laboratory of Tight Junction Structure and Function, National Heart, Lung, and Blood Institute, 50 South Drive, Bethesda, MD, 20892, USA
| | - James M Anderson
- Laboratory of Tight Junction Structure and Function, National Heart, Lung, and Blood Institute, 50 South Drive, Bethesda, MD, 20892, USA
| | - Cristina Fenollar-Ferrer
- Computational Structural Biology Unit, National Institute of Neurological Disorders and Stroke, 35 Convent Drive, Bethesda, MD, 20892, USA
- Laboratory of Molecular & Cellular Neurobiology, National Institute of Mental Health, 35 Convent Drive, Bethesda, MD, 20892, USA
| | - Bechara Kachar
- Laboratory of Cell Structure and Dynamics, National Institute on Deafness and Other Communication Disorders, 35A Convent Drive, Bethesda, MD, 20892, USA.
| |
|
323
|
Tian P, Louis JM, Baber JL, Aniana A, Best RB. Co-Evolutionary Fitness Landscapes for Sequence Design. Angew Chem Int Ed Engl 2018; 57:5674-5678. [PMID: 29512300 PMCID: PMC6147258 DOI: 10.1002/anie.201713220] [Citation(s) in RCA: 47] [Impact Index Per Article: 6.7] [Received: 12/22/2017] [Indexed: 11/10/2022]
Abstract
Efficient and accurate models to predict the fitness of a sequence would be extremely valuable in protein design. We have explored the use of statistical potentials for the coevolutionary fitness landscape, extracted from known protein sequences, in conjunction with Monte Carlo simulations, as a tool for design. As proof of principle, we created a series of predicted high-fitness sequences for three different protein folds, representative of different structural classes: the GA (all-α) and GB (α/β) binding domains of streptococcal protein G, and an SH3 (all-β) domain. We found that most of the designed proteins can fold stably to the target structure, and a structure for a representative of each for GA, GB and SH3 was determined. Several of our designed proteins were also able to bind to native ligands, in some cases with higher affinity than wild-type. Thus, a search using a statistical fitness landscape is a remarkably effective tool for finding novel stable protein sequences.
Affiliation(s)
- Pengfei Tian
- Laboratory of Chemical Physics, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD 20892-0520 (USA)
| | - John M. Louis
- Laboratory of Chemical Physics, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD 20892-0520 (USA)
| | - James L. Baber
- Laboratory of Chemical Physics, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD 20892-0520 (USA)
| | - Annie Aniana
- Laboratory of Chemical Physics, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD 20892-0520 (USA)
| | - Robert B. Best
- Laboratory of Chemical Physics, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD 20892-0520 (USA)
| |
|
324
|
Tian P, Louis JM, Baber JL, Aniana A, Best RB. Co-Evolutionary Fitness Landscapes for Sequence Design. Angew Chem Int Ed Engl 2018. [DOI: 10.1002/ange.201713220] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/06/2022]
Affiliation(s)
- Pengfei Tian
- Laboratory of Chemical Physics; National Institute of Diabetes and Digestive and Kidney Diseases; National Institutes of Health; Bethesda MD 20892-0520 USA
| | - John M. Louis
- Laboratory of Chemical Physics; National Institute of Diabetes and Digestive and Kidney Diseases; National Institutes of Health; Bethesda MD 20892-0520 USA
| | - James L. Baber
- Laboratory of Chemical Physics; National Institute of Diabetes and Digestive and Kidney Diseases; National Institutes of Health; Bethesda MD 20892-0520 USA
| | - Annie Aniana
- Laboratory of Chemical Physics; National Institute of Diabetes and Digestive and Kidney Diseases; National Institutes of Health; Bethesda MD 20892-0520 USA
| | - Robert B. Best
- Laboratory of Chemical Physics; National Institute of Diabetes and Digestive and Kidney Diseases; National Institutes of Health; Bethesda MD 20892-0520 USA
|
325
|
Mao W, Wang T, Zhang W, Gong H. Identification of residue pairing in interacting β-strands from a predicted residue contact map. BMC Bioinformatics 2018; 19:146. [PMID: 29673311 PMCID: PMC5907701 DOI: 10.1186/s12859-018-2150-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 12/04/2017] [Accepted: 04/09/2018] [Indexed: 12/04/2022] Open
Abstract
Background: Despite the rapid progress of protein residue contact prediction, predicted residue contact maps frequently contain many errors. However, information of residue pairing in β strands could be extracted from a noisy contact map, due to the presence of characteristic contact patterns in β-β interactions. This information may benefit the tertiary structure prediction of mainly β proteins. In this work, we propose a novel ridge-detection-based β-β contact predictor to identify residue pairing in β strands from any predicted residue contact map.
Results: Our algorithm RDb2C adopts ridge detection, a well-developed technique in computer image processing, to capture consecutive residue contacts, and then utilizes a novel multi-stage random forest framework to integrate the ridge information and additional features for prediction. Starting from the predicted contact map of CCMpred, RDb2C remarkably outperforms all state-of-the-art methods on two conventional test sets of β proteins (BetaSheet916 and BetaSheet1452), and achieves F1-scores of ~62% and ~76% at the residue level and strand level, respectively. Taking the prediction of the more advanced RaptorX-Contact as input, RDb2C achieves impressively higher performance, with F1-scores reaching ~76% and ~86% at the residue level and strand level, respectively. In a test of structural modeling using the top 1 L predicted contacts as constraints, for 61 mainly β proteins, the average TM-score achieves 0.442 when using the raw RaptorX-Contact prediction, but increases to 0.506 when using the improved prediction by RDb2C.
Conclusion: Our method can significantly improve the prediction of β-β contacts from any predicted residue contact maps. Prediction results of our algorithm could be directly applied to effectively facilitate the practical structure prediction of mainly β proteins.
Availability: All source data and codes are available at http://166.111.152.91/Downloads.html or the GitHub address of https://github.com/wzmao/RDb2C.
Electronic supplementary material: The online version of this article (10.1186/s12859-018-2150-1) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Wenzhi Mao
- MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing, China; Beijing Advanced Innovation Center for Structural Biology, Tsinghua University, Beijing, China
| | - Tong Wang
- MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing, China; Beijing Advanced Innovation Center for Structural Biology, Tsinghua University, Beijing, China
| | - Wenxuan Zhang
- MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing, China; Beijing Advanced Innovation Center for Structural Biology, Tsinghua University, Beijing, China
| | - Haipeng Gong
- MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing, China; Beijing Advanced Innovation Center for Structural Biology, Tsinghua University, Beijing, China
|
326
|
Gil N, Fiser A. Identifying functionally informative evolutionary sequence profiles. Bioinformatics 2018; 34:1278-1286. [PMID: 29211823 PMCID: PMC5905606 DOI: 10.1093/bioinformatics/btx779] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 06/28/2017] [Accepted: 11/29/2017] [Indexed: 01/06/2023] Open
Abstract
Motivation: Multiple sequence alignments (MSAs) can provide essential input to many bioinformatics applications, including protein structure prediction and functional annotation. However, the optimal selection of sequences to obtain biologically informative MSAs for such purposes is poorly explored, and has traditionally been performed manually.
Results: We present Selection of Alignment by Maximal Mutual Information (SAMMI), an automated, sequence-based approach to objectively select an optimal MSA from a large set of alternatives sampled from a general sequence database search. The hypothesis of this approach is that the mutual information among MSA columns will be maximal for those MSAs that contain the most diverse set possible of the most structurally and functionally homogeneous protein sequences. SAMMI was tested to select MSAs for functional site residue prediction by analysis of conservation patterns on a set of 435 proteins obtained from protein-ligand (peptides, nucleic acids and small substrates) and protein-protein interaction databases.
Availability and implementation: A freely accessible program, including source code, implementing SAMMI is available at https://github.com/nelsongil92/SAMMI.git.
Contact: andras.fiser@einstein.yu.edu.
Supplementary information: Supplementary data are available at Bioinformatics online.
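As a rough illustration of the quantity this abstract refers to, the following Python sketch computes the plain Shannon mutual information between two columns of an alignment and averages it over all column pairs. It is not the SAMMI scoring scheme: the function names, the toy alignments, and the choice of an unweighted mean over column pairs are assumptions made only for this example.

```python
from collections import Counter
from math import log2


def column_mutual_information(msa, i, j):
    """Shannon mutual information (in bits) between columns i and j.

    `msa` is a list of equal-length aligned sequences (hypothetical input).
    """
    n = len(msa)
    count_i = Counter(seq[i] for seq in msa)
    count_j = Counter(seq[j] for seq in msa)
    count_ij = Counter((seq[i], seq[j]) for seq in msa)
    mi = 0.0
    for (a, b), c in count_ij.items():
        p_ab = c / n
        # Ratio of the joint frequency to the product of the marginals.
        mi += p_ab * log2(p_ab / ((count_i[a] / n) * (count_j[b] / n)))
    return mi


def mean_pairwise_mi(msa):
    """Unweighted mean of the mutual information over all column pairs."""
    length = len(msa[0])
    pairs = [(i, j) for i in range(length) for j in range(i + 1, length)]
    return sum(column_mutual_information(msa, i, j) for i, j in pairs) / len(pairs)


# Two toy alignments: perfectly covarying columns vs. independent columns.
covarying = ["AC", "AC", "GT", "GT"]
independent = ["AC", "AT", "GC", "GT"]
print(mean_pairwise_mi(covarying))
print(mean_pairwise_mi(independent))
```

Under this simplified score, candidate alignments could be ranked by `mean_pairwise_mi`; how the published method samples and compares alignments is not described in the abstract and is not modeled here.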
Affiliation(s)
- Nelson Gil
- Department of Systems & Computational Biology, Albert Einstein College of Medicine, Bronx, NY 10461, USA
| | - Andras Fiser
- Department of Systems & Computational Biology, Albert Einstein College of Medicine, Bronx, NY 10461, USA
|
327
|
Michel M, Menéndez Hurtado D, Uziela K, Elofsson A. Large-scale structure prediction by improved contact predictions and model quality assessment. Bioinformatics 2018; 33:i23-i29. [PMID: 28881974 PMCID: PMC5870574 DOI: 10.1093/bioinformatics/btx239] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Indexed: 11/30/2022] Open
Abstract
Motivation: Accurate contact predictions can be used for predicting the structure of proteins. Until recently these methods were limited to very big protein families, decreasing their utility. However, recent progress by combining direct coupling analysis with machine learning methods has made it possible to predict accurate contact maps for smaller families. To what extent these predictions can be used to produce accurate models of the families is not known.
Results: We present the PconsFold2 pipeline that uses contact predictions from PconsC3, the CONFOLD folding algorithm and model quality estimations to predict the structure of a protein. We show that the model quality estimation significantly increases the number of models that reliably can be identified. Finally, we apply PconsFold2 to 6379 Pfam families of unknown structure and find that PconsFold2 can, with an estimated 90% specificity, predict the structure of up to 558 Pfam families of unknown structure. Out of these, 415 have not been reported before.
Availability and Implementation: Datasets as well as models of all the 558 Pfam families are available at http://c3.pcons.net/. All programs used here are freely available.
Affiliation(s)
- Mirco Michel
- Science for Life Laboratory and Department of Biochemistry and Biophysics, Stockholm University, Solna, Sweden
| | - David Menéndez Hurtado
- Science for Life Laboratory and Department of Biochemistry and Biophysics, Stockholm University, Solna, Sweden
| | - Karolis Uziela
- Science for Life Laboratory and Department of Biochemistry and Biophysics, Stockholm University, Solna, Sweden
| | - Arne Elofsson
- Science for Life Laboratory and Department of Biochemistry and Biophysics, Stockholm University, Solna, Sweden
|
328
|
Xia Y, Fischer AW, Teixeira P, Weiner B, Meiler J. Integrated Structural Biology for α-Helical Membrane Protein Structure Determination. Structure 2018; 26:657-666.e2. [PMID: 29526436 PMCID: PMC5884713 DOI: 10.1016/j.str.2018.02.006] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 01/25/2017] [Revised: 06/14/2017] [Accepted: 02/05/2018] [Indexed: 01/12/2023]
Abstract
While great progress has been made, only 10% of the nearly 1,000 integral, α-helical, multi-span membrane protein families are represented by at least one experimentally determined structure in the PDB. Previously, we developed the algorithm BCL::MP-Fold, which samples the large conformational space of membrane proteins de novo by assembling predicted secondary structure elements guided by knowledge-based potentials. Here, we present a case study of rhodopsin fold determination by integrating sparse and/or low-resolution restraints from multiple experimental techniques including electron microscopy, electron paramagnetic resonance spectroscopy, and nuclear magnetic resonance spectroscopy. Simultaneous incorporation of orthogonal experimental restraints not only significantly improved the sampling accuracy but also allowed identification of the correct fold, which is demonstrated by a protein size-normalized transmembrane root-mean-square deviation as low as 1.2 Å. The protocol developed in this case study can be used for the determination of unknown membrane protein folds when limited experimental restraints are available.
Affiliation(s)
- Yan Xia
- Department of Chemistry, Vanderbilt University, Stevenson Center, Station B 351822, Room 7330, Nashville, TN 37232, USA; Center for Structural Biology, Vanderbilt University, Nashville, TN 37232, USA
| | - Axel W Fischer
- Department of Chemistry, Vanderbilt University, Stevenson Center, Station B 351822, Room 7330, Nashville, TN 37232, USA; Center for Structural Biology, Vanderbilt University, Nashville, TN 37232, USA
| | - Pedro Teixeira
- Center for Structural Biology, Vanderbilt University, Nashville, TN 37232, USA
| | - Brian Weiner
- Center for Structural Biology, Vanderbilt University, Nashville, TN 37232, USA
| | - Jens Meiler
- Department of Chemistry, Vanderbilt University, Stevenson Center, Station B 351822, Room 7330, Nashville, TN 37232, USA; Center for Structural Biology, Vanderbilt University, Nashville, TN 37232, USA.
|
329
|
Gaalswyk K, Muniyat MI, MacCallum JL. The emerging role of physical modeling in the future of structure determination. Curr Opin Struct Biol 2018; 49:145-153. [DOI: 10.1016/j.sbi.2018.03.005] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 12/02/2017] [Revised: 03/04/2018] [Accepted: 03/05/2018] [Indexed: 10/17/2022]
|
330
|
de Oliveira SHP, Law EC, Shi J, Deane CM. Sequential search leads to faster, more efficient fragment-based de novo protein structure prediction. Bioinformatics 2018; 34:1132-1140. [PMID: 29136098 PMCID: PMC6030820 DOI: 10.1093/bioinformatics/btx722] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 02/17/2017] [Revised: 09/22/2017] [Accepted: 11/04/2017] [Indexed: 01/12/2023] Open
Abstract
Motivation: Most current de novo structure prediction methods randomly sample protein conformations and thus require large amounts of computational resource. Here, we consider a sequential sampling strategy, building on ideas from recent experimental work which shows that many proteins fold cotranslationally.
Results: We have investigated whether a pseudo-greedy search approach, which begins sequentially from one of the termini, can improve the performance and accuracy of de novo protein structure prediction. We observed that our sequential approach converges when fewer than 20 000 decoys have been produced, fewer than commonly expected. Using our software, SAINT2, we also compared the run time and quality of models produced in a sequential fashion against a standard, non-sequential approach. Sequential prediction produces an individual decoy 1.5-2.5 times faster than non-sequential prediction. When considering the quality of the best model, sequential prediction led to a better model being produced for 31 out of 41 soluble protein validation cases and for 18 out of 24 transmembrane protein cases. Correct models (TM-Score > 0.5) were produced for 29 of these cases by the sequential mode and for only 22 by the non-sequential mode. Our comparison reveals that a sequential search strategy can be used to drastically reduce computational time of de novo protein structure prediction and improve accuracy.
Availability and implementation: Data are available for download from: http://opig.stats.ox.ac.uk/resources. SAINT2 is available for download from: https://github.com/sauloho/SAINT2.
Contact: saulo.deoliveira@dtc.ox.ac.uk.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
| | - Eleanor C Law
- Department of Statistics, University of Oxford, Oxford, UK
| | - Jiye Shi
- Department of Informatics, UCB Pharma, Slough, UK
- Division of Physical Biology, Shanghai Institute of Applied Physics, Chinese Academy of Sciences, Shanghai, China
| | | |
|
331
|
Nicoludis JM, Gaudet R. Applications of sequence coevolution in membrane protein biochemistry. Biochim Biophys Acta Biomembr 2018; 1860:895-908. [PMID: 28993150 PMCID: PMC5807202 DOI: 10.1016/j.bbamem.2017.10.004] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Received: 07/19/2017] [Revised: 09/28/2017] [Accepted: 10/02/2017] [Indexed: 12/22/2022]
Abstract
Recently, protein sequence coevolution analysis has matured into a predictive powerhouse for protein structure and function. Direct methods, which use global statistical models of sequence coevolution, have enabled the prediction of membrane and disordered protein structures, protein complex architectures, and the functional effects of mutations in proteins. The field of membrane protein biochemistry and structural biology has embraced these computational techniques, which provide functional and structural information in an otherwise experimentally-challenging field. Here we review recent applications of protein sequence coevolution analysis to membrane protein structure and function and highlight the promising directions and future obstacles in these fields. We provide insights and guidelines for membrane protein biochemists who wish to apply sequence coevolution analysis to a given experimental system.
Affiliation(s)
- John M Nicoludis
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA 02138, United States
| | - Rachelle Gaudet
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA, 02138, United States.
|
332
|
Chakravarty S, Ung AR, Moore B, Shore J, Alshamrani M. A Comprehensive Analysis of Anion-Quadrupole Interactions in Protein Structures. Biochemistry 2018; 57:1852-1867. [PMID: 29482321 PMCID: PMC6051350 DOI: 10.1021/acs.biochem.7b01006] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.6] [Indexed: 12/15/2022]
Abstract
The edgewise interactions of anions with phenylalanine (Phe) aromatic rings in proteins, known as anion-quadrupole interactions, have been well studied. However, the anion-quadrupole interactions of the tyrosine (Tyr) and tryptophan (Trp) rings have been less well studied, probably because these have been considered weaker than interactions of anions hydrogen bonded to Trp/Tyr side chains. Distinguishing such hydrogen bonding interactions, we comprehensively surveyed the edgewise interactions of certain anions (aspartate, glutamate, and phosphate) with Trp, Tyr, and Phe rings in high-resolution, nonredundant protein single chains and interfaces (protein-protein, DNA/RNA-protein, and membrane-protein). Trp/Tyr anion-quadrupole interactions are common, with Trp showing the highest propensity and average interaction energy for this type of interaction. The energy of an anion-quadrupole interaction (-15.0 to 0.0 kcal/mol, based on quantum mechanical calculations) depends not only on the interaction geometry but also on the ring atom. The phosphate anions at DNA/RNA-protein interfaces interact with aromatic residues with energies comparable to that of aspartate/glutamate anion-quadrupole interactions. At DNA-protein interfaces, the frequency of aromatic ring participation in anion-quadrupole interactions is comparable to that of positive charge participation in salt bridges, suggesting an underappreciated role for anion-quadrupole interactions at DNA-protein (or membrane-protein) interfaces. Although less frequent than salt bridges in single-chain proteins, we observed highly conserved anion-quadrupole interactions in the structures of remote homologues, and evolutionary covariance-based residue contact score predictions suggest that conserved anion-quadrupole interacting pairs, like salt bridges, contribute to polypeptide folding, stability, and recognition.
Affiliation(s)
- Suvobrata Chakravarty
- Chemistry & Biochemistry, South Dakota State University, Brookings, SD, USA, 57007
- BioSNTR, Brookings, SD, USA, 57007
| | - Adron R. Ung
- Chemistry & Biochemistry, South Dakota State University, Brookings, SD, USA, 57007
| | - Brian Moore
- University Networking and Research Computing, South Dakota State University, Brookings, SD, USA, 57007
| | - Jay Shore
- Chemistry & Biochemistry, South Dakota State University, Brookings, SD, USA, 57007
| | - Mona Alshamrani
- Chemistry & Biochemistry, South Dakota State University, Brookings, SD, USA, 57007
|
333
|
He B, Mortuza SM, Wang Y, Shen HB, Zhang Y. NeBcon: protein contact map prediction using neural network training coupled with naïve Bayes classifiers. Bioinformatics 2018; 33:2296-2306. [PMID: 28369334 DOI: 10.1093/bioinformatics/btx164] [Citation(s) in RCA: 53] [Impact Index Per Article: 7.6] [Received: 10/14/2016] [Accepted: 03/21/2017] [Indexed: 12/12/2022] Open
Abstract
Motivation: Recent CASP experiments have witnessed exciting progress on folding large-size non-homologous proteins with the assistance of co-evolution based contact predictions. The success is however anecdotal due to the requirement of the contact prediction methods for the high volume of sequence homologs that are not available to most of the non-homologous protein targets. Development of efficient methods that can generate balanced and reliable contact maps for different types of protein targets is essential to enhance the success rate of the ab initio protein structure prediction.
Results: We developed a new pipeline, NeBcon, which uses the naïve Bayes classifier (NBC) theorem to combine eight state-of-the-art contact methods that are built from co-evolution and machine learning approaches. The posterior probabilities of the NBC model are then trained with intrinsic structural features through neural network learning for the final contact map prediction. NeBcon was tested on 98 non-redundant proteins, which improves the accuracy of the best co-evolution based meta-server predictor by 22%; the magnitude of the improvement increases to 45% for the hard targets that lack sequence and structural homologs in the databases. Detailed data analysis showed that the major contribution to the improvement is due to the optimized NBC combination of the complementary information from both co-evolution and machine learning predictions. The neural network training also helps to improve the coupling of the NBC posterior probability and the intrinsic structural features, which were found particularly important for the proteins that do not have a sufficient number of homologous sequences to derive reliable co-evolution profiles.
Availability and Implementation: On-line server and standalone package of the program are available at http://zhanglab.ccmb.med.umich.edu/NeBcon/.
Contact: zhng@umich.edu.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Baoji He
- Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing 100190, China; School of Physical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China; Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
| | - S M Mortuza
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
| | - Yanting Wang
- Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing 100190, China; School of Physical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
| | - Hong-Bin Shen
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA; Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai 200240, China
| | - Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA; Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA
|
334
|
Park H, Ovchinnikov S, Kim DE, DiMaio F, Baker D. Protein homology model refinement by large-scale energy optimization. Proc Natl Acad Sci U S A 2018; 115:3054-3059. [PMID: 29507254 PMCID: PMC5866580 DOI: 10.1073/pnas.1719115115] [Citation(s) in RCA: 50] [Impact Index Per Article: 7.1] [Indexed: 12/20/2022] Open
Abstract
Proteins fold to their lowest free-energy structures, and hence the most straightforward way to increase the accuracy of a partially incorrect protein structure model is to search for the lowest-energy nearby structure. This direct approach has met with little success for two reasons: first, energy function inaccuracies can lead to false energy minima, resulting in model degradation rather than improvement; and second, even with an accurate energy function, the search problem is formidable because the energy only drops considerably in the immediate vicinity of the global minimum, and there are a very large number of degrees of freedom. Here we describe a large-scale energy optimization-based refinement method that incorporates advances in both search and energy function accuracy that can substantially improve the accuracy of low-resolution homology models. The method refined low-resolution homology models into correct folds for 50 of 84 diverse protein families and generated improved models in recent blind structure prediction experiments. Analyses of the basis for these improvements reveal contributions from both the improvements in conformational sampling techniques and the energy function.
Affiliation(s)
- Hahnbeom Park
- Department of Biochemistry, University of Washington, Seattle, WA 98105
- Institute for Protein Design, University of Washington, Seattle, WA 98105
| | - Sergey Ovchinnikov
- Department of Biochemistry, University of Washington, Seattle, WA 98105
- Institute for Protein Design, University of Washington, Seattle, WA 98105
- Molecular and Cellular Biology Program, University of Washington, Seattle, WA 98105
| | - David E Kim
- Institute for Protein Design, University of Washington, Seattle, WA 98105
- Howard Hughes Medical Institute, University of Washington, Seattle, WA 98105
| | - Frank DiMaio
- Department of Biochemistry, University of Washington, Seattle, WA 98105
- Institute for Protein Design, University of Washington, Seattle, WA 98105
| | - David Baker
- Department of Biochemistry, University of Washington, Seattle, WA 98105;
- Institute for Protein Design, University of Washington, Seattle, WA 98105
- Howard Hughes Medical Institute, University of Washington, Seattle, WA 98105
|
335
|
Abstract
Although many putative heme transporters have been discovered, it has been challenging to prove that these proteins are directly involved with heme trafficking in vivo and to identify their heme binding domains. The prokaryotic pathways for cytochrome c biogenesis, Systems I and II, transport heme from inside the cell to outside for stereochemical attachment to cytochrome c, making them excellent models to study heme trafficking. System I is composed of eight integral membrane proteins (CcmA-H) and is proposed to transport heme via CcmC to an external "WWD" domain for presentation to the membrane-tethered heme chaperone, CcmE. Herein, we develop a new cysteine/heme crosslinking approach to trap and map endogenous heme in CcmC (WWD domain) and CcmE (defining "2-vinyl" and "4-vinyl" pockets for heme). Crosslinking occurs when either of the two vinyl groups of heme localizes near a thiol of an engineered cysteine residue. Double crosslinking, whereby both vinyls crosslink to two engineered cysteines, facilitated a more detailed structural mapping of the heme binding sites, including stereospecificity. Using heme crosslinking results, heme ligand identification, and genomic coevolution data, we model the structure of the CcmCDE complex, including the WWD heme binding domain. We conclude that CcmC traffics heme via its WWD domain and propose the structural basis for stereochemical attachment of heme.
|
336
|
Schaarschmidt J, Monastyrskyy B, Kryshtafovych A, Bonvin AM. Assessment of contact predictions in CASP12: Co-evolution and deep learning coming of age. Proteins 2018; 86 Suppl 1:51-66. [PMID: 29071738 PMCID: PMC5820169 DOI: 10.1002/prot.25407] [Citation(s) in RCA: 130] [Impact Index Per Article: 18.6] [Received: 07/14/2017] [Revised: 10/06/2017] [Accepted: 10/24/2017] [Indexed: 12/20/2022]
Abstract
Following up on the encouraging results of residue-residue contact prediction in the CASP11 experiment, we present the analysis of predictions submitted for CASP12. The submissions include predictions of 34 groups for 38 domains classified as free modeling targets which are not accessible to homology-based modeling due to a lack of structural templates. CASP11 saw a rise of coevolution-based methods outperforming other approaches. The improvement of these methods coupled to machine learning and sequence database growth are most likely the main driver for a significant improvement in average precision from 27% in CASP11 to 47% in CASP12. In more than half of the targets, especially those with many homologous sequences accessible, precisions above 90% were achieved with the best predictors reaching a precision of 100% in some cases. We furthermore tested the impact of using these contacts as restraints in ab initio modeling of 14 single-domain free modeling targets using Rosetta. Adding contacts to the Rosetta calculations resulted in improvements of up to 26% in GDT_TS within the top five structures.
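The precision figures quoted in this abstract are of the kind obtained by ranking predicted contacts by score and counting how many of the top-ranked pairs are true contacts. The Python sketch below shows that calculation under stated assumptions: the function name, the dictionary input format, and the toy numbers are hypothetical, and the exact evaluation protocol of the CASP12 assessment (list lengths, sequence-separation ranges, distance cutoff) is not given in the abstract and is not reproduced here.

```python
def top_k_precision(scores, native_contacts, k):
    """Fraction of the k highest-scoring predicted pairs that are true contacts.

    `scores` maps a residue pair (i, j) to a predicted contact score;
    `native_contacts` is a set of residue pairs in contact in the reference
    structure. Both input formats are assumptions made for this sketch.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)[:k]
    if not ranked:
        return 0.0
    hits = sum(1 for pair in ranked if pair in native_contacts)
    return hits / len(ranked)


# Toy data: four predicted pairs, two of which are real contacts.
predicted = {(1, 10): 0.9, (2, 20): 0.8, (3, 30): 0.4, (4, 40): 0.1}
native = {(1, 10), (3, 30)}
print(top_k_precision(predicted, native, k=2))
```

In practice `k` is usually tied to the sequence length (for example a fixed fraction of it), which is why longer targets are scored on longer contact lists.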
Affiliation(s)
- Joerg Schaarschmidt
- Faculty of Science - Chemistry, Computational Structural Biology Group, Bijvoet Center for Biomolecular Research, Utrecht University, Utrecht, The Netherlands
| | | | | | - Alexandre M.J.J. Bonvin
- Faculty of Science - Chemistry, Computational Structural Biology Group, Bijvoet Center for Biomolecular Research, Utrecht University, Utrecht, The Netherlands
|
337
|
Vu PJ, Yao XQ, Momin M, Hamelberg D. Unraveling Allosteric Mechanisms of Enzymatic Catalysis with an Evolutionary Analysis of Residue–Residue Contact Dynamical Changes. ACS Catal 2018. [DOI: 10.1021/acscatal.7b04263] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Indexed: 01/22/2023]
Affiliation(s)
- Phuoc Jake Vu
- Department of Chemistry, Georgia State University, Atlanta, Georgia 30303-2515, United States
| | - Xin-Qiu Yao
- Department of Chemistry, Georgia State University, Atlanta, Georgia 30303-2515, United States
| | - Mohamed Momin
- Department of Chemistry, Georgia State University, Atlanta, Georgia 30303-2515, United States
| | - Donald Hamelberg
- Department of Chemistry, Georgia State University, Atlanta, Georgia 30303-2515, United States
|
338
|
dos Santos RN, Ferrari AJR, de Jesus HCR, Gozzo FC, Morcos F, Martínez L. Enhancing protein fold determination by exploring the complementary information of chemical cross-linking and coevolutionary signals. Bioinformatics 2018; 34:2201-2208. [DOI: 10.1093/bioinformatics/bty074] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 09/19/2017] [Accepted: 02/10/2018] [Indexed: 11/13/2022] Open
Affiliation(s)
- Ricardo N dos Santos
- Institute of Chemistry, University of Campinas, Campinas, Brazil
- Center for Computational Engineering and Sciences, University of Campinas, Campinas, Brazil
| | | | | | - Fábio C Gozzo
- Institute of Chemistry, University of Campinas, Campinas, Brazil
| | - Faruck Morcos
- Department of Biological Sciences, University of Texas at Dallas, Richardson, USA
| | - Leandro Martínez
- Institute of Chemistry, University of Campinas, Campinas, Brazil
- Center for Computational Engineering and Sciences, University of Campinas, Campinas, Brazil
|
339
|
Barrat-Charlaix P, Weigt M. [From sequence variability to structural and functional prediction: modeling of homologous protein families]. Biol Aujourdhui 2018; 211:239-244. [PMID: 29412135 DOI: 10.1051/jbio/2017030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/20/2017] [Indexed: 06/08/2023]
Abstract
Thanks to next-generation sequencing, the number of sequenced genomes grows rapidly, providing in particular ample examples for the sequence variability between homologous proteins. This article discusses data-driven probabilistic sequence models, which are able to extract a multitude of information from sequence data alone, including (i) structural features like residue-residue contacts, which are formed in the folded protein, (ii) protein-protein interaction interfaces and (iii) phenotypic effects of amino-acid substitutions in proteins.
Affiliation(s)
- Pierre Barrat-Charlaix
- Sorbonne Universités, UPMC Université Paris 06, CNRS, Biologie Computationnelle et Quantitative, Institut de Biologie Paris Seine, 75005 Paris, France
| | - Martin Weigt
- Sorbonne Universités, UPMC Université Paris 06, CNRS, Biologie Computationnelle et Quantitative, Institut de Biologie Paris Seine, 75005 Paris, France
|
340
|
Condon SGF, Mahbuba DA, Armstrong CR, Diaz-Vazquez G, Craven SJ, LaPointe LM, Khadria AS, Chadda R, Crooks JA, Rangarajan N, Weibel DB, Hoskins AA, Robertson JL, Cui Q, Senes A. The FtsLB subcomplex of the bacterial divisome is a tetramer with an uninterrupted FtsL helix linking the transmembrane and periplasmic regions. J Biol Chem 2018; 293:1623-1641. [PMID: 29233891 PMCID: PMC5798294 DOI: 10.1074/jbc.ra117.000426] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Received: 10/15/2017] [Revised: 12/04/2017] [Indexed: 11/06/2022] Open
Abstract
In Escherichia coli, FtsLB plays a central role in the initiation of cell division, possibly transducing a signal that will eventually lead to the activation of peptidoglycan remodeling at the forming septum. The molecular mechanisms by which FtsLB operates in the divisome, however, are not understood. Here, we present a structural analysis of the FtsLB complex, performed with biophysical, computational, and in vivo methods, that establishes the organization of the transmembrane region and proximal coiled coil of the complex. FRET analysis in vitro is consistent with formation of a tetramer composed of two FtsL and two FtsB subunits. We predicted subunit contacts through co-evolutionary analysis and used them to compute a structural model of the complex. The transmembrane region of FtsLB is stabilized by hydrophobic packing and by a complex network of hydrogen bonds. The coiled coil domain probably terminates near the critical constriction control domain, which might correspond to a structural transition. The presence of strongly polar amino acids within the core of the tetrameric coiled coil suggests that the coil may split into two independent FtsQ-binding domains. The helix of FtsB is interrupted between the transmembrane and coiled coil regions by a flexible Gly-rich linker. Conversely, the data suggest that FtsL forms an uninterrupted helix across the two regions and that the integrity of this helix is indispensable for the function of the complex. The FtsL helix is thus a candidate for acting as a potential mechanical connection to communicate conformational changes between periplasmic, membrane, and cytoplasmic regions.
Affiliation(s)
- Samson G F Condon
- From the Department of Biochemistry
- the Integrated Program in Biochemistry
| | - Deena-Al Mahbuba
- From the Department of Biochemistry
- the Integrated Program in Biochemistry
| | | | | | - Samuel J Craven
- From the Department of Biochemistry
- the Integrated Program in Biochemistry
| | - Loren M LaPointe
- From the Department of Biochemistry
- the Integrated Program in Biochemistry
| | - Ambalika S Khadria
- From the Department of Biochemistry
- the Integrated Program in Biochemistry
| | - Rahul Chadda
- the Department of Molecular Physiology and Biophysics, University of Iowa Carver College of Medicine, Iowa City, Iowa 52242
| | - John A Crooks
- From the Department of Biochemistry
- the Integrated Program in Biochemistry
| | | | | | | | - Janice L Robertson
- the Department of Molecular Physiology and Biophysics, University of Iowa Carver College of Medicine, Iowa City, Iowa 52242
| | - Qiang Cui
- the Department of Chemistry, University of Wisconsin, Madison, Wisconsin 53706
| | | |
|
341
|
Li B, Fooksa M, Heinze S, Meiler J. Finding the needle in the haystack: towards solving the protein-folding problem computationally. Crit Rev Biochem Mol Biol 2018; 53:1-28. [PMID: 28976219 PMCID: PMC6790072 DOI: 10.1080/10409238.2017.1380596] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 05/16/2017] [Revised: 08/22/2017] [Accepted: 09/13/2017] [Indexed: 12/22/2022]
Abstract
Prediction of protein tertiary structures from amino acid sequence and understanding the mechanisms of how proteins fold, collectively known as "the protein folding problem," has been a grand challenge in molecular biology for over half a century. Theories have been developed that provide us with an unprecedented understanding of protein folding mechanisms. However, computational simulation of protein folding is still difficult, and prediction of protein tertiary structure from amino acid sequence is an unsolved problem. Progress toward a satisfying solution has been slow due to challenges in sampling the vast conformational space and deriving sufficiently accurate energy functions. Nevertheless, several techniques and algorithms have been adopted to overcome these challenges, and the last two decades have seen exciting advances in enhanced sampling algorithms, computational power and tertiary structure prediction methodologies. This review aims at summarizing these computational techniques, specifically conformational sampling algorithms and energy approximations that have been frequently used to study protein-folding mechanisms or to de novo predict protein tertiary structures. We hope that this review can serve as an overview on how the protein-folding problem can be studied computationally and, in cases where experimental approaches are prohibitive, help the researcher choose the most relevant computational approach for the problem at hand. We conclude with a summary of current challenges faced and an outlook on potential future directions.
Affiliation(s)
- Bian Li
- Department of Chemistry, Vanderbilt University, Nashville, TN, USA
- Center for Structural Biology, Vanderbilt University, Nashville, TN, USA
| | - Michaela Fooksa
- Center for Structural Biology, Vanderbilt University, Nashville, TN, USA
- Chemical and Physical Biology Graduate Program, Vanderbilt University, Nashville, TN, USA
| | - Sten Heinze
- Department of Chemistry, Vanderbilt University, Nashville, TN, USA
- Center for Structural Biology, Vanderbilt University, Nashville, TN, USA
| | - Jens Meiler
- Department of Chemistry, Vanderbilt University, Nashville, TN, USA
- Center for Structural Biology, Vanderbilt University, Nashville, TN, USA
| |
|
342
|
Liu Y, Palmedo P, Ye Q, Berger B, Peng J. Enhancing Evolutionary Couplings with Deep Convolutional Neural Networks. Cell Syst 2018; 6:65-74.e3. [PMID: 29275173 PMCID: PMC5808454 DOI: 10.1016/j.cels.2017.11.014] [Citation(s) in RCA: 79] [Impact Index Per Article: 11.3] [Received: 06/21/2017] [Revised: 10/04/2017] [Accepted: 11/22/2017] [Indexed: 12/21/2022]
Abstract
While genes are defined by sequence, in biological systems a protein's function is largely determined by its three-dimensional structure. Evolutionary information embedded within multiple sequence alignments provides a rich source of data for inferring structural constraints on macromolecules. Still, many proteins of interest lack sufficient numbers of related sequences, leading to noisy, error-prone residue-residue contact predictions. Here we introduce DeepContact, a convolutional neural network (CNN)-based approach that discovers co-evolutionary motifs and leverages these patterns to enable accurate inference of contact probabilities, particularly when few related sequences are available. DeepContact significantly improves performance over previous methods, including in the CASP12 blind contact prediction task where we achieved top performance with another CNN-based approach. Moreover, our tool converts hard-to-interpret coupling scores into probabilities, moving the field toward a consistent metric to assess contact prediction across diverse proteins. Through substantially improving the precision-recall behavior of contact prediction, DeepContact suggests we are near a paradigm shift in template-free modeling for protein structure prediction.
Affiliation(s)
- Yang Liu
- Department of Computer Science, University of Illinois at Urbana-Champaign, Champaign, IL 61801, USA
| | - Perry Palmedo
- Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA 02139, USA; Department of Mathematics, MIT, Cambridge, MA 02139, USA; Division of Medical Sciences, Harvard University, Cambridge, MA 02138, USA
| | - Qing Ye
- Department of Computer Science, University of Illinois at Urbana-Champaign, Champaign, IL 61801, USA
| | - Bonnie Berger
- Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA 02139, USA; Department of Mathematics, MIT, Cambridge, MA 02139, USA
| | - Jian Peng
- Department of Computer Science, University of Illinois at Urbana-Champaign, Champaign, IL 61801, USA
| |
|
343
|
Prediction of Structures and Interactions from Genome Information. Adv Exp Med Biol 2018; 1105:123-152. [DOI: 10.1007/978-981-13-2200-6_9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/27/2022]
|
344
|
Salinas VH, Ranganathan R. Coevolution-based inference of amino acid interactions underlying protein function. eLife 2018; 7:34300. [PMID: 30024376 PMCID: PMC6117156 DOI: 10.7554/elife.34300] [Citation(s) in RCA: 88] [Impact Index Per Article: 12.6] [Received: 12/12/2017] [Accepted: 07/18/2018] [Indexed: 02/02/2023] Open
Abstract
Protein function arises from a poorly understood pattern of energetic interactions between amino acid residues. Sequence-based strategies for deducing this pattern have been proposed, but lack of benchmark data has limited experimental verification. Here, we extend deep-mutation technologies to enable measurement of many thousands of pairwise amino acid couplings in several homologs of a protein family - a deep coupling scan (DCS). The data show that cooperative interactions between residues are loaded in a sparse, evolutionarily conserved, spatially contiguous network of amino acids. The pattern of amino acid coupling is quantitatively captured in the coevolution of amino acid positions, especially as indicated by the statistical coupling analysis (SCA), providing experimental confirmation of the key tenets of this method. This work exposes the collective nature of physical constraints on protein function and clarifies its link with sequence analysis, enabling a general practical approach for understanding the structural basis for protein function.
Affiliation(s)
- Victor H Salinas
- Green Center for Systems Biology, UT Southwestern Medical Center, Dallas, United States
| | - Rama Ranganathan
- Center for Physics of Evolving Systems, Biochemistry and Molecular Biology, The University of Chicago, Chicago, United States; Institute for Molecular Engineering, The University of Chicago, Chicago, United States
| |
|
345
|
Huang YJ, Brock KP, Sander C, Marks DS, Montelione GT. A Hybrid Approach for Protein Structure Determination Combining Sparse NMR with Evolutionary Coupling Sequence Data. Adv Exp Med Biol 2018; 1105:153-169. [PMID: 30617828 DOI: 10.1007/978-981-13-2200-6_10] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 12/04/2022]
Abstract
While 3D structure determination of small (<15 kDa) proteins by solution NMR is largely automated and routine, structural analysis of larger proteins is more challenging. An emerging hybrid strategy for modeling protein structures combines sparse NMR data that can be obtained for larger proteins with sequence co-variation data, called evolutionary couplings (ECs), obtained from multiple sequence alignments of protein families. This hybrid "EC-NMR" method can be used to accurately model larger (15-60 kDa) proteins, and more rapidly determine structures of smaller (5-15 kDa) proteins using only backbone NMR data. The resulting structures have accuracies relative to reference structures comparable to those obtained with full backbone and sidechain NMR resonance assignments. The requirement that evolutionary couplings (ECs) are consistent with NMR data recorded on a specific member of a protein family, under specific conditions, potentially also allows identification of ECs that reflect alternative allosteric or excited states of the protein structure.
Affiliation(s)
- Yuanpeng Janet Huang
- Center for Advanced Biotechnology and Medicine, Department of Molecular Biology and Biochemistry, Rutgers, The State University of New Jersey, Piscataway, NJ, USA
| | - Kelly P Brock
- cBio Center, Dana-Farber Cancer Institute, Boston, MA, USA
| | - Chris Sander
- Department of Cell Biology, Harvard Medical School, Boston, MA, USA
- cBio Center, Dana-Farber Cancer Institute, Boston, MA, USA
| | - Debora S Marks
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA
| | - Gaetano T Montelione
- Center for Advanced Biotechnology and Medicine, Department of Molecular Biology and Biochemistry, Rutgers, The State University of New Jersey, Piscataway, NJ, USA
| |
|
346
|
Suplatov D, Sharapova Y, Timonina D, Kopylov K, Švedas V. The visualCMAT: A web-server to select and interpret correlated mutations/co-evolving residues in protein families. J Bioinform Comput Biol 2017; 16:1840005. [PMID: 29361894 DOI: 10.1142/s021972001840005x] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Indexed: 11/18/2022]
Abstract
The visualCMAT web-server was designed to assist experimental research in the fields of protein/enzyme biochemistry, protein engineering, and drug discovery by providing an intuitive and easy-to-use interface to the analysis of correlated mutations/co-evolving residues. Sequence and structural information describing homologous proteins is used to predict correlated substitutions with the mutual information-based CMAT approach; to classify them into spatially close co-evolving pairs, which either form a direct physical contact or interact with the same ligand (e.g. a substrate or a crystallographic water molecule), and long-range correlations; and to annotate and rank binding sites on the protein surface by the presence of statistically significant co-evolving positions. The results of visualCMAT are organized for convenient visual analysis and can be downloaded to a local computer as a content-rich all-in-one PyMOL session file with multiple layers of annotation corresponding to bioinformatic, statistical and structural analyses of the predicted co-evolution, or further studied online using the built-in interactive analysis tools. The online interactivity is implemented in HTML5 and therefore neither plugins nor Java are required. The visualCMAT web-server is integrated with the Mustguseal web-server, which can construct large structure-guided sequence alignments of protein families and superfamilies using all available information about their structures and sequences in public databases. The visualCMAT web-server can be used to understand the relationship between structure and function in proteins, to select hotspots and compensatory mutations for rational design and directed evolution experiments that produce novel enzymes with improved properties, and to study the mechanism of selective ligand binding and allosteric communication between topologically independent sites in protein structures.
The web-server is freely available at https://biokinet.belozersky.msu.ru/visualcmat and there are no login requirements.
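As a rough illustration of the mutual-information idea mentioned in this abstract, the sketch below scores the co-variation of two alignment columns. It is not the CMAT algorithm or the visualCMAT API; the function name `column_mutual_information` and the toy alignment are assumptions made purely for this example, and real tools add corrections (sequence weighting, significance testing) that are omitted here.

```python
from collections import Counter
from math import log2


def column_mutual_information(msa, i, j):
    """Mutual information (in bits) between alignment columns i and j.

    `msa` is a list of equal-length aligned sequences (hypothetical input
    format chosen for this sketch).
    """
    n = len(msa)
    # Empirical counts of residues in each column and of residue pairs.
    count_i = Counter(seq[i] for seq in msa)
    count_j = Counter(seq[j] for seq in msa)
    count_ij = Counter((seq[i], seq[j]) for seq in msa)

    mi = 0.0
    for (a, b), n_ab in count_ij.items():
        p_ab = n_ab / n
        p_a = count_i[a] / n
        p_b = count_j[b] / n
        mi += p_ab * log2(p_ab / (p_a * p_b))
    return mi


# Toy alignment: columns 0 and 1 always change together, column 2 varies independently.
toy_msa = ["AKL", "AKV", "GEL", "GEV"]
mi_coupled = column_mutual_information(toy_msa, 0, 1)
mi_independent = column_mutual_information(toy_msa, 0, 2)
```

In the toy alignment the perfectly co-varying pair of columns receives a high score and the independent pair receives none, which is the signal such methods rank before mapping pairs onto a structure.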
Affiliation(s)
- Dmitry Suplatov
- Belozersky Institute of Physicochemical Biology, Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Leninskiye Gory 1-73, Moscow 119991, Russia
| | - Yana Sharapova
- Belozersky Institute of Physicochemical Biology, Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Leninskiye Gory 1-73, Moscow 119991, Russia
| | - Daria Timonina
- Belozersky Institute of Physicochemical Biology, Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Leninskiye Gory 1-73, Moscow 119991, Russia
| | - Kirill Kopylov
- Belozersky Institute of Physicochemical Biology, Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Leninskiye Gory 1-73, Moscow 119991, Russia
| | - Vytas Švedas
- Belozersky Institute of Physicochemical Biology, Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Leninskiye Gory 1-73, Moscow 119991, Russia
| |
|
347
|
Somody JC, MacKinnon SS, Windemuth A. Structural coverage of the proteome for pharmaceutical applications. Drug Discov Today 2017; 22:1792-1799. [DOI: 10.1016/j.drudis.2017.08.004] [Citation(s) in RCA: 42] [Impact Index Per Article: 5.3] [Received: 05/31/2017] [Revised: 08/16/2017] [Accepted: 08/17/2017] [Indexed: 01/09/2023]
|
348
|
Patterns of coevolving amino acids unveil structural and dynamical domains. Proc Natl Acad Sci U S A 2017; 114:E10612-E10621. [PMID: 29183970 DOI: 10.1073/pnas.1712021114] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.3] [Indexed: 01/23/2023] Open
Abstract
Patterns of interacting amino acids are so preserved within protein families that the sole analysis of evolutionary comutations can identify pairs of contacting residues. It is also known that evolution conserves functional dynamics, i.e., the concerted motion or displacement of large protein regions or domains. Is it, therefore, possible to use a pure sequence-based analysis to identify these dynamical domains? To address this question, we introduce here a general coevolutionary coupling analysis strategy and apply it to a curated sequence database of hundreds of protein families. For most families, the sequence-based method partitions amino acids into a few clusters. When viewed in the context of the native structure, these clusters have the signature characteristics of viable protein domains: They are spatially separated but individually compact. They have a direct functional bearing too, as shown for various reference cases. We conclude that even large-scale structural and functionally related properties can be recovered from inference methods applied to evolutionary-related sequences. The method introduced here is available as a software package and web server (spectrus.sissa.it/spectrus-evo_webserver).
|
349
|
Zhang C, Mortuza SM, He B, Wang Y, Zhang Y. Template-based and free modeling of I-TASSER and QUARK pipelines using predicted contact maps in CASP12. Proteins 2017; 86 Suppl 1:136-151. [PMID: 29082551 DOI: 10.1002/prot.25414] [Citation(s) in RCA: 64] [Impact Index Per Article: 8.0] [Received: 07/10/2017] [Revised: 10/09/2017] [Accepted: 10/27/2017] [Indexed: 12/26/2022]
Abstract
We develop two complementary pipelines, "Zhang-Server" and "QUARK", based on I-TASSER and QUARK pipelines for template-based modeling (TBM) and free modeling (FM), and test them in the CASP12 experiment. The combination of I-TASSER and QUARK successfully folds three medium-size FM targets that have more than 150 residues, even though the interplay between the two pipelines still awaits further optimization. Newly developed sequence-based contact prediction by NeBcon plays a critical role in enhancing the quality of the models produced by the new pipelines, particularly for FM targets. The inclusion of NeBcon predicted contacts as restraints in the QUARK simulations results in an average TM-score of 0.41 for the best in top five predicted models, which is 37% higher than that by the QUARK simulations without contacts. In particular, there are seven targets that are converted from non-foldable to foldable (TM-score >0.5) due to the use of contact restraints in the simulations. An additional feature in the current pipelines is the local structure quality prediction by ResQ, which provides a robust residue-level modeling error estimation. Despite the success, significant challenges still remain in ab initio modeling of multi-domain proteins and folding of β-proteins with complicated topologies bound by long-range strand-strand interactions. Improvements in domain boundary and long-range contact prediction, as well as optimal use of the predicted contacts and multiple threading alignments, are critical to address these issues seen in the CASP12 experiment.
Affiliation(s)
- Chengxin Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan
| | - S M Mortuza
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan
| | - Baoji He
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan; Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing, China
| | - Yanting Wang
- Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing, China
| | - Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan; Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan
| |
|
350
|
Schmidt M, Hamacher K. Three-body interactions improve contact prediction within direct-coupling analysis. Phys Rev E 2017; 96:052405. [PMID: 29347718 DOI: 10.1103/physreve.96.052405] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 08/09/2017] [Indexed: 11/07/2022]
Abstract
The prediction of residue contacts in a protein solely from sequence information is a promising approach to computational structure prediction. Recent developments use statistical or information theoretic methods to extract contact information from a multiple sequence alignment. Despite good results, accuracy is limited due to usage of two-body interactions within a Potts model. In this paper we generalize this approach and propose a Hamiltonian with an additional three-body interaction term. We derive a mean-field approximation for inference of three-body couplings within a Potts model which is fast enough on modern computers. Finally, we show that our model has a higher accuracy in predicting residue contacts in comparison with the plain two-body-interaction model.
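For orientation, a Potts-model Hamiltonian extended by a three-body term can be written in the generic form below. This is a sketch under assumed notation, not necessarily the parametrization used in the paper: the symbols $h_i$ (single-site fields), $J_{ij}$ (pairwise couplings), $K_{ijk}$ (three-body couplings), and $\sigma_i$ (the amino acid at alignment position $i$ of a sequence of length $L$) are chosen here for illustration only.

```latex
H(\sigma) = -\sum_{i=1}^{L} h_i(\sigma_i)
            - \sum_{1 \le i < j \le L} J_{ij}(\sigma_i, \sigma_j)
            - \sum_{1 \le i < j < k \le L} K_{ijk}(\sigma_i, \sigma_j, \sigma_k),
\qquad
P(\sigma) = \frac{1}{Z}\, e^{-H(\sigma)}
```

Setting every $K_{ijk}$ to zero recovers the plain two-body model that the abstract compares against.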
Affiliation(s)
- Michael Schmidt
- Department of Physics, TU Darmstadt, Karolinenpl. 5, 64289 Darmstadt, Germany
| | - Kay Hamacher
- Department of Biology and Department of Computer Science and Department of Physics, TU Darmstadt, Karolinenpl. 5, 64289 Darmstadt, Germany
| |
|