Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Jones DT, Buchan DWA, Cozzetto D, Pontil M. PSICOV: precise structural contact prediction using sparse inverse covariance estimation on large multiple sequence alignments. ACTA ACUST UNITED AC 2011;28:184-90. [PMID: 22101153 DOI: 10.1093/bioinformatics/btr638] [Citation(s) in RCA: 535] [Impact Index Per Article: 38.2] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022]

For:	Jones DT, Buchan DWA, Cozzetto D, Pontil M. PSICOV: precise structural contact prediction using sparse inverse covariance estimation on large multiple sequence alignments. ACTA ACUST UNITED AC 2011;28:184-90. [PMID: 22101153 DOI: 10.1093/bioinformatics/btr638] [Citation(s) in RCA: 535] [Impact Index Per Article: 38.2] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022]

Number

Cited by Other Article(s)

351

Perez A, Morrone JA, Dill KA. Accelerating physical simulations of proteins by leveraging external knowledge. WILEY INTERDISCIPLINARY REVIEWS-COMPUTATIONAL MOLECULAR SCIENCE 2017;7. [PMID: 28959358 DOI: 10.1002/wcms.1309] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]

352

Mandloi S, Chakrabarti S. Protein sites with more coevolutionary connections tend to evolve slower, while more variable protein families acquire higher coevolutionary connections. F1000Res 2017;6:453. [PMID: 28751967 PMCID: PMC5506539 DOI: 10.12688/f1000research.11251.2] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Accepted: 07/05/2017] [Indexed: 11/20/2022] Open

353

Uguzzoni G, John Lovis S, Oteri F, Schug A, Szurmant H, Weigt M. Large-scale identification of coevolution signals across homo-oligomeric protein interfaces by direct coupling analysis. Proc Natl Acad Sci U S A 2017;114:E2662-E2671. [PMID: 28289198 PMCID: PMC5380090 DOI: 10.1073/pnas.1615068114] [Citation(s) in RCA: 64] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/21/2022] Open

Abstract

Proteins have evolved to perform diverse cellular functions, from serving as reaction catalysts to coordinating cellular propagation and development. Frequently, proteins do not exert their full potential as monomers but rather undergo concerted interactions as either homo-oligomers or with other proteins as hetero-oligomers. The experimental study of such protein complexes and interactions has been arduous. Theoretical structure prediction methods are an attractive alternative. Here, we investigate homo-oligomeric interfaces by tracing residue coevolution via the global statistical direct coupling analysis (DCA). DCA can accurately infer spatial adjacencies between residues. These adjacencies can be included as constraints in structure prediction techniques to predict high-resolution models. By taking advantage of the ongoing exponential growth of sequence databases, we go significantly beyond anecdotal cases of a few protein families and apply DCA to a systematic large-scale study of nearly 2,000 Pfam protein families with sufficient sequence information and structurally resolved homo-oligomeric interfaces. We find that large interfaces are commonly identified by DCA. We further demonstrate that DCA can differentiate between subfamilies with different binding modes within one large Pfam family. Sequence-derived contact information for the subfamilies proves sufficient to assemble accurate structural models of the diverse protein-oligomers. Thus, we provide an approach to investigate oligomerization for arbitrary protein families leading to structural models complementary to often-difficult experimental methods. Combined with ever more abundant sequential data, we anticipate that this study will be instrumental to allow the structural description of many heteroprotein complexes in the future.

Collapse

354

Skwark MJ, Croucher NJ, Puranen S, Chewapreecha C, Pesonen M, Xu YY, Turner P, Harris SR, Beres SB, Musser JM, Parkhill J, Bentley SD, Aurell E, Corander J. Interacting networks of resistance, virulence and core machinery genes identified by genome-wide epistasis analysis. PLoS Genet 2017;13:e1006508. [PMID: 28207813 PMCID: PMC5312804 DOI: 10.1371/journal.pgen.1006508] [Citation(s) in RCA: 67] [Impact Index Per Article: 8.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/03/2016] [Accepted: 11/24/2016] [Indexed: 12/05/2022] Open

Abstract

Recent advances in the scale and diversity of population genomic datasets for bacteria now provide the potential for genome-wide patterns of co-evolution to be studied at the resolution of individual bases. Here we describe a new statistical method, genomeDCA, which uses recent advances in computational structural biology to identify the polymorphic loci under the strongest co-evolutionary pressures. We apply genomeDCA to two large population data sets representing the major human pathogens Streptococcus pneumoniae (pneumococcus) and Streptococcus pyogenes (group A Streptococcus). For pneumococcus we identified 5,199 putative epistatic interactions between 1,936 sites. Over three-quarters of the links were between sites within the pbp2x, pbp1a and pbp2b genes, the sequences of which are critical in determining non-susceptibility to beta-lactam antibiotics. A network-based analysis found these genes were also coupled to that encoding dihydrofolate reductase, changes to which underlie trimethoprim resistance. Distinct from these antibiotic resistance genes, a large network component of 384 protein coding sequences encompassed many genes critical in basic cellular functions, while another distinct component included genes associated with virulence. The group A Streptococcus (GAS) data set population represents a clonal population with relatively little genetic variation and a high level of linkage disequilibrium across the genome. Despite this, we were able to pinpoint two RNA pseudouridine synthases, which were each strongly linked to a separate set of loci across the chromosome, representing biologically plausible targets of co-selection. The population genomic analysis method applied here identifies statistically significantly co-evolving locus pairs, potentially arising from fitness selection interdependence reflecting underlying protein-protein interactions, or genes whose product activities contribute to the same phenotype. This discovery approach greatly enhances the future potential of epistasis analysis for systems biology, and can complement genome-wide association studies as a means of formulating hypotheses for targeted experimental work.

Epistatic interactions between polymorphisms in DNA are recognized as important drivers of evolution in numerous organisms. Study of epistasis in bacteria has been hampered by the lack of densely sampled population genomic data, suitable statistical models and inference algorithms sufficiently powered for extremely high-dimensional parameter spaces. We introduce the first model-based method for genome-wide epistasis analysis and use two of the largest available bacterial population genome data sets on Streptococcus pneumoniae (the pneumococcus) and Streptococcus pyogenes (group A Streptococcus) to demonstrate its potential for biological discovery. Our approach reveals interacting networks of resistance, virulence and core machinery genes in the pneumococcus, which highlights putative candidates for novel drug targets. We also discover a number of plausible targets of co-selection in S. pyogenes linked to RNA pseudouridine synthases. Our method significantly enhances the future potential of epistasis analysis for systems biology, and can complement genome-wide association studies as a means of formulating hypotheses for targeted experimental work.

Collapse

Affiliation(s)

Marcin J Skwark Department of Chemistry, Vanderbilt University, Nashville, TN, United States of America
Nicholas J Croucher Department of Infectious Disease Epidemiology, Imperial College London, London, United Kingdom
Santeri Puranen Department of Computer Science, Aalto University, Espoo, Finland
Claire Chewapreecha Department of Medicine, University of Cambridge, Cambridge, United Kingdom
Maiju Pesonen Department of Computer Science, Aalto University, Espoo, Finland
Ying Ying Xu Department of Computer Science, Aalto University, Espoo, Finland
Paul Turner Shoklo Malaria Research Unit, Mahidol-Oxford Tropical Medicine Research Unit, Faculty of Tropical Medicine, Mahidol University, Mae Sot, Thailand.,Centre for Tropical Medicine, Nuffield Department of Medicine, University of Oxford, Oxford, United Kingdom
Simon R Harris Pathogen Genomics, Wellcome Trust Sanger Institute, Cambridge, United Kingdom
Stephen B Beres Center for Molecular and Translational Human Infectious Diseases Research, Department of Pathology and Genomic Medicine, Houston Methodist Research Institute, and Houston Methodist Hospital, Houston, Texas, United States of America
James M Musser Center for Molecular and Translational Human Infectious Diseases Research, Department of Pathology and Genomic Medicine, Houston Methodist Research Institute, and Houston Methodist Hospital, Houston, Texas, United States of America.,Departments of Pathology and Laboratory Medicine and Microbiology and Immunology, Weill Cornell Medical College, New York, New York, United States of America
Julian Parkhill Pathogen Genomics, Wellcome Trust Sanger Institute, Cambridge, United Kingdom
Stephen D Bentley Pathogen Genomics, Wellcome Trust Sanger Institute, Cambridge, United Kingdom
Erik Aurell Department of Computational Biology, KTH-Royal Institute of Technology, Stockholm, Sweden.,Departments of Applied Physics and Computer Science, Aalto University, Espoo, Finland.,Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing, China
Jukka Corander Pathogen Genomics, Wellcome Trust Sanger Institute, Cambridge, United Kingdom.,Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland.,Department of Biostatistics, University of Oslo, Oslo, Norway.,Department of Veterinary Medicine, University of Cambridge, Cambridge, United Kingdom

Collapse

355

Várnai C, Burkoff NS, Wild DL. Improving protein-protein interaction prediction using evolutionary information from low-quality MSAs. PLoS One 2017;12:e0169356. [PMID: 28166227 PMCID: PMC5293240 DOI: 10.1371/journal.pone.0169356] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/06/2016] [Accepted: 12/15/2016] [Indexed: 01/05/2023] Open

356

Hopf TA, Ingraham JB, Poelwijk FJ, Schärfe CP, Springer M, Sander C, Marks DS. Mutation effects predicted from sequence co-variation. Nat Biotechnol 2017;35:128-135. [PMID: 28092658 PMCID: PMC5383098 DOI: 10.1038/nbt.3769] [Citation(s) in RCA: 449] [Impact Index Per Article: 56.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/10/2016] [Accepted: 12/09/2016] [Indexed: 01/09/2023]

357

Ovchinnikov S, Park H, Varghese N, Huang PS, Pavlopoulos GA, Kim DE, Kamisetty H, Kyrpides NC, Baker D. Protein structure determination using metagenome sequence data. Science 2017;355:294-298. [PMID: 28104891 PMCID: PMC5493203 DOI: 10.1126/science.aah4043] [Citation(s) in RCA: 351] [Impact Index Per Article: 43.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/22/2016] [Accepted: 11/22/2016] [Indexed: 01/30/2023]

358

Lim K, Yamada KD, Frith MC, Tomii K. Protein sequence-similarity search acceleration using a heuristic algorithm with a sensitive matrix. ACTA ACUST UNITED AC 2017;17:147-154. [PMID: 28083762 PMCID: PMC5274646 DOI: 10.1007/s10969-016-9210-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/31/2015] [Accepted: 12/05/2016] [Indexed: 12/28/2022]

359

da Fonseca NJ, Lima Afonso MQ, Pedersolli NG, de Oliveira LC, Andrade DS, Bleicher L. Sequence, structure and function relationships in flaviviruses as assessed by evolutive aspects of its conserved non-structural protein domains. Biochem Biophys Res Commun 2017;492:565-571. [PMID: 28087275 DOI: 10.1016/j.bbrc.2017.01.041] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/02/2016] [Accepted: 01/09/2017] [Indexed: 10/20/2022]

360

Wang S, Sun S, Li Z, Zhang R, Xu J. Accurate De Novo Prediction of Protein Contact Map by Ultra-Deep Learning Model. PLoS Comput Biol 2017;13:e1005324. [PMID: 28056090 PMCID: PMC5249242 DOI: 10.1371/journal.pcbi.1005324] [Citation(s) in RCA: 589] [Impact Index Per Article: 73.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/14/2016] [Revised: 01/20/2017] [Accepted: 12/20/2016] [Indexed: 12/02/2022] Open

Abstract

Motivation

Protein contacts contain key information for the understanding of protein structure and function and thus, contact prediction from sequence is an important problem. Recently exciting progress has been made on this problem, but the predicted contacts for proteins without many sequence homologs is still of low quality and not very useful for de novo structure prediction.

Method

This paper presents a new deep learning method that predicts contacts by integrating both evolutionary coupling (EC) and sequence conservation information through an ultra-deep neural network formed by two deep residual neural networks. The first residual network conducts a series of 1-dimensional convolutional transformation of sequential features; the second residual network conducts a series of 2-dimensional convolutional transformation of pairwise information including output of the first residual network, EC information and pairwise potential. By using very deep residual networks, we can accurately model contact occurrence patterns and complex sequence-structure relationship and thus, obtain higher-quality contact prediction regardless of how many sequence homologs are available for proteins in question.

Results

Our method greatly outperforms existing methods and leads to much more accurate contact-assisted folding. Tested on 105 CASP11 targets, 76 past CAMEO hard targets, and 398 membrane proteins, the average top L long-range prediction accuracy obtained by our method, one representative EC method CCMpred and the CASP11 winner MetaPSICOV is 0.47, 0.21 and 0.30, respectively; the average top L/10 long-range accuracy of our method, CCMpred and MetaPSICOV is 0.77, 0.47 and 0.59, respectively. Ab initio folding using our predicted contacts as restraints but without any force fields can yield correct folds (i.e., TMscore>0.6) for 203 of the 579 test proteins, while that using MetaPSICOV- and CCMpred-predicted contacts can do so for only 79 and 62 of them, respectively. Our contact-assisted models also have much better quality than template-based models especially for membrane proteins. The 3D models built from our contact prediction have TMscore>0.5 for 208 of the 398 membrane proteins, while those from homology modeling have TMscore>0.5 for only 10 of them. Further, even if trained mostly by soluble proteins, our deep learning method works very well on membrane proteins. In the recent blind CAMEO benchmark, our fully-automated web server implementing this method successfully folded 6 targets with a new fold and only 0.3L-2.3L effective sequence homologs, including one β protein of 182 residues, one α+β protein of 125 residues, one α protein of 140 residues, one α protein of 217 residues, one α/β of 260 residues and one α protein of 462 residues. Our method also achieved the highest F1 score on free-modeling targets in the latest CASP (Critical Assessment of Structure Prediction), although it was not fully implemented back then.

Availability

http://raptorx.uchicago.edu/ContactMap/

Protein contact prediction and contact-assisted folding has made good progress due to direct evolutionary coupling analysis (DCA). However, DCA is effective on only some proteins with a very large number of sequence homologs. To further improve contact prediction, we borrow ideas from deep learning, which has recently revolutionized object recognition, speech recognition and the GO game. Our deep learning method can model complex sequence-structure relationship and high-order correlation (i.e., contact occurrence patterns) and thus, improve contact prediction accuracy greatly. Our test results show that our method greatly outperforms the state-of-the-art methods regardless how many sequence homologs are available for a protein in question. Ab initio folding guided by our predicted contacts may fold many more test proteins than the other contact predictors. Our contact-assisted 3D models also have much better quality than homology models built from the training proteins, especially for membrane proteins. One interesting finding is that even trained mostly with soluble proteins, our method performs very well on membrane proteins. Recent blind CAMEO test confirms that our method can fold large proteins with a new fold and only a small number of sequence homologs.

Collapse

361

Wei Q, La D, Kihara D. BindML/BindML+: Detecting Protein-Protein Interaction Interface Propensity from Amino Acid Substitution Patterns. Methods Mol Biol 2017;1529:279-289. [PMID: 27914057 DOI: 10.1007/978-1-4939-6637-0_14] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/06/2023]

362

Vaitinadapoule A, Etchebest C. Molecular Modeling of Transporters: From Low Resolution Cryo-Electron Microscopy Map to Conformational Exploration. The Example of TSPO. Methods Mol Biol 2017;1635:383-416. [PMID: 28755381 DOI: 10.1007/978-1-4939-7151-0_21] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]

363

Rawi R, Mall R, Kunji K, El Anbari M, Aupetit M, Ullah E, Bensmail H. COUSCOus: improved protein contact prediction using an empirical Bayes covariance estimator. BMC Bioinformatics 2016;17:533. [PMID: 27978812 PMCID: PMC5159955 DOI: 10.1186/s12859-016-1400-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/17/2016] [Accepted: 12/01/2016] [Indexed: 11/13/2022] Open

Abstract

Background

The post-genomic era with its wealth of sequences gave rise to a broad range of protein residue-residue contact detecting methods. Although various coevolution methods such as PSICOV, DCA and plmDCA provide correct contact predictions, they do not completely overlap. Hence, new approaches and improvements of existing methods are needed to motivate further development and progress in the field. We present a new contact detecting method, COUSCOus, by combining the best shrinkage approach, the empirical Bayes covariance estimator and GLasso.

Results

Using the original PSICOV benchmark dataset, COUSCOus achieves mean accuracies of 0.74, 0.62 and 0.55 for the top L/10 predicted long, medium and short range contacts, respectively. In addition, COUSCOus attains mean areas under the precision-recall curves of 0.25, 0.29 and 0.30 for long, medium and short contacts and outperforms PSICOV. We also observed that COUSCOus outperforms PSICOV w.r.t. Matthew’s correlation coefficient criterion on full list of residue contacts. Furthermore, COUSCOus achieves on average 10% more gain in prediction accuracy compared to PSICOV on an independent test set composed of CASP11 protein targets. Finally, we showed that when using a simple random forest meta-classifier, by combining contact detecting techniques and sequence derived features, PSICOV predictions should be replaced by the more accurate COUSCOus predictions.

Conclusion

We conclude that the consideration of superior covariance shrinkage approaches will boost several research fields that apply the GLasso procedure, amongst the presented one of residue-residue contact prediction as well as fields such as gene network reconstruction.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-016-1400-3) contains supplementary material, which is available to authorized users.

Collapse

364

Conservation of coevolving protein interfaces bridges prokaryote-eukaryote homologies in the twilight zone. Proc Natl Acad Sci U S A 2016;113:15018-15023. [PMID: 27965389 DOI: 10.1073/pnas.1611861114] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open

365

Adhikari B, Nowotny J, Bhattacharya D, Hou J, Cheng J. ConEVA: a toolbox for comprehensive assessment of protein contacts. BMC Bioinformatics 2016;17:517. [PMID: 27923350 PMCID: PMC5142288 DOI: 10.1186/s12859-016-1404-z] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/01/2016] [Accepted: 12/01/2016] [Indexed: 12/31/2022] Open

366

Alcock F, Stansfeld PJ, Basit H, Habersetzer J, Baker MA, Palmer T, Wallace MI, Berks BC. Assembling the Tat protein translocase. eLife 2016;5. [PMID: 27914200 PMCID: PMC5201420 DOI: 10.7554/elife.20718] [Citation(s) in RCA: 49] [Impact Index Per Article: 5.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/22/2016] [Accepted: 11/29/2016] [Indexed: 12/18/2022] Open

367

Levy RM, Haldane A, Flynn WF. Potts Hamiltonian models of protein co-variation, free energy landscapes, and evolutionary fitness. Curr Opin Struct Biol 2016;43:55-62. [PMID: 27870991 DOI: 10.1016/j.sbi.2016.11.004] [Citation(s) in RCA: 69] [Impact Index Per Article: 7.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/19/2016] [Accepted: 11/03/2016] [Indexed: 11/17/2022]

368

Orlando G, Raimondi D, Vranken WF. Observation selection bias in contact prediction and its implications for structural bioinformatics. Sci Rep 2016;6:36679. [PMID: 27857150 PMCID: PMC5114557 DOI: 10.1038/srep36679] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/27/2016] [Accepted: 10/18/2016] [Indexed: 01/14/2023] Open

369

Taylor WR, Matthews-Palmer TRS, Beeby M. Molecular Models for the Core Components of the Flagellar Type-III Secretion Complex. PLoS One 2016;11:e0164047. [PMID: 27855178 PMCID: PMC5113899 DOI: 10.1371/journal.pone.0164047] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/29/2016] [Accepted: 09/19/2016] [Indexed: 01/10/2023] Open

370

Jacquin H, Rançon A. Resummed mean-field inference for strongly coupled data. Phys Rev E 2016;94:042118. [PMID: 27841631 DOI: 10.1103/physreve.94.042118] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/21/2015] [Indexed: 11/07/2022]

371

Coucke A, Uguzzoni G, Oteri F, Cocco S, Monasson R, Weigt M. Direct coevolutionary couplings reflect biophysical residue interactions in proteins. J Chem Phys 2016;145:174102. [DOI: 10.1063/1.4966156] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/30/2023] Open

372

Assessing Predicted Contacts for Building Protein Three-Dimensional Models. Methods Mol Biol 2016. [PMID: 27787823 DOI: 10.1007/978-1-4939-6406-2_9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register]

373

Yu J, Andreani J, Ochsenbein F, Guerois R. Lessons from (co-)evolution in the docking of proteins and peptides for CAPRI Rounds 28-35. Proteins 2016;85:378-390. [PMID: 27701780 DOI: 10.1002/prot.25180] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/14/2016] [Revised: 08/25/2016] [Accepted: 08/25/2016] [Indexed: 11/06/2022]

374

Simultaneous identification of specifically interacting paralogs and interprotein contacts by direct coupling analysis. Proc Natl Acad Sci U S A 2016;113:12186-12191. [PMID: 27729520 DOI: 10.1073/pnas.1607570113] [Citation(s) in RCA: 71] [Impact Index Per Article: 7.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open

375

Inferring interaction partners from protein sequences. Proc Natl Acad Sci U S A 2016;113:12180-12185. [PMID: 27663738 DOI: 10.1073/pnas.1606762113] [Citation(s) in RCA: 83] [Impact Index Per Article: 9.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open

376

Monastyrskyy B, D'Andrea D, Fidelis K, Tramontano A, Kryshtafovych A. New encouraging developments in contact prediction: Assessment of the CASP11 results. Proteins 2016;84 Suppl 1:131-44. [PMID: 26474083 PMCID: PMC4834069 DOI: 10.1002/prot.24943] [Citation(s) in RCA: 69] [Impact Index Per Article: 7.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/18/2015] [Revised: 09/15/2015] [Accepted: 10/11/2015] [Indexed: 12/27/2022]

377

Li Q, Dahl DB, Vannucci M, Joo H, Tsai JW. KScons: a Bayesian approach for protein residue contact prediction using the knob-socket model of protein tertiary structure. Bioinformatics 2016;32:3774-3781. [PMID: 27559156 DOI: 10.1093/bioinformatics/btw553] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/01/2016] [Revised: 07/15/2016] [Accepted: 08/18/2016] [Indexed: 11/13/2022] Open

Abstract

MOTIVATION

By simplifying the many-bodied complexity of residue packing into patterns of simple pairwise secondary structure interactions between a single knob residue with a three-residue socket, the knob-socket construct allows a more direct incorporation of structural information into the prediction of residue contacts. By modeling the preferences between the amino acid composition of a socket and knob, we undertake an investigation of the knob-socket construct's ability to improve the prediction of residue contacts. The statistical model considers three priors and two posterior estimations to better understand how the input data affects predictions. This produces six implementations of KScons that are tested on three sets: PSICOV, CASP10 and CASP11. We compare against the current leading contact prediction methods.

RESULTS

The results demonstrate the usefulness as well as the limits of knob-socket based structural modeling of protein contacts. The construct is able to extract good predictions from known structural homologs, while its performance degrades when no homologs exist. Among our six implementations, KScons MST-MP (which uses the multiple structure alignment prior and marginal posterior incorporating structural homolog information) performs the best in all three prediction sets. An analysis of recall and precision finds that KScons MST-MP improves accuracy not only by improving identification of true positives, but also by decreasing the number of false positives. Over the CASP10 and CASP11 sets, KScons MST-MP performs better than the leading methods using only evolutionary coupling data, but not quite as well as the supervised learning methods of MetaPSICOV and CoinDCA-NN that incorporate a large set of structural features.

CONTACT

qiwei.li@rice.eduSupplementary information: Supplementary data are available at Bioinformatics online.

Collapse

378

Zhang L, Wang H, Yan L, Su L, Xu D. OMPcontact: An Outer Membrane Protein Inter-Barrel Residue Contact Prediction Method. J Comput Biol 2016;24:217-228. [PMID: 27513917 DOI: 10.1089/cmb.2015.0236] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open

379

A novel algorithm for detecting multiple covariance and clustering of biological sequences. Sci Rep 2016;6:30425. [PMID: 27451921 PMCID: PMC4958985 DOI: 10.1038/srep30425] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/19/2016] [Accepted: 07/05/2016] [Indexed: 12/14/2022] Open

380

Baker JA, Simkovic F, Taylor HMC, Rigden DJ. Potential DNA binding and nuclease functions of ComEC domains characterized in silico. Proteins 2016;84:1431-42. [PMID: 27318187 PMCID: PMC5031224 DOI: 10.1002/prot.25088] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/09/2016] [Revised: 05/25/2016] [Accepted: 06/13/2016] [Indexed: 12/15/2022]

381

Simkovic F, Thomas JMH, Keegan RM, Winn MD, Mayans O, Rigden DJ. Residue contacts predicted by evolutionary covariance extend the application of ab initio molecular replacement to larger and more challenging protein folds. IUCRJ 2016;3:259-70. [PMID: 27437113 PMCID: PMC4937781 DOI: 10.1107/s2052252516008113] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/10/2016] [Accepted: 05/18/2016] [Indexed: 05/05/2023]

382

Haldane A, Flynn WF, He P, Vijayan RSK, Levy RM. Structural propensities of kinase family proteins from a Potts model of residue co-variation. Protein Sci 2016;25:1378-84. [PMID: 27241634 DOI: 10.1002/pro.2954] [Citation(s) in RCA: 48] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/07/2016] [Revised: 05/25/2016] [Accepted: 05/26/2016] [Indexed: 12/23/2022]

383

Taylor WR. An algorithm to parse segment packing in predicted protein contact maps. Algorithms Mol Biol 2016;11:17. [PMID: 27330543 PMCID: PMC4912788 DOI: 10.1186/s13015-016-0080-x] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/01/2016] [Accepted: 05/24/2016] [Indexed: 11/30/2022] Open

384

Bhattacharya D, Cao R, Cheng J. UniCon3D: de novo protein structure prediction using united-residue conformational search via stepwise, probabilistic sampling. Bioinformatics 2016;32:2791-9. [PMID: 27259540 PMCID: PMC5018369 DOI: 10.1093/bioinformatics/btw316] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/13/2016] [Accepted: 05/15/2016] [Indexed: 12/20/2022] Open

Abstract

MOTIVATION

Recent experimental studies have suggested that proteins fold via stepwise assembly of structural units named 'foldons' through the process of sequential stabilization. Alongside, latest developments on computational side based on probabilistic modeling have shown promising direction to perform de novo protein conformational sampling from continuous space. However, existing computational approaches for de novo protein structure prediction often randomly sample protein conformational space as opposed to experimentally suggested stepwise sampling.

RESULTS

Here, we develop a novel generative, probabilistic model that simultaneously captures local structural preferences of backbone and side chain conformational space of polypeptide chains in a united-residue representation and performs experimentally motivated conditional conformational sampling via stepwise synthesis and assembly of foldon units that minimizes a composite physics and knowledge-based energy function for de novo protein structure prediction. The proposed method, UniCon3D, has been found to (i) sample lower energy conformations with higher accuracy than traditional random sampling in a small benchmark of 6 proteins; (ii) perform comparably with the top five automated methods on 30 difficult target domains from the 11th Critical Assessment of Protein Structure Prediction (CASP) experiment and on 15 difficult target domains from the 10th CASP experiment; and (iii) outperform two state-of-the-art approaches and a baseline counterpart of UniCon3D that performs traditional random sampling for protein modeling aided by predicted residue-residue contacts on 45 targets from the 10th edition of CASP.

AVAILABILITY AND IMPLEMENTATION

Source code, executable versions, manuals and example data of UniCon3D for Linux and OSX are freely available to non-commercial users at http://sysbio.rnet.missouri.edu/UniCon3D/ CONTACT: chengji@missouri.edu

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Collapse

385

Moult J, Fidelis K, Kryshtafovych A, Schwede T, Tramontano A. Critical assessment of methods of protein structure prediction: Progress and new directions in round XI. Proteins 2016;84 Suppl 1:4-14. [PMID: 27171127 DOI: 10.1002/prot.25064] [Citation(s) in RCA: 149] [Impact Index Per Article: 16.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/29/2016] [Revised: 04/29/2016] [Accepted: 05/08/2016] [Indexed: 12/15/2022]

386

Pandini A, Kleinjung J, Taylor WR, Junge W, Khan S. The Phylogenetic Signature Underlying ATP Synthase c-Ring Compliance. Biophys J 2016;109:975-87. [PMID: 26331255 PMCID: PMC4564677 DOI: 10.1016/j.bpj.2015.07.005] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/04/2015] [Revised: 06/20/2015] [Accepted: 07/09/2015] [Indexed: 12/28/2022] Open

Abstract

The proton-driven ATP synthase (FOF1) is comprised of two rotary, stepping motors (FO and F1) coupled by an elastic power transmission. The elastic compliance resides in the rotor module that includes the membrane-embedded FO c-ring. Proton transport by FO is firmly coupled to the rotation of the c-ring relative to other FO subunits (ab2). It drives ATP synthesis. We used a computational method to investigate the contribution of the c-ring to the total elastic compliance. We performed principal component analysis of conformational ensembles built using distance constraints from the bovine mitochondrial c-ring x-ray structure. Angular rotary twist, the dominant ring motion, was estimated to show that the c-ring accounted in part for the measured compliance. Ring rotation was entrained to rotation of the external helix within each hairpin-shaped c-subunit in the ring. Ensembles of monomer and dimers extracted from complete c-rings showed that the coupling between collective ring and the individual subunit motions was independent of the size of the c-ring, which varies between organisms. Molecular determinants were identified by covariance analysis of residue coevolution and structural-alphabet-based local dynamics correlations. The residue coevolution gave a readout of subunit architecture. The dynamic couplings revealed that the hinge for both ring and subunit helix rotations was constructed from the proton-binding site and the adjacent glycine motif (IB-GGGG) in the midmembrane plane. IB-GGGG motifs were linked by long-range couplings across the ring, while intrasubunit couplings connected the motif to the conserved cytoplasmic loop and adjacent segments. The correlation with principal collective motions shows that the couplings underlie both ring rotary and bending motions. Noncontact couplings between IB-GGGG motifs matched the coevolution signal as well as contact couplings. The residue coevolution reflects the physiological importance of the dynamics that may link proton transfer to ring compliance.

Collapse

387

Champeimont R, Laine E, Hu SW, Penin F, Carbone A. Coevolution analysis of Hepatitis C virus genome to identify the structural and functional dependency network of viral proteins. Sci Rep 2016;6:26401. [PMID: 27198619 PMCID: PMC4873791 DOI: 10.1038/srep26401] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/02/2015] [Accepted: 05/03/2016] [Indexed: 12/20/2022] Open

388

Neuwald AF. Gleaning structural and functional information from correlations in protein multiple sequence alignments. Curr Opin Struct Biol 2016;38:1-8. [PMID: 27179293 DOI: 10.1016/j.sbi.2016.04.006] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/18/2015] [Revised: 04/28/2016] [Accepted: 04/29/2016] [Indexed: 10/24/2022]

389

van Nimwegen E. Inferring Contacting Residues within and between Proteins: What Do the Probabilities Mean? PLoS Comput Biol 2016;12:e1004726. [PMID: 27171220 PMCID: PMC4865087 DOI: 10.1371/journal.pcbi.1004726] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/01/2022] Open

390

Wang S, Li W, Zhang R, Liu S, Xu J. CoinFold: a web server for protein contact prediction and contact-assisted protein folding. Nucleic Acids Res 2016;44:W361-6. [PMID: 27112569 PMCID: PMC4987891 DOI: 10.1093/nar/gkw307] [Citation(s) in RCA: 43] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/06/2016] [Accepted: 04/12/2016] [Indexed: 12/14/2022] Open

391

Kurczynska M, Kania E, Konopka BM, Kotulska M. Applying PyRosetta molecular energies to separate properly oriented protein models from mirror models, obtained from contact maps. J Mol Model 2016;22:111. [PMID: 27107578 PMCID: PMC4842210 DOI: 10.1007/s00894-016-2975-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/12/2015] [Accepted: 04/05/2016] [Indexed: 11/30/2022]

392

Kinjo AR. A unified statistical model of protein multiple sequence alignment integrating direct coupling and insertions. Biophys Physicobiol 2016;13:45-62. [PMID: 27924257 PMCID: PMC5042171 DOI: 10.2142/biophysico.13.0_45] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/21/2015] [Accepted: 03/18/2016] [Indexed: 12/01/2022] Open

393

Wagner JR, Lee CT, Durrant JD, Malmstrom RD, Feher VA, Amaro RE. Emerging Computational Methods for the Rational Discovery of Allosteric Drugs. Chem Rev 2016;116:6370-90. [PMID: 27074285 PMCID: PMC4901368 DOI: 10.1021/acs.chemrev.5b00631] [Citation(s) in RCA: 179] [Impact Index Per Article: 19.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/17/2022]

394

Yang J, Jin QY, Zhang B, Shen HB. R2C: improving ab initio residue contact map prediction using dynamic fusion strategy and Gaussian noise filter. ACTA ACUST UNITED AC 2016;32:2435-43. [PMID: 27153618 DOI: 10.1093/bioinformatics/btw181] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/10/2015] [Accepted: 04/03/2016] [Indexed: 11/12/2022]

Abstract

MOTIVATION

Inter-residue contacts in proteins dictate the topology of protein structures. They are crucial for protein folding and structural stability. Accurate prediction of residue contacts especially for long-range contacts is important to the quality of ab inito structure modeling since they can enforce strong restraints to structure assembly.

RESULTS

In this paper, we present a new Residue-Residue Contact predictor called R2C that combines machine learning-based and correlated mutation analysis-based methods, together with a two-dimensional Gaussian noise filter to enhance the long-range residue contact prediction. Our results show that the outputs from the machine learning-based method are concentrated with better performance on short-range contacts; while for correlated mutation analysis-based approach, the predictions are widespread with higher accuracy on long-range contacts. An effective query-driven dynamic fusion strategy proposed here takes full advantages of the two different methods, resulting in an impressive overall accuracy improvement. We also show that the contact map directly from the prediction model contains the interesting Gaussian noise, which has not been discovered before. Different from recent studies that tried to further enhance the quality of contact map by removing its transitive noise, we designed a new two-dimensional Gaussian noise filter, which was especially helpful for reinforcing the long-range residue contact prediction. Tested on recent CASP10/11 datasets, the overall top L/5 accuracy of our final R2C predictor is 17.6%/15.5% higher than the pure machine learning-based method and 7.8%/8.3% higher than the correlated mutation analysis-based approach for the long-range residue contact prediction.

AVAILABILITY AND IMPLEMENTATION

http://www.csbio.sjtu.edu.cn/bioinf/R2C/Contact:hbshen@sjtu.edu.cn

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Collapse

395

Asti L, Uguzzoni G, Marcatili P, Pagnani A. Maximum-Entropy Models of Sequenced Immune Repertoires Predict Antigen-Antibody Affinity. PLoS Comput Biol 2016;12:e1004870. [PMID: 27074145 PMCID: PMC4830580 DOI: 10.1371/journal.pcbi.1004870] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/14/2015] [Accepted: 03/15/2016] [Indexed: 11/18/2022] Open

396

Intramolecular allosteric communication in dopamine D2 receptor revealed by evolutionary amino acid covariation. Proc Natl Acad Sci U S A 2016;113:3539-44. [PMID: 26979958 DOI: 10.1073/pnas.1516579113] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/20/2022] Open

397

Bywater RP. Comparison of Algorithms for Prediction of Protein Structural Features from Evolutionary Data. PLoS One 2016;11:e0150769. [PMID: 26963911 PMCID: PMC4786192 DOI: 10.1371/journal.pone.0150769] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/26/2015] [Accepted: 02/17/2016] [Indexed: 11/18/2022] Open

398

Morrill GA, Kostellow AB, Liu L, Gupta RK, Askari A. Evolution of the α-Subunit of Na/K-ATPase from Paramecium to Homo sapiens: Invariance of Transmembrane Helix Topology. J Mol Evol 2016;82:183-98. [PMID: 26961431 PMCID: PMC4866997 DOI: 10.1007/s00239-016-9732-1] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/06/2016] [Accepted: 03/03/2016] [Indexed: 12/01/2022]

Abstract

Na/K-ATPase is a key plasma membrane enzyme involved in cell signaling, volume regulation, and maintenance of electrochemical gradients. The α-subunit, central to these functions, belongs to a large family of P-type ATPases. Differences in transmembrane (TM) helix topology, sequence homology, helix–helix contacts, cell signaling, and protein domains of Na/K-ATPase α-subunit were compared in fungi (Beauveria), unicellular organisms (Paramecia), primitive multicellular organisms (Hydra), and vertebrates (Xenopus, Homo sapiens), and correlated with evolution of physiological functions in the α-subunit. All α-subunits are of similar length, with groupings of four and six helices in the N- and C-terminal regions, respectively. Minimal homology was seen for protein domain patterns in Paramecium and Hydra, with high correlation between Hydra and vertebrates. Paramecium α-subunits display extensive disorder, with minimal helix contacts. Increases in helix contacts in Hydra approached vertebrates. Protein motifs known to be associated with membrane lipid rafts and cell signaling reveal significant positional shifts between Paramecium and Hydra vulgaris, indicating that regional membrane fluidity changes occur during evolution. Putative steroid binding sites overlapping TM-3 occurred in all species. Sites associated with G-protein-receptor stimulation occur both in vertebrates and amphibia but not in Hydra or Paramecia. The C-terminus moiety “KETYY,” necessary for the Na⁺ activation of pump phosphorylation, is not present in unicellular species indicating the absence of classical Na⁺/K⁺-pumps. The basic protein topology evolved earliest, followed by increases in protein domains and ordered helical arrays, correlated with appearance of α-subunit regions known to involve cell signaling, membrane recycling, and ion channel formation.

Collapse

399

Baker FN, Porollo A. CoeViz: a web-based tool for coevolution analysis of protein residues. BMC Bioinformatics 2016;17:119. [PMID: 26956673 PMCID: PMC4782369 DOI: 10.1186/s12859-016-0975-z] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/23/2015] [Accepted: 03/01/2016] [Indexed: 11/30/2022] Open

Abstract

Background

Proteins generally perform their function in a folded state. Residues forming an active site, whether it is a catalytic center or interaction interface, are frequently distant in a protein sequence. Hence, traditional sequence-based prediction methods focusing on a single residue (or a short window of residues) at a time may have difficulties in identifying and clustering the residues constituting a functional site, especially when a protein has multiple functions. Evolutionary information encoded in multiple sequence alignments is known to greatly improve sequence-based predictions. Identification of coevolving residues further advances the protein structure and function annotation by revealing cooperative pairs and higher order groupings of residues.

Results

We present a new web-based tool (CoeViz) that provides a versatile analysis and visualization of pairwise coevolution of amino acid residues. The tool computes three covariance metrics: mutual information, chi-square statistic, Pearson correlation, and one conservation metric: joint Shannon entropy. Implemented adjustments of covariance scores include phylogeny correction, corrections for sequence dissimilarity and alignment gaps, and the average product correction. Visualization of residue relationships is enhanced by hierarchical cluster trees, heat maps, circular diagrams, and the residue highlighting in protein sequence and 3D structure. Unlike other existing tools, CoeViz is not limited to analyzing conserved domains or protein families and can process long, unstructured and multi-domain proteins thousands of residues long. Two examples are provided to illustrate the use of the tool for identification of residues (1) involved in enzymatic function, (2) forming short linear functional motifs, and (3) constituting a structural domain.

Conclusions

CoeViz represents a practical resource for a quick sequence-based protein annotation for molecular biologists, e.g., for identifying putative functional clusters of residues and structural domains. CoeViz also can serve computational biologists as a resource of coevolution matrices, e.g., for developing machine learning-based prediction models. The presented tool is integrated in the POLYVIEW-2D server (http://polyview.cchmc.org/) and available from resulting pages of POLYVIEW-2D.

Collapse

400

Belsom A, Schneider M, Fischer L, Brock O, Rappsilber J. Serum Albumin Domain Structures in Human Blood Serum by Mass Spectrometry and Computational Biology. Mol Cell Proteomics 2016;15:1105-16. [PMID: 26385339 PMCID: PMC4813692 DOI: 10.1074/mcp.m115.048504] [Citation(s) in RCA: 73] [Impact Index Per Article: 8.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/29/2015] [Revised: 09/16/2015] [Indexed: 01/12/2023] Open