Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Shindyalov IN, Kolchanov NA, Sander C. Can three-dimensional contacts in protein structures be predicted by analysis of correlated mutations? Protein Eng 1994;7:349-58. [PMID: 8177884 DOI: 10.1093/protein/7.3.349] [Citation(s) in RCA: 184] [Impact Index Per Article: 6.1] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/29/2023]

For:	Shindyalov IN, Kolchanov NA, Sander C. Can three-dimensional contacts in protein structures be predicted by analysis of correlated mutations? Protein Eng 1994;7:349-58. [PMID: 8177884 DOI: 10.1093/protein/7.3.349] [Citation(s) in RCA: 184] [Impact Index Per Article: 6.1] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/29/2023]

Number

Cited by Other Article(s)

Kalogeropoulos K, Bohn MF, Jenkins DE, Ledergerber J, Sørensen CV, Hofmann N, Wade J, Fryer T, Thi Tuyet Nguyen G, Auf dem Keller U, Laustsen AH, Jenkins TP. A comparative study of protein structure prediction tools for challenging targets: Snake venom toxins. Toxicon 2024;238:107559. [PMID: 38113945 DOI: 10.1016/j.toxicon.2023.107559] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/11/2023] [Revised: 12/06/2023] [Accepted: 12/08/2023] [Indexed: 12/21/2023]

Rix G, Williams RL, Spinner H, Hu VJ, Marks DS, Liu CC. Continuous evolution of user-defined genes at 1-million-times the genomic mutation rate. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.11.13.566922. [PMID: 38014077 PMCID: PMC10680746 DOI: 10.1101/2023.11.13.566922] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/29/2023]

Cheng Y, Wang H, Xu H, Liu Y, Ma B, Chen X, Zeng X, Wang X, Wang B, Shiau C, Ovchinnikov S, Su XD, Wang C. Co-evolution-based prediction of metal-binding sites in proteomes by machine learning. Nat Chem Biol 2023;19:548-555. [PMID: 36593274 DOI: 10.1038/s41589-022-01223-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/11/2022] [Accepted: 11/08/2022] [Indexed: 01/03/2023]

Affiliation(s)

Yao Cheng Synthetic and Functional Biomolecules Center, Beijing National Laboratory for Molecular Sciences, Key Laboratory of Bioorganic Chemistry and Molecular Engineering of Ministry of Education, Peking University, Beijing, China Department of Chemical Biology, College of Chemistry and Molecular Engineering, Peking University, Beijing, China
Haobo Wang Synthetic and Functional Biomolecules Center, Beijing National Laboratory for Molecular Sciences, Key Laboratory of Bioorganic Chemistry and Molecular Engineering of Ministry of Education, Peking University, Beijing, China Department of Chemical Biology, College of Chemistry and Molecular Engineering, Peking University, Beijing, China
Hua Xu State Key Laboratory of Protein and Plant Gene Research, and Biomedical Pioneering Innovation Center (BIOPIC), Peking University, Beijing, China
Yuan Liu Synthetic and Functional Biomolecules Center, Beijing National Laboratory for Molecular Sciences, Key Laboratory of Bioorganic Chemistry and Molecular Engineering of Ministry of Education, Peking University, Beijing, China. Department of Chemical Biology, College of Chemistry and Molecular Engineering, Peking University, Beijing, China.
Bin Ma Synthetic and Functional Biomolecules Center, Beijing National Laboratory for Molecular Sciences, Key Laboratory of Bioorganic Chemistry and Molecular Engineering of Ministry of Education, Peking University, Beijing, China Department of Chemical Biology, College of Chemistry and Molecular Engineering, Peking University, Beijing, China
Xuemin Chen Synthetic and Functional Biomolecules Center, Beijing National Laboratory for Molecular Sciences, Key Laboratory of Bioorganic Chemistry and Molecular Engineering of Ministry of Education, Peking University, Beijing, China Department of Chemical Biology, College of Chemistry and Molecular Engineering, Peking University, Beijing, China
Xin Zeng Peking-Tsinghua Center for Life Sciences, Peking University, Beijing, China
Xianghe Wang Synthetic and Functional Biomolecules Center, Beijing National Laboratory for Molecular Sciences, Key Laboratory of Bioorganic Chemistry and Molecular Engineering of Ministry of Education, Peking University, Beijing, China Department of Chemical Biology, College of Chemistry and Molecular Engineering, Peking University, Beijing, China
Bo Wang State Key Laboratory of Protein and Plant Gene Research, and Biomedical Pioneering Innovation Center (BIOPIC), Peking University, Beijing, China
Carina Shiau Cornell University, Ithaca, NY, USA
Sergey Ovchinnikov John Harvard Distinguished Science Fellow, Harvard University, Cambridge, MA, USA
Xiao-Dong Su State Key Laboratory of Protein and Plant Gene Research, and Biomedical Pioneering Innovation Center (BIOPIC), Peking University, Beijing, China.
Chu Wang Synthetic and Functional Biomolecules Center, Beijing National Laboratory for Molecular Sciences, Key Laboratory of Bioorganic Chemistry and Molecular Engineering of Ministry of Education, Peking University, Beijing, China. Department of Chemical Biology, College of Chemistry and Molecular Engineering, Peking University, Beijing, China. Peking-Tsinghua Center for Life Sciences, Peking University, Beijing, China.

Collapse

Rappoport D, Jinich A. Enzyme Substrate Prediction from Three-Dimensional Feature Representations Using Space-Filling Curves. J Chem Inf Model 2023;63:1637-1648. [PMID: 36802628 DOI: 10.1021/acs.jcim.3c00005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/22/2023]

Niitsu A, Sugita Y. Towards de novo design of transmembrane α-helical assemblies using structural modelling and molecular dynamics simulation. Phys Chem Chem Phys 2023;25:3595-3606. [PMID: 36647771 DOI: 10.1039/d2cp03972a] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/18/2023]

Kennedy EN, Foster CA, Barr SA, Bourret RB. General strategies for using amino acid sequence data to guide biochemical investigation of protein function. Biochem Soc Trans 2022;50:1847-1858. [PMID: 36416676 PMCID: PMC10257402 DOI: 10.1042/bst20220849] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/18/2022] [Revised: 11/04/2022] [Accepted: 11/09/2022] [Indexed: 11/24/2022]

Abstract

The rapid increase of '-omics' data warrants the reconsideration of experimental strategies to investigate general protein function. Studying individual members of a protein family is likely insufficient to provide a complete mechanistic understanding of family functions, especially for diverse families with thousands of known members. Strategies that exploit large amounts of available amino acid sequence data can inspire and guide biochemical experiments, generating broadly applicable insights into a given family. Here we review several methods that utilize abundant sequence data to focus experimental efforts and identify features truly representative of a protein family or domain. First, coevolutionary relationships between residues within primary sequences can be successfully exploited to identify structurally and/or functionally important positions for experimental investigation. Second, functionally important variable residue positions typically occupy a limited sequence space, a property useful for guiding biochemical characterization of the effects of the most physiologically and evolutionarily relevant amino acids. Third, amino acid sequence variation within domains shared between different protein families can be used to sort a particular domain into multiple subtypes, inspiring further experimental designs. Although generally applicable to any kind of protein domain because they depend solely on amino acid sequences, the second and third approaches are reviewed in detail because they appear to have been used infrequently and offer immediate opportunities for new advances. Finally, we speculate that future technologies capable of analyzing and manipulating conserved and variable aspects of the three-dimensional structures of a protein family could lead to broad insights not attainable by current methods.

Collapse

Liu H, Chen Q. Computational protein design with data‐driven approaches: Recent developments and perspectives. WIRES COMPUTATIONAL MOLECULAR SCIENCE 2022. [DOI: 10.1002/wcms.1646] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]

Ahmed S, Chattopadhyay G, Manjunath K, Bhasin M, Singh N, Rasool M, Das S, Rana V, Khan N, Mitra D, Asok A, Singh R, Varadarajan R. Combining cysteine scanning with chemical labeling to map protein-protein interactions and infer bound structure in an intrinsically disordered region. Front Mol Biosci 2022;9:997653. [PMID: 36275627 PMCID: PMC9585320 DOI: 10.3389/fmolb.2022.997653] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/19/2022] [Accepted: 09/12/2022] [Indexed: 11/13/2022] Open

Chattopadhyay G, Bhowmick J, Manjunath K, Ahmed S, Goyal P, Varadarajan R. Mechanistic insights into global suppressors of protein folding defects. PLoS Genet 2022;18:e1010334. [PMID: 36037221 PMCID: PMC9491731 DOI: 10.1371/journal.pgen.1010334] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/16/2022] [Revised: 09/09/2022] [Accepted: 07/11/2022] [Indexed: 01/14/2023] Open

Ward KM, Pickett BD, Ebbert MTW, Kauwe JSK, Miller JB. Web-Based Protein Interactions Calculator Identifies Likely Proteome Coevolution with Alzheimer’s Disease-Associated Proteins. Genes (Basel) 2022;13:genes13081346. [PMID: 36011253 PMCID: PMC9407263 DOI: 10.3390/genes13081346] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/09/2022] [Revised: 07/22/2022] [Accepted: 07/23/2022] [Indexed: 11/19/2022] Open

Braberg H, Echeverria I, Kaake RM, Sali A, Krogan NJ. From systems to structure - using genetic data to model protein structures. Nat Rev Genet 2022;23:342-354. [PMID: 35013567 PMCID: PMC8744059 DOI: 10.1038/s41576-021-00441-w] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 11/25/2021] [Indexed: 12/11/2022]

Gröhs Ferrareze PA, Zimerman RA, Franceschi VB, Caldana GD, Netz PA, Thompson CE. Molecular evolution and structural analyses of the spike glycoprotein from Brazilian SARS-CoV-2 genomes: the impact of selected mutations. J Biomol Struct Dyn 2022;41:3110-3128. [PMID: 35594172 DOI: 10.1080/07391102.2022.2076154] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/18/2022]

Miyazawa S. Boltzmann Machine Learning and Regularization Methods for Inferring Evolutionary Fields and Couplings From a Multiple Sequence Alignment. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022;19:328-342. [PMID: 32396099 DOI: 10.1109/tcbb.2020.2993232] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]

Behdenna A, Godfroid M, Petot P, Pothier J, Lambert A, Achaz G. A minimal yet flexible likelihood framework to assess correlated evolution. Syst Biol 2021;71:823-838. [PMID: 34792608 DOI: 10.1093/sysbio/syab092] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/17/2020] [Revised: 11/04/2021] [Accepted: 11/09/2021] [Indexed: 11/14/2022] Open

Affiliation(s)

Abdelkader Behdenna Institut de Systématique, Évolution, Biodiversité (ISYEB), Muséum National d'Histoire Naturelle, CNRS UMR 7205, Sorbonne Université, École Pratique des Hautes Études, Université des Antilles, 45 rue Buffon, 75005 Paris, France SMILE Group, Center for Interdisciplinary Research in Biology (CIRB), Collège de France, CNRS, INSERM, Université PSL, 11, place Marcellin Berthelot, 75005 Paris, France Epigene Labs, 7 Square Gabriel Fauré, 75017 Paris, France
Maxime Godfroid SMILE Group, Center for Interdisciplinary Research in Biology (CIRB), Collège de France, CNRS, INSERM, Université PSL, 11, place Marcellin Berthelot, 75005 Paris, France
Patrice Petot Institut de Systématique, Évolution, Biodiversité (ISYEB), Muséum National d'Histoire Naturelle, CNRS UMR 7205, Sorbonne Université, École Pratique des Hautes Études, Université des Antilles, 45 rue Buffon, 75005 Paris, France SMILE Group, Center for Interdisciplinary Research in Biology (CIRB), Collège de France, CNRS, INSERM, Université PSL, 11, place Marcellin Berthelot, 75005 Paris, France
Joël Pothier Institut de Systématique, Évolution, Biodiversité (ISYEB), Muséum National d'Histoire Naturelle, CNRS UMR 7205, Sorbonne Université, École Pratique des Hautes Études, Université des Antilles, 45 rue Buffon, 75005 Paris, France
Amaury Lambert SMILE Group, Center for Interdisciplinary Research in Biology (CIRB), Collège de France, CNRS, INSERM, Université PSL, 11, place Marcellin Berthelot, 75005 Paris, France Laboratoire de Probabilités, Statistique et Modélisation (LPSM), Sorbonne Université, CNRS UMR 8001, Université de Paris, 4, place Jussieu, 75005 Paris, France
Guillaume Achaz Institut de Systématique, Évolution, Biodiversité (ISYEB), Muséum National d'Histoire Naturelle, CNRS UMR 7205, Sorbonne Université, École Pratique des Hautes Études, Université des Antilles, 45 rue Buffon, 75005 Paris, France SMILE Group, Center for Interdisciplinary Research in Biology (CIRB), Collège de France, CNRS, INSERM, Université PSL, 11, place Marcellin Berthelot, 75005 Paris, France Éco-anthropologie, Muséum National d'Histoire Naturelle, CNRS UMR 7206, Université de Paris, place du Trocadéro, 75016 Paris, France

Collapse

Basu S, Bahadur RP. Conservation and coevolution determine evolvability of different classes of disordered residues in human intrinsically disordered proteins. Proteins 2021;90:632-644. [PMID: 34626492 DOI: 10.1002/prot.26261] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/06/2021] [Revised: 10/07/2021] [Accepted: 10/07/2021] [Indexed: 12/19/2022]

Sanchez-Pulido L, Ponting CP. Extending the Horizon of Homology Detection with Coevolution-based Structure Prediction. J Mol Biol 2021;433:167106. [PMID: 34139218 PMCID: PMC8527833 DOI: 10.1016/j.jmb.2021.167106] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/28/2021] [Revised: 06/09/2021] [Accepted: 06/09/2021] [Indexed: 12/12/2022]

Abstract

Traditional sequence analysis algorithms fail to identify distant homologies when they lie beyond a detection horizon. In this review, we discuss how co-evolution-based contact and distance prediction methods are pushing back this homology detection horizon, thereby yielding new functional insights and experimentally testable hypotheses. Based on correlated substitutions, these methods divine three-dimensional constraints among amino acids in protein sequences that were previously devoid of all annotated domains and repeats. The new algorithms discern hidden structure in an otherwise featureless sequence landscape. Their revelatory impact promises to be as profound as the use, by archaeologists, of ground-penetrating radar to discern long-hidden, subterranean structures. As examples of this, we describe how triplicated structures reflecting longin domains in MON1A-like proteins, or UVR-like repeats in DISC1, emerge from their predicted contact and distance maps. These methods also help to resolve structures that do not conform to a "beads-on-a-string" model of protein domains. In one such example, we describe CFAP298 whose ubiquitin-like domain was previously challenging to perceive owing to a large sequence insertion within it. More generally, the new algorithms permit an easier appreciation of domain families and folds whose evolution involved structural insertion or rearrangement. As we exemplify with α1-antitrypsin, coevolution-based predicted contacts may also yield insights into protein dynamics and conformational change. This new combination of structure prediction (using innovative co-evolution based methods) and homology inference (using more traditional sequence analysis approaches) shows great promise for bringing into view a sea of evolutionary relationships that had hitherto lain far beyond the horizon of homology detection.

Collapse

Ruff KM, Pappu RV. AlphaFold and Implications for Intrinsically Disordered Proteins. J Mol Biol 2021;433:167208. [PMID: 34418423 DOI: 10.1016/j.jmb.2021.167208] [Citation(s) in RCA: 199] [Impact Index Per Article: 66.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/02/2021] [Revised: 08/11/2021] [Accepted: 08/12/2021] [Indexed: 10/20/2022]

Jumper J, Evans R, Pritzel A, Green T, Figurnov M, Ronneberger O, Tunyasuvunakool K, Bates R, Žídek A, Potapenko A, Bridgland A, Meyer C, Kohl SAA, Ballard AJ, Cowie A, Romera-Paredes B, Nikolov S, Jain R, Adler J, Back T, Petersen S, Reiman D, Clancy E, Zielinski M, Steinegger M, Pacholska M, Berghammer T, Bodenstein S, Silver D, Vinyals O, Senior AW, Kavukcuoglu K, Kohli P, Hassabis D. Highly accurate protein structure prediction with AlphaFold. Nature 2021;596:583-589. [PMID: 34265844 PMCID: PMC8371605 DOI: 10.1038/s41586-021-03819-2] [Citation(s) in RCA: 13645] [Impact Index Per Article: 4548.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/11/2021] [Accepted: 07/12/2021] [Indexed: 02/07/2023]

Dokholyan NV. Experimentally-driven protein structure modeling. J Proteomics 2020;220:103777. [PMID: 32268219 PMCID: PMC7214187 DOI: 10.1016/j.jprot.2020.103777] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/18/2019] [Revised: 03/17/2020] [Accepted: 04/02/2020] [Indexed: 11/25/2022]

Steiner MC, Gibson KM, Crandall KA. Drug Resistance Prediction Using Deep Learning Techniques on HIV-1 Sequence Data. Viruses 2020;12:E560. [PMID: 32438586 PMCID: PMC7290575 DOI: 10.3390/v12050560] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/22/2020] [Revised: 05/08/2020] [Accepted: 05/17/2020] [Indexed: 12/20/2022] Open

Zhang GJ, Ma LF, Wang XQ, Zhou XG. Secondary Structure and Contact Guided Differential Evolution for Protein Structure Prediction. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2020;17:1068-1081. [PMID: 30295627 DOI: 10.1109/tcbb.2018.2873691] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]

Li Y, Zhang C, Bell EW, Yu DJ, Zhang Y. Ensembling multiple raw coevolutionary features with deep residual neural networks for contact-map prediction in CASP13. Proteins 2019;87:1082-1091. [PMID: 31407406 PMCID: PMC6851483 DOI: 10.1002/prot.25798] [Citation(s) in RCA: 85] [Impact Index Per Article: 17.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/17/2019] [Revised: 07/20/2019] [Accepted: 08/08/2019] [Indexed: 12/26/2022]

Wang X, Jing X, Deng Y, Nie Y, Xu F, Xu Y, Zhao YL, Hunt JF, Montelione GT, Szyperski T. Evolutionary coupling saturation mutagenesis: Coevolution-guided identification of distant sites influencing Bacillus naganoensis pullulanase activity. FEBS Lett 2019;594:799-812. [PMID: 31665817 DOI: 10.1002/1873-3468.13652] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/15/2019] [Revised: 10/15/2019] [Accepted: 10/25/2019] [Indexed: 01/20/2023]

Zhang H, Zhang Q, Ju F, Zhu J, Gao Y, Xie Z, Deng M, Sun S, Zheng WM, Bu D. Predicting protein inter-residue contacts using composite likelihood maximization and deep learning. BMC Bioinformatics 2019;20:537. [PMID: 31664895 PMCID: PMC6821021 DOI: 10.1186/s12859-019-3051-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/14/2019] [Accepted: 08/22/2019] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

Accurate prediction of inter-residue contacts of a protein is important to calculating its tertiary structure. Analysis of co-evolutionary events among residues has been proved effective in inferring inter-residue contacts. The Markov random field (MRF) technique, although being widely used for contact prediction, suffers from the following dilemma: the actual likelihood function of MRF is accurate but time-consuming to calculate; in contrast, approximations to the actual likelihood, say pseudo-likelihood, are efficient to calculate but inaccurate. Thus, how to achieve both accuracy and efficiency simultaneously remains a challenge.

RESULTS

In this study, we present such an approach (called clmDCA) for contact prediction. Unlike plmDCA using pseudo-likelihood, i.e., the product of conditional probability of individual residues, our approach uses composite-likelihood, i.e., the product of conditional probability of all residue pairs. Composite likelihood has been theoretically proved as a better approximation to the actual likelihood function than pseudo-likelihood. Meanwhile, composite likelihood is still efficient to maximize, thus ensuring the efficiency of clmDCA. We present comprehensive experiments on popular benchmark datasets, including PSICOV dataset and CASP-11 dataset, to show that: i) clmDCA alone outperforms the existing MRF-based approaches in prediction accuracy. ii) When equipped with deep learning technique for refinement, the prediction accuracy of clmDCA was further significantly improved, suggesting the suitability of clmDCA for subsequent refinement procedure. We further present a successful application of the predicted contacts to accurately build tertiary structures for proteins in the PSICOV dataset.

CONCLUSIONS

Composite likelihood maximization algorithm can efficiently estimate the parameters of Markov Random Fields and can improve the prediction accuracy of protein inter-residue contacts.

Collapse

Shrestha R, Fajardo E, Gil N, Fidelis K, Kryshtafovych A, Monastyrskyy B, Fiser A. Assessing the accuracy of contact predictions in CASP13. Proteins 2019;87:1058-1068. [PMID: 31587357 DOI: 10.1002/prot.25819] [Citation(s) in RCA: 48] [Impact Index Per Article: 9.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/25/2019] [Revised: 09/17/2019] [Accepted: 09/17/2019] [Indexed: 01/07/2023]

Wozniak PP, Pelc J, Skrzypecki M, Vriend G, Kotulska M. Bio-knowledge-based filters improve residue-residue contact prediction accuracy. Bioinformatics 2019;34:3675-3683. [PMID: 29850768 DOI: 10.1093/bioinformatics/bty416] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/17/2017] [Accepted: 05/19/2018] [Indexed: 11/13/2022] Open

Hockenberry AJ, Wilke CO. Evolutionary couplings detect side-chain interactions. PeerJ 2019;7:e7280. [PMID: 31328041 PMCID: PMC6622159 DOI: 10.7717/peerj.7280] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/14/2019] [Accepted: 06/09/2019] [Indexed: 12/19/2022] Open

Haldane A, Flynn WF, He P, Levy RM. Coevolutionary Landscape of Kinase Family Proteins: Sequence Probabilities and Functional Motifs. Biophys J 2019;114:21-31. [PMID: 29320688 DOI: 10.1016/j.bpj.2017.10.028] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/20/2017] [Revised: 09/11/2017] [Accepted: 10/17/2017] [Indexed: 01/25/2023] Open

Detecting Amino Acid Coevolution with Bayesian Graphical Models. Methods Mol Biol 2019;1851:105-122. [PMID: 30298394 DOI: 10.1007/978-1-4939-8736-8_6] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/06/2023]

Numerical Encodings of Amino Acids in Multivariate Gaussian Modeling of Protein Multiple Sequence Alignments. Molecules 2018;24:molecules24010104. [PMID: 30597916 PMCID: PMC6337344 DOI: 10.3390/molecules24010104] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/02/2018] [Revised: 12/21/2018] [Accepted: 12/24/2018] [Indexed: 11/17/2022] Open

Huang YJ, Brock KP, Ishida Y, Swapna GVT, Inouye M, Marks DS, Sander C, Montelione GT. Combining Evolutionary Covariance and NMR Data for Protein Structure Determination. Methods Enzymol 2018;614:363-392. [PMID: 30611430 PMCID: PMC6640129 DOI: 10.1016/bs.mie.2018.11.004] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/20/2022]

Affiliation(s)

Yuanpeng Janet Huang Center for Advanced Biotechnology and Medicine, Rutgers, The State University of New Jersey, Piscataway, NJ, United States; Department of Molecular Biology and Biochemistry, Rutgers, The State University of New Jersey, Piscataway, NJ, United States
Kelly P Brock Department of Systems Biology, Harvard Medical School, Boston, MA, United States
Yojiro Ishida Center for Advanced Biotechnology and Medicine, Rutgers, The State University of New Jersey, Piscataway, NJ, United States; Department of Biochemistry and Molecular Biology, Robert Wood Johnson Medical School, Rutgers, The State University of New Jersey, Piscataway, NJ, United States
Gurla V T Swapna Center for Advanced Biotechnology and Medicine, Rutgers, The State University of New Jersey, Piscataway, NJ, United States; Department of Molecular Biology and Biochemistry, Rutgers, The State University of New Jersey, Piscataway, NJ, United States
Masayori Inouye Center for Advanced Biotechnology and Medicine, Rutgers, The State University of New Jersey, Piscataway, NJ, United States; Department of Biochemistry and Molecular Biology, Robert Wood Johnson Medical School, Rutgers, The State University of New Jersey, Piscataway, NJ, United States
Debora S Marks Department of Systems Biology, Harvard Medical School, Boston, MA, United States
Chris Sander Department of Cell Biology, Harvard Medical School and cBio Center, Dana-Farber Cancer Institute, Boston, MA, United States
Gaetano T Montelione Center for Advanced Biotechnology and Medicine, Rutgers, The State University of New Jersey, Piscataway, NJ, United States; Department of Molecular Biology and Biochemistry, Rutgers, The State University of New Jersey, Piscataway, NJ, United States; Department of Biochemistry and Molecular Biology, Robert Wood Johnson Medical School, Rutgers, The State University of New Jersey, Piscataway, NJ, United States.

Collapse

Vorberg S, Seemayer S, Söding J. Synthetic protein alignments by CCMgen quantify noise in residue-residue contact prediction. PLoS Comput Biol 2018;14:e1006526. [PMID: 30395601 PMCID: PMC6237422 DOI: 10.1371/journal.pcbi.1006526] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/27/2018] [Revised: 11/15/2018] [Accepted: 09/24/2018] [Indexed: 12/01/2022] Open

Abstract

Compensatory mutations between protein residues in physical contact can manifest themselves as statistical couplings between the corresponding columns in a multiple sequence alignment (MSA) of the protein family. Conversely, large coupling coefficients predict residue contacts. Methods for de-novo protein structure prediction based on this approach are becoming increasingly reliable. Their main limitation is the strong systematic and statistical noise in the estimation of coupling coefficients, which has so far limited their application to very large protein families. While most research has focused on improving predictions by adding external information, little progress has been made to improve the statistical procedure at the core, because our lack of understanding of the sources of noise poses a major obstacle. First, we show theoretically that the expectation value of the coupling score assuming no coupling is proportional to the product of the square roots of the column entropies, and we propose a simple entropy bias correction (EntC) that subtracts out this expectation value. Second, we show that the average product correction (APC) includes the correction of the entropy bias, partly explaining its success. Third, we have developed CCMgen, the first method for simulating protein evolution and generating realistic synthetic MSAs with pairwise statistical residue couplings. Fourth, to learn exact statistical models that reliably reproduce observed alignment statistics, we developed CCMpredPy, an implementation of the persistent contrastive divergence (PCD) method for exact inference. Fifth, we demonstrate how CCMgen and CCMpredPy can facilitate the development of contact prediction methods by analysing the systematic noise contributions from phylogeny and entropy. Using the entropy bias correction, we can disentangle both sources of noise and find that entropy contributes roughly twice as much noise as phylogeny.

Knowledge about the three-dimensional structure of proteins is key to understanding their function and role in biological processes and diseases. The experimental structure determination techniques, such as X-ray crystallography or electron cryo-microscopy, are labour intensive, time-consuming and expensive. Therefore, complementary computational methods to predict a protein’s structure have become indispensable. Over the last years, immense progress has been made in predicting protein structures from their amino acid sequence by utilizing highly accurate predictions of spatial contacts between amino acid residues as constraints in folding simulations. However, contact prediction methods require large numbers of homologous protein sequences in order to discriminate between signal and noise. A major obstacle preventing progress on the statistical methodology is our limited understanding of the different components of noise that are known to affect the predictions. We provide two tools, CCMpredPy and CCMgen, that can be used to learn highly accurate statistical models for contact prediction and to simulate protein evolution according to the statistical constraints between positions of residues as specified by these models, respectively. We showcase their usefulness by quantifying the relative contribution of noise arising from entropy and phylogeny on the predicted contacts, which will facilitate the improvement of the statistical methodology.

Collapse

Endutkin AV, Koptelov SS, Popov AV, Torgasheva NA, Lomzov AA, Tsygankova AR, Skiba TV, Afonnikov DA, Zharkov DO. Residue coevolution reveals functionally important intramolecular interactions in formamidopyrimidine-DNA glycosylase. DNA Repair (Amst) 2018;69:24-33. [DOI: 10.1016/j.dnarep.2018.07.004] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/19/2018] [Revised: 07/04/2018] [Accepted: 07/04/2018] [Indexed: 10/28/2022]

Delarue M, Koehl P. Combined approaches from physics, statistics, and computer science for ab initio protein structure prediction: ex unitate vires (unity is strength)? F1000Res 2018;7. [PMID: 30079234 PMCID: PMC6058471 DOI: 10.12688/f1000research.14870.1] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Accepted: 07/19/2018] [Indexed: 11/20/2022] Open

Baldi P. Deep Learning in Biomedical Data Science. Annu Rev Biomed Data Sci 2018. [DOI: 10.1146/annurev-biodatasci-080917-013343] [Citation(s) in RCA: 53] [Impact Index Per Article: 8.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]

Holland J, Pan Q, Grigoryan G. Contact prediction is hardest for the most informative contacts, but improves with the incorporation of contact potentials. PLoS One 2018;13:e0199585. [PMID: 29953468 PMCID: PMC6023208 DOI: 10.1371/journal.pone.0199585] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/21/2017] [Accepted: 06/11/2018] [Indexed: 11/18/2022] Open

He B, Mortuza SM, Wang Y, Shen HB, Zhang Y. NeBcon: protein contact map prediction using neural network training coupled with naïve Bayes classifiers. Bioinformatics 2018;33:2296-2306. [PMID: 28369334 DOI: 10.1093/bioinformatics/btx164] [Citation(s) in RCA: 53] [Impact Index Per Article: 8.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/14/2016] [Accepted: 03/21/2017] [Indexed: 12/12/2022] Open

Abstract

Motivation

Recent CASP experiments have witnessed exciting progress on folding large-size non-humongous proteins with the assistance of co-evolution based contact predictions. The success is however anecdotal due to the requirement of the contact prediction methods for the high volume of sequence homologs that are not available to most of the non-humongous protein targets. Development of efficient methods that can generate balanced and reliable contact maps for different type of protein targets is essential to enhance the success rate of the ab initio protein structure prediction.

Results

We developed a new pipeline, NeBcon, which uses the naïve Bayes classifier (NBC) theorem to combine eight state of the art contact methods that are built from co-evolution and machine learning approaches. The posterior probabilities of the NBC model are then trained with intrinsic structural features through neural network learning for the final contact map prediction. NeBcon was tested on 98 non-redundant proteins, which improves the accuracy of the best co-evolution based meta-server predictor by 22%; the magnitude of the improvement increases to 45% for the hard targets that lack sequence and structural homologs in the databases. Detailed data analysis showed that the major contribution to the improvement is due to the optimized NBC combination of the complementary information from both co-evolution and machine learning predictions. The neural network training also helps to improve the coupling of the NBC posterior probability and the intrinsic structural features, which were found particularly important for the proteins that do not have sufficient number of homologous sequences to derive reliable co-evolution profiles.

Availiablity and Implementation

On-line server and standalone package of the program are available at http://zhanglab.ccmb.med.umich.edu/NeBcon/ .

Contact

zhng@umich.edu.

Supplementary information

Supplementary data are available at Bioinformatics online.

Collapse

Schaarschmidt J, Monastyrskyy B, Kryshtafovych A, Bonvin AM. Assessment of contact predictions in CASP12: Co-evolution and deep learning coming of age. Proteins 2018;86 Suppl 1:51-66. [PMID: 29071738 PMCID: PMC5820169 DOI: 10.1002/prot.25407] [Citation(s) in RCA: 125] [Impact Index Per Article: 20.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/14/2017] [Revised: 10/06/2017] [Accepted: 10/24/2017] [Indexed: 12/20/2022]

dos Santos RN, Ferrari AJR, de Jesus HCR, Gozzo FC, Morcos F, Martínez L. Enhancing protein fold determination by exploring the complementary information of chemical cross-linking and coevolutionary signals. Bioinformatics 2018;34:2201-2208. [DOI: 10.1093/bioinformatics/bty074] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/19/2017] [Accepted: 02/10/2018] [Indexed: 11/13/2022] Open

Power law tails in phylogenetic systems. Proc Natl Acad Sci U S A 2018;115:690-695. [PMID: 29311320 DOI: 10.1073/pnas.1711913115] [Citation(s) in RCA: 32] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/25/2023] Open

Prediction of Structures and Interactions from Genome Information. ADVANCES IN EXPERIMENTAL MEDICINE AND BIOLOGY 2018;1105:123-152. [DOI: 10.1007/978-981-13-2200-6_9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 10/27/2022]

Huang YJ, Brock KP, Sander C, Marks DS, Montelione GT. A Hybrid Approach for Protein Structure Determination Combining Sparse NMR with Evolutionary Coupling Sequence Data. ADVANCES IN EXPERIMENTAL MEDICINE AND BIOLOGY 2018;1105:153-169. [PMID: 30617828 DOI: 10.1007/978-981-13-2200-6_10] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/04/2022]

Computational studies of membrane proteins: from sequence to structure to simulation. Curr Opin Struct Biol 2017;45:133-141. [DOI: 10.1016/j.sbi.2017.04.004] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/22/2017] [Revised: 04/07/2017] [Accepted: 04/07/2017] [Indexed: 11/19/2022]

de Oliveira S, Deane C. Co-evolution techniques are reshaping the way we do structural bioinformatics. F1000Res 2017;6:1224. [PMID: 28781768 PMCID: PMC5531156 DOI: 10.12688/f1000research.11543.1] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Accepted: 07/24/2017] [Indexed: 11/20/2022] Open

Nshogozabahizi JC, Dench J, Aris-Brosou S. Widespread Historical Contingency in Influenza Viruses. Genetics 2017;205:409-420. [PMID: 28049709 PMCID: PMC5223518 DOI: 10.1534/genetics.116.193979] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/18/2016] [Accepted: 11/04/2016] [Indexed: 11/18/2022] Open

Cheng RR, Nordesjö O, Hayes RL, Levine H, Flores SC, Onuchic JN, Morcos F. Connecting the Sequence-Space of Bacterial Signaling Proteins to Phenotypes Using Coevolutionary Landscapes. Mol Biol Evol 2016;33:3054-3064. [PMID: 27604223 PMCID: PMC5100047 DOI: 10.1093/molbev/msw188] [Citation(s) in RCA: 46] [Impact Index Per Article: 5.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/08/2023] Open

Monastyrskyy B, D'Andrea D, Fidelis K, Tramontano A, Kryshtafovych A. New encouraging developments in contact prediction: Assessment of the CASP11 results. Proteins 2016;84 Suppl 1:131-44. [PMID: 26474083 PMCID: PMC4834069 DOI: 10.1002/prot.24943] [Citation(s) in RCA: 69] [Impact Index Per Article: 8.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/18/2015] [Revised: 09/15/2015] [Accepted: 10/11/2015] [Indexed: 12/27/2022]

Li Q, Dahl DB, Vannucci M, Joo H, Tsai JW. KScons: a Bayesian approach for protein residue contact prediction using the knob-socket model of protein tertiary structure. Bioinformatics 2016;32:3774-3781. [PMID: 27559156 DOI: 10.1093/bioinformatics/btw553] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/01/2016] [Revised: 07/15/2016] [Accepted: 08/18/2016] [Indexed: 11/13/2022] Open

Abstract

MOTIVATION

By simplifying the many-bodied complexity of residue packing into patterns of simple pairwise secondary structure interactions between a single knob residue with a three-residue socket, the knob-socket construct allows a more direct incorporation of structural information into the prediction of residue contacts. By modeling the preferences between the amino acid composition of a socket and knob, we undertake an investigation of the knob-socket construct's ability to improve the prediction of residue contacts. The statistical model considers three priors and two posterior estimations to better understand how the input data affects predictions. This produces six implementations of KScons that are tested on three sets: PSICOV, CASP10 and CASP11. We compare against the current leading contact prediction methods.

RESULTS

The results demonstrate the usefulness as well as the limits of knob-socket based structural modeling of protein contacts. The construct is able to extract good predictions from known structural homologs, while its performance degrades when no homologs exist. Among our six implementations, KScons MST-MP (which uses the multiple structure alignment prior and marginal posterior incorporating structural homolog information) performs the best in all three prediction sets. An analysis of recall and precision finds that KScons MST-MP improves accuracy not only by improving identification of true positives, but also by decreasing the number of false positives. Over the CASP10 and CASP11 sets, KScons MST-MP performs better than the leading methods using only evolutionary coupling data, but not quite as well as the supervised learning methods of MetaPSICOV and CoinDCA-NN that incorporate a large set of structural features.

CONTACT

qiwei.li@rice.eduSupplementary information: Supplementary data are available at Bioinformatics online.

Collapse

Franceus J, Verhaeghe T, Desmet T. Correlated positions in protein evolution and engineering. J Ind Microbiol Biotechnol 2016;44:687-695. [PMID: 27514664 DOI: 10.1007/s10295-016-1811-1] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/30/2016] [Accepted: 07/30/2016] [Indexed: 12/22/2022]

Joy JB, Liang RH, McCloskey RM, Nguyen T, Poon AFY. Ancestral Reconstruction. PLoS Comput Biol 2016;12:e1004763. [PMID: 27404731 PMCID: PMC4942178 DOI: 10.1371/journal.pcbi.1004763] [Citation(s) in RCA: 91] [Impact Index Per Article: 11.4] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2022] Open