Reference Citation Analysis

For: Ma J, Wang S, Wang Z, Xu J. Protein contact prediction by integrating joint evolutionary coupling analysis and supervised learning. Bioinformatics 2015;31:3506-13. [PMID: 26275894 DOI: 10.1093/bioinformatics/btv472] [Citation(s) in RCA: 80] [Impact Index Per Article: 8.0] [Received: 12/18/2014] [Accepted: 08/08/2015] [Indexed: 02/07/2023]

Cited by Other Article(s)

Vajdi A, Zarringhalam K, Haspel N. Patch-DCA: improved protein interface prediction by utilizing structural information and clustering DCA scores. Bioinformatics 2019;36:1460-1467. [DOI: 10.1093/bioinformatics/btz791] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 06/03/2019] [Revised: 09/30/2019] [Accepted: 10/15/2019] [Indexed: 01/07/2023]

Jian Y, Wang X, Qiu J, Wang H, Liu Z, Zhao Y, Zeng C. DIRECT: RNA contact predictions by integrating structural patterns. BMC Bioinformatics 2019;20:497. [PMID: 31615418 PMCID: PMC6794908 DOI: 10.1186/s12859-019-3099-4] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 03/08/2019] [Accepted: 09/13/2019] [Indexed: 01/25/2023]

Distance-based protein folding powered by deep learning. Proc Natl Acad Sci U S A 2019;116:16856-16865. [PMID: 31399549 DOI: 10.1073/pnas.1821309116] [Citation(s) in RCA: 243] [Impact Index Per Article: 40.5] [Indexed: 11/18/2022]

Zeng H, Wang S, Zhou T, Zhao F, Li X, Wu Q, Xu J. ComplexContact: a web server for inter-protein contact prediction using deep learning. Nucleic Acids Res 2018;46:W432-W437. [PMID: 29790960 PMCID: PMC6030867 DOI: 10.1093/nar/gky420] [Citation(s) in RCA: 89] [Impact Index Per Article: 14.8] [Received: 02/13/2018] [Accepted: 05/20/2018] [Indexed: 12/15/2022]

Vogan AA, Ament-Velásquez SL, Granger-Farbos A, Svedberg J, Bastiaans E, Debets AJ, Coustou V, Yvanne H, Clavé C, Saupe SJ, Johannesson H. Combinations of Spok genes create multiple meiotic drivers in Podospora. eLife 2019;8:46454. [PMID: 31347500 PMCID: PMC6660238 DOI: 10.7554/elife.46454] [Citation(s) in RCA: 47] [Impact Index Per Article: 7.8] [Received: 02/27/2019] [Accepted: 06/09/2019] [Indexed: 11/13/2022]

Abstract

Meiotic drive is the preferential transmission of a particular allele during sexual reproduction. The phenomenon is observed as spore killing in multiple fungi. In natural populations of Podospora anserina, seven spore killer types (Psks) have been identified through classical genetic analyses. Here we show that the Spok gene family underlies the Psks. The combination of Spok genes at different chromosomal locations defines the spore killer types and creates a killing hierarchy within a population. We identify two novel Spok homologs located within a large (74–167 kbp) region (the Spok block) that resides in different chromosomal locations in different strains. We confirm that the SPOK protein performs both killing and resistance functions and show that these activities are dependent on distinct domains, a predicted nuclease and kinase domain. Genomic and phylogenetic analyses across ascomycetes suggest that the Spok genes disperse through cross-species transfer, and evolve by duplication and diversification within lineages.

In many organisms, most cells carry two versions of a given gene, one coming from the mother and the other from the father. An exception is sexual cells such as eggs, sperm, pollen or spores, which should only contain one variant of a gene. During their formation, these cells usually have an equal chance of inheriting one of the two gene versions.

However, a certain class of gene variants called meiotic drivers can cheat this process and end up in more than half of the sexual cells; often, the cells that contain the drivers can kill sibling cells that do not carry these variants. This results in the selfish genetic elements spreading through populations at a higher rate, sometimes with severe consequences such as shifting the ratio of males to females.

Meiotic drivers have been discovered in a wide range of organisms, from corn to mice to fruit flies and bread mold. They also exist in the fungus Podospora anserina, where they are called ‘spore killers’. Fungi are often used to study complex genetic processes, yet the identity and mode of action of spore killers in P. anserina were still unknown.

Vogan, Ament-Velásquez et al. used a combination of genetic methods to identify three genes from the Spok family which are responsible for certain spores being able to kill their siblings. Two of these were previously unknown, and they could be found in different locations throughout the genome as part of a larger genetic region. Depending on the combination of Spok genes it carries, a spore can kill or be protected against other spores that contain different permutations of the genes. Copies of these genes were also shown to be present in other fungi, including species that are a threat to crops.

Scientists have already started to create synthetic meiotic drivers to manipulate how certain traits are inherited within a population. This could be useful to control or eradicate pests and insects that transmit dangerous diseases. The results by Vogan, Ament-Velásquez et al. shine a light on the complex ways that natural meiotic drivers work, including how they can be shared between species; this knowledge could inform how to safely deploy synthetic drivers in the wild.

Jing X, Dong Q, Lu R, Dong Q. Protein Inter-Residue Contacts Prediction: Methods, Performances and Applications. Curr Bioinform 2019. [DOI: 10.2174/1574893613666181109130430] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 11/22/2022]

DESTINI: A deep-learning approach to contact-driven protein structure prediction. Sci Rep 2019;9:3514. [PMID: 30837676 PMCID: PMC6401133 DOI: 10.1038/s41598-019-40314-1] [Citation(s) in RCA: 40] [Impact Index Per Article: 6.7] [Received: 11/01/2018] [Accepted: 02/12/2019] [Indexed: 11/09/2022]

Wuyun Q, Zheng W, Peng Z, Yang J. A large-scale comparative assessment of methods for residue-residue contact prediction. Brief Bioinform 2019;19:219-230. [PMID: 27802931 DOI: 10.1093/bib/bbw106] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 07/19/2016] [Indexed: 11/14/2022]

Butler BM, Kazan IC, Kumar A, Ozkan SB. Coevolving residues inform protein dynamics profiles and disease susceptibility of nSNVs. PLoS Comput Biol 2018;14:e1006626. [PMID: 30496278 PMCID: PMC6289467 DOI: 10.1371/journal.pcbi.1006626] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 06/21/2018] [Revised: 12/11/2018] [Accepted: 11/09/2018] [Indexed: 11/18/2022]

Abstract

The conformational dynamics of proteins is rarely used in methods that predict the impact of genetic mutations, owing to the paucity of three-dimensional protein structures compared with the vast number of available sequences. Until now, a three-dimensional (3D) structure has been required to predict the conformational dynamics of a protein. We introduce an approach that estimates the conformational dynamics of a protein without relying on structural information. This de novo approach utilizes coevolving residues identified from a multiple sequence alignment (MSA) using Potts models. These coevolving residues are used as contacts in a Gaussian network model (GNM) to obtain protein dynamics. B-factors calculated using sequence-based GNM (Seq-GNM) are in agreement with crystallographic B-factors as well as theoretical B-factors from the original GNM that utilizes the 3D structure. Moreover, we demonstrate the ability of the calculated B-factors from the Seq-GNM approach to discriminate genomic variants according to their phenotypes for a wide range of proteins. These results suggest that protein dynamics can be approximated based on sequence information alone, making it possible to assess the phenotypes of nSNVs in cases where a 3D structure is unknown. We hope this work will promote the use of dynamics information in genetic disease prediction at scale by circumventing the need for 3D structures.

Proteins are dynamic machines that undergo atomic fluctuations, side chain rotations, and collective domain movements that are required for biological function. There is, therefore, a need for quantitative metrics that capture the dynamic fluctuations per position to understand the critical role of protein dynamics in shaping biological functions. A limiting factor in incorporating structural dynamics information in the classification of non-synonymous single nucleotide variants (nSNVs) is the limited number of known 3D structures compared to the vast number of available sequences. We have developed a new sequence-based GNM method, termed Seq-GNM, which uses co-evolving amino acid positions based on the multiple sequence alignment of a given query sequence to estimate the thermal motions of C-alpha atoms. In this paper, we have demonstrated that the predicted thermal motions using Seq-GNM are in reasonable agreement with experimental B-factors as well as B-factors computed using 3D crystal structures. We also provide evidence that B-factors predicted by Seq-GNM are capable of distinguishing between disease-associated and neutral nSNVs.
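
The Seq-GNM idea above, feeding predicted contacts into a Gaussian network model, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: contacts are supplied as 0-indexed residue pairs (for example, top-ranked coevolving pairs; the Potts-model step itself is not reproduced), chain neighbours (i, i+1) are assumed always to be in contact, all springs share one force constant, and the function names are hypothetical. It returns the diagonal of the pseudo-inverse of the Kirchhoff matrix, which is proportional to the GNM B-factors up to a constant scale factor.

```python
from typing import Iterable, List, Tuple

Matrix = List[List[float]]


def kirchhoff(n: int, contacts: Iterable[Tuple[int, int]]) -> Matrix:
    """Build the GNM Kirchhoff (graph Laplacian) matrix for n residues.

    Assumption: chain neighbours (i, i+1) are always treated as contacts,
    in addition to the supplied 0-indexed (e.g. coevolving) pairs.
    """
    pairs = {(i, i + 1) for i in range(n - 1)}
    for i, j in contacts:
        if i != j:
            pairs.add((min(i, j), max(i, j)))
    gamma = [[0.0] * n for _ in range(n)]
    for i, j in pairs:
        gamma[i][j] -= 1.0
        gamma[j][i] -= 1.0
        gamma[i][i] += 1.0
        gamma[j][j] += 1.0
    return gamma


def invert(m: Matrix) -> Matrix:
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    # Augment m with the identity matrix: [m | I].
    a = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [rv - f * cv for rv, cv in zip(a[r], a[col])]
    return [row[n:] for row in a]


def relative_bfactors(n: int, contacts: Iterable[Tuple[int, int]]) -> List[float]:
    """Diagonal of the Kirchhoff pseudo-inverse, proportional to GNM B-factors.

    For a connected contact graph, pinv(G) = inv(G + J/n) - J/n, where J is the
    all-ones matrix; this removes the zero mode without an eigendecomposition.
    """
    gamma = kirchhoff(n, contacts)
    shifted = [[g + 1.0 / n for g in row] for row in gamma]
    inv = invert(shifted)
    return [inv[i][i] - 1.0 / n for i in range(n)]


# Toy example: a 3-residue chain, then the same chain with a predicted 0-2 contact.
print(relative_bfactors(3, []))        # chain ends fluctuate more than the middle
print(relative_bfactors(3, [(0, 2)]))  # closing the loop equalises the fluctuations
```

Pure Python keeps the example dependency-free; for real proteins, a linear-algebra library or a dedicated GNM package would be the practical choice.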

Ding W, Mao W, Shao D, Zhang W, Gong H. DeepConPred2: An Improved Method for the Prediction of Protein Residue Contacts. Comput Struct Biotechnol J 2018;16:503-510. [PMID: 30505403 PMCID: PMC6247404 DOI: 10.1016/j.csbj.2018.10.009] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 09/14/2018] [Revised: 10/16/2018] [Accepted: 10/18/2018] [Indexed: 12/18/2022]

Ching T, Himmelstein DS, Beaulieu-Jones BK, Kalinin AA, Do BT, Way GP, Ferrero E, Agapow PM, Zietz M, Hoffman MM, Xie W, Rosen GL, Lengerich BJ, Israeli J, Lanchantin J, Woloszynek S, Carpenter AE, Shrikumar A, Xu J, Cofer EM, Lavender CA, Turaga SC, Alexandari AM, Lu Z, Harris DJ, DeCaprio D, Qi Y, Kundaje A, Peng Y, Wiley LK, Segler MHS, Boca SM, Swamidass SJ, Huang A, Gitter A, Greene CS. Opportunities and obstacles for deep learning in biology and medicine. J R Soc Interface 2018;15:20170387. [PMID: 29618526 PMCID: PMC5938574 DOI: 10.1098/rsif.2017.0387] [Citation(s) in RCA: 907] [Impact Index Per Article: 129.6] [Received: 05/26/2017] [Accepted: 03/07/2018] [Indexed: 11/12/2022]

Affiliation(s)

Travers Ching Molecular Biosciences and Bioengineering Graduate Program, University of Hawaii at Manoa, Honolulu, HI, USA
Daniel S Himmelstein Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
Brett K Beaulieu-Jones Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
Alexandr A Kalinin Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, USA
Brian T Do Harvard Medical School, Boston, MA, USA
Gregory P Way Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
Enrico Ferrero Computational Biology and Stats, Target Sciences, GlaxoSmithKline, Stevenage, UK
Paul-Michael Agapow Data Science Institute, Imperial College London, London, UK
Michael Zietz Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
Michael M Hoffman Princess Margaret Cancer Centre, Toronto, Ontario, Canada; Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada; Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
Wei Xie Electrical Engineering and Computer Science, Vanderbilt University, Nashville, TN, USA
Gail L Rosen Ecological and Evolutionary Signal-processing and Informatics Laboratory, Department of Electrical and Computer Engineering, Drexel University, Philadelphia, PA, USA
Benjamin J Lengerich Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
Johnny Israeli Biophysics Program, Stanford University, Stanford, CA, USA
Jack Lanchantin Department of Computer Science, University of Virginia, Charlottesville, VA, USA
Stephen Woloszynek Ecological and Evolutionary Signal-processing and Informatics Laboratory, Department of Electrical and Computer Engineering, Drexel University, Philadelphia, PA, USA
Anne E Carpenter Imaging Platform, Broad Institute of Harvard and MIT, Cambridge, MA, USA
Avanti Shrikumar Department of Computer Science, Stanford University, Stanford, CA, USA
Jinbo Xu Toyota Technological Institute at Chicago, Chicago, IL, USA
Evan M Cofer Department of Computer Science, Trinity University, San Antonio, TX, USA; Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA
Christopher A Lavender Integrative Bioinformatics, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, NC, USA
Srinivas C Turaga Howard Hughes Medical Institute, Janelia Research Campus, Ashburn, VA, USA
Amr M Alexandari Department of Computer Science, Stanford University, Stanford, CA, USA
Zhiyong Lu National Center for Biotechnology Information and National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
David J Harris Department of Wildlife Ecology and Conservation, University of Florida, Gainesville, FL, USA
Dave DeCaprio ClosedLoop.ai, Austin, TX, USA
Yanjun Qi Department of Computer Science, University of Virginia, Charlottesville, VA, USA
Anshul Kundaje Department of Computer Science, Stanford University, Stanford, CA, USA; Department of Genetics, Stanford University, Stanford, CA, USA
Yifan Peng National Center for Biotechnology Information and National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
Laura K Wiley Division of Biomedical Informatics and Personalized Medicine, University of Colorado School of Medicine, Aurora, CO, USA
Marwin H S Segler Institute of Organic Chemistry, Westfälische Wilhelms-Universität Münster, Münster, Germany
Simina M Boca Innovation Center for Biomedical Informatics, Georgetown University Medical Center, Washington, DC, USA
S Joshua Swamidass Department of Pathology and Immunology, Washington University in Saint Louis, St Louis, MO, USA
Austin Huang Department of Medicine, Brown University, Providence, RI, USA
Anthony Gitter Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA; Morgridge Institute for Research, Madison, WI, USA
Casey S Greene Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA

Nicoludis JM, Gaudet R. Applications of sequence coevolution in membrane protein biochemistry. Biochim Biophys Acta Biomembr 2018;1860:895-908. [PMID: 28993150 PMCID: PMC5807202 DOI: 10.1016/j.bbamem.2017.10.004] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Received: 07/19/2017] [Revised: 09/28/2017] [Accepted: 10/02/2017] [Indexed: 12/22/2022]

Wang S, Li Z, Yu Y, Xu J. Folding Membrane Proteins by Deep Transfer Learning. Cell Syst 2017;5:202-211.e3. [PMID: 28957654 PMCID: PMC5637520 DOI: 10.1016/j.cels.2017.09.001] [Citation(s) in RCA: 45] [Impact Index Per Article: 5.6] [Received: 02/22/2017] [Revised: 06/01/2017] [Accepted: 08/29/2017] [Indexed: 01/02/2023]

Sabzekar M, Naghibzadeh M, Eghdami M, Aydin Z. Protein β-sheet prediction using an efficient dynamic programming algorithm. Comput Biol Chem 2017;70:142-155. [PMID: 28881217 DOI: 10.1016/j.compbiolchem.2017.08.011] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 03/05/2017] [Revised: 07/25/2017] [Accepted: 08/18/2017] [Indexed: 11/28/2022]

Abstract

Predicting the β-sheet structure of a protein is one of the most important intermediate steps toward identifying its tertiary structure. However, it is regarded as the primary bottleneck because of the non-local interactions between several discontinuous regions in β-sheets. To capture these long-range interactions reliably, a promising approach is to enumerate and rank all β-sheet conformations for a given protein and select the one with the highest score. The difficulty is that the search space grows exponentially with the number of β-strands, so a brute-force search of this conformational space runs into a combinatorial explosion with intractable computational complexity. The main contribution of this paper is to generate and search this space efficiently, reducing the time complexity of the problem. To achieve this, two tree structures, called sheet-tree and grouping-tree, are proposed; they model the search space by breaking it into sub-problems. An advanced dynamic programming algorithm then stores intermediate results, avoids repeated calculation by reusing them in successive steps, and reduces space requirements by discarding intermediate results that are no longer needed in later steps. As a consequence, two contributions are made. First, more accurate β-sheet structures are found by searching all possible conformations; second, the time complexity is reduced by searching the space efficiently, which makes the method applicable to β-sheet structures with a large number of β-strands.
Experimental results on the BetaSheet916 dataset showed significant improvements of the proposed method in both execution time and prediction accuracy in comparison with state-of-the-art β-sheet structure prediction methods. Moreover, we investigate the effect of different contact map predictors on the performance of the proposed method using the BetaSheet1452 dataset. The source code is available at http://www.conceptsgate.com/BetaTop.rar.

Wang S, Sun S, Xu J. Analysis of deep learning methods for blind protein contact prediction in CASP12. Proteins 2017;86 Suppl 1:67-77. [PMID: 28845538 DOI: 10.1002/prot.25377] [Citation(s) in RCA: 61] [Impact Index Per Article: 7.6] [Received: 06/14/2017] [Revised: 08/18/2017] [Accepted: 08/25/2017] [Indexed: 11/08/2022]

Jing X, Dong Q, Lu R. RRCRank: a fusion method using rank strategy for residue-residue contact prediction. BMC Bioinformatics 2017;18:390. [PMID: 28865433 PMCID: PMC5581475 DOI: 10.1186/s12859-017-1811-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 05/04/2017] [Accepted: 08/28/2017] [Indexed: 11/10/2022]

Abstract

Background

In structural biology, protein residue-residue contacts play a crucial role in protein structure prediction. Predicted residue-residue contacts have been found to effectively constrain the conformational search space, which is significant for de novo protein structure prediction. Over the past few decades, researchers have developed various methods to predict residue-residue contacts, and fusion methods in particular have achieved significant performance gains in recent years. In this work, a novel fusion method based on a rank strategy is proposed to predict contacts. Unlike traditional regression or classification strategies, it treats contact prediction as a ranking task. First, two kinds of features are extracted from correlated mutation methods and ensemble machine-learning classifiers; the proposed method then uses a learning-to-rank algorithm to predict the contact probability of each residue pair.

Results

First, we perform two benchmark tests of the proposed fusion method (RRCRank) on the CASP11 and CASP12 datasets. The results show that RRCRank outperforms other well-developed methods, especially for medium- and short-range contacts. Second, to verify the advantage of the ranking strategy, we predict contacts using traditional regression and classification strategies based on the same features. Compared with these two traditional strategies, the ranking strategy performs better for all three contact types, in particular for long-range contacts. Third, RRCRank is compared with several state-of-the-art methods in CASP11 and CASP12. The results show that RRCRank achieves comparable prediction precision and outperforms three of these methods on most assessment metrics.

Conclusions

The learning-to-rank algorithm is introduced to develop a novel rank-based method for residue-residue contact prediction, which achieves state-of-the-art performance in extensive assessments.

Electronic supplementary material

The online version of this article (10.1186/s12859-017-1811-9) contains supplementary material, which is available to authorized users.

Wang S, Ma J, Xu J. AUCpreD: proteome-level protein disorder prediction by AUC-maximized deep convolutional neural fields. Bioinformatics 2017;32:i672-i679. [PMID: 27587688 DOI: 10.1093/bioinformatics/btw446] [Citation(s) in RCA: 89] [Impact Index Per Article: 11.1] [Indexed: 12/17/2022]

Abstract

MOTIVATION

Protein intrinsically disordered regions (IDRs) play an important role in many biological processes. Two key properties of IDRs are that (i) they occur proteome-wide and (ii) only about 6% of residues are disordered, which makes IDRs challenging to predict accurately. Most IDR prediction methods use sequence profiles to improve accuracy, which hinders their application to proteome-wide prediction because generating sequence profiles is time-consuming. On the other hand, methods that do not use sequence profiles perform much worse than those that do.

METHOD

This article formulates IDR prediction as a sequence labeling problem and solves it with a new machine learning method called Deep Convolutional Neural Fields (DeepCNF). DeepCNF integrates deep convolutional neural networks (DCNN) and conditional random fields (CRF); it can model not only complex sequence-structure relationships in a hierarchical manner but also correlations among adjacent residues. To deal with the highly imbalanced order/disorder ratio, instead of training DeepCNF by the widely used maximum likelihood, we develop a novel approach that trains it by maximizing the area under the ROC curve (AUC), which is an unbiased measure for class-imbalanced data.
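
Why AUC suits this imbalanced setting can be seen with a toy calculation. The sketch below only evaluates AUC, in its Mann-Whitney form (the probability that a randomly chosen positive outscores a randomly chosen negative); it does not reproduce the paper's AUC-maximizing training of DeepCNF. The 100-residue example with 6 disordered residues is hypothetical, chosen to mirror the roughly 6% disorder ratio mentioned above, and the function names are illustrative.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (Mann-Whitney formulation). O(P * N), fine for a sketch.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(scores: Sequence[float], labels: Sequence[int], threshold: float = 0.5) -> float:
    """Fraction of residues whose thresholded prediction matches the label."""
    correct = sum((s >= threshold) == (y == 1) for s, y in zip(scores, labels))
    return correct / len(labels)


# Hypothetical 100-residue protein with 6 disordered (label 1) residues,
# scored by a trivial predictor that calls every residue ordered.
labels = [1] * 6 + [0] * 94
trivial = [0.0] * 100
print(accuracy(trivial, labels))  # 0.94
print(roc_auc(trivial, labels))   # 0.5
```

A predictor that labels everything as ordered looks strong on accuracy yet sits at chance level on AUC, which is the imbalance problem the method targets.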

RESULTS

Our experimental results show that our IDR prediction method AUCpreD outperforms existing popular disorder predictors. More importantly, AUCpreD works very well even without sequence profile, comparing favorably to or even outperforming many methods using sequence profile. Therefore, our method works for proteome-wide disorder prediction while yielding similar or better accuracy than the others.

AVAILABILITY AND IMPLEMENTATION

http://raptorx2.uchicago.edu/StructurePropertyPred/predict/

CONTACT

wangsheng@uchicago.edu, jinboxu@gmail.com

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Xiong D, Zeng J, Gong H. A deep learning framework for improving long-range residue–residue contact prediction using a hierarchical strategy. Bioinformatics 2017;33:2675-2683. [DOI: 10.1093/bioinformatics/btx296] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.5] [Received: 10/18/2016] [Accepted: 05/02/2017] [Indexed: 12/31/2022]

Simkovic F, Ovchinnikov S, Baker D, Rigden DJ. Applications of contact predictions to structural biology. IUCrJ 2017;4:291-300. [PMID: 28512576 PMCID: PMC5414403 DOI: 10.1107/s2052252517005115] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Received: 12/12/2016] [Accepted: 04/03/2017] [Indexed: 06/07/2023]

Abstract

Evolutionary pressure on residue interactions, intramolecular or intermolecular, that are important for protein structure or function can lead to covariance between the two positions. Recent methodological advances allow much more accurate contact predictions to be derived from this evolutionary covariance signal. The practical application of contact predictions has largely been confined to structural bioinformatics, yet, as this work seeks to demonstrate, the data can be of enormous value to the structural biologist working in X-ray crystallography, cryo-EM or NMR. Integrative structural bioinformatics packages such as Rosetta can already exploit contact predictions in a variety of ways. The contribution of contact predictions begins at construct design, where structural domains may need to be expressed separately and contact predictions can help to predict domain limits. Structure solution by molecular replacement (MR) benefits from contact predictions in diverse ways: in difficult cases, more accurate search models can be constructed using ab initio modelling when predictions are available, while intermolecular contact predictions can allow the construction of larger, oligomeric search models. Furthermore, MR using supersecondary motifs or large-scale screens against the PDB can exploit information, such as the parallel or antiparallel nature of any β-strand pairing in the target, that can be inferred from contact predictions. Contact information will be particularly valuable in the determination of lower resolution structures by helping to assign sequence register. In large complexes, contact information may allow the identity of a protein responsible for a certain region of density to be determined and then assist in the orientation of an available model within that density. In NMR, predicted contacts can provide long-range information to extend the upper size limit of the technique in a manner analogous but complementary to experimental methods.
Finally, predicted contacts can distinguish between biologically relevant interfaces and mere lattice contacts in a final crystal structure, and have potential in the identification of functionally important regions and in foreseeing the consequences of mutations.

Chapman SD, Adami C, Wilke CO, B Kc D. The evolution of logic circuits for the purpose of protein contact map prediction. PeerJ 2017;5:e3139. [PMID: 28439455 PMCID: PMC5398280 DOI: 10.7717/peerj.3139] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 06/29/2016] [Accepted: 03/02/2017] [Indexed: 11/20/2022]

Huang W, Zeng X, Shi Y, Liu M. Functional characterization of human equilibrative nucleoside transporter 1. Protein Cell 2017;8:284-295. [PMID: 27995448 PMCID: PMC5359181 DOI: 10.1007/s13238-016-0350-x] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.5] [Received: 10/10/2016] [Accepted: 11/04/2016] [Indexed: 12/15/2022]

Castelli M, Clementi N, Pfaff J, Sautto GA, Diotti RA, Burioni R, Doranz BJ, Dal Peraro M, Clementi M, Mancini N. A Biologically-validated HCV E1E2 Heterodimer Structural Model. Sci Rep 2017;7:214. [PMID: 28303031 PMCID: PMC5428263 DOI: 10.1038/s41598-017-00320-7] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Received: 10/04/2016] [Accepted: 02/21/2017] [Indexed: 12/14/2022]

Wang S, Sun S, Li Z, Zhang R, Xu J. Accurate De Novo Prediction of Protein Contact Map by Ultra-Deep Learning Model. PLoS Comput Biol 2017;13:e1005324. [PMID: 28056090 PMCID: PMC5249242 DOI: 10.1371/journal.pcbi.1005324] [Citation(s) in RCA: 589] [Impact Index Per Article: 73.6] [Received: 09/14/2016] [Revised: 01/20/2017] [Accepted: 12/20/2016] [Indexed: 12/02/2022]

Abstract

Motivation

Protein contacts contain key information for understanding protein structure and function; thus, contact prediction from sequence is an important problem. Exciting progress has recently been made on this problem, but the predicted contacts for proteins without many sequence homologs are still of low quality and not very useful for de novo structure prediction.

Method

This paper presents a new deep learning method that predicts contacts by integrating both evolutionary coupling (EC) and sequence conservation information through an ultra-deep neural network formed by two deep residual neural networks. The first residual network conducts a series of 1-dimensional convolutional transformations of sequential features; the second residual network conducts a series of 2-dimensional convolutional transformations of pairwise information, including the output of the first residual network, EC information and pairwise potential. By using very deep residual networks, we can accurately model contact occurrence patterns and complex sequence-structure relationships and thus obtain higher-quality contact predictions regardless of how many sequence homologs are available for the protein in question.
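
The hand-off from the first (1D) network to the second (2D) network requires turning per-residue features into a pairwise grid. A minimal sketch, assuming the simplest such conversion: entry (i, j) concatenates the feature vectors of residues i and j with the EC score for that pair. This is an illustration only; the paper's exact 1D-to-2D conversion and the residual networks themselves are not reproduced, and `pairwise_features` is a hypothetical name.

```python
from typing import List

Vector = List[float]


def pairwise_features(seq_feats: List[Vector], ec: List[Vector]) -> List[List[Vector]]:
    """Tile per-residue (1D) features into an L x L x (2d + 1) pairwise tensor.

    Entry (i, j) concatenates the feature vectors of residues i and j with the
    evolutionary coupling score ec[i][j]; a 2D convolutional network would then
    operate over the (i, j) grid.
    """
    L = len(seq_feats)
    if len(ec) != L or any(len(row) != L for row in ec):
        raise ValueError("EC matrix must be L x L")
    return [[seq_feats[i] + seq_feats[j] + [ec[i][j]] for j in range(L)] for i in range(L)]


# Toy input: L = 3 residues, d = 2 features per residue (e.g. outputs of the 1D network).
seq = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
ec = [[0.0, 0.1, 0.7], [0.1, 0.0, 0.2], [0.7, 0.2, 0.0]]
pair = pairwise_features(seq, ec)
print(len(pair), len(pair[0]), len(pair[0][0]))  # 3 3 5
print(pair[0][2])                                 # [1.0, 2.0, 5.0, 6.0, 0.7]
```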

Results

Our method greatly outperforms existing methods and leads to much more accurate contact-assisted folding. We tested it on 105 CASP11 targets, 76 past CAMEO hard targets and 398 membrane proteins. The average top L long-range prediction accuracy of our method, the representative EC method CCMpred and the CASP11 winner MetaPSICOV is 0.47, 0.21 and 0.30, respectively. The average top L/10 long-range accuracy of our method, CCMpred and MetaPSICOV is 0.77, 0.47 and 0.59, respectively. Ab initio folding using our predicted contacts as restraints, but without any force field, can yield correct folds (i.e., TMscore>0.6) for 203 of the 579 test proteins. Folding with MetaPSICOV- and CCMpred-predicted contacts does so for only 79 and 62 of them, respectively. Our contact-assisted models also have much better quality than template-based models, especially for membrane proteins. The 3D models built from our contact prediction have TMscore>0.5 for 208 of the 398 membrane proteins, while those from homology modeling have TMscore>0.5 for only 10 of them. Further, even though it was trained mostly on soluble proteins, our deep learning method works very well on membrane proteins. In the recent blind CAMEO benchmark, our fully automated web server implementing this method successfully folded 6 targets with a new fold and only 0.3L-2.3L effective sequence homologs. These were one β protein of 182 residues, one α+β protein of 125 residues, one α protein of 140 residues, one α protein of 217 residues, one α/β protein of 260 residues and one α protein of 462 residues. Our method also achieved the highest F1 score on free-modeling targets in the latest CASP (Critical Assessment of Structure Prediction), although it was not fully implemented back then.
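The headline numbers are precisions of the top-ranked long-range predictions. A minimal sketch follows. It assumes the common convention that a long-range pair has sequence separation |i - j| >= 24, and that "top L/k" means the L/k highest-scoring such pairs. The function name and data layout are hypothetical.

```python
def long_range_top_precision(scores, contacts, L, divisor=1, min_sep=24):
    """Fraction of the top L/divisor long-range predictions that are true contacts.

    scores:   dict mapping (i, j), i < j, to a predicted contact probability.
    contacts: set of (i, j), i < j, that are contacts in the native structure.
    min_sep:  minimum sequence separation for a pair to count as long-range.
    """
    n = max(1, L // divisor)
    # Keep only long-range pairs, ranked by predicted probability (highest first).
    ranked = sorted(
        ((p, pair) for pair, p in scores.items() if pair[1] - pair[0] >= min_sep),
        key=lambda t: t[0],
        reverse=True,
    )
    top = [pair for _, pair in ranked[:n]]
    if not top:
        return 0.0
    # Divide by the number of pairs actually evaluated (may be < n for tiny proteins).
    return sum(pair in contacts for pair in top) / len(top)
```

Calling it with `divisor=1` and `divisor=10` gives the "top L" and "top L/10" figures quoted above for one protein. Those figures are then averaged over the test set.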

Availability

http://raptorx.uchicago.edu/ContactMap/

Protein contact prediction and contact-assisted folding have made good progress due to direct evolutionary coupling analysis (DCA). However, DCA is effective only on proteins with a very large number of sequence homologs. To further improve contact prediction, we borrow ideas from deep learning, which has recently revolutionized object recognition, speech recognition and the game of Go. Our deep learning method can model complex sequence-structure relationships and high-order correlation (i.e., contact occurrence patterns) and thus greatly improve contact prediction accuracy. Our test results show that our method greatly outperforms the state-of-the-art methods regardless of how many sequence homologs are available for a protein in question. Ab initio folding guided by our predicted contacts may fold many more test proteins than folding guided by the other contact predictors. Our contact-assisted 3D models also have much better quality than homology models built from the training proteins, especially for membrane proteins. One interesting finding is that our method performs very well on membrane proteins, even though it was trained mostly on soluble proteins. A recent blind CAMEO test confirms that our method can fold large proteins with a new fold and only a small number of sequence homologs.


Rawi R, Mall R, Kunji K, El Anbari M, Aupetit M, Ullah E, Bensmail H. COUSCOus: improved protein contact prediction using an empirical Bayes covariance estimator. BMC Bioinformatics 2016;17:533. [PMID: 27978812 PMCID: PMC5159955 DOI: 10.1186/s12859-016-1400-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 06/17/2016] [Accepted: 12/01/2016] [Indexed: 11/13/2022]

Abstract

Background

The post-genomic era, with its wealth of sequences, gave rise to a broad range of methods for detecting protein residue-residue contacts. Coevolution methods such as PSICOV, DCA and plmDCA all provide correct contact predictions, but their predictions do not completely overlap. Hence, new approaches and improvements of existing methods are needed to drive further development and progress in the field. We present a new contact detecting method, COUSCOus, which combines the best shrinkage approach, the empirical Bayes covariance estimator, with GLasso.
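The general pipeline behind PSICOV-style methods can be sketched in NumPy. It has three steps: estimate a covariance matrix from a one-hot encoded alignment, shrink it so that it can be inverted, and score residue pairs from the precision matrix. This sketch swaps in plain linear shrinkage toward a scaled identity with a fixed, hypothetical intensity `lam`, followed by a direct matrix inverse. It is not the empirical Bayes estimator or the GLasso step that COUSCOus actually uses. It also omits refinements such as sequence weighting and average-product correction.

```python
import numpy as np

ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"  # 20 amino acids + gap (other symbols not handled)

def one_hot_msa(msa):
    """Encode an alignment (list of equal-length strings) as an (N, L*21) 0/1 matrix."""
    N, L, q = len(msa), len(msa[0]), len(ALPHABET)
    X = np.zeros((N, L * q))
    for n, seq in enumerate(msa):
        for i, aa in enumerate(seq):
            X[n, i * q + ALPHABET.index(aa)] = 1.0
    return X

def shrunk_precision(X, lam=0.2):
    """Linear shrinkage S* = (1 - lam) * S + lam * mu * I, then inversion.
    mu is the mean variance; assumes the alignment is not fully conserved (mu > 0)."""
    S = np.cov(X, rowvar=False, bias=True)
    mu = np.trace(S) / S.shape[0]
    S_shrunk = (1.0 - lam) * S + lam * mu * np.eye(S.shape[0])
    return np.linalg.inv(S_shrunk)

def contact_scores(msa, lam=0.2):
    """Score each position pair by the L1 norm of its 20x20 (gap-free) precision block."""
    q, L = len(ALPHABET), len(msa[0])
    P = shrunk_precision(one_hot_msa(msa), lam)
    scores = np.zeros((L, L))
    for i in range(L):
        for j in range(i + 1, L):
            block = P[i * q:i * q + 20, j * q:j * q + 20]
            scores[i, j] = scores[j, i] = np.abs(block).sum()
    return scores
```

Shrinkage matters here because the one-hot covariance matrix is singular: the indicator columns of each position sum to one, and unobserved residues have zero variance. Blending in a scaled identity makes it invertible. GLasso instead reaches a sparse precision matrix through an L1 penalty.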

Results

Using the original PSICOV benchmark dataset, COUSCOus achieves mean accuracies of 0.74, 0.62 and 0.55 for the top L/10 predicted long, medium and short range contacts, respectively. In addition, COUSCOus attains mean areas under the precision-recall curves of 0.25, 0.29 and 0.30 for long, medium and short range contacts, outperforming PSICOV. We also observed that COUSCOus outperforms PSICOV with respect to the Matthews correlation coefficient on the full list of residue contacts. Furthermore, on an independent test set composed of CASP11 protein targets, COUSCOus achieves on average a 10% gain in prediction accuracy over PSICOV. Finally, we showed that when contact detecting techniques and sequence-derived features are combined in a simple random forest meta-classifier, PSICOV predictions should be replaced by the more accurate COUSCOus predictions.
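The comparison with PSICOV by the Matthews correlation coefficient (MCC) uses the full list of residue pairs rather than only the top-ranked ones. The sketch below builds the confusion counts from sets of predicted and true contacts and derives precision, recall, F1 and MCC. The minimum sequence separation `min_sep` and the convention of returning 0 when a denominator vanishes are assumptions.

```python
import math

def confusion_counts(predicted, true, L, min_sep=6):
    """Count TP/FP/FN/TN over all pairs (i, j), i < j, with j - i >= min_sep.
    predicted, true: sets of (i, j) tuples with i < j."""
    tp = fp = fn = tn = 0
    for i in range(L):
        for j in range(i + min_sep, L):
            p, t = (i, j) in predicted, (i, j) in true
            if p and t:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn

def scores_from_counts(tp, fp, fn, tn):
    """Precision, recall, F1 and MCC; each is 0.0 when its denominator is 0."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "mcc": mcc}
```

Unlike precision of the top L/10 predictions, MCC also rewards correctly rejected pairs (TN). This is why it is used on the full contact list.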

Conclusion

We conclude that considering superior covariance shrinkage approaches will boost several research fields that apply the GLasso procedure. These include residue-residue contact prediction, presented here, as well as fields such as gene network reconstruction.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-016-1400-3) contains supplementary material, which is available to authorized users.


Levy RM, Haldane A, Flynn WF. Potts Hamiltonian models of protein co-variation, free energy landscapes, and evolutionary fitness. Curr Opin Struct Biol 2016;43:55-62. [PMID: 27870991 DOI: 10.1016/j.sbi.2016.11.004] [Citation(s) in RCA: 69] [Impact Index Per Article: 7.7] [Received: 08/19/2016] [Accepted: 11/03/2016] [Indexed: 11/17/2022]

Du T, Liao L, Wu CH. Enhancing interacting residue prediction with integrated contact matrix prediction in protein-protein interaction. EURASIP Journal on Bioinformatics & Systems Biology 2016;2016:17. [PMID: 27818677 PMCID: PMC5075339 DOI: 10.1186/s13637-016-0051-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 05/02/2016] [Accepted: 09/25/2016] [Indexed: 11/10/2022]

Li Q, Dahl DB, Vannucci M, Joo H, Tsai JW. KScons: a Bayesian approach for protein residue contact prediction using the knob-socket model of protein tertiary structure. Bioinformatics 2016;32:3774-3781. [PMID: 27559156 DOI: 10.1093/bioinformatics/btw553] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 03/01/2016] [Revised: 07/15/2016] [Accepted: 08/18/2016] [Indexed: 11/13/2022]

Abstract

MOTIVATION

The knob-socket construct simplifies the many-bodied complexity of residue packing into patterns of simple pairwise secondary structure interactions between a single knob residue and a three-residue socket. It thereby allows a more direct incorporation of structural information into the prediction of residue contacts. By modeling the preferences between the amino acid composition of a socket and a knob, we investigate the knob-socket construct's ability to improve the prediction of residue contacts. The statistical model considers three priors and two posterior estimations to better understand how the input data affect predictions. This produces six implementations of KScons, which are tested on three sets: PSICOV, CASP10 and CASP11. We compare them against the current leading contact prediction methods.
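The abstract does not detail KScons' three priors. As a generic illustration of how a prior and observed data combine into a posterior estimate of knob preferences, the sketch below uses a conjugate Dirichlet-multinomial update. It gives the posterior mean probability of each amino acid as the knob for one socket type, under a symmetric Dirichlet prior. The function name, the symmetric prior and `prior_alpha` are hypothetical; this is not the KScons model.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def knob_posterior_mean(observed_knobs, prior_alpha=1.0):
    """Posterior mean of a categorical knob distribution for one socket type.

    observed_knobs: string or iterable of one-letter codes seen as the knob
                    packing into this socket type in training structures.
    prior_alpha:    concentration of a symmetric Dirichlet prior (hypothetical).
    Posterior mean for residue a: (n_a + alpha) / (N + 20 * alpha).
    """
    counts = Counter(observed_knobs)
    total = sum(counts[a] for a in AMINO_ACIDS)
    denom = total + prior_alpha * len(AMINO_ACIDS)
    return {a: (counts[a] + prior_alpha) / denom for a in AMINO_ACIDS}
```

With few observations the estimate stays close to uniform, and as counts grow it approaches the observed frequencies. This mirrors the abstract's finding that predictions degrade when no structural homologs supply data.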

RESULTS

The results demonstrate the usefulness as well as the limits of knob-socket based structural modeling of protein contacts. The construct is able to extract good predictions from known structural homologs, while its performance degrades when no homologs exist. Among our six implementations, KScons MST-MP (which uses the multiple structure alignment prior and marginal posterior incorporating structural homolog information) performs the best in all three prediction sets. An analysis of recall and precision finds that KScons MST-MP improves accuracy not only by improving identification of true positives, but also by decreasing the number of false positives. Over the CASP10 and CASP11 sets, KScons MST-MP performs better than the leading methods using only evolutionary coupling data, but not quite as well as the supervised learning methods of MetaPSICOV and CoinDCA-NN that incorporate a large set of structural features.

CONTACT

qiwei.li@rice.edu

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.


Simkovic F, Thomas JMH, Keegan RM, Winn MD, Mayans O, Rigden DJ. Residue contacts predicted by evolutionary covariance extend the application of ab initio molecular replacement to larger and more challenging protein folds. IUCrJ 2016;3:259-70. [PMID: 27437113 PMCID: PMC4937781 DOI: 10.1107/s2052252516008113] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 02/10/2016] [Accepted: 05/18/2016] [Indexed: 05/05/2023]

Wang S, Li W, Zhang R, Liu S, Xu J. CoinFold: a web server for protein contact prediction and contact-assisted protein folding. Nucleic Acids Res 2016;44:W361-6. [PMID: 27112569 PMCID: PMC4987891 DOI: 10.1093/nar/gkw307] [Citation(s) in RCA: 43] [Impact Index Per Article: 4.8] [Received: 03/06/2016] [Accepted: 04/12/2016] [Indexed: 12/14/2022]

Schneider M, Belsom A, Rappsilber J, Brock O. Blind testing of cross-linking/mass spectrometry hybrid methods in CASP11. Proteins 2016;84 Suppl 1:152-63. [PMID: 26945814 PMCID: PMC5042049 DOI: 10.1002/prot.25028] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Received: 10/21/2015] [Revised: 02/09/2016] [Accepted: 02/27/2016] [Indexed: 12/26/2022]

Abstract

Hybrid approaches combine computational methods with experimental data. The information contained in the experimental data can be leveraged to probe the structure of proteins otherwise elusive to computational methods. Compared with purely computational methods, the structures produced by hybrid methods exhibit some degree of experimental validation. In spite of these advantages, most hybrid methods have not yet been validated in blind tests, hampering their development. Here, we describe the first blind test of a specific cross-link based hybrid method in CASP. This blind test was coordinated by the CASP organizers. It used a novel high-density cross-linking/mass-spectrometry (CLMS) approach that can collect high-density CLMS data in a matter of days. This experimental protocol was developed in the Rappsilber laboratory. It exploits the chemistry of a highly reactive, photoactivatable cross-linker to produce an order of magnitude more cross-links than homobifunctional cross-linkers. The Rappsilber laboratory generated experimental CLMS data using this protocol and submitted the data to the CASP organizers. The organizers then released the data to the CASP11 prediction groups in a separate, CLMS-assisted modeling experiment. We did not observe a clear improvement of assisted models. Presumably this is because the properties of the CLMS data were largely unknown to the prediction groups, and their approaches were not yet tailored to this kind of data. These properties include uncertainty in cross-link identification and residue-residue assignment, and uneven distribution over the protein. We also suggest modifications to the CLMS-CASP experiment and discuss the importance of rigorous blind testing in the development of hybrid methods. Proteins 2016; 84(Suppl 1):152-163. © 2016 The Authors Proteins: Structure, Function, and Bioinformatics Published by Wiley Periodicals, Inc.
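In modeling, CLMS data enter as residue-pair distance restraints. A simple way to check a structural model against them is sketched below: count the fraction of cross-linked pairs whose Cα atoms lie within a distance cutoff. The cutoff `max_dist` is a hypothetical default, because the appropriate value depends on the cross-linker chemistry and is not given in the abstract. The sketch also ignores the identification and assignment uncertainty that the authors highlight.

```python
import math

def satisfied_fraction(coords, crosslinks, max_dist=25.0):
    """Fraction of cross-links consistent with a model.

    coords:     dict residue index -> (x, y, z) C-alpha coordinates (Angstrom).
    crosslinks: iterable of (i, j) residue pairs reported by CLMS.
    max_dist:   hypothetical C-alpha/C-alpha cutoff; cross-linker dependent.
    Pairs with a residue missing from the model are skipped.
    """
    links = [(i, j) for i, j in crosslinks if i in coords and j in coords]
    if not links:
        return 0.0
    ok = sum(math.dist(coords[i], coords[j]) <= max_dist for i, j in links)
    return ok / len(links)
```

A low satisfied fraction can mean either a wrong model or false-positive cross-links. That ambiguity is one reason the authors argue that prediction methods need to be tailored to the properties of this kind of data.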
