51
Vajdi A, Zarringhalam K, Haspel N. Patch-DCA: improved protein interface prediction by utilizing structural information and clustering DCA scores. Bioinformatics 2019; 36:1460-1467. [DOI: 10.1093/bioinformatics/btz791] [Received: 06/03/2019] [Revised: 09/30/2019] [Accepted: 10/15/2019]
Abstract
Motivation
Over the past decade, there have been impressive advances in determining the 3D structures of protein complexes. However, there are still many complexes with unknown structures, even when the structures of the individual proteins are known. The advent of protein sequence information provides an opportunity to leverage evolutionary information to enhance the accuracy of protein–protein interface prediction. To this end, several statistical and machine learning methods have been proposed. In particular, direct coupling analysis has recently emerged as a promising approach for identification of protein contact maps from sequential information. However, the ability of these methods to detect protein–protein inter-residue contacts remains relatively limited.
Results
In this work, we propose a method to integrate sequential and co-evolution information with structural and functional information to increase the performance of protein–protein interface prediction. Further, we present a post-processing clustering method that improves the average relative F1 score by 70% and 24% and the average relative precision by 80% and 36% in comparison with two state-of-the-art methods, PSICOV and GREMLIN.
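The abstract reports relative gains in precision and F1 without defining them. As a hedged illustration, the sketch below computes precision, recall and F1 for a set of predicted inter-protein residue pairs, plus a relative improvement taken as (new − baseline)/baseline. The contact sets are toy data and the relative-improvement formula is an assumption; the paper's exact per-complex averaging is not described here.

```python
from typing import Set, Tuple

# A contact is a (residue index in protein A, residue index in protein B) pair.
Contact = Tuple[int, int]


def precision_recall_f1(predicted: Set[Contact], true: Set[Contact]) -> Tuple[float, float, float]:
    """Precision, recall and F1 of predicted contacts against the true interface contacts."""
    tp = len(predicted & true)  # correctly predicted contacts
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(true) if true else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def relative_improvement(new: float, baseline: float) -> float:
    """Relative gain of `new` over `baseline` (assumed definition); 0.7 means 70% better."""
    return (new - baseline) / baseline


if __name__ == "__main__":
    true = {(1, 5), (2, 6), (3, 7), (4, 8)}
    method = {(1, 5), (2, 6), (3, 9)}     # toy prediction: 2 of 3 correct
    baseline = {(1, 5), (9, 9), (8, 8)}   # toy baseline: 1 of 3 correct
    p_new, _, _ = precision_recall_f1(method, true)
    p_old, _, _ = precision_recall_f1(baseline, true)
    print(f"relative precision gain: {relative_improvement(p_new, p_old):.0%}")  # 100%
```

Averaging such relative gains over many complexes, rather than comparing pooled counts, is one plausible reading of "average relative" scores, but that too is an assumption.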
Availability and implementation
https://github.com/BioMLBoston/PatchDCA
Supplementary information
Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Amir Vajdi
- Computer Science Department, University of Massachusetts Boston, Boston, MA, USA
- Department of Informatics and Analytics, Dana-Farber Cancer Institute, Boston, MA, USA
- Nurit Haspel
- Computer Science Department, University of Massachusetts Boston, Boston, MA, USA
52
Jian Y, Wang X, Qiu J, Wang H, Liu Z, Zhao Y, Zeng C. DIRECT: RNA contact predictions by integrating structural patterns. BMC Bioinformatics 2019; 20:497. [PMID: 31615418 PMCID: PMC6794908 DOI: 10.1186/s12859-019-3099-4] [Received: 03/08/2019] [Accepted: 09/13/2019]
Abstract
Background
It is widely believed that tertiary nucleotide-nucleotide interactions are essential in determining RNA structure and function. Currently, direct coupling analysis (DCA) infers nucleotide contacts in a sequence from its homologous sequence alignment across different species. DCA and similar approaches that use sequence information alone typically yield low accuracy, especially when the available homologous sequences are limited. Therefore, new methods for RNA structural contact inference are desirable, because even a single correctly predicted tertiary contact can potentially make the difference between a correctly and an incorrectly predicted structure. Here we present a new method, DIRECT (Direct Information REweighted by Contact Templates), that incorporates a Restricted Boltzmann Machine (RBM) to augment the information on sequence co-variations with structural features in contact inference.
Results
Benchmark tests demonstrate that DIRECT achieves better overall performance than DCA approaches. Compared to mfDCA and plmDCA, DIRECT produces a substantial increase of 41 and 18%, respectively, in accuracy on average for contact prediction. DIRECT improves predictions for long-range contacts and captures more tertiary structural features.
Conclusions
We developed a hybrid approach that incorporates a Restricted Boltzmann Machine (RBM) to augment the information on sequence co-variations with structural templates in contact inference. Our results demonstrate that DIRECT is able to improve RNA contact prediction.
Affiliation(s)
- Yiren Jian
- Institute of Biophysics and Department of Physics, Central China Normal University, Wuhan, 430079, China; Department of Physics, The George Washington University, Washington DC, 20052, USA
- Xiaonan Wang
- Institute of Biophysics and Department of Physics, Central China Normal University, Wuhan, 430079, China
- Jaidi Qiu
- Institute of Biophysics and Department of Physics, Central China Normal University, Wuhan, 430079, China
- Huiwen Wang
- Institute of Biophysics and Department of Physics, Central China Normal University, Wuhan, 430079, China
- Zhichao Liu
- Department of Physics, The George Washington University, Washington DC, 20052, USA
- Yunjie Zhao
- Institute of Biophysics and Department of Physics, Central China Normal University, Wuhan, 430079, China
- Chen Zeng
- Department of Physics, The George Washington University, Washington DC, 20052, USA
53
Abstract
Direct coupling analysis (DCA) for protein folding has made very good progress, but it is not effective for proteins that lack many sequence homologs, even when coupled with time-consuming conformation sampling with fragments. We show that we can accurately predict the inter-residue distance distribution of a protein by deep learning, even for proteins with ∼60 sequence homologs. Using only the geometric constraints given by the resulting distance matrix, we can construct 3D models without extensive conformation sampling. Our method successfully folded 21 of the 37 CASP12 hard targets, which have a median family size of 58 effective sequence homologs, within 4 h on a Linux computer with 20 central processing units. In contrast, DCA-predicted contacts cannot be used to fold any of these hard targets in the absence of extensive conformation sampling, and the best CASP12 group folded only 11 of them by integrating DCA-predicted contacts into fragment-based conformation sampling. Rigorous experimental validation in CASP13 shows that our distance-based folding server successfully folded 17 of 32 hard targets (with a median family size of 36 sequence homologs) and obtained 70% precision on the top L/5 long-range predicted contacts. The latest experimental validation in CAMEO shows that our server predicted correct folds for 2 membrane proteins on which all of the other servers failed. These results demonstrate that it is now feasible to predict the correct fold for many more proteins that lack similar structures in the Protein Data Bank, even on a personal computer.
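The CASP13 result quoted above (70% precision on the top L/5 long-range contacts) relies on a standard evaluation. A minimal sketch follows, assuming the usual CASP conventions: a pair is long-range when its sequence separation is at least 24, predictions are ranked by score, and the top L/5 (L = sequence length) are checked against the true contact set. The (i, j, score) input format and the function name are illustrative assumptions, not the server's output format.

```python
from typing import List, Set, Tuple

Prediction = Tuple[int, int, float]  # (residue i, residue j, predicted contact score)


def top_l_over_k_precision(
    predictions: List[Prediction],
    true_contacts: Set[Tuple[int, int]],
    length: int,
    k: int = 5,
    min_sep: int = 24,
) -> float:
    """Precision of the top L/k long-range predictions (sequence separation >= min_sep)."""
    long_range = [p for p in predictions if abs(p[0] - p[1]) >= min_sep]
    long_range.sort(key=lambda p: p[2], reverse=True)  # highest score first
    top = long_range[: max(1, length // k)]
    if not top:
        return 0.0
    # True contacts are stored as (smaller index, larger index).
    hits = sum((min(i, j), max(i, j)) in true_contacts for i, j, _ in top)
    # Divides by the number of pairs actually evaluated, which may be fewer than L/k.
    return hits / len(top)


if __name__ == "__main__":
    preds = [(0, 30, 0.9), (1, 40, 0.8), (2, 50, 0.1), (3, 5, 0.95)]
    truth = {(0, 30)}
    # L = 10 gives L/5 = 2; (3, 5) is short-range and is ignored.
    print(top_l_over_k_precision(preds, truth, length=10))  # 0.5
```

Some evaluations divide by L/5 even when fewer long-range predictions exist; the choice here is a simplifying assumption.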
54
Zeng H, Wang S, Zhou T, Zhao F, Li X, Wu Q, Xu J. ComplexContact: a web server for inter-protein contact prediction using deep learning. Nucleic Acids Res 2019; 46:W432-W437. [PMID: 29790960 PMCID: PMC6030867 DOI: 10.1093/nar/gky420] [Received: 02/13/2018] [Accepted: 05/20/2018]
Abstract
ComplexContact (http://raptorx2.uchicago.edu/ComplexContact/) is a web server for sequence-based interfacial residue-residue contact prediction of a putative protein complex. Interfacial residue-residue contacts are critical for understanding how proteins form complexes and interact at the residue level. Given a pair of protein sequences, ComplexContact first searches for their sequence homologs and builds two paired multiple sequence alignments (MSAs); it then applies co-evolution analysis and a CASP-winning deep learning (DL) method to predict interfacial contacts from the paired MSAs, and visualizes the prediction as an image. The DL method was originally developed for intra-protein contact prediction and performed the best in CASP12. Our large-scale experimental test further shows that ComplexContact greatly outperforms pure co-evolution methods for inter-protein contact prediction, regardless of the species.
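The abstract does not say how the two MSAs are paired. One common heuristic, shown here purely as a hypothetical sketch and not as ComplexContact's actual procedure, keeps the best-ranked homolog per species in each MSA and concatenates the two rows for every species present in both. The (species, aligned sequence) input format and the function names are illustrative assumptions.

```python
from typing import Dict, List, Tuple

# Each MSA row is (species identifier, aligned sequence), ordered best hit first.
MsaRow = Tuple[str, str]


def best_hit_per_species(msa: List[MsaRow]) -> Dict[str, str]:
    """Keep only the first (best-ranked) sequence seen for each species."""
    best: Dict[str, str] = {}
    for species, seq in msa:
        best.setdefault(species, seq)
    return best


def pair_msas(msa_a: List[MsaRow], msa_b: List[MsaRow]) -> List[str]:
    """Concatenate A and B rows for species present in both MSAs, in MSA A order."""
    a = best_hit_per_species(msa_a)
    b = best_hit_per_species(msa_b)
    return [a[s] + b[s] for s in a if s in b]


if __name__ == "__main__":
    msa_a = [("human", "AC-"), ("mouse", "AG-"), ("human", "TT-")]
    msa_b = [("mouse", "KL"), ("yeast", "MM"), ("human", "PQ")]
    print(pair_msas(msa_a, msa_b))  # ['AC-PQ', 'AG-KL']
```

Restricting each species to one row sidesteps guessing which paralogs interact, at the cost of discarding some homologs.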
Affiliation(s)
- Hong Zeng
- School of Computer Science and Technology, Hangzhou Dianzi University, China
- Sheng Wang
- King Abdullah University of Science and Technology (KAUST), Saudi Arabia; Toyota Technological Institute at Chicago, USA
- Tianming Zhou
- Toyota Technological Institute at Chicago, USA; Institute for Interdisciplinary Information Sciences, Tsinghua University, China
- Feifeng Zhao
- School of Computer Science and Technology, Hangzhou Dianzi University, China
- Xiufeng Li
- School of Computer Science and Technology, Hangzhou Dianzi University, China
- Qing Wu
- School of Computer Science and Technology, Hangzhou Dianzi University, China
- Jinbo Xu
- Toyota Technological Institute at Chicago, USA
55
Vogan AA, Ament-Velásquez SL, Granger-Farbos A, Svedberg J, Bastiaans E, Debets AJ, Coustou V, Yvanne H, Clavé C, Saupe SJ, Johannesson H. Combinations of Spok genes create multiple meiotic drivers in Podospora. eLife 2019; 8:46454. [PMID: 31347500 PMCID: PMC6660238 DOI: 10.7554/elife.46454] [Received: 02/27/2019] [Accepted: 06/09/2019]
Abstract
Meiotic drive is the preferential transmission of a particular allele during sexual reproduction. The phenomenon is observed as spore killing in multiple fungi. In natural populations of Podospora anserina, seven spore killer types (Psks) have been identified through classical genetic analyses. Here we show that the Spok gene family underlies the Psks. The combination of Spok genes at different chromosomal locations defines the spore killer types and creates a killing hierarchy within a population. We identify two novel Spok homologs located within a large (74–167 kbp) region (the Spok block) that resides in different chromosomal locations in different strains. We confirm that the SPOK protein performs both killing and resistance functions and show that these activities depend on distinct domains: a predicted nuclease domain and a kinase domain. Genomic and phylogenetic analyses across ascomycetes suggest that the Spok genes disperse through cross-species transfer, and evolve by duplication and diversification within lineages.
In many organisms, most cells carry two versions of a given gene, one coming from the mother and the other from the father. An exception is sexual cells such as eggs, sperm, pollen or spores, which should only contain one variant of a gene. During their formation, these cells usually have an equal chance of inheriting one of the two gene versions. However, a certain class of gene variants called meiotic drivers can cheat this process and end up in more than half of the sexual cells; often, the cells that contain the drivers can kill sibling cells that do not carry these variants. This results in the selfish genetic elements spreading through populations at a higher rate, sometimes with severe consequences such as shifting the ratio of males to females. Meiotic drivers have been discovered in a wide range of organisms, from corn to mice to fruit flies and bread mold.
They also exist in the fungus Podospora anserina, where they are called ‘spore killers’. Fungi are often used to study complex genetic processes, yet the identity and mode of action of spore killers in P. anserina were still unknown. Vogan, Ament-Velásquez et al. used a combination of genetic methods to identify three genes from the Spok family which are responsible for certain spores being able to kill their siblings. Two of these were previously unknown, and they could be found in different locations throughout the genome as part of a larger genetic region. Depending on the combination of Spok genes it carries, a spore can kill or be protected against other spores that contain different permutations of the genes. Copies of these genes were also shown to be present in other fungi, including species that are a threat to crops. Scientists have already started to create synthetic meiotic drivers to manipulate how certain traits are inherited within a population. This could be useful to control or eradicate pests and insects that transmit dangerous diseases. The results by Vogan, Ament-Velásquez et al. shine a light on the complex ways that natural meiotic drivers work, including how they can be shared between species; this knowledge could inform how to safely deploy synthetic drivers in the wild.
Affiliation(s)
- Aaron A Vogan
- Organismal biology, Uppsala University, Uppsala, Sweden
56
Jing X, Dong Q, Lu R, Dong Q. Protein Inter-Residue Contacts Prediction: Methods, Performances and Applications. Curr Bioinform 2019. [DOI: 10.2174/1574893613666181109130430]
Abstract
Background
Protein inter-residue contact prediction plays an important role in protein structure and function research. As a low-dimensional representation of protein tertiary structure, protein inter-residue contacts can greatly help de novo protein structure prediction methods reduce the conformational search space. Over the past two decades, various methods have been developed for protein inter-residue contact prediction.
Objective
We provide a comprehensive and systematic review of protein inter-residue contact prediction methods.
Results
Protein inter-residue contact prediction methods are roughly classified into five categories: correlated mutation methods, machine-learning methods, fusion methods, template-based methods and 3D model-based methods. In this paper, we first describe the common definition of protein inter-residue contacts and show typical applications of protein inter-residue contacts. Then, we present a comprehensive review of the three main categories of protein inter-residue contact prediction: correlated mutation methods, machine-learning methods and fusion methods. In addition, we analyze the constraints of each category. Furthermore, we compare several representative methods on the CASP11 dataset and discuss the performance of these methods in detail.
Conclusion
Correlated mutation methods achieve better performance for long-range contacts, while machine-learning methods perform well for short-range contacts. Fusion methods can take advantage of both machine-learning and correlated mutation methods. Employing more effective fusion strategies could further improve the performance of fusion methods.
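The review starts from the common definition of an inter-residue contact. As a sketch, assuming the widely used CASP convention (two residues are in contact when their Cβ atoms, Cα for glycine, are closer than 8 Å, and separations of 6–11, 12–23 and at least 24 residues count as short-, medium- and long-range), contacts can be extracted and binned from coordinates as below. The input is assumed to be one representative atom coordinate per residue; the function name is illustrative.

```python
import math
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]


def contact_categories(cb: Sequence[Coord], cutoff: float = 8.0) -> Dict[str, List[Tuple[int, int]]]:
    """Bin residue pairs closer than `cutoff` angstroms by sequence separation."""
    cats: Dict[str, List[Tuple[int, int]]] = {"short": [], "medium": [], "long": []}
    n = len(cb)
    for i in range(n):
        for j in range(i + 6, n):  # separations below 6 are not counted
            if math.dist(cb[i], cb[j]) < cutoff:
                sep = j - i
                if sep < 12:
                    cats["short"].append((i, j))
                elif sep < 24:
                    cats["medium"].append((i, j))
                else:
                    cats["long"].append((i, j))
    return cats


if __name__ == "__main__":
    # Toy chain: residues 10 Å apart on a line, with residue 25 folded back near the start.
    coords = [(10.0 * i, 0.0, 0.0) for i in range(30)]
    coords[25] = (3.0, 0.0, 0.0)
    print(contact_categories(coords))
    # {'short': [], 'medium': [], 'long': [(0, 25), (1, 25)]}
```

The double loop is O(n²), which is adequate for single-domain proteins; a spatial grid or k-d tree would be the usual choice for very large structures.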
Affiliation(s)
- Xiaoyang Jing
- School of Computer Science, Fudan University, Shanghai, China
- Qimin Dong
- Vocational and Technical Education Center of Linxi County, Chifeng, Inner Mongolia, China
- Ruqian Lu
- School of Computer Science, Fudan University, Shanghai, China
- Qiwen Dong
- Faculty of Education, East China Normal University, Shanghai, China
57
DESTINI: A deep-learning approach to contact-driven protein structure prediction. Sci Rep 2019; 9:3514. [PMID: 30837676 PMCID: PMC6401133 DOI: 10.1038/s41598-019-40314-1] [Received: 11/01/2018] [Accepted: 02/12/2019]
Abstract
The amino acid sequence of a protein encodes the blueprint of its native structure. Predicting the corresponding structural fold from the protein’s sequence is one of the most challenging problems in computational biology. In this work, we introduce DESTINI (deep structural inference for proteins), a novel computational approach that combines a deep-learning algorithm for protein residue/residue contact prediction with template-based structural modelling. For the first time, significantly improved predictive ability is demonstrated in large-scale tertiary structure prediction of over 1,200 single-domain proteins. DESTINI successfully predicts the tertiary structure of four times as many “hard” targets (those with poor-quality templates) that were previously intractable, i.e., a “glass ceiling” for previous template-based approaches, and also improves model quality for “easy” targets (those with good-quality templates). The significantly better performance of DESTINI is largely due to the incorporation of better contact prediction into template modelling. To understand why deep learning achieves more accurate contact prediction, we performed systematic clustering, which reveals that deep learning predicts more coherent, native-like contact patterns than co-evolutionary analysis. Taken together, this work presents a promising strategy towards solving the protein structure prediction problem.
58
Wuyun Q, Zheng W, Peng Z, Yang J. A large-scale comparative assessment of methods for residue-residue contact prediction. Brief Bioinform 2019; 19:219-230. [PMID: 27802931 DOI: 10.1093/bib/bbw106] [Received: 07/19/2016]
Abstract
Sequence-based prediction of residue-residue contacts in proteins is becoming increasingly important for improving protein structure prediction in the big data era. In this study, we performed a large-scale comparative assessment of 15 locally installed contact predictors. To assess these methods, we collected a large data set of 680 nonredundant proteins covering different structural classes and target difficulties. We investigated a wide range of factors that may influence the precision of contact prediction, including target difficulty, structural class, alignment depth and the distribution of contact pairs in a protein structure. We found that: (1) machine learning-based methods outperform direct-coupling-based methods for short-range contact prediction, while the latter are significantly better for long-range contact prediction, and consensus-based methods, which combine machine learning and direct-coupling methods, perform the best; (2) target difficulty does not have a clear influence on machine learning-based methods, while it does significantly affect direct-coupling and consensus-based methods; (3) alignment depth has a relatively weak effect on machine learning-based methods, whereas for direct-coupling-based and consensus-based methods, the predicted contacts for targets with deeper alignments tend to be more accurate; (4) all methods perform relatively better on β and α + β proteins than on α proteins; and (5) residues buried in the core of a protein structure are more prone to be in contact than residues on the surface (22 versus 6%). We believe these results are useful for guiding the future development of new approaches to contact prediction.
Affiliation(s)
- Qiqige Wuyun
- School of Mathematical Sciences, Nankai University, Tianjin, China
- Wei Zheng
- School of Mathematical Sciences, Nankai University, Tianjin, China
- Zhenling Peng
- Center for Applied Mathematics, Tianjin University, Tianjin, China
- Jianyi Yang
- School of Mathematical Sciences, Nankai University, Tianjin, China
| |
59
Butler BM, Kazan IC, Kumar A, Ozkan SB. Coevolving residues inform protein dynamics profiles and disease susceptibility of nSNVs. PLoS Comput Biol 2018; 14:e1006626. [PMID: 30496278 PMCID: PMC6289467 DOI: 10.1371/journal.pcbi.1006626] [Received: 06/21/2018] [Revised: 12/11/2018] [Accepted: 11/09/2018]
Abstract
The conformational dynamics of proteins are rarely used in methods that predict the impact of genetic mutations, owing to the paucity of three-dimensional (3D) protein structures compared with the vast number of available sequences. Until now, a 3D structure has been required to predict the conformational dynamics of a protein. We introduce an approach that estimates the conformational dynamics of a protein without relying on structural information. This de novo approach uses coevolving residues identified from a multiple sequence alignment (MSA) using Potts models. These coevolving residues are used as contacts in a Gaussian network model (GNM) to obtain protein dynamics. B-factors calculated using the sequence-based GNM (Seq-GNM) agree with crystallographic B-factors as well as with theoretical B-factors from the original GNM, which uses the 3D structure. Moreover, we demonstrate the ability of B-factors calculated with the Seq-GNM approach to discriminate genomic variants according to their phenotypes for a wide range of proteins. These results suggest that protein dynamics can be approximated from sequence information alone, making it possible to assess the phenotypes of nSNVs when a 3D structure is unknown. We hope this work will promote the use of dynamics information in genetic disease prediction at scale by circumventing the need for 3D structures.
Proteins are dynamic machines that undergo atomic fluctuations, side chain rotations, and collective domain movements that are required for biological function. There is, therefore, a need for quantitative metrics that capture the dynamic fluctuations per position to understand the critical role of protein dynamics in shaping biological functions. A limiting factor in incorporating structural dynamics information in the classification of non-synonymous single nucleotide variants (nSNVs) is the limited number of known 3D structures compared to the vast number of available sequences. We have developed a new sequence-based GNM method, termed Seq-GNM, which uses co-evolving amino acid positions from the multiple sequence alignment of a given query sequence to estimate the thermal motions of C-alpha atoms. In this paper, we have demonstrated that the thermal motions predicted by Seq-GNM are in reasonable agreement with experimental B-factors as well as B-factors computed from 3D crystal structures. We also provide evidence that B-factors predicted by Seq-GNM are capable of distinguishing between disease-associated and neutral nSNVs.
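The Seq-GNM idea is to feed coevolution-derived contacts into a standard Gaussian network model. A minimal sketch follows, assuming NumPy and the textbook GNM: the Kirchhoff matrix Γ has −1 for each contacting pair and each residue's contact count on the diagonal, and relative B-factors are proportional to the diagonal of its pseudo-inverse (Γ is singular, so the pseudo-inverse discards the zero mode). Treating chain neighbours (i, i+1) as contacts and omitting the physical prefactor are assumptions made for illustration; the contact list would come from a Potts-model coupling analysis, which is not shown.

```python
from typing import Iterable, Tuple

import numpy as np


def kirchhoff(n: int, contacts: Iterable[Tuple[int, int]], include_backbone: bool = True) -> np.ndarray:
    """Build the GNM Kirchhoff (graph Laplacian) matrix from residue contacts."""
    pairs = {(min(i, j), max(i, j)) for i, j in contacts if i != j}
    if include_backbone:
        # Assumption: consecutive residues are always treated as connected.
        pairs.update((i, i + 1) for i in range(n - 1))
    gamma = np.zeros((n, n))
    for i, j in pairs:
        gamma[i, j] = gamma[j, i] = -1.0
    # Diagonal = number of contacts of each residue, so every row sums to zero.
    np.fill_diagonal(gamma, -gamma.sum(axis=1))
    return gamma


def gnm_relative_bfactors(n: int, contacts: Iterable[Tuple[int, int]]) -> np.ndarray:
    """Relative B-factors: diagonal of the pseudo-inverse of the Kirchhoff matrix."""
    gamma = kirchhoff(n, contacts)
    # pinv drops the single zero (rigid-body) mode of a connected network.
    return np.diag(np.linalg.pinv(gamma)).copy()


if __name__ == "__main__":
    # Toy 8-residue chain, then the same chain with one hypothetical coevolving pair (0, 7).
    print(np.round(gnm_relative_bfactors(8, []), 3))        # open chain: ends most mobile
    print(np.round(gnm_relative_bfactors(8, [(0, 7)]), 3))  # closed ring: uniform fluctuations
```

Only the relative profile is meaningful here; comparing with crystallographic B-factors would require fitting a scale factor.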
Affiliation(s)
- Brandon M. Butler
- Department of Physics and Center for Biological Physics, Arizona State University, Tempe, AZ, United States of America
- I. Can Kazan
- Department of Physics and Center for Biological Physics, Arizona State University, Tempe, AZ, United States of America
- Avishek Kumar
- Department of Physics and Center for Biological Physics, Arizona State University, Tempe, AZ, United States of America
- Harris School of Public Policy and Center for Data Science and Public Policy, University of Chicago, Chicago, IL, United States of America
- S. Banu Ozkan
- Department of Physics and Center for Biological Physics, Arizona State University, Tempe, AZ, United States of America
60
Ding W, Mao W, Shao D, Zhang W, Gong H. DeepConPred2: An Improved Method for the Prediction of Protein Residue Contacts. Comput Struct Biotechnol J 2018; 16:503-510. [PMID: 30505403 PMCID: PMC6247404 DOI: 10.1016/j.csbj.2018.10.009] [Received: 09/14/2018] [Revised: 10/16/2018] [Accepted: 10/18/2018]
Abstract
Information on residue-residue contacts is essential for understanding the mechanism of protein folding, and has been successfully applied as special topological restraints to simplify conformational sampling in de novo protein structure prediction. Prediction of protein residue contacts has progressed remarkably rapidly, with prediction accuracy reaching impressively high levels in the past two years. In this work, we introduce the second version of our residue contact predictor, DeepConPred2, which exhibits substantially improved performance and considerably reduced running time after model re-optimization and feature updates. When tested on the CASP12 free modeling targets, our program reaches at least the same level of prediction accuracy as the best contact predictors so far and provides information complementary to other state-of-the-art methods in contact-assisted folding.
Affiliation(s)
- Wenze Ding
- MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing 100084, China; Beijing Innovation Center of Structural Biology, Tsinghua University, Beijing 100084, China
- Wenzhi Mao
- MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing 100084, China; Beijing Innovation Center of Structural Biology, Tsinghua University, Beijing 100084, China
- Di Shao
- MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing 100084, China; Beijing Innovation Center of Structural Biology, Tsinghua University, Beijing 100084, China
- Wenxuan Zhang
- MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing 100084, China; Beijing Innovation Center of Structural Biology, Tsinghua University, Beijing 100084, China
- Haipeng Gong
- MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing 100084, China; Beijing Innovation Center of Structural Biology, Tsinghua University, Beijing 100084, China
61
Ching T, Himmelstein DS, Beaulieu-Jones BK, Kalinin AA, Do BT, Way GP, Ferrero E, Agapow PM, Zietz M, Hoffman MM, Xie W, Rosen GL, Lengerich BJ, Israeli J, Lanchantin J, Woloszynek S, Carpenter AE, Shrikumar A, Xu J, Cofer EM, Lavender CA, Turaga SC, Alexandari AM, Lu Z, Harris DJ, DeCaprio D, Qi Y, Kundaje A, Peng Y, Wiley LK, Segler MHS, Boca SM, Swamidass SJ, Huang A, Gitter A, Greene CS. Opportunities and obstacles for deep learning in biology and medicine. J R Soc Interface 2018; 15:20170387. [PMID: 29618526 PMCID: PMC5938574 DOI: 10.1098/rsif.2017.0387] [Received: 05/26/2017] [Accepted: 03/07/2018]
Abstract
Deep learning describes a class of machine learning algorithms that are capable of combining raw inputs into layers of intermediate features. These algorithms have recently shown impressive results across a variety of domains. Biology and medicine are data-rich disciplines, but the data are complex and often ill-understood. Hence, deep learning techniques may be particularly well suited to solving problems in these fields. We examine applications of deep learning to a variety of biomedical problems (patient classification, fundamental biological processes and treatment of patients) and discuss whether deep learning will be able to transform these tasks or whether the biomedical sphere poses unique challenges. Following an extensive literature review, we find that deep learning has yet to revolutionize biomedicine or definitively resolve any of the most pressing challenges in the field, but promising advances have been made on the prior state of the art. Even though improvements over previous baselines have been modest in general, the recent progress indicates that deep learning methods will provide valuable means for speeding up or aiding human investigation. Though progress has been made in linking a specific neural network's prediction to input features, understanding how users should interpret these models to make testable hypotheses about the system under study remains an open challenge. Furthermore, the limited amount of labelled data for training presents problems in some domains, as do legal and privacy constraints on work with sensitive health records. Nonetheless, we foresee deep learning enabling changes at both bench and bedside, with the potential to transform several areas of biology and medicine.
Affiliation(s)
- Travers Ching
- Molecular Biosciences and Bioengineering Graduate Program, University of Hawaii at Manoa, Honolulu, HI, USA
- Daniel S Himmelstein
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Brett K Beaulieu-Jones
- Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Alexandr A Kalinin
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, USA
- Gregory P Way
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Enrico Ferrero
- Computational Biology and Stats, Target Sciences, GlaxoSmithKline, Stevenage, UK
- Michael Zietz
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Michael M Hoffman
- Princess Margaret Cancer Centre, Toronto, Ontario, Canada
- Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
- Wei Xie
- Electrical Engineering and Computer Science, Vanderbilt University, Nashville, TN, USA
- Gail L Rosen
- Ecological and Evolutionary Signal-processing and Informatics Laboratory, Department of Electrical and Computer Engineering, Drexel University, Philadelphia, PA, USA
- Benjamin J Lengerich
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
- Johnny Israeli
- Biophysics Program, Stanford University, Stanford, CA, USA
- Jack Lanchantin
- Department of Computer Science, University of Virginia, Charlottesville, VA, USA
- Stephen Woloszynek
- Ecological and Evolutionary Signal-processing and Informatics Laboratory, Department of Electrical and Computer Engineering, Drexel University, Philadelphia, PA, USA
- Anne E Carpenter
- Imaging Platform, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Avanti Shrikumar
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Jinbo Xu
- Toyota Technological Institute at Chicago, Chicago, IL, USA
- Evan M Cofer
- Department of Computer Science, Trinity University, San Antonio, TX, USA
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA
- Christopher A Lavender
- Integrative Bioinformatics, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, NC, USA
- Srinivas C Turaga
- Howard Hughes Medical Institute, Janelia Research Campus, Ashburn, VA, USA
- Amr M Alexandari
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Zhiyong Lu
- National Center for Biotechnology Information and National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- David J Harris
- Department of Wildlife Ecology and Conservation, University of Florida, Gainesville, FL, USA
- Yanjun Qi
- Department of Computer Science, University of Virginia, Charlottesville, VA, USA
- Anshul Kundaje
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Department of Genetics, Stanford University, Stanford, CA, USA
- Yifan Peng
- National Center for Biotechnology Information and National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Laura K Wiley
- Division of Biomedical Informatics and Personalized Medicine, University of Colorado School of Medicine, Aurora, CO, USA
- Marwin H S Segler
- Institute of Organic Chemistry, Westfälische Wilhelms-Universität Münster, Münster, Germany
- Simina M Boca
- Innovation Center for Biomedical Informatics, Georgetown University Medical Center, Washington, DC, USA
- S Joshua Swamidass
- Department of Pathology and Immunology, Washington University in Saint Louis, St Louis, MO, USA
- Austin Huang
- Department of Medicine, Brown University, Providence, RI, USA
- Anthony Gitter
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA
- Morgridge Institute for Research, Madison, WI, USA
- Casey S Greene
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
62
Nicoludis JM, Gaudet R. Applications of sequence coevolution in membrane protein biochemistry. Biochim Biophys Acta Biomembr 2018; 1860:895-908. [PMID: 28993150 PMCID: PMC5807202 DOI: 10.1016/j.bbamem.2017.10.004] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Received: 07/19/2017] [Revised: 09/28/2017] [Accepted: 10/02/2017] [Indexed: 12/22/2022]
Abstract
Recently, protein sequence coevolution analysis has matured into a predictive powerhouse for protein structure and function. Direct methods, which use global statistical models of sequence coevolution, have enabled the prediction of membrane and disordered protein structures, protein complex architectures, and the functional effects of mutations in proteins. The field of membrane protein biochemistry and structural biology has embraced these computational techniques, which provide functional and structural information in an otherwise experimentally challenging field. Here we review recent applications of protein sequence coevolution analysis to membrane protein structure and function and highlight the promising directions and future obstacles in these fields. We provide insights and guidelines for membrane protein biochemists who wish to apply sequence coevolution analysis to a given experimental system.
Affiliation(s)
- John M Nicoludis
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA 02138, United States
- Rachelle Gaudet
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA, 02138, United States
63
Wang S, Li Z, Yu Y, Xu J. Folding Membrane Proteins by Deep Transfer Learning. Cell Syst 2017; 5:202-211.e3. [PMID: 28957654 PMCID: PMC5637520 DOI: 10.1016/j.cels.2017.09.001] [Citation(s) in RCA: 45] [Impact Index Per Article: 5.6] [Received: 02/22/2017] [Revised: 06/01/2017] [Accepted: 08/29/2017] [Indexed: 01/02/2023]
Abstract
Computational elucidation of membrane protein (MP) structures is challenging, partly because too few solved structures are available for homology modeling. Here, we describe a high-throughput deep transfer learning method that first predicts MP contacts by learning from non-MPs and then predicts 3D structure models using the predicted contacts as distance restraints. Tested on 510 non-redundant MPs, our method has contact prediction accuracy at least 0.18 higher than existing methods, predicts correct folds for 218 MPs, and generates 3D models with root-mean-square deviation (RMSD) less than 4 and 5 Å for 57 and 108 MPs, respectively. A rigorous blind test in the Continuous Automated Model Evaluation (CAMEO) project shows that our method predicted high-resolution 3D models for two recent test MPs of 210 residues with RMSD ∼2 Å. We estimate that our method could predict correct folds for 1,345-1,871 reviewed human multi-pass MPs, including a few hundred new folds, which should facilitate the discovery of drugs targeting MPs.
Affiliation(s)
- Sheng Wang
- Toyota Technological Institute at Chicago, Chicago, IL 60637, USA; Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA; Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- Zhen Li
- Toyota Technological Institute at Chicago, Chicago, IL 60637, USA; Department of Computer Science, University of Hong Kong, Hong Kong
- Yizhou Yu
- Department of Computer Science, University of Hong Kong, Hong Kong
- Jinbo Xu
- Toyota Technological Institute at Chicago, Chicago, IL 60637, USA
64
Sabzekar M, Naghibzadeh M, Eghdami M, Aydin Z. Protein β-sheet prediction using an efficient dynamic programming algorithm. Comput Biol Chem 2017; 70:142-155. [PMID: 28881217 DOI: 10.1016/j.compbiolchem.2017.08.011] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 03/05/2017] [Revised: 07/25/2017] [Accepted: 08/18/2017] [Indexed: 11/28/2022]
Abstract
Predicting the β-sheet structure of a protein is one of the most important intermediate steps towards identifying its tertiary structure. However, it is regarded as the primary bottleneck because of the non-local interactions between several discontinuous regions in β-sheets. To model these long-range interactions reliably, a promising approach is to enumerate and rank all β-sheet conformations for a given protein and select the one with the highest score. The problem with this solution is that the search space grows exponentially with the number of β-strands, so brute-force calculation over this conformational space runs into a combinatorial explosion with intractable computational complexity. The main contribution of this paper is to generate and search the problem space efficiently in order to reduce the time complexity. To achieve this, two tree structures, called sheet-tree and grouping-tree, are proposed. They model the search space by breaking it into sub-problems. Then, an advanced dynamic programming algorithm is proposed that stores intermediate results, avoids repeated calculation by reusing them efficiently in successive steps, and reduces the space requirement by discarding intermediate results that are no longer needed in later steps. As a consequence, two contributions have been made. First, more accurate β-sheet structures are found by searching all possible conformations; second, the time complexity of the problem is reduced by searching the problem space efficiently, which makes the proposed method applicable to predicting β-sheet structures with a high number of β-strands.
Experimental results on the BetaSheet916 dataset showed significant improvements of the proposed method in both execution time and prediction accuracy compared with state-of-the-art β-sheet structure prediction methods. Moreover, we investigate the effect of different contact map predictors on the performance of the proposed method using the BetaSheet1452 dataset. The source code is available at http://www.conceptsgate.com/BetaTop.rar.
Affiliation(s)
- Mostafa Sabzekar
- Department of Computer Engineering, Ferdowsi University of Mashhad, Mashhad, Iran
- Mahmoud Naghibzadeh
- Department of Computer Engineering, Ferdowsi University of Mashhad, Mashhad, Iran
- Mahdie Eghdami
- Department of Computer Engineering, Ferdowsi University of Mashhad, Mashhad, Iran
- Zafer Aydin
- Department of Computer Engineering, Abdullah Gul University, Kayseri, Turkey
65
Wang S, Sun S, Xu J. Analysis of deep learning methods for blind protein contact prediction in CASP12. Proteins 2017; 86 Suppl 1:67-77. [PMID: 28845538 DOI: 10.1002/prot.25377] [Citation(s) in RCA: 61] [Impact Index Per Article: 7.6] [Received: 06/14/2017] [Revised: 08/18/2017] [Accepted: 08/25/2017] [Indexed: 11/08/2022]
Abstract
Here we present the results of protein contact prediction achieved in CASP12 by our RaptorX-Contact server, which is an early implementation of our deep learning method for contact prediction. On a set of 38 free-modeling target domains with a median family size of around 58 effective sequences, our server obtained an average top L/5 long- and medium-range contact accuracy of 47% and 44%, respectively (L = sequence length). A complete implementation has an average accuracy of 59% and 57%, respectively. Our deep learning method formulates contact prediction as a pixel-level image labeling problem and simultaneously predicts all residue pairs of a protein using a combination of two deep residual neural networks, taking as input residue conservation information, predicted secondary structure and solvent accessibility, contact potential, and coevolution information. Our approach differs from existing methods mainly in (1) formulating contact prediction as a pixel-level image labeling problem instead of an image-level classification problem; (2) simultaneously predicting all contacts of an individual protein to make effective use of contact occurrence patterns; and (3) integrating both one-dimensional and two-dimensional deep convolutional neural networks to effectively learn complex sequence-structure relationships, including high-order residue correlation. This paper discusses the RaptorX-Contact pipeline, both contact prediction and contact-based folding results, and finally the strengths and weaknesses of our method.
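The "top L/5 long-range contact accuracy" figures in this and later entries are a precision metric: rank predicted residue pairs by score, keep the top L/k pairs in a given sequence-separation range, and count how many are true contacts. The sketch below is a minimal illustration of that metric, not RaptorX code. It assumes the usual CASP conventions (long range means |i - j| >= 24, medium range 12 to 23) and that native contacts are supplied as a precomputed set of pairs; the function name `top_lk_precision` and the toy data are hypothetical.

```python
from typing import Dict, Optional, Set, Tuple

Pair = Tuple[int, int]  # residue indices (i, j), i < j


def top_lk_precision(
    scores: Dict[Pair, float],
    native: Set[Pair],
    length: int,
    k: int = 5,
    min_sep: int = 24,
    max_sep: Optional[int] = None,
) -> float:
    """Fraction of the top L/k scored pairs in a separation range that are native contacts.

    Defaults assume the CASP long-range definition (|i - j| >= 24); for
    medium range use min_sep=12, max_sep=23.
    """
    def in_range(p: Pair) -> bool:
        sep = abs(p[1] - p[0])
        return sep >= min_sep and (max_sep is None or sep <= max_sep)

    # Keep only pairs in the requested separation range, ranked by score (highest first).
    ranked = sorted(
        ((s, p) for p, s in scores.items() if in_range(p)),
        key=lambda sp: sp[0],
        reverse=True,
    )
    # Evaluate the top L/k predictions (at least one).
    top = [p for _, p in ranked[: max(1, length // k)]]
    if not top:
        return 0.0
    # If fewer than L/k pairs are available, divide by the number actually evaluated.
    return sum(p in native for p in top) / len(top)


# Toy example: L = 30, so the top L/10 = 3 long-range pairs are evaluated;
# (5, 10) is ignored because its separation is only 5.
scores = {(0, 25): 0.9, (1, 27): 0.8, (2, 29): 0.7, (3, 28): 0.2, (5, 10): 0.95}
native = {(0, 25), (2, 29)}
print(top_lk_precision(scores, native, length=30, k=10))  # 2/3: two of the top three pairs are native
```

Treating the denominator as the number of pairs actually evaluated is one of several conventions; an assessment may instead divide by L/k even when fewer predictions are submitted.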
Affiliation(s)
- Sheng Wang
- Toyota Technological Institute at Chicago, Chicago, Illinois
- Siqi Sun
- Toyota Technological Institute at Chicago, Chicago, Illinois
- Jinbo Xu
- Toyota Technological Institute at Chicago, Chicago, Illinois
66
Jing X, Dong Q, Lu R. RRCRank: a fusion method using rank strategy for residue-residue contact prediction. BMC Bioinformatics 2017; 18:390. [PMID: 28865433 PMCID: PMC5581475 DOI: 10.1186/s12859-017-1811-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 05/04/2017] [Accepted: 08/28/2017] [Indexed: 11/10/2022] Open
Abstract
Background In structural biology, protein residue-residue contacts play a crucial role in protein structure prediction. Previous studies have found that predicted residue-residue contacts can effectively constrain the conformational search space, which is significant for de novo protein structure prediction. Over the last few decades, researchers have developed various methods to predict residue-residue contacts; in recent years, fusion methods in particular have achieved significant performance. In this work, a novel fusion method based on a rank strategy is proposed to predict contacts. Unlike traditional regression or classification strategies, it treats contact prediction as a ranking task. First, two kinds of features are extracted from correlated mutation methods and ensemble machine-learning classifiers; the proposed method then uses a learning-to-rank algorithm to predict the contact probability of each residue pair. Results First, we perform two benchmark tests of the proposed fusion method (RRCRank) on the CASP11 and CASP12 datasets, respectively. The test results show that RRCRank outperforms other well-developed methods, especially for medium- and short-range contacts. Second, to verify the superiority of the ranking strategy, we predict contacts using the traditional regression and classification strategies based on the same features as the ranking strategy. Compared with these two traditional strategies, the proposed ranking strategy shows better performance for all three contact types, in particular for long-range contacts. Third, RRCRank has been compared with several state-of-the-art methods in CASP11 and CASP12. The results show that RRCRank achieves comparable prediction precision and outperforms three of these methods on most assessment metrics.
Conclusions A learning-to-rank algorithm is introduced to develop a novel rank-based method for residue-residue contact prediction, which achieves state-of-the-art performance in an extensive assessment. Electronic supplementary material: The online version of this article (10.1186/s12859-017-1811-9) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Xiaoyang Jing
- School of Computer Science, Fudan University, Shanghai, 200433, People's Republic of China
- Qiwen Dong
- School of Data Science and Engineering, East China Normal University, Shanghai, 200062, People's Republic of China
- Ruqian Lu
- School of Computer Science, Fudan University, Shanghai, 200433, People's Republic of China
67
Wang S, Ma J, Xu J. AUCpreD: proteome-level protein disorder prediction by AUC-maximized deep convolutional neural fields. Bioinformatics 2017; 32:i672-i679. [PMID: 27587688 DOI: 10.1093/bioinformatics/btw446] [Citation(s) in RCA: 89] [Impact Index Per Article: 11.1] [Indexed: 12/17/2022] Open
Abstract
MOTIVATION Protein intrinsically disordered regions (IDRs) play an important role in many biological processes. Two key properties of IDRs are that (i) they occur proteome-wide and (ii) the fraction of disordered residues is only about 6%, which makes IDRs challenging to predict accurately. Most IDR prediction methods use sequence profiles to improve accuracy, which prevents their application to proteome-wide prediction because generating sequence profiles is time-consuming. On the other hand, methods that do not use sequence profiles fare much worse than those that do. METHOD This article formulates IDR prediction as a sequence labeling problem and employs a new machine learning method called Deep Convolutional Neural Fields (DeepCNF) to solve it. DeepCNF is an integration of deep convolutional neural networks (DCNN) and conditional random fields (CRF); it can model not only complex sequence-structure relationships in a hierarchical manner, but also correlation among adjacent residues. To deal with the highly imbalanced order/disorder ratio, instead of training DeepCNF by the widely used maximum-likelihood criterion, we develop a novel approach that trains it by maximizing the area under the ROC curve (AUC), which is an unbiased measure for class-imbalanced data. RESULTS Our experimental results show that our IDR prediction method AUCpreD outperforms existing popular disorder predictors. More importantly, AUCpreD works very well even without sequence profiles, comparing favorably to or even outperforming many methods that use sequence profiles. Therefore, our method works for proteome-wide disorder prediction while yielding similar or better accuracy than the others. AVAILABILITY AND IMPLEMENTATION http://raptorx2.uchicago.edu/StructurePropertyPred/predict/ CONTACT wangsheng@uchicago.edu, jinboxu@gmail.com SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
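Why AUC suits the roughly 6% disorder rate can be seen from its rank-statistic interpretation: AUC is the probability that a randomly chosen positive (disordered) residue scores higher than a randomly chosen negative (ordered) one, so it does not depend on the ratio of positives to negatives. The sketch below only evaluates AUC this way; it does not reproduce AUCpreD's AUC-maximizing training of DeepCNF. The function name `roc_auc` and the toy labels are hypothetical.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    labels: 1 = disordered (positive), 0 = ordered (negative).
    Positive/negative ties count as 0.5. O(n_pos * n_neg), for illustration only.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Imbalanced toy labels: 1 disordered residue out of 20 (5%).
labels = [1] + [0] * 19
# A constant "everything is ordered" predictor is 95% accurate but has AUC 0.5.
print(roc_auc([0.0] * 20, labels))  # 0.5
```

The example shows the point made in the abstract: plain accuracy rewards a trivial predictor on imbalanced data, while AUC scores it as no better than chance.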
Affiliation(s)
- Sheng Wang
- Toyota Technological Institute at Chicago, Chicago, IL, USA; Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Jianzhu Ma
- Toyota Technological Institute at Chicago, Chicago, IL, USA
- Jinbo Xu
- Toyota Technological Institute at Chicago, Chicago, IL, USA
68
Xiong D, Zeng J, Gong H. A deep learning framework for improving long-range residue–residue contact prediction using a hierarchical strategy. Bioinformatics 2017; 33:2675-2683. [DOI: 10.1093/bioinformatics/btx296] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.5] [Received: 10/18/2016] [Accepted: 05/02/2017] [Indexed: 12/31/2022] Open
Affiliation(s)
- Dapeng Xiong
- MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing, China
- Beijing Innovation Center of Structural Biology, Tsinghua University, Beijing, China
- Jianyang Zeng
- Beijing Innovation Center of Structural Biology, Tsinghua University, Beijing, China
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
- Haipeng Gong
- MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing, China
- Beijing Innovation Center of Structural Biology, Tsinghua University, Beijing, China
69
Simkovic F, Ovchinnikov S, Baker D, Rigden DJ. Applications of contact predictions to structural biology. IUCrJ 2017; 4:291-300. [PMID: 28512576 PMCID: PMC5414403 DOI: 10.1107/s2052252517005115] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Received: 12/12/2016] [Accepted: 04/03/2017] [Indexed: 06/07/2023]
Abstract
Evolutionary pressure on residue interactions, intramolecular or intermolecular, that are important for protein structure or function can lead to covariance between the two positions. Recent methodological advances allow much more accurate contact predictions to be derived from this evolutionary covariance signal. The practical application of contact predictions has largely been confined to structural bioinformatics, yet, as this work seeks to demonstrate, the data can be of enormous value to the structural biologist working in X-ray crystallography, cryo-EM or NMR. Integrative structural bioinformatics packages such as Rosetta can already exploit contact predictions in a variety of ways. The contribution of contact predictions begins at construct design, where structural domains may need to be expressed separately and contact predictions can help to predict domain limits. Structure solution by molecular replacement (MR) benefits from contact predictions in diverse ways: in difficult cases, more accurate search models can be constructed using ab initio modelling when predictions are available, while intermolecular contact predictions can allow the construction of larger, oligomeric search models. Furthermore, MR using supersecondary motifs or large-scale screens against the PDB can exploit information, such as the parallel or antiparallel nature of any β-strand pairing in the target, that can be inferred from contact predictions. Contact information will be particularly valuable in the determination of lower resolution structures by helping to assign sequence register. In large complexes, contact information may allow the identity of a protein responsible for a certain region of density to be determined and then assist in the orientation of an available model within that density. In NMR, predicted contacts can provide long-range information to extend the upper size limit of the technique in a manner analogous but complementary to experimental methods.
Finally, predicted contacts can distinguish between biologically relevant interfaces and mere lattice contacts in a final crystal structure, and have potential in the identification of functionally important regions and in foreseeing the consequences of mutations.
Affiliation(s)
- Felix Simkovic
- Institute of Integrative Biology, University of Liverpool, Liverpool L69 7ZB, England
- Sergey Ovchinnikov
- Department of Biochemistry, University of Washington, Seattle, WA 98195, USA
- Institute for Protein Design, University of Washington, Seattle, WA 98195, USA
- Howard Hughes Medical Institute, University of Washington, Box 357370, Seattle, WA 98195, USA
- David Baker
- Department of Biochemistry, University of Washington, Seattle, WA 98195, USA
- Institute for Protein Design, University of Washington, Seattle, WA 98195, USA
- Howard Hughes Medical Institute, University of Washington, Box 357370, Seattle, WA 98195, USA
- Daniel J. Rigden
- Institute of Integrative Biology, University of Liverpool, Liverpool L69 7ZB, England
70
Chapman SD, Adami C, Wilke CO, B Kc D. The evolution of logic circuits for the purpose of protein contact map prediction. PeerJ 2017; 5:e3139. [PMID: 28439455 PMCID: PMC5398280 DOI: 10.7717/peerj.3139] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 06/29/2016] [Accepted: 03/02/2017] [Indexed: 11/20/2022] Open
Abstract
Predicting protein structure from sequence remains a major open problem in protein biochemistry. One component of predicting complete structures is the prediction of inter-residue contact patterns (contact maps). Here, we discuss protein contact map prediction by machine learning. We describe a novel method for contact map prediction that uses the evolution of logic circuits. These logic circuits operate on feature data and output whether two amino acids in a protein are in contact. We show that such a method is feasible, and in addition that evolution allows the logic circuits to be trained on the dataset in an unbiased manner, so that the method can be used both for contact map prediction and for selecting relevant features in a dataset.
Affiliation(s)
- Samuel D Chapman
- Department of Computational Science and Engineering, North Carolina A&T State University, Greensboro, NC, USA
- Christoph Adami
- Department of Microbiology and Molecular Genetics and Department of Physics and Astronomy, Michigan State University, East Lansing, MI, USA
- Claus O Wilke
- Department of Integrative Biology, The University of Texas at Austin, Austin, TX, USA
- Dukka B Kc
- Department of Computational Science and Engineering, North Carolina A&T State University, Greensboro, NC, USA
71
Huang W, Zeng X, Shi Y, Liu M. Functional characterization of human equilibrative nucleoside transporter 1. Protein Cell 2017; 8:284-295. [PMID: 27995448 PMCID: PMC5359181 DOI: 10.1007/s13238-016-0350-x] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.5] [Received: 10/10/2016] [Accepted: 11/04/2016] [Indexed: 12/15/2022] Open
Abstract
Equilibrative nucleoside transporters (ENTs), which facilitate cross-membrane transport of nucleosides and nucleoside-derived drugs, play an important role in the salvage pathways of nucleotide synthesis, cancer chemotherapy, and the treatment of viral infections. Functional characterization of ENTs at the molecular level remains technically challenging and hence scant. In this study, we report successful purification and biochemical characterization of human equilibrative nucleoside transporter 1 (hENT1) in vitro. The HEK293F-derived, recombinant hENT1 is homogeneous and functionally active in proteoliposome-based counterflow assays. hENT1 transports the substrate adenosine with a Km of 215 ± 34 µmol/L and a Vmax of 578 ± 23.4 nmol mg⁻¹ min⁻¹. Adenosine uptake by hENT1 is competitively inhibited by nitrobenzylmercaptopurine ribonucleoside (NBMPR), nucleosides, deoxynucleosides, and nucleoside-derived anti-cancer and anti-viral drugs. Binding of hENT1 to adenosine, deoxyadenosine, and adenine, measured by isothermal titration calorimetry, is in general agreement with the results of the competitive inhibition assays. These results validate hENT1 as a bona fide potential drug target and serve as a useful basis for future biophysical and structural studies.
Affiliation(s)
- Weiyun Huang
- Beijing Advanced Innovation Center for Structural Biology, Tsinghua-Peking Joint Center for Life Sciences, School of Life Sciences, Tsinghua University, Beijing, 100084, China
- Xin Zeng
- Beijing Advanced Innovation Center for Structural Biology, Tsinghua-Peking Joint Center for Life Sciences, School of Life Sciences, Tsinghua University, Beijing, 100084, China
- Yigong Shi
- Beijing Advanced Innovation Center for Structural Biology, Tsinghua-Peking Joint Center for Life Sciences, School of Life Sciences, Tsinghua University, Beijing, 100084, China
- Minhao Liu
- Beijing Advanced Innovation Center for Structural Biology, Tsinghua-Peking Joint Center for Life Sciences, School of Life Sciences, Tsinghua University, Beijing, 100084, China
72
Castelli M, Clementi N, Pfaff J, Sautto GA, Diotti RA, Burioni R, Doranz BJ, Dal Peraro M, Clementi M, Mancini N. A Biologically-validated HCV E1E2 Heterodimer Structural Model. Sci Rep 2017; 7:214. [PMID: 28303031 PMCID: PMC5428263 DOI: 10.1038/s41598-017-00320-7] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Received: 10/04/2016] [Accepted: 02/21/2017] [Indexed: 12/14/2022] Open
Abstract
The design of vaccine strategies and the development of drugs targeting the early stages of Hepatitis C virus (HCV) infection are hampered by the lack of structural information about its surface glycoproteins E1 and E2, the two constituents of the HCV entry machinery. Despite recent crystal structures of truncated forms of both proteins, a complete picture of the E1E2 complex is still missing. Here we combined deep computational analysis of E1E2 secondary, tertiary and quaternary structure with functional and immunological mutational analysis across E1E2 in order to propose an in silico model for the ectodomain of the E1E2 heterodimer. Our model describes E1-E2 ectodomain dimerization interfaces, provides a structural explanation of E1 and E2 immunogenicity, and sheds light on the molecular processes and disulfide bridge isomerization underlying the conformational changes required for fusion. Comprehensive alanine mutational analysis across 553 residues of E1E2 also identified the epitope maps of diverse mAbs and the disulfide connectivity underlying the native E1E2 conformation. The predicted structure reveals E1 and E2 structures in complex, thus representing a step towards the rational design of immunogens and drugs inhibiting HCV entry.
Affiliation(s)
- Matteo Castelli
- Laboratory of Microbiology and Virology, Università "Vita-Salute" San Raffaele, Via Olgettina 58, 20132, Milano, Italy
- Nicola Clementi
- Laboratory of Microbiology and Virology, Università "Vita-Salute" San Raffaele, Via Olgettina 58, 20132, Milano, Italy
- Jennifer Pfaff
- Integral Molecular, 3711 Market St #900, Philadelphia, PA, 19104, USA
- Giuseppe A Sautto
- Laboratory of Microbiology and Virology, Università "Vita-Salute" San Raffaele, Via Olgettina 58, 20132, Milano, Italy
- Roberta A Diotti
- Laboratory of Microbiology and Virology, Università "Vita-Salute" San Raffaele, Via Olgettina 58, 20132, Milano, Italy
- Roberto Burioni
- Laboratory of Microbiology and Virology, Università "Vita-Salute" San Raffaele, Via Olgettina 58, 20132, Milano, Italy
- Benjamin J Doranz
- Integral Molecular, 3711 Market St #900, Philadelphia, PA, 19104, USA
- Matteo Dal Peraro
- Laboratory for Biomolecular Modeling, Institute of Bioengineering, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne, Route Cantonale, 1015, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Massimo Clementi
- Laboratory of Microbiology and Virology, Università "Vita-Salute" San Raffaele, Via Olgettina 58, 20132, Milano, Italy
- Nicasio Mancini
- Laboratory of Microbiology and Virology, Università "Vita-Salute" San Raffaele, Via Olgettina 58, 20132, Milano, Italy
73
Wang S, Sun S, Li Z, Zhang R, Xu J. Accurate De Novo Prediction of Protein Contact Map by Ultra-Deep Learning Model. PLoS Comput Biol 2017; 13:e1005324. [PMID: 28056090 PMCID: PMC5249242 DOI: 10.1371/journal.pcbi.1005324] [Citation(s) in RCA: 589] [Impact Index Per Article: 73.6] [Received: 09/14/2016] [Revised: 01/20/2017] [Accepted: 12/20/2016] [Indexed: 12/02/2022] Open
Abstract
Motivation Protein contacts contain key information for understanding protein structure and function; thus, contact prediction from sequence is an important problem. Recently, exciting progress has been made on this problem, but the predicted contacts for proteins without many sequence homologs are still of low quality and not very useful for de novo structure prediction. Method This paper presents a new deep learning method that predicts contacts by integrating both evolutionary coupling (EC) and sequence conservation information through an ultra-deep neural network formed by two deep residual neural networks. The first residual network conducts a series of 1-dimensional convolutional transformations of sequential features; the second residual network conducts a series of 2-dimensional convolutional transformations of pairwise information, including the output of the first residual network, EC information and pairwise potential. By using very deep residual networks, we can accurately model contact occurrence patterns and complex sequence-structure relationships and thus obtain higher-quality contact prediction regardless of how many sequence homologs are available for the proteins in question. Results Our method greatly outperforms existing methods and leads to much more accurate contact-assisted folding. Tested on 105 CASP11 targets, 76 past CAMEO hard targets, and 398 membrane proteins, the average top L long-range prediction accuracy obtained by our method, one representative EC method CCMpred and the CASP11 winner MetaPSICOV is 0.47, 0.21 and 0.30, respectively; the average top L/10 long-range accuracy of our method, CCMpred and MetaPSICOV is 0.77, 0.47 and 0.59, respectively. Ab initio folding using our predicted contacts as restraints but without any force fields can yield correct folds (i.e., TMscore>0.6) for 203 of the 579 test proteins, while folding with MetaPSICOV- and CCMpred-predicted contacts does so for only 79 and 62 of them, respectively.
Our contact-assisted models also have much better quality than template-based models, especially for membrane proteins. The 3D models built from our contact prediction have TMscore>0.5 for 208 of the 398 membrane proteins, while those from homology modeling have TMscore>0.5 for only 10 of them. Further, even though it was trained mostly on soluble proteins, our deep learning method works very well on membrane proteins. In the recent blind CAMEO benchmark, our fully automated web server implementing this method successfully folded 6 targets with a new fold and only 0.3L-2.3L effective sequence homologs, including one β protein of 182 residues, one α+β protein of 125 residues, one α protein of 140 residues, one α protein of 217 residues, one α/β protein of 260 residues and one α protein of 462 residues. Our method also achieved the highest F1 score on free-modeling targets in the latest CASP (Critical Assessment of Structure Prediction), although it was not fully implemented back then. Availability http://raptorx.uchicago.edu/ContactMap/
Protein contact prediction and contact-assisted folding have made good progress due to direct evolutionary coupling analysis (DCA). However, DCA is effective only on proteins with a very large number of sequence homologs. To further improve contact prediction, we borrow ideas from deep learning, which has recently revolutionized object recognition, speech recognition and the game of Go. Our deep learning method can model complex sequence-structure relationships and high-order correlation (i.e., contact occurrence patterns) and thus greatly improve contact prediction accuracy. Our test results show that our method greatly outperforms the state-of-the-art methods regardless of how many sequence homologs are available for a protein in question. Ab initio folding guided by our predicted contacts may fold many more test proteins than the other contact predictors.
Our contact-assisted 3D models also have much better quality than homology models built from the training proteins, especially for membrane proteins. One interesting finding is that even trained mostly with soluble proteins, our method performs very well on membrane proteins. Recent blind CAMEO test confirms that our method can fold large proteins with a new fold and only a small number of sequence homologs.
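The top L and top L/10 long-range accuracies quoted above are precision scores over the highest-ranked predicted residue pairs. A minimal sketch of that calculation follows, assuming common CASP conventions that the abstract does not spell out: a pair is long-range when the sequence separation j - i is at least 24, and the observed contacts (typically Cβ-Cβ distance below 8 Å) are supplied as a set. The function name `top_long_range_precision` and its input layout are hypothetical; this is not code from the RaptorX server.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[int, int]  # (i, j) residue indices with i < j


def top_long_range_precision(
    scores: Dict[Pair, float],
    true_contacts: Set[Pair],
    length: int,
    k: int = 1,
    min_sep: int = 24,
) -> float:
    """Precision of the top L/k long-range predicted contacts.

    Hypothetical helper. Assumes pairs are stored with i < j and that
    "long-range" means j - i >= min_sep (24 is the usual CASP cut-off).
    """
    n_top = max(1, length // k)
    # Keep only long-range candidates, then rank them by predicted score.
    long_range = [pair for pair in scores if pair[1] - pair[0] >= min_sep]
    ranked = sorted(long_range, key=lambda pair: scores[pair], reverse=True)
    selected = ranked[:n_top]
    if not selected:
        return 0.0
    # Denominator is the number of pairs actually evaluated (conventions vary).
    hits = sum(1 for pair in selected if pair in true_contacts)
    return hits / len(selected)


if __name__ == "__main__":
    predicted = {(0, 30): 0.9, (1, 40): 0.8, (2, 5): 0.99, (3, 50): 0.1}
    observed = {(0, 30), (3, 50)}
    # L = 20, k = 10: only the top 2 long-range pairs are scored; (2, 5) is short-range.
    print(top_long_range_precision(predicted, observed, length=20, k=10))  # 0.5
```

Passing k=10 gives the top L/10 variant. Because it scores only the most confident predictions, it is typically higher than the top L figure, which is consistent with the numbers reported above.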
Affiliation(s)
- Sheng Wang
- Toyota Technological Institute at Chicago, Chicago, Illinois, United States of America
- Siqi Sun
- Toyota Technological Institute at Chicago, Chicago, Illinois, United States of America
- Zhen Li
- Toyota Technological Institute at Chicago, Chicago, Illinois, United States of America
- Renyu Zhang
- Toyota Technological Institute at Chicago, Chicago, Illinois, United States of America
- Jinbo Xu
- Toyota Technological Institute at Chicago, Chicago, Illinois, United States of America
74
Rawi R, Mall R, Kunji K, El Anbari M, Aupetit M, Ullah E, Bensmail H. COUSCOus: improved protein contact prediction using an empirical Bayes covariance estimator. BMC Bioinformatics 2016; 17:533. [PMID: 27978812 PMCID: PMC5159955 DOI: 10.1186/s12859-016-1400-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 06/17/2016] [Accepted: 12/01/2016] [Indexed: 11/13/2022]
Abstract
Background
The post-genomic era, with its wealth of sequences, gave rise to a broad range of protein residue-residue contact detection methods. Although various coevolution methods such as PSICOV, DCA and plmDCA provide correct contact predictions, their predictions do not completely overlap. Hence, new approaches and improvements of existing methods are needed to spur further development and progress in the field. We present a new contact detection method, COUSCOus, which combines the best shrinkage approach, the empirical Bayes covariance estimator, with GLasso.
Results
Using the original PSICOV benchmark dataset, COUSCOus achieves mean accuracies of 0.74, 0.62 and 0.55 for the top L/10 predicted long-, medium- and short-range contacts, respectively. In addition, COUSCOus attains mean areas under the precision-recall curves of 0.25, 0.29 and 0.30 for long-, medium- and short-range contacts and outperforms PSICOV. We also observed that COUSCOus outperforms PSICOV with respect to the Matthews correlation coefficient on the full list of residue contacts. Furthermore, COUSCOus achieves on average 10% more gain in prediction accuracy than PSICOV on an independent test set composed of CASP11 protein targets. Finally, we show that in a simple random forest meta-classifier combining contact detection techniques and sequence-derived features, PSICOV predictions should be replaced by the more accurate COUSCOus predictions.
Conclusion
We conclude that adopting superior covariance shrinkage approaches will benefit several research fields that apply the GLasso procedure, including the residue-residue contact prediction presented here and fields such as gene network reconstruction.
Electronic supplementary material
The online version of this article (doi:10.1186/s12859-016-1400-3) contains supplementary material, which is available to authorized users.
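Shrinkage is the step COUSCOus changes. A sample covariance matrix estimated from an MSA is often singular or ill-conditioned, so GLasso-based methods such as PSICOV regularize it before estimating a sparse inverse. The sketch below shows generic linear shrinkage toward a scaled identity, (1 - λ)S + λμI with μ the mean variance, using a user-chosen intensity λ. This is an assumption for illustration only: it is not the empirical Bayes estimator used by COUSCOus, the GLasso step is omitted, and all helper names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def sample_covariance(data: Matrix) -> Matrix:
    """Unbiased sample covariance of the columns of `data` (rows are observations)."""
    n, p = len(data), len(data[0])
    if n < 2:
        raise ValueError("need at least two observations")
    means = [sum(row[j] for row in data) / n for j in range(p)]
    return [
        [
            sum((row[a] - means[a]) * (row[b] - means[b]) for row in data) / (n - 1)
            for b in range(p)
        ]
        for a in range(p)
    ]


def shrink_to_scaled_identity(cov: Matrix, lam: float) -> Matrix:
    """Generic linear shrinkage: (1 - lam) * S + lam * mu * I, mu = mean variance.

    Illustrative stand-in only; not the empirical Bayes estimator of COUSCOus.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("shrinkage intensity must lie in [0, 1]")
    p = len(cov)
    mu = sum(cov[i][i] for i in range(p)) / p
    return [
        [(1.0 - lam) * cov[a][b] + (lam * mu if a == b else 0.0) for b in range(p)]
        for a in range(p)
    ]


def det2(m: Matrix) -> float:
    """Determinant of a 2 x 2 matrix, used here to check invertibility."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


if __name__ == "__main__":
    # Two perfectly correlated columns: the sample covariance is singular.
    S = sample_covariance([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
    print(det2(S))                                  # 0.0
    print(det2(shrink_to_scaled_identity(S, 0.5)))  # 4.6875
```

In the example the two columns are perfectly correlated, so S cannot be inverted; after shrinkage with λ = 0.5 the determinant is positive and an inverse (or a GLasso fit) becomes well defined. How λ is chosen is exactly where shrinkage estimators differ.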
Affiliation(s)
- Reda Rawi
- Computational Science and Engineering, Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar
- Raghvendra Mall
- Computational Science and Engineering, Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar
- Khalid Kunji
- Computational Science and Engineering, Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar
- Mohammed El Anbari
- Division of Biomedical Informatics, Sidra Medical and Research Center, Doha, Qatar
- Michael Aupetit
- Computational Science and Engineering, Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar
- Ehsan Ullah
- Computational Science and Engineering, Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar
- Halima Bensmail
- Computational Science and Engineering, Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar
75
Levy RM, Haldane A, Flynn WF. Potts Hamiltonian models of protein co-variation, free energy landscapes, and evolutionary fitness. Curr Opin Struct Biol 2016; 43:55-62. [PMID: 27870991 DOI: 10.1016/j.sbi.2016.11.004] [Citation(s) in RCA: 69] [Impact Index Per Article: 7.7] [Received: 08/19/2016] [Accepted: 11/03/2016] [Indexed: 11/17/2022]
Abstract
Potts Hamiltonian models of protein sequence co-variation are statistical models constructed from the pair correlations observed in a multiple sequence alignment (MSA) of a protein family. These models are powerful because they capture higher order correlations induced by mutations evolving under constraints and help quantify the connections between protein sequence, structure, and function maintained through evolution. We review recent work with Potts models to predict protein structure and sequence-dependent conformational free energy landscapes, to survey protein fitness landscapes and to explore the effects of epistasis on fitness. We also comment on the numerical methods used to infer these models for each application.
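A Potts model assigns each aligned sequence an energy built from single-site fields h_i and pairwise couplings J_ij inferred from the MSA. A minimal sketch of evaluating that energy follows, assuming the convention H(s) = -Σ_i h_i(s_i) - Σ_{i<j} J_ij(s_i, s_j) with probability proportional to exp(-H(s)); sign conventions differ between groups. The dictionary layout and function names are hypothetical, and inference of h and J (the numerically hard part the review discusses) is not shown.

```python
import math
from itertools import combinations
from typing import Dict, Sequence, Tuple

# h[i][a]: field for residue a at position i.
Fields = Sequence[Dict[str, float]]
# J[(i, j)][(a, b)]: coupling between residue a at i and residue b at j, with i < j.
Couplings = Dict[Tuple[int, int], Dict[Tuple[str, str], float]]


def potts_energy(seq: str, h: Fields, J: Couplings) -> float:
    """H(s) = -sum_i h_i(s_i) - sum_{i<j} J_ij(s_i, s_j); missing entries count as 0."""
    if len(seq) != len(h):
        raise ValueError("sequence length must match the number of field positions")
    energy = -sum(h[i].get(a, 0.0) for i, a in enumerate(seq))
    for i, j in combinations(range(len(seq)), 2):
        energy -= J.get((i, j), {}).get((seq[i], seq[j]), 0.0)
    return energy


def relative_weight(seq_a: str, seq_b: str, h: Fields, J: Couplings) -> float:
    """P(seq_a) / P(seq_b) when P(s) is proportional to exp(-H(s))."""
    # The partition function cancels in the ratio, so it is never computed.
    return math.exp(-(potts_energy(seq_a, h, J) - potts_energy(seq_b, h, J)))


if __name__ == "__main__":
    h = [{"A": 1.0, "C": 0.5}, {"A": 0.2, "C": 0.0}]
    J = {(0, 1): {("A", "C"): 2.0}}
    print(potts_energy("AC", h, J))  # -3.0
    print(relative_weight("AC", "CA", h, J))
```

Comparing sequences through energy differences is what makes these models usable for fitness and mutation-effect estimates: the normalizing constant over all 20^L sequences never has to be evaluated.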
Affiliation(s)
- Ronald M Levy
- Center for Biophysics and Computational Biology, Department of Chemistry, and Institute for Computational Molecular Science, Temple University, Philadelphia, PA 19122, United States
- Allan Haldane
- Center for Biophysics and Computational Biology, Department of Chemistry, and Institute for Computational Molecular Science, Temple University, Philadelphia, PA 19122, United States
- William F Flynn
- Center for Biophysics and Computational Biology, Department of Chemistry, and Institute for Computational Molecular Science, Temple University, Philadelphia, PA 19122, United States; Department of Physics and Astronomy, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, United States
76
Du T, Liao L, Wu CH. Enhancing interacting residue prediction with integrated contact matrix prediction in protein-protein interaction. EURASIP Journal on Bioinformatics & Systems Biology 2016; 2016:17. [PMID: 27818677 PMCID: PMC5075339 DOI: 10.1186/s13637-016-0051-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 05/02/2016] [Accepted: 09/25/2016] [Indexed: 11/10/2022]
Abstract
Identifying the residues in a protein that are involved in protein-protein interaction and identifying the contact matrix for a pair of interacting proteins are two computational tasks at different levels of an in-depth analysis of protein-protein interaction. Various methods for solving these two problems have been reported in the literature. However, interacting residue prediction and contact matrix prediction were handled largely independently in those existing methods, even though, intuitively, good prediction of interacting residues should help with predicting the contact matrix. In this work, we developed a novel protein interacting residue prediction system, contact matrix-interaction profile hidden Markov model (CM-ipHMM), which integrates contact matrix prediction with ipHMM interacting residue prediction. We propose to leverage what is learned from contact matrix prediction and to use the predicted contact matrix as "feedback" to enhance interacting residue prediction. The CM-ipHMM model showed significant improvement over the previous method that uses the ipHMM for predicting interacting residues only. This indicates that the downstream contact matrix prediction could help interaction site prediction.
Affiliation(s)
- Tianchuan Du
- Department of Computer and Information Sciences, University of Delaware, Newark, DE 19716 USA
- Li Liao
- Department of Computer and Information Sciences, University of Delaware, Newark, DE 19716 USA
- Cathy H Wu
- Department of Computer and Information Sciences, University of Delaware, Newark, DE 19716 USA
77
Li Q, Dahl DB, Vannucci M, Joo H, Tsai JW. KScons: a Bayesian approach for protein residue contact prediction using the knob-socket model of protein tertiary structure. Bioinformatics 2016; 32:3774-3781. [PMID: 27559156 DOI: 10.1093/bioinformatics/btw553] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 03/01/2016] [Revised: 07/15/2016] [Accepted: 08/18/2016] [Indexed: 11/13/2022]
Abstract
Motivation
By simplifying the many-body complexity of residue packing into patterns of simple pairwise secondary structure interactions between a single knob residue and a three-residue socket, the knob-socket construct allows a more direct incorporation of structural information into the prediction of residue contacts. By modeling the preferences between the amino acid composition of a socket and a knob, we investigate the knob-socket construct's ability to improve the prediction of residue contacts. The statistical model considers three priors and two posterior estimations to better understand how the input data affect predictions. This produces six implementations of KScons, which are tested on three sets: PSICOV, CASP10 and CASP11. We compare against the current leading contact prediction methods.
Results
The results demonstrate both the usefulness and the limits of knob-socket-based structural modeling of protein contacts. The construct is able to extract good predictions from known structural homologs, while its performance degrades when no homologs exist. Among our six implementations, KScons MST-MP (which uses the multiple structure alignment prior and marginal posterior incorporating structural homolog information) performs best in all three prediction sets. An analysis of recall and precision finds that KScons MST-MP improves accuracy not only by identifying more true positives, but also by decreasing the number of false positives. Over the CASP10 and CASP11 sets, KScons MST-MP performs better than the leading methods using only evolutionary coupling data, but not quite as well as the supervised learning methods MetaPSICOV and CoinDCA-NN, which incorporate a large set of structural features.
Contact
qiwei.li@rice.edu
Supplementary information
Supplementary data are available at Bioinformatics online.
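The precision/recall analysis mentioned above can be illustrated with sets of predicted and observed contact pairs. The sketch below computes precision, recall and F1; the pair indexing and the function name are hypothetical, and this is not KScons code.

```python
from typing import Set, Tuple

Pair = Tuple[int, int]  # (i, j) residue indices with i < j


def precision_recall_f1(
    predicted: Set[Pair], observed: Set[Pair]
) -> Tuple[float, float, float]:
    """Precision, recall and F1 of a predicted contact set against observed contacts."""
    tp = len(predicted & observed)  # correctly predicted contacts
    fp = len(predicted - observed)  # predicted but not observed
    fn = len(observed - predicted)  # observed but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    observed = {(1, 10), (2, 20), (5, 50)}
    predicted = {(1, 10), (2, 20), (3, 30), (4, 40)}
    print(precision_recall_f1(predicted, observed))
    # Removing one false positive raises precision; recall is unchanged.
    print(precision_recall_f1(predicted - {(4, 40)}, observed))
```

Dropping a false positive raises precision (here from 0.5 to 2/3) while recall stays at 2/3, which is the second of the two sources of improvement the abstract attributes to KScons MST-MP.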
Affiliation(s)
- Qiwei Li
- Department of Statistics, Rice University, Houston, TX, USA
- David B Dahl
- Department of Statistics, Brigham Young University, Provo, UT, USA
- Hyun Joo
- Department of Chemistry, University of the Pacific, Stockton, CA, USA
- Jerry W Tsai
- Department of Chemistry, University of the Pacific, Stockton, CA, USA
78
Simkovic F, Thomas JMH, Keegan RM, Winn MD, Mayans O, Rigden DJ. Residue contacts predicted by evolutionary covariance extend the application of ab initio molecular replacement to larger and more challenging protein folds. IUCrJ 2016; 3:259-70. [PMID: 27437113 PMCID: PMC4937781 DOI: 10.1107/s2052252516008113] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 02/10/2016] [Accepted: 05/18/2016] [Indexed: 05/05/2023]
Abstract
For many protein families, the deluge of new sequence information, together with new statistical protocols, now allows the accurate prediction of contacting residues from sequence information alone. This offers the possibility of more accurate ab initio (non-homology-based) structure prediction. Such models can be used in structure solution by molecular replacement (MR) where the target fold is novel or only distantly related to known structures. Here, AMPLE, an MR pipeline that assembles search-model ensembles from ab initio structure predictions ('decoys'), is employed to assess the value of contact-assisted ab initio models to the crystallographer. It is demonstrated that evolutionary covariance-derived residue-residue contact predictions improve the quality of ab initio models and, consequently, the success rate of MR using search models derived from them. For targets containing β-structure, decoy quality and MR performance were further improved by the use of a β-strand contact-filtering protocol. Such contact-guided decoys achieved 14 structure solutions from 21 attempted protein targets, compared with nine for simple Rosetta decoys. Previously encountered limitations were overcome in two key respects. Firstly, much larger targets of up to 221 residues in length were solved, far larger than the previously benchmarked threshold of 120 residues. Secondly, contact-guided decoys significantly improved success with β-sheet-rich proteins. Overall, the improved performance of contact-guided decoys suggests that MR is now applicable to a significantly wider range of protein targets than were previously tractable, and points to a direct benefit to structural biology from the recent remarkable advances in sequencing.
Affiliation(s)
- Felix Simkovic
- Institute of Integrative Biology, University of Liverpool, Liverpool L69 7ZB, England
- Jens M. H. Thomas
- Institute of Integrative Biology, University of Liverpool, Liverpool L69 7ZB, England
- Ronan M. Keegan
- Research Complex at Harwell, STFC Rutherford Appleton Laboratory, Didcot OX11 0FA, England
- Martyn D. Winn
- Science and Technology Facilities Council, Daresbury Laboratory, Warrington WA4 4AD, England
- Olga Mayans
- Institute of Integrative Biology, University of Liverpool, Liverpool L69 7ZB, England
- Daniel J. Rigden
- Institute of Integrative Biology, University of Liverpool, Liverpool L69 7ZB, England
79
Wang S, Li W, Zhang R, Liu S, Xu J. CoinFold: a web server for protein contact prediction and contact-assisted protein folding. Nucleic Acids Res 2016; 44:W361-6. [PMID: 27112569 PMCID: PMC4987891 DOI: 10.1093/nar/gkw307] [Citation(s) in RCA: 43] [Impact Index Per Article: 4.8] [Received: 03/06/2016] [Accepted: 04/12/2016] [Indexed: 12/14/2022]
Abstract
CoinFold (http://raptorx2.uchicago.edu/ContactMap/) is a web server for protein contact prediction and contact-assisted de novo structure prediction. CoinFold predicts contacts by integrating joint multi-family evolutionary coupling (EC) analysis and supervised machine learning. This joint EC analysis is unique in that it uses residue coevolution information not only from the target protein family but also from related families, which may have divergent sequences but similar folds. The supervised learning further improves contact prediction accuracy by making use of the sequence profile, contact (distance) potential and other information. Finally, the server predicts the tertiary structure of a sequence by feeding its predicted contacts and secondary structure to the CNS suite. Tested on CASP and CAMEO targets, the server shows significant advantages over existing servers of a similar category in both contact and tertiary structure prediction.
Affiliation(s)
- Sheng Wang
- Toyota Technological Institute at Chicago, Chicago, IL, USA; Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Wei Li
- School of Biological and Chemical Engineering, Zhejiang University of Science and Technology, Zhejiang, China
- Renyu Zhang
- Toyota Technological Institute at Chicago, Chicago, IL, USA
- Shiwang Liu
- School of Biological and Chemical Engineering, Zhejiang University of Science and Technology, Zhejiang, China
- Jinbo Xu
- Toyota Technological Institute at Chicago, Chicago, IL, USA
80
Schneider M, Belsom A, Rappsilber J, Brock O. Blind testing of cross-linking/mass spectrometry hybrid methods in CASP11. Proteins 2016; 84 Suppl 1:152-63. [PMID: 26945814 PMCID: PMC5042049 DOI: 10.1002/prot.25028] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Received: 10/21/2015] [Revised: 02/09/2016] [Accepted: 02/27/2016] [Indexed: 12/26/2022]
Abstract
Hybrid approaches combine computational methods with experimental data. The information contained in the experimental data can be leveraged to probe the structure of proteins otherwise elusive to computational methods. Compared with purely computational methods, the structures produced by hybrid methods exhibit some degree of experimental validation. In spite of these advantages, most hybrid methods have not yet been validated in blind tests, hampering their development. Here, we describe the first blind test of a specific cross-link-based hybrid method in CASP. This blind test was coordinated by the CASP organizers and used a novel, high-density cross-linking/mass spectrometry (CLMS) approach that is able to collect high-density CLMS data in a matter of days. This experimental protocol was developed in the Rappsilber laboratory. The approach exploits the chemistry of a highly reactive, photoactivatable cross-linker to produce an order of magnitude more cross-links than homobifunctional cross-linkers. The Rappsilber laboratory generated experimental CLMS data based on this protocol and submitted the data to the CASP organizers, who then released them to the CASP11 prediction groups in a separate, CLMS-assisted modeling experiment. We did not observe a clear improvement in the assisted models, presumably because the properties of the CLMS data (uncertainty in cross-link identification and residue-residue assignment, and uneven distribution over the protein) were largely unknown to the prediction groups, and their approaches were not yet tailored to this kind of data. We also suggest modifications to the CLMS-CASP experiment and discuss the importance of rigorous blind testing in the development of hybrid methods. Proteins 2016; 84(Suppl 1):152-163. © 2016 The Authors. Proteins: Structure, Function, and Bioinformatics published by Wiley Periodicals, Inc.
Affiliation(s)
- Michael Schneider
- Robotics and Biology Laboratory, Technische Universität Berlin, 10587, Berlin, Germany
- Adam Belsom
- Wellcome Trust Centre for Cell Biology, University of Edinburgh, Edinburgh, EH9 3BF, United Kingdom
- Juri Rappsilber
- Wellcome Trust Centre for Cell Biology, University of Edinburgh, Edinburgh, EH9 3BF, United Kingdom; Department of Bioanalytics, Institute of Biotechnology, Technische Universität Berlin, 13355, Berlin, Germany
- Oliver Brock
- Robotics and Biology Laboratory, Technische Universität Berlin, 10587, Berlin, Germany