1
|
Cai J, Wang T, Deng X, Tang L, Liu L. GM-lncLoc: LncRNAs subcellular localization prediction based on graph neural network with meta-learning. BMC Genomics 2023; 24:52. [PMID: 36709266 PMCID: PMC9883864 DOI: 10.1186/s12864-022-09034-1] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Received: 06/29/2022] [Accepted: 11/21/2022] [Indexed: 01/29/2023]
Abstract
In recent years, many studies have shown that the subcellular localization of long non-coding RNAs (lncRNAs) provides crucial information for understanding lncRNA function. It is therefore important to establish a computational method that accurately predicts lncRNA subcellular localization. Previous prediction models rely on low-level sequence information and suffer from the few-sample problem. In this study, we propose a new prediction model, GM-lncLoc, which starts from initial information extracted from the lncRNA sequence and combines it with graph structure information to extract high-level features of lncRNAs. In addition, a meta-learning training mode is introduced to obtain meta-parameters by training on a series of tasks. With these meta-parameters, the final parameters of other similar tasks can be learned quickly, which addresses the few-sample problem in lncRNA subcellular localization. Compared with previous methods, GM-lncLoc achieved the best results, with accuracies of 93.4% and 94.2% on the benchmark datasets with 5 and 4 subcellular compartments, respectively. The prediction performance of GM-lncLoc was also better on the independent dataset. These results show the effectiveness and potential of the proposed method for lncRNA subcellular localization prediction. The datasets and source code are freely available at https://github.com/JunzheCai/GM-lncLoc.
Affiliation(s)
- Junzhe Cai
  - School of Information, Yunnan Normal University, Kunming, Yunnan, China (GRID: grid.410739.8; ISNI: 0000 0001 0723 6903)
- Ting Wang
  - School of Information, Yunnan Normal University, Kunming, Yunnan, China (GRID: grid.410739.8; ISNI: 0000 0001 0723 6903)
- Xi Deng
  - School of Information, Yunnan Normal University, Kunming, Yunnan, China (GRID: grid.410739.8; ISNI: 0000 0001 0723 6903)
- Lin Tang
  - Key Laboratory of Educational Information for Nationalities Ministry of Education, Yunnan Normal University, Kunming, Yunnan, China (GRID: grid.410739.8; ISNI: 0000 0001 0723 6903)
- Lin Liu
  - School of Information, Yunnan Normal University, Kunming, Yunnan, China (GRID: grid.410739.8; ISNI: 0000 0001 0723 6903)
|
2
|
Gaur NK, Goyal VD, Kulkarni K, Makde RD. Machine learning classifiers aid virtual screening for efficient design of mini-protein therapeutics. Bioorg Med Chem Lett 2021; 38:127852. [PMID: 33609660 DOI: 10.1016/j.bmcl.2021.127852] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 12/12/2020] [Revised: 02/01/2021] [Accepted: 02/05/2021] [Indexed: 11/15/2022]
Abstract
De novo design of mini-proteins (4-12 kDa) has recently been shown to produce new candidates for protein therapeutics. They are temperature-stable molecules that bind to the drug target with high affinity and thereby inhibit its interactions. The development of mini-protein binders requires laboratory screening of tens of thousands of molecules for effective target binding. In this study we trained machine learning classifiers that can distinguish, with 90% accuracy and 80% precision, mini-protein binders from non-binding molecules designed for a particular target; this significantly reduces the number of mini-protein candidates for experimental screening. Further, on the basis of our results we propose a multi-stage protocol in which a small dataset (a few hundred experimentally verified target-specific mini-proteins) can be used to train classifiers that improve the efficiency of mini-protein design for any specific target.
Affiliation(s)
- Neeraj K Gaur
  - Beamline Development and Application Section, Bhabha Atomic Research Centre, Mumbai 400085, India; Division of Biochemical Sciences, CSIR-National Chemical Laboratory, Pune 411008, India
- Venuka Durani Goyal
  - Beamline Development and Application Section, Bhabha Atomic Research Centre, Mumbai 400085, India
- Kiran Kulkarni
  - Division of Biochemical Sciences, CSIR-National Chemical Laboratory, Pune 411008, India; Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, India
- Ravindra D Makde
  - Beamline Development and Application Section, Bhabha Atomic Research Centre, Mumbai 400085, India
|
3
|
Zhang Q, Zhu J, Ju F, Kong L, Sun S, Zheng WM, Bu D. ISSEC: inferring contacts among protein secondary structure elements using deep object detection. BMC Bioinformatics 2020; 21:503. [PMID: 33153432 PMCID: PMC7643357 DOI: 10.1186/s12859-020-03793-y] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 07/11/2020] [Accepted: 09/30/2020] [Indexed: 11/12/2022]
Abstract
BACKGROUND: The formation of contacts among protein secondary structure elements (SSEs) is an important step in protein folding because it determines the topology of the protein tertiary structure; hence, inferring inter-SSE contacts is crucial to protein structure prediction. One existing strategy infers inter-SSE contacts directly from the predicted probabilities of inter-residue contacts without any preprocessing, and thus suffers from the excessive noise in the predicted inter-residue contacts. Another strategy first defines SSEs based on protein secondary structure prediction, and then judges whether each candidate SSE pair could form a contact. However, it is difficult to determine the boundaries of SSEs accurately because of errors in secondary structure prediction. Incorrectly deduced SSEs hinder the subsequent prediction of the contacts among them. RESULTS: We report an accurate approach to infer inter-SSE contacts (hence the name ISSEC) using the deep object detection technique. The design of ISSEC is based on the observation that, in the inter-residue contact map, contacting SSEs usually form rectangular regions with characteristic patterns. ISSEC therefore infers inter-SSE contacts by detecting such rectangular regions. Unlike the existing approach that directly uses the predicted probabilities of inter-residue contacts, ISSEC applies deep convolution to extract high-level features from the inter-residue contacts. More importantly, ISSEC does not rely on pre-defined SSEs. Instead, ISSEC enumerates multiple candidate rectangular regions in the predicted inter-residue contact map and, for each region, calculates a confidence score that measures whether it shows the characteristic patterns. ISSEC employs a greedy strategy to select non-overlapping regions with high confidence scores, and finally infers inter-SSE contacts from these regions.
CONCLUSIONS: Comprehensive experimental results suggest that ISSEC outperforms state-of-the-art approaches in predicting inter-SSE contacts. We further demonstrate successful applications of ISSEC to improve the prediction of both inter-residue contacts and tertiary structure.
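The greedy selection step mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Region` type, the `min_score` threshold of 0.5 and the toy candidates are assumptions made for illustration, and the confidence scores are taken as given rather than computed by a detection network.

```python
from typing import List, NamedTuple


class Region(NamedTuple):
    """Hypothetical candidate rectangle in a contact map (inclusive bounds)."""
    row_start: int
    row_end: int
    col_start: int
    col_end: int
    score: float


def overlaps(a: Region, b: Region) -> bool:
    """Return True when the two rectangles share at least one cell."""
    rows = a.row_start <= b.row_end and b.row_start <= a.row_end
    cols = a.col_start <= b.col_end and b.col_start <= a.col_end
    return rows and cols


def greedy_select(candidates: List[Region], min_score: float = 0.5) -> List[Region]:
    """Keep the highest-scoring regions that overlap no region already kept."""
    kept: List[Region] = []
    for region in sorted(candidates, key=lambda r: r.score, reverse=True):
        if region.score < min_score:
            break  # every remaining candidate scores lower still
        if not any(overlaps(region, k) for k in kept):
            kept.append(region)
    return kept


# Toy candidates (assumed values, not data from the paper).
candidates = [
    Region(0, 9, 20, 29, 0.95),
    Region(5, 14, 25, 34, 0.80),   # overlaps the first region, so it is skipped
    Region(40, 49, 60, 69, 0.70),
    Region(70, 79, 90, 99, 0.30),  # below the assumed threshold
]
selected = greedy_select(candidates)
```

In this toy run the first and third candidates are retained. Sorting once and scanning in score order is the usual way to express such a greedy pass; the abstract does not specify how ties or partial overlaps are treated, so those details are left out here.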
Affiliation(s)
- Qi Zhang
  - Key Lab of Intelligent Information Processing, Big Data Academy, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190, China
  - School of Computer Science, University of Chinese Academy of Sciences, Beijing, China
- Jianwei Zhu
  - Key Lab of Intelligent Information Processing, Big Data Academy, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190, China
  - School of Computer Science, University of Chinese Academy of Sciences, Beijing, China
- Fusong Ju
  - Key Lab of Intelligent Information Processing, Big Data Academy, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190, China
  - School of Computer Science, University of Chinese Academy of Sciences, Beijing, China
- Lupeng Kong
  - Key Lab of Intelligent Information Processing, Big Data Academy, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190, China
  - School of Computer Science, University of Chinese Academy of Sciences, Beijing, China
- Shiwei Sun
  - Key Lab of Intelligent Information Processing, Big Data Academy, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190, China
  - School of Computer Science, University of Chinese Academy of Sciences, Beijing, China
- Wei-Mou Zheng
  - Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing, 100190, China
- Dongbo Bu
  - Key Lab of Intelligent Information Processing, Big Data Academy, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190, China
  - School of Computer Science, University of Chinese Academy of Sciences, Beijing, China
|
4
|
Zhang GJ, Wang XQ, Ma LF, Wang LJ, Hu J, Zhou XG. Two-Stage Distance Feature-based Optimization Algorithm for De novo Protein Structure Prediction. IEEE/ACM Trans Comput Biol Bioinform 2020; 17:2119-2130. [PMID: 31107659 DOI: 10.1109/tcbb.2019.2917452] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 06/09/2023]
Abstract
De novo protein structure prediction can be treated as a conformational space optimization problem guided by an energy function. However, it is challenging to design an accurate energy function that ensures low-energy conformations are close to native structures. Recent studies have shown that the accuracy of de novo protein structure prediction can be significantly improved by integrating residue-residue distance information. In this paper, a two-stage distance feature-based optimization algorithm (TDFO) for de novo protein structure prediction is proposed within the framework of an evolutionary algorithm. In TDFO, a similarity model is first designed using feature information extracted from distance profiles by the bisecting K-means algorithm. A similarity model-based selection strategy is then developed to guide the conformation search and thus improve the quality of the predicted models. Moreover, global and local mutation strategies are designed, and a state estimation strategy is proposed to balance exploration and exploitation of the search space. Experimental results on 35 benchmark proteins show that the proposed TDFO can improve prediction accuracy for a large portion of the test proteins.
|
5
|
Sun J, Frishman D. DeepHelicon: Accurate prediction of inter-helical residue contacts in transmembrane proteins by residual neural networks. J Struct Biol 2020; 212:107574. [PMID: 32663598 DOI: 10.1016/j.jsb.2020.107574] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 04/10/2020] [Revised: 07/03/2020] [Accepted: 07/07/2020] [Indexed: 01/16/2023]
Abstract
Accurate prediction of amino acid residue contacts is an important prerequisite for generating high-quality 3D models of transmembrane (TM) proteins. While a large number of compositional, evolutionary, and structural properties of proteins can be used to train contact prediction methods, recent research suggests that coevolution between residues provides the strongest indication of their spatial proximity. We have developed a deep learning approach, DeepHelicon, to predict inter-helical residue contacts in TM proteins by considering only coevolutionary features. DeepHelicon comprises a two-stage supervised learning process by residual neural networks for a gradual refinement of contact maps, followed by variance reduction by an ensemble of models. We present a benchmark study of 12 contact predictors and conclude that DeepHelicon together with the two other state-of-the-art methods DeepMetaPSICOV and Membrain2 outperforms the 10 remaining algorithms on all datasets and at all settings. On a set of 44 TM proteins with an average length of 388 residues DeepHelicon achieves the best performance among all benchmarked methods in predicting the top L/5 and L/2 inter-helical contacts, with the mean precision of 87.42% and 77.84%, respectively. On a set of 57 relatively small TM proteins with an average length of 298 residues DeepHelicon ranks second best after DeepMetaPSICOV. DeepHelicon produces the most accurate predictions for large proteins with more than 10 transmembrane helices. Coevolutionary features alone allow to predict inter-helical residue contacts with an accuracy sufficient for generating acceptable 3D models for up to 30% of proteins using a fully automated modeling method such as CONFOLD2.
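The "top L/5" and "top L/2" precision figures quoted in this abstract follow a common contact-prediction convention that can be sketched as below. This is an illustrative sketch only: it assumes L is the sequence length (the abstract does not define it), and the function name, the residue-pair scores and the true contact set are hypothetical.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[int, int]


def top_l_over_k_precision(
    scores: Dict[Pair, float],
    true_contacts: Set[Pair],
    length: int,
    k: int,
) -> float:
    """Precision among the length // k highest-scoring predicted residue pairs."""
    n_top = max(1, length // k)
    ranked = sorted(scores, key=scores.get, reverse=True)[:n_top]
    if not ranked:
        return 0.0  # no predictions at all
    hits = sum(1 for pair in ranked if pair in true_contacts)
    return hits / len(ranked)


# Toy example: a 10-residue protein, so top L/5 keeps the 2 best-scored pairs.
scores = {(1, 8): 0.9, (2, 7): 0.8, (3, 6): 0.4, (1, 5): 0.2}
true_contacts = {(1, 8), (3, 6)}
precision_l5 = top_l_over_k_precision(scores, true_contacts, length=10, k=5)
precision_l2 = top_l_over_k_precision(scores, true_contacts, length=10, k=2)
```

When fewer than L/k pairs are scored, this sketch divides by the number of pairs actually ranked; some benchmarks divide by L/k instead, which is a design choice worth checking against the evaluation protocol being reproduced.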
Affiliation(s)
- Jianfeng Sun
  - Department of Bioinformatics, Wissenschaftzentrum Weihenstephan, Technische Universität München, 85354 Freising, Germany
- Dmitrij Frishman
  - Department of Bioinformatics, Wissenschaftzentrum Weihenstephan, Technische Universität München, 85354 Freising, Germany
|
6
|
Feng SH, Zhang WX, Yang J, Yang Y, Shen HB. Topology Prediction Improvement of α-helical Transmembrane Proteins Through Helix-tail Modeling and Multiscale Deep Learning Fusion. J Mol Biol 2020; 432:1279-1296. [DOI: 10.1016/j.jmb.2019.12.007] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 04/16/2019] [Revised: 12/02/2019] [Accepted: 12/04/2019] [Indexed: 12/18/2022]
|
7
|
Cao Z, Pan X, Yang Y, Huang Y, Shen HB. The lncLocator: a subcellular localization predictor for long non-coding RNAs based on a stacked ensemble classifier. Bioinformatics 2019; 34:2185-2194. [PMID: 29462250 DOI: 10.1093/bioinformatics/bty085] [Citation(s) in RCA: 287] [Impact Index Per Article: 47.8] [Received: 09/27/2017] [Accepted: 02/14/2018] [Indexed: 01/01/2023]
Abstract
Motivation: Long non-coding RNAs (lncRNAs) have been a hot topic in RNA biology. Recent studies have shown that their subcellular localizations carry important information for understanding their complex biological functions. Because experiments for identifying the subcellular localization of lncRNAs are costly and time-consuming, computational methods are urgently needed. However, to the best of our knowledge, there are no computational tools for predicting lncRNA subcellular locations to date. Results: In this study, we report an ensemble classifier-based predictor, lncLocator, for predicting lncRNA subcellular localizations. To fully exploit lncRNA sequence information, we adopt both k-mer features and high-level abstraction features generated by unsupervised deep models, and construct four classifiers by feeding these two types of features to a support vector machine (SVM) and a random forest (RF), respectively. We then use a stacked ensemble strategy to combine the four classifiers and obtain the final prediction. The current lncLocator can predict five subcellular localizations of lncRNAs (cytoplasm, nucleus, cytosol, ribosome and exosome) and yields an overall accuracy of 0.59 on the constructed benchmark dataset. Availability and implementation: lncLocator is available at www.csbio.sjtu.edu.cn/bioinf/lncLocator. Supplementary information: Supplementary data are available at Bioinformatics online.
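The k-mer features mentioned in this abstract can be illustrated with a short sketch that turns an RNA sequence into a fixed-length frequency vector. The choice of k, the normalization by window count, and the conversion of T to U are assumptions for illustration; the abstract does not state how lncLocator computes its features, and the function name is hypothetical.

```python
from itertools import product
from typing import Dict


def kmer_frequencies(sequence: str, k: int = 3, alphabet: str = "ACGU") -> Dict[str, float]:
    """Return normalized counts of every possible k-mer over the alphabet."""
    seq = sequence.upper().replace("T", "U")
    counts = {"".join(p): 0 for p in product(alphabet, repeat=k)}
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip windows containing ambiguous bases such as N
            counts[kmer] += 1
            total += 1
    if total == 0:
        return {kmer: 0.0 for kmer in counts}
    return {kmer: n / total for kmer, n in counts.items()}


# Toy sequence (assumed); k = 2 gives a 16-dimensional feature vector.
features = kmer_frequencies("ACGUACGU", k=2)
```

A vector of this kind has 4^k entries regardless of sequence length, which is what makes it usable as input to classifiers such as an SVM or a random forest.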
Affiliation(s)
- Zhen Cao
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, China
- Xiaoyong Pan
  - Department of Medical Informatics, Erasmus MC, Rotterdam, The Netherlands
- Yang Yang
  - Department of Computer Science, Shanghai Jiao Tong University, and Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai, China
- Yan Huang
  - State Key Laboratory of Infrared Physics, Shanghai Institute of Technical Physics, Chinese Academy of Sciences, Shanghai, China
- Hong-Bin Shen
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, China
|
8
|
Jing X, Dong Q, Lu R, Dong Q. Protein Inter-Residue Contacts Prediction: Methods, Performances and Applications. Curr Bioinform 2019. [DOI: 10.2174/1574893613666181109130430] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 11/22/2022]
Abstract
Background: Protein inter-residue contact prediction plays an important role in protein structure and function research. As a low-dimensional representation of protein tertiary structure, inter-residue contacts can greatly help de novo protein structure prediction methods reduce the conformational search space. Over the past two decades, various methods have been developed for protein inter-residue contact prediction. Objective: We provide a comprehensive and systematic review of protein inter-residue contact prediction methods. Results: Protein inter-residue contact prediction methods are roughly classified into five categories: correlated mutation methods, machine-learning methods, fusion methods, template-based methods and 3D model-based methods. In this paper, we first describe the common definition of protein inter-residue contacts and show their typical applications. We then present a comprehensive review of the three main categories: correlated mutation methods, machine-learning methods and fusion methods. We also analyze the constraints of each category. Furthermore, we compare several representative methods on the CASP11 dataset and discuss their performance in detail. Conclusion: Correlated mutation methods achieve better performance for long-range contacts, while machine-learning methods perform well for short-range contacts. Fusion methods can take advantage of both machine-learning and correlated mutation methods. Employing more effective fusion strategies could further improve the performance of fusion methods.
Affiliation(s)
- Xiaoyang Jing
  - School of Computer Science, Fudan University, Shanghai, China
- Qimin Dong
  - Vocational and Technical Education Center of Linxi County, Chifeng, Inner Mongolia, China
- Ruqian Lu
  - School of Computer Science, Fudan University, Shanghai, China
- Qiwen Dong
  - Faculty of Education, East China Normal University, Shanghai, China
|
9
|
Oriented covalent immobilization of recombinant protein A on the glutaraldehyde activated agarose support. Int J Biol Macromol 2018; 120:100-108. [DOI: 10.1016/j.ijbiomac.2018.08.074] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 06/06/2018] [Revised: 08/11/2018] [Accepted: 08/15/2018] [Indexed: 12/21/2022]
|
10
|
Zhang X, Duan Y, Han N, Wu Y. Increase in IgG-binding Capacity of Recombinant Protein A Immobilized on Heterofunctional Amino and Epoxy Agarose. IOP Conf Ser Mater Sci Eng 2018; 381:012042. [DOI: 10.1088/1757-899x/381/1/012042] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/12/2022]
|
11
|
Nikolaev D, Shtyrov AA, Panov MS, Jamal A, Chakchir OB, Kochemirovsky VA, Olivucci M, Ryazantsev MN. A Comparative Study of Modern Homology Modeling Algorithms for Rhodopsin Structure Prediction. ACS Omega 2018; 3:7555-7566. [PMID: 30087916 PMCID: PMC6068592 DOI: 10.1021/acsomega.8b00721] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.6] [Received: 04/13/2018] [Accepted: 06/21/2018] [Indexed: 06/08/2023]
Abstract
Rhodopsins are seven α-helical membrane proteins that are of great importance in chemistry, biology, and modern biotechnology. Any in silico study on rhodopsin properties and functioning requires a high-quality three-dimensional structure. Due to particular difficulties with obtaining membrane protein structures from the experiment, in silico prediction of the three-dimensional rhodopsin structure based only on its primary sequence is an especially important task. For the last few years, significant progress was made in the field of protein structure prediction, especially for methods based on comparative modeling. However, the majority of this progress was made for soluble proteins and further investigations are needed to achieve similar progress for membrane proteins. In this paper, we evaluate the performance of modern protein structure prediction methodologies (implemented in the Medeller, I-TASSER, and Rosetta packages) for their ability to predict rhodopsin structures. Three widely used methodologies were considered: two general methodologies that are commonly applied to soluble proteins and a methodology that uses constraints that are specific for membrane proteins. The test pool consisted of 36 target-template pairs with different sequence similarities that was constructed on the basis of 24 experimental rhodopsin structures taken from the RCSB database. As a result, we showed that all three considered methodologies allow obtaining rhodopsin structures with the quality that is close to the crystallographic one (root mean square deviation (RMSD) of the predicted structure from the corresponding X-ray structure up to 1.5 Å) if the target-template sequence identity is higher than 40%. Moreover, all considered methodologies provided structures of average quality (RMSD < 4.0 Å) if the target-template sequence identity is higher than 20%. 
Such structures can be subsequently used for further investigation of molecular mechanisms of protein functioning and for the development of modern protein-based biotechnologies.
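The RMSD thresholds quoted in this abstract (1.5 Å and 4.0 Å) refer to the root mean square deviation between matched atoms of a predicted and an experimental structure. A minimal sketch of the calculation follows. It assumes the two coordinate sets are already superimposed and listed in the same atom order; a full comparison would first apply an optimal superposition, which is omitted here. The coordinates are made-up toy values.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(predicted: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """Root mean square deviation between two pre-aligned coordinate sets."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    squared = sum(
        (px - rx) ** 2 + (py - ry) ** 2 + (pz - rz) ** 2
        for (px, py, pz), (rx, ry, rz) in zip(predicted, reference)
    )
    return math.sqrt(squared / len(predicted))


# Toy coordinates in angstroms: every atom is displaced by 1 Å along y.
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
predicted = [(0.0, 1.0, 0.0), (1.5, 1.0, 0.0), (3.0, 1.0, 0.0)]
value = rmsd(predicted, reference)
```

Because every atom in the toy example is shifted by exactly 1 Å, the resulting RMSD is 1.0 Å.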
Affiliation(s)
- Dmitrii M. Nikolaev
  - Nanotechnology Research and Education Centre RAS, Saint-Petersburg Academic University, 8/3 Khlopina Street, St. Petersburg 194021, Russia
- Andrey A. Shtyrov
  - Nanotechnology Research and Education Centre RAS, Saint-Petersburg Academic University, 8/3 Khlopina Street, St. Petersburg 194021, Russia
- Maxim S. Panov
  - Institute of Chemistry, Saint Petersburg State University, 7/9 Universitetskaya emb., St. Petersburg 199034, Russia
- Adeel Jamal
  - Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- Oleg B. Chakchir
  - Nanotechnology Research and Education Centre RAS, Saint-Petersburg Academic University, 8/3 Khlopina Street, St. Petersburg 194021, Russia
- Vladimir A. Kochemirovsky
  - Institute of Chemistry, Saint Petersburg State University, 7/9 Universitetskaya emb., St. Petersburg 199034, Russia
- Massimo Olivucci
  - Department of Biotechnology, Chemistry and Pharmacy, Università di Siena, via A. Moro 2, Siena I-53100, Italy
- Mikhail N. Ryazantsev
  - Institute of Chemistry, Saint Petersburg State University, 7/9 Universitetskaya emb., St. Petersburg 199034, Russia
  - Institute of Macromolecular Compounds of the Russian Academy of Sciences, 31 Bolshoy pr., St. Petersburg 199004, Russia
|
12
|
Yin X, Yang J, Xiao F, Yang Y, Shen HB. MemBrain: An Easy-to-Use Online Webserver for Transmembrane Protein Structure Prediction. Nano-Micro Lett 2018; 10:2. [PMID: 30393651 PMCID: PMC6199043 DOI: 10.1007/s40820-017-0156-2] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Received: 07/11/2017] [Accepted: 08/26/2017] [Indexed: 05/12/2023]
Abstract
Membrane proteins are an important class of proteins embedded in cell membranes and play crucial roles in living organisms, for example as ion channels, transporters and receptors. Because it is difficult to determine membrane protein structures by wet-lab experiments, accurate and fast computational methods based on the amino acid sequence are highly desired. In this paper, we report an online prediction tool called MemBrain, whose input is the amino acid sequence. MemBrain consists of specialized modules for predicting transmembrane helices, residue-residue contacts and relative accessible surface area of α-helical membrane proteins. MemBrain achieves 97.9% for A_TMH, 87.1% for A_P, an N-score of 3.2 ± 3.0 and a C-score of 3.1 ± 2.8. MemBrain-Contact obtains 62% and 64.1% accuracy for top L/5 contact prediction on the training and independent datasets, respectively. MemBrain-Rasa achieves a Pearson correlation coefficient of 0.733 and a mean absolute error of 13.593. These prediction results provide valuable hints for revealing the structure and function of membrane proteins. The MemBrain web server is free for academic use and available at www.csbio.sjtu.edu.cn/bioinf/MemBrain/.
Affiliation(s)
- Xi Yin
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai, 200240, People's Republic of China
  - Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, 200240, People's Republic of China
- Jing Yang
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai, 200240, People's Republic of China
  - Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, 200240, People's Republic of China
- Feng Xiao
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai, 200240, People's Republic of China
  - Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, 200240, People's Republic of China
- Yang Yang
  - Department of Computer Science, Shanghai Jiao Tong University, Shanghai, 200240, People's Republic of China
  - Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai, 200240, People's Republic of China
- Hong-Bin Shen
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai, 200240, People's Republic of China
  - Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, 200240, People's Republic of China
|
13
|
Thangappan J, Madan B, Wu S, Lee SG. Measuring the Conformational Distance of GPCR-related Proteins Using a Joint-based Descriptor. Sci Rep 2017; 7:15205. [PMID: 29123217 PMCID: PMC5680341 DOI: 10.1038/s41598-017-15513-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 07/21/2017] [Accepted: 10/27/2017] [Indexed: 01/19/2023]
Abstract
Joint-based descriptor is a new level of macroscopic descriptor for protein structure using joints of secondary structures as a basic element. Here, we propose how the joint-based descriptor can be applied to examine the conformational distances or differences of transmembrane (TM) proteins. Specifically, we performed three independent studies that measured the global and conformational distances between GPCR A family and its related structures. First, the conformational distances of GPCR A family and other 7TM proteins were evaluated. This provided the information on the distant and close families or superfamilies to GPCR A family and permitted the identification of conserved local conformations. Second, computational models of GPCR A family proteins were validated, which enabled us to estimate how much they reproduce the native conformation of GPCR A proteins at global and local conformational level. Finally, the conformational distances between active and inactive states of GPCR proteins were estimated, which identified the difference of local conformation. The proposed macroscopic joint-based approach is expected to allow us to investigate structural features, evolutionary relationships, computational models and conformational changes of TM proteins in a more simplistic manner.
Affiliation(s)
- Jayaraman Thangappan
  - Department of Chemical Engineering, Pusan National University, Busan, 609-735, Republic of Korea
- Bharat Madan
  - Department of Chemical Engineering, Pusan National University, Busan, 609-735, Republic of Korea
- Sangwook Wu
  - Department of Physics, Pukyong National University, Busan, 608-737, Republic of Korea
- Sun-Gu Lee
  - Department of Chemical Engineering, Pusan National University, Busan, 609-735, Republic of Korea
|
14
|
Wang S, Li Z, Yu Y, Xu J. Folding Membrane Proteins by Deep Transfer Learning. Cell Syst 2017; 5:202-211.e3. [PMID: 28957654 PMCID: PMC5637520 DOI: 10.1016/j.cels.2017.09.001] [Citation(s) in RCA: 45] [Impact Index Per Article: 5.6] [Received: 02/22/2017] [Revised: 06/01/2017] [Accepted: 08/29/2017] [Indexed: 01/02/2023]
Abstract
Computational elucidation of membrane protein (MP) structures is challenging partially due to lack of sufficient solved structures for homology modeling. Here, we describe a high-throughput deep transfer learning method that first predicts MP contacts by learning from non-MPs and then predicts 3D structure models using the predicted contacts as distance restraints. Tested on 510 non-redundant MPs, our method has contact prediction accuracy at least 0.18 better than existing methods, predicts correct folds for 218 MPs, and generates 3D models with root-mean-square deviation (RMSD) less than 4 and 5 Å for 57 and 108 MPs, respectively. A rigorous blind test in the continuous automated model evaluation project shows that our method predicted high-resolution 3D models for two recent test MPs of 210 residues with RMSD ∼2 Å. We estimated that our method could predict correct folds for 1,345-1,871 reviewed human multi-pass MPs including a few hundred new folds, which shall facilitate the discovery of drugs targeting at MPs.
Affiliation(s)
- Sheng Wang
  - Toyota Technological Institute at Chicago, Chicago, IL 60637, USA; Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA; Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- Zhen Li
  - Toyota Technological Institute at Chicago, Chicago, IL 60637, USA; Department of Computer Science, University of Hong Kong, Hong Kong
- Yizhou Yu
  - Department of Computer Science, University of Hong Kong, Hong Kong
- Jinbo Xu
  - Toyota Technological Institute at Chicago, Chicago, IL 60637, USA
|
15
|
Wu H, Wang K, Lu L, Xue Y, Lyu Q, Jiang M. Deep Conditional Random Field Approach to Transmembrane Topology Prediction and Application to GPCR Three-Dimensional Structure Modeling. IEEE/ACM Trans Comput Biol Bioinform 2017; 14:1106-1114. [PMID: 27576262 DOI: 10.1109/tcbb.2016.2602872] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Indexed: 06/06/2023]
Abstract
Transmembrane proteins play important roles in cellular energy production, signal transmission, and metabolism. Many shallow machine learning methods have been applied to transmembrane topology prediction, but their performance was limited by the large size of membrane proteins and the complex evolutionary information behind the sequence. In this paper, we propose a novel deep approach based on conditional random fields, named dCRF-TM, for predicting the topology of transmembrane proteins. Conditional random fields take into account more complicated interrelations between residue labels across the full-length sequence than HMM- and SVM-based methods. Three widely used datasets were employed in the benchmark. dCRF-TM achieved 95 percent accuracy in helix location prediction and 78 percent accuracy in helix number prediction. dCRF-TM demonstrated more robust performance on large proteins (>350 residues) than 11 state-of-the-art predictors. dCRF-TM was further applied to ab initio modeling of the three-dimensional structures of seven-transmembrane receptors, also known as G protein-coupled receptors. The predictions on 24 solved G protein-coupled receptors and the unsolved vasopressin V2 receptor illustrated that dCRF-TM helped abGPCR-I-TASSER improve the TM-score by 34.3 percent compared with using a random transmembrane definition. Two out of five predicted models captured the experimentally verified disulfide bonds in the vasopressin V2 receptor.

16
Xiong D, Mao W, Gong H. Predicting the helix-helix interactions from correlated residue mutations. Proteins 2017; 85:2162-2169. [PMID: 28833538 DOI: 10.1002/prot.25370] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 05/15/2017] [Revised: 08/03/2017] [Accepted: 08/13/2017] [Indexed: 12/30/2022]
Abstract
Helix-helix interactions are crucial in the structure assembly, stability and function of helix-rich proteins, including many membrane proteins. In spite of remarkable progress over the past decades, the accuracy of predicting protein structures from their amino acid sequences is still far from satisfactory. In this work, we focused on a simpler problem, the prediction of helix-helix interactions, the results of which could facilitate practical protein structure prediction by constraining the sampling space. Specifically, we started from the noisy 2D residue contact maps derived from correlated residue mutations, and utilized ridge detection to identify the characteristic residue contact patterns for helix-helix interactions. The ridge information and a few additional features were then fed into a machine learning model, HHConPred, to predict interactions between helix pairs. In an independent test, our method achieved an F-measure of ∼60% for predicting helix-helix interactions. Moreover, although the model was trained mainly on soluble proteins, it could be extended to membrane proteins with performance at least comparable to that of previous approaches trained purely on membrane proteins. All data and source code are available at http://166.111.152.91/Downloads.html or https://github.com/dpxiong/HHConPred.
Affiliation(s)
- Dapeng Xiong
  - MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing, China
  - Beijing Innovation Center of Structural Biology, Tsinghua University, Beijing, China
- Wenzhi Mao
  - MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing, China
  - Beijing Innovation Center of Structural Biology, Tsinghua University, Beijing, China
- Haipeng Gong
  - MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing, China
  - Beijing Innovation Center of Structural Biology, Tsinghua University, Beijing, China

17
Abstract
Co-evolution techniques were originally conceived to assist in protein structure prediction by inferring pairs of residues that share spatial proximity. However, the functional relationships that can be extrapolated from co-evolution have also proven to be useful in a wide array of structural bioinformatics applications. These techniques are a powerful way to extract structural and functional information in a sequence-rich world.

18
Stahl K, Schneider M, Brock O. EPSILON-CP: using deep learning to combine information from multiple sources for protein contact prediction. BMC Bioinformatics 2017; 18:303. [PMID: 28623886 PMCID: PMC5474060 DOI: 10.1186/s12859-017-1713-x] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Received: 09/27/2016] [Accepted: 05/30/2017] [Indexed: 01/12/2023] Open
Abstract
BACKGROUND Accurately predicted contacts make it possible to compute the 3D structure of a protein. Since the solution space of native residue-residue contact pairs is very large, it is necessary to leverage information to identify relevant regions of the solution space, i.e. correct contacts. Every additional source of information can contribute to narrowing down candidate regions. Therefore, recent methods combined evolutionary and sequence-based information as well as evolutionary and physicochemical information. We develop a new contact predictor (EPSILON-CP) that goes beyond current methods by combining evolutionary, physicochemical, and sequence-based information. The problems resulting from the increased dimensionality and complexity of the learning problem are combated with a careful feature analysis, which results in a drastically reduced feature set. The different information sources are combined using deep neural networks. RESULTS On 21 hard CASP11 FM targets, EPSILON-CP achieves a mean precision of 35.7% for top-L/10 predicted long-range contacts, which is 11% better than the CASP11 winning version of MetaPSICOV. The improvement on 1.5L is 17%. Furthermore, in this study we find that the amino acid composition, a commonly used feature, is rendered ineffective in the context of meta approaches. The size of the refined feature set decreased by 75%, enabling a significant increase in training data for machine learning, contributing significantly to the observed improvements. CONCLUSIONS Exploiting as much diverse information as possible is key to accurate contact prediction. Simply merging the information introduces new challenges. Our study suggests that critical feature analysis can improve the performance of contact prediction methods that combine multiple information sources. EPSILON-CP is available as a webservice: http://compbio.robotics.tu-berlin.de/epsilon/.
Affiliation(s)
- Kolja Stahl
  - Robotics and Biology Laboratory, Department of Electrical Engineering and Computer Science, Technische Universität Berlin, Marchstraße 23, Berlin, 10587 Germany
- Michael Schneider
  - Robotics and Biology Laboratory, Department of Electrical Engineering and Computer Science, Technische Universität Berlin, Marchstraße 23, Berlin, 10587 Germany
- Oliver Brock
  - Robotics and Biology Laboratory, Department of Electrical Engineering and Computer Science, Technische Universität Berlin, Marchstraße 23, Berlin, 10587 Germany

19
Characterization of Ghrelin O-Acyltransferase (GOAT) in goldfish (Carassius auratus). PLoS One 2017; 12:e0171874. [PMID: 28178327 PMCID: PMC5298278 DOI: 10.1371/journal.pone.0171874] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 10/31/2016] [Accepted: 01/26/2017] [Indexed: 12/21/2022] Open
Abstract
Ghrelin is the only known hormone posttranslationally modified with an acylation. This modification is crucial for most of ghrelin’s physiological effects and is catalyzed by the polytopic enzyme ghrelin O-acyltransferase (GOAT). The aim of this study was to characterize GOAT in a teleost model, goldfish (Carassius auratus). First, the full-length cDNA sequence was obtained by RT-PCR and rapid amplification of cDNA ends methods. Two highly homologous cDNAs of 1491 and 1413 bp, named goat-V1 and goat-V2, respectively, were identified. The deduced protein sequences (393 and 367 amino acids, respectively) are predicted to contain 11 and 9 transmembrane regions, respectively, and both contain two conserved key residues proposed to be involved in catalysis: asparagine 273 and histidine 304. RT-qPCR revealed that both forms of goat mRNA show a similar widespread tissue distribution, with the highest expression in the gastrointestinal tract and gonads and lower but still considerable expression in brain, pituitary, liver and adipose tissue. Immunostaining of intestinal sections showed the presence of GOAT immunoreactive cells in the intestinal mucosa, some of which colocalize with ghrelin. Using an in vitro approach, we observed that acylated ghrelin downregulates GOAT gene and protein levels in cultured intestine in a time-dependent manner. Finally, we found a rhythmic oscillation of goat mRNA expression in the hypothalamus, pituitary and intestinal bulb of goldfish fed at midday, but not at midnight. Together, these findings report novel data characterizing GOAT, and offer new information about the ghrelinergic system in fish.

20
Venko K, Roy Choudhury A, Novič M. Computational Approaches for Revealing the Structure of Membrane Transporters: Case Study on Bilitranslocase. Comput Struct Biotechnol J 2017; 15:232-242. [PMID: 28228927 PMCID: PMC5312651 DOI: 10.1016/j.csbj.2017.01.008] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 11/02/2016] [Revised: 01/19/2017] [Accepted: 01/20/2017] [Indexed: 11/23/2022] Open
Abstract
The structural and functional details of transmembrane proteins are vastly underexplored, mostly due to experimental difficulties regarding their solubility and stability. Currently, the majority of transmembrane protein structures are still unknown, and this presents a huge experimental and computational challenge. Thanks to X-ray crystallography and NMR spectroscopy, over 3000 membrane protein structures have been solved, but only a few hundred of them are unique. Due to the vast biological and pharmaceutical interest in the elucidation of the structure and the functional mechanisms of transmembrane proteins, several computational methods have been developed to bridge the experimental gap. If combined with experimental data, the computational information enables rapid, low-cost and successful predictions of the molecular structure of unsolved proteins. The reliability of the predictions depends on the availability and accuracy of experimental data associated with structural information. In this review, the following methods are proposed for in silico structure elucidation: sequence-dependent predictions of transmembrane regions, predictions of transmembrane helix–helix interactions, helix arrangements in membrane models, and testing their stability with molecular dynamics simulations. We also demonstrate the usage of the computational methods listed above by proposing a model for the molecular structure of the transmembrane protein bilitranslocase. Bilitranslocase is a bilirubin membrane transporter, which shares similar tissue distribution and functional properties with some of the members of the Organic Anion Transporter family and is the only member classified in the Bilirubin Transporter Family. Given its unique properties, bilitranslocase is a potentially interesting drug target.
Affiliation(s)
- Katja Venko
  - Department of Cheminformatics, National Institute of Chemistry, Ljubljana, Slovenia
- A Roy Choudhury
  - Department of Cheminformatics, National Institute of Chemistry, Ljubljana, Slovenia
- Marjana Novič
  - Department of Cheminformatics, National Institute of Chemistry, Ljubljana, Slovenia

21
Simkovic F, Thomas JMH, Keegan RM, Winn MD, Mayans O, Rigden DJ. Residue contacts predicted by evolutionary covariance extend the application of ab initio molecular replacement to larger and more challenging protein folds. IUCrJ 2016; 3:259-70. [PMID: 27437113 PMCID: PMC4937781 DOI: 10.1107/s2052252516008113] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 02/10/2016] [Accepted: 05/18/2016] [Indexed: 05/05/2023]
Abstract
For many protein families, the deluge of new sequence information together with new statistical protocols now allow the accurate prediction of contacting residues from sequence information alone. This offers the possibility of more accurate ab initio (non-homology-based) structure prediction. Such models can be used in structure solution by molecular replacement (MR) where the target fold is novel or is only distantly related to known structures. Here, AMPLE, an MR pipeline that assembles search-model ensembles from ab initio structure predictions ('decoys'), is employed to assess the value of contact-assisted ab initio models to the crystallographer. It is demonstrated that evolutionary covariance-derived residue-residue contact predictions improve the quality of ab initio models and, consequently, the success rate of MR using search models derived from them. For targets containing β-structure, decoy quality and MR performance were further improved by the use of a β-strand contact-filtering protocol. Such contact-guided decoys achieved 14 structure solutions from 21 attempted protein targets, compared with nine for simple Rosetta decoys. Previously encountered limitations were superseded in two key respects. Firstly, much larger targets of up to 221 residues in length were solved, which is far larger than the previously benchmarked threshold of 120 residues. Secondly, contact-guided decoys significantly improved success with β-sheet-rich proteins. Overall, the improved performance of contact-guided decoys suggests that MR is now applicable to a significantly wider range of protein targets than were previously tractable, and points to a direct benefit to structural biology from the recent remarkable advances in sequencing.
Affiliation(s)
- Felix Simkovic
  - Institute of Integrative Biology, University of Liverpool, Liverpool L69 7ZB, England
- Jens M. H. Thomas
  - Institute of Integrative Biology, University of Liverpool, Liverpool L69 7ZB, England
- Ronan M. Keegan
  - Research Complex at Harwell, STFC Rutherford Appleton Laboratory, Didcot OX11 0FA, England
- Martyn D. Winn
  - Science and Technology Facilities Council, Daresbury Laboratory, Warrington WA4 4AD, England
- Olga Mayans
  - Institute of Integrative Biology, University of Liverpool, Liverpool L69 7ZB, England
- Daniel J. Rigden
  - Institute of Integrative Biology, University of Liverpool, Liverpool L69 7ZB, England

22
Yang J, Jin QY, Zhang B, Shen HB. R2C: improving ab initio residue contact map prediction using dynamic fusion strategy and Gaussian noise filter. Bioinformatics 2016; 32:2435-43. [PMID: 27153618 DOI: 10.1093/bioinformatics/btw181] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Received: 11/10/2015] [Accepted: 04/03/2016] [Indexed: 11/12/2022]
Abstract
MOTIVATION Inter-residue contacts in proteins dictate the topology of protein structures. They are crucial for protein folding and structural stability. Accurate prediction of residue contacts, especially long-range contacts, is important to the quality of ab initio structure modeling since they can enforce strong restraints on structure assembly. RESULTS In this paper, we present a new Residue-Residue Contact predictor called R2C that combines machine learning-based and correlated mutation analysis-based methods, together with a two-dimensional Gaussian noise filter, to enhance long-range residue contact prediction. Our results show that the outputs of the machine learning-based method are concentrated and perform better on short-range contacts, whereas the predictions of the correlated mutation analysis-based approach are widespread and more accurate on long-range contacts. An effective query-driven dynamic fusion strategy proposed here takes full advantage of the two different methods, resulting in an impressive overall accuracy improvement. We also show that the contact map taken directly from the prediction model contains Gaussian noise, which has not been reported before. Unlike recent studies that tried to further enhance the quality of the contact map by removing its transitive noise, we designed a new two-dimensional Gaussian noise filter, which was especially helpful for reinforcing long-range residue contact prediction. Tested on recent CASP10/11 datasets, the overall top L/5 accuracy of our final R2C predictor is 17.6%/15.5% higher than the pure machine learning-based method and 7.8%/8.3% higher than the correlated mutation analysis-based approach for long-range residue contact prediction. AVAILABILITY AND IMPLEMENTATION http://www.csbio.sjtu.edu.cn/bioinf/R2C/ CONTACT hbshen@sjtu.edu.cn SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
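The "top L/5 accuracy" quoted in this abstract is a ranking metric: score every residue pair, keep the long-range pairs, take the L/5 highest-scoring ones (L being the sequence length) and count the fraction that are true contacts. The sketch below illustrates that bookkeeping only. The function name, the dictionary input format and the default sequence separation of 24 residues for "long-range" are assumptions borrowed from common CASP practice, not details taken from the R2C paper.

```python
def top_l_over_k_precision(scores, true_contacts, seq_len, k=5, min_separation=24):
    """Fraction of the top seq_len // k long-range predictions that are true contacts.

    scores:        dict mapping (i, j) residue index pairs (i < j) to a predicted score
    true_contacts: set of (i, j) pairs that are contacts in the reference structure
    min_separation is an assumed long-range cutoff (hypothetical default).
    """
    # Keep only pairs that are far enough apart in sequence.
    long_range = [(pair, s) for pair, s in scores.items()
                  if pair[1] - pair[0] >= min_separation]
    # Rank by predicted score, highest first.
    long_range.sort(key=lambda item: item[1], reverse=True)
    n_top = max(1, seq_len // k)
    top = [pair for pair, _ in long_range[:n_top]]
    if not top:
        return 0.0
    hits = sum(1 for pair in top if pair in true_contacts)
    return hits / len(top)


# Toy data (invented for illustration): a 10-residue chain with a reduced
# separation cutoff so that a few pairs qualify as "long-range".
toy_scores = {(0, 5): 0.9, (1, 6): 0.8, (2, 9): 0.4, (0, 1): 0.99}
toy_truth = {(0, 5), (2, 9)}
toy_precision = top_l_over_k_precision(toy_scores, toy_truth, seq_len=10,
                                       k=5, min_separation=3)
```

In the toy call, the pair (0, 1) is ignored because it is too close in sequence, and one of the two top-ranked long-range pairs is a true contact.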
Affiliation(s)
- Jing Yang
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
- Qi-Yu Jin
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
- Biao Zhang
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
- Hong-Bin Shen
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China

23
Morrill GA, Kostellow AB, Liu L, Gupta RK, Askari A. Evolution of the α-Subunit of Na/K-ATPase from Paramecium to Homo sapiens: Invariance of Transmembrane Helix Topology. J Mol Evol 2016; 82:183-98. [PMID: 26961431 PMCID: PMC4866997 DOI: 10.1007/s00239-016-9732-1] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 01/06/2016] [Accepted: 03/03/2016] [Indexed: 12/01/2022]
Abstract
Na/K-ATPase is a key plasma membrane enzyme involved in cell signaling, volume regulation, and maintenance of electrochemical gradients. The α-subunit, central to these functions, belongs to a large family of P-type ATPases. Differences in transmembrane (TM) helix topology, sequence homology, helix–helix contacts, cell signaling, and protein domains of Na/K-ATPase α-subunit were compared in fungi (Beauveria), unicellular organisms (Paramecia), primitive multicellular organisms (Hydra), and vertebrates (Xenopus, Homo sapiens), and correlated with evolution of physiological functions in the α-subunit. All α-subunits are of similar length, with groupings of four and six helices in the N- and C-terminal regions, respectively. Minimal homology was seen for protein domain patterns in Paramecium and Hydra, with high correlation between Hydra and vertebrates. Paramecium α-subunits display extensive disorder, with minimal helix contacts. Increases in helix contacts in Hydra approached vertebrates. Protein motifs known to be associated with membrane lipid rafts and cell signaling reveal significant positional shifts between Paramecium and Hydra vulgaris, indicating that regional membrane fluidity changes occur during evolution. Putative steroid binding sites overlapping TM-3 occurred in all species. Sites associated with G-protein-receptor stimulation occur both in vertebrates and amphibia but not in Hydra or Paramecia. The C-terminus moiety “KETYY,” necessary for the Na+ activation of pump phosphorylation, is not present in unicellular species indicating the absence of classical Na+/K+-pumps. The basic protein topology evolved earliest, followed by increases in protein domains and ordered helical arrays, correlated with appearance of α-subunit regions known to involve cell signaling, membrane recycling, and ion channel formation.
Affiliation(s)
- Gene A Morrill
  - Department of Physiology and Biophysics, Albert Einstein College of Medicine, Bronx, NY, 10461, USA.
- Adele B Kostellow
  - Department of Physiology and Biophysics, Albert Einstein College of Medicine, Bronx, NY, 10461, USA
- Lijun Liu
  - Department of Biochemistry and Cancer Biology, University of Toledo Health Science Campus, Toledo, OH, 43614, USA
- Raj K Gupta
  - Department of Physiology and Biophysics, Albert Einstein College of Medicine, Bronx, NY, 10461, USA
- Amir Askari
  - Department of Biochemistry and Cancer Biology, University of Toledo Health Science Campus, Toledo, OH, 43614, USA

24
Hönigschmid P, Frishman D. Accurate prediction of helix interactions and residue contacts in membrane proteins. J Struct Biol 2016; 194:112-23. [PMID: 26851352 DOI: 10.1016/j.jsb.2016.02.005] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Received: 08/12/2015] [Revised: 02/01/2016] [Accepted: 02/02/2016] [Indexed: 11/16/2022]
Abstract
Accurate prediction of intra-molecular interactions from amino acid sequence is an important pre-requisite for obtaining high-quality protein models. Over the recent years, remarkable progress in this area has been achieved through the application of novel co-variation algorithms, which eliminate transitive evolutionary connections between residues. In this work we present a new contact prediction method for α-helical transmembrane proteins, MemConP, in which evolutionary couplings are combined with a machine learning approach. MemConP achieves a substantially improved accuracy (precision: 56.0%, recall: 17.5%, MCC: 0.288) compared to the use of either machine learning or co-evolution methods alone. The method also achieves 91.4% precision, 42.1% recall and a MCC of 0.490 in predicting helix-helix interactions based on predicted contacts. The approach was trained and rigorously benchmarked by cross-validation and independent testing on up-to-date non-redundant datasets of 90 and 30 experimental three dimensional structures, respectively. MemConP is a standalone tool that can be downloaded together with the associated training data from http://webclu.bio.wzw.tum.de/MemConP.
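The precision, recall and MCC figures reported above are all derived from the same four confusion-matrix counts over residue pairs. As a reminder of how the three relate, here is a minimal sketch using the standard textbook definitions. The function name and the example counts are invented for illustration and are not MemConP's data or code.

```python
import math


def contact_metrics(tp, fp, fn, tn):
    """Return (precision, recall, mcc) from confusion-matrix counts.

    tp/fp/fn/tn are counts of residue pairs: true positives, false positives,
    false negatives and true negatives. Zero denominators yield 0.0.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Matthews correlation coefficient: ranges from -1 to +1.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return precision, recall, mcc


# Hypothetical counts: contacts are rare, so true negatives dominate.
precision, recall, mcc = contact_metrics(tp=40, fp=10, fn=60, tn=890)
```

Because non-contacts vastly outnumber contacts, a predictor can combine high precision with modest recall, which is why MCC is usually reported alongside them as a single balanced summary.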
Affiliation(s)
- Peter Hönigschmid
  - Department of Bioinformatics, Technische Universität München, Wissenschaftszentrum Weihenstephan, Maximus-von-Imhof Forum 3, 85354 Freising, Germany
- Dmitrij Frishman
  - Department of Bioinformatics, Technische Universität München, Wissenschaftszentrum Weihenstephan, Maximus-von-Imhof Forum 3, 85354 Freising, Germany; Helmholtz Zentrum Munich, German Research Center for Environmental Health (GmbH), Institute of Bioinformatics and Systems Biology, 85764 Neuherberg, Germany; Laboratory of Bioinformatics, RASA Research Center, St Petersburg State Polytechnical University, St Petersburg 195251, Russia.

25
Zhang H, Huang Q, Bei Z, Wei Y, Floudas CA. COMSAT: Residue contact prediction of transmembrane proteins based on support vector machines and mixed integer linear programming. Proteins 2016; 84:332-48. [PMID: 26756402 DOI: 10.1002/prot.24979] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 06/25/2015] [Revised: 11/19/2015] [Accepted: 12/10/2015] [Indexed: 12/28/2022]
Abstract
In this article, we present COMSAT, a hybrid framework for residue contact prediction of transmembrane (TM) proteins, integrating a support vector machine (SVM) method and a mixed integer linear programming (MILP) method. COMSAT consists of two modules: COMSAT_SVM which is trained mainly on position-specific scoring matrix features, and COMSAT_MILP which is an ab initio method based on optimization models. Contacts predicted by the SVM model are ranked by SVM confidence scores, and a threshold is trained to improve the reliability of the predicted contacts. For TM proteins with no contacts above the threshold, COMSAT_MILP is used. The proposed hybrid contact prediction scheme was tested on two independent TM protein sets based on the contact definition of 14 Å between Cα-Cα atoms. First, using a rigorous leave-one-protein-out cross validation on the training set of 90 TM proteins, an accuracy of 66.8%, a coverage of 12.3%, a specificity of 99.3% and a Matthews' correlation coefficient (MCC) of 0.184 were obtained for residue pairs that are at least six amino acids apart. Second, when tested on a test set of 87 TM proteins, the proposed method showed a prediction accuracy of 64.5%, a coverage of 5.3%, a specificity of 99.4% and a MCC of 0.106. COMSAT shows satisfactory results when compared with 12 other state-of-the-art predictors, and is more robust in terms of prediction accuracy as the length and complexity of TM protein increase. COMSAT is freely accessible at http://hpcc.siat.ac.cn/COMSAT/.
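The benchmark above rests on a geometric contact definition: two residues are in contact when their Cα atoms lie within 14 Å, and only pairs at least six positions apart in sequence are evaluated. A minimal sketch of deriving such reference contacts from coordinates follows. The function name, the coordinate format and the use of an inclusive (<=) cutoff are assumptions made for illustration, not details of the COMSAT implementation.

```python
import math


def contact_pairs(ca_coords, cutoff=14.0, min_separation=6):
    """Return (i, j) index pairs whose C-alpha atoms are within `cutoff` angstroms.

    ca_coords is a list of (x, y, z) tuples, one per residue, in sequence order.
    Only pairs with j - i >= min_separation are considered.
    """
    pairs = []
    n = len(ca_coords)
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(ca_coords[i], ca_coords[j]) <= cutoff:
                pairs.append((i, j))
    return pairs


# Toy coordinates (not a real structure): ten points spaced 2.2 angstroms
# apart on a straight line, so only pairs exactly six positions apart qualify.
toy_coords = [(2.2 * k, 0.0, 0.0) for k in range(10)]
toy_contacts = contact_pairs(toy_coords)
```

With a cutoff this generous, many residue pairs in a compact helical bundle count as contacts, which is worth keeping in mind when comparing the reported accuracies with methods benchmarked under the stricter 8 Å definitions used elsewhere.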
Affiliation(s)
- Huiling Zhang
  - Centre for High Performance Computing, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- Qingsheng Huang
  - Centre for High Performance Computing, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- Zhendong Bei
  - Center for Cloud Computing, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- Yanjie Wei
  - Centre for High Performance Computing, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- Christodoulos A Floudas
  - Department of Chemical Engineering, Texas A&M University, College Station, Texas, 77843; Texas A&M Energy Institute, Texas A&M University, College Station, Texas, 77843

26
Shneyer BI, Ušaj M, Henn A. Myo19 is an outer mitochondrial membrane motor and effector of starvation-induced filopodia. J Cell Sci 2015; 129:543-56. [PMID: 26659663 DOI: 10.1242/jcs.175349] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.9] [Received: 06/09/2015] [Accepted: 12/05/2015] [Indexed: 12/13/2022] Open
Abstract
Mitochondria respond to environmental cues and stress conditions. Additionally, the disruption of the mitochondrial network dynamics and its distribution is implicated in a variety of neurodegenerative diseases. Here, we reveal a new function for Myo19 in mitochondrial dynamics and localization during the cellular response to glucose starvation. Ectopically expressed Myo19 localized with mitochondria to the tips of starvation-induced filopodia. Corollary to this, RNA interference (RNAi)-mediated knockdown of Myo19 diminished filopodia formation without evident effects on the mitochondrial network. We analyzed the Myo19-mitochondria interaction, and demonstrated that Myo19 is uniquely anchored to the outer mitochondrial membrane (OMM) through a 30-45-residue motif, indicating that Myo19 is a stably attached OMM molecular motor. Our work reveals a new function for Myo19 in mitochondrial positioning under stress.
Affiliation(s)
- Boris I Shneyer
  - Department of Biology, Technion Israel Institute of Technology, Haifa 3200003, Israel
- Marko Ušaj
  - Department of Biology, Technion Israel Institute of Technology, Haifa 3200003, Israel
- Arnon Henn
  - Department of Biology, Technion Israel Institute of Technology, Haifa 3200003, Israel

27
Xiao F, Shen HB. Prediction Enhancement of Residue Real-Value Relative Accessible Surface Area in Transmembrane Helical Proteins by Solving the Output Preference Problem of Machine Learning-Based Predictors. J Chem Inf Model 2015; 55:2464-74. [DOI: 10.1021/acs.jcim.5b00246] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Indexed: 01/08/2023]
Affiliation(s)
- Feng Xiao
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
- Hong-Bin Shen
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China

28
Abstract
Transmembrane (TM) helices of integral membrane proteins can facilitate strong and specific noncovalent protein-protein interactions. Mutagenesis and structural analyses have revealed numerous examples in which the interaction between TM helices of single-pass membrane proteins is dependent on a GxxxG or (small)xxx(small) motif. It is therefore tempting to use the presence of these simple motifs as an indicator of TM helix interactions. In this Current Topic review, we point out that these motifs are quite common, with more than 50% of single-pass TM domains containing a (small)xxx(small) motif. However, the actual interaction strength of motif-containing helices depends strongly on sequence context and membrane properties. In addition, recent studies have revealed several GxxxG-containing TM domains that interact via alternative interfaces involving hydrophobic, polar, aromatic, or even ionizable residues that do not form recognizable motifs. In multipass membrane proteins, GxxxG motifs can be important for protein folding, and not just oligomerization. Our current knowledge thus suggests that the presence of a GxxxG motif alone is a weak predictor of protein dimerization in the membrane.
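To see why these motifs are so common, it helps to look at how little a motif scan actually requires: a small residue at position i and another at position i+4. The sketch below scans a TM segment for such pairs. The function name, the choice of Gly, Ala and Ser as the "small" set and the toy sequence are assumptions for illustration. Definitions of "small" vary between studies, and, as the review stresses, a hit is not evidence of dimerization.

```python
import re

# Assumed definition of "small" residues (Gly, Ala, Ser); other studies
# use different sets, so treat this as a hypothetical default.
SMALL = "GAS"


def find_small_xxx_small(tm_sequence, small=SMALL):
    """Return (start_index, motif) for every (small)xxx(small) occurrence.

    A lookahead is used so that overlapping motifs are all reported.
    Passing small="G" restricts the scan to strict GxxxG motifs.
    """
    pattern = re.compile(rf"(?=([{small}].{{3}}[{small}]))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(tm_sequence)]


# Invented toy segment, not a real protein sequence.
toy_tm = "LLGALLGLLSALLA"
all_small_hits = find_small_xxx_small(toy_tm)
strict_gxxxg_hits = find_small_xxx_small(toy_tm, small="G")
```

Even this short invented segment yields three (small)xxx(small) hits but only one strict GxxxG, which mirrors the point that the looser motif occurs too often to be informative on its own.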
Affiliation(s)
- Mark G Teese
  - Lehrstuhl für Chemie der Biopolymere, Technische Universität München, 85354 Freising, Germany; Center for Integrated Protein Science Munich (CIPSM), 81377 Munich, Germany
- Dieter Langosch
  - Lehrstuhl für Chemie der Biopolymere, Technische Universität München, 85354 Freising, Germany; Center for Integrated Protein Science Munich (CIPSM), 81377 Munich, Germany

29
Zhang J, Yang J, Jang R, Zhang Y. GPCR-I-TASSER: A Hybrid Approach to G Protein-Coupled Receptor Structure Modeling and the Application to the Human Genome. Structure 2015; 23:1538-1549. [PMID: 26190572 DOI: 10.1016/j.str.2015.06.007] [Citation(s) in RCA: 136] [Impact Index Per Article: 13.6] [Received: 03/01/2015] [Revised: 06/03/2015] [Accepted: 06/10/2015] [Indexed: 12/31/2022]
Abstract
Experimental structure determination remains difficult for G protein-coupled receptors (GPCRs). We propose a new hybrid protocol to construct GPCR structure models that integrates experimental mutagenesis data with ab initio transmembrane (TM) helix assembly simulations. The method was tested on 24 known GPCRs, where the ab initio TM-helix assembly procedure constructed the correct fold in 20 cases. When combined with weak homology and sparse mutagenesis restraints, the method generated correct folds for all the tested cases with an average Cα root-mean-square deviation of 2.4 Å in the TM regions. The new hybrid protocol was applied to model all 1,026 GPCRs in the human genome, where 923 have a high confidence score and are expected to have correct folds; these contain many pharmaceutically important families with no previously solved structures, including Trace amine, Prostanoids, Releasing hormones, Melanocortins, Vasopressin, and Neuropeptide Y receptors. The results demonstrate new progress on genome-wide structure modeling of TM proteins.
Affiliation(s)
- Jian Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, 100 Washtenaw Avenue, Ann Arbor, MI 48109, USA
- Jianyi Yang
- Department of Computational Medicine and Bioinformatics, University of Michigan, 100 Washtenaw Avenue, Ann Arbor, MI 48109, USA; School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China
- Richard Jang
- Department of Computational Medicine and Bioinformatics, University of Michigan, 100 Washtenaw Avenue, Ann Arbor, MI 48109, USA
- Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, 100 Washtenaw Avenue, Ann Arbor, MI 48109, USA; Department of Biological Chemistry, University of Michigan, 100 Washtenaw Avenue, Ann Arbor, MI 48109, USA.
|
30
|
Sun HP, Huang Y, Wang XF, Zhang Y, Shen HB. Improving accuracy of protein contact prediction using balanced network deconvolution. Proteins 2015; 83:485-96. [PMID: 25524593 DOI: 10.1002/prot.24744] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Received: 08/31/2014] [Revised: 11/20/2014] [Accepted: 12/02/2014] [Indexed: 12/28/2022]
Abstract
The residue contact map is essential for protein three-dimensional structure determination. However, most current contact prediction methods based on residue co-evolution suffer from high false-positive rates introduced by indirect and transitive contacts (i.e., residues A-B and B-C are in contact, but A-C are not). Built on the work by Feizi et al. (Nat Biotechnol 2013; 31:726-733), which demonstrated a general network model to distinguish direct dependencies by network deconvolution, this study presents a new balanced network deconvolution (BND) algorithm to identify an optimized dependency matrix without a limit on the eigenvalue range in the applied network systems. The algorithm was used to filter contact predictions of five widely used co-evolution methods. On the test of proteins from three benchmark datasets of the 9th critical assessment of protein structure prediction (CASP9), CASP10, and PSICOV (precise structural contact prediction using sparse inverse covariance estimation) database experiments, the BND can improve the medium- and long-range contact predictions at the L/5 cutoff by 55.59% and 47.68%, respectively, without additional central processing unit cost. The improvement is statistically significant, with a P-value < 5.93 × 10^-3 in the Student's t-test. A further comparison with the ab initio structure predictions in CASPs showed that the usefulness of the current co-evolution-based contact prediction to the three-dimensional structure modeling relies on the number of homologous sequences existing in the sequence databases. BND can be used as a general contact refinement method, which is freely available at: http://www.csbio.sjtu.edu.cn/bioinf/BND/.
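The transitive-contact problem described above (A-B and B-C in contact, A-C not) can be illustrated with the basic network deconvolution idea that the abstract cites. The sketch below is not the BND algorithm. It only assumes the textbook model in which the observed matrix is the sum of all powers of the direct matrix, G_obs = G_dir + G_dir^2 + ..., so that G_dir = G_obs (I + G_obs)^-1, expanded here as an alternating series that converges only when the eigenvalues of G_obs lie inside the unit circle. All function names and the toy weights are hypothetical.

```python
def matmul(a, b):
    """Plain-Python matrix product for small dense matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def add_scaled(a, b, scale):
    """Return a + scale * b element-wise."""
    return [[x + scale * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def observe(g_dir, terms=100):
    """Simulate indirect effects: G_obs = G_dir + G_dir^2 + ... (truncated)."""
    total, power = g_dir, g_dir
    for _ in range(terms - 1):
        power = matmul(power, g_dir)
        total = add_scaled(total, power, 1.0)
    return total


def deconvolve(g_obs, terms=100):
    """Recover G_dir = G_obs - G_obs^2 + G_obs^3 - ... (truncated series)."""
    total, power, sign = g_obs, g_obs, 1.0
    for _ in range(terms - 1):
        power = matmul(power, g_obs)
        sign = -sign
        total = add_scaled(total, power, sign)
    return total


if __name__ == "__main__":
    # Toy chain A-B-C: A-B and B-C are direct contacts, A-C is not.
    direct = [[0.0, 0.2, 0.0],
              [0.2, 0.0, 0.2],
              [0.0, 0.2, 0.0]]
    observed = observe(direct)
    recovered = deconvolve(observed)
    print("observed A-C:", observed[0][2])    # spurious transitive signal
    print("recovered A-C:", recovered[0][2])  # close to zero again
```

The convergence restriction in this toy version is the kind of eigenvalue-range limitation that the abstract says BND is designed to avoid.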
Affiliation(s)
- Hai-Ping Sun
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai, 200240, China; Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, 200240, China
|
31
|
Li G, Theys K, Verheyen J, Pineda-Peña AC, Khouri R, Piampongsant S, Eusébio M, Ramon J, Vandamme AM. A new ensemble coevolution system for detecting HIV-1 protein coevolution. Biol Direct 2015; 10:1. [PMID: 25564011 PMCID: PMC4332441 DOI: 10.1186/s13062-014-0031-8] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.1] [Received: 11/28/2014] [Accepted: 12/02/2014] [Indexed: 12/31/2022] Open
Abstract
BACKGROUND A key challenge in the field of HIV-1 protein evolution is the identification of coevolving amino acids at the molecular level. In the past decades, many sequence-based methods have been designed to detect position-specific coevolution within and between different proteins. However, an ensemble coevolution system that integrates different methods to improve the detection of HIV-1 protein coevolution has not been developed. RESULTS We integrated 27 sequence-based prediction methods published between 2004 and 2013 into an ensemble coevolution system. This system allowed combinations of different sequence-based methods for coevolution predictions. Using HIV-1 protein structures and experimental data, we evaluated the performance of individual and combined sequence-based methods in the prediction of HIV-1 intra- and inter-protein coevolution. We showed that sequence-based methods clustered according to their methodology, and a combination of four methods outperformed any of the 27 individual methods. This four-method combination estimated that HIV-1 intra-protein coevolving positions were mainly located in functional domains and physically contacted with each other in the protein tertiary structures. In the analysis of HIV-1 inter-protein coevolving positions between Gag and protease, protease drug resistance positions near the active site mostly coevolved with Gag cleavage positions (V128, S373-T375, A431, F448-P453) and Gag C-terminal positions (S489-Q500) under selective pressure of protease inhibitors. CONCLUSIONS This study presents a new ensemble coevolution system which detects position-specific coevolution using combinations of 27 different sequence-based methods. Our findings highlight key coevolving residues within HIV-1 structural proteins and between Gag and protease, shedding light on HIV-1 intra- and inter-protein coevolution.
Affiliation(s)
- Guangdi Li
- KU Leuven - University of Leuven, Department of Microbiology and Immunology, Rega Institute for Medical Research, Clinical and Epidemiological Virology, Leuven, Belgium.
- Kristof Theys
- KU Leuven - University of Leuven, Department of Microbiology and Immunology, Rega Institute for Medical Research, Clinical and Epidemiological Virology, Leuven, Belgium.
- Jens Verheyen
- Institute of Virology, University hospital, University Duisburg-Essen, Essen, Germany.
- Andrea-Clemencia Pineda-Peña
- KU Leuven - University of Leuven, Department of Microbiology and Immunology, Rega Institute for Medical Research, Clinical and Epidemiological Virology, Leuven, Belgium; Clinical and Molecular Infectious Disease Group, Faculty of Sciences and Mathematics, Universidad del Rosario, Bogotá, Colombia.
- Ricardo Khouri
- KU Leuven - University of Leuven, Department of Microbiology and Immunology, Rega Institute for Medical Research, Clinical and Epidemiological Virology, Leuven, Belgium.
- Supinya Piampongsant
- KU Leuven - University of Leuven, Department of Microbiology and Immunology, Rega Institute for Medical Research, Clinical and Epidemiological Virology, Leuven, Belgium.
- Mónica Eusébio
- Centro de Malária e Outras Doenças Tropicais and Unidade de Microbiologia, Instituto de Higiene e Medicina Tropical, Universidade Nova de Lisboa, Lisboa, Portugal.
- Jan Ramon
- Department of Computer Science, KU Leuven - University of Leuven, Leuven, Belgium.
- Anne-Mieke Vandamme
- KU Leuven - University of Leuven, Department of Microbiology and Immunology, Rega Institute for Medical Research, Clinical and Epidemiological Virology, Leuven, Belgium; Centro de Malária e Outras Doenças Tropicais and Unidade de Microbiologia, Instituto de Higiene e Medicina Tropical, Universidade Nova de Lisboa, Lisboa, Portugal.
|
32
|
Leman JK, Ulmschneider MB, Gray JJ. Computational modeling of membrane proteins. Proteins 2015; 83:1-24. [PMID: 25355688 PMCID: PMC4270820 DOI: 10.1002/prot.24703] [Citation(s) in RCA: 69] [Impact Index Per Article: 6.9] [Received: 06/17/2014] [Revised: 10/01/2014] [Accepted: 10/18/2014] [Indexed: 02/06/2023]
Abstract
The determination of membrane protein (MP) structures has always trailed that of soluble proteins due to difficulties in their overexpression, reconstitution into membrane mimetics, and subsequent structure determination. The percentage of MP structures in the protein databank (PDB) has been at a constant 1-2% for the last decade. In contrast, over half of all drugs target MPs, only highlighting how little we understand about drug-specific effects in the human body. To reduce this gap, researchers have attempted to predict structural features of MPs even before the first structure was experimentally elucidated. In this review, we present current computational methods to predict MP structure, starting with secondary structure prediction, prediction of trans-membrane spans, and topology. Even though these methods generate reliable predictions, challenges such as predicting kinks or precise beginnings and ends of secondary structure elements are still waiting to be addressed. We describe recent developments in the prediction of 3D structures of both α-helical MPs as well as β-barrels using comparative modeling techniques, de novo methods, and molecular dynamics (MD) simulations. The increase of MP structures has (1) facilitated comparative modeling due to availability of more and better templates, and (2) improved the statistics for knowledge-based scoring functions. Moreover, de novo methods have benefited from the use of correlated mutations as restraints. Finally, we outline current advances that will likely shape the field in the forthcoming decade.
Affiliation(s)
- Julia Koehler Leman
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
- Martin B. Ulmschneider
- Department of Materials Science and Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
- Jeffrey J. Gray
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
|
33
|
Chaudhari R, Heim AJ, Li Z. Improving homology modeling of G-protein coupled receptors through multiple-template derived conserved inter-residue interactions. J Comput Aided Mol Des 2014; 29:413-20. [PMID: 25503850 DOI: 10.1007/s10822-014-9823-2] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 05/21/2014] [Accepted: 12/06/2014] [Indexed: 01/19/2023]
Abstract
As evidenced by the three rounds of G-protein coupled receptors (GPCR) Dock competitions, improving homology modeling methods of helical transmembrane proteins including the GPCRs, based on templates of low sequence identity, remains an eminent challenge. Current approaches addressing this challenge adopt the philosophy of "modeling first, refinement next". In the present work, we developed an alternative modeling approach through the novel application of available multiple templates. First, conserved inter-residue interactions are derived from each additional template through conservation analysis of each template-target pairwise alignment. Then, these interactions are converted into distance restraints and incorporated in the homology modeling process. This approach was applied to modeling of the human β2 adrenergic receptor using the bovine rhodopsin and the human protease-activated receptor 1 as templates, and improved model quality was demonstrated compared to the homology model generated by standard single-template and multiple-template methods. This method of "refined restraints first, modeling next" provides a fast and complementary way to the current modeling approaches. It allows rational identification and implementation of additional conserved distance restraints extracted from multiple templates and/or experimental data, and has the potential to be applicable to modeling of all helical transmembrane proteins.
Affiliation(s)
- Rajan Chaudhari
- Department of Chemistry & Biochemistry, University of the Sciences in Philadelphia, Box 48, Philadelphia, PA, 19104, USA
|
34
|
Adaptive firefly algorithm: parameter analysis and its application. PLoS One 2014; 9:e112634. [PMID: 25397812 PMCID: PMC4232507 DOI: 10.1371/journal.pone.0112634] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.9] [Received: 08/12/2014] [Accepted: 10/09/2014] [Indexed: 12/02/2022] Open
Abstract
As a nature-inspired search algorithm, the firefly algorithm (FA) has several control parameters, which may have great effects on its performance. In this study, we investigate the parameter selection and adaptation strategies in a modified firefly algorithm, the adaptive firefly algorithm (AdaFa). There are three strategies in AdaFa: (1) a distance-based light absorption coefficient; (2) a gray coefficient that helps fireflies share difference information from attractive ones efficiently; and (3) five different dynamic strategies for the randomization parameter. Promising selections of parameters in the strategies are analyzed to guarantee the efficient performance of AdaFa. AdaFa is validated over widely used benchmark functions, and the numerical experiments and statistical tests yield useful conclusions on the strategies and the parameter selections affecting the performance of AdaFa. When applied to the real-world problem of protein tertiary structure prediction, the improved variants rebuilt the tertiary structure with an average root mean square deviation of less than 0.4 Å and 1.5 Å from the native constraints under noise-free conditions and with 10% Gaussian white noise, respectively.
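For orientation, the control parameters discussed above correspond to the three terms of the standard firefly update: attractiveness at zero distance (beta0), light absorption (gamma), and the randomization weight (alpha). The sketch below shows that standard update only. It does not implement any of the AdaFa strategies, and the function names, default values, and toy objective are assumptions made for illustration.

```python
import math
import random


def attract(xi, xj, beta0=1.0, gamma=1.0, alpha=0.0, rng=None):
    """Move firefly xi toward a brighter firefly xj (standard FA update)."""
    rng = rng or random
    r2 = sum((a - b) ** 2 for a, b in zip(xi, xj))
    beta = beta0 * math.exp(-gamma * r2)  # attractiveness decays with distance
    return [a + beta * (b - a) + alpha * (rng.random() - 0.5)
            for a, b in zip(xi, xj)]


def firefly_minimize(f, swarm, iterations=50, alpha=0.1, seed=0, **kw):
    """Minimize f: every firefly moves toward each strictly brighter one."""
    rng = random.Random(seed)
    swarm = [list(x) for x in swarm]  # copy so the caller's data is untouched
    for _ in range(iterations):
        for i in range(len(swarm)):
            for j in range(len(swarm)):
                if f(swarm[j]) < f(swarm[i]):  # lower cost means brighter
                    swarm[i] = attract(swarm[i], swarm[j],
                                       alpha=alpha, rng=rng, **kw)
    return min(swarm, key=f)


if __name__ == "__main__":
    def sphere(x):
        return sum(v * v for v in x)

    start = [[2.0, 2.0], [-1.0, 3.0], [0.5, -2.0]]
    print(firefly_minimize(sphere, start))
```

Because the currently brightest firefly never moves in this scheme, the best cost in the swarm cannot get worse from one step to the next. The adaptive strategies in the abstract concern how gamma and alpha are chosen and varied, which this sketch leaves fixed.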
|
35
|
Khadria AS, Mueller BK, Stefely JA, Tan CH, Pagliarini DJ, Senes A. A Gly-zipper motif mediates homodimerization of the transmembrane domain of the mitochondrial kinase ADCK3. J Am Chem Soc 2014; 136:14068-77. [PMID: 25216398 PMCID: PMC4195374 DOI: 10.1021/ja505017f] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Indexed: 02/06/2023]
Abstract
Interactions between α-helices within the hydrophobic environment of lipid bilayers are integral to the folding and function of transmembrane proteins; however, the major forces that mediate these interactions remain debated, and our ability to predict these interactions is still largely untested. We recently demonstrated that the frequent transmembrane association motif GASright, the GxxxG-containing fold of the glycophorin A dimer, is optimal for the formation of extended networks of Cα-H hydrogen bonds, supporting the hypothesis that these bonds are major contributors to association. We also found that optimization of Cα-H hydrogen bonding and interhelical packing is sufficient to computationally predict the structure of known GASright dimers at near atomic level. Here, we demonstrate that this computational method can be used to characterize the structure of a protein not previously known to dimerize, by predicting and validating the transmembrane dimer of ADCK3, a mitochondrial kinase. ADCK3 is involved in the biosynthesis of the redox active lipid, ubiquinone, and human ADCK3 mutations cause a cerebellar ataxia associated with ubiquinone deficiency, but the biochemical functions of ADCK3 remain largely undefined. Our experimental analyses show that the transmembrane helix of ADCK3 oligomerizes, with an interface based on an extended Gly-zipper motif, as predicted by our models. The data provide strong evidence for the hypothesis that optimization of Cα-H hydrogen bonding is an important factor in the association of transmembrane helices. This work also provides a structural foundation for investigating the role of transmembrane association in regulating the biological activity of ADCK3.
Affiliation(s)
- Ambalika S Khadria
- Department of Biochemistry, University of Wisconsin-Madison, 433 Babcock Drive, Madison, Wisconsin 53706, United States
|
36
|
Hu J, He X, Yu DJ, Yang XB, Yang JY, Shen HB. A new supervised over-sampling algorithm with application to protein-nucleotide binding residue prediction. PLoS One 2014; 9:e107676. [PMID: 25229688 PMCID: PMC4168127 DOI: 10.1371/journal.pone.0107676] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.0] [Received: 05/09/2014] [Accepted: 08/09/2014] [Indexed: 12/21/2022] Open
Abstract
Protein-nucleotide interactions are ubiquitous in a wide variety of biological processes. Accurately identifying interaction residues solely from protein sequences is useful for both protein function annotation and drug design, especially in the post-genomic era, as large volumes of protein data have not been functionally annotated. Protein-nucleotide binding residue prediction is a typical imbalanced learning problem, where binding residues are far fewer in number than non-binding residues. Alleviating the severity of class imbalance has been demonstrated to be a promising means of improving the prediction performance of a machine-learning-based predictor for class imbalance problems. However, little attention has been paid to the negative impact of class imbalance on protein-nucleotide binding residue prediction. In this study, we propose a new supervised over-sampling algorithm that synthesizes additional minority class samples to address class imbalance. The experimental results from protein-nucleotide interaction datasets demonstrate that the proposed supervised over-sampling algorithm can relieve the severity of class imbalance and help to improve prediction performance. Based on the proposed over-sampling algorithm, a predictor, called TargetSOS, is implemented for protein-nucleotide binding residue prediction. Cross-validation tests and independent validation tests demonstrate the effectiveness of TargetSOS. The web-server and datasets used in this study are freely available at http://www.csbio.sjtu.edu.cn/bioinf/TargetSOS/.
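To make the idea of synthesizing minority-class samples concrete, the sketch below uses generic nearest-neighbour interpolation in the style of SMOTE. It is not the supervised algorithm behind TargetSOS, whose details the abstract does not give. The function names, the neighbour count `k`, and the toy feature vectors are all hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def oversample_minority(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: sq_dist(base, p))[:k]
        mate = rng.choice(neighbours)
        t = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(tuple(b + t * (m - b) for b, m in zip(base, mate)))
    return synthetic


if __name__ == "__main__":
    binding = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]  # toy minority class
    for point in oversample_minority(binding, 5):
        print(point)
```

Interpolation of this kind is blind to the class labels of nearby majority samples. The abstract's emphasis on a supervised procedure suggests the proposed method uses additional information when accepting synthetic samples, which is not modeled here.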
Affiliation(s)
- Jun Hu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, Jiangsu, China
- Xue He
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, Jiangsu, China
- Dong-Jun Yu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, Jiangsu, China
- Changshu Institute, Nanjing University of Science and Technology, Changshu, Jiangsu, China
- * E-mail: (DJY); (HBS)
- Xi-Bei Yang
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, Jiangsu, China
- School of Computer Science and Engineering, Jiangsu University of Science and Technology, Zhenjiang, Jiangsu, China
- Jing-Yu Yang
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, Jiangsu, China
- Hong-Bin Shen
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai, China
- * E-mail: (DJY); (HBS)
|
37
|
Kufareva I, Katritch V, Stevens RC, Abagyan R. Advances in GPCR modeling evaluated by the GPCR Dock 2013 assessment: meeting new challenges. Structure 2014; 22:1120-1139. [PMID: 25066135 DOI: 10.1016/j.str.2014.06.012] [Citation(s) in RCA: 132] [Impact Index Per Article: 12.0] [Received: 02/06/2014] [Revised: 06/05/2014] [Accepted: 06/06/2014] [Indexed: 01/22/2023]
Abstract
Despite tremendous successes of GPCR crystallography, the receptors with available structures represent only a small fraction of human GPCRs. An important role of the modeling community is to maximize structural insights for the remaining receptors and complexes. The community-wide GPCR Dock assessment was established to stimulate and monitor the progress in molecular modeling and ligand docking for GPCRs. The four targets in the present third assessment round presented new and diverse challenges for modelers, including prediction of allosteric ligand interaction and activation states in 5-hydroxytryptamine receptors 1B and 2B, and modeling by extremely distant homology for smoothened receptor. Forty-four modeling groups participated in the assessment. State-of-the-art modeling approaches achieved close-to-experimental accuracy for small rigid orthosteric ligands and models built by close homology, and they correctly predicted protein fold for distant homology targets. Predictions of long loops and GPCR activation states remain unsolved problems.
Affiliation(s)
- Irina Kufareva
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA, 92039, USA
- Vsevolod Katritch
- Department of Molecular Biology, The Scripps Research Institute, La Jolla, CA, 92037, USA
- Raymond C Stevens
- Department of Molecular Biology, The Scripps Research Institute, La Jolla, CA, 92037, USA.
- Ruben Abagyan
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA, 92039, USA.
|
38
|
Tessier D, Laroum S, Duval B, Rath EM, Church WB, Hao JK. In silico evaluation of the influence of the translocon on partitioning of membrane segments. BMC Bioinformatics 2014; 15:156. [PMID: 24885988 PMCID: PMC4035737 DOI: 10.1186/1471-2105-15-156] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 10/25/2013] [Accepted: 05/14/2014] [Indexed: 11/10/2022] Open
Abstract
Background The locations of the TM segments inside the membrane proteins are the consequence of a cascade of several events: the localizing of the nascent chain to the membrane, its insertion through the translocon, and the conformation adopted to reach its stable state inside the lipid bilayer. Even though the hydrophobic h-region of signal peptides and a typical TM segment are both composed of mostly hydrophobic side chains, the translocon has the ability to determine whether a given segment is to be inserted into the membrane. Our goal is to acquire robust biological insights into the influence of the translocon on membrane insertion of helices, obtained from the in silico discrimination between signal peptides and transmembrane segments of bitopic proteins. Therefore, by exploiting this subtle difference, we produce an optimized scale that evaluates the tendency of each amino acid to form sequences destined for membrane insertion by the translocon. Results The learning phase of our approach is conducted on carefully chosen data and easily converges on an optimal solution called the PMIscale (Potential Membrane Insertion scale). Our study leads to two striking results. Firstly, with a very simple sliding-window prediction method, PMIscale enables an efficient discrimination between signal peptides and signal anchors. Secondly, PMIscale is also able to identify TM segments and to localize them within protein sequences. Conclusions Despite its simplicity, the localization method based on PMIscale nearly attains the highest level of TM topography prediction accuracy as the current state-of-the-art prediction methods. These observations confirm the prominent role of the translocon in the localization of TM segments and suggest several biological hypotheses about the physical properties of the translocon.
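The "very simple sliding-window prediction method" mentioned in the Results can be pictured as averaging a per-residue scale over a fixed-length window and reporting the best-scoring position. The PMIscale values are not given in the abstract, so the sketch below substitutes the well-known Kyte-Doolittle hydropathy scale purely as a stand-in. The 19-residue window, the function names, and the toy sequence are assumptions.

```python
# Kyte-Doolittle hydropathy values, used here only as a stand-in scale;
# these are NOT the PMIscale values from the paper.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_scores(seq, scale, window=19):
    """Mean scale value for every full-length window along the sequence."""
    return [sum(scale[aa] for aa in seq[start:start + window]) / window
            for start in range(len(seq) - window + 1)]


def best_window(seq, scale, window=19):
    """Return (start, score) of the highest-scoring window, or None if seq is too short."""
    scores = window_scores(seq, scale, window)
    if not scores:
        return None
    start = max(range(len(scores)), key=scores.__getitem__)
    return start, scores[start]


if __name__ == "__main__":
    toy = "DDDDD" + "L" * 19 + "KKKKK"  # toy sequence with one hydrophobic stretch
    print(best_window(toy, KYTE_DOOLITTLE))
```

Swapping in a different scale dictionary is the only change needed to try another propensity scale. Distinguishing signal peptides from TM segments, as the paper does, would additionally require a decision threshold that is not specified in the abstract.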
Affiliation(s)
- Dominique Tessier
- INRA, UR1268 Biopolymères Interactions et Assemblages, Nantes F-44316, France.
|