1
|
Sun J, Ru J, Cribbs AP, Xiong D. PyPropel: a Python-based tool for efficiently processing and characterising protein data. BMC Bioinformatics 2025; 26:70. [PMID: 40025421 PMCID: PMC11871610 DOI: 10.1186/s12859-025-06079-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2024] [Accepted: 02/10/2025] [Indexed: 03/04/2025] Open
Abstract
BACKGROUND: The volume of protein sequence data has grown exponentially in recent years, driven by advancements in metagenomics. Despite this, a substantial proportion of these sequences remain poorly annotated, underscoring the need for robust bioinformatics tools to facilitate efficient characterisation and annotation for functional studies. RESULTS: We present PyPropel, a Python-based computational tool developed to streamline the large-scale analysis of protein data, with a particular focus on applications in machine learning. PyPropel integrates sequence and structural data pre-processing, feature generation, and post-processing for model performance evaluation and visualisation, offering a comprehensive solution for handling complex protein datasets. CONCLUSION: PyPropel provides added value over existing tools by offering a unified workflow that encompasses the full spectrum of protein research, from raw data pre-processing to functional annotation and model performance analysis, thereby supporting efficient protein function studies.
Affiliation(s)
- Jianfeng Sun
  - Botnar Research Centre, University of Oxford, Headington, Oxford, OX3 7LD, UK
- Jinlong Ru
  - Chair of Prevention of Microbial Diseases, School of Life Sciences Weihenstephan, Technical University of Munich, 85354, Freising, Germany
- Adam P Cribbs
  - Botnar Research Centre, University of Oxford, Headington, Oxford, OX3 7LD, UK
- Dapeng Xiong
  - Department of Computational Biology, Cornell University, Ithaca, 14853, USA
  - Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, 14853, USA
|
2
|
Das SC, Tasnim W, Rana HK, Acharjee UK, Islam MM, Khatun R. Comprehensive bioinformatics and machine learning analyses for breast cancer staging using TCGA dataset. Brief Bioinform 2024; 26:bbae628. [PMID: 39656775 DOI: 10.1093/bib/bbae628] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/15/2024] [Revised: 10/23/2024] [Accepted: 11/29/2024] [Indexed: 12/17/2024] Open
Abstract
Breast cancer is an alarming global health concern, including a vast and varied set of illnesses with different molecular characteristics. The fusion of sophisticated computational methodologies with extensive biological datasets has emerged as an effective strategy for unravelling complex patterns in oncology. This research delves into breast cancer staging, classification, and diagnosis by leveraging the comprehensive dataset provided by The Cancer Genome Atlas (TCGA). By integrating advanced machine learning algorithms with bioinformatics analysis, it introduces a cutting-edge methodology for identifying complex molecular signatures associated with different subtypes and stages of breast cancer. This study utilizes TCGA gene expression data to detect and categorize breast cancer through the application of machine learning and systems biology techniques. Researchers identified differentially expressed genes in breast cancer and analyzed them using signaling pathways, protein-protein interactions, and regulatory networks to uncover potential therapeutic targets. The study also highlights the roles of specific proteins (MYH2, MYL1, MYL2, MYH7) and microRNAs (such as hsa-let-7d-5p) that are potential biomarkers of cancer progression, based on several analyses. In terms of diagnostic accuracy for cancer staging, the random forest method achieved 97.19%, while the XGBoost algorithm attained 95.23%. Bioinformatics and machine learning meet in this study to find potential biomarkers that influence the progression of breast cancer. The combination of sophisticated analytical methods and extensive genomic datasets presents a promising path for expanding our understanding and enhancing clinical outcomes in identifying and categorizing this intricate illness.
Affiliation(s)
- Saurav Chandra Das
  - Department of Computer Science and Engineering, Jagannath University, Dhaka-1100, Bangladesh
  - Department of Internet of Things and Robotics Engineering, Bangabandhu Sheikh Mujibur Rahman Digital University, Bangladesh, Kaliakair, Gazipur-1750, Bangladesh
- Wahia Tasnim
  - Department of Computer Science and Engineering, Green University of Bangladesh, Narayanganj-1461, Dhaka, Bangladesh
- Humayan Kabir Rana
  - Department of Computer Science and Engineering, Green University of Bangladesh, Narayanganj-1461, Dhaka, Bangladesh
- Uzzal Kumar Acharjee
  - Department of Computer Science and Engineering, Jagannath University, Dhaka-1100, Bangladesh
- Md Manowarul Islam
  - Department of Computer Science and Engineering, Jagannath University, Dhaka-1100, Bangladesh
- Rabea Khatun
  - Department of Computer Science and Engineering, Green University of Bangladesh, Narayanganj-1461, Dhaka, Bangladesh
|
3
|
Sun J, Kulandaisamy A, Liu J, Hu K, Gromiha MM, Zhang Y. Machine learning in computational modelling of membrane protein sequences and structures: From methodologies to applications. Comput Struct Biotechnol J 2023; 21:1205-1226. [PMID: 36817959 PMCID: PMC9932300 DOI: 10.1016/j.csbj.2023.01.036] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 11/16/2022] [Revised: 01/16/2023] [Accepted: 01/25/2023] [Indexed: 01/29/2023] Open
Abstract
Membrane proteins mediate a wide spectrum of biological processes, such as signal transduction and cell communication. Due to the arduous and costly nature inherent to the experimental process, membrane proteins have long been devoid of well-resolved atomic-level tertiary structures and, consequently, the understanding of their functional roles underlying a multitude of life activities has been hampered. Currently, computational tools dedicated to furthering the structure-function understanding are primarily focused on utilizing intelligent algorithms to address a variety of site-wise prediction problems (e.g., topology and interaction sites), but are scattered across different computing sources. Moreover, the recent advent of deep learning techniques has immensely expedited the development of computational tools for membrane protein-related prediction problems. Given the growing number of applications optimized particularly by manifold deep neural networks, we herein provide a review on the current status of computational strategies mainly in membrane protein type classification, topology identification, interaction site detection, and pathogenic effect prediction. Meanwhile, we provide an overview of how the entire prediction process proceeds, including database collection, data pre-processing, feature extraction, and method selection. This review is expected to be useful for developing more extendable computational tools specific to membrane proteins.
Affiliation(s)
- Jianfeng Sun
  - Botnar Research Centre, Nuffield Department of Orthopedics, Rheumatology, and Musculoskeletal Sciences, University of Oxford, Headington, Oxford OX3 7LD, UK
- Arulsamy Kulandaisamy
  - Department of Biotechnology, Bhupat and Jyoti Mehta School of BioSciences, Indian Institute of Technology Madras, Chennai 600 036, Tamilnadu, India
- Jacklyn Liu
  - UCL Cancer Institute, University College London, 72 Huntley Street, London WC1E 6BT, UK
- Kai Hu
  - Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Xiangtan 411105, China
- M. Michael Gromiha (corresponding author)
  - Department of Biotechnology, Bhupat and Jyoti Mehta School of BioSciences, Indian Institute of Technology Madras, Chennai 600 036, Tamilnadu, India
- Yuan Zhang (corresponding author)
  - Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Xiangtan 411105, China
|
4
|
Kim D, Ha D, Lee K, Lee H, Kim I, Kim S. An evolution-based machine learning to identify cancer type-specific driver mutations. Brief Bioinform 2023; 24:6961611. [PMID: 36575568 DOI: 10.1093/bib/bbac593] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2022] [Revised: 11/18/2022] [Accepted: 12/03/2022] [Indexed: 12/29/2022] Open
Abstract
Identifying cancer type-specific driver mutations is crucial for illuminating distinct pathologic mechanisms across various tumors and providing opportunities for patient-specific treatment. However, although many computational methods were developed to predict driver mutations in a type-specific manner, the methods still have room to improve. Here, we devise a novel feature based on sequence co-evolution analysis to identify cancer type-specific driver mutations and construct a machine learning (ML) model with state-of-the-art performance. Specifically, relying on 28 000 tumor samples across 66 cancer types, our ML framework outperformed current leading methods of detecting cancer driver mutations. Interestingly, the cancer mutations identified by the sequence co-evolution feature are frequently observed in interfaces mediating tissue-specific protein-protein interactions that are known to associate with shaping tissue-specific oncogenesis. Moreover, we provide pre-calculated potential oncogenicity on available human proteins with prediction scores of all possible residue alterations through a user-friendly website (http://sbi.postech.ac.kr/w/cancerCE). This work will facilitate the identification of cancer type-specific driver mutations in newly sequenced tumor samples.
Affiliation(s)
- Inhae Kim
  - ImmunoBiome Inc., Pohang, South Korea
- Sanguk Kim
  - Department of Life Sciences
  - Artificial Intelligence Graduate Program, Pohang University of Science and Technology, Pohang 790-784, South Korea
  - Institute of Convergence Research and Education in Advanced Technology, Yonsei University, Seoul 120-149, South Korea
|
5
|
Zhang Y, Grimwood AL, Hancox JC, Harmer SC, Dempsey CE. Evolutionary coupling analysis guides identification of mistrafficking-sensitive variants in cardiac K+ channels: Validation with hERG. Front Pharmacol 2022; 13:1010119. [PMID: 36339618 PMCID: PMC9632996 DOI: 10.3389/fphar.2022.1010119] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2022] [Accepted: 09/30/2022] [Indexed: 09/27/2023] Open
Abstract
Loss of function (LOF) mutations of voltage-sensitive K+ channel proteins hERG (Kv11.1) and KCNQ1 (Kv7.1) account for the majority of instances of congenital Long QT Syndrome (cLQTS), with the dominant molecular phenotype being a mistrafficking one resulting from protein misfolding. We explored the use of Evolutionary Coupling (EC) analysis, which identifies evolutionarily conserved pairwise amino acid interactions that may contribute to protein structural stability, to identify regions of the channels susceptible to misfolding mutations. Comparison with published experimental trafficking data for hERG and KCNQ1 showed that the method strongly predicts "scaffolding" regions of the channel membrane domains and has useful predictive power for trafficking phenotypes of individual variants. We identified a region in and around the cytoplasmic S2-S3 loop of the hERG Voltage Sensor Domain (VSD) as susceptible to destabilising mutation, and this was confirmed using a quantitative LI-COR®-based trafficking assay that showed severely attenuated trafficking in eight out of 10 natural hERG VSD variants selected using EC analysis. Our analysis highlights an equivalence in the scaffolding structures of the hERG and KCNQ1 membrane domains. Pathogenic variants of ion channels with an underlying mistrafficking phenotype are likely to be located within similar scaffolding structures that are identifiable by EC analysis.
Affiliation(s)
- Yihong Zhang
  - School of Physiology, Pharmacology and Neuroscience, Biomedical Sciences Building, University Walk, Bristol, United Kingdom
- Amy L. Grimwood
  - School of Biological Sciences, Life Sciences Building, Bristol, United Kingdom
- Jules C. Hancox
  - School of Physiology, Pharmacology and Neuroscience, Biomedical Sciences Building, University Walk, Bristol, United Kingdom
- Stephen C. Harmer
  - School of Physiology, Pharmacology and Neuroscience, Biomedical Sciences Building, University Walk, Bristol, United Kingdom
- Christopher E. Dempsey
  - School of Biochemistry, Biomedical Sciences Building, University Walk, Bristol, United Kingdom
|
6
|
Kim D, Noh MH, Park M, Kim I, Ahn H, Ye DY, Jung GY, Kim S. Enzyme activity engineering based on sequence co-evolution analysis. Metab Eng 2022; 74:49-60. [PMID: 36113751 DOI: 10.1016/j.ymben.2022.09.001] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 05/17/2022] [Revised: 08/31/2022] [Accepted: 09/05/2022] [Indexed: 11/17/2022]
Abstract
The utility of engineering enzyme activity is expanding with the development of biotechnology. Conventional methods have limited applicability as they require high-throughput screening or three-dimensional structures to direct target residues of activity control. An alternative method uses sequence evolution of natural selection. A repertoire of mutations was selected for fine-tuning enzyme activities to adapt to varying environments during evolution. Here, we devised a strategy called sequence co-evolutionary analysis to control the efficiency of enzyme reactions (SCANEER), which scans the evolution of protein sequences and directs the mutation strategy to improve enzyme activity. We hypothesized that amino acid pairs for various enzyme activities were encoded in the evolutionary history of protein sequences, whereas loss-of-function mutations were avoided since those are depleted during evolution. SCANEER successfully predicted the enzyme activities of beta-lactamase and aminoglycoside 3'-phosphotransferase. SCANEER was further experimentally validated to control the activities of three different enzymes of great interest in chemical production: cis-aconitate decarboxylase, α-ketoglutaric semialdehyde dehydrogenase, and inositol oxygenase. Activity-enhancing mutations that improve substrate-binding affinity or turnover rate were found at sites distal from known active sites or ligand-binding pockets. We provide SCANEER to control desired enzyme activity through a user-friendly webserver.
Affiliation(s)
- Donghyo Kim
  - Department of Life Sciences, Pohang University of Science and Technology, Pohang, South Korea
- Myung Hyun Noh
  - Department of Chemical Engineering, Pohang University of Science and Technology, Pohang, South Korea
- Minhyuk Park
  - Department of Life Sciences, Pohang University of Science and Technology, Pohang, South Korea
- Inhae Kim
  - ImmunoBiome Inc., Pohang, South Korea
- Hyunsoo Ahn
  - Graduate School of Artificial Intelligence, Pohang University of Science and Technology, Pohang, South Korea
- Dae-Yeol Ye
  - Department of Chemical Engineering, Pohang University of Science and Technology, Pohang, South Korea
- Gyoo Yeol Jung
  - Department of Chemical Engineering, Pohang University of Science and Technology, Pohang, South Korea; Institute of Convergence Research and Education in Advanced Technology, Yonsei University, Seoul, South Korea; School of Interdisciplinary Bioscience and Bioengineering, Pohang University of Science and Technology, Pohang, South Korea
- Sanguk Kim
  - Department of Life Sciences, Pohang University of Science and Technology, Pohang, South Korea; Graduate School of Artificial Intelligence, Pohang University of Science and Technology, Pohang, South Korea; Institute of Convergence Research and Education in Advanced Technology, Yonsei University, Seoul, South Korea; School of Interdisciplinary Bioscience and Bioengineering, Pohang University of Science and Technology, Pohang, South Korea
|
7
|
Xiao Y, Zeng B, Berner N, Frishman D, Langosch D, George Teese M. Experimental determination and data-driven prediction of homotypic transmembrane domain interfaces. Comput Struct Biotechnol J 2020; 18:3230-3242. [PMID: 33209210 PMCID: PMC7649602 DOI: 10.1016/j.csbj.2020.09.035] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 03/08/2020] [Revised: 09/22/2020] [Accepted: 09/24/2020] [Indexed: 12/22/2022] Open
Abstract
Homotypic TMD interfaces identified by different techniques share strong similarities. The GxxxG motif is the feature most strongly associated with interfaces. Other features include conservation, polarity, coevolution, and depth in the membrane. The role of each feature strongly depends on the individual protein. Machine learning helps predict interfaces from evolutionary sequence data.
Interactions between their transmembrane domains (TMDs) frequently support the assembly of single-pass membrane proteins to non-covalent complexes. Yet, the TMD-TMD interactome remains largely uncharted. With a view to predicting homotypic TMD-TMD interfaces from primary structure, we performed a systematic analysis of their physical and evolutionary properties. To this end, we generated a dataset of 50 self-interacting TMDs. This dataset contains interfaces of nine TMDs from bitopic human proteins (Ire1, Armcx6, Tie1, ATP1B1, PTPRO, PTPRU, PTPRG, DDR1, and Siglec7) that were experimentally identified here and combined with literature data. We show that interfacial residues of these homotypic TMD-TMD interfaces tend to be more conserved, coevolved and polar than non-interfacial residues. Further, we suggest for the first time that interface positions are deficient in β-branched residues, and likely to be located deep in the hydrophobic core of the membrane. Overrepresentation of the GxxxG motif at interfaces is strong, but that of (small)xxx(small) motifs is weak. The multiplicity of these features and the individual character of TMD-TMD interfaces, as uncovered here, prompted us to train a machine learning algorithm. The resulting prediction method, THOIPA (www.thoipa.org), excels in the prediction of key interface residues from evolutionary sequence data.
Affiliation(s)
- Yao Xiao
  - Center for Integrated Protein Science Munich (CIPSM) at the Lehrstuhl für Chemie der Biopolymere, Technische Universität München, Weihenstephaner Berg 3, 85354 Freising, Germany
- Bo Zeng
  - Department of Bioinformatics, Wissenschaftszentrum, Weihenstephan, Maximus-von-Imhof-Forum 3, Freising 85354, Germany
- Nicola Berner
  - Center for Integrated Protein Science Munich (CIPSM) at the Lehrstuhl für Chemie der Biopolymere, Technische Universität München, Weihenstephaner Berg 3, 85354 Freising, Germany
- Dmitrij Frishman
  - Department of Bioinformatics, Wissenschaftszentrum, Weihenstephan, Maximus-von-Imhof-Forum 3, Freising 85354, Germany
  - Department of Bioinformatics, Peter the Great Saint Petersburg Polytechnic University, St. Petersburg 195251, Russian Federation
- Dieter Langosch
  - Center for Integrated Protein Science Munich (CIPSM) at the Lehrstuhl für Chemie der Biopolymere, Technische Universität München, Weihenstephaner Berg 3, 85354 Freising, Germany
- Mark George Teese
  - Center for Integrated Protein Science Munich (CIPSM) at the Lehrstuhl für Chemie der Biopolymere, Technische Universität München, Weihenstephaner Berg 3, 85354 Freising, Germany
  - TNG Technology Consulting GmbH, Beta-Straße 13a, 85774 Unterföhring, Germany
|
8
|
Sun J, Frishman D. DeepHelicon: Accurate prediction of inter-helical residue contacts in transmembrane proteins by residual neural networks. J Struct Biol 2020; 212:107574. [PMID: 32663598 DOI: 10.1016/j.jsb.2020.107574] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 04/10/2020] [Revised: 07/03/2020] [Accepted: 07/07/2020] [Indexed: 01/16/2023]
Abstract
Accurate prediction of amino acid residue contacts is an important prerequisite for generating high-quality 3D models of transmembrane (TM) proteins. While a large number of compositional, evolutionary, and structural properties of proteins can be used to train contact prediction methods, recent research suggests that coevolution between residues provides the strongest indication of their spatial proximity. We have developed a deep learning approach, DeepHelicon, to predict inter-helical residue contacts in TM proteins by considering only coevolutionary features. DeepHelicon comprises a two-stage supervised learning process by residual neural networks for a gradual refinement of contact maps, followed by variance reduction by an ensemble of models. We present a benchmark study of 12 contact predictors and conclude that DeepHelicon together with the two other state-of-the-art methods DeepMetaPSICOV and Membrain2 outperforms the 10 remaining algorithms on all datasets and at all settings. On a set of 44 TM proteins with an average length of 388 residues DeepHelicon achieves the best performance among all benchmarked methods in predicting the top L/5 and L/2 inter-helical contacts, with the mean precision of 87.42% and 77.84%, respectively. On a set of 57 relatively small TM proteins with an average length of 298 residues DeepHelicon ranks second best after DeepMetaPSICOV. DeepHelicon produces the most accurate predictions for large proteins with more than 10 transmembrane helices. Coevolutionary features alone allow inter-helical residue contacts to be predicted with an accuracy sufficient for generating acceptable 3D models for up to 30% of proteins using a fully automated modeling method such as CONFOLD2.
Affiliation(s)
- Jianfeng Sun
  - Department of Bioinformatics, Wissenschaftszentrum Weihenstephan, Technische Universität München, 85354 Freising, Germany
- Dmitrij Frishman
  - Department of Bioinformatics, Wissenschaftszentrum Weihenstephan, Technische Universität München, 85354 Freising, Germany
|
9
|
Fang C, Jia Y, Hu L, Lu Y, Wang H. IMPContact: An Interhelical Residue Contact Prediction Method. Biomed Res Int 2020; 2020:4569037. [PMID: 32309431 PMCID: PMC7140131 DOI: 10.1155/2020/4569037] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2020] [Accepted: 03/09/2020] [Indexed: 11/17/2022]
Abstract
As an important category of proteins, alpha-helix transmembrane proteins (αTMPs) play an important role in various biological activities. Because the solved αTMP structures are inadequate, predicting the residue contacts among the transmembrane segments of an αTMP reveals the basis of the protein fold, which can be used to further discover more protein functions. A few efforts have been devoted to predicting interhelical residue contacts using machine learning methods based on prior knowledge of transmembrane protein structure. However, it is still a challenge to improve the prediction accuracy, while the deep learning method provides an opportunity to utilize the structural knowledge from a different perspective. For this purpose, we proposed a novel αTMP residue-residue contact prediction method, IMPContact, in which a convolutional neural network (CNN) was applied to recognize interhelical contacts in a TMP using its specific structural features. Four sequence-based TMP-specific features were selected to describe a pair of residues, namely, evolutionary covariation, predicted topology structure, residue relative position, and evolutionary conservation. An up-to-date dataset was used to train and test IMPContact; our method achieved better performance compared to peer methods. In the case studies, interhelical residue contacts (IHRCs) in the regular transmembrane helices were better predicted than in the irregular ones.
Affiliation(s)
- Chao Fang
  - School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
- Yajie Jia
  - School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
  - Institute of Computational Biology, Northeast Normal University, Changchun 130117, China
- Lihong Hu
  - School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
- Yinghua Lu
  - School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
  - Department of Computer Science, College of Humanities & Sciences of Northeast Normal University, Changchun 130117, China
- Han Wang
  - School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
  - Institute of Computational Biology, Northeast Normal University, Changchun 130117, China
  - Department of Computer Science, College of Humanities & Sciences of Northeast Normal University, Changchun 130117, China
|
10
|
Kim D, Han SK, Lee K, Kim I, Kong J, Kim S. Evolutionary coupling analysis identifies the impact of disease-associated variants at less-conserved sites. Nucleic Acids Res 2019; 47:e94. [PMID: 31199866 PMCID: PMC6895274 DOI: 10.1093/nar/gkz536] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 11/15/2018] [Revised: 05/03/2019] [Accepted: 06/05/2019] [Indexed: 12/20/2022] Open
Abstract
Genome-wide association studies have discovered a large number of genetic variants in human patients with the disease. Thus, predicting the impact of these variants is important for sorting disease-associated variants (DVs) from neutral variants. Current methods to predict the mutational impacts depend on evolutionary conservation at the mutation site, which is determined using homologous sequences and based on the assumption that variants at well-conserved sites have high impacts. However, many DVs at less-conserved but functionally important sites cannot be predicted by the current methods. Here, we present a method to find DVs at less-conserved sites by predicting the mutational impacts using evolutionary coupling analysis. Functionally important and evolutionarily coupled sites often have compensatory variants on cooperative sites to avoid loss of function. We found that our method identified known intolerant variants in a diverse group of proteins. Furthermore, at less-conserved sites, we identified DVs that were not identified using conservation-based methods. These newly identified DVs were frequently found at protein interaction interfaces, where species-specific mutations often alter interaction specificity. This work presents a means to identify less-conserved DVs and provides insight into the relationship between evolutionarily coupled sites and human DVs.
Affiliation(s)
- Donghyo Kim
  - Department of Life Sciences, Pohang University of Science and Technology, Pohang 790-784, Korea
- Seong Kyu Han
  - Department of Life Sciences, Pohang University of Science and Technology, Pohang 790-784, Korea
- Kwanghwan Lee
  - Department of Life Sciences, Pohang University of Science and Technology, Pohang 790-784, Korea
- Inhae Kim
  - Department of Life Sciences, Pohang University of Science and Technology, Pohang 790-784, Korea
- JungHo Kong
  - Department of Life Sciences, Pohang University of Science and Technology, Pohang 790-784, Korea
- Sanguk Kim
  - Department of Life Sciences, Pohang University of Science and Technology, Pohang 790-784, Korea
|
11
|
Doñate-Macián P, Crespi-Boixader A, Perálvarez-Marín A. Molecular Evolution Bioinformatics Toward Structural Biology of TRPV1-4 Channels. Methods Mol Biol 2019; 1987:1-21. [PMID: 31028670 DOI: 10.1007/978-1-4939-9446-5_1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2023]
Abstract
Bioinformatics is a very resourceful tool to understand evolution of membrane proteins, such as transient receptor potential channels. Expert bioinformatics users rely on specialized scripting and programming skills. Several web servers and standalone tools are available for nonadvanced users willing to develop projects to understand their system of choice. In this case, we present a desktop-based protocol to develop evostructural hypotheses based on basic bioinformatics skills and resources, specifically for a small subgroup of TRPV channels, which can be further implemented for larger datasets.
Affiliation(s)
- Pau Doñate-Macián
  - Unitat de Biofísica, Departament de Bioquímica i de Biologia Molecular, Facultat de Medicina, Universitat Autònoma de Barcelona, Bellaterra, Catalonia, Spain
- Alba Crespi-Boixader
  - Institute of Adaptive and Neural Computation, School of Informatics, University of Edinburgh, Edinburgh, Scotland, UK
- Alex Perálvarez-Marín
  - Unitat de Biofísica, Departament de Bioquímica i de Biologia Molecular, Facultat de Medicina, Universitat Autònoma de Barcelona, Bellaterra, Catalonia, Spain
|
12
|
Nicoludis JM, Gaudet R. Applications of sequence coevolution in membrane protein biochemistry. Biochim Biophys Acta Biomembr 2018; 1860:895-908. [PMID: 28993150 PMCID: PMC5807202 DOI: 10.1016/j.bbamem.2017.10.004] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Received: 07/19/2017] [Revised: 09/28/2017] [Accepted: 10/02/2017] [Indexed: 12/22/2022]
Abstract
Recently, protein sequence coevolution analysis has matured into a predictive powerhouse for protein structure and function. Direct methods, which use global statistical models of sequence coevolution, have enabled the prediction of membrane and disordered protein structures, protein complex architectures, and the functional effects of mutations in proteins. The field of membrane protein biochemistry and structural biology has embraced these computational techniques, which provide functional and structural information in an otherwise experimentally-challenging field. Here we review recent applications of protein sequence coevolution analysis to membrane protein structure and function and highlight the promising directions and future obstacles in these fields. We provide insights and guidelines for membrane protein biochemists who wish to apply sequence coevolution analysis to a given experimental system.
Affiliation(s)
- John M Nicoludis
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA 02138, United States
| | - Rachelle Gaudet
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA, 02138, United States.
| |
|
13
|
Cao H, Ng MCK, Jusoh SA, Tai HK, Siu SWI. TMDIM: an improved algorithm for the structure prediction of transmembrane domains of bitopic dimers. J Comput Aided Mol Des 2017; 31:855-865. [DOI: 10.1007/s10822-017-0047-0] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 04/28/2017] [Accepted: 08/17/2017] [Indexed: 12/01/2022]
|
14
|
Intramolecular allosteric communication in dopamine D2 receptor revealed by evolutionary amino acid covariation. Proc Natl Acad Sci U S A 2016; 113:3539-44. [PMID: 26979958 DOI: 10.1073/pnas.1516579113] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.7] [Indexed: 12/20/2022] Open
Abstract
The structural basis of allosteric signaling in G protein-coupled receptors (GPCRs) is important in guiding design of therapeutics and understanding phenotypic consequences of genetic variation. The Evolutionary Trace (ET) algorithm previously proved effective in redesigning receptors to mimic the ligand specificities of functionally distinct homologs. We now expand ET to consider mutual information, with validation in GPCR structure and dopamine D2 receptor (D2R) function. The new algorithm, called ET-MIp, identifies evolutionarily relevant patterns of amino acid covariations. The improved predictions of structural proximity and D2R mutagenesis demonstrate that ET-MIp predicts functional interactions between residue pairs, particularly potency and efficacy of activation by dopamine. Remarkably, although most of the residue pairs chosen for mutagenesis are neither in the binding pocket nor in contact with each other, many exhibited functional interactions, implying at-a-distance coupling. The functional interaction between the coupled pairs correlated best with the evolutionary coupling potential derived from dopamine receptor sequences rather than with broader sets of GPCR sequences. These data suggest that the allosteric communication responsible for dopamine responses is resolved by ET-MIp and best discerned within a short evolutionary distance. Most double mutants restored dopamine response to wild-type levels, also suggesting that tight regulation of the response to dopamine drove the coevolution and intramolecular communications between coupled residues. Our approach provides a general tool to identify evolutionary covariation patterns in small sets of close sequence homologs and to translate them into functional linkages between residues.
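The covariation idea behind ET-MIp can be illustrated with plain mutual information between alignment columns, followed by the average product correction usually written MIp. This is a minimal sketch, not the authors' ET-MIp implementation: the function names and the toy alignment are hypothetical, gaps are treated as ordinary symbols, and the Evolutionary Trace component described in the abstract is not reproduced.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(col_a, col_b):
    """Mutual information (in nats) between two alignment columns."""
    n = len(col_a)
    freq_a, freq_b = Counter(col_a), Counter(col_b)
    freq_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), count in freq_ab.items():
        p_ab = count / n
        # p_ab / (p_a * p_b), written with raw counts
        mi += p_ab * math.log(p_ab * n * n / (freq_a[a] * freq_b[b]))
    return mi


def mip_scores(alignment):
    """Return {(i, j): MI minus average product correction} for columns i < j.

    Hypothetical helper: `alignment` is a list of equal-length strings
    with at least two columns.
    """
    columns = list(zip(*alignment))
    n_cols = len(columns)
    pairs = combinations(range(n_cols), 2)
    mi = {(i, j): mutual_information(columns[i], columns[j]) for i, j in pairs}
    # Mean MI of each column with every other column.
    col_mean = [
        sum(v for pair, v in mi.items() if i in pair) / (n_cols - 1)
        for i in range(n_cols)
    ]
    overall_mean = sum(mi.values()) / len(mi)
    if overall_mean == 0:
        return dict(mi)  # no covariation signal, nothing to correct
    return {
        (i, j): v - col_mean[i] * col_mean[j] / overall_mean
        for (i, j), v in mi.items()
    }


# Toy alignment (assumption): columns 0 and 1 covary, column 2 is invariant.
toy_alignment = ["AKL", "AKL", "GEL", "GEL"]
scores = mip_scores(toy_alignment)
best_pair = max(scores, key=scores.get)
```

In this toy case the perfectly covarying column pair receives the highest corrected score, while pairs involving the invariant column score zero.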
|
15
|
Hönigschmid P, Frishman D. Accurate prediction of helix interactions and residue contacts in membrane proteins. J Struct Biol 2016; 194:112-23. [PMID: 26851352 DOI: 10.1016/j.jsb.2016.02.005] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Received: 08/12/2015] [Revised: 02/01/2016] [Accepted: 02/02/2016] [Indexed: 11/16/2022]
Abstract
Accurate prediction of intra-molecular interactions from amino acid sequence is an important pre-requisite for obtaining high-quality protein models. Over the recent years, remarkable progress in this area has been achieved through the application of novel co-variation algorithms, which eliminate transitive evolutionary connections between residues. In this work we present a new contact prediction method for α-helical transmembrane proteins, MemConP, in which evolutionary couplings are combined with a machine learning approach. MemConP achieves a substantially improved accuracy (precision: 56.0%, recall: 17.5%, MCC: 0.288) compared to the use of either machine learning or co-evolution methods alone. The method also achieves 91.4% precision, 42.1% recall and a MCC of 0.490 in predicting helix-helix interactions based on predicted contacts. The approach was trained and rigorously benchmarked by cross-validation and independent testing on up-to-date non-redundant datasets of 90 and 30 experimental three dimensional structures, respectively. MemConP is a standalone tool that can be downloaded together with the associated training data from http://webclu.bio.wzw.tum.de/MemConP.
Affiliation(s)
- Peter Hönigschmid
- Department of Bioinformatics, Technische Universität München, Wissenschaftszentrum Weihenstephan, Maximus-von-Imhof Forum 3, 85354 Freising, Germany
| | - Dmitrij Frishman
- Department of Bioinformatics, Technische Universität München, Wissenschaftszentrum Weihenstephan, Maximus-von-Imhof Forum 3, 85354 Freising, Germany; Helmholtz Zentrum Munich, German Research Center for Environmental Health (GmbH), Institute of Bioinformatics and Systems Biology, 85764 Neuherberg, Germany; Laboratory of Bioinformatics, RASA Research Center, St Petersburg State Polytechnical University, St Petersburg 195251, Russia.
| |
|
16
|
Zhang H, Huang Q, Bei Z, Wei Y, Floudas CA. COMSAT: Residue contact prediction of transmembrane proteins based on support vector machines and mixed integer linear programming. Proteins 2016; 84:332-48. [PMID: 26756402 DOI: 10.1002/prot.24979] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 06/25/2015] [Revised: 11/19/2015] [Accepted: 12/10/2015] [Indexed: 12/28/2022]
Abstract
In this article, we present COMSAT, a hybrid framework for residue contact prediction of transmembrane (TM) proteins, integrating a support vector machine (SVM) method and a mixed integer linear programming (MILP) method. COMSAT consists of two modules: COMSAT_SVM which is trained mainly on position-specific scoring matrix features, and COMSAT_MILP which is an ab initio method based on optimization models. Contacts predicted by the SVM model are ranked by SVM confidence scores, and a threshold is trained to improve the reliability of the predicted contacts. For TM proteins with no contacts above the threshold, COMSAT_MILP is used. The proposed hybrid contact prediction scheme was tested on two independent TM protein sets based on the contact definition of 14 Å between Cα-Cα atoms. First, using a rigorous leave-one-protein-out cross validation on the training set of 90 TM proteins, an accuracy of 66.8%, a coverage of 12.3%, a specificity of 99.3% and a Matthews' correlation coefficient (MCC) of 0.184 were obtained for residue pairs that are at least six amino acids apart. Second, when tested on a test set of 87 TM proteins, the proposed method showed a prediction accuracy of 64.5%, a coverage of 5.3%, a specificity of 99.4% and a MCC of 0.106. COMSAT shows satisfactory results when compared with 12 other state-of-the-art predictors, and is more robust in terms of prediction accuracy as the length and complexity of TM protein increase. COMSAT is freely accessible at http://hpcc.siat.ac.cn/COMSAT/.
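The evaluation quantities quoted in this abstract can be sketched as follows. This is an illustrative reading, not COMSAT's code: it assumes that "accuracy" means the fraction of predicted contacts that are correct and "coverage" the fraction of true contacts recovered (a common convention in contact prediction), that a contact is a Cα-Cα distance of at most 14 Å, and that only residue pairs at least six positions apart are scored. All function names are hypothetical.

```python
import math


def contact_pairs(ca_coords, cutoff=14.0, min_separation=6):
    """Residue pairs (i, j) with Cα-Cα distance <= cutoff (Å) and j - i >= min_separation."""
    pairs = set()
    for i in range(len(ca_coords)):
        for j in range(i + min_separation, len(ca_coords)):
            if math.dist(ca_coords[i], ca_coords[j]) <= cutoff:
                pairs.add((i, j))
    return pairs


def _ratio(numerator, denominator):
    """Safe division; NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def contact_metrics(predicted, true, n_residues, min_separation=6):
    """Accuracy (precision), coverage (recall), specificity and MCC over candidate pairs."""
    span = max(0, n_residues - min_separation)
    n_candidates = span * (span + 1) // 2  # pairs with j - i >= min_separation
    tp = len(predicted & true)
    fp = len(predicted - true)
    fn = len(true - predicted)
    tn = n_candidates - tp - fp - fn
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": _ratio(tp, tp + fp),
        "coverage": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "mcc": _ratio(tp * tn - fp * fn, mcc_denominator),
    }


# Toy data (assumption): a 10-residue chain with hand-picked contact sets.
true_contacts = {(0, 6), (0, 7), (1, 8)}
predicted_contacts = {(0, 6), (1, 8), (2, 9)}
metrics = contact_metrics(predicted_contacts, true_contacts, n_residues=10)
```

Because non-contacts vastly outnumber contacts in real proteins, specificity stays close to 100% even when coverage is low, which is why MCC is reported alongside it.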
Affiliation(s)
- Huiling Zhang
- Centre for High Performance Computing, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
| | - Qingsheng Huang
- Centre for High Performance Computing, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
| | - Zhendong Bei
- Center for Cloud Computing, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
| | - Yanjie Wei
- Centre for High Performance Computing, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
| | - Christodoulos A Floudas
- Department of Chemical Engineering, Texas A&M University, College Station, Texas, 77843; Texas A&M Energy Institute, Texas A&M University, College Station, Texas, 77843
| |
|
17
|
Abstract
Transmembrane (TM) helices of integral membrane proteins can facilitate strong and specific noncovalent protein-protein interactions. Mutagenesis and structural analyses have revealed numerous examples in which the interaction between TM helices of single-pass membrane proteins is dependent on a GxxxG or (small)xxx(small) motif. It is therefore tempting to use the presence of these simple motifs as an indicator of TM helix interactions. In this Current Topic review, we point out that these motifs are quite common, with more than 50% of single-pass TM domains containing a (small)xxx(small) motif. However, the actual interaction strength of motif-containing helices depends strongly on sequence context and membrane properties. In addition, recent studies have revealed several GxxxG-containing TM domains that interact via alternative interfaces involving hydrophobic, polar, aromatic, or even ionizable residues that do not form recognizable motifs. In multipass membrane proteins, GxxxG motifs can be important for protein folding, and not just oligomerization. Our current knowledge thus suggests that the presence of a GxxxG motif alone is a weak predictor of protein dimerization in the membrane.
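How common these motifs are is easy to see by scanning a sequence for them. The sketch below assumes that "small" means Gly, Ala or Ser (other definitions exist) and uses a glycophorin A-like transmembrane segment purely as a test string; the names `SMALL`, `GXXXG`, `SMALL_XXX_SMALL` and `find_motifs` are hypothetical.

```python
import re

SMALL = "GAS"  # assumed definition of "small" residues

# Lookahead groups let overlapping motif occurrences all be reported.
GXXXG = re.compile(r"(?=(G.{3}G))")
SMALL_XXX_SMALL = re.compile(rf"(?=([{SMALL}].{{3}}[{SMALL}]))")


def find_motifs(tm_sequence, pattern):
    """Return (0-based start, matched 5-mer) for every occurrence of the motif."""
    return [(m.start(), m.group(1)) for m in pattern.finditer(tm_sequence)]


tm_segment = "ITLIIFGVMAGVIGTILLISYGI"  # illustrative transmembrane segment
gxxxg_hits = find_motifs(tm_segment, GXXXG)
small_hits = find_motifs(tm_segment, SMALL_XXX_SMALL)
```

A hit only marks a candidate interface. As the abstract stresses, the presence of the motif by itself is a weak predictor of dimerization.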
Affiliation(s)
- Mark G Teese
- Lehrstuhl für Chemie der Biopolymere, Technische Universität München, 85354 Freising, Germany; Center for Integrated Protein Science Munich (CIPSM), 81377 Munich, Germany
| | - Dieter Langosch
- Lehrstuhl für Chemie der Biopolymere, Technische Universität München, 85354 Freising, Germany; Center for Integrated Protein Science Munich (CIPSM), 81377 Munich, Germany
| |
|
18
|
Suplatov D, Voevodin V, Švedas V. Robust enzyme design: bioinformatic tools for improved protein stability. Biotechnol J 2014; 10:344-55. [PMID: 25524647 DOI: 10.1002/biot.201400150] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.4] [Received: 06/30/2014] [Revised: 09/30/2014] [Accepted: 11/04/2014] [Indexed: 01/22/2023]
Abstract
The ability of proteins and enzymes to maintain a functionally active conformation under adverse environmental conditions is an important feature of biocatalysts, vaccines, and biopharmaceutical proteins. From an evolutionary perspective, robust stability of proteins improves their biological fitness and allows for further optimization. Viewed from an industrial perspective, enzyme stability is crucial for the practical application of enzymes under the required reaction conditions. In this review, we analyze bioinformatic-driven strategies that are used to predict structural changes that can be applied to wild type proteins in order to produce more stable variants. The most commonly employed techniques can be classified into stochastic approaches, empirical or systematic rational design strategies, and design of chimeric proteins. We conclude that bioinformatic analysis can be efficiently used to study large protein superfamilies systematically as well as to predict particular structural changes which increase enzyme stability. Evolution has created a diversity of protein properties that are encoded in genomic sequences and structural data. Bioinformatics has the power to uncover this evolutionary code and provide a reproducible selection of hotspots - key residues to be mutated in order to produce more stable and functionally diverse proteins and enzymes. Further development of systematic bioinformatic procedures is needed to organize and analyze sequences and structures of proteins within large superfamilies and to link them to function, as well as to provide knowledge-based predictions for experimental evaluation.
Affiliation(s)
- Dmitry Suplatov
- Belozersky Institute of Physicochemical Biology and Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Moscow, Russia
| |
|
19
|
Abstract
Background The importance of mutations in disease phenotype has been studied, with information available in databases such as OMIM. However, it remains a research challenge to cluster amino acid residues based on an underlying interaction, such as co-evolution, in order to understand how mutations in these related sites can lead to different disease phenotypes. Results This paper presents an integrative approach to identify groups of co-evolving residues, known as protein sectors. By studying a protein family using multiple sequence alignments and statistical coupling analysis, we attempted to determine whether these groups of residues could be related to disease phenotypes. After the protein sectors were identified, disease-associated residues within these groups of amino acids were mapped to a structure representing the protein family. In this study, we used the proposed pipeline to analyze two test cases of spermine synthase and Rab GDP dissociation inhibitor. Conclusions The results suggest that there is a possible link between certain groups of co-evolving residues and different disease phenotypes. The pipeline described in this work could also be used to study other protein families associated with human diseases.
|
20
|
Doñate-Macián P, Perálvarez-Marín A. Dissecting domain-specific evolutionary pressure profiles of transient receptor potential vanilloid subfamily members 1 to 4. PLoS One 2014; 9:e110715. [PMID: 25333484 PMCID: PMC4204936 DOI: 10.1371/journal.pone.0110715] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Received: 02/25/2014] [Accepted: 09/18/2014] [Indexed: 11/30/2022] Open
Abstract
The transient receptor potential vanilloid family includes four ion channels (TRPV1, TRPV2, TRPV3 and TRPV4) that are represented within the vertebrate subphylum and involved in several sensory and physiological processes. These channels are related to adaptation to the environment, and probably under strong evolutionary pressure. Using multiple sequence alignments as source for evolutionary, bioinformatics and statistical analysis, we have analyzed the evolutionary profiles for TRPV1, TRPV2, TRPV3 and TRPV4. The evolutionary pressure exerted over vertebrate TRPV2 sequences compared to the other channels argues for a positive selection profile for TRPV2 compared to TRPV1, TRPV3 and TRPV4. We have analyzed the selective pressure on specific protein domains, observing a common selective pressure trend for the common TRPV scaffold, consisting of the ankyrin repeat domain, the membrane proximal domain, the transmembrane domain, and the TRP domain. Through a more detailed analysis we have identified evolutionary constraints involved in the subunit contact at the transmembrane domain level. Performing evolutionary comparison, we have translated specific channel structural information such as the transmembrane topology, and the interaction between the membrane proximal domain and the TRP box. We have also identified potential common regulatory domains among all TRPV1-4 members, such as protein-protein, lipid-protein and vesicle trafficking domains.
Affiliation(s)
- Pau Doñate-Macián
- Unitat de Biofísica, Centre d’Estudis en Biofísica, Departament de Bioquímica i de Biologia Molecular, Universitat Autònoma de Barcelona, Bellaterra, Spain
| | - Alex Perálvarez-Marín
- Unitat de Biofísica, Centre d’Estudis en Biofísica, Departament de Bioquímica i de Biologia Molecular, Universitat Autònoma de Barcelona, Bellaterra, Spain
| |
|
21
|
Pelé J, Moreau M, Abdi H, Rodien P, Castel H, Chabbert M. Comparative analysis of sequence covariation methods to mine evolutionary hubs: Examples from selected GPCR families. Proteins 2014; 82:2141-56. [DOI: 10.1002/prot.24570] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Received: 07/26/2013] [Revised: 03/11/2014] [Accepted: 03/19/2014] [Indexed: 01/26/2023]
Affiliation(s)
- Julien Pelé
- UMR CNRS 6214-INSERM 1083, Laboratory of Integrated Neurovascular and Mitochondrial Biology; University of Angers; 49045 Angers France
| | - Matthieu Moreau
- UMR CNRS 6214-INSERM 1083, Laboratory of Integrated Neurovascular and Mitochondrial Biology; University of Angers; 49045 Angers France
| | - Hervé Abdi
- The University of Texas at Dallas; School of Behavioral and Brain Sciences; Richardson, TX 75080-3021 USA
| | - Patrice Rodien
- UMR CNRS 6214-INSERM 1083, Laboratory of Integrated Neurovascular and Mitochondrial Biology; University of Angers; 49045 Angers France
- Department of Endocrinology, Reference Centre for the pathologies of hormonal receptivity; Centre Hospitalier Universitaire of Angers; 4 rue Larrey 49933 Angers France
| | - Hélène Castel
- INSERM U982, Laboratory of Neuronal and Neuroendocrine Communication and Differentiation, DC2N; University of Rouen; 76821 Mont-Saint-Aignan France
| | - Marie Chabbert
- UMR CNRS 6214-INSERM 1083, Laboratory of Integrated Neurovascular and Mitochondrial Biology; University of Angers; 49045 Angers France
| |
|
22
|
Li Z, Huang Y, Ouyang Y, Jiao Y, Xing H, Liao L, Jiang S, Shao Y, Ma L. CorMut: an R/Bioconductor package for computing correlated mutations based on selection pressure. Bioinformatics 2014; 30:2073-5. [PMID: 24681904 DOI: 10.1093/bioinformatics/btu154] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 01/21/2023] Open
Abstract
Correlated mutations constitute a fundamental idea in evolutionary biology, and understanding correlated mutations will, in turn, facilitate an understanding of the genetic mechanisms governing evolution. CorMut is an R package designed to compute correlated mutations in the unit of codon or amino acid mutation. Three classical methods were incorporated, and the computation results can be represented as correlation mutation networks. CorMut also enables the comparison of correlated mutations between two different evolutionary conditions. AVAILABILITY AND IMPLEMENTATION CorMut is released under the GNU General Public License within the Bioconductor project, and is freely available at http://bioconductor.org/packages/release/bioc/html/CorMut.html.
Affiliation(s)
- Zhenpeng Li
- State Key Laboratory for Infectious Disease Prevention and Control, National Center for AIDS/STD Control and Prevention, Chinese Center for Disease Control and Prevention, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, Beijing 102206 and Key Laboratory of Medical Molecular Virology (Ministries of Education and Health), Shanghai Medical College and Institute of Medical Microbiology, Fudan University, Shanghai 200032, China
| | - Yang Huang
- State Key Laboratory for Infectious Disease Prevention and Control, National Center for AIDS/STD Control and Prevention, Chinese Center for Disease Control and Prevention, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, Beijing 102206 and Key Laboratory of Medical Molecular Virology (Ministries of Education and Health), Shanghai Medical College and Institute of Medical Microbiology, Fudan University, Shanghai 200032, China
| | - Yabo Ouyang
- State Key Laboratory for Infectious Disease Prevention and Control, National Center for AIDS/STD Control and Prevention, Chinese Center for Disease Control and Prevention, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, Beijing 102206 and Key Laboratory of Medical Molecular Virology (Ministries of Education and Health), Shanghai Medical College and Institute of Medical Microbiology, Fudan University, Shanghai 200032, China
| | - Yang Jiao
- State Key Laboratory for Infectious Disease Prevention and Control, National Center for AIDS/STD Control and Prevention, Chinese Center for Disease Control and Prevention, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, Beijing 102206 and Key Laboratory of Medical Molecular Virology (Ministries of Education and Health), Shanghai Medical College and Institute of Medical Microbiology, Fudan University, Shanghai 200032, China
| | - Hui Xing
- State Key Laboratory for Infectious Disease Prevention and Control, National Center for AIDS/STD Control and Prevention, Chinese Center for Disease Control and Prevention, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, Beijing 102206 and Key Laboratory of Medical Molecular Virology (Ministries of Education and Health), Shanghai Medical College and Institute of Medical Microbiology, Fudan University, Shanghai 200032, China
| | - Lingjie Liao
- State Key Laboratory for Infectious Disease Prevention and Control, National Center for AIDS/STD Control and Prevention, Chinese Center for Disease Control and Prevention, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, Beijing 102206 and Key Laboratory of Medical Molecular Virology (Ministries of Education and Health), Shanghai Medical College and Institute of Medical Microbiology, Fudan University, Shanghai 200032, China
| | - Shibo Jiang
- State Key Laboratory for Infectious Disease Prevention and Control, National Center for AIDS/STD Control and Prevention, Chinese Center for Disease Control and Prevention, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, Beijing 102206 and Key Laboratory of Medical Molecular Virology (Ministries of Education and Health), Shanghai Medical College and Institute of Medical Microbiology, Fudan University, Shanghai 200032, China
| | - Yiming Shao
- State Key Laboratory for Infectious Disease Prevention and Control, National Center for AIDS/STD Control and Prevention, Chinese Center for Disease Control and Prevention, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, Beijing 102206 and Key Laboratory of Medical Molecular Virology (Ministries of Education and Health), Shanghai Medical College and Institute of Medical Microbiology, Fudan University, Shanghai 200032, China
| | - Liying Ma
- State Key Laboratory for Infectious Disease Prevention and Control, National Center for AIDS/STD Control and Prevention, Chinese Center for Disease Control and Prevention, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, Beijing 102206 and Key Laboratory of Medical Molecular Virology (Ministries of Education and Health), Shanghai Medical College and Institute of Medical Microbiology, Fudan University, Shanghai 200032, China
| |
|
23
|
Probabilistic grammatical model for helix-helix contact site classification. Algorithms Mol Biol 2013; 8:31. [PMID: 24350601 PMCID: PMC3892132 DOI: 10.1186/1748-7188-8-31] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 02/09/2013] [Accepted: 11/28/2013] [Indexed: 11/25/2022] Open
Abstract
Background Hidden Markov Models power many state-of-the-art tools in the field of protein bioinformatics. While excelling in their tasks, these methods of protein analysis do not convey directly information on medium- and long-range residue-residue interactions. This requires an expressive power of at least context-free grammars. However, application of more powerful grammar formalisms to protein analysis has been surprisingly limited. Results In this work, we present a probabilistic grammatical framework for problem-specific protein languages and apply it to classification of transmembrane helix-helix pairs configurations. The core of the model consists of a probabilistic context-free grammar, automatically inferred by a genetic algorithm from only a generic set of expert-based rules and positive training samples. The model was applied to produce sequence based descriptors of four classes of transmembrane helix-helix contact site configurations. The highest performance of the classifiers reached AUCROC of 0.70. The analysis of grammar parse trees revealed the ability of representing structural features of helix-helix contact sites. Conclusions We demonstrated that our probabilistic context-free framework for analysis of protein sequences outperforms the state of the art in the task of helix-helix contact site classification. However, this is achieved without necessarily requiring modeling long range dependencies between interacting residues. A significant feature of our approach is that grammar rules and parse trees are human-readable. Thus they could provide biologically meaningful information for molecular biologists.
|
24
|
Seeliger D. Development of scoring functions for antibody sequence assessment and optimization. PLoS One 2013; 8:e76909. [PMID: 24204701 PMCID: PMC3804498 DOI: 10.1371/journal.pone.0076909] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Received: 05/15/2013] [Accepted: 08/26/2013] [Indexed: 12/27/2022] Open
Abstract
Antibody development is still associated with substantial risks and difficulties as single mutations can radically change molecule properties like thermodynamic stability, solubility or viscosity. Since antibody generation methodologies cannot select and optimize for molecule properties which are important for biotechnological applications, careful sequence analysis and optimization is necessary to develop antibodies that fulfil the ambitious requirements of future drugs. While efforts to grasp the physical principles of undesired molecule properties from the very bottom are becoming increasingly powerful, the wealth of publicly available antibody sequences provides an alternative way to develop early assessment strategies for antibodies using a statistical approach which is the objective of this paper. Here, publicly available sequences were used to develop heuristic potentials for the framework regions of heavy and light chains of antibodies of human and murine origin. The potentials take into account position dependent probabilities of individual amino acids but also conditional probabilities which are inevitable for sequence assessment and optimization. It is shown that the potentials derived from human sequences clearly distinguish between human sequences and sequences from mice and, hence, can be used as a measure of humanness which compares a given sequence with the phenotypic pool of human sequences instead of comparing sequence identities to germline genes. Following this line, it is demonstrated that, using the developed potentials, humanization of an antibody can be described as a simple mathematical optimization problem and that the in-silico generated framework variants closely resemble native sequences in terms of predicted immunogenicity.
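The position-dependent part of such a statistical potential can be sketched as a table of log-probabilities estimated from aligned sequences. This is a simplified illustration and not the paper's scoring function: the conditional (pairwise) terms mentioned in the abstract are omitted, and the pseudocount, the ungapped toy sequences and the function names are all assumptions.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def position_log_probs(aligned_sequences, pseudocount=1.0):
    """Per-position log-probabilities of each amino acid, smoothed with a pseudocount.

    Assumes ungapped sequences of equal length that use only the 20 standard residues.
    """
    n_seqs = len(aligned_sequences)
    total = n_seqs + pseudocount * len(AMINO_ACIDS)
    table = []
    for pos in range(len(aligned_sequences[0])):
        counts = Counter(seq[pos] for seq in aligned_sequences)
        table.append(
            {aa: math.log((counts[aa] + pseudocount) / total) for aa in AMINO_ACIDS}
        )
    return table


def sequence_score(sequence, table):
    """Sum of position-specific log-probabilities; higher means closer to the reference pool."""
    return sum(table[pos][aa] for pos, aa in enumerate(sequence))


# Toy reference pool (assumption), standing in for a set of human framework sequences.
reference_pool = ["QVQL", "QVQL", "EVQL"]
potential = position_log_probs(reference_pool)
native_like = sequence_score("QVQL", potential)
foreign_like = sequence_score("DIVM", potential)
```

With a potential of this kind, humanization can be framed as choosing substitutions that raise the score, which is the optimization view the abstract describes.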
Affiliation(s)
- Daniel Seeliger
- Department of Lead Identification and Optimization Support, Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach/Riss, Germany
| |
|
25
|
Lai JS, Cheng CW, Lo A, Sung TY, Hsu WL. Lipid exposure prediction enhances the inference of rotational angles of transmembrane helices. BMC Bioinformatics 2013; 14:304. [PMID: 24112406 PMCID: PMC3854514 DOI: 10.1186/1471-2105-14-304] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Received: 05/24/2013] [Accepted: 10/01/2013] [Indexed: 11/12/2022] Open
Abstract
Background Since membrane protein structures are challenging to crystallize, computational approaches are essential for elucidating the sequence-to-structure relationships. Structural modeling of membrane proteins requires a multidimensional approach, and one critical geometric parameter is the rotational angle of transmembrane helices. Rotational angles of transmembrane helices are characterized by their folded structures and could be inferred by the hydrophobic moment; however, the folding mechanism of membrane proteins is not yet fully understood. The rotational angle of a transmembrane helix is related to the exposed surface of a transmembrane helix, since lipid exposure gives the degree of accessibility of each residue in lipid environment. To the best of our knowledge, there have been few advances in investigating whether an environment descriptor of lipid exposure could infer a geometric parameter of rotational angle. Results Here, we present an analysis of the relationship between rotational angles and lipid exposure and a support-vector-machine method, called TMexpo, for predicting both structural features from sequences. First, we observed from the development set of 89 protein chains that the lipid exposure, i.e., the relative accessible surface area (rASA) of residues in the lipid environment, generated from high-resolution protein structures could infer the rotational angles with a mean absolute angular error (MAAE) of 46.32°. More importantly, the predicted rASA from TMexpo achieved an MAAE of 51.05°, which is better than 71.47° obtained by the best of the compared hydrophobicity scales. Lastly, TMexpo outperformed the compared methods in rASA prediction on the independent test set of 21 protein chains and achieved an overall Matthews correlation coefficient, accuracy, sensitivity, specificity, and precision of 0.51, 75.26%, 81.30%, 69.15%, and 72.73%, respectively. TMexpo is publicly available at http://bio-cluster.iis.sinica.edu.tw/TMexpo.
Conclusions TMexpo can better predict rASA and rotational angles than the compared methods. When rotational angles can be accurately predicted, free modeling of transmembrane protein structures in turn may benefit from a reduced complexity in ensembles with a significantly smaller number of packing arrangements. Furthermore, sequence-based prediction of both rotational angle and lipid exposure can provide essential information when high-resolution structures are unavailable and contribute to experimental design to elucidate transmembrane protein functions.
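Two quantities named in this abstract lend themselves to a short worked sketch: the mean absolute angular error, which has to respect the 360° wrap-around, and the direction of the hydrophobic moment of a helix. The code below is an assumption-laden illustration rather than the TMexpo method: it uses the textbook 100° per residue turn of an ideal α-helix, takes per-residue hydrophobicity values as plain input, and the paper's exact definition of the rotational angle is not given in the abstract. All function names are hypothetical.

```python
import math


def angular_error(predicted_deg, observed_deg):
    """Smallest absolute difference between two angles, in degrees (0 to 180)."""
    diff = abs(predicted_deg - observed_deg) % 360.0
    return min(diff, 360.0 - diff)


def mean_absolute_angular_error(predicted, observed):
    """MAAE over paired lists of angles in degrees."""
    errors = [angular_error(p, o) for p, o in zip(predicted, observed)]
    return sum(errors) / len(errors)


def hydrophobic_moment_angle(hydrophobicities, turn_deg=100.0):
    """Direction (degrees, 0 to 360) of the hydrophobic moment of an ideal helix.

    Residue k is placed at angle k * turn_deg around the helix axis and
    weighted by its hydrophobicity value.
    """
    x = sum(h * math.cos(math.radians(k * turn_deg)) for k, h in enumerate(hydrophobicities))
    y = sum(h * math.sin(math.radians(k * turn_deg)) for k, h in enumerate(hydrophobicities))
    return math.degrees(math.atan2(y, x)) % 360.0


# Toy values (assumption): 350° versus 10° differ by 20°, not 340°.
example_maae = mean_absolute_angular_error([350.0, 90.0], [10.0, 45.0])
```

Handling the wrap-around matters for the reported errors: without it, a prediction of 350° against an observed 10° would count as a 340° miss instead of 20°.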
Affiliation(s)
- Jhih-Siang Lai
- Institute of Information Science, Academia Sinica, Taipei, Taiwan.
| |
|
26
|
Yang J, Jang R, Zhang Y, Shen HB. High-accuracy prediction of transmembrane inter-helix contacts and application to GPCR 3D structure modeling. Bioinformatics 2013; 29:2579-87. [PMID: 23946502 DOI: 10.1093/bioinformatics/btt440] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.4] [Indexed: 02/05/2023]
Abstract
MOTIVATION Residue-residue contacts across the transmembrane helices dictate the three-dimensional topology of alpha-helical membrane proteins. However, contact determination through experiments is difficult because most transmembrane proteins are hard to crystallize. RESULTS We present a novel method (MemBrain) to derive transmembrane inter-helix contacts from amino acid sequences by combining correlated mutations and multiple machine learning classifiers. Tested on 60 non-redundant polytopic proteins using a strict leave-one-out cross-validation protocol, MemBrain achieves an average accuracy of 62%, which is 12.5% higher than the current best method from the literature. When applied to 13 recently solved G protein-coupled receptors, the MemBrain contact predictions helped increase the TM-score of the I-TASSER models by 37% in the transmembrane region. The number of foldable cases (TM-score >0.5) increased by 100%, where all G protein-coupled receptor templates and homologous templates with sequence identity >30% were excluded. These results demonstrate significant progress in contact prediction and a potential for contact-driven structure modeling of transmembrane proteins. AVAILABILITY www.csbio.sjtu.edu.cn/bioinf/MemBrain/
Affiliation(s)
- Jing Yang
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China, Department of Computational Medicine and Bioinformatics and Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA
|
27
|
Li X, Zhang Z, Song J. Computational enzyme design approaches with significant biological outcomes: progress and challenges. Comput Struct Biotechnol J 2012; 2:e201209007. [PMID: 24688648 PMCID: PMC3962085 DOI: 10.5936/csbj.201209007] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Received: 07/24/2012] [Revised: 09/27/2012] [Accepted: 10/04/2012] [Indexed: 11/29/2022] Open
Abstract
Enzymes are powerful biocatalysts; however, there is still a large gap between the number of enzyme-based practical applications and the number of naturally occurring enzymes. Multiple experimental approaches have been applied to generate nearly all possible mutations of target enzymes, allowing the identification of desirable variants with improved properties that meet practical needs. Meanwhile, an increasing number of computational methods have been developed over the past few decades to assist in the modification of enzymes. With the development of bioinformatic algorithms, computational approaches can now provide more precise guidance for enzyme engineering, making it more efficient and less laborious. In this review, we summarize recent advances in method development with significant biological outcomes to provide important insights into successful computational protein designs. We also discuss the limitations and challenges of existing methods and the future directions that should improve them.
Affiliation(s)
- Xiaoman Li
- National Engineering Laboratory for Industrial Enzymes and Key Laboratory of Systems Microbial Biotechnology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, Tianjin 300308, China
- Ziding Zhang
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing 100193, China
- Jiangning Song
- National Engineering Laboratory for Industrial Enzymes and Key Laboratory of Systems Microbial Biotechnology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, Tianjin 300308, China; Department of Biochemistry and Molecular Biology and ARC Centre of Excellence in Structural and Functional Microbial Genomics, Monash University, Melbourne, VIC 3800, Australia
|
28
|
Abstract
Co-evolving positions within protein sequences have been used as spatial constraints to develop a computational approach for modeling membrane protein structures.
|
29
|
Wang H, Zhang C, Shi X, Zhang L, Zhou Y. Improving transmembrane protein consensus topology prediction using inter-helical interaction. Biochimica et Biophysica Acta - Biomembranes 2012; 1818:2679-86. [PMID: 22683598 DOI: 10.1016/j.bbamem.2012.05.030] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 01/12/2012] [Revised: 05/29/2012] [Accepted: 05/31/2012] [Indexed: 11/18/2022]
Abstract
Alpha-helical transmembrane proteins (αTMPs) represent roughly 30% of all open reading frames (ORFs) in a typical genome and are involved in many critical biological processes. Because of their special physicochemical properties, they are hard to crystallize and high-resolution structures are difficult to obtain experimentally; sequence-based topology prediction is therefore highly desirable for the study of transmembrane proteins (TMPs), in both structure prediction and function prediction. Various model-based topology prediction methods have been developed, but the accuracy of the individual predictors remains poor because of limitations in the methods or the features they use. Consensus topology prediction, which combines the strengths of the individual predictors, is thus a practical choice for high-accuracy applications. Here, based on the observation that inter-helical interactions are commonly found within transmembrane helices (TMHs) and strongly indicate their existence, we present a novel consensus topology prediction method for αTMPs, CNTOP, which incorporates four leading individual topology predictors and further improves prediction accuracy by using predicted inter-helical interactions. The method achieved 87% prediction accuracy on a benchmark dataset and 78% accuracy on a non-redundant dataset composed of polytopic αTMPs. Our method achieves higher topology accuracy than the other individual and consensus predictors; at the same time, the lengths and locations of the TMHs are predicted more accurately, with both false positives (FPs) and false negatives (FNs) decreasing dramatically. CNTOP is available at: http://ccst.jlu.edu.cn/JCSB/cntop/CNTOP.html.
Affiliation(s)
- Han Wang
- Jilin University, Changchun, China
|
30
|
Hopf TA, Colwell LJ, Sheridan R, Rost B, Sander C, Marks DS. Three-dimensional structures of membrane proteins from genomic sequencing. Cell 2012; 149:1607-21. [PMID: 22579045 DOI: 10.1016/j.cell.2012.04.012] [Citation(s) in RCA: 389] [Impact Index Per Article: 29.9] [Received: 03/09/2012] [Revised: 04/12/2012] [Accepted: 04/23/2012] [Indexed: 01/21/2023]
Abstract
We show that amino acid covariation in proteins, extracted from the evolutionary sequence record, can be used to fold transmembrane proteins. We use this technique to predict previously unknown 3D structures for 11 transmembrane proteins (with up to 14 helices) from their sequences alone. The prediction method (EVfold_membrane) applies a maximum entropy approach to infer evolutionary covariation in pairs of sequence positions within a protein family and then generates all-atom models with the derived pairwise distance constraints. We benchmark the approach with blinded de novo computation of known transmembrane protein structures from 23 families, demonstrating unprecedented accuracy of the method for large transmembrane proteins. We show how the method can predict oligomerization, functional sites, and conformational changes in transmembrane proteins. With the rapid rise in large-scale sequencing, more accurate and more comprehensive information on evolutionary constraints can be decoded from genetic variation, greatly expanding the repertoire of transmembrane proteins amenable to modeling by this method.
Affiliation(s)
- Thomas A Hopf
- Department of Systems Biology, Harvard Medical School, Boston, MA 02115, USA
|
31
|
Gulyás-Kovács A. Integrated analysis of residue coevolution and protein structure in ABC transporters. PLoS One 2012; 7:e36546. [PMID: 22590562 PMCID: PMC3348156 DOI: 10.1371/journal.pone.0036546] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Received: 02/07/2012] [Accepted: 04/06/2012] [Indexed: 12/22/2022] Open
Abstract
Intraprotein side chain contacts can couple the evolutionary process of amino acid substitution at one position to that at another. This coupling, known as residue coevolution, may vary in strength. Conserved contacts thus not only define 3-dimensional protein structure, but also indicate which residue-residue interactions are crucial to a protein's function. Therefore, prediction of strongly coevolving residue-pairs helps clarify molecular mechanisms underlying function. Previously, various coevolution detectors have been employed separately to predict these pairs purely from multiple sequence alignments, while disregarding available structural information. This study introduces an integrative framework that improves the accuracy of such predictions, relative to previous approaches, by combining multiple coevolution detectors and incorporating structural contact information. This framework is applied to the ABC-B and ABC-C transporter families, which include the drug exporter P-glycoprotein involved in multidrug resistance of cancer cells, as well as the CFTR chloride channel linked to cystic fibrosis disease. The predicted coevolving pairs are further analyzed based on conformational changes inferred from outward- and inward-facing transporter structures. The analysis suggests that some pairs coevolved to directly regulate conformational changes of the alternating-access transport mechanism, while others to stabilize rigid-body-like components of the protein structure. Moreover, some identified pairs correspond to residues previously implicated in cystic fibrosis.
Affiliation(s)
- Attila Gulyás-Kovács
- Laboratory of Cardiac/Membrane Physiology, Rockefeller University, New York, New York, United States of America.
|
32
|
Lin X, Hong T, Mu Y, Torres J. Identification of residues involved in water versus glycerol selectivity in aquaporins by differential residue pair co-evolution. Biochimica et Biophysica Acta - Biomembranes 2012; 1818:907-14. [DOI: 10.1016/j.bbamem.2011.12.017] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 06/24/2011] [Revised: 12/15/2011] [Accepted: 12/20/2011] [Indexed: 01/31/2023]
|
33
|
Computational studies of membrane proteins: models and predictions for biological understanding. Biochimica et Biophysica Acta - Biomembranes 2011; 1818:927-41. [PMID: 22051023 DOI: 10.1016/j.bbamem.2011.09.026] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.3] [Received: 06/27/2011] [Revised: 09/22/2011] [Accepted: 09/26/2011] [Indexed: 01/26/2023]
Abstract
We discuss recent progress in computational studies of membrane proteins based on physical models with parameters derived from bioinformatics analysis. We describe computational identification of membrane proteins and prediction of their topology from sequence, discovery of sequence and spatial motifs, and implications of these discoveries. The detection of evolutionary signal for understanding the substitution pattern of residues in the TM segments and for sequence alignment is also discussed. We further discuss empirical potential functions for the energetics of inserting residues in the TM domain and for interactions between TM helices or strands, and their applications in predicting lipid-facing surfaces of the TM domain. Recent progress in structure prediction of membrane proteins is also reviewed, with further discussion of the calculation of ensemble properties such as melting temperature based on a simplified state space model. Additional topics include prediction of the oligomerization state of membrane proteins, identification of the interfaces for protein-protein interactions, and design of membrane proteins. This article is part of a Special Issue entitled: Protein Folding in Membranes.
|
34
|
Yip KY, Utz L, Sitwell S, Hu X, Sidhu SS, Turk BE, Gerstein M, Kim PM. Identification of specificity determining residues in peptide recognition domains using an information theoretic approach applied to large-scale binding maps. BMC Biol 2011; 9:53. [PMID: 21835011 PMCID: PMC3224579 DOI: 10.1186/1741-7007-9-53] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Received: 07/26/2011] [Accepted: 08/11/2011] [Indexed: 01/06/2023] Open
Abstract
Background Peptide Recognition Domains (PRDs) are commonly found in signaling proteins. They mediate protein-protein interactions by recognizing and binding short motifs in their ligands. Although a great deal is known about PRDs and their interactions, prediction of PRD specificities remains largely an unsolved problem. Results We present a novel approach to identifying these Specificity Determining Residues (SDRs). Our algorithm generalizes earlier information theoretic approaches to coevolution analysis, to become applicable to this problem. It leverages the growing wealth of binding data between PRDs and large numbers of random peptides, and searches for PRD residues that exhibit strong evolutionary covariation with some positions of the statistical profiles of bound peptides. The calculations involve only information from sequences, and thus can be applied to PRDs without crystal structures. We applied the approach to PDZ, SH3 and kinase domains, and evaluated the results using both residue proximity in co-crystal structures and verified binding specificity maps from mutagenesis studies. Discussion Our predictions were found to be strongly correlated with the physical proximity of residues, demonstrating the ability of our approach to detect physical interactions of the binding partners. Some high-scoring pairs were further confirmed to affect binding specificity using previous experimental results. Combining the covariation results also allowed us to predict binding profiles with higher reliability than two other methods that do not explicitly take residue covariation into account. Conclusions The general applicability of our approach to the three different domain families demonstrated in this paper suggests its potential in predicting binding targets and assisting the exploration of binding mechanisms.
Affiliation(s)
- Kevin Y Yip
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
|
35
|
Jeon J, Nam HJ, Choi YS, Yang JS, Hwang J, Kim S. Molecular evolution of protein conformational changes revealed by a network of evolutionarily coupled residues. Mol Biol Evol 2011; 28:2675-85. [PMID: 21470969 DOI: 10.1093/molbev/msr094] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.9] [Indexed: 12/25/2022] Open
Abstract
An improved understanding of protein conformational changes has broad implications for elucidating the mechanisms of various biological processes and for the design of protein engineering experiments. Understanding rearrangements of residue interactions is a key component in the challenge of describing structural transitions. Evolutionary properties of protein sequences and structures are extensively studied; however, evolution of protein motions, especially with respect to interaction rearrangements, has yet to be explored. Here, we investigated the relationship between sequence evolution and protein conformational changes and discovered that structural transitions are encoded in amino acid sequences as coevolving residue pairs. Furthermore, we found that highly coevolving residues are clustered in the flexible regions of proteins and facilitate structural transitions by forming and disrupting their interactions cooperatively. Our results provide insight into the evolution of protein conformational changes and help to identify residues important for structural transitions.
Affiliation(s)
- Jouhyun Jeon
- Division of Molecular and Life Science, Pohang University of Science and Technology, Pohang, Korea
|
36
|
Balakrishnan S, Kamisetty H, Carbonell JG, Lee SI, Langmead CJ. Learning generative models for protein fold families. Proteins 2011; 79:1061-78. [PMID: 21268112 DOI: 10.1002/prot.22934] [Citation(s) in RCA: 219] [Impact Index Per Article: 15.6] [Received: 08/16/2010] [Revised: 10/10/2010] [Accepted: 10/26/2010] [Indexed: 11/09/2022]
Abstract
We introduce a new approach to learning statistical models from multiple sequence alignments (MSA) of proteins. Our method, called GREMLIN (Generative REgularized ModeLs of proteINs), learns an undirected probabilistic graphical model of the amino acid composition within the MSA. The resulting model encodes both the position-specific conservation statistics and the correlated mutation statistics between sequential and long-range pairs of residues. Existing techniques for learning graphical models from MSA either make strong, and often inappropriate assumptions about the conditional independencies within the MSA (e.g., Hidden Markov Models), or else use suboptimal algorithms to learn the parameters of the model. In contrast, GREMLIN makes no a priori assumptions about the conditional independencies within the MSA. We formulate and solve a convex optimization problem, thus guaranteeing that we find a globally optimal model at convergence. The resulting model is also generative, allowing for the design of new protein sequences that have the same statistical properties as those in the MSA. We perform a detailed analysis of covariation statistics on the extensively studied WW and PDZ domains and show that our method out-performs an existing algorithm for learning undirected probabilistic graphical models from MSA. We then apply our approach to 71 additional families from the PFAM database and demonstrate that the resulting models significantly out-perform Hidden Markov Models in terms of predictive accuracy.
Affiliation(s)
- Sivaraman Balakrishnan
- Language Technologies Institute, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
|
37
|
Csanády L, Vergani P, Gulyás-Kovács A, Gadsby DC. Electrophysiological, biochemical, and bioinformatic methods for studying CFTR channel gating and its regulation. Methods Mol Biol 2011; 741:443-69. [PMID: 21594801 DOI: 10.1007/978-1-61779-117-8_28] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 05/30/2023]
Abstract
CFTR is the only member of the ABC (ATP-binding cassette) protein superfamily known to function as an ion channel. Most other ABC proteins are ATP-driven transporters, in which a cycle of ATP binding and hydrolysis, at intracellular nucleotide binding domains (NBDs), powers uphill substrate translocation across the membrane. In CFTR, this same ATP-driven cycle opens and closes a transmembrane pore through which chloride ions flow rapidly down their electrochemical gradient. Detailed analysis of the pattern of gating of CFTR channels thus offers the opportunity to learn about mechanisms of function not only of CFTR channels but also of their ABC transporter ancestors. In addition, CFTR channel gating is subject to complex regulation by kinase-mediated phosphorylation at multiple consensus sites in a cytoplasmic regulatory domain that is unique to CFTR. Here we offer a practical guide to extract useful information about the mechanisms that control opening and closing of CFTR channels: on how to plan (including information obtained from analysis of multiple sequence alignments), carry out, and analyze electrophysiological and biochemical experiments, as well as on how to circumvent potential pitfalls.
Affiliation(s)
- László Csanády
- Department of Medical Biochemistry, Semmelweis University, Budapest, Hungary.
|
38
|
van Dijk ADJ, van Ham RCHJ. Conserved and variable correlated mutations in the plant MADS protein network. BMC Genomics 2010; 11:607. [PMID: 20979667 PMCID: PMC3017862 DOI: 10.1186/1471-2164-11-607] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 03/18/2010] [Accepted: 10/28/2010] [Indexed: 11/29/2022] Open
Abstract
BACKGROUND Plant MADS domain proteins are involved in a variety of developmental processes for which their ability to form various interactions is a key requisite. However, not much is known about the structure of these proteins or their complexes, whereas such knowledge would be valuable for a better understanding of their function. Here, we analyze those proteins and the complexes they form using a correlated mutation approach in combination with available structural, bioinformatics and experimental data. RESULTS Correlated mutations are affected by several types of noise, which is difficult to disentangle from the real signal. In our analysis of the MADS domain proteins, we apply for the first time a correlated mutation analysis to a family of interacting proteins. This provides a unique way to investigate the amount of signal that is present in correlated mutations because it allows direct comparison of mutations in various family members and assessing their conservation. We show that correlated mutations in general are conserved within the various family members, and if not, the variability at the respective positions is less in the proteins in which the correlated mutation does not occur. Also, intermolecular correlated mutation signals for interacting pairs of proteins display clear overlap with other bioinformatics data, which is not the case for non-interacting protein pairs, an observation which validates the intermolecular correlated mutations. Having validated the correlated mutation results, we apply them to infer the structural organization of the MADS domain proteins. CONCLUSION Our analysis enables understanding of the structural organization of the MADS domain proteins, including support for predicted helices based on correlated mutation patterns, and evidence for a specific interaction site in those proteins.
Affiliation(s)
- Aalt DJ van Dijk
- Applied Bioinformatics, PRI, Wageningen UR, Droevendaalsesteeg 1, 6708 PB Wageningen, The Netherlands
- Roeland CHJ van Ham
- Applied Bioinformatics, PRI, Wageningen UR, Droevendaalsesteeg 1, 6708 PB Wageningen, The Netherlands
|
39
|
Kowarsch A, Fuchs A, Frishman D, Pagel P. Correlated mutations: a hallmark of phenotypic amino acid substitutions. PLoS Comput Biol 2010; 6. [PMID: 20862353 PMCID: PMC2940720 DOI: 10.1371/journal.pcbi.1000923] [Citation(s) in RCA: 53] [Impact Index Per Article: 3.5] [Received: 09/25/2009] [Accepted: 08/09/2010] [Indexed: 11/18/2022] Open
Abstract
Point mutations resulting in the substitution of a single amino acid can cause severe functional consequences, but can also be completely harmless. Understanding what determines the phenotypical impact is important both for planning targeted mutation experiments in the laboratory and for analyzing naturally occurring mutations found in patients. Common wisdom suggests using the extent of evolutionary conservation of a residue or a sequence motif as an indicator of its functional importance and thus vulnerability in case of mutation. In this work, we put forward the hypothesis that in addition to conservation, co-evolution of residues in a protein influences the likelihood of a residue to be functionally important and thus associated with disease. While the basic idea of a relation between co-evolution and functional sites has been explored before, we have conducted the first systematic and comprehensive analysis of point mutations causing disease in humans with respect to correlated mutations. We included 14,211 distinct positions with known disease-causing point mutations in 1,153 human proteins in our analysis. Our data show that (1) correlated positions are significantly more likely to be disease-associated than expected by chance, and that (2) this signal cannot be explained by conservation patterns of individual sequence positions. Although correlated residues have primarily been used to predict contact sites, our data are in agreement with previous observations that (3) many such correlations do not relate to physical contacts between amino acid residues. Access to our analysis results is provided at http://webclu.bio.wzw.tum.de/~pagel/supplements/correlated-positions/.
Point mutations (i.e., changes of a single sequence element) can have a severe impact on protein function. Many diseases are caused by such minute defects. On the other hand, the majority of such mutations do not lead to noticeable effects. Although previous research has revealed important aspects that influence or predict the chance of a mutation to cause disease, much remains to be learned before we fully understand this complex problem. In our work, we use the observation that sometimes certain positions in a protein mutate in an apparently correlated fashion and analyze this correlation with respect to mutation vulnerability. Our results show that positions exhibiting evolutionary correlation are significantly more likely to be vulnerable to mutation than average positions. On one hand, our data further support the concept that correlated positions are associated not only with protein contacts but also with functional sites and/or disease positions (as introduced by others). On the other hand, this could be useful to further improve the understanding and prediction of the consequences of mutations. Our work is the first to attempt a large-scale quantitation of this relationship.
Affiliation(s)
- Andreas Kowarsch
- Lehrstuhl für Genomorientierte Bioinformatik, Technische Universität München, Wissenschaftszentrum Weihenstephan, Freising, Germany
- Institut für Bioinformatik und Systembiologie/MIPS, Helmholtz Zentrum München – Deutsches Forschungszentrum für Gesundheit und Umwelt, Neuherberg, Germany
- Angelika Fuchs
- Lehrstuhl für Genomorientierte Bioinformatik, Technische Universität München, Wissenschaftszentrum Weihenstephan, Freising, Germany
- Dmitrij Frishman
- Lehrstuhl für Genomorientierte Bioinformatik, Technische Universität München, Wissenschaftszentrum Weihenstephan, Freising, Germany
- Institut für Bioinformatik und Systembiologie/MIPS, Helmholtz Zentrum München – Deutsches Forschungszentrum für Gesundheit und Umwelt, Neuherberg, Germany
- Philipp Pagel
- Lehrstuhl für Genomorientierte Bioinformatik, Technische Universität München, Wissenschaftszentrum Weihenstephan, Freising, Germany
- Institut für Bioinformatik und Systembiologie/MIPS, Helmholtz Zentrum München – Deutsches Forschungszentrum für Gesundheit und Umwelt, Neuherberg, Germany
|
40
|
Fuchs A, Frishman D. Structural comparison and classification of alpha-helical transmembrane domains based on helix interaction patterns. Proteins 2010; 78:2587-99. [PMID: 20552684 DOI: 10.1002/prot.22768] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/06/2022]
Abstract
Structural classification of membrane proteins is still in its infancy due to the relative paucity of available three-dimensional structures compared with soluble proteins. However, recent technological advances in protein structure determination have led to a significant increase in experimentally known membrane protein folds, warranting exploration of the structural universe of membrane proteins. Here, a new and completely membrane protein specific structural classification system is introduced that classifies alpha-helical membrane proteins according to common helix architectures. Each membrane protein is represented by a helix interaction graph depicting transmembrane helices with their pairwise interactions resulting from individual residue contacts. Subsequently, proteins are clustered according to similarities among these helix interaction graphs using a newly developed structural similarity score called HISS. As HISS scores explicitly disregard structural properties of loop regions, they are more suitable to capture conserved transmembrane helix bundle architectures than other structural similarity scores. Importantly, we are able to show that a classification approach based on helix interaction similarity closely resembles conventional structural classification databases such as SCOP and CATH implying that helix interactions are one of the major determinants of alpha-helical membrane protein folds. Furthermore, the classification of all currently available membrane protein structures into 20 recurrent helix architectures and 15 singleton proteins demonstrates not only an impressive variability of membrane helix bundles but also the conservation of common helix interaction patterns among proteins with distinctly different sequences.
Affiliation(s)
- Angelika Fuchs
- Department of Genome Oriented Bioinformatics, Technische Universität München, Wissenschaftszentrum Weihenstephan, Freising 85354, Germany
|
41
|
Hubert P, Sawma P, Duneau JP, Khao J, Hénin J, Bagnard D, Sturgis J. Single-spanning transmembrane domains in cell growth and cell-cell interactions: More than meets the eye? Cell Adh Migr 2010; 4:313-24. [PMID: 20543559 PMCID: PMC2900628 DOI: 10.4161/cam.4.2.12430] [Citation(s) in RCA: 67] [Impact Index Per Article: 4.5] [Received: 03/18/2010] [Accepted: 05/20/2010] [Indexed: 01/28/2023] Open
Abstract
As a whole, integral membrane proteins represent about one third of sequenced genomes, and more than 50% of currently available drugs target membrane proteins, often cell surface receptors. Some membrane protein classes with a defined number of transmembrane (TM) helices, such as G protein-coupled receptors possessing 7 TM segments, are receiving much attention because of their great functional and pharmacological importance. Although they represent roughly half of all membrane proteins, bitopic proteins (with only 1 TM helix) have so far been less well characterized, even though they include many essential families of receptors, such as adhesion molecules and receptor tyrosine kinases, many of which are excellent targets for biopharmaceuticals (peptides, antibodies, etc.). A growing body of evidence suggests a major role for interactions between TM domains of these receptors in signaling, through homo- and heteromeric associations, conformational changes, assembly of signaling platforms, etc. Significantly, mutations within single domains are frequent in human disease, such as cancer or developmental disorders. This review attempts to give an overview of current knowledge about these interactions, from structural data to therapeutic perspectives, focusing on bitopic proteins involved in cell signaling.
Affiliation(s)
- Pierre Hubert
- LISM UPR 9027, CNRS-Aix-Marseille University, Marseille, France.
|
42
|
Xu Y, Tillier ERM. Regional covariation and its application for predicting protein contact patches. Proteins 2010; 78:548-58. [PMID: 19768681 DOI: 10.1002/prot.22576] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/11/2022]
Abstract
Correlated mutation analysis (CMA) is an effective approach for predicting functional and structural residue interactions from multiple sequence alignments (MSAs) of proteins. As nearby residues may also play a role in a given functional interaction, we were interested in seeing whether covarying sites were clustered, and whether this could be used to enhance the predictive power of CMA. A large-scale search for coevolving regions within protein domains revealed that if two sites in a MSA covary, then neighboring sites in the alignment also typically covary, resulting in clusters of covarying residues. The program PatchD(http://www.uhnres.utoronto.ca/labs/tillier/) was developed to measure the covariation between disconnected sequence clusters to reveal patch covariation. Patches that exhibit strong covariation identify multiple residues that are generally nearby in the protein structure, suggesting that the detection of covarying patches can be used in conjunction with traditional CMA approaches to reveal functional interaction partners.
Affiliation(s)
- Yongbai Xu
- Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada
|
43
|
Ashkenazy H, Kliger Y. Reducing phylogenetic bias in correlated mutation analysis. Protein Eng Des Sel 2010; 23:321-6. [PMID: 20067922 DOI: 10.1093/protein/gzp078] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.5] [Indexed: 11/14/2022] Open
Abstract
Correlated mutation analysis (CMA) is a sequence-based approach for ab initio protein contact map prediction. The basis of this approach is the observed correlation between mutations in interacting amino acid residues. These correlations are often estimated by calculating either Pearson's correlation coefficient (PCC) or the mutual information (MI) between columns in a multiple sequence alignment (MSA) of the protein of interest and its homologs. A major challenge of CMA is to filter out the background noise originating from phylogenetic relatedness between sequences included in the MSA. Recently, a procedure to reduce this background noise was demonstrated to improve an MI-based predictor. Herein, we tested whether a similar approach can also improve the performance of the classical PCC-based method. Indeed, performance improvements were achieved for all four major SCOP classes. Furthermore, the results reveal that the improved PCC-based method is superior to MI-based methods for proteins having MSAs of up to 100 sequences.
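The mutual information between two MSA columns mentioned in this abstract can be illustrated with a minimal sketch. It computes plain, uncorrected MI in bits for a toy alignment; it does not implement the phylogenetic noise reduction the authors evaluate, and the function names `column` and `mutual_information` are hypothetical helpers chosen for illustration.

```python
import math
from collections import Counter


def column(msa, j):
    """Return column j of an alignment given as equal-length strings."""
    return [seq[j] for seq in msa]


def mutual_information(col_a, col_b):
    """Uncorrected mutual information (in bits) between two MSA columns.

    Assumption: frequencies are raw counts with no sequence weighting,
    pseudocounts, or phylogenetic background correction.
    """
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), c in count_ab.items():
        # p(a,b) / (p(a) * p(b)) simplifies to c * n / (count_a * count_b)
        mi += (c / n) * math.log2(c * n / (count_a[a] * count_b[b]))
    return mi


if __name__ == "__main__":
    toy_msa = ["AK", "AK", "DE", "DE"]  # two perfectly covarying columns
    print(mutual_information(column(toy_msa, 0), column(toy_msa, 1)))
```

Scoring every column pair this way and ranking the pairs gives a raw CMA-style contact ranking; in practice the raw scores are dominated by the background signal the abstract describes, which is why a correction step is applied on top.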
|
44
|
Garceau V, Smith J, Paton IR, Davey M, Fares MA, Sester DP, Burt DW, Hume DA. Pivotal Advance: Avian colony-stimulating factor 1 (CSF-1), interleukin-34 (IL-34), and CSF-1 receptor genes and gene products. J Leukoc Biol 2010; 87:753-64. [PMID: 20051473 DOI: 10.1189/jlb.0909624] [Citation(s) in RCA: 142] [Impact Index Per Article: 9.5] [Indexed: 12/12/2022] Open
Abstract
Macrophages are involved in many aspects of development, host defense, pathology, and homeostasis. Their normal differentiation, proliferation, and survival are controlled by CSF-1 via the activation of the CSF1R. A recently discovered cytokine, IL-34, was shown to bind the same receptor in humans. Chicken is a widely used model organism in developmental biology, but the factors that control avian myelopoiesis have not been identified previously. The CSF-1, IL-34, and CSF1R genes in chicken and zebra finch were identified from respective genomic/cDNA sequence resources. Comparative analysis of the avian CSF1R loci revealed likely orthologs of mammalian macrophage-specific promoters and enhancers, and the CSF1R gene is expressed in the developing chick embryo in a pattern consistent with macrophage-specific expression. Chicken CSF-1 and IL-34 were expressed in HEK293 cells and shown to elicit macrophage growth from chicken BM cells in culture. Comparative sequence and co-evolution analysis across all vertebrates suggests that the two ligands interact with distinct regions of the CSF1R. These studies demonstrate that there are two separate ligands for a functional CSF1R across all vertebrates.
Affiliation(s)
- Valerie Garceau
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Roslin, United Kingdom
|
45
|
Jiang X, Fares MA. Identifying coevolutionary patterns in human leukocyte antigen (HLA) molecules. Evolution 2009; 64:1429-45. [PMID: 19930454 DOI: 10.1111/j.1558-5646.2009.00903.x] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 12/18/2022]
Abstract
The antigenic peptide, the major histocompatibility complex molecule (MHC; also called human leukocyte antigen, HLA), the coreceptor CD8 or CD4, and the T-cell receptor (TCR) function as a complex to initiate effector mechanisms of the immune system. The tight functional and physical interaction among these molecules may have involved strong coevolutionary links among domains within and between proteins. Despite the importance of unraveling such dependencies to understand the arms race of host-pathogen interaction, no previous studies have aimed at achieving such an objective. Here, we perform an exhaustive coevolution analysis and show that such dependencies are indeed strongly shaping the evolution and probably the function of these molecules. We identify intramolecular coevolution in HLA class I and II at domains important for their immune activity. Most of the amino acid sites identified as coevolving in HLAI have also been detected to undergo positive Darwinian selection, which highlights their adaptive value. We also identify coevolution among antigen-binding pockets (P1-P9) and between these pockets and TCR-binding sites. In contrast to HLAI, coevolution is weaker in HLAII. Our results support the view that such coevolutionary patterns are due to selective pressures of host-pathogen coevolution and cooperative binding of TCRs, antigenic peptides, and CD8/CD4 to HLAI and HLAII.
Affiliation(s)
- Xiaowei Jiang
- Evolutionary Genetics and Bioinformatics Laboratory, Department of Genetics, Smurfit Institute of Genetics, University of Dublin, Trinity College Dublin, Dublin 2, Ireland.
|
46
|
Michino M, Brooks CL. Predicting structurally conserved contacts for homologous proteins using sequence conservation filters. Proteins 2009; 77:448-53. [PMID: 19475704 PMCID: PMC2740814 DOI: 10.1002/prot.22456] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 11/10/2022]
Abstract
The prediction of intramolecular contacts has a useful application in predicting the three-dimensional structures of proteins. The accuracy of the template-based contact prediction methods depends on the quality of the template structures. To reduce the false positive predictions associated with using the entire set of template-derived contacts, we develop selection filters that use sequence conservation information to predict subsets of contacts more likely to be structurally conserved between the template and the target. The method is developed specifically for protein families with few available templates such as the G protein-coupled receptor (GPCR) family. It is validated on a test set of 342 template-target pairs from three protein families, and applied to one template-target pair from the GPCR family. We find that the filter selection method increases the accuracy of contact prediction with sufficient coverage for structure prediction.
Affiliation(s)
- Mayako Michino
- Department of Molecular Biology, The Scripps Research Institute, 10550 N. Torrey Pines Rd, La Jolla, CA 92037
- Charles L. Brooks
- Department of Chemistry and Biophysics Program, University of Michigan, 930 N University Ave, Ann Arbor, MI 48109
|
47
|
Jeon J, Yang JS, Kim S. Integration of evolutionary features for the identification of functionally important residues in major facilitator superfamily transporters. PLoS Comput Biol 2009; 5:e1000522. [PMID: 19798434 PMCID: PMC2739438 DOI: 10.1371/journal.pcbi.1000522] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Received: 03/10/2009] [Accepted: 08/27/2009] [Indexed: 11/18/2022] Open
Abstract
The identification of functionally important residues is an important challenge for understanding the molecular mechanisms of proteins. Membrane protein transporters operate two-state allosteric conformational changes using functionally important cooperative residues that mediate long-range communication from the substrate binding site to the translocation pathway. In this study, we identified functionally important cooperative residues of membrane protein transporters by integrating sequence conservation and co-evolutionary information. A newly derived evolutionary feature, the co-evolutionary coupling number, was introduced to measure the connectivity of co-evolving residue pairs and was integrated with the sequence conservation score. We tested this method on three Major Facilitator Superfamily (MFS) transporters, LacY, GlpT, and EmrD. MFS transporters are an important family of membrane protein transporters, which utilize diverse substrates, catalyze different modes of transport using unique combinations of functional residues, and have enough characterized functional residues to validate the performance of our method. We found that the conserved cores of evolutionarily coupled residues are involved in specific substrate recognition and translocation of MFS transporters. Furthermore, a subset of the residues forms an interaction network connecting functional sites in the protein structure. We also confirmed that our method is effective on other membrane protein transporters. Our results provide insight into the location of functional residues important for the molecular mechanisms of membrane protein transporters.
Major Facilitator Superfamily (MFS) transporters are one of the largest families of membrane protein transporters and are ubiquitous to all three kingdoms of life. Structural studies of MFS transporters have revealed that the members of this superfamily share structural homology; however, due to weak sequence similarity, their structural similarity has only been found after structural determination. Even after the structures were solved, painstaking efforts were needed to detect functionally important residues. The identification of functionally important cooperative residues from sequences may provide an alternative way to understanding the function of this important class of proteins. Here, we show that it is possible to identify functionally important residues of MFS transporters by integrating two different evolutionary features, sequence conservation and co-evolutionary information. Our results suggest that the conserved cores of evolutionarily coupled residues are involved in specific substrate recognition and translocation of membrane protein transporters. Also, a subset of the identified residues comprises an interaction network connecting functional sites in the protein structure. The ability to identify functional residues from protein sequences may be helpful for locating potential mutagenesis targets in mechanistic studies of membrane protein transporters.
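The idea of a co-evolutionary coupling number integrated with a conservation score can be sketched roughly as follows. This is only an illustration under stated assumptions: the coupling number is taken to be the number of co-evolving pairs a residue takes part in, and the integration step is assumed to be a simple product of the normalised coupling number and the conservation score. The abstract does not give the authors' actual formula, and the function names are hypothetical.

```python
from collections import Counter


def coupling_numbers(coevolving_pairs):
    """Count, for each residue, how many co-evolving pairs it belongs to."""
    counts = Counter()
    for i, j in coevolving_pairs:
        counts[i] += 1
        counts[j] += 1
    return counts


def rank_residues(coevolving_pairs, conservation):
    """Rank residues by conservation x normalised coupling number.

    Assumption: the product is a placeholder for the integration step,
    which the abstract does not specify.
    """
    counts = coupling_numbers(coevolving_pairs)
    max_count = max(counts.values(), default=0)
    scores = {}
    for residue, cons in conservation.items():
        normalised = counts[residue] / max_count if max_count else 0.0
        scores[residue] = cons * normalised
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    pairs = [(1, 2), (1, 3), (1, 4), (2, 3)]          # toy co-evolving pairs
    conservation = {1: 0.9, 2: 0.5, 3: 0.8, 4: 0.2}   # toy scores in [0, 1]
    print(rank_residues(pairs, conservation))
```

Residues that are both well conserved and highly connected in the co-evolution network rise to the top of the ranking, which matches the "conserved cores of evolutionarily coupled residues" the abstract refers to.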
Affiliation(s)
- Jouhyun Jeon
- Division of Molecular and Life Science, Pohang University of Science and Technology, Pohang, Korea
|
48
|
Rose A, Lorenzen S, Goede A, Gruening B, Hildebrand PW. RHYTHM--a server to predict the orientation of transmembrane helices in channels and membrane-coils. Nucleic Acids Res 2009; 37:W575-80. [PMID: 19465378 PMCID: PMC2703963 DOI: 10.1093/nar/gkp418] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.6] [Indexed: 11/13/2022] Open
Abstract
RHYTHM is a web server that predicts buried versus exposed residues of helical membrane proteins. Starting from a given protein sequence, secondary and tertiary structure information is calculated by RHYTHM within only a few seconds. The prediction applies structural information from a growing database of precalculated packing files and evolutionary information from sequence patterns conserved in a representative dataset of membrane proteins ('Pfam-domains'). The program uses two types of position-specific matrices to account for the different geometries of packing in channels and transporters ('channels') or other membrane proteins ('membrane-coils'). The output provides information on the secondary structure and topology of the protein and specifically on the contact type of each residue and its conservation. This information can be downloaded as a graphical file for illustration, a text file for analysis and statistics, and a PyMOL file for modeling purposes. The server can be freely accessed at http://proteinformatics.de/rhythm.
Affiliation(s)
- Alexander Rose
- Institute for Medical Physics and Biophysics, Charité, University Medicine Berlin, Ziegelstrasse 5-9, 10098 Berlin, Germany
|
49
|
Samsonov SA, Teyra J, Anders G, Pisabarro MT. Analysis of the impact of solvent on contacts prediction in proteins. BMC Struct Biol 2009; 9:22. [PMID: 19368710 PMCID: PMC2676287 DOI: 10.1186/1472-6807-9-22] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Received: 08/15/2008] [Accepted: 04/15/2009] [Indexed: 11/10/2022]
Abstract
BACKGROUND The correlated mutations concept is based on the assumption that interacting protein residues coevolve, so that a mutation in one of the interacting counterparts is compensated by a mutation in the other. Approaches based on this concept have been widely used for protein contact prediction since the 1990s. Previously, we have shown that water-mediated interactions play an important role in protein interfaces. We have observed that current "dry" correlated mutations approaches might not properly predict certain interactions in protein interfaces because they are water-mediated. RESULTS The goal of this study was to analyze the impact of including solvent in the concept of correlated mutations. For this purpose we use linear combinations of the predictions obtained by applying two different similarity matrices: a standard "dry" similarity matrix (DRY) and a "wet" similarity matrix (WET) derived from all water-mediated protein interfacial interactions in the PDB. We analyze two datasets containing 50 domains and 10 domain pairs from PFAM and compare the results obtained by using a combination of both matrices. We find that for both intra- and interdomain contact predictions, combining a "wet" and a "dry" similarity matrix improves the predictions in comparison to the "dry" one alone. CONCLUSION Despite the complexity of establishing its general applicability, our analysis suggests that taking water into account may improve the contact predictions obtained by correlated mutations approaches.
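The linear combination of DRY-based and WET-based predictions described in this abstract can be sketched as below. This is an illustrative sketch only: the per-pair scores, the single mixing weight, and the function names `combine_scores` and `top_contacts` are assumptions, since the abstract does not state how the combination was parameterised.

```python
def combine_scores(dry, wet, weight):
    """Linearly combine per-pair contact scores from two predictors.

    dry, wet: dicts mapping residue pairs (i, j) to correlation scores.
    weight:   hypothetical mixing weight for the WET scores in [0, 1];
              weight = 0 reproduces the DRY-only prediction.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    pairs = set(dry) | set(wet)
    return {
        pair: (1.0 - weight) * dry.get(pair, 0.0) + weight * wet.get(pair, 0.0)
        for pair in pairs
    }


def top_contacts(scores, k):
    """Return the k highest-scoring residue pairs as predicted contacts."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


if __name__ == "__main__":
    dry = {(1, 5): 0.8, (2, 9): 0.4, (3, 7): 0.1}  # toy DRY scores
    wet = {(1, 5): 0.2, (2, 9): 0.9, (3, 7): 0.3}  # toy WET scores
    print(top_contacts(combine_scores(dry, wet, 0.5), 1))
```

Scanning the weight from 0 to 1 and comparing the predicted contacts against known structures is one straightforward way to check whether adding the "wet" term helps for a given dataset.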
|
50
|
Stacklies W, Vega MC, Wilmanns M, Gräter F. Mechanical network in titin immunoglobulin from force distribution analysis. PLoS Comput Biol 2009; 5:e1000306. [PMID: 19282960 PMCID: PMC2643529 DOI: 10.1371/journal.pcbi.1000306] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.4] [Received: 10/07/2008] [Accepted: 01/27/2009] [Indexed: 11/18/2022] Open
Abstract
The role of mechanical force in cellular processes is increasingly revealed by single molecule experiments and simulations of force-induced transitions in proteins. How the applied force propagates within proteins determines their mechanical behavior yet remains largely unknown. We present a new method based on molecular dynamics simulations to disclose the distribution of strain in protein structures, here for the newly determined high-resolution crystal structure of I27, a titin immunoglobulin (IG) domain. We obtain a sparse, spatially connected, and highly anisotropic mechanical network. This allows us to detect load-bearing motifs composed of interstrand hydrogen bonds and hydrophobic core interactions, including parts distal to the site to which force was applied. The role of the force distribution pattern for mechanical stability is tested by in silico unfolding of I27 mutants. We then compare the observed force pattern to the sparse network of coevolved residues found in this family. We find a remarkable overlap, suggesting the force distribution to reflect constraints for the evolutionary design of mechanical resistance in the IG family. The force distribution analysis provides a molecular interpretation of coevolution and opens the road to the study of the mechanism of signal propagation in proteins in general.
Affiliation(s)
- Wolfram Stacklies
- CAS-MPG Partner Institute for Computational Biology (PICB), Shanghai, People's Republic of China
- M. Cristina Vega
- Institut de Biologia Molecular de Barcelona (IBMB-CSIC) and Institute for Research in Biomedicine (IRB), Barcelona, Spain
- Frauke Gräter
- CAS-MPG Partner Institute for Computational Biology (PICB), Shanghai, People's Republic of China
- Bioquant BQ0031, Universität Heidelberg, Heidelberg, Germany
- Max-Planck Institute for Metals Research, Stuttgart, Germany
|