1
Redl I, Fisicaro C, Dutton O, Hoffmann F, Henderson L, Owens BJ, Heberling M, Paci E, Tamiola K. ADOPT: intrinsic protein disorder prediction through deep bidirectional transformers. NAR Genom Bioinform 2023; 5:lqad041. [PMID: 37138579 PMCID: PMC10150328 DOI: 10.1093/nargab/lqad041] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 08/10/2022] [Revised: 02/07/2023] [Accepted: 04/17/2023] [Indexed: 05/05/2023] Open
Abstract
Intrinsically disordered proteins (IDPs) are important for a broad range of biological functions and are involved in many diseases. An understanding of intrinsic disorder is key to develop compounds that target IDPs. Experimental characterization of IDPs is hindered by the very fact that they are highly dynamic. Computational methods that predict disorder from the amino acid sequence have been proposed. Here, we present ADOPT (Attention DisOrder PredicTor), a new predictor of protein disorder. ADOPT is composed of a self-supervised encoder and a supervised disorder predictor. The former is based on a deep bidirectional transformer, which extracts dense residue-level representations from Facebook's Evolutionary Scale Modeling library. The latter uses a database of nuclear magnetic resonance chemical shifts, constructed to ensure balanced amounts of disordered and ordered residues, as a training and a test dataset for protein disorder. ADOPT predicts whether a protein or a specific region is disordered with better performance than the best existing predictors and faster than most other proposed methods (a few seconds per sequence). We identify the features that are relevant for the prediction performance and show that good performance can already be gained with <100 features. ADOPT is available as a stand-alone package at https://github.com/PeptoneLtd/ADOPT and as a web server at https://adopt.peptone.io/.
Affiliation(s)
- Istvan Redl
  - Peptone Ltd, 370 Grays Inn Road, London WC1X 8BB, UK
- Oliver Dutton
  - Peptone Ltd, 370 Grays Inn Road, London WC1X 8BB, UK
- Falk Hoffmann
  - Peptone Ltd, 370 Grays Inn Road, London WC1X 8BB, UK
- Emanuele Paci
  - Peptone Ltd, 370 Grays Inn Road, London WC1X 8BB, UK
  - Department of Physics and Astronomy ‘Augusto Righi’, University of Bologna, 40127 Bologna, Italy
- Kamil Tamiola
  - To whom correspondence should be addressed. Tel: +41 79 609 7333;
2
Han B, Ren C, Wang W, Li J, Gong X. Computational Prediction of Protein Intrinsically Disordered Region Related Interactions and Functions. Genes (Basel) 2023; 14:432. [PMID: 36833360 PMCID: PMC9956190 DOI: 10.3390/genes14020432] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/31/2022] [Revised: 02/02/2023] [Accepted: 02/05/2023] [Indexed: 02/11/2023] Open
Abstract
Intrinsically Disordered Proteins (IDPs) and Regions (IDRs) exist widely. Although without well-defined structures, they participate in many important biological processes. In addition, they are also widely related to human diseases and have become potential targets in drug discovery. However, there is a big gap between the experimental annotations related to IDPs/IDRs and their actual number. In recent decades, the computational methods related to IDPs/IDRs have been developed vigorously, including predicting IDPs/IDRs, the binding modes of IDPs/IDRs, the binding sites of IDPs/IDRs, and the molecular functions of IDPs/IDRs according to different tasks. In view of the correlation between these predictors, we have reviewed these prediction methods uniformly for the first time, summarized their computational methods and predictive performance, and discussed some problems and perspectives.
Affiliation(s)
- Bingqing Han
  - Mathematical Intelligence Application Lab, Institute for Mathematical Sciences, Renmin University of China, Beijing 100872, China
- Chongjiao Ren
  - Mathematical Intelligence Application Lab, Institute for Mathematical Sciences, Renmin University of China, Beijing 100872, China
- Wenda Wang
  - Mathematical Intelligence Application Lab, Institute for Mathematical Sciences, Renmin University of China, Beijing 100872, China
- Jiashan Li
  - Mathematical Intelligence Application Lab, Institute for Mathematical Sciences, Renmin University of China, Beijing 100872, China
- Xinqi Gong
  - Mathematical Intelligence Application Lab, Institute for Mathematical Sciences, Renmin University of China, Beijing 100872, China
  - Beijing Academy of Intelligence, Beijing 100083, China
3
Zhao B, Kurgan L. Deep learning in prediction of intrinsic disorder in proteins. Comput Struct Biotechnol J 2022; 20:1286-1294. [PMID: 35356546 PMCID: PMC8927795 DOI: 10.1016/j.csbj.2022.03.003] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 01/10/2022] [Revised: 03/04/2022] [Accepted: 03/04/2022] [Indexed: 12/12/2022] Open
Abstract
Intrinsic disorder prediction is an active area that has developed over 100 predictors. We identify and investigate a recent trend towards the development of deep neural network (DNN)-based methods. The first DNN-based method was released in 2013 and since 2019 deep learners account for majority of the new disorder predictors. We find that the 13 currently available DNN-based predictors are diverse in their topologies, sizes of their networks and the inputs that they utilize. We empirically show that the deep learners are statistically more accurate than other types of disorder predictors using the blind test dataset from the recent community assessment of intrinsic disorder predictions (CAID). We also identify several well-rounded DNN-based predictors that are accurate, fast and/or conveniently available. The popularity, favorable predictive performance and architectural flexibility suggest that deep networks are likely to fuel the development of future disordered predictors. Novel hybrid designs of deep networks could be used to adequately accommodate for diversity of types and flavors of intrinsic disorder. We also discuss scarcity of the DNN-based methods for the prediction of disordered binding regions and the need to develop more accurate methods for this prediction.
Affiliation(s)
- Bi Zhao
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
4
Abstract
INTRODUCTION Intrinsic disorder prediction field develops, assesses, and deploys computational predictors of disorder in protein sequences and constructs and disseminates databases of these predictions. Over 40 years of research resulted in the release of numerous resources. AREAS COVERED We identify and briefly summarize the most comprehensive to date collection of over 100 disorder predictors. We focus on their predictive models, availability and predictive performance. We categorize and study them from a historical point of view to highlight informative trends. EXPERT OPINION We find a consistent trend of improvements in predictive quality as newer and more advanced predictors are developed. The original focus on machine learning methods has shifted to meta-predictors in early 2010s, followed by a recent transition to deep learning. The use of deep learners will continue in foreseeable future given recent and convincing success of these methods. Moreover, a broad range of resources that facilitate convenient collection of accurate disorder predictions is available to users. They include web servers and standalone programs for disorder prediction, servers that combine prediction of disorder and disorder functions, and large databases of pre-computed predictions. We also point to the need to address the shortage of accurate methods that predict disordered binding regions.
Affiliation(s)
- Bi Zhao
  - Department of Computer Science, Virginia Commonwealth University, Richmond, Virginia, USA
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, Richmond, Virginia, USA
5
Using a low correlation high orthogonality feature set and machine learning methods to identify plant pentatricopeptide repeat coding gene/protein. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2020.02.079] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/22/2022]
6
Zimmermann MT, Williams MM, Klee EW, Lomberk GA, Urrutia R. Modeling post-translational modifications and cancer-associated mutations that impact the heterochromatin protein 1α-importin α heterodimers. Proteins 2019; 87:904-916. [PMID: 31152607 PMCID: PMC6790107 DOI: 10.1002/prot.25752] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 03/26/2019] [Accepted: 05/27/2019] [Indexed: 12/27/2022]
Abstract
Heterochromatin protein 1α (HP1α) is a protein that mediates cancer-associated processes in the cell nucleus. Proteomic experiments, reported here, demonstrate that HP1α complexes with importin α (IMPα), a protein necessary for its nuclear transport. This data is congruent with Simple Linear Motif (SLiM) analyses that identify an IMPα-binding motif within the linker that joins the two globular domains of this protein. Using molecular modeling and dynamics simulations, we develop a model of the IMPα-HP1α complex and investigate the impact of phosphorylation and genomic variants on their interaction. We demonstrate that phosphorylation of the HP1α linker likely regulates its association with IMPα, which has implications for HP1α access to the nucleus, where it functions. Cancer-associated genomic variants do not abolish the interaction of HP1α but instead lead to rearrangements where the variant proteins maintain interaction with IMPα, but with less specificity. Combined, this new mechanistic insight bears biochemical, cell biological, and biomedical relevance.
Affiliation(s)
- Michael T. Zimmermann
  - Bioinformatics Research and Development Laboratory, and Precision Medicine Simulation Unit, Genomic Science and Precision Medicine Center (GSPMC), Medical College of Wisconsin, Milwaukee, Wisconsin
  - Clinical and Translational Sciences Institute, Medical College of Wisconsin, Milwaukee, Wisconsin
- Monique M. Williams
  - Department of Biochemistry, Mayo Clinic, Rochester, Minnesota
  - Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, Minnesota
- Eric W. Klee
  - Department of Biochemistry, Mayo Clinic, Rochester, Minnesota
  - Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, Minnesota
- Gwen A. Lomberk
  - Division of Research, Department of Surgery, Medical College of Wisconsin, Milwaukee, Wisconsin
  - Department of Pharmacology and Toxicology, Medical College of Wisconsin, Milwaukee, Wisconsin
  - Genomic Science and Precision Medicine Center (GSPMC), Medical College of Wisconsin, Milwaukee, Wisconsin
- Raul Urrutia
  - Division of Research, Department of Surgery, Medical College of Wisconsin, Milwaukee, Wisconsin
  - Genomic Science and Precision Medicine Center (GSPMC), Medical College of Wisconsin, Milwaukee, Wisconsin
  - Department of Biochemistry, Medical College of Wisconsin, Milwaukee, Wisconsin
7
Liu Y, Wang X, Liu B. A comprehensive review and comparison of existing computational methods for intrinsically disordered protein and region prediction. Brief Bioinform 2019; 20:330-346. [PMID: 30657889 DOI: 10.1093/bib/bbx126] [Citation(s) in RCA: 95] [Impact Index Per Article: 15.8] [Received: 07/19/2017] [Indexed: 01/06/2023] Open
Abstract
Intrinsically disordered proteins and regions are widely distributed in proteins, which are associated with many biological processes and diseases. Accurate prediction of intrinsically disordered proteins and regions is critical for both basic research (such as protein structure and function prediction) and practical applications (such as drug development). During the past decades, many computational approaches have been proposed, which have greatly facilitated the development of this important field. Therefore, a comprehensive and updated review is highly required. In this regard, we give a review on the computational methods for intrinsically disordered protein and region prediction, especially focusing on the recent development in this field. These computational approaches are divided into four categories based on their methodologies, including physicochemical-based method, machine-learning-based method, template-based method and meta method. Furthermore, their advantages and disadvantages are also discussed. The performance of 40 state-of-the-art predictors is directly compared on the target proteins in the task of disordered region prediction in the 10th Critical Assessment of protein Structure Prediction. A more comprehensive performance comparison of 45 different predictors is conducted based on seven widely used benchmark data sets. Finally, some open problems and perspectives are discussed.
Affiliation(s)
- Yumeng Liu
  - School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, China
- Xiaolong Wang
  - School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, China
- Bin Liu
  - School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, China
8
Identification of Intrinsically Disordered Proteins and Regions by Length-Dependent Predictors Based on Conditional Random Fields. MOLECULAR THERAPY-NUCLEIC ACIDS 2019; 17:396-404. [PMID: 31307006 PMCID: PMC6626971 DOI: 10.1016/j.omtn.2019.06.004] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 02/25/2019] [Revised: 06/06/2019] [Accepted: 06/07/2019] [Indexed: 01/24/2023]
Abstract
Accurate identification of intrinsically disordered proteins/regions (IDPs/IDRs) is critical for predicting protein structure and function. Previous studies have shown that IDRs of different lengths have different characteristics, and several classification-based predictors have been proposed for predicting different types of IDRs. Compared with these classification-based predictors, the previously proposed predictor IDP-CRF exhibits state-of-the-art performance for predicting IDPs/IDRs, which is a sequence labeling model based on conditional random fields (CRFs). Motivated by these methods, we propose a predictor called IDP-FSP, which is an ensemble of three CRF-based predictors called IDP-FSP-L, IDP-FSP-S, and IDP-FSP-G. These three predictors are specially designed to predict long, short, and generic disordered regions, respectively, and they are constructed based on different features. To the best of our knowledge, IDP-FSP is the first predictor that combines a sequence labeling algorithm with IDRs of different lengths. Experimental results using two independent test datasets show that IDP-FSP achieves better or at least comparable predictive performance with 26 existing state-of-the-art methods in this field, proving the effectiveness of IDP-FSP.
9
Oldfield CJ, Uversky VN, Dunker AK, Kurgan L. Introduction to intrinsically disordered proteins and regions. Proteins 2019. [DOI: 10.1016/b978-0-12-816348-1.00001-6] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Indexed: 12/03/2022]
10
Liu Y, Wang X, Liu B. IDP-CRF: Intrinsically Disordered Protein/Region Identification Based on Conditional Random Fields. Int J Mol Sci 2018; 19:E2483. [PMID: 30135358 PMCID: PMC6164615 DOI: 10.3390/ijms19092483] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 07/21/2018] [Revised: 08/14/2018] [Accepted: 08/18/2018] [Indexed: 12/16/2022] Open
Abstract
Accurate prediction of intrinsically disordered proteins/regions is one of the most important tasks in bioinformatics, and some computational predictors have been proposed to solve this problem. How to efficiently incorporate the sequence-order effect is critical for constructing an accurate predictor because disordered region distributions show global sequence patterns. In order to capture these sequence patterns, several sequence labelling models have been applied to this field, such as conditional random fields (CRFs). However, these methods suffer from certain disadvantages. In this study, we proposed a new computational predictor called IDP-CRF, which is trained on an updated benchmark dataset based on the MobiDB database and the DisProt database, and incorporates more comprehensive sequence-based features, including PSSMs (position-specific scoring matrices), kmer, predicted secondary structures, and relative solvent accessibilities. Experimental results on the benchmark dataset and two independent datasets show that IDP-CRF outperforms 25 existing state-of-the-art methods in this field, demonstrating that IDP-CRF is a very useful tool for identifying IDPs/IDRs (intrinsically disordered proteins/regions). We anticipate that IDP-CRF will facilitate the development of protein sequence analysis.
Affiliation(s)
- Yumeng Liu
  - School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen 518055, Guangdong, China.
- Xiaolong Wang
  - School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen 518055, Guangdong, China.
- Bin Liu
  - School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen 518055, Guangdong, China.
11
Tarafder S, Toukir Ahmed M, Iqbal S, Tamjidul Hoque M, Sohel Rahman M. RBSURFpred: Modeling protein accessible surface area in real and binary space using regularized and optimized regression. J Theor Biol 2018; 441:44-57. [PMID: 29305182 DOI: 10.1016/j.jtbi.2017.12.029] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 10/12/2017] [Revised: 12/11/2017] [Accepted: 12/28/2017] [Indexed: 01/04/2023]
Abstract
Accessible surface area (ASA) of a protein residue is an effective feature for protein structure prediction, binding region identification, fold recognition problems etc. Improving the prediction of ASA by the application of effective feature variables is a challenging but explorable task to consider, specially in the field of machine learning. Among the existing predictors of ASA, REGAd3p is a highly accurate ASA predictor which is based on regularized exact regression with polynomial kernel of degree 3. In this work, we present a new predictor RBSURFpred, which extends REGAd3p on several dimensions by incorporating 58 physicochemical, evolutionary and structural properties into 9-tuple peptides via Chou's general PseAAC, which allowed us to obtain higher accuracies in predicting both real-valued and binary ASA. We have compared RBSURFpred for both real and binary space predictions with state-of-the-art predictors, such as REGAd3p and SPIDER2. We also have carried out a rigorous analysis of the performance of RBSURFpred in terms of different amino acids and their properties, and also with biologically relevant case-studies. The performance of RBSURFpred establishes itself as a useful tool for the community.
Affiliation(s)
- Sumit Tarafder
  - Department of CSE, BUET, ECE Building, West Palasi, Dhaka 1205, Bangladesh
- Md Toukir Ahmed
  - Department of CSE, BUET, ECE Building, West Palasi, Dhaka 1205, Bangladesh
- Sumaiya Iqbal
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA.
- M Sohel Rahman
  - Department of CSE, BUET, ECE Building, West Palasi, Dhaka 1205, Bangladesh.
12
Wu H, Wang K, Lu L, Xue Y, Lyu Q, Jiang M. Deep Conditional Random Field Approach to Transmembrane Topology Prediction and Application to GPCR Three-Dimensional Structure Modeling. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2017; 14:1106-1114. [PMID: 27576262 DOI: 10.1109/tcbb.2016.2602872] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Indexed: 06/06/2023]
Abstract
Transmembrane proteins play important roles in cellular energy production, signal transmission, and metabolism. Many shallow machine learning methods have been applied to transmembrane topology prediction, but the performance was limited by the large size of membrane proteins and the complex biological evolution information behind the sequence. In this paper, we proposed a novel deep approach based on conditional random fields named as dCRF-TM for predicting the topology of transmembrane proteins. Conditional random fields take into account more complicated interrelation between residue labels in full-length sequence than HMM and SVM-based methods. Three widely-used datasets were employed in the benchmark. DCRF-TM had the accuracy 95 percent over helix location prediction and the accuracy 78 percent over helix number prediction. DCRF-TM demonstrated a more robust performance on large size proteins (>350 residues) against 11 state-of-the-art predictors. Further dCRF-TM was applied to ab initio modeling three-dimensional structures of seven-transmembrane receptors, also known as G protein-coupled receptors. The predictions on 24 solved G protein-coupled receptors and unsolved vasopressin V2 receptor illustrated that dCRF-TM helped abGPCR-I-TASSER to improve TM-score 34.3 percent rather than using the random transmembrane definition. Two out of five predicted models caught the experimental verified disulfide bonds in vasopressin V2 receptor.
13
Meng F, Uversky VN, Kurgan L. Comprehensive review of methods for prediction of intrinsic disorder and its molecular functions. Cell Mol Life Sci 2017; 74:3069-3090. [PMID: 28589442 PMCID: PMC11107660 DOI: 10.1007/s00018-017-2555-4] [Citation(s) in RCA: 130] [Impact Index Per Article: 16.3] [Received: 05/18/2017] [Accepted: 06/01/2017] [Indexed: 12/19/2022]
Abstract
Computational prediction of intrinsic disorder in protein sequences dates back to late 1970 and has flourished in the last two decades. We provide a brief historical overview, and we review over 30 recent predictors of disorder. We are the first to also cover predictors of molecular functions of disorder, including 13 methods that focus on disordered linkers and disordered protein-protein, protein-RNA, and protein-DNA binding regions. We overview their predictive models, usability, and predictive performance. We highlight newest methods and predictors that offer strong predictive performance measured based on recent comparative assessments. We conclude that the modern predictors are relatively accurate, enjoy widespread use, and many of them are fast. Their predictions are conveniently accessible to the end users, via web servers and databases that store pre-computed predictions for millions of proteins. However, research into methods that predict many not yet addressed functions of intrinsic disorder remains an outstanding challenge.
Affiliation(s)
- Fanchi Meng
  - Department of Electrical and Computer Engineering, University of Alberta, Edmonton, Canada
- Vladimir N Uversky
  - Department of Molecular Medicine, USF Health Byrd Alzheimer's Research Institute, Morsani College of Medicine, University of South Florida, Tampa, FL, USA
  - Institute for Biological Instrumentation, Russian Academy of Sciences, Pushchino, Moscow Region, Russian Federation
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, Richmond, USA.
14
Wang S, Ma J, Xu J. AUCpreD: proteome-level protein disorder prediction by AUC-maximized deep convolutional neural fields. Bioinformatics 2017; 32:i672-i679. [PMID: 27587688 DOI: 10.1093/bioinformatics/btw446] [Citation(s) in RCA: 87] [Impact Index Per Article: 10.9] [Indexed: 12/17/2022] Open
Abstract
MOTIVATION Protein intrinsically disordered regions (IDRs) play an important role in many biological processes. Two key properties of IDRs are (i) the occurrence is proteome-wide and (ii) the ratio of disordered residues is about 6%, which makes it challenging to accurately predict IDRs. Most IDR prediction methods use sequence profile to improve accuracy, which prevents its application to proteome-wide prediction since it is time-consuming to generate sequence profiles. On the other hand, the methods without using sequence profile fare much worse than using sequence profile. METHOD This article formulates IDR prediction as a sequence labeling problem and employs a new machine learning method called Deep Convolutional Neural Fields (DeepCNF) to solve it. DeepCNF is an integration of deep convolutional neural networks (DCNN) and conditional random fields (CRF); it can model not only complex sequence-structure relationship in a hierarchical manner, but also correlation among adjacent residues. To deal with highly imbalanced order/disorder ratio, instead of training DeepCNF by widely used maximum-likelihood, we develop a novel approach to train it by maximizing area under the ROC curve (AUC), which is an unbiased measure for class-imbalanced data. RESULTS Our experimental results show that our IDR prediction method AUCpreD outperforms existing popular disorder predictors. More importantly, AUCpreD works very well even without sequence profile, comparing favorably to or even outperforming many methods using sequence profile. Therefore, our method works for proteome-wide disorder prediction while yielding similar or better accuracy than the others. AVAILABILITY AND IMPLEMENTATION http://raptorx2.uchicago.edu/StructurePropertyPred/predict/ CONTACT wangsheng@uchicago.edu, jinboxu@gmail.com SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Sheng Wang
  - Toyota Technological Institute at Chicago, Chicago, IL, USA
  - Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Jianzhu Ma
  - Toyota Technological Institute at Chicago, Chicago, IL, USA
- Jinbo Xu
  - Toyota Technological Institute at Chicago, Chicago, IL, USA
15
Wu W, Wang Z, Cong P, Li T. Accurate prediction of protein relative solvent accessibility using a balanced model. BioData Min 2017; 10:1. [PMID: 28127402 PMCID: PMC5259893 DOI: 10.1186/s13040-016-0121-5] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Received: 03/31/2016] [Accepted: 12/27/2016] [Indexed: 01/19/2023] Open
Abstract
BACKGROUND Protein relative solvent accessibility provides insight into understanding protein structure and function. Prediction of protein relative solvent accessibility is often the first stage of predicting other protein properties. Recent predictors of relative solvent accessibility discriminate against exposed regions as compared with buried regions, resulting in higher prediction accuracy associated with buried regions relative to exposed regions. METHODS Here, we propose a more accurate and balanced predictor of protein relative solvent accessibility. First, we collected known proteins in three subsets according to sequence length and constructed a balanced dataset after reducing redundancy within each subset. Next, we measured the performance associated with different variables and variable combinations to determine the best variable combination. Finally, a predictor called BMRSA was constructed for modelling and prediction, which used the balanced set as the training set, the position-specific scoring matrix, predicted secondary structure, buried-exposed profile, and length of a query sequence as variables, and the conditional random field as the machine-learning method. RESULTS BMRSA performance on test sets confirmed that our approach improved prediction accuracy relative to state-of-the-art approaches and was balanced in its comparison of buried and exposed regions. Our method is valuable when higher levels of accuracy in predicting exposed-residue states are required. The BMRSA is available at: http://cheminfo.tongji.edu.cn:8080/BMRSA/.
Affiliation(s)
- Wei Wu
  - Department of Chemistry, Tongji University, Shanghai, China
- Zhiheng Wang
  - Department of Chemistry, Tongji University, Shanghai, China
- Peisheng Cong
  - Department of Chemistry, Tongji University, Shanghai, China
- Tonghua Li
  - Department of Chemistry, Tongji University, Shanghai, China
16
Abstract
Over the past decade, it has become evident that a large proportion of proteins contain intrinsically disordered regions, which play important roles in pivotal cellular functions. Many computational tools have been developed with the aim of identifying the level and location of disorder within a protein. In this chapter, we describe a neural network based technique called SPINE-D that employs a unique three-state design and can accurately capture disordered residues in both short and long disordered regions. SPINE-D was trained on a large database of 4229 non-redundant proteins, and yielded an AUC of 0.86 on a cross-validation test and 0.89 on an independent test. SPINE-D can also detect a semi-disordered state that is associated with induced folders and aggregation-prone regions in disordered proteins and weakly stable or locally unfolded regions in structured proteins. We implement an online web service and an offline stand-alone program for SPINE-D, they are freely available at http://sparks-lab.org/SPINE-D/ . We then walk you through how to use the online and offline SPINE-D in making disorder predictions, and examine the disorder and semi-disorder prediction in a case study on the p53 protein.
Affiliation(s)
- Tuo Zhang
- Department of Microbiology and Immunology, Weill Cornell Medical College, New York, NY, 10065, USA
| | - Eshel Faraggi
- Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, Indianapolis, IN, 46032, USA
- Research and Information Systems, LLC, Indianapolis, IN, USA
| | - Zhixiu Li
- Translational Genomics Group, Institute of Health and Biomedical Innovation, Queensland University of Technology at Translational Research Institute, 37 Kent Street, Woolloongabba, QLD, 4102, Australia
| | - Yaoqi Zhou
- Institute for Glycomics and School of Information and Communication Technology, Griffith University, Gold Coast Campus, Science 1 (G24) 2.10, Parklands Drive, Southport, QLD, 4222, Australia.
|
17
|
Lieutaud P, Ferron F, Uversky AV, Kurgan L, Uversky VN, Longhi S. How disordered is my protein and what is its disorder for? A guide through the "dark side" of the protein universe. Intrinsically Disordered Proteins 2016; 4:e1259708. [PMID: 28232901 DOI: 10.1080/21690707.2016.1259708] [Citation(s) in RCA: 80] [Impact Index Per Article: 8.9] [Received: 09/22/2016] [Revised: 11/03/2016] [Accepted: 11/04/2016] [Indexed: 12/18/2022]
Abstract
In the last two decades, it has become increasingly evident that a large number of proteins are either fully or partially disordered. Intrinsically disordered proteins lack a stable 3D structure, are ubiquitous and fulfill essential biological functions. Their conformational heterogeneity is encoded in their amino acid sequences, thereby allowing intrinsically disordered proteins or regions to be recognized based on properties of these sequences. The identification of disordered regions facilitates the functional annotation of proteins and is instrumental for delineating boundaries of protein domains amenable to structure determination by X-ray crystallography. This article discusses a comprehensive selection of databases and methods currently employed to disseminate experimental and putative annotations of disorder, predict disorder and identify regions involved in induced folding. It also provides a set of detailed instructions that should be followed to perform computational analysis of disorder.
Affiliation(s)
- Philippe Lieutaud
- Aix-Marseille Université, AFMB UMR, Marseille, France; CNRS, AFMB UMR, Marseille, France
| | - François Ferron
- Aix-Marseille Université, AFMB UMR, Marseille, France; CNRS, AFMB UMR, Marseille, France
| | - Alexey V Uversky
- Center for Data Analytics and Biomedical Informatics, Department of Computer and Information Sciences, College of Science and Technology, Temple University , Philadelphia, PA, USA
| | - Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University , Richmond, VA, USA
| | - Vladimir N Uversky
- Department of Molecular Medicine and USF Health Byrd Alzheimer Research Institute, Morsani College of Medicine, University of South Florida, Tampa, FL, USA; Laboratory of Structural Dynamics, Stability and Folding of Proteins, Institute of Cytology, Russian Academy of Sciences, St. Petersburg, Russia
| | - Sonia Longhi
- Aix-Marseille Université, AFMB UMR, Marseille, France; CNRS, AFMB UMR, Marseille, France
|
18
|
Postel S, Deredge D, Bonsor DA, Yu X, Diederichs K, Helmsing S, Vromen A, Friedler A, Hust M, Egelman EH, Beckett D, Wintrode PL, Sundberg EJ. Bacterial flagellar capping proteins adopt diverse oligomeric states. eLife 2016; 5. [PMID: 27664419 PMCID: PMC5072837 DOI: 10.7554/elife.18857] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.9] [Received: 06/15/2016] [Accepted: 09/23/2016] [Indexed: 11/13/2022] Open
Abstract
Flagella are crucial for bacterial motility and pathogenesis. The flagellar capping protein (FliD) regulates filament assembly by chaperoning and sorting flagellin (FliC) proteins after they traverse the hollow filament and exit the growing flagellum tip. In the absence of FliD, flagella are not formed, resulting in impaired motility and infectivity. Here, we report the 2.2 Å resolution X-ray crystal structure of FliD from Pseudomonas aeruginosa, the first high-resolution structure of any FliD protein from any bacterium. Using this evidence in combination with a multitude of biophysical and functional analyses, we find that Pseudomonas FliD exhibits unexpected structural similarity to other flagellar proteins at the domain level, adopts a unique hexameric oligomeric state, and depends on flexible determinants for oligomerization. Considering that the flagellin filaments on which FliD oligomers are affixed vary in protofilament number between bacteria, our results suggest that FliD oligomer stoichiometries vary across bacteria to complement their filament assemblies.
DOI: http://dx.doi.org/10.7554/eLife.18857.001
Many bacteria, including several that cause diseases in people, have long whip-like appendages called flagella that extend well beyond their cell walls. Flagella can rotate and propel the bacteria through liquids, such as water or blood, and they are constructed primarily from thousands of copies of a single protein called flagellin. When flagella are built, the flagellin proteins are placed in their proper positions by another protein called FliD, several copies of which form a cap on the end of flagella. Without FliD, bacteria cannot properly assemble flagella and, thus, can no longer swim; this also hinders their ability to cause disease. Determining the three-dimensional structure of a protein, down to the level of its individual atoms, can provide unique insights into how the protein operates. However, no one had resolved the structure of a FliD protein from any bacterium to this level of detail before. Now, Postel et al. report the high-resolution structure of a large fragment of FliD from the bacterium Pseudomonas aeruginosa. The structure reveals that parts of this FliD protein are shaped like parts of other proteins from which flagella are constructed, including the flagellin protein that FliD places into position. Some parts of the FliD protein are also very flexible and these parts of the protein are responsible for holding numerous FliD proteins together as a cap. Finally, Postel et al. saw that six copies of FliD bind to one another to form a protein complex on the end of flagella. This last finding was particularly unexpected since it was thought that all FliD proteins formed five-membered cap complexes, an assumption that was based largely on studies of FliD from another bacterium called Salmonella. The current structure covers about half of the FliD protein, and so the next challenge is to determine the structure of the full-length protein. An improved understanding of the structure of FliD may, in future, help researchers to design drugs that stop bacteria from building flagella and, therefore, from swimming and causing disease.
DOI: http://dx.doi.org/10.7554/eLife.18857.002
Affiliation(s)
- Sandra Postel
- Institute of Human Virology, University of Maryland School of Medicine, Baltimore, United States
| | - Daniel Deredge
- Department of Pharmaceutical Sciences, University of Maryland School of Pharmacy, Baltimore, United States
| | - Daniel A Bonsor
- Institute of Human Virology, University of Maryland School of Medicine, Baltimore, United States
| | - Xiong Yu
- Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, United States
| | - Kay Diederichs
- Department of Biology, University of Konstanz, Konstanz, Germany
| | - Saskia Helmsing
- Department of Biotechnology, Institute of Biochemistry, Biotechnology and Bioinformatics, Technische Universität Braunschweig, Braunschweig, Germany
| | - Aviv Vromen
- Institute of Chemistry, The Hebrew University of Jerusalem, Jerusalem, Israel
| | - Assaf Friedler
- Institute of Chemistry, The Hebrew University of Jerusalem, Jerusalem, Israel
| | - Michael Hust
- Department of Biotechnology, Institute of Biochemistry, Biotechnology and Bioinformatics, Technische Universität Braunschweig, Braunschweig, Germany
| | - Edward H Egelman
- Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, United States
| | - Dorothy Beckett
- Department of Chemistry and Biochemistry, University of Maryland College Park, Baltimore, United States
| | - Patrick L Wintrode
- Department of Pharmaceutical Sciences, University of Maryland School of Pharmacy, Baltimore, United States
| | - Eric J Sundberg
- Institute of Human Virology, University of Maryland School of Medicine, Baltimore, United States.,Department of Medicine, University of Maryland School of Medicine, Baltimore, United States.,Department of Microbiology and Immunology, University of Maryland School of Medicine, Baltimore, United States
|
19
|
Abstract
In the last two decades, it has become increasingly evident that a large number of proteins are either fully or partially disordered. Intrinsically disordered proteins are ubiquitous proteins that fulfill essential biological functions while lacking a stable 3D structure. Their conformational heterogeneity is encoded at the amino acid sequence level, thereby allowing intrinsically disordered proteins or regions to be recognized based on their sequence properties. The identification of disordered regions facilitates the functional annotation of proteins and is instrumental for delineating boundaries of protein domains amenable to crystallization. This chapter focuses on the methods currently employed for predicting disorder and identifying regions involved in induced folding.
Affiliation(s)
- Philippe Lieutaud
- AFMB UMR 7257, Aix-Marseille Université, 163, avenue de Luminy, Case 932, 13288, Marseille Cedex 09, France
- AFMB UMR 7257, CNRS, 163, avenue de Luminy, Case 932, 13288, Marseille Cedex 09, France
| | - François Ferron
- AFMB UMR 7257, Aix-Marseille Université, 163, avenue de Luminy, Case 932, 13288, Marseille Cedex 09, France
- AFMB UMR 7257, CNRS, 163, avenue de Luminy, Case 932, 13288, Marseille Cedex 09, France
| | - Sonia Longhi
- AFMB UMR 7257, Aix-Marseille Université, 163, avenue de Luminy, Case 932, 13288, Marseille Cedex 09, France.
- AFMB UMR 7257, CNRS, 163, avenue de Luminy, Case 932, 13288, Marseille Cedex 09, France.
|
20
|
Velez G, Lin M, Christensen T, Faubion WA, Lomberk G, Urrutia R. Evidence supporting a critical contribution of intrinsically disordered regions to the biochemical behavior of full-length human HP1γ. J Mol Model 2015; 22:12. [PMID: 26680990 PMCID: PMC4683166 DOI: 10.1007/s00894-015-2874-z] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 07/05/2015] [Accepted: 11/22/2015] [Indexed: 12/16/2022]
Abstract
HP1γ, a non-histone chromatin protein, has elicited significant attention because of its role in gene silencing, elongation, splicing, DNA repair, cell growth, differentiation, and many other cancer-associated processes, including therapy resistance. These characteristics make it an ideal target for developing small drugs for both mechanistic experimentation and potential therapies. While high-resolution structures of the two globular regions of HP1γ, the chromo- and chromoshadow domains, have been solved, little is currently known about the conformational behavior of the full-length protein. Consequently, in the current study, we use threading, homology-based molecular modeling, molecular mechanics calculations, and molecular dynamics simulations to develop models that allow us to infer properties of full-length HP1γ at an atomic resolution level. HP1γ appears as an elongated molecule in which three Intrinsically Disordered Regions (IDRs, 1, 2, and 3) endow this protein with dynamic flexibility, intermolecular recognition properties, and the ability to integrate signals from various intracellular pathways. Our modeling also suggests that the dynamic flexibility imparted to HP1γ by the three IDRs is important for linking nucleosomes with PXVXL motif-containing proteins in a chromatin environment. The importance of the IDRs in intermolecular recognition is illustrated by the building and study of both IDR2 HP1γ−importin-α and IDR1 and IDR2 HP1γ−DNA complexes. The ability of the three IDRs to integrate cell signals is demonstrated by combined linear motif analyses and molecular dynamics simulations showing that posttranslational modifications can generate a histone mimetic sequence within the IDR2 of HP1γ, which when bound by the chromodomain can lead to an autoinhibited state. Combined, these data underscore the importance of IDRs 1, 2, and 3 in defining the structural and dynamic properties of HP1γ, discoveries that have both mechanistic and potentially biomedical relevance.
Affiliation(s)
- Gabriel Velez
- Laboratory of Epigenetics and Chromatin Dynamics, Gastroenterology Research Unit, Department of Biochemistry and Molecular Biology, Mayo Clinic, 200 First Street SW, Guggenheim 10, Rochester, MN, 55905, USA.,Laboratory of Epigenetics and Chromatin Dynamics, Gastroenterology Research Unit, Department of Biophysics, Mayo Clinic, Rochester, MN, USA.,Laboratory of Epigenetics and Chromatin Dynamics, Gastroenterology Research Unit, Department of Medicine, Mayo Clinic, Rochester, MN, USA.,Medical Scientist Training Program, University of Iowa, Iowa City, IA, USA
| | - Marisa Lin
- Laboratory of Epigenetics and Chromatin Dynamics, Gastroenterology Research Unit, Department of Biochemistry and Molecular Biology, Mayo Clinic, 200 First Street SW, Guggenheim 10, Rochester, MN, 55905, USA.,Laboratory of Epigenetics and Chromatin Dynamics, Gastroenterology Research Unit, Department of Biophysics, Mayo Clinic, Rochester, MN, USA.,Laboratory of Epigenetics and Chromatin Dynamics, Gastroenterology Research Unit, Department of Medicine, Mayo Clinic, Rochester, MN, USA
| | - Trace Christensen
- Laboratory of Epigenetics and Chromatin Dynamics, Gastroenterology Research Unit, Department of Biochemistry and Molecular Biology, Mayo Clinic, 200 First Street SW, Guggenheim 10, Rochester, MN, 55905, USA.,Laboratory of Epigenetics and Chromatin Dynamics, Gastroenterology Research Unit, Department of Biophysics, Mayo Clinic, Rochester, MN, USA.,Laboratory of Epigenetics and Chromatin Dynamics, Gastroenterology Research Unit, Department of Medicine, Mayo Clinic, Rochester, MN, USA
| | - William A Faubion
- Laboratory of Epigenetics and Chromatin Dynamics, Gastroenterology Research Unit, Department of Biochemistry and Molecular Biology, Mayo Clinic, 200 First Street SW, Guggenheim 10, Rochester, MN, 55905, USA.,Laboratory of Epigenetics and Chromatin Dynamics, Gastroenterology Research Unit, Department of Biophysics, Mayo Clinic, Rochester, MN, USA.,Laboratory of Epigenetics and Chromatin Dynamics, Gastroenterology Research Unit, Department of Medicine, Mayo Clinic, Rochester, MN, USA
| | - Gwen Lomberk
- Laboratory of Epigenetics and Chromatin Dynamics, Gastroenterology Research Unit, Department of Biochemistry and Molecular Biology, Mayo Clinic, 200 First Street SW, Guggenheim 10, Rochester, MN, 55905, USA.,Laboratory of Epigenetics and Chromatin Dynamics, Gastroenterology Research Unit, Department of Biophysics, Mayo Clinic, Rochester, MN, USA.,Laboratory of Epigenetics and Chromatin Dynamics, Gastroenterology Research Unit, Department of Medicine, Mayo Clinic, Rochester, MN, USA.
| | - Raul Urrutia
- Laboratory of Epigenetics and Chromatin Dynamics, Gastroenterology Research Unit, Department of Biochemistry and Molecular Biology, Mayo Clinic, 200 First Street SW, Guggenheim 10, Rochester, MN, 55905, USA.,Laboratory of Epigenetics and Chromatin Dynamics, Gastroenterology Research Unit, Department of Biophysics, Mayo Clinic, Rochester, MN, USA.,Laboratory of Epigenetics and Chromatin Dynamics, Gastroenterology Research Unit, Department of Medicine, Mayo Clinic, Rochester, MN, USA.
|
21
|
DisPredict: A Predictor of Disordered Protein Using Optimized RBF Kernel. PLoS One 2015; 10:e0141551. [PMID: 26517719 PMCID: PMC4627842 DOI: 10.1371/journal.pone.0141551] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Received: 01/04/2015] [Accepted: 10/09/2015] [Indexed: 12/02/2022] Open
Abstract
Intrinsically disordered proteins or regions perform important biological functions through their dynamic conformations during binding. Thus, accurate identification of these disordered regions has significant implications for proper annotation of function, induced fold prediction and drug design to combat critical diseases. We introduce DisPredict, a disorder predictor that employs a single support vector machine with an RBF kernel and novel features for reliable characterization of protein structure. DisPredict yields effective performance. In addition to 10-fold cross validation, training and testing of DisPredict were conducted with independent test datasets. The results were consistent, with both the training and test errors minimal. The use of multiple data sources makes the predictor generic. The datasets used in developing the model include disordered regions of various lengths, categorized as short and long, with different compositions and different types of disorder, ranging from fully to partially disordered regions as well as completely ordered regions. Through comparison with other state-of-the-art approaches and case studies, DisPredict is found to be a useful tool with competitive performance. DisPredict is available at https://github.com/tamjidul/DisPredict_v1.0.
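As background on the RBF kernel named in the abstract above, the sketch below evaluates the standard Gaussian kernel K(x, y) = exp(-gamma * ||x - y||^2) for two feature vectors. It is a generic illustration rather than DisPredict code: the function name `rbf_kernel`, the default `gamma`, and the toy vectors are assumptions, and the abstract does not state the optimized kernel parameters.

```python
import math


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel between two equal-length feature vectors.

    gamma controls how quickly similarity decays with squared distance;
    the value used here is an arbitrary placeholder.
    """
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


# Hypothetical per-residue feature vectors.
same = rbf_kernel([1.0, 2.0], [1.0, 2.0])               # identical vectors
apart = rbf_kernel([0.0, 0.0], [1.0, 1.0], gamma=0.5)   # squared distance 2
```

Identical vectors give a kernel value of 1, and the value decays toward 0 as the vectors move apart, which is what lets a support vector machine draw non-linear decision boundaries in the original feature space.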
|
22
|
Cronin NB, Yang J, Zhang Z, Kulkarni K, Chang L, Yamano H, Barford D. Atomic-Resolution Structures of the APC/C Subunits Apc4 and the Apc5 N-Terminal Domain. J Mol Biol 2015; 427:3300-3315. [PMID: 26343760 PMCID: PMC4590430 DOI: 10.1016/j.jmb.2015.08.023] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 06/09/2015] [Revised: 08/19/2015] [Accepted: 08/26/2015] [Indexed: 10/25/2022]
Abstract
Many essential biological processes are mediated by complex molecular machines comprising multiple subunits. Knowledge of the architecture of individual subunits and their positions within the overall multimeric complex is key to understanding the molecular mechanisms of macromolecular assemblies. The anaphase-promoting complex/cyclosome (APC/C) is a large multisubunit complex that regulates cell cycle progression by ubiquitinating cell cycle proteins for proteolysis by the proteasome. The holo-complex is composed of 15 different proteins that assemble to generate a complex of 20 subunits. Here, we describe the crystal structures of Apc4 and the N-terminal domain of Apc5 (Apc5(N)). Apc4 comprises a WD40 domain split by a long α-helical domain, whereas Apc5(N) has an α-helical fold. In a separate study, we had fitted these atomic models to a 3.6-Å-resolution cryo-electron microscopy map of the APC/C. We describe how, in the context of the APC/C, regions of Apc4 disordered in the crystal assume order through contacts with Apc5, whereas Apc5(N) shows small conformational changes relative to its crystal structure. We discuss the complementary approaches of high-resolution electron microscopy and protein crystallography to the structure determination of subunits of multimeric complexes.
Affiliation(s)
- Nora B Cronin
- Division of Structural Biology, Institute of Cancer Research, 237 Fulham Road, London SW3 6JB, United Kingdom
| | - Jing Yang
- Division of Structural Biology, Institute of Cancer Research, 237 Fulham Road, London SW3 6JB, United Kingdom; MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, United Kingdom
| | - Ziguo Zhang
- Division of Structural Biology, Institute of Cancer Research, 237 Fulham Road, London SW3 6JB, United Kingdom; MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, United Kingdom
| | - Kiran Kulkarni
- Division of Structural Biology, Institute of Cancer Research, 237 Fulham Road, London SW3 6JB, United Kingdom; Division of Biochemical Sciences, Council of Scientific and Industrial Research National Chemical Laboratory, Pune 411008, India
| | - Leifu Chang
- Division of Structural Biology, Institute of Cancer Research, 237 Fulham Road, London SW3 6JB, United Kingdom; MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, United Kingdom
| | - Hiroyuki Yamano
- Cancer Institute, University College London, Paul O'Gorman Building, 72 Huntley Street, London WC1E 6BT, United Kingdom
| | - David Barford
- Division of Structural Biology, Institute of Cancer Research, 237 Fulham Road, London SW3 6JB, United Kingdom; MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, United Kingdom.
|
23
|
Survey of Natural Language Processing Techniques in Bioinformatics. Computational and Mathematical Methods in Medicine 2015; 2015:674296. [PMID: 26525745 PMCID: PMC4615216 DOI: 10.1155/2015/674296] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Received: 05/10/2015] [Revised: 06/12/2015] [Accepted: 06/21/2015] [Indexed: 01/02/2023]
Abstract
Informatics methods, such as text mining and natural language processing, are widely used in bioinformatics research. In this study, we discuss text mining and natural language processing methods in bioinformatics from two perspectives. First, we aim to search for knowledge on biology, retrieve references using text mining methods, and reconstruct databases. For example, protein-protein interactions and gene-disease relationships can be mined from PubMed. Then, we analyze the applications of text mining and natural language processing techniques in bioinformatics, including predicting protein structure and function and detecting noncoding RNA. Finally, numerous methods and applications, as well as their contributions to bioinformatics, are discussed for future use by text mining and natural language processing researchers.
|
24
|
DeepCNF-D: Predicting Protein Order/Disorder Regions by Weighted Deep Convolutional Neural Fields. Int J Mol Sci 2015; 16:17315-30. [PMID: 26230689 PMCID: PMC4581195 DOI: 10.3390/ijms160817315] [Citation(s) in RCA: 58] [Impact Index Per Article: 5.8] [Received: 05/28/2015] [Revised: 07/15/2015] [Accepted: 07/16/2015] [Indexed: 12/14/2022] Open
Abstract
Intrinsically disordered proteins or protein regions are involved in key biological processes including regulation of transcription, signal transduction, and alternative splicing. Accurately predicting order/disorder regions ab initio from the protein sequence is a prerequisite step for further analysis of functions and mechanisms for these disordered regions. This work presents a learning method, weighted DeepCNF (Deep Convolutional Neural Fields), to improve the accuracy of order/disorder prediction by exploiting the long-range sequential information and the interdependency between adjacent order/disorder labels and by assigning different weights for each label during training and prediction to solve the label imbalance issue. Evaluated by the CASP9 and CASP10 targets, our method obtains 0.855 and 0.898 AUC values, which are higher than the state-of-the-art single ab initio predictors.
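The idea of assigning different weights to each label to counter label imbalance can be sketched as follows. This is not the DeepCNF-D training objective; inverse-frequency weighting, the function names, and the toy data are all assumptions chosen to show the general principle that errors on the rare disorder label should cost more than errors on the abundant order label.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by total / (n_classes * class_count).

    Rare classes receive larger weights; a perfectly balanced dataset
    gives every class a weight of 1.
    """
    counts = Counter(labels)
    total = len(labels)
    return {cls: total / (len(counts) * n) for cls, n in counts.items()}


def weighted_log_loss(labels, p_disorder, weights):
    """Mean weighted negative log-likelihood (1 = disorder, 0 = order)."""
    eps = 1e-12  # guards against log(0)
    loss = 0.0
    for y, p in zip(labels, p_disorder):
        p_true = p if y == 1 else 1.0 - p
        loss += weights[y] * -math.log(max(p_true, eps))
    return loss / len(labels)


# Hypothetical imbalanced labels: nine ordered residues, one disordered.
toy_labels = [0] * 9 + [1]
toy_weights = inverse_frequency_weights(toy_labels)
toy_loss = weighted_log_loss(toy_labels, [0.5] * 10, toy_weights)
```

With these toy labels the single disordered residue carries a weight of 5 while each ordered residue carries roughly 0.56, so both classes contribute equally to the total loss.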
|
25
|
Wang Z, Yang Q, Li T, Cong P. DisoMCS: Accurately Predicting Protein Intrinsically Disordered Regions Using a Multi-Class Conservative Score Approach. PLoS One 2015; 10:e0128334. [PMID: 26090958 PMCID: PMC4474717 DOI: 10.1371/journal.pone.0128334] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 12/21/2014] [Accepted: 04/26/2015] [Indexed: 11/21/2022] Open
Abstract
The precise prediction of protein intrinsically disordered regions, which play a crucial role in biological processes, is a necessary prerequisite to further the understanding of the principles and mechanisms of protein function. Here, we propose a novel predictor, DisoMCS, which is a more accurate predictor of protein intrinsically disordered regions. DisoMCS is based on an original multi-class conservative score (MCS) obtained by sequence-order/disorder alignment. Initially, near-disorder regions are defined as fragments located at either terminus of an ordered region that adjoins a disordered region. Then the multi-class conservative score is generated by sequence alignment against a known structure database and represented as order, near-disorder and disorder conservative scores. The MCS of each amino acid has three elements: order, near-disorder and disorder profiles. Finally, the MCS is exploited as a feature set to identify disordered regions in sequences. DisoMCS utilizes a non-redundant data set as the training set, MCS and predicted secondary structure as features, and a conditional random field as the classification algorithm. In predicted near-disorder regions, a residue is classified as ordered or disordered according to the optimized decision threshold. DisoMCS was evaluated by cross-validation, large-scale prediction, independent tests and CASP (Critical Assessment of Techniques for Protein Structure Prediction) tests. All results confirmed that DisoMCS was very competitive in terms of accuracy of prediction when compared with well-established publicly available disordered region predictors. They also indicated that our approach was more accurate when a query has higher homology with the knowledge database.
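The three-class labelling described above (order, near-disorder, disorder) can be illustrated with a short sketch. The abstract does not give the fragment length used by DisoMCS, so the `flank` parameter, the function name, and the one-letter state codes below are hypothetical and serve only to show how ordered residues bordering a disordered region could be relabelled.

```python
def label_near_disorder(states, flank=3):
    """Relabel ordered residues that lie close to a disordered region.

    states: string of 'O' (ordered) and 'D' (disordered), one per residue.
    flank:  hypothetical number of ordered residues on each side of a
            disordered region that are treated as near-disorder.
    Returns a string over 'O', 'N' (near-disorder) and 'D'.
    """
    n = len(states)
    labelled = list(states)
    for i, state in enumerate(states):
        if state != "O":
            continue
        lo = max(0, i - flank)
        hi = min(n, i + flank + 1)
        # An ordered residue becomes near-disorder if any residue within
        # `flank` positions of it is disordered.
        if "D" in states[lo:hi]:
            labelled[i] = "N"
    return "".join(labelled)


# Hypothetical annotation: a three-residue disordered segment flanked by order.
three_state = label_near_disorder("OOOOODDDOOOOO", flank=2)
```

In this toy annotation the two ordered residues on each side of the disordered segment are relabelled, giving `OOONNDDDNNOOO`.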
Affiliation(s)
- Zhiheng Wang
- Department of Chemistry, Tongji University, Shanghai, China
| | - Qianqian Yang
- Department of Chemistry, Tongji University, Shanghai, China
| | - Tonghua Li
- Department of Chemistry, Tongji University, Shanghai, China
- * E-mail: (T-HL); (P-SC)
| | - Peisheng Cong
- Department of Chemistry, Tongji University, Shanghai, China
- * E-mail: (T-HL); (P-SC)
|
26
|
Zaytsev AV, Mick JE, Maslennikov E, Nikashin B, DeLuca JG, Grishchuk EL. Multisite phosphorylation of the NDC80 complex gradually tunes its microtubule-binding affinity. Mol Biol Cell 2015; 26:1829-44. [PMID: 25808492 PMCID: PMC4436829 DOI: 10.1091/mbc.e14-11-1539] [Citation(s) in RCA: 76] [Impact Index Per Article: 7.6] [Received: 11/24/2014] [Accepted: 03/17/2015] [Indexed: 12/12/2022] Open
Abstract
Microtubule (MT) attachment to kinetochores is vitally important for cell division, but how these interactions are controlled by phosphorylation is not well known. We used quantitative approaches in vitro combined with molecular dynamics simulations to examine phosphoregulation of the NDC80 complex, a core kinetochore component. We show that the outputs from multiple phosphorylation events on the unstructured tail of its Hec1 subunit are additively integrated to elicit gradual tuning of NDC80-MT binding both in vitro and in silico. Conformational plasticity of the Hec1 tail enables it to serve as a phosphorylation-controlled rheostat, providing a new paradigm for regulating the affinity of MT binders. We also show that cooperativity of NDC80 interactions is weak and is unaffected by NDC80 phosphorylation. This in vitro finding strongly supports our model that independent molecular binding events to MTs by individual NDC80 complexes, rather than their structured oligomers, regulate the dynamics and stability of kinetochore-MT attachments in dividing cells.
Affiliation(s)
- Anatoly V Zaytsev
- Physiology Department, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104
| | - Jeanne E Mick
- Department of Biochemistry and Molecular Biology, Colorado State University, Fort Collins, CO 80523
| | - Evgeny Maslennikov
- Center for Theoretical Problems of Physico-Chemical Pharmacology, Russian Academy of Sciences, Moscow 119991, Russia
| | - Boris Nikashin
- Physiology Department, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104
| | - Jennifer G DeLuca
- Department of Biochemistry and Molecular Biology, Colorado State University, Fort Collins, CO 80523
| | - Ekaterina L Grishchuk
- Physiology Department, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104
|
27
|
Chang KY, Lin TP, Shih LY, Wang CK. Analysis and prediction of the critical regions of antimicrobial peptides based on conditional random fields. PLoS One 2015; 10:e0119490. [PMID: 25803302 PMCID: PMC4372350 DOI: 10.1371/journal.pone.0119490] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Received: 10/15/2014] [Accepted: 01/14/2015] [Indexed: 11/27/2022] Open
Abstract
Antimicrobial peptides (AMPs) are potent drug candidates against microbes such as bacteria, fungi, parasites, and viruses. The size of AMPs ranges from fewer than ten to hundreds of amino acids. Often only a few amino acids, or the critical regions of antimicrobial proteins, matter for functionality. Accurately predicting the AMP critical regions could benefit experimental design. However, no extensive analyses have been done specifically on the AMP critical regions, and computational modeling of them is either non-existent or aimed at other problems. With a focus on the AMP critical regions, we thus develop a computational model, AMPcore, by introducing a state-of-the-art machine learning method, conditional random fields. We generate a comprehensive dataset of 798 AMP cores and a low-similarity dataset of 510 representative AMP cores. AMPcore could reach a maximal accuracy of 90% and a Matthews correlation coefficient (MCC) of 0.79 on the comprehensive dataset, and a maximal accuracy of 83% and an MCC of 0.66 on the low-similarity dataset. Our analyses of AMP cores follow what we know about AMPs: high in glycine and lysine, but low in aspartic acid, glutamic acid, and methionine; the abundance of α-helical structures; the dominance of positive net charges; the peculiarity of amphipathicity. Two amphipathic sequence motifs within the AMP cores, an amphipathic α-helix and an amphipathic π-helix, are revealed. In addition, a short sequence motif at the N-terminal boundary of AMP cores is reported for the first time: arginine at P(-1) coupled with glycine at P1 of AMP cores occurs most often, which might be linked to microbial cell adhesion.
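Because the abstract above reports both accuracy and the Matthews correlation coefficient, a brief sketch of how the two are derived from a binary confusion matrix may help. The counts used below are hypothetical and are not taken from the AMPcore evaluation; the function names are likewise illustrative.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Ranges from -1 (total disagreement) through 0 (chance level) to
    +1 (perfect prediction). Returns 0.0 when any marginal sum is zero,
    which is the usual convention for the undefined case.
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts for a balanced test set of 100 residues.
toy_accuracy = accuracy(tp=45, tn=45, fp=5, fn=5)
toy_mcc = matthews_corrcoef(tp=45, tn=45, fp=5, fn=5)
```

For these balanced toy counts the accuracy is 0.90 and the MCC is 0.80; on imbalanced data the two measures diverge more sharply, which is one reason both are commonly reported.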
Affiliation(s)
- Kuan Y. Chang
- Department of Computer Science and Engineering, National Taiwan Ocean University, Keelung, Taiwan
| | - Tung-pei Lin
- Department of Computer Science and Engineering, National Taiwan Ocean University, Keelung, Taiwan
| | - Ling-Yi Shih
- Department of Computer Science and Engineering, National Taiwan Ocean University, Keelung, Taiwan
| | - Chien-Kuo Wang
- Department of Biotechnology, Asia University, Taichung, Taiwan
|
28
|
CRF-TM: A Conditional Random Field Method for Predicting Transmembrane Topology. Intelligence Science and Big Data Engineering. Big Data and Machine Learning Techniques 2015. [DOI: 10.1007/978-3-319-23862-3_52] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 12/12/2022]
|
29
|
Abstract
Intrinsically disordered proteins and protein regions (IDPs/IDRs) do not adopt a well-defined folded structure under physiological conditions. Instead, these proteins exist as heterogeneous and dynamic conformational ensembles. IDPs are widespread in eukaryotic proteomes and are involved in fundamental biological processes, mostly related to regulation and signaling. At the same time, disordered regions often pose significant challenges to the structure determination process, which generally requires highly homogeneous protein samples. In this book chapter, we provide a brief overview of protein disorder, describe various bioinformatics resources that have been developed in recent years for its characterization, and give a general outline of their applications in various types of structural genomics projects. Traditionally, disordered segments were filtered out to optimize the yield of structure determination pipelines. However, it is becoming increasingly clear that the structural characterization of proteins cannot be complete without the incorporation of intrinsically disordered regions.
Affiliation(s)
- Marco Punta
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
|
30
|
Computational and experimental approaches to reveal the effects of single nucleotide polymorphisms with respect to disease diagnostics. Int J Mol Sci 2014; 15:9670-717. [PMID: 24886813 PMCID: PMC4100115 DOI: 10.3390/ijms15069670] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.4] [Received: 04/08/2014] [Revised: 05/15/2014] [Accepted: 05/16/2014] [Indexed: 12/25/2022] Open
Abstract
DNA mutations are the cause of many human diseases and they are the reason for natural differences among individuals by affecting the structure, function, interactions, and other properties of DNA and expressed proteins. The ability to predict whether a given mutation is disease-causing or harmless is of great importance for the early detection of patients with a high risk of developing a particular disease and would pave the way for personalized medicine and diagnostics. Here we review existing methods and techniques to study and predict the effects of DNA mutations from three different perspectives: in silico, in vitro and in vivo. It is emphasized that the problem is complicated and successful detection of a pathogenic mutation frequently requires a combination of several methods and a knowledge of the biological phenomena associated with the corresponding macromolecules.
|
31
|
Ali H, Urolagin S, Gurarslan Ö, Vihinen M. Performance of Protein Disorder Prediction Programs on Amino Acid Substitutions. Hum Mutat 2014; 35:794-804. [DOI: 10.1002/humu.22564] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Received: 12/20/2013] [Accepted: 04/04/2014] [Indexed: 01/04/2023]
Affiliation(s)
- Heidi Ali
  - Institute of Biomedical Technology; FI-33014 University of Tampere; Tampere Finland
  - BioMediTech; Tampere Finland
- Siddhaling Urolagin
  - Department of Experimental Medical Science; Lund University; SE-22184 Lund Sweden
- Ömer Gurarslan
  - Institute of Biomedical Technology; FI-33014 University of Tampere; Tampere Finland
  - BioMediTech; Tampere Finland
- Mauno Vihinen
  - Institute of Biomedical Technology; FI-33014 University of Tampere; Tampere Finland
  - BioMediTech; Tampere Finland
  - Department of Experimental Medical Science; Lund University; SE-22184 Lund Sweden
  - Tampere University Hospital; Tampere Finland
|
32
|
Becker J, Maes F, Wehenkel L. On the encoding of proteins for disordered regions prediction. PLoS One 2013; 8:e82252. [PMID: 24358161 PMCID: PMC3864923 DOI: 10.1371/journal.pone.0082252] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 07/11/2013] [Accepted: 10/21/2013] [Indexed: 12/02/2022] Open
Abstract
Disordered regions, i.e., regions of proteins that do not adopt a stable three-dimensional structure, have been shown to play various critical roles in many biological processes. Predicting and understanding their formation is therefore a key sub-problem of protein structure and function inference. A wide range of machine learning approaches have been developed to automatically predict disordered regions of proteins. One key factor in the success of these methods is the way in which protein information is encoded into features. Recently, we proposed a systematic methodology to study the relevance of various feature encodings in the context of disulfide connectivity pattern prediction. In the present paper, we adapt this methodology to the problem of predicting disordered regions and assess it on proteins from the 10th CASP competition, as well as on a very large subset of proteins extracted from PDB. Our results, obtained with ensembles of extremely randomized trees, highlight a novel feature function encoding the proximity of residues according to their accessibility to the solvent, which plays the second most important role in the prediction of disordered regions, just after evolutionary information. Furthermore, even though our approach treats each residue independently, our results are very competitive in terms of accuracy with respect to the state of the art. A web application is available at http://m24.giga.ulg.ac.be:81/x3Disorder.
Affiliation(s)
- Julien Becker
  - Bioinformatics and Modeling, GIGA-Research, University of Liege, Liege, Belgium
- Francis Maes
  - Department of Electrical Engineering and Computer Science, Montefiore Institute, University of Liege, Liege, Belgium
  - Declaratieve Talen en Artificiele Intelligentie, Departement Computerwetenschappen, University of Leuven, Leuven, Belgium
- Louis Wehenkel
  - Department of Electrical Engineering and Computer Science, Montefiore Institute, University of Liege, Liege, Belgium
|
33
|
Phenotypic profiling of Mycobacterium tuberculosis EspA point mutants reveals that blockage of ESAT-6 and CFP-10 secretion in vitro does not always correlate with attenuation of virulence. J Bacteriol 2013; 195:5421-30. [PMID: 24078612 DOI: 10.1128/jb.00967-13] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.2] [Indexed: 02/05/2023] Open
Abstract
The EspA protein of Mycobacterium tuberculosis is essential for the type VII ESX-1 protein secretion apparatus, which delivers the principal virulence factors ESAT-6 and CFP-10. In this study, site-directed mutagenesis of EspA was performed to elucidate its influence on the ESX-1 system. Replacing Trp(55) (W55) or Gly(57) (G57) residues in the putative W-X-G motif of EspA with arginines impaired ESAT-6 and CFP-10 secretion in vitro and attenuated M. tuberculosis. Replacing the Phe(50) (F50) and Lys(62) (K62) residues, which flank the W-X-G motif, with arginine and alanine, respectively, destabilized EspA, abolished ESAT-6 and CFP-10 secretion in vitro, and attenuated M. tuberculosis. Likewise, replacing the Phe(5) (F5) and Lys(41) (K41) residues with arginine and alanine, respectively, also destabilized EspA and blocked ESAT-6 and CFP-10 secretion in vitro. However, these two particular mutations did not attenuate M. tuberculosis in cellular models of infection or during acute infection in mice. We have thus identified amino acid residues in EspA that are important for facilitating ESAT-6 and CFP-10 secretion and virulence. However, our data also indicate for the first time that blockage of M. tuberculosis ESAT-6 and CFP-10 secretion in vitro and attenuation are mutually exclusive.
|
34
|
Vergé V, Lozano JC, Schatt P, Peaucellier G. SGEBP, a giant protein from starfish oocytes able to interact with ERK. Mol Reprod Dev 2013; 80:816-25. [PMID: 23794267 DOI: 10.1002/mrd.22210] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2013] [Accepted: 06/14/2013] [Indexed: 11/12/2022]
Abstract
The mitogen-activated protein kinase (MAPK) pathway is a key regulator of animal meiotic divisions. It involves cascades of kinases whose specificity has been shown to depend on binding proteins acting as scaffolds. We searched for proteins interacting with starfish extracellular signal-regulated kinase (ERK) using the yeast two-hybrid system. An interacting clone was found to encode the 5' region of a giant 16.7-kb transcript encoded by an intronless gene. The corresponding 630-kDa protein could not be detected by Western blot, but the meiotic spindle was labelled by immunolocalization with an antibody against the ERK-binding domain. A related gene was found in the genome of another starfish species, and similarities were also found to a 42.9-kb open reading frame in the sea urchin genome. Yet no conserved protein-binding domain was detected when the amino acid sequence(s) were compared against all known motifs. Structure prediction software indicated that the encoded proteins are probably disordered, while a query of the disordered protein database indicated some similarity with vertebrate microtubule-associated protein 2 (MAP2). This suggests that SGEBP may function as a space-filling polymer, with a role in both cytoskeleton organization and ERK targeting.
Affiliation(s)
- Valérie Vergé
  - UPMC Univ Paris 06, Laboratoire Arago, Avenue Fontaulé, BP44F-66650, Banyuls/mer, France
|
35
|
A novel method of predicting protein disordered regions based on sequence features. BioMed Research International 2013; 2013:414327. [PMID: 23710446 PMCID: PMC3654632 DOI: 10.1155/2013/414327] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Received: 03/03/2013] [Accepted: 03/26/2013] [Indexed: 01/27/2023]
Abstract
With a large number of disordered proteins and their important functions discovered, effective methods for computationally predicting disordered regions in proteins are highly desirable. In this study, we developed a new method to predict disordered regions in proteins based on Random Forest (RF), Maximum Relevancy Minimum Redundancy (mRMR), and Incremental Feature Selection (IFS). The mRMR criterion was used to rank the importance of all candidate features. The top 128 features were then selected from the ranked feature list to build the optimal model, including 92 Position Specific Scoring Matrix (PSSM) conservation score features and 36 secondary structure features. As a result, a Matthews correlation coefficient (MCC) of 0.3895 was achieved on the training set by 10-fold cross-validation. Starting from the predictions made by this method for each query sequence, we used a scanning and modification strategy to improve the performance. The accuracy (ACC) and MCC were increased by 4% and almost 0.2%, respectively, compared with three other popular predictors: DISOPRED, DISOclust, and OnD-CRF. The selected features may shed some light on the formation mechanism of disordered structures and provide guidelines for experimental validation.
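The incremental feature selection step described in this abstract can be illustrated with a minimal Python sketch. It assumes the features have already been ranked (for example by mRMR) and that a scoring callback is available; in the study that score would be the cross-validated MCC of a Random Forest, whereas the `evaluate` function, the feature names, and the toy scores below are hypothetical stand-ins chosen only for illustration.

```python
def incremental_feature_selection(ranked_features, evaluate):
    """Score nested prefixes of a ranked feature list and keep the best one.

    ranked_features: feature names ordered from most to least important.
    evaluate: callable taking a list of features and returning a score
              (higher is better), e.g. a cross-validated MCC.
    """
    best_score, best_k = float("-inf"), 0
    history = []
    for k in range(1, len(ranked_features) + 1):
        score = evaluate(ranked_features[:k])
        history.append((k, score))
        # Strict comparison: on ties the smaller feature set is kept.
        if score > best_score:
            best_score, best_k = score, k
    return ranked_features[:best_k], best_score, history


# Hypothetical contribution of each feature to the score (not real data).
TOY_GAIN = {"pssm_1": 0.20, "pssm_2": 0.10, "ss_1": 0.05,
            "noise_1": -0.03, "noise_2": -0.02}


def toy_evaluate(features):
    """Stand-in for a cross-validated score: sum of per-feature gains."""
    return sum(TOY_GAIN[f] for f in features)


ranked = ["pssm_1", "pssm_2", "ss_1", "noise_1", "noise_2"]
selected, score, history = incremental_feature_selection(ranked, toy_evaluate)
print(selected, round(score, 2))
```

Evaluating only nested prefixes keeps the search linear in the number of features, at the cost of depending entirely on the quality of the initial ranking.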
|
36
|
|
37
|
Mizianty MJ, Peng Z, Kurgan L. MFDp2: Accurate predictor of disorder in proteins by fusion of disorder probabilities, content and profiles. Intrinsically Disordered Proteins 2013; 1:e24428. [PMID: 28516009 PMCID: PMC5424793 DOI: 10.4161/idp.24428] [Citation(s) in RCA: 79] [Impact Index Per Article: 6.6] [Received: 03/18/2013] [Accepted: 03/23/2013] [Indexed: 11/28/2022]
Abstract
Intrinsically disordered proteins (IDPs) are either entirely disordered or contain disordered regions in their native state. IDPs were found to be abundant in complex organisms and implicated in numerous cellular processes. Experimental annotation of disorder lags behind the rapidly growing sizes of the protein databases, and thus computational methods are used to close this gap and to investigate the disorder. MFDp2 is a novel content-rich and user-friendly web server for sequence-based prediction of protein disorder that builds upon our residue-level disorder predictor MFDp and chain-level disorder content predictor DisCon. It applies novel post-processing filters and uses sequence alignment to improve predictive quality. Using a new benchmark data set, which has reduced sequence identity to corresponding training data sets, MFDp2 is shown to provide competitive predictive quality when compared with MFDp and a comprehensive set of 13 other state-of-the-art predictors, including publicly available versions of the top predictors from CASP9. Our server obtains the highest Matthews Correlation Coefficient (MCC) and the second best Area Under the receiver operating characteristic Curve (AUC). In addition to the disorder predictions, our server also outputs well-described sequence-derived information that allows profiling the predicted disorder. We conveniently visualize sequence conservation, predicted secondary structure, relative solvent accessibility and alignments to chains with annotated disorder. We allow predictions for multiple proteins at the same time, and each prediction can be downloaded as a text-based (parsable) file. The web server, which includes help pages and a tutorial, is freely available at biomine.ece.ualberta.ca/MFDp2/.
Affiliation(s)
- Marcin J Mizianty
  - Department of Electrical and Computer Engineering; University of Alberta; Edmonton, AB Canada
- Zhenling Peng
  - Department of Electrical and Computer Engineering; University of Alberta; Edmonton, AB Canada
- Lukasz Kurgan
  - Department of Electrical and Computer Engineering; University of Alberta; Edmonton, AB Canada
|
38
|
Fan X, Kurgan L. Accurate prediction of disorder in protein chains with a comprehensive and empirically designed consensus. J Biomol Struct Dyn 2013; 32:448-64. [DOI: 10.1080/07391102.2013.775969] [Citation(s) in RCA: 113] [Impact Index Per Article: 9.4] [Indexed: 10/27/2022]
|
39
|
Abstract
Kinesin molecular motors perform a myriad of intracellular transport functions. While their mechanochemical mechanisms are well understood and well-conserved throughout the superfamily, the cargo-binding and regulatory mechanisms governing the activity of kinesins are highly diverse and in general, are incompletely characterized. Here we present evidence from bioinformatic predictions indicating that most kinesin superfamily members contain significant regions of intrinsically disordered (ID) residues. ID regions can bind to multiple partners with high specificity, and are highly labile to post-translational modification and degradation signals. In kinesins, the predicted ID regions are primarily found in areas outside the motor domains, where primary sequences diverge by family, suggesting that ID may be a critical structural element for determining the functional specificity of individual kinesins. To support this idea, we present a systematic analysis of the kinesin superfamily, family by family, for predicted regions of ID. We combine this analysis with a comprehensive review of kinesin binding partners and post-translational modifications. We find two key trends across the entire kinesin superfamily. First, ID residues tend to be in the tail regions of kinesins, opposite the superfamily-conserved motor domains. Second, predicted ID regions correlate to regions that are known to bind to cargoes and/or undergo post-translational modifications. We therefore propose that ID is a structural element utilized by the kinesin superfamily in order to impart functional specificity to individual kinesins.
|
40
|
Kallberg Y, Segerstolpe Å, Lackmann F, Persson B, Wieslander L. Evolutionary conservation of the ribosomal biogenesis factor Rbm19/Mrd1: implications for function. PLoS One 2012; 7:e43786. [PMID: 22984444 PMCID: PMC3440411 DOI: 10.1371/journal.pone.0043786] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 05/16/2012] [Accepted: 07/24/2012] [Indexed: 12/23/2022] Open
Abstract
Ribosome biogenesis in eukaryotes requires coordinated folding and assembly of a pre-rRNA into sequential pre-rRNA-protein complexes in which chemical modifications and RNA cleavages occur. These processes require many small nucleolar RNAs (snoRNAs) and proteins. Rbm19/Mrd1 is one such protein that is built from multiple RNA-binding domains (RBDs). We find that Rbm19/Mrd1 with five RBDs is present in all branches of the eukaryotic phylogenetic tree, except in animals and Choanoflagellates, which instead have a version with six RBDs, and in Microsporidia, which have a minimal Rbm19/Mrd1 protein with four RBDs. Rbm19/Mrd1 therefore evolved as a multi-RBD protein very early in eukaryotes. The linkers between the RBDs have conserved properties; they are disordered, except for linker 3, and position the RBDs at conserved relative distances from each other. All but one of the RBDs have conserved properties for RNA-binding, and each RBD has a specific consensus sequence and a conserved position in the protein, suggesting a functionally important modular design. The patterns of evolutionary conservation provide information for experimental analyses of the function of Rbm19/Mrd1. In vivo mutational analysis confirmed that a highly conserved loop 5-β4-strand in RBD6 is essential for function.
Affiliation(s)
- Yvonne Kallberg
  - Bioinformatics Infrastructure for Life Sciences, Science for Life Laboratory, Centre for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
- Åsa Segerstolpe
  - Department of Molecular Biology and Functional Genomics, Stockholm University, Stockholm, Sweden
- Fredrik Lackmann
  - Department of Molecular Biology and Functional Genomics, Stockholm University, Stockholm, Sweden
- Bengt Persson
  - Bioinformatics Infrastructure for Life Sciences and Swedish eScience Research Centre, IFM Bioinformatics, Linköping University, Linköping, Sweden
  - Science for Life Laboratory, Department of Cell and Molecular Biology, Karolinska Institutet, Stockholm, Sweden
- Lars Wieslander
  - Department of Molecular Biology and Functional Genomics, Stockholm University, Stockholm, Sweden
|
41
|
Seeger MA, Zhang Y, Rice SE. Kinesin tail domains are intrinsically disordered. Proteins 2012; 80:2437-46. [PMID: 22674872 DOI: 10.1002/prot.24128] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Received: 04/05/2012] [Revised: 05/22/2012] [Accepted: 05/25/2012] [Indexed: 12/11/2022]
Abstract
Kinesin motor proteins transport a wide variety of molecular cargoes in a spatially and temporally regulated manner. Kinesin motor domains, which hydrolyze ATP to produce a directed mechanical force along a microtubule, are well conserved throughout the entire superfamily. Outside of the motor domains, kinesin sequences diverge along with their transport functions. The nonmotor regions, particularly the tails, respond to a wide variety of structural and molecular cues that enable kinesins to carry specific cargoes in response to particular cellular signals. Here, we demonstrate that intrinsic disorder is a common structural feature of kinesins. A bioinformatics survey of the full-length sequences of all 43 human kinesins predicts that significant regions of intrinsically disordered residues are present in all kinesins. These regions are concentrated in the nonmotor domains, particularly in the tails and near sites for ligand binding or post-translational modifications. In order to experimentally verify these predictions, we expressed and purified the tail domains of kinesins representing three different families (Kif5B, Kif10, and KifC3). Circular dichroism and NMR spectroscopy experiments demonstrate that the isolated tails are disordered in vitro, yet they retain their functional microtubule-binding activity. On the basis of these results, we propose that intrinsic disorder is a common structural feature that confers functional specificity to kinesins.
Affiliation(s)
- Mark A Seeger
  - Department of Cell and Molecular Biology, Northwestern University Feinberg School of Medicine, Chicago, IL 60611, USA
|
42
|
Andresen C, Helander S, Lemak A, Farès C, Csizmok V, Carlsson J, Penn LZ, Forman-Kay JD, Arrowsmith CH, Lundström P, Sunnerhagen M. Transient structure and dynamics in the disordered c-Myc transactivation domain affect Bin1 binding. Nucleic Acids Res 2012; 40:6353-66. [PMID: 22457068 PMCID: PMC3401448 DOI: 10.1093/nar/gks263] [Citation(s) in RCA: 86] [Impact Index Per Article: 6.6] [Indexed: 12/31/2022] Open
Abstract
The crucial role of Myc as an oncoprotein and as a key regulator of cell growth makes it essential to understand the molecular basis of Myc function. The N-terminal region of c-Myc coordinates a wealth of protein interactions involved in transformation, differentiation and apoptosis. We have characterized in detail the intrinsically disordered properties of Myc-1–88, where hierarchical phosphorylation of S62 and T58 regulates activation and destruction of the Myc protein. By nuclear magnetic resonance (NMR) chemical shift analysis, relaxation measurements and NOE analysis, we show that although Myc occupies a very heterogeneous conformational space, it contains transiently structured regions in residues 22–33 and in the Myc homology box I (MBI; residues 45–65); both these regions are conserved in other members of the Myc family. Binding of Bin1 to Myc-1–88, as assayed by NMR and surface plasmon resonance (SPR), revealed primary binding to the S62 region in a dynamically disordered and multivalent complex, accompanied by population shifts leading to altered intramolecular conformational dynamics. These findings extend to Myc, a key transcriptional regulator of major medical importance, the increasingly recognized concept of intrinsically disordered regions mediating transient interactions, and have important implications for further understanding its multifaceted role in gene regulation.
Affiliation(s)
- Cecilia Andresen
  - Division of Molecular Biotechnology, Department of Physics, Chemistry and Biology, Linköping University, SE-58183 Linköping, Sweden
|
43
|
Ghalwash MF, Dunker AK, Obradović Z. Uncertainty analysis in protein disorder prediction. Molecular BioSystems 2011; 8:381-91. [PMID: 22101336 DOI: 10.1039/c1mb05373f] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Indexed: 11/21/2022]
Abstract
A grand challenge in the proteomics and structural genomics era is the prediction of protein structure, including identification of those proteins that are partially or wholly unstructured. A number of predictors for identification of intrinsically disordered proteins (IDPs) have been developed over the last decade, but none can be taken as fully reliable on its own. Using a single model for prediction is typically inadequate because prediction based on only the most accurate model ignores model uncertainty. In this paper, we present an empirical method to specify and measure uncertainty associated with disorder predictions. In particular, we analyze the uncertainty in the reference model itself and the uncertainty in data. This is achieved by training a set of models and developing several meta-predictors on top of them. The best meta-predictor achieved comparable or better results than any single model, suggesting that incorporating different aspects of protein disorder prediction is important for the disorder prediction task. In addition, the best meta-predictor had more balanced sensitivity and specificity than any individual model. We also assessed the effects of changes in disorder prediction as a function of changes in the protein sequence. For collections of homologous sequences, we found that mutations caused many residues predicted as disordered to flip to predicted order, while the reverse was observed much less frequently. These results suggest that disorder tendencies are more sensitive to allowed mutations than structure tendencies, and that the conservation of disorder is indeed less stable than conservation of structure. Availability: Five meta-predictors and four single models developed for this study will be freely accessible to the public for non-commercial use.
Affiliation(s)
- Mohamed F Ghalwash
  - Center for Data Analytics and Biomedical Informatics, Computer and Information Sciences Department, College of Science and Technology, Temple University, Philadelphia, PA 19122, USA.
|
44
|
Zhang P, Obradovic Z. Unsupervised Integration of Multiple Protein Disorder Predictors: The Method and Evaluation on CASP7, CASP8 and CASP9 Data. Proteome Sci 2011; 9 Suppl 1:S12. [PMID: 22166115 PMCID: PMC3289073 DOI: 10.1186/1477-5956-9-s1-s12] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 11/30/2022] Open
Abstract
Background: Studies of intrinsically disordered proteins, which lack a stable tertiary structure but still have important biological functions, rely critically on computational methods that predict this property from sequence information. Although a number of fairly successful models for prediction of protein disorder have been developed over the last decade, the quality of their predictions is limited by the number of available cases of confirmed disorder. Results: To more reliably estimate protein disorder from protein sequences, an iterative algorithm is proposed that integrates predictions of multiple disorder models without relying on any protein sequences with confirmed disorder annotation. The iterative method alternates between the maximum a posteriori (MAP) estimation of the disorder prediction and the maximum-likelihood (ML) estimation of the quality of the individual disorder predictors. Experiments on data used at CASP7, CASP8, and CASP9 have shown the effectiveness of the proposed algorithm. Conclusions: The proposed algorithm can potentially be used to predict protein disorder and provide helpful suggestions on choosing suitable disorder predictors for unknown protein sequences.
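The idea of alternating between a consensus estimate and an estimate of each predictor's quality, without any labelled residues, can be illustrated with a deliberately simplified Python sketch. This is not the published algorithm: hard weighted voting is swapped in for the MAP step, agreement with the current consensus stands in for the ML quality estimate, and the function name, the `eps` clipping value, and the toy votes are all assumptions made for illustration.

```python
from math import log


def integrate_predictors(votes, n_iter=20, eps=0.01):
    """Combine binary disorder calls from several predictors without labels.

    votes[k][i] is the 0/1 call of predictor k for residue i.
    Returns (consensus labels, estimated accuracy of each predictor).
    """
    n_pred, n_res = len(votes), len(votes[0])
    weights = [1.0] * n_pred          # start by trusting everyone equally
    consensus, accuracy = None, [0.5] * n_pred
    for _ in range(n_iter):
        # Step 1: consensus by weighted vote (+w for disorder, -w for order).
        new_consensus = []
        for i in range(n_res):
            score = sum(w if votes[k][i] == 1 else -w
                        for k, w in enumerate(weights))
            new_consensus.append(1 if score > 0 else 0)
        if new_consensus == consensus:
            break                      # labels stopped changing
        consensus = new_consensus
        # Step 2: predictor quality = agreement with the current consensus.
        accuracy = [sum(v == c for v, c in zip(votes[k], consensus)) / n_res
                    for k in range(n_pred)]
        # Log-odds weights, with accuracies clipped away from 0 and 1.
        clipped = [min(max(a, eps), 1.0 - eps) for a in accuracy]
        weights = [log(a / (1.0 - a)) for a in clipped]
    return consensus, accuracy


# Toy calls from three hypothetical predictors on six residues.
toy_votes = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 1],
]
labels, quality = integrate_predictors(toy_votes)
print(labels, [round(q, 2) for q in quality])
```

In this toy run the third predictor ends up with an estimated accuracy of 0.5 and therefore a weight of zero, which is the behaviour one would hope for from an uninformative predictor.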
Affiliation(s)
- Ping Zhang
  - Center for Data Analytics and Biomedical Informatics, Temple University, Philadelphia, PA 19122, USA.
|
45
|
Monastyrskyy B, Fidelis K, Moult J, Tramontano A, Kryshtafovych A. Evaluation of disorder predictions in CASP9. Proteins 2011; 79 Suppl 10:107-18. [PMID: 21928402 PMCID: PMC3212657 DOI: 10.1002/prot.23161] [Citation(s) in RCA: 105] [Impact Index Per Article: 7.5] [Received: 04/22/2011] [Revised: 07/11/2011] [Accepted: 07/15/2011] [Indexed: 11/10/2022]
Abstract
Lack of stable three-dimensional structure, or intrinsic disorder, is a common phenomenon in proteins. Unstructured regions have been shown to be essential for the function of many proteins, and identifying such regions is therefore an important problem. CASP has been assessing the state of the art in predicting disordered regions from amino acid sequence since 2002. Here, we present the results of the evaluation of the disorder predictions submitted to CASP9. The assessment is based on the evaluation measures and procedures used in previous CASPs. The balanced accuracy and the Matthews correlation coefficient were chosen as basic measures for evaluating the correctness of binary classifications. The area under the receiver operating characteristic curve was the measure of choice for evaluating probability-based predictions of disorder. The CASP9 methods are shown to perform slightly better than the CASP7 methods but not better than the methods in CASP8. It was also shown that the capability of most CASP9 methods to predict disorder decreases with increasing minimum disorder segment length.
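The three measures named in this abstract can be illustrated with a minimal Python sketch using their standard textbook definitions. The function names and the toy residue labels are assumptions made for illustration and are not taken from the CASP9 assessment code; the sketch also assumes both classes are present in the data.

```python
from math import sqrt


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels, 1 = disordered."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))


def matthews_cc(y_true, y_pred):
    """Matthews correlation coefficient; 0.0 if a margin is empty."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def roc_auc(y_true, scores):
    """Probability that a disordered residue outscores an ordered one.

    Ties count as half; this pairwise form is O(n^2) and meant for
    small illustrative inputs only.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data: 3 disordered and 5 ordered residues with predicted propensities.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1, 0.4, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(balanced_accuracy(y_true, y_pred))
print(matthews_cc(y_true, y_pred))
print(roc_auc(y_true, scores))
```

Balanced accuracy and MCC are computed from thresholded calls, while the AUC uses the raw propensities, which matches the split between binary and probability-based evaluation described above.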
Affiliation(s)
- Bohdan Monastyrskyy
  - Genome Center, University of California, Davis, 451 Health Sciences Dr., Davis, CA 95616, USA
- Krzysztof Fidelis
  - Genome Center, University of California, Davis, 451 Health Sciences Dr., Davis, CA 95616, USA
- John Moult
  - Center for Advanced Research in Biotechnology, University of Maryland Biotechnology Institute, 9600 Gudelsky Drive, Rockville, MD 20850, USA
- Anna Tramontano
  - Department of Physics, Sapienza University of Rome, P.le Aldo Moro 5, 00185 Rome, Italy
- Andriy Kryshtafovych
  - Genome Center, University of California, Davis, 451 Health Sciences Dr., Davis, CA 95616, USA
|
46
|
Mizianty MJ, Zhang T, Xue B, Zhou Y, Dunker AK, Uversky VN, Kurgan L. In-silico prediction of disorder content using hybrid sequence representation. BMC Bioinformatics 2011; 12:245. [PMID: 21682902 PMCID: PMC3212983 DOI: 10.1186/1471-2105-12-245] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.6] [Received: 10/22/2010] [Accepted: 06/17/2011] [Indexed: 11/25/2022] Open
Abstract
Background: Intrinsically disordered proteins play important roles in various cellular activities, and their prevalence has been implicated in a number of human diseases. Knowledge of the content of intrinsic disorder in proteins is useful for a variety of studies, including estimation of the abundance of disorder in protein families, classes, and complete proteomes, and for the analysis of disorder-related protein functions. These investigations currently use the disorder content derived from per-residue disorder predictions. We show that these predictions may over- or under-predict the overall amount of disorder, which motivates development of novel tools for direct and accurate sequence-based prediction of the disorder content. Results: We hypothesize that sequence-level aggregation of input information may provide more accurate content prediction when compared with the content extracted from the local window-based residue-level disorder predictors. We propose a novel predictor, DisCon, that takes advantage of a small set of 29 custom-designed descriptors that aggregate and hybridize information concerning sequence, evolutionary profiles, and predicted secondary structure, solvent accessibility, flexibility, and annotation of globular domains. Using these descriptors and a ridge regression model, DisCon predicts the content with a low mean squared error of 0.05 and a high Pearson correlation of 0.68. This is a statistically significant improvement over the content computed from outputs of ten modern disorder predictors on a test dataset with proteins that share low sequence identity with the training sequences. The proposed predictive model is analyzed to discuss factors related to the prediction of the disorder content. Conclusions: DisCon is a high-quality alternative for high-throughput annotation of the disorder content. We also empirically demonstrate that DisCon's predictions can be used to improve binary annotations of the disordered residues from the real-value disorder propensities generated by current residue-level disorder predictors. The web server that implements DisCon is available at http://biomine.ece.ualberta.ca/DisCon/.
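The baseline that this abstract argues against, deriving disorder content from per-residue predictions, and the two reported quality measures can be illustrated with a minimal Python sketch. The 0.5 threshold, the function names, and the toy propensities are assumptions made for illustration; this is not DisCon's ridge regression model or its descriptors.

```python
from math import sqrt


def disorder_content(propensities, threshold=0.5):
    """Fraction of residues whose predicted propensity reaches the threshold."""
    return sum(1 for p in propensities if p >= threshold) / len(propensities)


def mean_squared_error(predicted, observed):
    """Average squared difference between predicted and observed content."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)


def pearson_correlation(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Toy per-residue propensities for three hypothetical proteins.
per_residue = [
    [0.9, 0.7, 0.2, 0.1],
    [0.6, 0.6, 0.6, 0.4],
    [0.1, 0.2, 0.3, 0.4],
]
observed_content = [0.5, 0.5, 0.25]   # hypothetical annotated content

predicted_content = [disorder_content(p) for p in per_residue]
print(predicted_content)
print(mean_squared_error(predicted_content, observed_content))
print(pearson_correlation(predicted_content, observed_content))
```

Because the content is obtained by thresholding, small shifts in propensities near the cutoff can change it substantially, which is one way such a baseline can over- or under-predict the overall amount of disorder.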
Affiliation(s)
- Marcin J Mizianty
  - Department of Electrical and Computer Engineering, University of Alberta, Edmonton, Alberta T6G 2V4, Canada
|
47
|
Mizianty MJ, Stach W, Chen K, Kedarisetti KD, Disfani FM, Kurgan L. Improved sequence-based prediction of disordered regions with multilayer fusion of multiple information sources. Bioinformatics 2010; 26:i489-96. [PMID: 20823312 PMCID: PMC2935446 DOI: 10.1093/bioinformatics/btq373] [Citation(s) in RCA: 132] [Impact Index Per Article: 8.8] [Indexed: 01/08/2023]
Abstract
Motivation: Intrinsically disordered proteins play a crucial role in numerous regulatory processes. Their abundance and ubiquity, combined with the relatively small number of available annotations, motivate research toward the development of computational models that predict disordered regions from protein sequences. Although the prediction quality of these methods continues to rise, novel and improved predictors are urgently needed. Results: We propose a novel method, named MFDp (Multilayered Fusion-based Disorder predictor), that aims to improve over the current disorder predictors. MFDp is an ensemble of three support vector machines specialized for the prediction of short, long and generic disordered regions. It combines three complementary disorder predictors, sequence, sequence profiles, predicted secondary structure, solvent accessibility, backbone dihedral torsion angles, residue flexibility and B-factors. Our method utilizes a custom-designed set of features that are based on raw predictions and aggregated raw values and recognizes various types of disorder. MFDp is compared at the residue level on two datasets against eight recent disorder predictors and top-performing methods from the most recent CASP8 experiment. In spite of using training chains with ≤25% similarity to the test sequences, our method consistently and significantly outperforms the other methods based on the MCC index. MFDp outperforms modern disorder predictors for the binary disorder assignment and provides competitive real-valued predictions. MFDp's outputs are also shown to outperform the other methods in the identification of proteins with long disordered regions. Availability: http://biomine.ece.ualberta.ca/MFDp.html Supplementary information: Supplementary data are available at Bioinformatics online. Contact: lkurgan@ece.ualberta.ca
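The MCC index used for the residue-level comparison is, assuming the standard definition, the Matthews correlation coefficient computed from the binary confusion matrix. A minimal sketch follows. The function names and the toy labels (1 = disordered, 0 = ordered) are hypothetical and are not taken from MFDp.

```python
import math


def confusion_counts(predicted, observed):
    """Count true/false positives and negatives for binary residue labels."""
    tp = fp = tn = fn = 0
    for p, o in zip(predicted, observed):
        if p == 1 and o == 1:
            tp += 1
        elif p == 1 and o == 0:
            fp += 1
        elif p == 0 and o == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def matthews_corrcoef(tp, fp, tn, fn):
    """Matthews correlation coefficient; returns 0.0 when it is undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Toy labels for an eight-residue chain (assumed data, for illustration only).
observed = [1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 0, 0, 0, 0, 1]

counts = confusion_counts(predicted, observed)
score = matthews_corrcoef(*counts)
```

Because disordered residues are usually the minority class, MCC is often preferred to plain accuracy for this kind of comparison: a predictor that labels every residue as ordered scores 0 rather than a misleadingly high value.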
Affiliation(s)
- Marcin J Mizianty
- Department of Electrical and Computer Engineering, University of Alberta, Edmonton, Canada
|
48
|
Uversky VN, Dunker AK. Understanding protein non-folding. Biochimica et Biophysica Acta 2010; 1804:1231-64. [PMID: 20117254 PMCID: PMC2882790 DOI: 10.1016/j.bbapap.2010.01.017] [Citation(s) in RCA: 925] [Impact Index Per Article: 61.7] [Received: 11/21/2009] [Revised: 01/09/2010] [Accepted: 01/21/2010] [Indexed: 02/07/2023]
Abstract
This review describes the family of intrinsically disordered proteins, members of which fail to form rigid 3-D structures under physiological conditions, either along their entire lengths or only in localized regions. Instead, these intriguing proteins/regions exist as dynamic ensembles within which atom positions and backbone Ramachandran angles exhibit extreme temporal fluctuations without specific equilibrium values. Many of these intrinsically disordered proteins are known to carry out important biological functions which, in fact, depend on the absence of a specific 3-D structure. The existence of such proteins does not fit the prevailing structure-function paradigm, which states that a unique 3-D structure is a prerequisite to function. Thus, the protein structure-function paradigm has to be expanded to include intrinsically disordered proteins and alternative relationships among protein sequence, structure, and function. This shift in the paradigm represents a major breakthrough for biochemistry, biophysics and molecular biology, as it opens new levels of understanding with regard to the complex life of proteins. This review will try to answer the following questions: How were intrinsically disordered proteins discovered? Why don't these proteins fold? What is so special about intrinsic disorder? What are the functional advantages of disordered proteins/regions? What is the functional repertoire of these proteins? What are the relationships between intrinsically disordered proteins and human diseases?
Affiliation(s)
- Vladimir N Uversky
- Institute for Intrinsically Disordered Protein Research, Center for Computational Biology and Bioinformatics, Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, Indianapolis, IN 46202, USA.
|
49
|
Abstract
In recent years, it has been shown that a large number of proteins are either fully or partially disordered. Intrinsically disordered proteins are ubiquitous proteins that fulfill essential biological functions while lacking a stable 3D structure. Despite the high abundance of disorder, disordered regions are still poorly detected. The identification of disordered regions facilitates the functional annotation of proteins and is instrumental in delineating boundaries of protein domains amenable to crystallization. This chapter focuses on the methods currently employed for predicting disorder and identifying regions involved in induced folding.
Affiliation(s)
- Sonia Longhi
- Architecture et Fonction des Macromolécules Biologiques, UMR 6098 CNRS et Universités Aix-Marseille I et II, Marseille, France
|
50
|
Dosztanyi Z, Meszaros B, Simon I. Bioinformatical approaches to characterize intrinsically disordered/unstructured proteins. Brief Bioinform 2009; 11:225-43. [DOI: 10.1093/bib/bbp061] [Citation(s) in RCA: 93] [Impact Index Per Article: 5.8] [Indexed: 11/12/2022] Open
|