1
Xie J, Jin X, Wei H, Sun S, Liu Y. IDP-EDL: enhancing intrinsically disordered protein prediction by combining protein language model and ensemble deep learning. Brief Bioinform 2025; 26:bbaf182. [PMID: 40254833 PMCID: PMC12009716 DOI: 10.1093/bib/bbaf182] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2024] [Revised: 02/26/2025] [Accepted: 03/30/2025] [Indexed: 04/22/2025] Open
Abstract
Identification of intrinsically disordered regions (IDRs) in proteins is essential for understanding fundamental cellular processes. IDRs can be divided into long disordered regions (LDRs) and short disordered regions (SDRs) according to their lengths. Most previous computational methods ignored the differences between LDRs and SDRs and therefore failed to capture their distinct patterns. In this study, we propose IDP-EDL, an ensemble of three predictors. The component predictors were first built on a pretrained protein language model and then fine-tuned on task-specific data for short, long, and generic disordered regions. A meta predictor was then trained to integrate the three task-specific predictors into the final predictor. Experimental results show that task-specific supervised fine-tuning captures the different features of LDRs and SDRs, and that IDP-EDL achieves stable performance on datasets with different ratios of LDRs and SDRs. More importantly, IDP-EDL matches or even surpasses the performance of existing state-of-the-art predictors on independent test sets. IDP-EDL is available at https://github.com/joestarXjx/IDP-EDL.
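The length-based division of IDRs into SDRs and LDRs mentioned in this abstract can be illustrated with a short sketch. It assumes per-residue labels encoded as a string of '1' (disordered) and '0' (ordered) and a cutoff of 30 residues for long regions; the cutoff, the encoding, and the function names are illustrative assumptions and are not taken from IDP-EDL.

```python
from itertools import groupby

LONG_CUTOFF = 30  # assumed threshold in residues; not stated in the abstract


def disordered_regions(labels):
    """Return (start, end) index pairs (end exclusive) for each run of '1'."""
    regions = []
    pos = 0
    for value, run in groupby(labels):
        length = len(list(run))
        if value == "1":
            regions.append((pos, pos + length))
        pos += length
    return regions


def split_by_length(labels, cutoff=LONG_CUTOFF):
    """Split disordered regions into short (SDR) and long (LDR) groups."""
    sdrs, ldrs = [], []
    for start, end in disordered_regions(labels):
        # Regions at or above the cutoff are treated as long.
        (ldrs if end - start >= cutoff else sdrs).append((start, end))
    return sdrs, ldrs


if __name__ == "__main__":
    example = "0" * 5 + "1" * 4 + "0" * 10 + "1" * 35 + "0" * 3
    sdrs, ldrs = split_by_length(example)
    print("SDRs:", sdrs)
    print("LDRs:", ldrs)
```

A split like this is one way to build separate training subsets for short-region and long-region predictors; the cutoff would need to be chosen to match the dataset in use.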
Affiliation(s)
- Junxi Xie
- College of Big Data and Internet, Shenzhen Technology University, 3002 Lantian Road, Pingshan District, Shenzhen, Guangdong 518118, China
- Xiaopeng Jin
- College of Big Data and Internet, Shenzhen Technology University, 3002 Lantian Road, Pingshan District, Shenzhen, Guangdong 518118, China
- Hang Wei
- School of Computer Science and Technology, Xidian University, South Campus: 266 Xinglong Section of Xifeng Road, Xi’an, Shaanxi 710126, North Campus: No. 2 South Taibai Road, Xi’an, Shaanxi 710071, China
- SaiSai Sun
- School of Computer Science and Technology, Xidian University, South Campus: 266 Xinglong Section of Xifeng Road, Xi’an, Shaanxi 710126, North Campus: No. 2 South Taibai Road, Xi’an, Shaanxi 710071, China
- Yumeng Liu
- College of Big Data and Internet, Shenzhen Technology University, 3002 Lantian Road, Pingshan District, Shenzhen, Guangdong 518118, China
2
Gonzalez LN, Cabeza MS, Robello C, Guerrero SA, Iglesias AA, Arias DG. Biochemical characterization of GAF domain of free-R-methionine sulfoxide reductase from Trypanosoma cruzi. Biochimie 2023; 213:190-204. [PMID: 37423556 DOI: 10.1016/j.biochi.2023.07.005] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/26/2023] [Revised: 06/30/2023] [Accepted: 07/07/2023] [Indexed: 07/11/2023]
Abstract
Trypanosoma cruzi, the causal agent of Chagas disease, is a unicellular parasite that infects a wide variety of mammalian hosts. The parasite is auxotrophic for L-Met; consequently, it must acquire this amino acid from the extracellular environment of its host, either mammalian or invertebrate. Methionine (Met) oxidation produces a racemic mixture (R and S forms) of methionine sulfoxide (MetSO). Reduction of L-MetSO (free or protein-bound) to L-Met is catalyzed by methionine sulfoxide reductases (MSRs). Bioinformatics analyses identified the coding sequence for a free-R-MSR (fRMSR) enzyme in the genome of T. cruzi Dm28c. Structurally, this enzyme is a modular protein with a putative N-terminal GAF domain linked to a C-terminal TIP41 motif. We performed detailed biochemical and kinetic characterization of the GAF domain of fRMSR in combination with mutant versions of specific cysteine residues, namely, Cys12, Cys98, Cys108, and Cys132. The isolated recombinant GAF domain and full-length fRMSR exhibited specific catalytic activity for the reduction of free L-Met(R)SO (non-protein bound), using tryparedoxins as reducing partners. We demonstrated that this process involves two Cys residues, Cys98 and Cys132. Cys132 is the essential catalytic residue on which a sulfenic acid intermediate is formed. Cys98 is the resolving Cys, which forms a disulfide bond with Cys132 as a catalytic step. Overall, our results provide new insights into redox metabolism in T. cruzi, contributing to previous knowledge of L-Met metabolism in this parasite.
Affiliation(s)
- Lihue N Gonzalez
- Laboratorio de Enzimología Molecular - Instituto de Agrobiotecnología del Litoral (CONICET-UNL), Santa Fe, Argentina; Cátedra de Bioquímica Básica de Macromoléculas. Facultad de Bioquímica y Ciencias Biológicas - Universidad Nacional del Litoral, Santa Fe, Argentina
- Matías S Cabeza
- Laboratorio de Micología y Diagnóstico Molecular. Facultad de Bioquímica y Ciencias Biológicas, Universidad Nacional del Litoral, Santa Fe, Argentina; Cátedra de Parasitología y Micología. Facultad de Bioquímica y Ciencias Biológicas - Universidad Nacional del Litoral, Santa Fe, Argentina
- Carlos Robello
- Laboratorio de Interacciones Hospedero Patógeno/UBM, Institut Pasteur de Montevideo, Montevideo, Uruguay; Departamento de Bioquímica, Facultad de Medicina, Universidad de la República, Montevideo, Uruguay
- Sergio A Guerrero
- Laboratorio de Enzimología Molecular - Instituto de Agrobiotecnología del Litoral (CONICET-UNL), Santa Fe, Argentina; Cátedra de Parasitología y Micología. Facultad de Bioquímica y Ciencias Biológicas - Universidad Nacional del Litoral, Santa Fe, Argentina
- Alberto A Iglesias
- Laboratorio de Enzimología Molecular - Instituto de Agrobiotecnología del Litoral (CONICET-UNL), Santa Fe, Argentina; Cátedra de Bioquímica Básica de Macromoléculas. Facultad de Bioquímica y Ciencias Biológicas - Universidad Nacional del Litoral, Santa Fe, Argentina
- Diego G Arias
- Laboratorio de Enzimología Molecular - Instituto de Agrobiotecnología del Litoral (CONICET-UNL), Santa Fe, Argentina; Cátedra de Bioquímica Básica de Macromoléculas. Facultad de Bioquímica y Ciencias Biológicas - Universidad Nacional del Litoral, Santa Fe, Argentina.
3
Han B, Ren C, Wang W, Li J, Gong X. Computational Prediction of Protein Intrinsically Disordered Region Related Interactions and Functions. Genes (Basel) 2023; 14:432. [PMID: 36833360 PMCID: PMC9956190 DOI: 10.3390/genes14020432] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/31/2022] [Revised: 02/02/2023] [Accepted: 02/05/2023] [Indexed: 02/11/2023] Open
Abstract
Intrinsically Disordered Proteins (IDPs) and Regions (IDRs) are widespread. Although they lack well-defined structures, they participate in many important biological processes. They are also widely associated with human diseases and have become potential targets in drug discovery. However, there is a large gap between the number of experimentally annotated IDPs/IDRs and their actual number. In recent decades, many computational methods for IDPs/IDRs have been developed. Depending on the task, they predict IDPs/IDRs themselves, their binding modes, their binding sites, or their molecular functions. In view of the correlation between these predictors, we have reviewed these prediction methods uniformly for the first time, summarized their computational methods and predictive performance, and discussed some problems and perspectives.
Affiliation(s)
- Bingqing Han
- Mathematical Intelligence Application Lab, Institute for Mathematical Sciences, Renmin University of China, Beijing 100872, China
- Chongjiao Ren
- Mathematical Intelligence Application Lab, Institute for Mathematical Sciences, Renmin University of China, Beijing 100872, China
- Wenda Wang
- Mathematical Intelligence Application Lab, Institute for Mathematical Sciences, Renmin University of China, Beijing 100872, China
- Jiashan Li
- Mathematical Intelligence Application Lab, Institute for Mathematical Sciences, Renmin University of China, Beijing 100872, China
- Xinqi Gong
- Mathematical Intelligence Application Lab, Institute for Mathematical Sciences, Renmin University of China, Beijing 100872, China
- Beijing Academy of Intelligence, Beijing 100083, China
4
Fukuchi S, Noguchi T, Anbo H, Homma K. Exon Elongation Added Intrinsically Disordered Regions to the Encoded Proteins and Facilitated the Emergence of the Last Eukaryotic Common Ancestor. Mol Biol Evol 2022; 40:6931801. [PMID: 36529689 PMCID: PMC9825244 DOI: 10.1093/molbev/msac272] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/22/2022] [Revised: 11/06/2022] [Accepted: 12/13/2022] [Indexed: 12/23/2022] Open
Abstract
Most prokaryotic proteins consist of a single structural domain (SD) with few intrinsically disordered regions (IDRs), which by themselves do not adopt stable structures, whereas the typical eukaryotic protein comprises multiple SDs and IDRs. How eukaryotic proteins evolved to differ from prokaryotic proteins has not been fully elucidated. Here, we found that the longer the internal exons are, the more frequently they encode IDRs in eight eukaryotes including vertebrates, invertebrates, a fungus, and plants. Based on this observation, we propose the "small bang" model from the proteomic viewpoint: the protoeukaryotic genes had no introns and mostly encoded one SD each, but a majority of them were subsequently divided into multiple exons (step 1). Many exons unconstrained by SDs elongated to encode IDRs (step 2). The elongated exons encoding IDRs frequently facilitated the acquisition of multiple SDs to make the last common ancestor of eukaryotes (step 3). One prediction of the model is that long internal exons are mostly unconstrained exons. Analytical results of the eight eukaryotes are consistent with this prediction. In support of the model, we identified cases of internal exons that elongated after the rat-mouse divergence and discovered that the expanded sections are mostly in unconstrained exons and preferentially encode IDRs. The model also predicts that SDs followed by long internal exons tend to have other SDs downstream. This prediction was also verified in all the eukaryotic species analyzed. Our model accounts for the dichotomy between prokaryotic and eukaryotic proteins and proposes a selective advantage conferred by IDRs.
Affiliation(s)
- Satoshi Fukuchi
- Program for Information Systems, Division of Informatics, Bioengineering and Bioscience, Maebashi Institute of Technology, Maebashi-shi, Japan
- Tamotsu Noguchi
- Pharmaceutical Education Research Center, Meiji Pharmaceutical University, Kiyose, Tokyo, Japan
- Hiroto Anbo
- Program for Information Systems, Division of Informatics, Bioengineering and Bioscience, Maebashi Institute of Technology, Maebashi-shi, Japan
5
Kurgan L. Resources for computational prediction of intrinsic disorder in proteins. Methods 2022; 204:132-141. [DOI: 10.1016/j.ymeth.2022.03.018] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 01/28/2022] [Revised: 03/25/2022] [Accepted: 03/29/2022] [Indexed: 12/26/2022] Open
6
Tang YJ, Pang YH, Liu B. DeepIDP-2L: protein intrinsically disordered region prediction by combining convolutional attention network and hierarchical attention network. Bioinformatics 2022; 38:1252-1260. [PMID: 34864847 DOI: 10.1093/bioinformatics/btab810] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 06/17/2021] [Revised: 11/02/2021] [Accepted: 11/26/2021] [Indexed: 01/05/2023] Open
Abstract
MOTIVATION: Intrinsically disordered regions (IDRs) are widely distributed in proteins. Accurate prediction of IDRs is critical for protein structure and function analysis. IDRs are divided into long disordered regions (LDRs) and short disordered regions (SDRs) according to their lengths. Previous studies have shown that LDRs and SDRs have different properties. However, the existing computational methods fail to extract different features for LDRs and SDRs separately. As a result, they achieve unstable performance on datasets with different ratios of LDRs and SDRs. RESULTS: In this study, a two-layer predictor called DeepIDP-2L is proposed. In the first layer, two kinds of attention-based models are used to extract different features for LDRs and SDRs, respectively. A hierarchical attention network is used to capture the distribution pattern features of LDRs, and a convolutional attention network is used to capture the local correlation features of SDRs. The second layer of DeepIDP-2L maps the features extracted in the first layer into a new feature space. A convolutional network and a bidirectional long short-term memory network are used to capture the local and long-range information for predicting both SDRs and LDRs. Experimental results show that DeepIDP-2L achieves more stable performance than other existing predictors on independent test sets with different ratios of SDRs and LDRs. AVAILABILITY AND IMPLEMENTATION: For the convenience of most experimental scientists, a user-friendly and publicly accessible web server for the new predictor has been established at http://bliulab.net/DeepIDP-2L/. It is anticipated that DeepIDP-2L will become a very useful tool for identification of intrinsically disordered regions. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yi-Jun Tang
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
- Yi-He Pang
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
- Bin Liu
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China; Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing 100081, China
7
Abstract
INTRODUCTION: The intrinsic disorder prediction field develops, assesses, and deploys computational predictors of disorder in protein sequences and constructs and disseminates databases of these predictions. Over 40 years of research have resulted in the release of numerous resources. AREAS COVERED: We identify and briefly summarize the most comprehensive collection to date of over 100 disorder predictors. We focus on their predictive models, availability, and predictive performance. We categorize and study them from a historical point of view to highlight informative trends. EXPERT OPINION: We find a consistent trend of improvements in predictive quality as newer and more advanced predictors are developed. The original focus on machine learning methods shifted to meta-predictors in the early 2010s, followed by a recent transition to deep learning. The use of deep learners will continue in the foreseeable future given the recent and convincing success of these methods. Moreover, a broad range of resources that facilitate convenient collection of accurate disorder predictions is available to users. They include web servers and standalone programs for disorder prediction, servers that combine prediction of disorder and disorder functions, and large databases of pre-computed predictions. We also point to the need to address the shortage of accurate methods that predict disordered binding regions.
Affiliation(s)
- Bi Zhao
- Department of Computer Science, Virginia Commonwealth University, Richmond, Virginia, USA
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, Virginia, USA
8
Del Amo-Maestro L, Sagar A, Pompach P, Goulas T, Scavenius C, Ferrero DS, Castrillo-Briceño M, Taulés M, Enghild JJ, Bernadó P, Gomis-Rüth FX. An Integrative Structural Biology Analysis of Von Willebrand Factor Binding and Processing by ADAMTS-13 in Solution. J Mol Biol 2021; 433:166954. [PMID: 33771572 DOI: 10.1016/j.jmb.2021.166954] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 01/22/2021] [Revised: 03/16/2021] [Accepted: 03/16/2021] [Indexed: 10/21/2022]
Abstract
Von Willebrand Factor (vWF), a 300-kDa plasma protein key to homeostasis, is cleaved at a single site by multi-domain metallopeptidase ADAMTS-13. vWF is the only known substrate of this peptidase, which circulates in a latent form and becomes allosterically activated by substrate binding. Herein, we characterised the complex formed by a competent peptidase construct (AD13-MDTCS) comprising metallopeptidase (M), disintegrin-like (D), thrombospondin (T), cysteine-rich (C), and spacer (S) domains, with a 73-residue functionally relevant vWF-peptide, using nine complementary techniques. Pull-down assays, gel electrophoresis, and surface plasmon resonance revealed tight binding with sub-micromolar affinity. Cross-linking mass spectrometry with four reagents showed that, within the peptidase, domain D approaches M, C, and S. S is positioned close to M and C, and the peptide contacts all domains. Hydrogen/deuterium exchange mass spectrometry revealed strong and weak protection for C/D and M/S, respectively. Structural analysis by multi-angle laser light scattering and small-angle X-ray scattering in solution revealed that the enzyme adopted highly flexible unbound, latent structures and peptide-bound, active structures that differed from the AD13-MDTCS crystal structure. Moreover, the peptide behaved like a self-avoiding random chain. We integrated the results with computational approaches, derived an ensemble of structures that collectively satisfied all experimental restraints, and discussed the functional implications. The interaction conforms to a 'fuzzy complex' that follows a 'dynamic zipper' mechanism involving numerous reversible, weak but additive interactions that result in strong binding and cleavage. Our findings contribute to illuminating the biochemistry of the vWF:ADAMTS-13 axis.
Affiliation(s)
- Laura Del Amo-Maestro
- Proteolysis Laboratory, Department of Structural Biology, Molecular Biology Institute of Barcelona (CSIC), Barcelona Science Park, c/Baldiri Reixac, 15-21, 08028 Barcelona, Catalonia, Spain
- Amin Sagar
- Centre de Biochimie Structurale, INSERM, CNRS and Université de Montpellier, 34090 Montpellier, France
- Petr Pompach
- Institute of Microbiology of the Czech Academy of Sciences, BIOCEV, Prumyslova 595, 252 50 Vestec, Czechia; Institute of Biotechnology of the Czech Academy of Sciences, BIOCEV, Prumyslova 595, 252 50 Vestec, Czechia
- Theodoros Goulas
- Proteolysis Laboratory, Department of Structural Biology, Molecular Biology Institute of Barcelona (CSIC), Barcelona Science Park, c/Baldiri Reixac, 15-21, 08028 Barcelona, Catalonia, Spain
- Carsten Scavenius
- Department of Molecular Biology and Genetics, Aarhus University, Gustav Wieds Vej 10, 8000 Aarhus C, Denmark
- Diego S Ferrero
- Laboratory for Viruses and Large Biological Complexes, Department of Structural Biology, Molecular Biology Institute of Barcelona (CSIC), Barcelona Science Park, c/Baldiri Reixac, 15-21, 08028 Barcelona, Catalonia, Spain
- Mariana Castrillo-Briceño
- Proteolysis Laboratory, Department of Structural Biology, Molecular Biology Institute of Barcelona (CSIC), Barcelona Science Park, c/Baldiri Reixac, 15-21, 08028 Barcelona, Catalonia, Spain
- Marta Taulés
- Scientific and Technological Centers (CCiTUB), University of Barcelona, Lluís Solé i Sabaris, 1-3, 08028 Barcelona, Catalonia, Spain
- Jan J Enghild
- Department of Molecular Biology and Genetics, Aarhus University, Gustav Wieds Vej 10, 8000 Aarhus C, Denmark
- Pau Bernadó
- Centre de Biochimie Structurale, INSERM, CNRS and Université de Montpellier, 34090 Montpellier, France.
- F Xavier Gomis-Rüth
- Proteolysis Laboratory, Department of Structural Biology, Molecular Biology Institute of Barcelona (CSIC), Barcelona Science Park, c/Baldiri Reixac, 15-21, 08028 Barcelona, Catalonia, Spain.
9
Tang YJ, Pang YH, Liu B. IDP-Seq2Seq: identification of intrinsically disordered regions based on sequence to sequence learning. Bioinformatics 2021; 36:5177-5186. [PMID: 32702119 DOI: 10.1093/bioinformatics/btaa667] [Citation(s) in RCA: 107] [Impact Index Per Article: 26.8] [Received: 05/01/2020] [Revised: 06/21/2020] [Accepted: 07/17/2020] [Indexed: 12/29/2022] Open
Abstract
MOTIVATION: Intrinsically disordered regions (IDRs) are related to many important biological functions and are widely distributed in proteins. Accurate prediction of IDRs is critical for protein structure and function analysis. However, the existing computational methods construct their predictive models solely in the sequence space, failing to convert the sequence space into a 'semantic space' that reflects the structural characteristics of proteins. Furthermore, although length-dependent predictors have shown promising results, new fusion strategies should be explored to improve their predictive performance and generalization. RESULTS: In this study, we applied Sequence to Sequence Learning (Seq2Seq), derived from natural language processing (NLP), to map protein sequences to a 'semantic space' that reflects structural patterns, with the help of predicted residue-residue contacts (CCMs) and other sequence-based features. Furthermore, the Attention mechanism was used to capture the global associations between all residue pairs in the proteins. Three length-dependent predictors were constructed: IDP-Seq2Seq-L for long disordered region prediction, IDP-Seq2Seq-S for short disordered region prediction, and IDP-Seq2Seq-G for both long and short disordered region prediction. Finally, these three predictors were fused into one predictor called IDP-Seq2Seq to improve discriminative power and generalization. Experimental results on four independent test datasets and the CASP test dataset showed that IDP-Seq2Seq is insensitive to the ratio of long to short disordered regions and outperforms other competing methods. AVAILABILITY AND IMPLEMENTATION: For the convenience of most experimental scientists, a user-friendly and publicly accessible web server for the new predictor has been established at http://bliulab.net/IDP-Seq2Seq/. It is anticipated that IDP-Seq2Seq will become a very useful tool for identification of IDRs. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
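The abstract does not say how the three length-dependent predictors are fused, so the following is only a minimal stand-in: a weighted average of per-residue disorder scores followed by a threshold. The function names, the equal default weights, and the 0.5 threshold are all hypothetical choices made for illustration, not the IDP-Seq2Seq fusion strategy.

```python
def fuse_scores(long_scores, short_scores, generic_scores, weights=(1.0, 1.0, 1.0)):
    """Weighted average of per-residue scores from three predictors.

    Each input is a list of disorder probabilities, one per residue.
    The weights are hypothetical; equal weights give a plain average.
    """
    if not (len(long_scores) == len(short_scores) == len(generic_scores)):
        raise ValueError("all score lists must cover the same residues")
    w_long, w_short, w_generic = weights
    total = w_long + w_short + w_generic
    return [
        (w_long * l + w_short * s + w_generic * g) / total
        for l, s, g in zip(long_scores, short_scores, generic_scores)
    ]


def call_disorder(scores, threshold=0.5):
    """Turn fused scores into a '1'/'0' label string (assumed threshold)."""
    return "".join("1" if score >= threshold else "0" for score in scores)


if __name__ == "__main__":
    fused = fuse_scores([0.9, 0.2], [0.6, 0.1], [0.3, 0.3])
    print(call_disorder(fused))
```

In practice the weights could be fitted on a validation set, or the averaging step replaced by a trained meta predictor that takes the three score vectors as input.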
Affiliation(s)
- Yi-Jun Tang
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
- Yi-He Pang
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
- Bin Liu
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China; Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing 100081, China
10
Jain M, Muthukumaran J, Singh AK. Comparative structural and functional analysis of STL and SLL, chitin-binding lectins from Solanum spp. J Biomol Struct Dyn 2020; 39:4907-4922. [DOI: 10.1080/07391102.2020.1781693] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 12/23/2022]
Affiliation(s)
- Monika Jain
- Department of Biotechnology, School of Engineering and Technology, Sharda University, Greater Noida, India
- Jayaraman Muthukumaran
- Department of Biotechnology, School of Engineering and Technology, Sharda University, Greater Noida, India
- Amit Kumar Singh
- Department of Biotechnology, School of Engineering and Technology, Sharda University, Greater Noida, India
11
Getting to Know Your Neighbor: Protein Structure Prediction Comes of Age with Contextual Machine Learning. J Comput Biol 2020; 27:796-814. [DOI: 10.1089/cmb.2019.0193] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Indexed: 12/12/2022] Open
12
Liu Y, Wang X, Liu B. A comprehensive review and comparison of existing computational methods for intrinsically disordered protein and region prediction. Brief Bioinform 2019; 20:330-346. [PMID: 30657889 DOI: 10.1093/bib/bbx126] [Citation(s) in RCA: 95] [Impact Index Per Article: 15.8] [Received: 07/19/2017] [Indexed: 01/06/2023] Open
Abstract
Intrinsically disordered proteins and regions are widely distributed and are associated with many biological processes and diseases. Accurate prediction of intrinsically disordered proteins and regions is critical for both basic research (such as protein structure and function prediction) and practical applications (such as drug development). During the past decades, many computational approaches have been proposed, which have greatly facilitated the development of this important field. A comprehensive and updated review is therefore needed. In this regard, we review the computational methods for intrinsically disordered protein and region prediction, focusing especially on recent developments in this field. These computational approaches are divided into four categories based on their methodologies: physicochemical-based methods, machine-learning-based methods, template-based methods, and meta methods. Their advantages and disadvantages are also discussed. The performance of 40 state-of-the-art predictors is directly compared on the target proteins in the task of disordered region prediction in the 10th Critical Assessment of protein Structure Prediction. A more comprehensive performance comparison of 45 different predictors is conducted based on seven widely used benchmark data sets. Finally, some open problems and perspectives are discussed.
Affiliation(s)
- Yumeng Liu
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, China
- Xiaolong Wang
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, China
- Bin Liu
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, China
13
Identification of Intrinsically Disordered Proteins and Regions by Length-Dependent Predictors Based on Conditional Random Fields. Mol Ther Nucleic Acids 2019; 17:396-404. [PMID: 31307006 PMCID: PMC6626971 DOI: 10.1016/j.omtn.2019.06.004] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 02/25/2019] [Revised: 06/06/2019] [Accepted: 06/07/2019] [Indexed: 01/24/2023]
Abstract
Accurate identification of intrinsically disordered proteins/regions (IDPs/IDRs) is critical for predicting protein structure and function. Previous studies have shown that IDRs of different lengths have different characteristics, and several classification-based predictors have been proposed for predicting different types of IDRs. Compared with these classification-based predictors, the previously proposed predictor IDP-CRF, a sequence labeling model based on conditional random fields (CRFs), exhibits state-of-the-art performance for predicting IDPs/IDRs. Motivated by these methods, we propose a predictor called IDP-FSP, which is an ensemble of three CRF-based predictors called IDP-FSP-L, IDP-FSP-S, and IDP-FSP-G. These three predictors are specially designed to predict long, short, and generic disordered regions, respectively, and they are constructed based on different features. To the best of our knowledge, IDP-FSP is the first predictor that combines a sequence labeling algorithm with IDRs of different lengths. Experimental results using two independent test datasets show that IDP-FSP achieves better or at least comparable predictive performance compared with 26 existing state-of-the-art methods in this field, demonstrating the effectiveness of IDP-FSP.
14
Nielsen JT, Mulder FAA. Quality and bias of protein disorder predictors. Sci Rep 2019; 9:5137. [PMID: 30914747 PMCID: PMC6435736 DOI: 10.1038/s41598-019-41644-w] [Citation(s) in RCA: 64] [Impact Index Per Article: 10.7] [Received: 12/04/2018] [Accepted: 03/13/2019] [Indexed: 02/03/2023] Open
Abstract
Disorder in proteins is vital for biological function, yet it is challenging to characterize. Therefore, methods for predicting protein disorder from sequence are fundamental. Currently, predictors are trained and evaluated using data from X-ray structures or from various biochemical or spectroscopic data. However, the prediction accuracy of disorder predictors is not calibrated, nor is it established whether predictors are intrinsically biased towards one of the extremes of the order-disorder axis. We therefore generated and validated a comprehensive experimental benchmarking set of site-specific and continuous disorder, using deposited NMR chemical shift data. This novel experimental data collection is fully appropriate and represents the full spectrum of disorder. We subsequently analyzed the performance of 26 widely used disorder prediction methods and found that their performance varies noticeably. At the same time, a distinct bias for over-predicting order was identified for some algorithms. Our analysis has important implications for the validity and the interpretation of protein disorder, as utilized, for example, in assessing the content of disorder in proteomes.
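One simple way to illustrate a bias toward order is to compare the fraction of residues a predictor labels as disordered with the fraction in a reference annotation. The sketch below does this for binary label strings ('1' disordered, '0' ordered). It is not the benchmark used in the paper, which relies on continuous disorder values derived from NMR chemical shifts; the function names and the label encoding are assumptions made for illustration.

```python
def disorder_fraction(labels):
    """Fraction of residues labeled '1' (disordered) in a label string."""
    if not labels:
        raise ValueError("label string must not be empty")
    return labels.count("1") / len(labels)


def content_bias(predicted, reference):
    """Predicted minus reference disorder fraction.

    A negative value means the predictor labels fewer residues as
    disordered than the reference, i.e. it over-predicts order.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have equal length")
    return disorder_fraction(predicted) - disorder_fraction(reference)


if __name__ == "__main__":
    # Toy example: the reference has two disordered residues, the prediction one.
    print(content_bias("0001", "0011"))
```

Averaging this difference over many proteins gives a rough indication of whether a method systematically underestimates disorder content, for example when estimating the disorder content of a proteome.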
Affiliation(s)
- Jakob T Nielsen
- Interdisciplinary Nanoscience Center (iNANO), Aarhus University, Gustav Wieds Vej 14, 8000, Aarhus C, Denmark.
- Department of Chemistry, Aarhus University, Langelandsgade 140, 8000, Aarhus C, Denmark.
- Frans A A Mulder
- Interdisciplinary Nanoscience Center (iNANO), Aarhus University, Gustav Wieds Vej 14, 8000, Aarhus C, Denmark.
- Department of Chemistry, Aarhus University, Langelandsgade 140, 8000, Aarhus C, Denmark.
15
Homma K, Anbo H, Noguchi T, Fukuchi S. Both Intrinsically Disordered Regions and Structural Domains Evolve Rapidly in Immune-Related Mammalian Proteins. Int J Mol Sci 2018; 19:3860. [PMID: 30518031 PMCID: PMC6321239 DOI: 10.3390/ijms19123860] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 09/30/2018] [Revised: 12/01/2018] [Accepted: 12/02/2018] [Indexed: 01/07/2023] Open
Abstract
Eukaryotic proteins consist of structural domains (SDs) and intrinsically disordered regions (IDRs), i.e., regions that by themselves do not assume unique three-dimensional structures. IDRs are generally subject to less constraint and evolve more rapidly than SDs. Proteins with a lower number of protein-to-protein interactions (PPIs) are also less constrained and tend to evolve fast. Extracellular proteins of mammals, especially immune-related extracellular proteins, on average have relatively high evolution rates. This article aims to examine if a high evolution rate in IDRs or that in SDs accounts for the rapid evolution of extracellular proteins. To this end, we classified eukaryotic proteins based on their cellular localizations and analyzed them. Moreover, we divided proteins into SDs and IDRs and calculated the respective evolution rate. Fractional IDR content is positively correlated with evolution rate. For their fractional IDR content, immune-related extracellular proteins show an aberrantly high evolution rate. IDRs evolve more rapidly than SDs in most subcellular localizations. In extracellular proteins, however, the difference is diminished. For immune-related proteins in mammals in particular, the evolution rates in SDs come close to those in IDRs. Thus high evolution rates in both IDRs and SDs account for the rapid evolution of immune-related proteins.
Affiliation(s)
- Keiichi Homma
- Department of Life Science and Informatics, Maebashi Institute of Technology, 460-1 Kamisadori-machi, Maebashi-shi 371-0816, Japan.
| | - Hiroto Anbo
- Department of Life Science and Informatics, Maebashi Institute of Technology, 460-1 Kamisadori-machi, Maebashi-shi 371-0816, Japan.
| | - Tamotsu Noguchi
- Pharmaceutical Education Research Center, Meiji Pharmaceutical University, 2-522-1 Noshio, Kiyose-shi, Tokyo 204-8588, Japan.
| | - Satoshi Fukuchi
- Department of Life Science and Informatics, Maebashi Institute of Technology, 460-1 Kamisadori-machi, Maebashi-shi 371-0816, Japan.
| |
|
16
|
Johnson NT, Dhroso A, Hughes KJ, Korkin D. Biological classification with RNA-seq data: Can alternatively spliced transcript expression enhance machine learning classifiers? RNA (NEW YORK, N.Y.) 2018; 24:1119-1132. [PMID: 29941426 PMCID: PMC6097660 DOI: 10.1261/rna.062802.117] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Received: 06/30/2017] [Accepted: 06/03/2018] [Indexed: 05/09/2023]
Abstract
RNA sequencing (RNA-seq) is becoming a prevalent approach to quantify gene expression and is expected to gain better insights into a number of biological and biomedical questions compared to DNA microarrays. Most importantly, RNA-seq allows us to quantify expression at the gene or transcript levels. However, leveraging the RNA-seq data requires development of new data mining and analytics methods. Supervised learning methods are commonly used approaches for biological data analysis that have recently gained attention for their applications to RNA-seq data. Here, we assess the utility of supervised learning methods trained on RNA-seq data for a diverse range of biological classification tasks. We hypothesize that the transcript-level expression data are more informative for biological classification tasks than the gene-level expression data. Our large-scale assessment utilizes multiple data sets, organisms, lab groups, and RNA-seq analysis pipelines. Overall, we performed and assessed 61 biological classification problems that leverage three independent RNA-seq data sets and include over 2000 samples that come from multiple organisms, lab groups, and RNA-seq analyses. These 61 problems include predictions of the tissue type, sex, or age of the sample, healthy or cancerous phenotypes, and pathological tumor stages for the samples from the cancerous tissue. For each problem, the performance of three normalization techniques and six machine learning classifiers was explored. We find that for every single classification problem, the transcript-based classifiers outperform or are comparable with gene expression-based methods. The top-performing techniques reached a near perfect classification accuracy, demonstrating the utility of supervised learning for RNA-seq based data analysis.
Affiliation(s)
- Nathan T Johnson
- Worcester Polytechnic Institute, Bioinformatics and Computational Biology Program, Worcester, Massachusetts 01609, USA
| | - Andi Dhroso
- Worcester Polytechnic Institute, Bioinformatics and Computational Biology Program, Worcester, Massachusetts 01609, USA
| | - Katelyn J Hughes
- Worcester Polytechnic Institute, Bioinformatics and Computational Biology Program, Worcester, Massachusetts 01609, USA
| | - Dmitry Korkin
- Worcester Polytechnic Institute, Bioinformatics and Computational Biology Program, Worcester, Massachusetts 01609, USA
- Worcester Polytechnic Institute, Department of Computer Science, Worcester, Massachusetts 01609, USA
| |
|
17
|
Kim SS, Seffernick JT, Lindert S. Accurately Predicting Disordered Regions of Proteins Using Rosetta ResidueDisorder Application. J Phys Chem B 2018; 122:3920-3930. [PMID: 29595057 DOI: 10.1021/acs.jpcb.8b01763] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Indexed: 01/01/2023]
Abstract
Although many proteins require well-folded structures to carry out their biological functions, a large fraction of functioning proteins contain regions, known as intrinsically disordered protein regions, where stable structures are not likely to form. Notable functional roles of intrinsically disordered proteins are in transcriptional regulation, translation, and cellular signal transduction. Moreover, intrinsically disordered protein regions are highly abundant in many proteins associated with various human diseases; therefore, these segments have become attractive drug targets for potential therapeutics. Over the past decades, numerous computational methods have been developed to accurately predict disordered regions of proteins. Here we introduce a user-friendly and reliable approach for the prediction of disordered protein regions using the structure prediction software Rosetta. Using 245 proteins from a benchmark data set (16 DisProt database proteins) and a test data set (229 proteins with NMR data), we use Rosetta to predict the global protein structures and then show that there is a statistically significant difference between Rosetta scores in disordered and ordered regions, with scores being less favorable in disordered regions. Furthermore, the difference in scores between ordered and disordered protein regions is sufficient to accurately identify disordered protein regions. As a result, our Rosetta ResidueDisorder method (benchmark data set prediction accuracy of 71.77% and independent test data set prediction accuracy of 65.37%) outperformed other established disorder prediction tools and did not exhibit a biased prediction toward either ordered or disordered regions. To facilitate usage, a Rosetta application has been developed for the Rosetta ResidueDisorder method.
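The score-based idea in this abstract (less favorable per-residue scores in disordered regions) can be sketched roughly as follows. This is a minimal illustration, not the Rosetta ResidueDisorder implementation: the function names, the window half-width of 5 residues, the cutoff of -1.0, and the toy scores are all assumptions made for the example.

```python
def smooth(scores, half_window=5):
    """Average each residue's score with its neighbours (window is clipped at the termini)."""
    smoothed = []
    for i in range(len(scores)):
        lo = max(0, i - half_window)
        hi = min(len(scores), i + half_window + 1)
        smoothed.append(sum(scores[lo:hi]) / (hi - lo))
    return smoothed


def predict_disorder(scores, cutoff=-1.0, half_window=5):
    """Label a residue 'D' (disordered) when its smoothed score is less favorable
    (higher) than the cutoff, otherwise 'O' (ordered). Cutoff is hypothetical."""
    return ["D" if s > cutoff else "O" for s in smooth(scores, half_window)]


# Toy per-residue scores: 20 favorable (ordered-like) then 20 unfavorable residues.
toy_scores = [-3.0] * 20 + [0.5] * 20
truth = "O" * 20 + "D" * 20
prediction = "".join(predict_disorder(toy_scores))
accuracy = sum(p == t for p, t in zip(prediction, truth)) / len(truth)
print(prediction)
print(f"accuracy = {accuracy:.3f}")
```

Smoothing shifts the predicted boundary by one residue in this toy case, which is why the accuracy is slightly below 1; real per-residue scores would be far noisier.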
Affiliation(s)
- Stephanie S Kim
- Department of Chemistry and Biochemistry, Ohio State University, Columbus, Ohio 43210, United States
| | - Justin T Seffernick
- Department of Chemistry and Biochemistry, Ohio State University, Columbus, Ohio 43210, United States
| | - Steffen Lindert
- Department of Chemistry and Biochemistry, Ohio State University, Columbus, Ohio 43210, United States
| |
|
18
|
Matsuo N, Goda N, Shimizu K, Fukuchi S, Ota M, Hiroaki H. Discovery of Cryoprotective Activity in Human Genome-Derived Intrinsically Disordered Proteins. Int J Mol Sci 2018; 19:ijms19020401. [PMID: 29385704 PMCID: PMC5855623 DOI: 10.3390/ijms19020401] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 12/20/2017] [Revised: 01/17/2018] [Accepted: 01/22/2018] [Indexed: 12/13/2022] Open
Abstract
Intrinsically disordered proteins (IDPs) are an emerging phenomenon. They may have a high degree of flexibility in their polypeptide chains, which lack a stable 3D structure. Although several biological functions of IDPs have been proposed, their general function is not known. The only finding related to their function is the genetically conserved YSK2 motif present in plant dehydrins. These proteins were shown to be IDPs with the YSK2 motif serving as a core region for the dehydrins’ cryoprotective activity. Here we examined the cryoprotective activity of randomly selected IDPs toward the model enzyme lactate dehydrogenase (LDH). All five IDPs that were examined were in the range of 35–45 amino acid residues in length and were equally potent at a concentration of 50 μg/mL, whereas folded proteins, the PSD-95/Dlg/ZO-1 (PDZ) domain, and lysozymes had no potency. We further examined their cryoprotective activity toward glutathione S-transferase as an example of the other enzyme, and toward enhanced green fluorescent protein as a non-enzyme protein example. We further examined the lyophilization protective activity of the peptides toward LDH, which revealed that some IDPs showed a higher activity than that of bovine serum albumin (BSA). Based on these observations, we propose that cryoprotection is a general feature of IDPs. Our findings may become a clue to various industrial applications of IDPs in the future.
Affiliation(s)
- Naoki Matsuo
- Laboratory of Structural Molecular Pharmacology, Graduate School of Pharmaceutical Sciences, Nagoya University, Furo-cho, Chikusa-ku, Nagoya, Aichi 464-8601, Japan.
| | - Natsuko Goda
- Laboratory of Structural Molecular Pharmacology, Graduate School of Pharmaceutical Sciences, Nagoya University, Furo-cho, Chikusa-ku, Nagoya, Aichi 464-8601, Japan.
| | - Kana Shimizu
- Department of Computer Science and Communications Engineering, Waseda University, Okubo, Shinjuku-ku, Tokyo 169-8555, Japan.
| | - Satoshi Fukuchi
- Faculty of Engineering, Maebashi Institute of Technology, Maebashi 371-0816, Japan.
| | - Motonori Ota
- Graduate School of Informatics, Nagoya University, Furo-cho, Chikusa-ku, Nagoya, Aichi 464-8601, Japan.
| | - Hidekazu Hiroaki
- Laboratory of Structural Molecular Pharmacology, Graduate School of Pharmaceutical Sciences, Nagoya University, Furo-cho, Chikusa-ku, Nagoya, Aichi 464-8601, Japan.
- The Structural Biology Research Center and Division of Biological Science, Graduate School of Science, Nagoya University, Furo-cho, Chikusa-ku, Nagoya, Aichi 464-8601, Japan.
| |
|
19
|
Cieplak-Rotowska MK, Tarnowski K, Rubin M, Fabian MR, Sonenberg N, Dadlez M, Niedzwiecka A. Structural Dynamics of the GW182 Silencing Domain Including its RNA Recognition motif (RRM) Revealed by Hydrogen-Deuterium Exchange Mass Spectrometry. JOURNAL OF THE AMERICAN SOCIETY FOR MASS SPECTROMETRY 2018; 29:158-173. [PMID: 29080206 PMCID: PMC5785596 DOI: 10.1007/s13361-017-1830-9] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 07/05/2017] [Revised: 09/08/2017] [Accepted: 10/01/2017] [Indexed: 06/07/2023]
Abstract
The human GW182 protein plays an essential role in micro(mi)RNA-dependent gene silencing. miRNA silencing is mediated, in part, by a GW182 C-terminal region called the silencing domain, which interacts with the poly(A) binding protein and the CCR4-NOT deadenylase complex to repress protein synthesis. Structural studies of this GW182 fragment are challenging due to its predicted intrinsically disordered character, except for its RRM domain. However, detailed insights into the properties of proteins containing disordered regions can be provided by hydrogen-deuterium exchange mass spectrometry (HDX/MS). In this work, we applied HDX/MS to define the structural state of the GW182 silencing domain. HDX/MS analysis revealed that this domain is clearly divided into a natively unstructured part, including the CCR4-NOT interacting motif 1, and a distinct RRM domain. The GW182 RRM has a very dynamic structure, since water molecules can penetrate the whole domain in 2 h. The finding of these high structural dynamics sheds new light on the RRM structure. Though this domain is one of the most frequently occurring canonical protein domains in eukaryotes, these results are, to our knowledge, the first HDX/MS characterization of an RRM. The HDX/MS studies also show that the α2 helix of the RRM can display EX1 behavior after a freezing-thawing cycle. This means that the RRM structure is sensitive to environmental conditions and can change its conformation, which suggests that the state of RRM-containing proteins should be checked by HDX/MS with regard to conformational uniformity.
Affiliation(s)
- Maja K Cieplak-Rotowska
- Division of Biophysics, Institute of Experimental Physics, Faculty of Physics, University of Warsaw, 02-089, Warsaw, Poland
| | - Krzysztof Tarnowski
- Laboratory of Mass Spectrometry, Institute of Biochemistry and Biophysics, Polish Academy of Sciences, PL-02106, Warsaw, Poland
| | - Marcin Rubin
- Division of Biophysics, Institute of Experimental Physics, Faculty of Physics, University of Warsaw, 02-089, Warsaw, Poland
| | - Marc R Fabian
- Lady Davis Institute for Medical Research, Jewish General Hospital, Montréal, Québec, Canada
- Department of Oncology, McGill University, Montréal, Québec, Canada
| | - Nahum Sonenberg
- Department of Biochemistry, McGill University, Montréal, Québec, Canada
- Goodman Cancer Center, McGill University, Montréal, Québec, Canada
| | - Michal Dadlez
- Laboratory of Mass Spectrometry, Institute of Biochemistry and Biophysics, Polish Academy of Sciences, PL-02106, Warsaw, Poland
| | - Anna Niedzwiecka
- Laboratory of Biological Physics, Institute of Physics, Polish Academy of Sciences, Aleja Lotnikow 32/46, PL-02668, Warsaw, Poland.
| |
|
20
|
Yamamoto H, Kondo A, Itoh T. A curvature-dependent membrane binding by tyrosine kinase Fer involves an intrinsically disordered region. Biochem Biophys Res Commun 2017; 495:1522-1527. [PMID: 29208465 DOI: 10.1016/j.bbrc.2017.12.009] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 11/12/2017] [Accepted: 12/01/2017] [Indexed: 11/18/2022]
Abstract
Tyrosine kinases are important enzymes that mediate signal transduction at the plasma membrane. While the significance of membrane localization of tyrosine kinases has been well evaluated, the role of membrane curvature in their regulation is unknown. Here, we demonstrate that an intrinsically disordered region in the tyrosine kinase Fer acts as a membrane curvature sensor that preferentially binds to highly curved membranes in vitro. This region forms an amphipathic α-helix upon interaction with curved membranes, aligning hydrophobic residues on one side of the helical structure. Further, the tyrosine kinase activity of Fer is significantly enhanced by the membrane in a manner dependent on curvature. We propose a model for the regulation of Fer based on an intramolecular interaction and the curvature-dependent membrane binding mediated by its intrinsically disordered region.
Affiliation(s)
- Hikaru Yamamoto
- Division of Membrane Biology, Biosignal Research Center, Kobe University, 1-1 Rokkodai-cho, Nada-ku, Kobe, Hyogo, 657-8501, Japan
| | - Akihiro Kondo
- Department of Biochemistry and Molecular Biology, Kobe University Graduate School of Medicine, 7-5-1 Kusunoki-cho, Chuo-ku, Kobe, Hyogo 650-0017, Japan
| | - Toshiki Itoh
- Division of Membrane Biology, Biosignal Research Center, Kobe University, 1-1 Rokkodai-cho, Nada-ku, Kobe, Hyogo, 657-8501, Japan
- Department of Biochemistry and Molecular Biology, Kobe University Graduate School of Medicine, 7-5-1 Kusunoki-cho, Chuo-ku, Kobe, Hyogo 650-0017, Japan.
| |
|
21
|
Wang S, Ma J, Xu J. AUCpreD: proteome-level protein disorder prediction by AUC-maximized deep convolutional neural fields. Bioinformatics 2017; 32:i672-i679. [PMID: 27587688 DOI: 10.1093/bioinformatics/btw446] [Citation(s) in RCA: 89] [Impact Index Per Article: 11.1] [Indexed: 12/17/2022] Open
Abstract
MOTIVATION: Protein intrinsically disordered regions (IDRs) play an important role in many biological processes. Two key properties of IDRs are (i) the occurrence is proteome-wide and (ii) the ratio of disordered residues is about 6%, which makes it challenging to accurately predict IDRs. Most IDR prediction methods use sequence profile to improve accuracy, which prevents its application to proteome-wide prediction since it is time-consuming to generate sequence profiles. On the other hand, the methods without using sequence profile fare much worse than using sequence profile. METHOD: This article formulates IDR prediction as a sequence labeling problem and employs a new machine learning method called Deep Convolutional Neural Fields (DeepCNF) to solve it. DeepCNF is an integration of deep convolutional neural networks (DCNN) and conditional random fields (CRF); it can model not only complex sequence-structure relationship in a hierarchical manner, but also correlation among adjacent residues. To deal with highly imbalanced order/disorder ratio, instead of training DeepCNF by widely used maximum-likelihood, we develop a novel approach to train it by maximizing area under the ROC curve (AUC), which is an unbiased measure for class-imbalanced data. RESULTS: Our experimental results show that our IDR prediction method AUCpreD outperforms existing popular disorder predictors. More importantly, AUCpreD works very well even without sequence profile, comparing favorably to or even outperforming many methods using sequence profile. Therefore, our method works for proteome-wide disorder prediction while yielding similar or better accuracy than the others. AVAILABILITY AND IMPLEMENTATION: http://raptorx2.uchicago.edu/StructurePropertyPred/predict/ CONTACT: wangsheng@uchicago.edu, jinboxu@gmail.com SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
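The abstract's argument for AUC on class-imbalanced data can be illustrated with a small sketch. With about 6% disordered residues (the figure quoted above), a predictor that labels every residue as ordered scores high accuracy yet has an AUC of only 0.5. The helper names `roc_auc` and `accuracy` and the toy scores are assumptions for illustration; this evaluates predictions and is not the AUC-maximizing training procedure of AUCpreD.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties count one half. Quadratic in the number of residues, which is fine for a toy case.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of residues whose thresholded score matches the label."""
    predicted = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(predicted, labels)) / len(labels)


# Toy chain: 6 disordered residues (label 1) out of 100.
labels = [1] * 6 + [0] * 94
always_ordered = [0.0] * 100                        # uninformative predictor
informative = [0.9] * 4 + [0.2] * 2 + [0.1] * 90 + [0.3] * 4

print(accuracy(labels, always_ordered), roc_auc(labels, always_ordered))
print(accuracy(labels, informative), roc_auc(labels, informative))
```

The uninformative predictor reaches 94% accuracy in this toy case, while its AUC stays at chance level; the informative predictor is separated from it much more clearly by AUC than by accuracy.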
Affiliation(s)
- Sheng Wang
- Toyota Technological Institute at Chicago, Chicago, IL, USA
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
| | - Jianzhu Ma
- Toyota Technological Institute at Chicago, Chicago, IL, USA
| | - Jinbo Xu
- Toyota Technological Institute at Chicago, Chicago, IL, USA
| |
|
22
|
Hanson J, Yang Y, Paliwal K, Zhou Y. Improving protein disorder prediction by deep bidirectional long short-term memory recurrent neural networks. Bioinformatics 2017; 33:685-692. [PMID: 28011771 DOI: 10.1093/bioinformatics/btw678] [Citation(s) in RCA: 109] [Impact Index Per Article: 13.6] [Received: 06/07/2016] [Accepted: 10/26/2016] [Indexed: 11/12/2022] Open
Abstract
Motivation: Capturing long-range interactions between structural but not sequence neighbors of proteins is a long-standing challenging problem in bioinformatics. Recently, long short-term memory (LSTM) networks have significantly improved the accuracy of speech and image classification problems by remembering useful past information in long sequential events. Here, we have implemented deep bidirectional LSTM recurrent neural networks in the problem of protein intrinsic disorder prediction. Results: The new method, named SPOT-Disorder, has steadily improved over a similar method using a traditional, window-based neural network (SPINE-D) in all datasets tested without separate training on short and long disordered regions. Independent tests on four other datasets, including the datasets from critical assessment of structure prediction (CASP) techniques and >10 000 annotated proteins from MobiDB, confirmed SPOT-Disorder as one of the best methods in disorder prediction. Moreover, initial studies indicate that the method is more accurate in predicting functional sites in disordered regions. These results highlight the usefulness of combining LSTM with deep bidirectional recurrent neural networks in capturing non-local, long-range interactions for bioinformatics applications. Availability and Implementation: SPOT-disorder is available as a web server and as a standalone program at: http://sparks-lab.org/server/SPOT-disorder/index.php. Contact: j.hanson@griffith.edu.au or yuedong.yang@griffith.edu.au or yaoqi.zhou@griffith.edu.au. Supplementary information: Supplementary data is available at Bioinformatics online.
Affiliation(s)
- Jack Hanson
- Signal Processing Laboratory, Griffith University, Brisbane 4122, Australia
| | - Yuedong Yang
- Institute for Glycomics, Griffith University, Gold Coast 4215, Australia
| | - Kuldip Paliwal
- Signal Processing Laboratory, Griffith University, Brisbane 4122, Australia
| | - Yaoqi Zhou
- Institute for Glycomics, Griffith University, Gold Coast 4215, Australia
| |
|
23
|
Abstract
Over the past decade, it has become evident that a large proportion of proteins contain intrinsically disordered regions, which play important roles in pivotal cellular functions. Many computational tools have been developed with the aim of identifying the level and location of disorder within a protein. In this chapter, we describe a neural network based technique called SPINE-D that employs a unique three-state design and can accurately capture disordered residues in both short and long disordered regions. SPINE-D was trained on a large database of 4229 non-redundant proteins, and yielded an AUC of 0.86 on a cross-validation test and 0.89 on an independent test. SPINE-D can also detect a semi-disordered state that is associated with induced folders and aggregation-prone regions in disordered proteins and weakly stable or locally unfolded regions in structured proteins. We implement an online web service and an offline stand-alone program for SPINE-D; both are freely available at http://sparks-lab.org/SPINE-D/. We then walk you through how to use the online and offline SPINE-D in making disorder predictions, and examine the disorder and semi-disorder prediction in a case study on the p53 protein.
Affiliation(s)
- Tuo Zhang
- Department of Microbiology and Immunology, Weill Cornell Medical College, New York, NY, 10065, USA
| | - Eshel Faraggi
- Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, Indianapolis, IN, 46032, USA
- Research and Information Systems, LLC, Indianapolis, IN, USA
| | - Zhixiu Li
- Translational Genomics Group, Institute of Health and Biomedical Innovation, Queensland University of Technology at Translational Research Institute, 37 Kent Street, Woolloongabba, QLD, 4102, Australia
| | - Yaoqi Zhou
- Institute for Glycomics and School of Information and Communication Technology, Griffith University, Gold Coast Campus, Science 1 (G24) 2.10, Parklands Drive, Southport, QLD, 4222, Australia.
| |
|
24
|
Richa T, Ide S, Suzuki R, Ebina T, Kuroda Y. Fast H-DROP: A thirty times accelerated version of H-DROP for interactive SVM-based prediction of helical domain linkers. J Comput Aided Mol Des 2016; 31:237-244. [PMID: 28028736 DOI: 10.1007/s10822-016-9999-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/24/2016] [Accepted: 12/10/2016] [Indexed: 10/20/2022]
Abstract
Efficient and rapid prediction of domain regions from amino acid sequence information alone is often required for swift structural and functional characterization of large multi-domain proteins. Here we introduce Fast H-DROP, a thirty times accelerated version of our previously reported H-DROP (Helical Domain linker pRediction using OPtimal features), which is unique in specifically predicting helical domain linkers (boundaries). Fast H-DROP, analogously to H-DROP, uses optimum features selected from a set of 3000 by combining a random forest and a stepwise feature selection protocol. We reduced the computational time from 8.5 min per sequence in H-DROP to 14 s per sequence in Fast H-DROP on an 8 Xeon processor Linux server by using SWISS-PROT instead of the GenBank non-redundant (nr) database for generating the PSSMs. The sensitivity and precision of Fast H-DROP assessed by cross-validation were 33.7 and 36.2%, respectively, which were merely ~2% lower than those of H-DROP. The reduced computational time of Fast H-DROP, without affecting prediction performances, makes it more interactive and user-friendly. Fast H-DROP and H-DROP are freely available from http://domserv.lab.tuat.ac.jp/.
Affiliation(s)
- Tambi Richa
- Department of Biotechnology and Life Science, Tokyo University of Agriculture and Technology, 12-24-16 Nakamachi, Koganei-shi, Tokyo, 184-8588, Japan
| | - Soichiro Ide
- Department of Biotechnology and Life Science, Tokyo University of Agriculture and Technology, 12-24-16 Nakamachi, Koganei-shi, Tokyo, 184-8588, Japan
| | - Ryosuke Suzuki
- Department of Biotechnology and Life Science, Tokyo University of Agriculture and Technology, 12-24-16 Nakamachi, Koganei-shi, Tokyo, 184-8588, Japan
| | - Teppei Ebina
- Department of Biotechnology and Life Science, Tokyo University of Agriculture and Technology, 12-24-16 Nakamachi, Koganei-shi, Tokyo, 184-8588, Japan
- Department of Physiology, Graduate school of Medicine, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-0033, Japan
| | - Yutaka Kuroda
- Department of Biotechnology and Life Science, Tokyo University of Agriculture and Technology, 12-24-16 Nakamachi, Koganei-shi, Tokyo, 184-8588, Japan.
| |
|
25
|
Homma K, Noguchi T, Fukuchi S. Codon usage is less optimized in eukaryotic gene segments encoding intrinsically disordered regions than in those encoding structural domains. Nucleic Acids Res 2016; 44:10051-10061. [PMID: 27915289 PMCID: PMC5137448 DOI: 10.1093/nar/gkw899] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 05/18/2016] [Revised: 09/15/2016] [Accepted: 09/29/2016] [Indexed: 12/14/2022] Open
Abstract
Codon usage tends to be optimized in highly expressed genes. A plausible explanation for this phenomenon is that translational accuracy is increased in highly expressed genes with infrequent use of rare codons. Besides structural domains (SDs), eukaryotic proteins generally have intrinsically disordered regions (IDRs) that by themselves do not assume unique three-dimensional structures. As IDRs are free from structural constraint, they can probably accommodate more translational errors than SDs can. Thus, codon usage in IDRs is likely to be less optimized than that in SDs. Codon usage in all the genes of seven eukaryotes was examined in terms of both tRNA adaptation index and codon adaptation index. Different amino acid compositions in different protein regions were taken into account in calculating expected adaptation indices, to which observed indices were compared. Codon usage is less optimized in gene regions encoding IDRs than in those corresponding to SDs. The finding does not depend on whether IDRs are located at the N-terminus, in the middle, or at the C-terminus of proteins. Furthermore, the observation remains unchanged in two different algorithms used to predict IDRs in proteins. The result is consistent with the idea that IDRs tolerate more translational errors than SDs.
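For readers unfamiliar with the codon adaptation index mentioned here, the following is a minimal sketch of how it is commonly computed: each codon receives a relative adaptiveness (its count in a reference set divided by the count of the most used synonymous codon), and the index is the geometric mean of those weights along a coding segment. The reference counts, segment sequences, and function names below are hypothetical, and the sketch omits the composition-corrected expected indices that the abstract describes.

```python
import math

# Hypothetical codon counts from a reference set of highly expressed genes,
# grouped by amino acid (only three amino acids, for brevity).
REFERENCE_COUNTS = {
    "K": {"AAA": 20, "AAG": 80},
    "E": {"GAA": 30, "GAG": 60},
    "L": {"CTG": 100, "CTA": 10, "TTA": 5},
}


def relative_adaptiveness(reference_counts):
    """Weight of each codon = its count / count of the most used synonymous codon."""
    weights = {}
    for codon_counts in reference_counts.values():
        best = max(codon_counts.values())
        for codon, count in codon_counts.items():
            weights[codon] = count / best
    return weights


def codon_adaptation_index(coding_sequence, weights):
    """Geometric mean of codon weights; codons absent from the table are skipped."""
    usable = len(coding_sequence) - len(coding_sequence) % 3
    codons = [coding_sequence[i:i + 3] for i in range(0, usable, 3)]
    logs = [math.log(weights[c]) for c in codons if c in weights]
    if not logs:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(logs) / len(logs))


weights = relative_adaptiveness(REFERENCE_COUNTS)
domain_segment = "AAGGAGCTG"    # hypothetical segment using preferred codons
disordered_segment = "AAAGAACTA"  # same amino acids (K-E-L), rarer codons
print(codon_adaptation_index(domain_segment, weights))
print(codon_adaptation_index(disordered_segment, weights))
```

Both toy segments encode the same peptide, so any difference in the index reflects codon choice alone, which is the kind of comparison the abstract makes between segments encoding SDs and IDRs.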
Affiliation(s)
- Keiichi Homma
- Department of Life Science and Informatics, Maebashi Institute of Technology, 460-1 Kamisadori-machi, Maebashi-shi 371-0816, Japan
| | - Tamotsu Noguchi
- Pharmaceutical Education Research Center, Meiji Pharmaceutical University, 2-522-1 Noshio, Kiyose, Tokyo 204-8588, Japan
| | - Satoshi Fukuchi
- Department of Life Science and Informatics, Maebashi Institute of Technology, 460-1 Kamisadori-machi, Maebashi-shi 371-0816, Japan
| |
|
26
|
Wu T, Wang X, Zhang Z, Gong F, Song T, Chen Z, Zhang P, Zhao Y. NES-REBS: A novel nuclear export signal prediction method using regular expressions and biochemical properties. J Bioinform Comput Biol 2016; 14:1650013. [PMID: 27225342 DOI: 10.1142/s021972001650013x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/18/2022]
Abstract
A nuclear export signal (NES) is a protein localization signal that is involved in the binding of cargo proteins to the nuclear export receptor and thus contributes to regulating the localization of cellular proteins. Consensus sequences of NES have been used to detect NES in protein sequences, but suffer from poor predictive power. Some recent peer works proposed using biochemical properties of experimentally verified NES to refine NES candidates. Those methods can achieve high prediction rates, but their execution time becomes unacceptable for large-scale NES searching if too many properties are involved. In this work, we developed a novel computational approach, named NES-REBS, to search for NES in protein sequences, where biochemical properties of experimentally verified NES, including secondary structure and surface accessibility, are utilized to refine NES candidates obtained by matching popular consensus sequences. We test our method by searching for 262 experimentally verified NES in 221 NES-containing protein sequences. We find that NES-REBS runs in 2-3 min and performs well, achieving a precision of 47.2% and a sensitivity of 54.6%.
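The two-stage idea described above (match a consensus with a regular expression, then refine candidates with a biochemical property) can be sketched as follows. The pattern is one commonly cited NES consensus, Φ-x(2,3)-Φ-x(2,3)-Φ-x-Φ with Φ in {L, I, V, F, M}; it is an assumption here and not necessarily among the patterns used by NES-REBS. The accessibility values, the 0.25 threshold, the test sequence, and the function names are likewise hypothetical.

```python
import re

HYDROPHOBIC = "LIVFM"

# Lookahead with a capture group so that overlapping candidates are all reported.
NES_PATTERN = re.compile(
    rf"(?=([{HYDROPHOBIC}].{{2,3}}[{HYDROPHOBIC}].{{2,3}}[{HYDROPHOBIC}].[{HYDROPHOBIC}]))"
)


def find_nes_candidates(sequence):
    """Return (start_index, matched_segment) for each position where the consensus matches."""
    return [(m.start(), m.group(1)) for m in NES_PATTERN.finditer(sequence.upper())]


def filter_by_accessibility(candidates, accessibility, min_mean=0.25):
    """Keep candidates whose mean per-residue surface accessibility reaches min_mean.

    `accessibility` holds one value per residue of the full sequence (hypothetical
    predictor output in the range 0..1).
    """
    kept = []
    for start, segment in candidates:
        window = accessibility[start:start + len(segment)]
        if sum(window) / len(window) >= min_mean:
            kept.append((start, segment))
    return kept


sequence = "NSNELALKLAGLDINK"  # toy sequence containing one consensus match
candidates = find_nes_candidates(sequence)
exposed = filter_by_accessibility(candidates, [0.5] * len(sequence))
buried = filter_by_accessibility(candidates, [0.1] * len(sequence))
print(candidates, exposed, buried)
```

Keeping the regular-expression stage cheap and applying the property filter only to its matches is one way to read the abstract's point about execution time; the actual properties and thresholds would come from the method itself.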
Affiliation(s)
- Tingfang Wu
- * School of Automation, Huazhong University of Science and Technology, Wuhan 430074, Hubei, P. R. China
| | - Xun Wang
- † College of Computer and Communication Engineering, China University of Petroleum, Qingdao 266580, Shandong, P. R. China
| | - Zheng Zhang
- * School of Automation, Huazhong University of Science and Technology, Wuhan 430074, Hubei, P. R. China
| | - Faming Gong
- † College of Computer and Communication Engineering, China University of Petroleum, Qingdao 266580, Shandong, P. R. China
| | - Tao Song
- † College of Computer and Communication Engineering, China University of Petroleum, Qingdao 266580, Shandong, P. R. China
- ‡ Faculty of Engineering, Computing and Science Swinburne University of Technology, Sarawak Campus Kuching 93350, Malaysia
| | - Zhihua Chen
- * School of Automation, Huazhong University of Science and Technology, Wuhan 430074, Hubei, P. R. China
| | - Pan Zhang
- * School of Automation, Huazhong University of Science and Technology, Wuhan 430074, Hubei, P. R. China
| | - Yang Zhao
- * School of Automation, Huazhong University of Science and Technology, Wuhan 430074, Hubei, P. R. China
| |
|
27
|
Velez G, Lin M, Christensen T, Faubion WA, Lomberk G, Urrutia R. Evidence supporting a critical contribution of intrinsically disordered regions to the biochemical behavior of full-length human HP1γ. J Mol Model 2015; 22:12. [PMID: 26680990 PMCID: PMC4683166 DOI: 10.1007/s00894-015-2874-z] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 07/05/2015] [Accepted: 11/22/2015] [Indexed: 12/16/2022]
Abstract
HP1γ, a non-histone chromatin protein, has elicited significant attention because of its role in gene silencing, elongation, splicing, DNA repair, cell growth, differentiation, and many other cancer-associated processes, including therapy resistance. These characteristics make it an ideal target for developing small drugs for both mechanistic experimentation and potential therapies. While high-resolution structures of the two globular regions of HP1γ, the chromo- and chromoshadow domains, have been solved, little is currently known about the conformational behavior of the full-length protein. Consequently, in the current study, we use threading, homology-based molecular modeling, molecular mechanics calculations, and molecular dynamics simulations to develop models that allow us to infer properties of full-length HP1γ at an atomic resolution level. HP1γ appears as an elongated molecule in which three Intrinsically Disordered Regions (IDRs, 1, 2, and 3) endow this protein with dynamic flexibility, intermolecular recognition properties, and the ability to integrate signals from various intracellular pathways. Our modeling also suggests that the dynamic flexibility imparted to HP1γ by the three IDRs is important for linking nucleosomes with PXVXL motif-containing proteins, in a chromatin environment. The importance of the IDRs in intermolecular recognition is illustrated by the building and study of both IDR2 HP1γ−importin-α and IDR1 and IDR2 HP1γ−DNA complexes. The ability of the three IDRs for integrating cell signals is demonstrated by combined linear motif analyses and molecular dynamics simulations showing that posttranslational modifications can generate a histone mimetic sequence within the IDR2 of HP1γ, which when bound by the chromodomain can lead to an autoinhibited state. 
Combined, these data underscore the importance of IDRs 1, 2, and 3 in defining the structural and dynamic properties of HP1γ, discoveries that have both mechanistic and potentially biomedical relevance.
Affiliation(s)
- Gabriel Velez
- Laboratory of Epigenetics and Chromatin Dynamics, Gastroenterology Research Unit, Department of Biochemistry and Molecular Biology, Mayo Clinic, 200 First Street SW, Guggenheim 10, Rochester, MN, 55905, USA.,Laboratory of Epigenetics and Chromatin Dynamics, Gastroenterology Research Unit, Department of Biophysics, Mayo Clinic, Rochester, MN, USA.,Laboratory of Epigenetics and Chromatin Dynamics, Gastroenterology Research Unit, Department of Medicine, Mayo Clinic, Rochester, MN, USA.,Medical Scientist Training Program, University of Iowa, Iowa City, IA, USA
| | - Marisa Lin
- Laboratory of Epigenetics and Chromatin Dynamics, Gastroenterology Research Unit, Department of Biochemistry and Molecular Biology, Mayo Clinic, 200 First Street SW, Guggenheim 10, Rochester, MN, 55905, USA.,Laboratory of Epigenetics and Chromatin Dynamics, Gastroenterology Research Unit, Department of Biophysics, Mayo Clinic, Rochester, MN, USA.,Laboratory of Epigenetics and Chromatin Dynamics, Gastroenterology Research Unit, Department of Medicine, Mayo Clinic, Rochester, MN, USA
| | - Trace Christensen
- Laboratory of Epigenetics and Chromatin Dynamics, Gastroenterology Research Unit, Department of Biochemistry and Molecular Biology, Mayo Clinic, 200 First Street SW, Guggenheim 10, Rochester, MN, 55905, USA.,Laboratory of Epigenetics and Chromatin Dynamics, Gastroenterology Research Unit, Department of Biophysics, Mayo Clinic, Rochester, MN, USA.,Laboratory of Epigenetics and Chromatin Dynamics, Gastroenterology Research Unit, Department of Medicine, Mayo Clinic, Rochester, MN, USA
| | - William A Faubion
- Laboratory of Epigenetics and Chromatin Dynamics, Gastroenterology Research Unit, Department of Biochemistry and Molecular Biology, Mayo Clinic, 200 First Street SW, Guggenheim 10, Rochester, MN, 55905, USA.,Laboratory of Epigenetics and Chromatin Dynamics, Gastroenterology Research Unit, Department of Biophysics, Mayo Clinic, Rochester, MN, USA.,Laboratory of Epigenetics and Chromatin Dynamics, Gastroenterology Research Unit, Department of Medicine, Mayo Clinic, Rochester, MN, USA
| | - Gwen Lomberk
- Laboratory of Epigenetics and Chromatin Dynamics, Gastroenterology Research Unit, Department of Biochemistry and Molecular Biology, Mayo Clinic, 200 First Street SW, Guggenheim 10, Rochester, MN, 55905, USA. .,Laboratory of Epigenetics and Chromatin Dynamics, Gastroenterology Research Unit, Department of Biophysics, Mayo Clinic, Rochester, MN, USA. .,Laboratory of Epigenetics and Chromatin Dynamics, Gastroenterology Research Unit, Department of Medicine, Mayo Clinic, Rochester, MN, USA.
| | - Raul Urrutia
- Laboratory of Epigenetics and Chromatin Dynamics, Gastroenterology Research Unit, Department of Biochemistry and Molecular Biology, Mayo Clinic, 200 First Street SW, Guggenheim 10, Rochester, MN, 55905, USA. .,Laboratory of Epigenetics and Chromatin Dynamics, Gastroenterology Research Unit, Department of Biophysics, Mayo Clinic, Rochester, MN, USA. .,Laboratory of Epigenetics and Chromatin Dynamics, Gastroenterology Research Unit, Department of Medicine, Mayo Clinic, Rochester, MN, USA.
|
28
|
Habibi N, Norouzi A, Mohd Hashim SZ, Shamsir MS, Samian R. Prediction of recombinant protein overexpression in Escherichia coli using a machine learning based model (RPOLP). Comput Biol Med 2015; 66:330-6. [DOI: 10.1016/j.compbiomed.2015.09.015] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 05/02/2015] [Revised: 09/18/2015] [Accepted: 09/19/2015] [Indexed: 01/28/2023]
|
29
|
DisPredict: A Predictor of Disordered Protein Using Optimized RBF Kernel. PLoS One 2015; 10:e0141551. [PMID: 26517719 PMCID: PMC4627842 DOI: 10.1371/journal.pone.0141551] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Received: 01/04/2015] [Accepted: 10/09/2015] [Indexed: 12/02/2022] Open
Abstract
Intrinsically disordered proteins or regions perform important biological functions through their dynamic conformations during binding. Thus, accurate identification of these disordered regions has significant implications for proper annotation of function, induced fold prediction and drug design to combat critical diseases. We introduce DisPredict, a disorder predictor that employs a single support vector machine with an RBF kernel and novel features for reliable characterization of protein structure. DisPredict yields effective performance. In addition to 10-fold cross validation, training and testing of DisPredict were conducted with independent test datasets. The results were consistent, with both the training and test errors minimal. The use of multiple data sources makes the predictor generic. The datasets used in developing the model include disordered regions of various lengths, categorized as short and long, with different compositions and different types of disorder, ranging from fully to partially disordered regions, as well as completely ordered regions. Through comparison with other state-of-the-art approaches and case studies, DisPredict is found to be a useful tool with competitive performance. DisPredict is available at https://github.com/tamjidul/DisPredict_v1.0.
|
30
|
Survey of Natural Language Processing Techniques in Bioinformatics. Computational and Mathematical Methods in Medicine 2015; 2015:674296. [PMID: 26525745 PMCID: PMC4615216 DOI: 10.1155/2015/674296] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Received: 05/10/2015] [Revised: 06/12/2015] [Accepted: 06/21/2015] [Indexed: 01/02/2023]
Abstract
Informatics methods, such as text mining and natural language processing, are always involved in bioinformatics research. In this study, we discuss text mining and natural language processing methods in bioinformatics from two perspectives. First, we aim to search for knowledge on biology, retrieve references using text mining methods, and reconstruct databases. For example, protein-protein interactions and gene-disease relationships can be mined from PubMed. Then, we analyze the applications of text mining and natural language processing techniques in bioinformatics, including predicting protein structure and function and detecting noncoding RNA. Finally, numerous methods and applications, as well as their contributions to bioinformatics, are discussed for future use by text mining and natural language processing researchers.
|
31
|
Li J, Feng Y, Wang X, Li J, Liu W, Rong L, Bao J. An Overview of Predictors for Intrinsically Disordered Proteins over 2010-2014. Int J Mol Sci 2015; 16:23446-62. [PMID: 26426014 PMCID: PMC4632708 DOI: 10.3390/ijms161023446] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.5] [Received: 05/31/2015] [Revised: 08/25/2015] [Accepted: 08/31/2015] [Indexed: 02/05/2023] Open
Abstract
The sequence-structure-function paradigm of proteins has been changed by the occurrence of intrinsically disordered proteins (IDPs). Benefiting from their structural disorder, IDPs are of particular importance in biological processes like regulation and signaling. IDPs are associated with human diseases, including cancer, cardiovascular disease, neurodegenerative diseases, amyloidoses, and several other maladies. IDPs attract a high level of interest, and a substantial effort has been made to develop experimental and computational methods. More than 70 prediction tools have been developed since 1997, of which 17 were created in the last five years. Here, we present an overview of IDP predictors developed during 2010-2014. We analyze the algorithms used for IDP prediction by these tools and discuss the basic concepts of the various prediction methods for IDPs. The comparison of prediction performance among these tools is discussed as well.
Affiliation(s)
- Jianzong Li
- College of Life Sciences & Key Laboratory of Ministry of Education for Bio-Resources and Bio-Environment, Sichuan University, Chengdu 610064, China.
| | - Yu Feng
- College of Life Sciences & Key Laboratory of Ministry of Education for Bio-Resources and Bio-Environment, Sichuan University, Chengdu 610064, China.
| | - Xiaoyun Wang
- College of Life Sciences & Key Laboratory of Ministry of Education for Bio-Resources and Bio-Environment, Sichuan University, Chengdu 610064, China.
| | - Jing Li
- College of Life Sciences & Key Laboratory of Ministry of Education for Bio-Resources and Bio-Environment, Sichuan University, Chengdu 610064, China.
- State Key Laboratory of Biotherapy/Collaborative Innovation Center for Biotherapy, West China Hospital, Sichuan University, Chengdu 610041, China.
| | - Wen Liu
- College of Life Sciences & Key Laboratory of Ministry of Education for Bio-Resources and Bio-Environment, Sichuan University, Chengdu 610064, China.
| | - Li Rong
- College of Life Sciences & Key Laboratory of Ministry of Education for Bio-Resources and Bio-Environment, Sichuan University, Chengdu 610064, China.
| | - Jinku Bao
- College of Life Sciences & Key Laboratory of Ministry of Education for Bio-Resources and Bio-Environment, Sichuan University, Chengdu 610064, China.
- State Key Laboratory of Biotherapy/Collaborative Innovation Center for Biotherapy, West China Hospital, Sichuan University, Chengdu 610041, China.
- State Key Laboratory of Oral Diseases, West China College of Stomatology, Sichuan University, Chengdu 610041, China.
|
32
|
Zisaki A, Miskovic L, Hatzimanikatis V. Antihypertensive drugs metabolism: an update to pharmacokinetic profiles and computational approaches. Curr Pharm Des 2015; 21:806-22. [PMID: 25341854 PMCID: PMC4435036 DOI: 10.2174/1381612820666141024151119] [Citation(s) in RCA: 65] [Impact Index Per Article: 6.5] [Received: 07/09/2014] [Accepted: 10/09/2014] [Indexed: 02/07/2023]
Abstract
Drug discovery and development is a high-risk enterprise that requires significant investments in capital, time and scientific expertise. The study of xenobiotic metabolism remains one of the main topics in the research and development of drugs, cosmetics and nutritional supplements. Antihypertensive drugs are used for the treatment of high blood pressure, which is one of the most frequent symptoms of patients who suffer from cardiovascular diseases such as myocardial infarction and strokes. In current cardiovascular disease pharmacology, four drug clusters - Angiotensin Converting Enzyme Inhibitors, Beta-Blockers, Calcium Channel Blockers and Diuretics - cover the major therapeutic characteristics of most antihypertensive drugs. The pharmacokinetic and specifically the metabolic profiles of the antihypertensive agents are intensively studied because of the broad inter-individual variability in plasma concentrations and the diversity in efficacy response, especially due to the P450-dependent metabolic status they present. Several computational methods have been developed with the aim to: (i) model and better understand human drug metabolism; and (ii) enhance the experimental investigation of the metabolism of small xenobiotic molecules. The main predictive tools these methods employ are rule-based approaches, quantitative structure metabolism/activity relationships and docking approaches. This review paper provides detailed metabolic profiles of the major clusters of antihypertensive agents, including their metabolites and their metabolizing enzymes, and it also provides specific information concerning the computational approaches that have been used to predict the metabolic profile of several antihypertensive drugs.
Affiliation(s)
- Vassily Hatzimanikatis
- Laboratory of Computational Systems Biotechnology (LCSB), Ecole Polytechnique Federale de Lausanne, EPFL/SB/ISIC/LCSB, CH H4 624/ Station 6/ CH-1015 Lausanne/ Switzerland.
|
33
|
Awad W, Adamczyk B, Örnros J, Karlsson NG, Mani K, Logan DT. Structural Aspects of N-Glycosylations and the C-terminal Region in Human Glypican-1. J Biol Chem 2015; 290:22991-3008. [PMID: 26203194 PMCID: PMC4645609 DOI: 10.1074/jbc.m115.660878] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Received: 04/29/2015] [Revised: 07/10/2015] [Indexed: 11/06/2022] Open
Abstract
Glypicans are multifunctional cell surface proteoglycans involved in several important cellular signaling pathways. Glypican-1 (Gpc1) is the predominant heparan sulfate proteoglycan in the developing and adult human brain. The two N-linked glycans and the C-terminal domain that attach the core protein to the cell membrane are not resolved in the Gpc1 crystal structure. Therefore, we have studied Gpc1 using crystallography, small angle x-ray scattering, and chromatographic approaches to elucidate the composition, structure, and function of the N-glycans and the C terminus and also the topology of Gpc1 with respect to the membrane. The C terminus is shown to be highly flexible in solution, but it orients the core protein transverse to the membrane, directing a surface evolutionarily conserved in Gpc1 orthologs toward the membrane, where it may interact with signaling molecules and/or membrane receptors on the cell surface, or even the enzymes involved in heparan sulfate substitution in the Golgi apparatus. Furthermore, the N-glycans are shown to extend the protein stability and lifetime by protection against proteolysis and aggregation.
Affiliation(s)
- Wael Awad
- From the Department of Biochemistry and Structural Biology, Centre for Molecular Protein Science, Lund University, Box 124, SE-221 00 Lund
| | - Barbara Adamczyk
- the Department of Biochemistry and Cell Biology, University of Gothenburg, Box 440, SE-40530 Gothenburg, and
| | - Jessica Örnros
- the Department of Biochemistry and Cell Biology, University of Gothenburg, Box 440, SE-40530 Gothenburg, and
| | - Niclas G Karlsson
- the Department of Biochemistry and Cell Biology, University of Gothenburg, Box 440, SE-40530 Gothenburg, and
| | - Katrin Mani
- the Department of Experimental Medical Science, Lund University, SE-221 84, Lund, Sweden
| | - Derek T Logan
- From the Department of Biochemistry and Structural Biology, Centre for Molecular Protein Science, Lund University, Box 124, SE-221 00 Lund,
|
34
|
Tian YP, Valkonen JPT. Recombination of strain O segments to HCpro-encoding sequence of strain N of Potato virus Y modulates necrosis induced in tobacco and in potatoes carrying resistance genes Ny or Nc. Molecular Plant Pathology 2015; 16:735-47. [PMID: 25557768 PMCID: PMC6638495 DOI: 10.1111/mpp.12231] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Indexed: 05/20/2023]
Abstract
Hypersensitive resistance (HR) to strains O and C of Potato virus Y (PVY, genus Potyvirus) is conferred by potato genes Ny(tbr) and Nc(tbr), respectively; however, PVY N strains overcome these resistance genes. The viral helper component proteinases (HCpro, 456 amino acids) from PVY(N) and PVY(O) are distinguished by an eight-amino-acid signature sequence, causing HCpro to fold into alternative conformations. Substitution of only two residues (K269R and R270K) of the eight-amino-acid signature in PVY(N) HCpro was needed to convert the three-dimensional (3D) model of PVY(N) HCpro to a PVY(O)-like conformation and render PVY(N) avirulent in the presence of Ny(tbr), whereas four amino acid substitutions were necessary to change PVY(O) HCpro to a PVY(N)-like conformation. Hence, the HCpro conformation, rather than other features ascribed to the sequence, was essential for recognition by Ny(tbr). The 3D model of PVY(C) HCpro closely resembled PVY(O), but differed from PVY(N) HCpro. HCpro of all strains was structurally similar to β-catenin. Sixteen PVY(N) 605-based chimeras were inoculated to potato cv. Pentland Crown (Ny(tbr)), King Edward (Nc(tbr)) and Pentland Ivory (Ny(tbr)/Nc(tbr)). Eleven chimeras induced necrotic local lesions and caused no systemic infection, and thus differed from both parental viruses that infected King Edward systemically, and from PVY(N) 605 that infected Pentland Crown and Pentland Ivory systemically. These 11 chimeras triggered both Ny(tbr) and Nc(tbr) and, in addition, six induced veinal necrosis in tobacco. Further, specific amino acid residues were found to have an additive impact on necrosis. These results shed new light on the causes of PVY-related necrotic symptoms in potato.
Affiliation(s)
- Yan-Ping Tian
- Department of Agricultural Sciences, University of Helsinki, PO Box 27, FI-00014, Helsinki, Finland
| | - Jari P T Valkonen
- Department of Agricultural Sciences, University of Helsinki, PO Box 27, FI-00014, Helsinki, Finland
|
35
|
Volpato V, Alshomrani B, Pollastri G. Accurate Ab Initio and Template-Based Prediction of Short Intrinsically-Disordered Regions by Bidirectional Recurrent Neural Networks Trained on Large-Scale Datasets. Int J Mol Sci 2015; 16:19868-85. [PMID: 26307973 PMCID: PMC4581330 DOI: 10.3390/ijms160819868] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/01/2015] [Revised: 07/28/2015] [Accepted: 07/29/2015] [Indexed: 12/02/2022] Open
Abstract
Intrinsically-disordered regions lack a well-defined 3D structure, but play key roles in determining the function of many proteins. Although predictors of disorder have been shown to achieve relatively high rates of correct classification of these segments, improvements over the years have been slow, and accurate methods are needed that are capable of accommodating the ever-increasing amount of structurally-determined protein sequences to try to boost predictive performance. In this paper, we propose a predictor for short disordered regions based on bidirectional recurrent neural networks and tested by rigorous five-fold cross-validation on a large, non-redundant dataset collected from MobiDB, a new comprehensive source of protein disorder annotations. The system exploits sequence and structural information in the form of frequency profiles, predicted secondary structure and solvent accessibility, and direct disorder annotations from homologous protein structures (templates) deposited in the Protein Data Bank. The contributions of sequence, structure and homology information result in large improvements in predictive accuracy. Additionally, the large scale of the training set leads to low false positive rates, making our systems a robust and efficient way to address high-throughput disorder prediction.
Affiliation(s)
- Viola Volpato
- School of Computer Science, University College Dublin, Belfield, Dublin 4, Ireland.
- Adaptive and Complex Systems Laboratory, University College Dublin, Belfield, Dublin 4, Ireland.
| | - Badr Alshomrani
- School of Computer Science, University College Dublin, Belfield, Dublin 4, Ireland.
- Adaptive and Complex Systems Laboratory, University College Dublin, Belfield, Dublin 4, Ireland.
| | - Gianluca Pollastri
- School of Computer Science, University College Dublin, Belfield, Dublin 4, Ireland.
- Adaptive and Complex Systems Laboratory, University College Dublin, Belfield, Dublin 4, Ireland.
|
36
|
Disorder Prediction Methods, Their Applicability to Different Protein Targets and Their Usefulness for Guiding Experimental Studies. Int J Mol Sci 2015; 16:19040-54. [PMID: 26287166 PMCID: PMC4581285 DOI: 10.3390/ijms160819040] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.9] [Received: 05/21/2015] [Revised: 07/15/2015] [Accepted: 08/04/2015] [Indexed: 12/13/2022] Open
Abstract
The role and function of a given protein is dependent on its structure. In recent years, however, numerous studies have highlighted the importance of unstructured, or disordered, regions in governing a protein’s function. Disordered proteins have been found to play important roles in pivotal cellular functions, such as DNA binding and signalling cascades. Studying proteins with extended disordered regions is often problematic as they can be challenging to express, purify and crystallise. This means that interpretable experimental data on protein disorder is hard to generate. As a result, predictive computational tools have been developed with the aim of predicting the level and location of disorder within a protein. Currently, over 60 prediction servers exist, utilizing different methods for classifying disorder and different training sets. Here we review several well-performing, publicly available prediction methods, comparing their application and discussing how disorder prediction servers can be used to aid the experimental solution of protein structure. The use of disorder prediction methods allows us to adopt a more targeted approach to experimental studies by accurately identifying the boundaries of ordered protein domains so that they may be investigated separately, thereby increasing the likelihood of their successful experimental solution.
|
37
|
DeepCNF-D: Predicting Protein Order/Disorder Regions by Weighted Deep Convolutional Neural Fields. Int J Mol Sci 2015; 16:17315-30. [PMID: 26230689 PMCID: PMC4581195 DOI: 10.3390/ijms160817315] [Citation(s) in RCA: 58] [Impact Index Per Article: 5.8] [Received: 05/28/2015] [Revised: 07/15/2015] [Accepted: 07/16/2015] [Indexed: 12/14/2022] Open
Abstract
Intrinsically disordered proteins or protein regions are involved in key biological processes including regulation of transcription, signal transduction, and alternative splicing. Accurately predicting order/disorder regions ab initio from the protein sequence is a prerequisite step for further analysis of functions and mechanisms for these disordered regions. This work presents a learning method, weighted DeepCNF (Deep Convolutional Neural Fields), to improve the accuracy of order/disorder prediction by exploiting the long-range sequential information and the interdependency between adjacent order/disorder labels and by assigning different weights for each label during training and prediction to solve the label imbalance issue. Evaluated by the CASP9 and CASP10 targets, our method obtains 0.855 and 0.898 AUC values, which are higher than the state-of-the-art single ab initio predictors.
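The idea of assigning different weights to each label to counter label imbalance, and of reporting AUC, can be sketched generically as follows. This is not the DeepCNF-D training objective, which the abstract does not specify. The inverse-frequency weighting scheme, the weighted log loss, and all function names are assumptions chosen for illustration.

```python
import math


def class_weights(labels):
    """Inverse-frequency weights (assumed scheme): each class gets equal total weight."""
    n = len(labels)
    counts = {c: labels.count(c) for c in set(labels)}
    return {c: n / (len(counts) * k) for c, k in counts.items()}


def weighted_log_loss(labels, probs, weights):
    """Weighted mean negative log-likelihood; probs[i] is P(residue i is disordered)."""
    total = 0.0
    norm = 0.0
    for y, p in zip(labels, probs):
        w = weights[y]
        total += -w * math.log(p if y == 1 else 1.0 - p)
        norm += w
    return total / norm


def auc(labels, scores):
    """AUC as the chance that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

With three ordered residues and one disordered residue, `class_weights` assigns the disordered label three times the weight of the ordered label, so an error on the rare class costs as much as three errors on the common one.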
|
38
|
Goda N, Shimizu K, Kuwahara Y, Tenno T, Noguchi T, Ikegami T, Ota M, Hiroaki H. A Method for Systematic Assessment of Intrinsically Disordered Protein Regions by NMR. Int J Mol Sci 2015; 16:15743-60. [PMID: 26184172 PMCID: PMC4519922 DOI: 10.3390/ijms160715743] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 03/30/2015] [Revised: 06/17/2015] [Accepted: 07/01/2015] [Indexed: 11/16/2022] Open
Abstract
Intrinsically disordered proteins (IDPs) that lack stable conformations and are highly flexible have attracted the attention of biologists. Therefore, the development of a systematic method to identify polypeptide regions that are unstructured in solution is important. We have designed an "indirect/reflected" detection system for evaluating the physicochemical properties of IDPs using nuclear magnetic resonance (NMR). This approach employs a "chimeric membrane protein"-based method using the thermostable membrane protein PH0471. This protein contains two domains, a transmembrane helical region and a C-terminal OB (oligonucleotide/oligosaccharide binding)-fold domain (named NfeDC domain), connected by a flexible linker. NMR signals of the OB-fold domain of detergent-solubilized PH0471 are observed because of the flexibility of the linker region. In this study, the linker region was substituted with target IDPs. Fifty-three candidates were selected using the prediction tool POODLE and 35 expression vectors were constructed. Subsequently, we obtained 15N-labeled chimeric PH0471 proteins with 25 IDPs as linkers. The NMR spectra allowed us to classify IDPs into three categories: flexible, moderately flexible, and inflexible. The inflexible IDPs contain membrane-associating or aggregation-prone sequences. This is the first attempt to use an indirect/reflected NMR method to evaluate IDPs and can verify the predictions derived from our computational tools.
Affiliation(s)
- Natsuko Goda
- Division of Structural Biology, Graduate School of Pharmaceutical Sciences, Nagoya University, Furo-cho, Chikusa-ku, Nagoya, Aichi 464-8601, Japan.
| | - Kana Shimizu
- Computational Biology Research Center (CBRC), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo Waterfront Bio-IT Research Building 2-4-7 Aomi, Koto-ku, Tokyo 135-0046, Japan.
| | - Yohta Kuwahara
- Division of Structural Biology, Graduate School of Medicine, Kobe University, Kusunoki-cho, 7-5-1, Chuo-ku, Kobe 650-0017, Japan.
| | - Takeshi Tenno
- The Structural Biology Research Center and Division of Biological Science, Graduate School of Science, Nagoya University, Furo-cho, Chikusa-ku, Nagoya, Aichi 464-8601, Japan.
| | - Tamotsu Noguchi
- Pharmaceutical Education Research Center, Meiji Pharmaceutical University, 2-522-1 Noshio, Kiyose, Tokyo 204-8588, Japan.
| | - Takahisa Ikegami
- Institute for Protein Research, Osaka University, Yamadaoka 3-2, Suita, Osaka 565-0871, Japan.
- Graduate School of Medical Life Science, Yokohama City University, 1-7-29 Suehiro-cho, Tsurumi-ku, Yokohama 230-0045, Japan.
| | - Motonori Ota
- Graduate School of Information Sciences, Nagoya University, Furo-cho, Chikusa-ku, Nagoya 464-8601, Japan.
| | - Hidekazu Hiroaki
- Division of Structural Biology, Graduate School of Pharmaceutical Sciences, Nagoya University, Furo-cho, Chikusa-ku, Nagoya, Aichi 464-8601, Japan.
- Division of Structural Biology, Graduate School of Medicine, Kobe University, Kusunoki-cho, 7-5-1, Chuo-ku, Kobe 650-0017, Japan.
- The Structural Biology Research Center and Division of Biological Science, Graduate School of Science, Nagoya University, Furo-cho, Chikusa-ku, Nagoya, Aichi 464-8601, Japan.
|
39
|
Wang Z, Yang Q, Li T, Cong P. DisoMCS: Accurately Predicting Protein Intrinsically Disordered Regions Using a Multi-Class Conservative Score Approach. PLoS One 2015; 10:e0128334. [PMID: 26090958 PMCID: PMC4474717 DOI: 10.1371/journal.pone.0128334] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 12/21/2014] [Accepted: 04/26/2015] [Indexed: 11/21/2022] Open
Abstract
The precise prediction of protein intrinsically disordered regions, which play a crucial role in biological processes, is a necessary prerequisite for furthering the understanding of the principles and mechanisms of protein function. Here, we propose a novel predictor, DisoMCS, which is a more accurate predictor of protein intrinsically disordered regions. DisoMCS is based on an original multi-class conservative score (MCS) obtained by sequence-order/disorder alignment. Initially, near-disorder regions are defined as the fragments at either terminus of an ordered region that adjoin a disordered region. Then the multi-class conservative score is generated by sequence alignment against a known structure database and represented as order, near-disorder and disorder conservative scores. The MCS of each amino acid has three elements: order, near-disorder and disorder profiles. Finally, the MCS is exploited as features to identify disordered regions in sequences. DisoMCS utilizes a non-redundant data set as the training set, MCS and predicted secondary structure as features, and a conditional random field as the classification algorithm. In predicted near-disorder regions, a residue is classified as ordered or disordered according to the optimized decision threshold. DisoMCS was evaluated by cross-validation, large-scale prediction, independent tests and CASP (Critical Assessment of Techniques for Protein Structure Prediction) tests. All results confirmed that DisoMCS was very competitive in terms of prediction accuracy when compared with well-established publicly available disordered region predictors. They also indicated that our approach was more accurate when a query has higher homology with the knowledge database.
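The three-class labelling this abstract describes (order, near-disorder, disorder) can be illustrated with a short sketch. It is not the DisoMCS code. The flank width `flank`, the single-letter label encoding, and the function name are hypothetical, since the abstract does not say how long the near-disorder fragments are.

```python
def label_near_disorder(labels, flank=2):
    """Relabel ordered residues near a disordered residue as near-disorder.

    labels: string of 'O' (order) and 'D' (disorder), one character per residue.
    flank:  hypothetical window size; an ordered residue within `flank`
            positions of any 'D' becomes 'N' (near-disorder).
    """
    out = list(labels)
    for i, c in enumerate(labels):
        if c != "O":
            continue
        lo = max(0, i - flank)
        hi = min(len(labels), i + flank + 1)
        if "D" in labels[lo:hi]:
            out[i] = "N"
    return "".join(out)
```

For example, with `flank=2`, the annotation `OOOODDDOOOO` becomes `OONNDDDNNOO`: the two ordered residues on each side of the disordered segment are marked as near-disorder, and the rest stay ordered.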
Affiliation(s)
- Zhiheng Wang
- Department of Chemistry, Tongji University, Shanghai, China
| | - Qianqian Yang
- Department of Chemistry, Tongji University, Shanghai, China
| | - Tonghua Li
- Department of Chemistry, Tongji University, Shanghai, China
- * E-mail: (T-HL); (P-SC)
| | - Peisheng Cong
- Department of Chemistry, Tongji University, Shanghai, China
- * E-mail: (T-HL); (P-SC)
| |
|
40
|
Haynes CLF, Ameloot P, Remaut H, Callewaert N, Sterckx YGJ, Magez S. Production, purification and crystallization of a trans-sialidase from Trypanosoma vivax. Acta Crystallogr F Struct Biol Commun 2015; 71:577-85. [PMID: 25945712 PMCID: PMC4427168 DOI: 10.1107/s2053230x15002496] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 12/26/2014] [Accepted: 02/05/2015] [Indexed: 11/10/2022] Open
Abstract
Sialidases and trans-sialidases play important roles in the life cycles of various microorganisms. These enzymes can serve nutritional purposes, act as virulence factors or mediate cellular interactions (cell evasion and invasion). In the case of the protozoan parasite Trypanosoma vivax, trans-sialidase activity has been suggested to be involved in infection-associated anaemia, which is the major pathology in the disease nagana. The physiological role of trypanosomal trans-sialidases in host-parasite interaction as well as their structures remain obscure. Here, the production, purification and crystallization of a recombinant version of T. vivax trans-sialidase 1 (rTvTS1) are described. The obtained rTvTS1 crystals diffracted to a resolution of 2.5 Å and belonged to the orthorhombic space group P2₁2₁2₁, with unit-cell parameters a = 57.3, b = 78.4, c = 209.0 Å.
Affiliation(s)
- Carole L. F. Haynes
- Structural Biology Research Center (SBRC), VIB, Pleinlaan 2, B-1050 Brussels, Belgium
- Research Unit for Cellular and Molecular Immunology (CMIM), VUB, Pleinlaan 2, B-1050 Brussels, Belgium
- Department for Molecular Biomedical Research (DMBR), UGent, Ghent, Belgium
| | - Paul Ameloot
- Department for Molecular Biomedical Research (DMBR), UGent, Ghent, Belgium
| | - Han Remaut
- Structural Biology Research Center (SBRC), VIB, Pleinlaan 2, B-1050 Brussels, Belgium
- Structural and Molecular Microbiology (SMM), VUB, Pleinlaan 2, B-1050 Brussels, Belgium
| | - Nico Callewaert
- Department for Molecular Biomedical Research (DMBR), UGent, Ghent, Belgium
| | - Yann G.-J. Sterckx
- Structural Biology Research Center (SBRC), VIB, Pleinlaan 2, B-1050 Brussels, Belgium
- Research Unit for Cellular and Molecular Immunology (CMIM), VUB, Pleinlaan 2, B-1050 Brussels, Belgium
| | - Stefan Magez
- Structural Biology Research Center (SBRC), VIB, Pleinlaan 2, B-1050 Brussels, Belgium
- Research Unit for Cellular and Molecular Immunology (CMIM), VUB, Pleinlaan 2, B-1050 Brussels, Belgium
| |
|
41
|
Banroques J, Tanner NK. Bioinformatics and biochemical methods to study the structural and functional elements of DEAD-box RNA helicases. Methods Mol Biol 2015; 1259:165-181. [PMID: 25579586 DOI: 10.1007/978-1-4939-2214-7_11] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 06/04/2023]
Abstract
DEAD-box RNA helicases have core structures consisting of two tandemly linked RecA-like domains that contain all of the conserved motifs involved in binding ATP and RNA, and that are needed for the enzymatic activities. The conserved sequence motifs and structural homology indicate that these proteins share common origins and underlying functionality. Indeed, the purified proteins generally act as ATP-dependent RNA-binding proteins and RNA-dependent ATPases in vitro, but for the most part without the substrate specificity or enzymatic regulation that exists in the cell. We are interested in understanding the relationships between the conserved motifs and structures that confer the commonly shared features, and in understanding how modifications of the core structure alter the enzymatic properties. We use sequence alignments and structural modeling to reveal regions of interest, which we modify by classical molecular biological techniques (mutations and deletions). We then use various biochemical techniques to characterize the purified proteins and their variants for their ATPase, RNA binding, and RNA unwinding activities to determine the functional roles of the different elements. In this chapter, we describe the methods we use to design our constructs and to determine their enzymatic activities in vitro.
Affiliation(s)
- Josette Banroques
- Institut de Biologie Physico-chimique, CNRS FRE3630, Sorbonne Paris Cité, 13 rue Pierre et Marie Curie, Paris, 75005, France
| | | |
|
42
|
Abstract
Intrinsically disordered proteins and protein regions (IDPs/IDRs) do not adopt a well-defined folded structure under physiological conditions. Instead, these proteins exist as heterogeneous and dynamic conformational ensembles. IDPs are widespread in eukaryotic proteomes and are involved in fundamental biological processes, mostly related to regulation and signaling. At the same time, disordered regions often pose significant challenges to the structure determination process, which generally requires highly homogeneous protein samples. In this book chapter, we provide a brief overview of protein disorder, describe various bioinformatics resources that have been developed in recent years for its characterization, and give a general outline of their applications in various types of structural genomics projects. Traditionally, disordered segments were filtered out to optimize the yield of structure determination pipelines. However, it is becoming increasingly clear that the structural characterization of proteins cannot be complete without the incorporation of intrinsically disordered regions.
Affiliation(s)
- Marco Punta
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | | | | |
|
43
|
Wright KE, Hjerrild KA, Bartlett J, Douglas AD, Jin J, Brown RE, Illingworth JJ, Ashfield R, Clemmensen SB, de Jongh WA, Draper SJ, Higgins MK. Structure of malaria invasion protein RH5 with erythrocyte basigin and blocking antibodies. Nature 2014; 515:427-30. [PMID: 25132548 PMCID: PMC4240730 DOI: 10.1038/nature13715] [Citation(s) in RCA: 169] [Impact Index Per Article: 15.4] [Received: 06/15/2014] [Accepted: 07/28/2014] [Indexed: 12/12/2022]
Abstract
Invasion of host erythrocytes is essential to the life cycle of Plasmodium parasites and development of the pathology of malaria. The stages of erythrocyte invasion, including initial contact, apical reorientation, junction formation, and active invagination, are directed by coordinated release of specialized apical organelles and their parasite protein contents. Among these proteins, and central to invasion by all species, are two parasite protein families, the reticulocyte-binding protein homologue (RH) and erythrocyte-binding like proteins, which mediate host-parasite interactions. RH5 from Plasmodium falciparum (PfRH5) is the only member of either family demonstrated to be necessary for erythrocyte invasion in all tested strains, through its interaction with the erythrocyte surface protein basigin (also known as CD147 and EMMPRIN). Antibodies targeting PfRH5 or basigin efficiently block parasite invasion in vitro, making PfRH5 an excellent vaccine candidate. Here we present crystal structures of PfRH5 in complex with basigin and two distinct inhibitory antibodies. PfRH5 adopts a novel fold in which two three-helical bundles come together in a kite-like architecture, presenting binding sites for basigin and inhibitory antibodies at one tip. This provides the first structural insight into erythrocyte binding by the Plasmodium RH protein family and identifies novel inhibitory epitopes to guide design of a new generation of vaccines against the blood-stage parasite.
Affiliation(s)
- Katherine E Wright
- Department of Biochemistry, University of Oxford, South Parks Road, Oxford OX1 3QU, UK
| | - Kathryn A Hjerrild
- Jenner Institute, University of Oxford, Old Road Campus Research Building, Roosevelt Drive, Oxford OX3 7DQ, UK
| | - Jonathan Bartlett
- Department of Biochemistry, University of Oxford, South Parks Road, Oxford OX1 3QU, UK
| | - Alexander D Douglas
- Jenner Institute, University of Oxford, Old Road Campus Research Building, Roosevelt Drive, Oxford OX3 7DQ, UK
| | - Jing Jin
- Jenner Institute, University of Oxford, Old Road Campus Research Building, Roosevelt Drive, Oxford OX3 7DQ, UK
| | - Rebecca E Brown
- Jenner Institute, University of Oxford, Old Road Campus Research Building, Roosevelt Drive, Oxford OX3 7DQ, UK
| | - Joseph J Illingworth
- Jenner Institute, University of Oxford, Old Road Campus Research Building, Roosevelt Drive, Oxford OX3 7DQ, UK
| | - Rebecca Ashfield
- Jenner Institute, University of Oxford, Old Road Campus Research Building, Roosevelt Drive, Oxford OX3 7DQ, UK
| | - Stine B Clemmensen
- ExpreS2ion Biotechnologies, SCION-DTU Science Park, Agern Allé 1, DK-2970 Horsholm, Denmark
| | - Willem A de Jongh
- ExpreS2ion Biotechnologies, SCION-DTU Science Park, Agern Allé 1, DK-2970 Horsholm, Denmark
| | - Simon J Draper
- Jenner Institute, University of Oxford, Old Road Campus Research Building, Roosevelt Drive, Oxford OX3 7DQ, UK
| | - Matthew K Higgins
- Department of Biochemistry, University of Oxford, South Parks Road, Oxford OX1 3QU, UK
| |
|
44
|
Jones DT, Cozzetto D. DISOPRED3: precise disordered region predictions with annotated protein-binding activity. Bioinformatics 2014; 31:857-63. [PMID: 25391399 PMCID: PMC4380029 DOI: 10.1093/bioinformatics/btu744] [Citation(s) in RCA: 655] [Impact Index Per Article: 59.5] [Indexed: 01/13/2023]
Abstract
Motivation: A sizeable fraction of eukaryotic proteins contain intrinsically disordered regions (IDRs), which act in unfolded states or by undergoing transitions between structured and unstructured conformations. Over time, sequence-based classifiers of IDRs have become fairly accurate and currently a major challenge is linking IDRs to their biological roles from the molecular to the systems level. Results: We describe DISOPRED3, which extends its predecessor with new modules to predict IDRs and protein-binding sites within them. Based on recent CASP evaluation results, DISOPRED3 can be regarded as state of the art in the identification of IDRs, and our self-assessment shows that it significantly improves over DISOPRED2 because its predictions are more specific across the whole board and more sensitive to IDRs longer than 20 amino acids. Predicted IDRs are annotated as protein binding through a novel SVM based classifier, which uses profile data and additional sequence-derived features. Based on benchmarking experiments with full cross-validation, we show that this predictor generates precise assignments of disordered protein binding regions and that it compares well with other publicly available tools. Availability and implementation: http://bioinf.cs.ucl.ac.uk/disopred Contact: d.t.jones@ucl.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- David T Jones
- Bioinformatics Group, Department of Computer Science, University College London, Gower Street, London WC1E 6BT, UK
| | - Domenico Cozzetto
- Bioinformatics Group, Department of Computer Science, University College London, Gower Street, London WC1E 6BT, UK
| |
|
45
|
Hashi Y, Kawai G, Kotani S. Microtubule-associated protein (MAP) 4 interacts with microtubules in an intrinsically disordered manner. Biosci Biotechnol Biochem 2014; 78:1864-70. [DOI: 10.1080/09168451.2014.940836] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 10/25/2022]
Abstract
We previously used nuclear magnetic resonance (NMR) to analyze the structure of a synthetic tricosapeptide corresponding to an active site of microtubule-associated protein 4 (MAP4). To further the structural analysis, we have constructed a minimal active domain fragment of MAP4, encompassing the entire active site, and obtained its NMR spectra. The secondary structure prediction using partially assigned NMR data suggested that the fragment is largely unfolded. Two other independent techniques also demonstrated its unfolded nature, indicating that MAP4 belongs to the class of intrinsically disordered proteins (IDPs). The NMR spectra of the fragment-microtubule mixture revealed that the fragment binds to the microtubule using multiple binding sites, apparently contradicting our previous quantitative studies. Given that MAP4 is intrinsically disordered, we propose a mechanism in which any one of the binding sites is active at a time, which is one of the typical interaction mechanisms proposed for IDPs.
Affiliation(s)
- Yurika Hashi
- Faculty of Science, Department of Biological Sciences, Kanagawa University, Hiratsuka, Japan
| | - Gota Kawai
- Faculty of Engineering, Department of Life and Environmental Sciences, Chiba Institute of Technology, Narashino, Japan
| | - Susumu Kotani
- Faculty of Science, Department of Biological Sciences, Kanagawa University, Hiratsuka, Japan
| |
|
46
|
Walsh I, Giollo M, Di Domenico T, Ferrari C, Zimmermann O, Tosatto SCE. Comprehensive large-scale assessment of intrinsic protein disorder. Bioinformatics 2014; 31:201-8. [PMID: 25246432 DOI: 10.1093/bioinformatics/btu625] [Citation(s) in RCA: 128] [Impact Index Per Article: 11.6] [Indexed: 12/23/2022]
Abstract
MOTIVATION Intrinsically disordered regions are key for the function of numerous proteins. Due to the difficulties in experimental disorder characterization, many computational predictors have been developed with various disorder flavors. Their performance is generally measured on small sets mainly from experimentally solved structures, e.g. Protein Data Bank (PDB) chains. MobiDB has only recently started to collect disorder annotations from multiple experimental structures. RESULTS MobiDB annotates disorder for UniProt sequences, allowing us to conduct the first large-scale assessment of fast disorder predictors on 25 833 different sequences with X-ray crystallographic structures. In addition to a comprehensive ranking of predictors, this analysis produced the following interesting observations. (i) The predictors cluster according to their disorder definition, with a consensus giving more confidence. (ii) Previous assessments appear over-reliant on data annotated at the PDB chain level and performance is lower on entire UniProt sequences. (iii) Long disordered regions are harder to predict. (iv) Depending on the structural and functional types of the proteins, differences in prediction performance of up to 10% are observed. AVAILABILITY The datasets are available from Web site at URL: http://mobidb.bio.unipd.it/lsd. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ian Walsh
- Department of Biomedical Sciences, Department of Information Engineering, University of Padua, Via Gradenigo 6, 35121 Padova, Italy and Institute for Advanced Simulation, Forschungszentrum Juelich, Wilhelm-Johnen-Str., 52425 Juelich, Germany
| | - Manuel Giollo
- Department of Biomedical Sciences, Department of Information Engineering, University of Padua, Via Gradenigo 6, 35121 Padova, Italy and Institute for Advanced Simulation, Forschungszentrum Juelich, Wilhelm-Johnen-Str., 52425 Juelich, Germany
| | - Tomás Di Domenico
- Department of Biomedical Sciences, Department of Information Engineering, University of Padua, Via Gradenigo 6, 35121 Padova, Italy and Institute for Advanced Simulation, Forschungszentrum Juelich, Wilhelm-Johnen-Str., 52425 Juelich, Germany
| | - Carlo Ferrari
- Department of Biomedical Sciences, Department of Information Engineering, University of Padua, Via Gradenigo 6, 35121 Padova, Italy and Institute for Advanced Simulation, Forschungszentrum Juelich, Wilhelm-Johnen-Str., 52425 Juelich, Germany
| | - Olav Zimmermann
- Department of Biomedical Sciences, Department of Information Engineering, University of Padua, Via Gradenigo 6, 35121 Padova, Italy and Institute for Advanced Simulation, Forschungszentrum Juelich, Wilhelm-Johnen-Str., 52425 Juelich, Germany
| | - Silvio C E Tosatto
- Department of Biomedical Sciences, Department of Information Engineering, University of Padua, Via Gradenigo 6, 35121 Padova, Italy and Institute for Advanced Simulation, Forschungszentrum Juelich, Wilhelm-Johnen-Str., 52425 Juelich, Germany
| |
|
47
|
An overview of the sequence features of N- and C-terminal segments of the human chemokine receptors. Cytokine 2014; 70:141-50. [PMID: 25138014 DOI: 10.1016/j.cyto.2014.07.257] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 04/27/2014] [Revised: 06/21/2014] [Accepted: 07/29/2014] [Indexed: 01/10/2023]
Abstract
Chemokine receptors play a crucial role in cellular signaling by engaging extracellular ligands, the chemotactic proteins that recruit immune cells. They possess seven trans-membrane helices, an extracellular N-terminal region with three extracellular hydrophilic loops that are important for the search and recognition of specific ligand(s), and an intracellular C-terminal region with three intracellular loops that couple G-proteins. Although the functional aspects of the terminal segments of the extra- and intra-cellular G proteins are universally identified, the molecular basis on which they rest is still unclear because they cannot be defined by means of X-rays due to their high mobility and are not easy to study in the membrane. The purpose of this work is to define which physical-chemical properties of the terminal segments of the human chemokine receptors underlie their functional mechanisms. Therefore, we have evaluated their physical-chemical properties in terms of amino acid composition, local flexibility, disorder propensity, net charge distribution and putative sites of post-translational modifications. Our results support the conclusion that all 19 C-terminal and N-terminal segments of human chemokine receptors are very flexible due to the systematic presence of intrinsic disorder. Although the purpose of this plasticity clearly appears to be that of controlling and modulating the binding of ligands, we provide evidence that the overlap of linearly charged stretches, intrinsic disorder and post-translational modification sites, consistently found in these motifs, is a necessary feature to exert the function.
The role of the intrinsic disorder has been discussed considering the structural information coming from intrinsically disordered model compounds, which support the view that the chemokine terminals have to be considered as strong polyampholytes or polyelectrolytes where conformational ensembles and structural transitions between them are modulated by charge fraction variations. The role of post-translational modifications has also been found to be coherent with this view because, by changing the charge fraction, they guide structural transitions between ensembles. Moreover, we have also considered our results from an evolutionary point of view in order to understand whether the features found in humans are also present in other species. Our data showed that the structural features of the human terminals of the chemokine receptors are shared and evolutionarily conserved, particularly among mammals. This means that the various organisms not only tolerate but select intrinsic disorder for the terminal regions of their receptors, reflecting constraints that point to molecular recognition. In conclusion, the terminal segments of chemokine receptors must be considered as strong polyampholytes where the charge fraction variations induced by post-translational modifications are the driving physico-chemical feature able to adapt the conformations of the terminal segments to their functions.
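The charge-fraction idea in this abstract can be made concrete with a small sketch. It is not the authors' procedure: it uses a common simplification in which K and R count as +1, D and E count as -1 and histidine is treated as neutral. The function name `charge_fractions` and the `extra_negative` parameter are hypothetical; the latter is a crude stand-in for modifications such as phosphorylation, each assumed to add one negative charge.

```python
POSITIVE = set("KR")  # assumed +1 each
NEGATIVE = set("DE")  # assumed -1 each; histidine treated as neutral


def charge_fractions(segment: str, extra_negative: int = 0) -> tuple[float, float]:
    """Return (fraction of charged residues, net charge per residue).

    extra_negative: hypothetical count of modification sites, each
    assumed to contribute one additional negative charge.
    """
    seg = segment.upper()
    if not seg:
        raise ValueError("segment must not be empty")
    pos = sum(aa in POSITIVE for aa in seg)
    neg = sum(aa in NEGATIVE for aa in seg) + extra_negative
    n = len(seg)
    return (pos + neg) / n, (pos - neg) / n


if __name__ == "__main__":
    toy = "KKEEDSA"  # made-up segment, not a real receptor terminus
    print(charge_fractions(toy))                    # unmodified
    print(charge_fractions(toy, extra_negative=1))  # one modified site
```

In this toy case, adding one negative charge raises the charged fraction and makes the net charge per residue more negative, which is the kind of charge-fraction shift the abstract attributes to post-translational modifications.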
|
48
|
Li M, Cho SB, Ryu KH. A novel approach for predicting disordered regions in a protein sequence. Osong Public Health Res Perspect 2014; 5:211-8. [PMID: 25379372 PMCID: PMC4215001 DOI: 10.1016/j.phrp.2014.06.006] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 06/19/2014] [Revised: 06/24/2014] [Accepted: 06/24/2014] [Indexed: 12/01/2022] Open
Abstract
OBJECTIVES A number of published predictors are based on various algorithms and disordered protein sequence properties. Although many predictors have been published, the study of protein disordered region prediction is ongoing because different prediction methods can find different disordered regions in a protein sequence. METHODS We therefore used a new approach to find more varied disordered regions for more efficient and accurate prediction of protein structures. In this study, we propose a novel approach called "emerging subsequence (ES) mining" that does not use the characteristics of the disordered protein. First, we adapted the approach to generate emerging protein subsequences from public protein sequence data. Second, the disordered and ordered regions in a protein sequence were predicted by searching for the generated emerging protein subsequences with a sliding window, whose matches tend to overlap. Third, the scores of the overlapping regions were calculated based on support and growth rate values in both classes. Finally, the score of the predicted regions in the target class was compared with the score in the source class, and the class with the higher score was selected. RESULTS In this experiment, disordered sequence data and ordered sequence data were extracted from DisProt 6.02 and PDB, respectively, and used as training data. The test data come from CASP 9 and CASP 10, where disordered and ordered regions are known. CONCLUSION Compared with several published predictors, the method achieved higher accuracy in these experiments.
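The support and growth-rate values mentioned in this abstract can be sketched using the usual emerging-pattern definitions. This is an assumption: the abstract does not give its formulas, so here support is the fraction of sequences in a class that contain a subsequence, and growth rate is the ratio of target-class support to source-class support. The function names and the toy sequences are hypothetical.

```python
def support(pattern: str, sequences: list[str]) -> float:
    """Fraction of sequences that contain `pattern` as a contiguous substring."""
    if not sequences:
        return 0.0
    return sum(pattern in seq for seq in sequences) / len(sequences)


def growth_rate(pattern: str, target: list[str], source: list[str]) -> float:
    """Ratio of support in the target class to support in the source class."""
    s_target = support(pattern, target)
    s_source = support(pattern, source)
    if s_source == 0.0:
        # Pattern absent from the source class: infinite growth if it
        # occurs in the target class at all, otherwise zero.
        return float("inf") if s_target > 0.0 else 0.0
    return s_target / s_source


if __name__ == "__main__":
    # Made-up toy data, not taken from DisProt or PDB.
    disordered = ["SSPEKS", "PEKSSG", "GSPEQQ"]
    ordered = ["VLIVAF", "PEVLIW", "AFWVLI"]
    print(support("PE", disordered), support("PE", ordered))
    print(growth_rate("PE", disordered, ordered))
```

A subsequence with a high growth rate toward the disordered class occurs much more often in disordered training sequences than in ordered ones, which is what would make it useful as evidence when scoring a sliding window.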
Affiliation(s)
- Meijing Li
- Database/Bioinformatics Laboratory, Chungbuk National University, Cheongju, Korea
| | - Seong Beom Cho
- Division of Bio-Medical Informatics, Center for Genome Science, Korea National Institute of Health, Cheongju, Korea
| | - Keun Ho Ryu
- Database/Bioinformatics Laboratory, Chungbuk National University, Cheongju, Korea
| |
|
49
|
Evidence supporting the existence of a NUPR1-like family of helix-loop-helix chromatin proteins related to, yet distinct from, AT hook-containing HMG proteins. J Mol Model 2014; 20:2357. [PMID: 25056123 PMCID: PMC4139591 DOI: 10.1007/s00894-014-2357-7] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 11/25/2013] [Accepted: 06/15/2014] [Indexed: 12/29/2022]
Abstract
NUPR1, a small chromatin protein, plays a critical role in cancer development, progression, and resistance to therapy. Here, using a combination of structural bioinformatics and molecular modeling methods, we report several novel findings that enhance our understanding of the biochemical function of this protein. We find that NUPR1 has been conserved throughout evolution, and over time it has undergone duplications and transpositions to form other transcriptional regulators. Using threading, homology-based molecular modeling, molecular mechanics calculations, and molecular dynamics simulations, we generated structural models for four of these proteins: NUPR1a, NUPR1b, NUPR2, and the NUPR-like domain of GTF2-I. Comparative analyses of these models combined with extensive linear motif identification reveal that these four proteins, though similar in their propensities for folding, differ in size, surface changes, and sites amenable for posttranslational modification. Lastly, taking NUPR1a as the paradigm for this family, we built models of a NUPR–DNA complex. Additional structural comparisons revealed that NUPR1 defines a new family of small-groove-binding proteins that share structural features with, yet are distinct from, helix-loop-helix AT-hook-containing HMG proteins. These models and inferences should lead to a better understanding of the function of this group of chromatin proteins, which play a critical role in the development of human malignant diseases.
|
50
|
Chen YA, Murakami Y, Ahmad S, Yoshimaru T, Katagiri T, Mizuguchi K. Brefeldin A-inhibited guanine nucleotide-exchange protein 3 (BIG3) is predicted to interact with its partner through an ARM-type α-helical structure. BMC Res Notes 2014; 7:435. [PMID: 24997568 PMCID: PMC4096751 DOI: 10.1186/1756-0500-7-435] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 04/23/2014] [Accepted: 06/30/2014] [Indexed: 12/21/2022] Open
Abstract
Background Brefeldin A-inhibited guanine nucleotide-exchange protein 3 (BIG3) has been identified recently as a novel regulator of estrogen signalling in breast cancer cells. Despite being a potential target for new breast cancer treatment, its amino acid sequence suggests no association with any well-characterized protein family and provides few clues as to its molecular function. In this paper, we predicted the structure, function and interactions of BIG3 using a range of bioinformatic tools. Results Homology search results showed that BIG3 had distinct features from its paralogues, BIG1 and BIG2, with a unique region between the two shared domains, Sec7 and DUF1981. Although BIG3 contains a Sec7 domain, the lack of the conserved motif and the critical glutamate residue suggested no potential guaninyl-exchange factor (GEF) activity. Fold recognition tools predicted BIG3 to adopt an α-helical repeat structure similar to that of the armadillo (ARM) family. Using state-of-the-art methods, we predicted interaction sites between BIG3 and its partner PHB2. Conclusions The combined results of the structure and interaction prediction led to a novel hypothesis that one of the predicted helices of BIG3 might play an important role in binding to PHB2 and thereby preventing its translocation to the nucleus. This hypothesis has been subsequently verified experimentally.
Affiliation(s)
- Kenji Mizuguchi
- National Institute of Biomedical Innovation, 7-6-8 Saito-asagi, Ibaraki city, Osaka 567-0085, Japan.
| |
|