51
Tang YJ, Pang YH, Liu B. DeepIDP-2L: protein intrinsically disordered region prediction by combining convolutional attention network and hierarchical attention network. Bioinformatics 2022; 38:1252-1260. [PMID: 34864847 DOI: 10.1093/bioinformatics/btab810] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 06/17/2021] [Revised: 11/02/2021] [Accepted: 11/26/2021] [Indexed: 01/05/2023] Open
Abstract
MOTIVATION: Intrinsically disordered regions (IDRs) are widely distributed in proteins. Accurate prediction of IDRs is critical for protein structure and function analysis. IDRs are divided into long disordered regions (LDRs) and short disordered regions (SDRs) according to their lengths. Previous studies have shown that LDRs and SDRs have different properties. However, existing computational methods fail to extract different features for LDRs and SDRs separately. As a result, they achieve unstable performance on datasets with different ratios of LDRs and SDRs. RESULTS: In this study, a two-layer predictor called DeepIDP-2L is proposed. In the first layer, two kinds of attention-based models are used to extract different features for LDRs and SDRs, respectively. A hierarchical attention network is used to capture the distribution pattern features of LDRs, and a convolutional attention network is used to capture the local correlation features of SDRs. The second layer of DeepIDP-2L maps the features extracted in the first layer into a new feature space. A convolutional network and a bidirectional long short-term memory network are used to capture the local and long-range information for predicting both SDRs and LDRs. Experimental results show that DeepIDP-2L achieves more stable performance than other existing predictors on independent test sets with different ratios of SDRs and LDRs. AVAILABILITY AND IMPLEMENTATION: For the convenience of experimental scientists, a user-friendly and publicly accessible web server for the new predictor has been established at http://bliulab.net/DeepIDP-2L/. It is anticipated that DeepIDP-2L will become a very useful tool for identification of intrinsically disordered regions. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
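The length-based division of IDRs into LDRs and SDRs mentioned in this abstract can be illustrated with a short sketch. This is not the DeepIDP-2L code: the function names, the '0'/'1' label-string encoding, and the 30-residue cutoff are assumptions made for illustration, since the abstract does not state the cutoff.

```python
from typing import List, Tuple

Region = Tuple[int, int]  # half-open span: (start, end)


def disordered_regions(labels: str) -> List[Region]:
    """Return spans of consecutive '1' characters in a per-residue label string.

    Assumed encoding (hypothetical): '1' = disordered residue, anything else = ordered.
    """
    regions: List[Region] = []
    start = None
    for i, ch in enumerate(labels):
        if ch == "1" and start is None:
            start = i                      # a disordered region begins here
        elif ch != "1" and start is not None:
            regions.append((start, i))     # the region ended at the previous residue
            start = None
    if start is not None:                  # region runs to the end of the sequence
        regions.append((start, len(labels)))
    return regions


def split_by_length(regions: List[Region], threshold: int = 30):
    """Split regions into (long, short) lists using an assumed length cutoff."""
    long_regions = [r for r in regions if r[1] - r[0] >= threshold]
    short_regions = [r for r in regions if r[1] - r[0] < threshold]
    return long_regions, short_regions


if __name__ == "__main__":
    # Toy annotation: one 4-residue region and one 35-residue region.
    labels = "0" * 5 + "1" * 4 + "0" * 3 + "1" * 35 + "0" * 2
    ldrs, sdrs = split_by_length(disordered_regions(labels))
    print("LDRs:", ldrs)  # [(12, 47)]
    print("SDRs:", sdrs)  # [(5, 9)]
```

Counting the regions in each list gives the LDR-to-SDR ratio of a dataset, which is the quantity the abstract says predictor performance is sensitive to.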
Affiliation(s)
- Yi-Jun Tang
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
- Yi-He Pang
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
- Bin Liu
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China; Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing 100081, China
52
Rosa E Silva I, Binó L, Johnson CM, Rutherford TJ, Neuhaus D, Andreeva A, Čajánek L, van Breugel M. Molecular mechanisms underlying the role of the centriolar CEP164-TTBK2 complex in ciliopathies. Structure 2022; 30:114-128.e9. [PMID: 34499853 PMCID: PMC8752127 DOI: 10.1016/j.str.2021.08.007] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 04/30/2021] [Revised: 07/19/2021] [Accepted: 08/17/2021] [Indexed: 02/06/2023]
Abstract
Cilia formation is essential for human life. One of the earliest events in the ciliogenesis program is the recruitment of tau-tubulin kinase 2 (TTBK2) by the centriole distal appendage component CEP164. Due to the lack of high-resolution structural information on this complex, it is unclear how it is affected in human ciliopathies such as nephronophthisis. Furthermore, it is poorly understood if binding to CEP164 influences TTBK2 activities. Here, we present a detailed biochemical, structural, and functional analysis of the CEP164-TTBK2 complex and demonstrate how it is compromised by two ciliopathic mutations in CEP164. Moreover, we also provide insights into how binding to CEP164 is coordinated with TTBK2 activities. Together, our data deepen our understanding of a crucial step in cilia formation and will inform future studies aimed at restoring CEP164 functionality in a debilitating human ciliopathy.
Affiliation(s)
- Ivan Rosa E Silva
- Queen Mary University of London, School of Biological and Chemical Sciences, 2 Newark Street, London E1 2AT, UK; Medical Research Council - Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, UK.
- Lucia Binó
- Department of Histology and Embryology, Faculty of Medicine, Masaryk University, Kamenice 5, Brno 62500, Czech Republic
- Christopher M Johnson
- Medical Research Council - Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, UK
- Trevor J Rutherford
- Medical Research Council - Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, UK
- David Neuhaus
- Medical Research Council - Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, UK
- Antonina Andreeva
- Medical Research Council - Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, UK
- Lukáš Čajánek
- Department of Histology and Embryology, Faculty of Medicine, Masaryk University, Kamenice 5, Brno 62500, Czech Republic
- Mark van Breugel
- Queen Mary University of London, School of Biological and Chemical Sciences, 2 Newark Street, London E1 2AT, UK; Medical Research Council - Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, UK.
53
Tamburrini KC, Pesce G, Nilsson J, Gondelaud F, Kajava AV, Berrin JG, Longhi S. Predicting Protein Conformational Disorder and Disordered Binding Sites. Methods Mol Biol 2022; 2449:95-147. [PMID: 35507260 DOI: 10.1007/978-1-0716-2095-3_4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
In the last two decades it has become increasingly evident that a large number of proteins adopt either a fully or a partially disordered conformation. Intrinsically disordered proteins are ubiquitous proteins that fulfill essential biological functions while lacking a stable 3D structure. Their conformational heterogeneity is encoded by the amino acid sequence, thereby allowing intrinsically disordered proteins or regions to be recognized based on their sequence properties. The identification of disordered regions facilitates the functional annotation of proteins and is instrumental for delineating boundaries of protein domains amenable to crystallization. This chapter focuses on the methods currently employed for predicting protein disorder and identifying intrinsically disordered binding sites.
Affiliation(s)
- Ketty C Tamburrini
- Aix Marseille Univ, CNRS, Architecture et Fonction des Macromolécules Biologiques, AFMB, UMR 7257, Marseille, France
- INRAE, Aix Marseille Univ, Biodiversité et Biotechnologie Fongiques (BBF), UMR 1163, Marseille, France
- Giulia Pesce
- Aix Marseille Univ, CNRS, Architecture et Fonction des Macromolécules Biologiques, AFMB, UMR 7257, Marseille, France
- Juliet Nilsson
- Aix Marseille Univ, CNRS, Architecture et Fonction des Macromolécules Biologiques, AFMB, UMR 7257, Marseille, France
- Frank Gondelaud
- Aix Marseille Univ, CNRS, Architecture et Fonction des Macromolécules Biologiques, AFMB, UMR 7257, Marseille, France
- Andrey V Kajava
- Centre de Recherche en Biologie cellulaire de Montpellier, UMR 5237, CNRS, Université Montpellier, Montpellier, France
- Jean-Guy Berrin
- INRAE, Aix Marseille Univ, Biodiversité et Biotechnologie Fongiques (BBF), UMR 1163, Marseille, France
- Sonia Longhi
- Aix Marseille Univ, CNRS, Architecture et Fonction des Macromolécules Biologiques, AFMB, UMR 7257, Marseille, France.
54
Abstract
INTRODUCTION: The intrinsic disorder prediction field develops, assesses, and deploys computational predictors of disorder in protein sequences, and constructs and disseminates databases of these predictions. Over 40 years of research has resulted in the release of numerous resources. AREAS COVERED: We identify and briefly summarize the most comprehensive collection to date of over 100 disorder predictors. We focus on their predictive models, availability and predictive performance. We categorize and study them from a historical point of view to highlight informative trends. EXPERT OPINION: We find a consistent trend of improvements in predictive quality as newer and more advanced predictors are developed. The original focus on machine learning methods shifted to meta-predictors in the early 2010s, followed by a recent transition to deep learning. The use of deep learners will continue in the foreseeable future given the recent and convincing success of these methods. Moreover, a broad range of resources that facilitate convenient collection of accurate disorder predictions is available to users. They include web servers and standalone programs for disorder prediction, servers that combine prediction of disorder and disorder functions, and large databases of pre-computed predictions. We also point to the need to address the shortage of accurate methods that predict disordered binding regions.
Affiliation(s)
- Bi Zhao
- Department of Computer Science, Virginia Commonwealth University, Richmond, Virginia, USA
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, Virginia, USA
55
Katuwawala A, Zhao B, Kurgan L. DisoLipPred: accurate prediction of disordered lipid-binding residues in protein sequences with deep recurrent networks and transfer learning. Bioinformatics 2021; 38:115-124. [PMID: 34487138 DOI: 10.1093/bioinformatics/btab640] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 05/31/2021] [Revised: 08/05/2021] [Accepted: 09/02/2021] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION: Intrinsically disordered protein regions interact with proteins, nucleic acids and lipids. Regions that bind lipids are implicated in a wide spectrum of cellular functions and several human diseases. Motivated by the growing amount of experimental data for these interactions and the lack of tools that can predict them from the protein sequence, we develop DisoLipPred, the first predictor of disordered lipid-binding residues (DLBRs). RESULTS: DisoLipPred relies on a deep bidirectional recurrent network that implements three innovative features: transfer learning, a bypass module that sidesteps predictions for putative structured residues, and expanded inputs that cover physicochemical properties associated with protein-lipid interactions. Ablation analysis shows that these features drive the predictive quality of DisoLipPred. Tests on an independent test dataset and the yeast proteome reveal that DisoLipPred generates accurate results and that none of the related existing tools can be used to indirectly identify DLBRs. We also show that DisoLipPred's predictions complement the results generated by predictors of transmembrane regions. Altogether, we conclude that DisoLipPred provides high-quality predictions of DLBRs that complement the currently available methods. AVAILABILITY AND IMPLEMENTATION: DisoLipPred's webserver is available at http://biomine.cs.vcu.edu/servers/DisoLipPred/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Akila Katuwawala
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
- Bi Zhao
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
56
Payliss BJ, Patel A, Sheppard AC, Wyatt HDM. Exploring the Structures and Functions of Macromolecular SLX4-Nuclease Complexes in Genome Stability. Front Genet 2021; 12:784167. [PMID: 34804132 PMCID: PMC8599992 DOI: 10.3389/fgene.2021.784167] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 09/27/2021] [Accepted: 10/21/2021] [Indexed: 12/15/2022] Open
Abstract
All organisms depend on the ability of cells to accurately duplicate and segregate DNA into progeny. However, DNA is frequently damaged by factors in the environment and from within cells. One of the most dangerous lesions is a DNA double-strand break. Unrepaired breaks are a major driving force for genome instability. Cells contain sophisticated DNA repair networks to counteract the harmful effects of genotoxic agents, thus safeguarding genome integrity. Homologous recombination is a high-fidelity, template-dependent DNA repair pathway essential for the accurate repair of DNA nicks, gaps and double-strand breaks. Accurate homologous recombination depends on the ability of cells to remove branched DNA structures that form during repair, which is achieved through the opposing actions of helicases and structure-selective endonucleases. This review focuses on a structure-selective endonuclease called SLX1-SLX4 and the macromolecular endonuclease complexes that assemble on the SLX4 scaffold. First, we discuss recent developments that illuminate the structure and biochemical properties of this somewhat atypical structure-selective endonuclease. We then summarize the multifaceted roles that are fulfilled by human SLX1-SLX4 and its associated endonucleases in homologous recombination and genome stability. Finally, we discuss recent work on SLX4-binding proteins that may represent integral components of these macromolecular nuclease complexes, emphasizing the structure and function of a protein called SLX4IP.
Affiliation(s)
- Brandon J Payliss
- Department of Biochemistry, Temerty Faculty of Medicine, University of Toronto, Toronto, ON, Canada
- Ayushi Patel
- Department of Biochemistry, Temerty Faculty of Medicine, University of Toronto, Toronto, ON, Canada
- Anneka C Sheppard
- Department of Biochemistry, Temerty Faculty of Medicine, University of Toronto, Toronto, ON, Canada
- Haley D M Wyatt
- Department of Biochemistry, Temerty Faculty of Medicine, University of Toronto, Toronto, ON, Canada; Canada Research Chairs Program, Temerty Faculty of Medicine, University of Toronto, Toronto, ON, Canada
57
Hu G, Katuwawala A, Wang K, Wu Z, Ghadermarzi S, Gao J, Kurgan L. flDPnn: Accurate intrinsic disorder prediction with putative propensities of disorder functions. Nat Commun 2021; 12:4438. [PMID: 34290238 PMCID: PMC8295265 DOI: 10.1038/s41467-021-24773-7] [Citation(s) in RCA: 182] [Impact Index Per Article: 45.5] [Received: 04/07/2021] [Accepted: 07/06/2021] [Indexed: 01/05/2023] Open
Abstract
Identification of intrinsic disorder in proteins relies in large part on computational predictors, which demands that their accuracy should be high. Since intrinsic disorder carries out a broad range of cellular functions, it is desirable to couple the disorder and disorder function predictions. We report a computational tool, flDPnn, that provides accurate, fast and comprehensive disorder and disorder function predictions from protein sequences. The recent Critical Assessment of protein Intrinsic Disorder prediction (CAID) experiment and results on other test datasets demonstrate that flDPnn offers accurate predictions of disorder, fully disordered proteins and four common disorder functions. These predictions are substantially better than the results of the existing disorder predictors and methods that predict functions of disorder. Ablation tests reveal that the high predictive performance stems from innovative ways used in flDPnn to derive sequence profiles and encode inputs. flDPnn's webserver is available at http://biomine.cs.vcu.edu/servers/flDPnn/.
Affiliation(s)
- Gang Hu
- School of Statistics and Data Science, LPMC and KLMDASR, Nankai University, Tianjin, China
- Akila Katuwawala
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
- Kui Wang
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin, China
- Zhonghua Wu
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin, China
- Sina Ghadermarzi
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
- Jianzhao Gao
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin, China
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA.
58
Coates HW, Capell-Hattam IM, Brown AJ. The mammalian cholesterol synthesis enzyme squalene monooxygenase is proteasomally truncated to a constitutively active form. J Biol Chem 2021; 296:100731. [PMID: 33933449 PMCID: PMC8166775 DOI: 10.1016/j.jbc.2021.100731] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 10/16/2020] [Revised: 04/24/2021] [Accepted: 04/28/2021] [Indexed: 02/06/2023] Open
Abstract
Squalene monooxygenase (SM, also known as squalene epoxidase) is a rate-limiting enzyme of cholesterol synthesis that converts squalene to monooxidosqualene and is oncogenic in numerous cancer types. SM is subject to feedback regulation via cholesterol-induced proteasomal degradation, which depends on its lipid-sensing N-terminal regulatory domain. We previously identified an endogenous truncated form of SM with a similar abundance to full-length SM, but whether this truncated form is functional or subject to the same regulatory mechanisms as full-length SM is not known. Here, we show that truncated SM differs from full-length SM in two major ways: it is cholesterol resistant and adopts a peripheral rather than integral association with the endoplasmic reticulum membrane. However, truncated SM retains full SM activity and is therefore constitutively active. Truncation of SM occurs during its endoplasmic reticulum–associated degradation and requires the proteasome, which partially degrades the SM N-terminus and disrupts cholesterol-sensing elements within the regulatory domain. Furthermore, truncation relies on a ubiquitin signal that is distinct from that required for cholesterol-induced degradation. Using mutagenesis, we demonstrate that partial proteasomal degradation of SM depends on both an intrinsically disordered region near the truncation site and the stability of the adjacent catalytic domain, which escapes degradation. These findings uncover an additional layer of complexity in the post-translational regulation of cholesterol synthesis and establish SM as the first eukaryotic enzyme found to undergo proteasomal truncation.
Affiliation(s)
- Hudson W Coates
- School of Biotechnology and Biomolecular Sciences, UNSW Sydney, Sydney, NSW, Australia
- Andrew J Brown
- School of Biotechnology and Biomolecular Sciences, UNSW Sydney, Sydney, NSW, Australia.
59
Using a low correlation high orthogonality feature set and machine learning methods to identify plant pentatricopeptide repeat coding gene/protein. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2020.02.079] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/22/2022]
60
Tang YJ, Pang YH, Liu B. IDP-Seq2Seq: identification of intrinsically disordered regions based on sequence to sequence learning. Bioinformatics 2021; 36:5177-5186. [PMID: 32702119 DOI: 10.1093/bioinformatics/btaa667] [Citation(s) in RCA: 107] [Impact Index Per Article: 26.8] [Received: 05/01/2020] [Revised: 06/21/2020] [Accepted: 07/17/2020] [Indexed: 12/29/2022] Open
Abstract
MOTIVATION: Related to many important biological functions, intrinsically disordered regions (IDRs) are widely distributed in proteins. Accurate prediction of IDRs is critical for protein structure and function analysis. However, existing computational methods construct their predictive models solely in the sequence space, failing to convert the sequence space into a 'semantic space' that reflects the structural characteristics of proteins. Furthermore, although length-dependent predictors have shown promising results, new fusion strategies should be explored to improve their predictive performance and generalization. RESULTS: In this study, we applied Sequence to Sequence Learning (Seq2Seq), derived from natural language processing (NLP), to map protein sequences to a 'semantic space' that reflects structural patterns, with the help of predicted residue-residue contacts (CCMs) and other sequence-based features. Furthermore, the Attention mechanism was used to capture the global associations between all residue pairs in the proteins. Three length-dependent predictors were constructed: IDP-Seq2Seq-L for long disordered region prediction, IDP-Seq2Seq-S for short disordered region prediction and IDP-Seq2Seq-G for both long and short disordered region prediction. Finally, these three predictors were fused into one predictor called IDP-Seq2Seq to improve discriminative power and generalization. Experimental results on four independent test datasets and the CASP test dataset showed that IDP-Seq2Seq is insensitive to the ratio of long to short disordered regions and outperforms other competing methods. AVAILABILITY AND IMPLEMENTATION: For the convenience of experimental scientists, a user-friendly and publicly accessible web server for the new predictor has been established at http://bliulab.net/IDP-Seq2Seq/. It is anticipated that IDP-Seq2Seq will become a very useful tool for identification of IDRs. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
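The abstract states that three length-dependent predictors are fused into one but does not say how. As a purely illustrative sketch of what score-level fusion can look like, the snippet below combines per-residue disorder scores with a weighted average. The averaging rule, the function name `fuse_scores`, and the example scores are all assumptions; this is not the fusion strategy of IDP-Seq2Seq.

```python
from typing import List, Optional, Sequence


def fuse_scores(
    score_sets: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> List[float]:
    """Fuse per-residue scores from several predictors by weighted averaging.

    score_sets: one score list per predictor, all of the same length.
    weights:    optional non-negative weight per predictor (default: equal weights).
    """
    if not score_sets:
        raise ValueError("at least one predictor is required")
    n_residues = len(score_sets[0])
    if any(len(s) != n_residues for s in score_sets):
        raise ValueError("all predictors must score the same number of residues")
    if weights is None:
        weights = [1.0] * len(score_sets)
    if len(weights) != len(score_sets):
        raise ValueError("need exactly one weight per predictor")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Weighted mean of the predictors' scores at each residue position.
    return [
        sum(w * s[i] for w, s in zip(weights, score_sets)) / total
        for i in range(n_residues)
    ]


if __name__ == "__main__":
    # Hypothetical scores for two residues from three predictors
    # (stand-ins for a long-region, a short-region and a general model).
    long_model = [0.9, 0.2]
    short_model = [0.7, 0.4]
    general_model = [0.8, 0.0]
    print(fuse_scores([long_model, short_model, general_model]))
```

Unequal weights would let one model dominate, for example when a dataset is known to be rich in long regions; how the published predictor weights its components is not described in the abstract.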
Affiliation(s)
- Yi-Jun Tang
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
- Yi-He Pang
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
- Bin Liu
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China; Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing 100081, China
61
Ying X, Leier A, Marquez-Lago TT, Xie J, Jimeno Yepes AJ, Whisstock JC, Wilson C, Song J. Prediction of secondary structure population and intrinsic disorder of proteins using multitask deep learning. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2021; 2020:1325-1334. [PMID: 33936509 PMCID: PMC8075420] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2023]
Abstract
Recent research in predicting protein secondary structure populations (SSP) based on Nuclear Magnetic Resonance (NMR) chemical shifts has helped quantitatively characterise the structural conformational properties of intrinsically disordered proteins and regions (IDP/IDR). Unlike protein secondary structure (SS) prediction, SSP prediction assumes a dynamic assignment of secondary structures that seems to correlate with disordered states. In this study, we designed single-task deep learning frameworks to predict IDP/IDR and SSP separately, and multitask deep learning frameworks to allow quantitative predictions of IDP/IDR supported by the simultaneously predicted SSP. According to independent test results, single-task deep learning models improve on the prediction performance of shallow models for SSP and IDP/IDR. The prediction performance for IDP/IDR improved further when SSP was predicted simultaneously in multitask models. With p53 as a use case, we demonstrate how predicted SSP is used to explain the IDP/IDR predictions for each functional region.
Affiliation(s)
- Xu Ying
- IBM Research Australia, Melbourne, Victoria, Australia
- Andre Leier
- University of Alabama at Birmingham, Birmingham, AL, USA
- Jue Xie
- Monash University, Melbourne, Victoria, Australia
62
Abstract
In recent biomedical studies, multidimensional profiling, which collects proteomics as well as other types of omics data on the same subjects, is getting increasingly popular. Proteomics, transcriptomics, genomics, epigenomics, and other types of data contain overlapping as well as independent information, which suggests the possibility of integrating multiple types of data to generate more reliable findings/models with better classification/prediction performance. In this chapter, a selective review is conducted on recent data integration techniques for both unsupervised and supervised analysis. The main objective is to provide the "big picture" of data integration that involves proteomics data and discuss the "intuition" beneath the recently developed approaches without invoking too many mathematical details. Potential pitfalls and possible directions for future developments are also discussed.
Affiliation(s)
- Mengyun Wu
- School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai, China
- Yu Jiang
- School of Public Health, University of Memphis, Memphis, TN, USA
- Shuangge Ma
- Department of Biostatistics, Yale School of Public Health, Yale University, New Haven, CT, USA.
63
Katuwawala A, Kurgan L. Comparative Assessment of Intrinsic Disorder Predictions with a Focus on Protein and Nucleic Acid-Binding Proteins. Biomolecules 2020; 10:E1636. [PMID: 33291838 PMCID: PMC7762010 DOI: 10.3390/biom10121636] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 10/17/2020] [Revised: 11/26/2020] [Accepted: 12/03/2020] [Indexed: 01/18/2023] Open
Abstract
With over 60 disorder predictors, users need help navigating the predictor selection task. We review 28 surveys of disorder predictors, showing that only 11 include assessment of predictive performance. We identify and address a few drawbacks of these past surveys. To this end, we release a novel benchmark dataset with reduced similarity to the training sets of the considered predictors. We use this dataset to perform a first-of-its-kind comparative analysis that targets two large functional families of disordered proteins that interact with proteins and with nucleic acids. We show that limiting sequence similarity between the benchmark and the training datasets has a substantial impact on predictive performance. We also demonstrate that predictive quality is sensitive to the use of the well-annotated order and inclusion of the fully structured proteins in the benchmark datasets, both of which should be considered in future assessments. We identify three predictors that provide favorable results using the new benchmark set. While we find that VSL2B offers the most accurate and robust results overall, ESpritz-DisProt and SPOT-Disorder perform particularly well for disordered proteins. Moreover, we find that predictions for the disordered protein-binding proteins suffer low predictive quality compared to generic disordered proteins and the disordered nucleic acids-binding proteins. This can be explained by the high disorder content of the disordered protein-binding proteins, which makes it difficult for the current methods to accurately identify ordered regions in these proteins. This finding motivates the development of a new generation of methods that would target these difficult-to-predict disordered proteins. We also discuss resources that support users in collecting and identifying high-quality disorder predictions.
Affiliation(s)
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
64
Banerjee S, Majumder K, Gutierrez GJ, Gupta D, Mittal B. Immuno-informatics approach for multi-epitope vaccine designing against SARS-CoV-2. bioRxiv: the preprint server for biology 2020. [PMID: 32743567 PMCID: PMC7386484 DOI: 10.1101/2020.07.23.218529] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 12/22/2022]
Abstract
The novel Corona Virus Disease 2019 (COVID-19) pandemic has set the fatality rates ablaze across the world. To combat this disease, we have designed a multi-epitope vaccine from various proteins of Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) with an immuno-informatics approach, validated in silico to be stable, non-allergic and antigenic. Cytotoxic T-cell, helper T-cell, and B-cell epitopes were computationally predicted from six conserved protein sequences among four viral strains isolated across the world. The T-cell epitopes overlapping with the B-cell epitopes were included in the vaccine construct to assure humoral and cell-mediated immune responses. The beta-subunit of cholera toxin was added as an adjuvant at the N-terminus of the construct to increase immunogenicity. Interferon-gamma inducing epitopes were also predicted in the vaccine. Molecular docking and binding energetics studies revealed strong interactions of the vaccine with the immune-stimulatory toll-like receptors (TLRs) 2, 3 and 4. Molecular dynamics simulation of the vaccine ensured in vivo stability in the biological system. The immune simulation of the vaccine evinced an elevated immune response. The efficient translation of the vaccine in an expression vector was assured using an in silico cloning approach. Certainly, such a vaccine construct could reliably be effective against COVID-19.
Affiliation(s)
- Souvik Banerjee
- Department of Microbiology, St. Xavier's College (Autonomous), Kolkata
- Kaustav Majumder
- Department of Biosciences and Bioengineering, Indian Institute of Technology, Bombay
- Debkishore Gupta
- Department of Clinical Microbiology and Infection Control, The Calcutta Medical Research Institute and BM Birla Heart Research Centre, Kolkata
65
Marconi G, Aiello D, Kindiger B, Storchi L, Marrone A, Reale L, Terzaroli N, Albertini E. The Role of APOSTART in Switching between Sexuality and Apomixis in Poa pratensis. Genes (Basel) 2020; 11:genes11080941. [PMID: 32824095 PMCID: PMC7464379 DOI: 10.3390/genes11080941] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 06/25/2020] [Revised: 08/11/2020] [Accepted: 08/11/2020] [Indexed: 12/20/2022] Open
Abstract
The production of seeds without sex is considered the holy grail of plant biology. The transfer of apomixis to various crop species has the potential to transform plant breeding, since it will allow new varieties to retain valuable traits through asexual reproduction. Therefore, a greater molecular understanding of apomixis is fundamental. In a previous work we identified a gene, namely APOSTART, that seemed to be involved in this asexual mode of reproduction, which is very common in Poa pratensis L., and here we present a detailed work aimed at clarifying its role in apomixis. In situ hybridization showed that PpAPOSTART is expressed in reproductive tissues from pre-meiosis to embryo development. Interestingly, it is expressed early in a few nucellar cells of apomictic individuals, possibly switching from a somatic to a reproductive cell as in aposporic apomixis. Moreover, out of 13 APOSTART members, we identified one, APOSTART_6, as specifically expressed in flower tissue. APOSTART_6 also exhibited delayed expression in apomictic genotypes when compared with sexual types. Most importantly, the SCAR (Sequence Characterized Amplified Region) derived from the APOSTART_6 sequence completely co-segregated with apomixis.
Affiliation(s)
- Gianpiero Marconi
  - Dipartimento di Scienze Agrarie, Alimentari e Ambientali, Università degli Studi di Perugia, Borgo XX Giugno 74, 06121 Perugia, Italy; (G.M.); (D.A.); (L.R.); (N.T.)
- Domenico Aiello
  - Dipartimento di Scienze Agrarie, Alimentari e Ambientali, Università degli Studi di Perugia, Borgo XX Giugno 74, 06121 Perugia, Italy; (G.M.); (D.A.); (L.R.); (N.T.)
- Bryan Kindiger
  - USDA-ARS, Grazinglands Research Laboratory, 7207 West Cheyenne St., El Reno, OK 73036, USA;
- Loriano Storchi
  - Dipartimento di Farmacia, Università G. d’Annunzio, via dei Vestini 31, 66100 Chieti, Italy; (L.S.); (A.M.)
  - Molecular Discovery Limited, Elstree WD6 3FG, UK
- Alessandro Marrone
  - Dipartimento di Farmacia, Università G. d’Annunzio, via dei Vestini 31, 66100 Chieti, Italy; (L.S.); (A.M.)
- Lara Reale
  - Dipartimento di Scienze Agrarie, Alimentari e Ambientali, Università degli Studi di Perugia, Borgo XX Giugno 74, 06121 Perugia, Italy; (G.M.); (D.A.); (L.R.); (N.T.)
- Niccolò Terzaroli
  - Dipartimento di Scienze Agrarie, Alimentari e Ambientali, Università degli Studi di Perugia, Borgo XX Giugno 74, 06121 Perugia, Italy; (G.M.); (D.A.); (L.R.); (N.T.)
- Emidio Albertini
  - Dipartimento di Scienze Agrarie, Alimentari e Ambientali, Università degli Studi di Perugia, Borgo XX Giugno 74, 06121 Perugia, Italy; (G.M.); (D.A.); (L.R.); (N.T.)
  - Correspondence:
66
Hernández-Segura T, Pastor N. Identification of an α-MoRF in the Intrinsically Disordered Region of the Escargot Transcription Factor. ACS Omega 2020; 5:18331-18341. [PMID: 32743208 PMCID: PMC7392517 DOI: 10.1021/acsomega.0c02051] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/03/2020] [Accepted: 07/02/2020] [Indexed: 06/11/2023]
Abstract
Molecular recognition features (MoRFs) are common in intrinsically disordered proteins (IDPs) and intrinsically disordered regions (IDRs). MoRFs are in constant order-disorder structural transitions and adopt well-defined structures once they are bound to their targets. Here, we study Escargot (Esg), a transcription factor in Drosophila melanogaster that regulates multiple cellular functions, and consists of a disordered N-terminal domain and a group of zinc fingers at its C-terminal domain. We analyzed the N-terminal domain of Esg with disorder predictors and identified a region of 45 amino acids with high probability to form ordered structures, which we named S2. Through 54 μs of molecular dynamics (MD) simulations using CHARMM36 and implicit solvent (generalized Born/surface area (GBSA)), we characterized the conformational landscape of S2 and found an α-MoRF of ∼16 amino acids stabilized by key contacts within the helix. To test the importance of these contacts in the stability of the α-MoRF, we evaluated the effect of point mutations that would impair these interactions, running 24 μs of MD for each mutation. The mutations had mild effects on the MoRF, and in some cases, led to gain of residual structure through long-range contacts of the α-MoRF and the rest of the S2 region. As this could be an effect of the force field and solvent model we used, we benchmarked our simulation protocol by carrying out 32 μs of MD for the (AAQAA)3 peptide. The results of the benchmark indicate that the global amount of helix in shorter peptides like (AAQAA)3 is reasonably predicted. Careful analysis of the runs of S2 and its mutants suggests that the mutation to hydrophobic residues may have nucleated long-range hydrophobic and aromatic interactions that stabilize the MoRF. Finally, we have identified a set of residues that stabilize an α-MoRF in a region still without functional annotations in Esg.
Affiliation(s)
- Teresa Hernández-Segura
  - Laboratorio de Dinámica de Proteínas, Centro de Investigación en Dinámica Celular-IICBA, Universidad Autónoma del Estado de Morelos, Av. Universidad 1001, Chamilpa, 62209 Cuernavaca, México
  - Doctorado en Ciencias CIDC-IICBA, Universidad Autónoma del Estado de Morelos, Cuernavaca 62209, Morelos, México
- Nina Pastor
  - Laboratorio de Dinámica de Proteínas, Centro de Investigación en Dinámica Celular-IICBA, Universidad Autónoma del Estado de Morelos, Av. Universidad 1001, Chamilpa, 62209 Cuernavaca, México
67
Shi Q, Chen W, Huang S, Jin F, Dong Y, Wang Y, Xue Z. DNN-Dom: predicting protein domain boundary from sequence alone by deep neural network. Bioinformatics 2020; 35:5128-5136. [PMID: 31197306 DOI: 10.1093/bioinformatics/btz464] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 02/18/2019] [Revised: 05/07/2019] [Accepted: 06/05/2019] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION Accurate delineation of protein domain boundaries plays an important role in protein engineering and structure prediction. Although machine-learning methods are widely used to predict domain boundaries, these approaches often ignore long-range interactions among residues, which have been proven to improve the prediction performance. However, how to simultaneously model the local and global interactions to further improve domain boundary prediction is still a challenging problem. RESULTS This article employs a hybrid deep learning method that combines convolutional neural network and gated recurrent unit models for domain boundary prediction. It not only captures the local and non-local interactions, but also fuses these features for prediction. Additionally, we adopt balanced Random Forest for classification to deal with the high imbalance of samples and high dimensionality of deep features. Experimental results show that our proposed approach (DNN-Dom) outperforms existing machine-learning-based methods for boundary prediction. We expect that DNN-Dom can be useful for assisting protein structure and function prediction. AVAILABILITY AND IMPLEMENTATION The method is available as DNN-Dom Server at http://isyslab.info/DNN-Dom/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Qiang Shi
  - School of Software Engineering and College of Life Science & Technology, Huazhong University of Science and Technology, Wuhan 430074, China
- Weiya Chen
  - School of Software Engineering and College of Life Science & Technology, Huazhong University of Science and Technology, Wuhan 430074, China
- Siqi Huang
  - School of Software Engineering and College of Life Science & Technology, Huazhong University of Science and Technology, Wuhan 430074, China
- Fanglin Jin
  - School of Software Engineering and College of Life Science & Technology, Huazhong University of Science and Technology, Wuhan 430074, China
- Yinghao Dong
  - School of Software Engineering and College of Life Science & Technology, Huazhong University of Science and Technology, Wuhan 430074, China
- Yan Wang
  - School of Software Engineering and College of Life Science & Technology, Huazhong University of Science and Technology, Wuhan 430074, China
- Zhidong Xue
  - School of Software Engineering and College of Life Science & Technology, Huazhong University of Science and Technology, Wuhan 430074, China
68
Oberti M, Vaisman II. cnnAlpha: Protein disordered regions prediction by reduced amino acid alphabets and convolutional neural networks. Proteins 2020; 88:1472-1481. [PMID: 32535960 DOI: 10.1002/prot.25966] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 01/03/2019] [Revised: 11/18/2019] [Accepted: 06/06/2020] [Indexed: 12/23/2022]
Abstract
Intrinsically disordered regions (IDRs) play an important role in key biological processes and are closely related to human diseases. IDRs have great potential to serve as targets for drug discovery, most notably in disordered binding regions. Accurate prediction of IDRs is challenging because their genome-wide occurrence and a low ratio of disordered residues make them difficult targets for traditional classification techniques. Existing computational methods mostly rely on sequence profiles to improve accuracy, which is time-consuming and computationally expensive. This article describes an ab initio sequence-only prediction method, based on reduced amino acid alphabets and convolutional neural networks (CNNs), which tries to overcome the challenge of accurate prediction posed by IDRs. We experiment with six different 3-letter reduced alphabets. We argue that the dimensional reduction in the input alphabet facilitates the detection of complex patterns within the sequence by the convolutional step. Experimental results show that our proposed IDR predictor performs at the same level as or outperforms other state-of-the-art methods in the same class, achieving an accuracy of 0.76 and an AUC of 0.85 on the publicly available Critical Assessment of protein Structure Prediction dataset (CASP10). Therefore, our method is suitable for proteome-wide disorder prediction, yielding similar or better accuracy than existing approaches at a faster speed.
Affiliation(s)
- Mauricio Oberti
  - School of Systems Biology, George Mason University, Manassas, Virginia, USA; Novartis Institutes for BioMedical Research, Cambridge, Massachusetts, USA
- Iosif I Vaisman
  - School of Systems Biology, George Mason University, Manassas, Virginia, USA
69
Hanson J, Paliwal KK, Litfin T, Zhou Y. SPOT-Disorder2: Improved Protein Intrinsic Disorder Prediction by Ensembled Deep Learning. Genomics Proteomics & Bioinformatics 2020; 17:645-656. [PMID: 32173600 PMCID: PMC7212484 DOI: 10.1016/j.gpb.2019.01.004] [Citation(s) in RCA: 87] [Impact Index Per Article: 17.4] [Received: 10/26/2018] [Revised: 01/18/2019] [Accepted: 02/15/2019] [Indexed: 01/13/2023]
Abstract
Intrinsically disordered or unstructured proteins (or regions in proteins) have been found to be important in a wide range of biological functions and implicated in many diseases. Due to the high cost and low efficiency of experimental determination of intrinsic disorder and the exponential increase of unannotated protein sequences, developing complementary computational prediction methods has been an active area of research for several decades. Here, we employed an ensemble of deep Squeeze-and-Excitation residual inception and long short-term memory (LSTM) networks for predicting protein intrinsic disorder with input from evolutionary information and predicted one-dimensional structural properties. The method, called SPOT-Disorder2, offers substantial and consistent improvement not only over our previous technique based on LSTM networks alone, but also over other state-of-the-art techniques in three independent tests with different ratios of disordered to ordered amino acid residues, and for sequences with either rich or limited evolutionary information. More importantly, semi-disordered regions predicted in SPOT-Disorder2 are more accurate in identifying molecular recognition features (MoRFs) than methods directly designed for MoRFs prediction. SPOT-Disorder2 is available as a web server and as a standalone program at https://sparks-lab.org/server/spot-disorder2/.
Affiliation(s)
- Jack Hanson
  - Signal Processing Laboratory, Griffith University, Brisbane 4111, Australia
- Kuldip K Paliwal
  - Signal Processing Laboratory, Griffith University, Brisbane 4111, Australia
- Thomas Litfin
  - School of Information and Communication Technology, Griffith University, Gold Coast 4222, Australia
- Yaoqi Zhou
  - School of Information and Communication Technology, Griffith University, Gold Coast 4222, Australia; Institute for Glycomics, Griffith University, Gold Coast 4222, Australia
70
LRRpredictor-A New LRR Motif Detection Method for Irregular Motifs of Plant NLR Proteins Using an Ensemble of Classifiers. Genes (Basel) 2020; 11:genes11030286. [PMID: 32182725 PMCID: PMC7140858 DOI: 10.3390/genes11030286] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 02/07/2020] [Revised: 02/28/2020] [Accepted: 03/04/2020] [Indexed: 12/17/2022] Open
Abstract
Leucine-rich-repeats (LRRs) belong to an archaic procaryal protein architecture that is widely involved in protein-protein interactions. In eukaryotes, LRR domains developed into key recognition modules in many innate immune receptor classes. Due to the high sequence variability imposed by recognition specificity, precise repeat delineation is often difficult especially in plant NOD-like Receptors (NLRs) notorious for showing far larger irregularities. To address this problem, we introduce here LRRpredictor, a method based on an ensemble of estimators designed to better identify LRR motifs in general but particularly adapted for handling more irregular LRR environments, thus allowing to compensate for the scarcity of structural data on NLR proteins. The extrapolation capacity tested on a set of annotated LRR domains from six immune receptor classes shows the ability of LRRpredictor to recover all previously defined specific motif consensuses and to extend the LRR motif coverage over annotated LRR domains. This analysis confirms the increased variability of LRR motifs in plant and vertebrate NLRs when compared to extracellular receptors, consistent with previous studies. Hence, LRRpredictor is able to provide novel insights into the diversification of LRR domains and a robust support for structure-informed analyses of LRRs in immune receptor functioning.
71
Mosior J, Bourland R, Soma S, Nathan C, Sacchettini J. Structural insights into phosphopantetheinyl hydrolase PptH from Mycobacterium tuberculosis. Protein Sci 2020; 29:744-757. [PMID: 31886928 PMCID: PMC7021004 DOI: 10.1002/pro.3813] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 10/21/2019] [Revised: 12/19/2019] [Accepted: 12/23/2019] [Indexed: 11/07/2022]
Abstract
The amidinourea 8918 was recently reported to inhibit the type II phosphopantetheinyl transferase (PPTase) of Mycobacterium tuberculosis (Mtb), PptT, a potential drug-target that activates synthases and synthetases involved in cell wall biosynthesis and secondary metabolism. Surprisingly, high-level resistance to 8918 occurred in Mtb harboring mutations within the gene adjacent to pptT, rv2795c, highlighting the role of the encoded protein as a potentiator of the bactericidal action of the amidinourea. Those studies revealed that Rv2795c (PptH) is a phosphopantetheinyl (PpT) hydrolase, possessing activity antagonistic with respect to PptT. We have solved the crystal structure of Mtb's phosphopantetheinyl hydrolase, making it the first phosphopantetheinyl (carrier protein) hydrolase structurally characterized. The 2.5 Å structure revealed the hydrolase's four-layer (α/β/β/α) sandwich fold featuring a Mn-Fe binuclear center within the active site. A structural similarity search confirmed that PptH most closely resembles previously characterized metallophosphoesterases (MPEs), particularly within the vicinity of the active site, suggesting that it may utilize a similar catalytic mechanism. In addition, analysis of the structure has allowed for the rationalization of the previously reported PptH mutations associated with 8918-resistance. Notably, differences in the sequences and predicted structural characteristics of the PpT hydrolases PptH of Mtb and E. coli's acyl carrier protein hydrolase (AcpH) indicate that the two enzymes evolved convergently and therefore are representative of two distinct PpT hydrolase families.
Affiliation(s)
- John Mosior
  - Department of Biochemistry and Biophysics, Texas Agricultural and Mechanical University, College Station, Texas
- Ronnie Bourland
  - Department of Biochemistry and Biophysics, Texas Agricultural and Mechanical University, College Station, Texas
- Shivatheja Soma
  - Department of Biochemistry and Biophysics, Texas Agricultural and Mechanical University, College Station, Texas
- Carl Nathan
  - Department of Microbiology and Immunology, Weill Cornell Medicine, New York, New York
- James Sacchettini
  - Department of Biochemistry and Biophysics, Texas Agricultural and Mechanical University, College Station, Texas
72
Liu Y, Wang X, Liu B. RFPR-IDP: reduce the false positive rates for intrinsically disordered protein and region prediction by incorporating both fully ordered proteins and disordered proteins. Brief Bioinform 2020; 22:2000-2011. [PMID: 32112084 PMCID: PMC7986600 DOI: 10.1093/bib/bbaa018] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Indexed: 12/15/2022] Open
Abstract
As an important type of protein, intrinsically disordered proteins/regions (IDPs/IDRs) are related to many crucial biological functions. Accurate prediction of IDPs/IDRs is beneficial to the prediction of protein structures and functions. Most of the existing methods ignore the fully ordered proteins without IDRs during the training and test processes. As a result, the corresponding predictors tend to predict the fully ordered proteins as disordered proteins. Unfortunately, these methods were only evaluated on datasets consisting of disordered proteins without or with only a few fully ordered proteins, and therefore, this problem has escaped the attention of researchers. However, most of the newly sequenced proteins are fully ordered proteins in nature. These predictors fail to accurately predict the ordered and disordered proteins in real-world applications. In this regard, we propose a new method called RFPR-IDP trained with both fully ordered proteins and disordered proteins, which is constructed based on the combination of convolutional neural network (CNN) and bidirectional long short-term memory (BiLSTM). The experimental results show that although the existing predictors perform well for predicting the disordered proteins, they tend to predict the fully ordered proteins as disordered proteins. In contrast, the RFPR-IDP predictor can correctly predict the fully ordered proteins and outperform the other 10 state-of-the-art methods when evaluated on a test dataset with both fully ordered proteins and disordered proteins. The web server and datasets of RFPR-IDP are freely available at http://bliulab.net/RFPR-IDP/server.
Affiliation(s)
- Yumeng Liu
  - School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong 518055, China
- Xiaolong Wang
  - School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong 518055, China
- Bin Liu
  - School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong 518055, China; School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China; Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing 100081, China
73
Torrisi M, Pollastri G, Le Q. Deep learning methods in protein structure prediction. Comput Struct Biotechnol J 2020; 18:1301-1310. [PMID: 32612753 PMCID: PMC7305407 DOI: 10.1016/j.csbj.2019.12.011] [Citation(s) in RCA: 132] [Impact Index Per Article: 26.4] [Received: 10/15/2019] [Revised: 12/19/2019] [Accepted: 12/20/2019] [Indexed: 01/01/2023] Open
Abstract
Protein Structure Prediction is a central topic in Structural Bioinformatics. Since the '60s statistical methods, followed by increasingly complex Machine Learning and recently Deep Learning methods, have been employed to predict protein structural information at various levels of detail. In this review, we briefly introduce the problem of protein structure prediction and essential elements of Deep Learning (such as Convolutional Neural Networks, Recurrent Neural Networks and basic feed-forward Neural Networks they are founded on), after which we discuss the evolution of predictive methods for one-dimensional and two-dimensional Protein Structure Annotations, from the simple statistical methods of the early days, to the computationally intensive highly-sophisticated Deep Learning algorithms of the last decade. In the process, we review the growth of the databases these algorithms are based on, and how this has impacted our ability to leverage knowledge about evolution and co-evolution to achieve improved predictions. We conclude this review outlining the current role of Deep Learning techniques within the wider pipelines to predict protein structures and trying to anticipate what challenges and opportunities may arise next.
Affiliation(s)
- Mirko Torrisi
  - School of Computer Science, University College Dublin, Ireland
- Quan Le
  - Centre for Applied Data Analytics Research, University College Dublin, Ireland
74
Shi Q, Chen W, Huang S, Wang Y, Xue Z. Deep learning for mining protein data. Brief Bioinform 2019; 22:194-218. [PMID: 31867611 DOI: 10.1093/bib/bbz156] [Citation(s) in RCA: 31] [Impact Index Per Article: 5.2] [Received: 08/16/2019] [Revised: 10/21/2019] [Accepted: 11/07/2019] [Indexed: 01/16/2023] Open
Abstract
The recent emergence of deep learning to characterize complex patterns of protein big data reveals its potential to address the classic challenges in the field of protein data mining. Much research has revealed the promise of deep learning as a powerful tool to transform protein big data into valuable knowledge, leading to scientific discoveries and practical solutions. In this review, we summarize recent publications on deep learning predictive approaches in the field of mining protein data. The application architectures of these methods include multilayer perceptrons, stacked autoencoders, deep belief networks, two- or three-dimensional convolutional neural networks, recurrent neural networks, graph neural networks, and complex neural networks and are described from five perspectives: residue-level prediction, sequence-level prediction, three-dimensional structural analysis, interaction prediction, and mass spectrometry data mining. The advantages and deficiencies of these architectures are presented in relation to various tasks in protein data mining. Additionally, some practical issues and their future directions are discussed, such as robust deep learning for protein noisy data, architecture optimization for specific tasks, efficient deep learning for limited protein data, multimodal deep learning for heterogeneous protein data, and interpretable deep learning for protein understanding. This review provides comprehensive perspectives on general deep learning techniques for protein data analysis.
Affiliation(s)
- Qiang Shi
  - School of Software Engineering, Huazhong University of Science and Technology. His main interests cover machine learning, especially deep learning, protein data analysis, and big data mining
- Weiya Chen
  - School of Software Engineering, Huazhong University of Science & Technology, Wuhan, China. His research interests cover bioinformatics, virtual reality, and data visualization
- Siqi Huang
  - Software Engineering at Huazhong University of Science and Technology, focusing on machine learning and data mining
- Yan Wang
  - School of life, University of Science & Technology; her main interests cover protein structure and function prediction and big data mining
- Zhidong Xue
  - School of Software Engineering, Huazhong University of Science & Technology, Wuhan, China. His research interests cover bioinformatics, machine learning, and image processing
75
Liu Y, Wang X, Liu B. A comprehensive review and comparison of existing computational methods for intrinsically disordered protein and region prediction. Brief Bioinform 2019; 20:330-346. [PMID: 30657889 DOI: 10.1093/bib/bbx126] [Citation(s) in RCA: 95] [Impact Index Per Article: 15.8] [Received: 07/19/2017] [Indexed: 01/06/2023] Open
Abstract
Intrinsically disordered proteins and regions are widely distributed in proteins, which are associated with many biological processes and diseases. Accurate prediction of intrinsically disordered proteins and regions is critical for both basic research (such as protein structure and function prediction) and practical applications (such as drug development). During the past decades, many computational approaches have been proposed, which have greatly facilitated the development of this important field. Therefore, a comprehensive and updated review is highly required. In this regard, we give a review on the computational methods for intrinsically disordered protein and region prediction, especially focusing on the recent development in this field. These computational approaches are divided into four categories based on their methodologies, including physicochemical-based method, machine-learning-based method, template-based method and meta method. Furthermore, their advantages and disadvantages are also discussed. The performance of 40 state-of-the-art predictors is directly compared on the target proteins in the task of disordered region prediction in the 10th Critical Assessment of protein Structure Prediction. A more comprehensive performance comparison of 45 different predictors is conducted based on seven widely used benchmark data sets. Finally, some open problems and perspectives are discussed.
Affiliation(s)
- Yumeng Liu
  - School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, China
- Xiaolong Wang
  - School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, China
- Bin Liu
  - School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, China
76
Katuwawala A, Ghadermarzi S, Kurgan L. Computational prediction of functions of intrinsically disordered regions. Progress in Molecular Biology and Translational Science 2019; 166:341-369. [PMID: 31521235 DOI: 10.1016/bs.pmbts.2019.04.006] [Citation(s) in RCA: 30] [Impact Index Per Article: 5.0] [Indexed: 12/20/2022]
Abstract
Intrinsically disordered regions (IDRs) are abundant in nature, particularly among Eukaryotes. While they facilitate a wide spectrum of cellular functions including signaling, molecular assembly and recognition, translation, transcription and regulation, only several hundred IDRs are annotated functionally. This annotation gap motivates the development of fast and accurate computational methods that predict IDR functions directly from protein sequences. We introduce and describe a comprehensive collection of 25 methods that provide accurate predictions of IDRs that interact with proteins and nucleic acids, that function as flexible linkers and that moonlight multiple functions. Virtually all of these predictors can be accessed online and many were developed in the last few years. They utilize a wide range of predictive architectures and take advantage of modern machine learning algorithms. Our empirical analysis shows that predictors that are available as webservers enjoy high rates of citations, attesting to their practical value and popularity. The most cited methods include DISOPRED3, ANCHOR, alpha-MoRFpred, MoRFpred, fMoRFpred and MoRFCHiBi. We present two case studies to demonstrate that predictions produced by these computational tools are relatively easy to interpret and that they deliver valuable functional clues. However, the current computational tools cover a relatively narrow range of disorder functions. Further development efforts that would cover a broader range of functions should be pursued. We demonstrate that a sufficient number of functionally annotated IDRs that are associated with several other disorder functions is already available and can be used to design and validate novel predictors.
Affiliation(s)
- Akila Katuwawala
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA, United States
- Sina Ghadermarzi
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA, United States
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA, United States
77
Nielsen JT, Mulder FAA. Quality and bias of protein disorder predictors. Sci Rep 2019; 9:5137. [PMID: 30914747 PMCID: PMC6435736 DOI: 10.1038/s41598-019-41644-w] [Citation(s) in RCA: 64] [Impact Index Per Article: 10.7] [Received: 12/04/2018] [Accepted: 03/13/2019] [Indexed: 02/03/2023] Open
Abstract
Disorder in proteins is vital for biological function, yet it is challenging to characterize. Therefore, methods for predicting protein disorder from sequence are fundamental. Currently, predictors are trained and evaluated using data from X-ray structures or from various biochemical or spectroscopic data. However, the prediction accuracy of disorder predictors is not calibrated, nor is it established whether predictors are intrinsically biased towards one of the extremes of the order-disorder axis. We therefore generated and validated a comprehensive experimental benchmarking set of site-specific and continuous disorder, using deposited NMR chemical shift data. This novel experimental data collection is fully appropriate and represents the full spectrum of disorder. We subsequently analyzed the performance of 26 widely used disorder prediction methods and found that these vary noticeably. At the same time, a distinct bias for over-predicting order was identified for some algorithms. Our analysis has important implications for the validity and the interpretation of protein disorder, as utilized, for example, in assessing the content of disorder in proteomes.
Affiliation(s)
- Jakob T Nielsen
  - Interdisciplinary Nanoscience Center (iNANO), Aarhus University, Gustav Wieds Vej 14, 8000, Aarhus C, Denmark
  - Department of Chemistry, Aarhus University, Langelandsgade 140, 8000, Aarhus C, Denmark
- Frans A A Mulder
  - Interdisciplinary Nanoscience Center (iNANO), Aarhus University, Gustav Wieds Vej 14, 8000, Aarhus C, Denmark
  - Department of Chemistry, Aarhus University, Langelandsgade 140, 8000, Aarhus C, Denmark
78
Oldfield CJ, Uversky VN, Dunker AK, Kurgan L. Introduction to intrinsically disordered proteins and regions. Proteins 2019. [DOI: 10.1016/b978-0-12-816348-1.00001-6] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Indexed: 12/03/2022]
79
WaveNano: a signal-level nanopore base-caller via simultaneous prediction of nucleotide labels and move labels through bi-directional WaveNets. Quantitative Biology 2018. [DOI: 10.1007/s40484-018-0155-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 10/27/2022]
80
|
Xue L, Tang B, Chen W, Luo J. DeepT3: deep convolutional neural networks accurately identify Gram-negative bacterial type III secreted effectors using the N-terminal sequence. Bioinformatics 2018; 35:2051-2057. [DOI: 10.1093/bioinformatics/bty931] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.0] [Received: 06/26/2018] [Revised: 10/22/2018] [Accepted: 11/07/2018] [Indexed: 11/12/2022] Open
Affiliation(s)
- Li Xue
- School of Public Health, Southwest Medical University, Luzhou, Sichuan, PR China
| | - Bin Tang
- Basic Medical College of Southwest Medical University, Luzhou, Sichuan, PR China
| | - Wei Chen
- Integrative Genomics Core, City of Hope National Medical Center, Duarte, CA, USA
| | - Jiesi Luo
- Key Laboratory for Aging and Regenerative Medicine, Department of Pharmacology, School of Pharmacy, Southwest Medical University, Luzhou, Sichuan, China
| |
|
81
|
Hanson J, Paliwal K, Zhou Y. Accurate Single-Sequence Prediction of Protein Intrinsic Disorder by an Ensemble of Deep Recurrent and Convolutional Architectures. J Chem Inf Model 2018; 58:2369-2376. [DOI: 10.1021/acs.jcim.8b00636] [Citation(s) in RCA: 50] [Impact Index Per Article: 7.1] [Indexed: 01/31/2023]
Affiliation(s)
- Jack Hanson
- Signal Processing Laboratory, Griffith University, Brisbane, Queensland 4122, Australia
| | - Kuldip Paliwal
- Signal Processing Laboratory, Griffith University, Brisbane, Queensland 4122, Australia
| | - Yaoqi Zhou
- Institute for Glycomics and School of Information and Communication Technology, Griffith University, Southport, Queensland 4222, Australia
| |
|
82
|
Wang Z, Jumper JM, Wang S, Freed KF, Sosnick TR. A Membrane Burial Potential with H-Bonds and Applications to Curved Membranes and Fast Simulations. Biophys J 2018; 115:1872-1884. [PMID: 30413241 DOI: 10.1016/j.bpj.2018.10.012] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 05/25/2018] [Revised: 09/21/2018] [Accepted: 10/10/2018] [Indexed: 10/28/2022] Open
Abstract
We use the statistics of a large and curated training set of transmembrane helical proteins to develop a knowledge-based potential that accounts for the dependence on both the depth of burial of the protein in the membrane and the degree of side-chain exposure. Additionally, the statistical potential includes depth-dependent energies for unsatisfied backbone hydrogen bond donors and acceptors, which are found to be relatively small, ∼2 RT. Our potential accurately places known proteins within the bilayer. The potential is applied to the mechanosensing MscL channel in membranes of varying thickness and curvature, as well as to the prediction of protein structure. The potential is incorporated into our new Upside molecular dynamics algorithm. Notably, we account for the exchange of protein-lipid interactions for protein-protein interactions as helices contact each other, thereby avoiding overestimating the energetics of helix association within the membrane. Simulations of most multimeric complexes find that isolated monomers and the oligomers retain the same orientation in the membrane, suggesting that the assembly of prepositioned monomers presents a viable mechanism of oligomerization.
Affiliation(s)
- Zongan Wang
- Department of Chemistry, The University of Chicago, Chicago, Illinois; James Franck Institute, The University of Chicago, Chicago, Illinois
| | - John M Jumper
- Department of Chemistry, The University of Chicago, Chicago, Illinois; James Franck Institute, The University of Chicago, Chicago, Illinois; Department of Biochemistry and Molecular Biology, The University of Chicago, Chicago, Illinois
| | - Sheng Wang
- Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia; Toyota Technological Institute at Chicago, Chicago, Illinois
| | - Karl F Freed
- Department of Chemistry, The University of Chicago, Chicago, Illinois; James Franck Institute, The University of Chicago, Chicago, Illinois.
| | - Tobin R Sosnick
- Department of Biochemistry and Molecular Biology, The University of Chicago, Chicago, Illinois; Institute for Biophysical Dynamics, The University of Chicago, Chicago, Illinois.
| |
|
83
|
Zhao B, Xue B. Decision-Tree Based Meta-Strategy Improved Accuracy of Disorder Prediction and Identified Novel Disordered Residues Inside Binding Motifs. Int J Mol Sci 2018; 19:E3052. [PMID: 30301243 PMCID: PMC6213717 DOI: 10.3390/ijms19103052] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 08/20/2018] [Revised: 09/24/2018] [Accepted: 10/04/2018] [Indexed: 02/06/2023] Open
Abstract
Using computational techniques to identify intrinsically disordered residues is practical and effective in biological studies. Therefore, designing novel high-accuracy strategies is always preferable when existing strategies have a lot of room for improvement. Among many possibilities, a meta-strategy that integrates the results of multiple individual predictors has been broadly used to improve the overall performance of predictors. Nonetheless, a simple and direct integration of individual predictors may not effectively improve the performance. In this project, dual-threshold two-step significance voting and neural networks were used to integrate the predictive results of four individual predictors, including: DisEMBL, IUPred, VSL2, and ESpritz. The new meta-strategy has improved the prediction performance of intrinsically disordered residues significantly, compared to all four individual predictors and another four recently-designed predictors. The improvement was validated using five-fold cross-validation and in independent test datasets.
Affiliation(s)
- Bi Zhao
- Department of Cell Biology, Microbiology and Molecular Biology, School of Natural Sciences and Mathematics, College of Arts and Sciences, University of South Florida, Tampa, FL 33620, USA.
| | - Bin Xue
- Department of Cell Biology, Microbiology and Molecular Biology, School of Natural Sciences and Mathematics, College of Arts and Sciences, University of South Florida, Tampa, FL 33620, USA.
| |
|
84
|
Kunjithapatham R, Ganapathy-Kanniappan S. GAPDH with NAD+-binding site mutation competitively inhibits the wild-type and affects glucose metabolism in cancer. Biochim Biophys Acta Gen Subj 2018; 1862:2555-2563. [PMID: 30077773 DOI: 10.1016/j.bbagen.2018.08.001] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 04/11/2018] [Revised: 07/26/2018] [Accepted: 08/01/2018] [Indexed: 12/12/2022]
Abstract
BACKGROUND Rapid utilization of glucose is a metabolic signature of the majority of cancers, hence enzymes of the glycolytic pathway remain attractive therapeutic targets. Recent reports have shown that targeting the glycolytic enzyme glyceraldehyde-3-phosphate dehydrogenase (GAPDH), an abundant, ubiquitous multifunctional protein frequently upregulated in cancer, affects cancer progression. Here, we report that a catalytically-deficient mutant-GAPDH competitively inhibits the wild-type, and disrupts glucose metabolism in cancer cells. METHODS Using site-directed mutagenesis, the human GAPDH clone was mutated at one of the NAD+-binding sites, i.e., arginine (R13) and isoleucine (I14) to glutamine (Q13) and phenylalanine (F14), respectively. The inhibitory role of the mutant-GAPDH, and its effect on energy metabolism and cancer phenotype, were determined using in vitro and in vivo models of cancer. RESULTS The enzymatically-dysfunctional mutant-GAPDH competitively inhibited the wild-type GAPDH in a cell-free system. In cancer cells, ectopic expression of the mutant-GAPDH, but not the wild-type, inhibited the glycolytic capacity of cellular-GAPDH, and led to the induction of metabolic stress accompanied by a sharp decline in glucose-uptake. Furthermore, expression of mutant-GAPDH affected cancer growth in vitro and in vivo. Mechanistically, structural analysis by bioinformatics revealed that the mutations at the NAD+-binding site altered the solvent-accessibility that perhaps affected the functionality of mutant-GAPDH. CONCLUSION Mutant-GAPDH affects the enzymatic function of cellular-GAPDH and disrupts energy metabolism. GENERAL SIGNIFICANCE Our findings demonstrate that a minimal mutation at the NAD+-binding site is sufficient to generate a competitive but dysfunctional GAPDH, and its ectopic expression inhibits the wild-type to disrupt glycolysis.
Affiliation(s)
- Rani Kunjithapatham
- The Division of Interventional Radiology, Russell H. Morgan Department of Radiology & Radiological Science, The Johns Hopkins University School of Medicine, Baltimore, MD, USA
| | - Shanmugasundaram Ganapathy-Kanniappan
- The Division of Interventional Radiology, Russell H. Morgan Department of Radiology & Radiological Science, The Johns Hopkins University School of Medicine, Baltimore, MD, USA.
| |
|
85
|
Yang Z, Tsui SKW. Functional Annotation of Proteins Encoded by the Minimal Bacterial Genome Based on Secondary Structure Element Alignment. J Proteome Res 2018; 17:2511-2520. [PMID: 29757649 DOI: 10.1021/acs.jproteome.8b00262] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Indexed: 11/28/2022]
Abstract
In synthetic biology, one of the key focuses is building a minimal artificial cell that can provide a basic chassis for functional study. Recently, the J. Craig Venter Institute published the latest version of the minimal bacterial genome, JCVI-syn3.0, which encodes only 438 essential proteins. However, the functions of 149 of these proteins remain unknown because of the lack of an effective annotation method. Here, we report a secondary structure element alignment method called SSEalign, based on an effective training data set extracted from various bacterial genomes. Experimentally validated homologous genes in different species were selected as training positives, while unrelated genes in different species were selected as training negatives. Moreover, SSEalign used a set of well-defined basic alignment elements with the backtracking line search algorithm to derive the best parameters for accurate prediction. Experimental results showed that SSEalign achieved 88.2% test accuracy, which is better than the existing prediction methods. SSEalign was subsequently applied to identify the functions of the unannotated proteins in the latest published minimal bacterial genome, JCVI-syn3.0. Results indicated that at least 136 of the 149 unannotated proteins in the JCVI-syn3.0 genome could be annotated by SSEalign. Our method is effective for the identification of protein homology in JCVI-syn3.0 and can be used to annotate hypothetical proteins in other bacterial genomes.
Affiliation(s)
- Zhiyuan Yang
- College of Life Information Science & Instrument Engineering, Hangzhou Dianzi University, Hangzhou 310018, China; School of Biomedical Sciences, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong; Hong Kong Bioinformatics Centre, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong
| | - Stephen Kwok-Wing Tsui
- School of Biomedical Sciences, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong; Hong Kong Bioinformatics Centre, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong; Centre for Microbial Genomics and Proteomics, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong
| |
|
86
|
Shao M, Ma J, Wang S. DeepBound: accurate identification of transcript boundaries via deep convolutional neural fields. Bioinformatics 2018; 33:i267-i273. [PMID: 28881999 PMCID: PMC5870651 DOI: 10.1093/bioinformatics/btx267] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Indexed: 11/30/2022] Open
Abstract
Motivation: Reconstructing the full-length expressed transcripts (a.k.a. the transcript assembly problem) from the short sequencing reads produced by the RNA-seq protocol plays a central role in identifying novel genes and transcripts as well as in studying gene expression and gene function. A crucial step in transcript assembly is to accurately determine the splicing junctions and boundaries of the expressed transcripts from the read alignment. In contrast to the splicing junctions, which can be efficiently detected from spliced reads, the problem of identifying boundaries remains open and challenging because the signal related to boundaries is noisy and weak. Results: We present DeepBound, an effective approach to identify boundaries of expressed transcripts from RNA-seq read alignments. At its core, DeepBound employs deep convolutional neural fields to learn the hidden distributions and patterns of boundaries. To accurately model the transition probabilities and to solve the label-imbalance problem, we incorporate the AUC (area under the curve) score into the optimization objective function. To address the issue that deep probabilistic graphical models require a large number of labeled training samples, we propose to use simulated RNA-seq datasets to train our model. Through extensive experimental studies on both simulated datasets of two species and biological datasets, we show that DeepBound consistently and significantly outperforms the two existing methods. Availability and implementation: DeepBound is freely available at https://github.com/realbigws/DeepBound.
Affiliation(s)
- Mingfu Shao
- Department of Computational Biology, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
- To whom correspondence should be addressed.
| | - Jianzhu Ma
- School of Medicine, University of California San Diego, La Jolla, CA, USA
| | - Sheng Wang
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- To whom correspondence should be addressed.
| |
|
87
|
Tsafou K, Tiwari PB, Forman-Kay JD, Metallo SJ, Toretsky JA. Targeting Intrinsically Disordered Transcription Factors: Changing the Paradigm. J Mol Biol 2018; 430:2321-2341. [PMID: 29655986 DOI: 10.1016/j.jmb.2018.04.008] [Citation(s) in RCA: 81] [Impact Index Per Article: 11.6] [Received: 02/01/2018] [Revised: 03/21/2018] [Accepted: 04/05/2018] [Indexed: 12/21/2022]
Abstract
Increased understanding of intrinsically disordered proteins (IDPs) and protein regions has revolutionized our view of the relationship between protein structure and function. Data now support that IDPs can be functional in the absence of a single, fixed, three-dimensional structure. Due to their dynamic morphology, IDPs have the ability to display a range of kinetics and affinity depending on what the system requires, as well as the potential for large-scale association. Although several studies have shed light on the functional properties of IDPs, the class of intrinsically disordered transcription factors (TFs) is still poorly characterized biophysically due to their combination of ordered and disordered sequences. In addition, TF modulation by small molecules has long been considered a difficult or even impossible task, limiting functional probe development. However, with evolving technology, it is becoming possible to characterize TF structure-function relationships in unprecedented detail and explore avenues not available or not considered in the past. Here we provide an introduction to the biophysical properties of intrinsically disordered TFs and we discuss recent computational and experimental efforts toward understanding the role of intrinsically disordered TFs in biology and disease. We describe a series of successful TF targeting strategies that have overcome the perception of the "undruggability" of TFs, providing new leads on drug development methodologies. Lastly, we discuss future challenges and opportunities to enhance our understanding of the structure-function relationship of intrinsically disordered TFs.
Affiliation(s)
- K Tsafou
- Department of Oncology and Pediatrics, Georgetown University, 3970 Reservoir Road Northwest, Washington, DC 20057, USA
| | - P B Tiwari
- Department of Oncology and Pediatrics, Georgetown University, 3970 Reservoir Road Northwest, Washington, DC 20057, USA
| | - J D Forman-Kay
- Molecular Medicine, The Hospital for Sick Children, Toronto M5G 0A4, Canada; Department of Biochemistry, University of Toronto, Toronto M5G 1X8, Canada
| | - S J Metallo
- Department of Chemistry, Georgetown University, Washington, DC 20057, USA
| | - J A Toretsky
- Department of Oncology and Pediatrics, Georgetown University, 3970 Reservoir Road Northwest, Washington, DC 20057, USA.
| |
|
88
|
Necci M, Piovesan D, Dosztányi Z, Tompa P, Tosatto SCE. A comprehensive assessment of long intrinsic protein disorder from the DisProt database. Bioinformatics 2017; 34:445-452. [DOI: 10.1093/bioinformatics/btx590] [Citation(s) in RCA: 44] [Impact Index Per Article: 5.5] [Received: 05/11/2017] [Accepted: 09/15/2017] [Indexed: 12/30/2022] Open
Affiliation(s)
- Marco Necci
- Department of Biomedical Sciences, University of Padua, Padova, Italy
| | - Damiano Piovesan
- Department of Biomedical Sciences, University of Padua, Padova, Italy
| | - Zsuzsanna Dosztányi
- Agricultural Sciences, University of Udine, Udine, Italy
- MTA-ELTE Lendület Bioinformatics Research Group, Department of Biochemistry, Eötvös Loránd University, Budapest, Hungary
| | - Peter Tompa
- Fondazione Edmund Mach, S. Michele all'Adige, Italy
- Institute of Enzymology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Budapest, Hungary
- Structural Biology Brussels, Vrije Universiteit Brussel (VUB), and Center for Structural Biology (CSB), Flanders Institute for Biotechnology (VIB), Brussels, Belgium
| | - Silvio C E Tosatto
- Department of Biomedical Sciences, University of Padua, Padova, Italy
- CNR Institute of Neuroscience, Padova, Italy
| |
|
89
|
Du Z, Uversky VN. Functional roles of intrinsic disorder in CRISPR-associated protein Cas9. MOLECULAR BIOSYSTEMS 2017; 13:1770-1780. [PMID: 28692085 DOI: 10.1039/c7mb00279c] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 01/06/2023]
Abstract
Protein intrinsic disorder is an important characteristic commonly detected in multifunctional or RNA- and DNA-binding proteins. Due to their high conformational flexibility and solvent accessibility, intrinsically disordered proteins (IDPs) and IDP regions (IDPRs) execute diverse functions including interaction with multiple partners, and are frequently subjected to various post-translational modifications. Recent studies on the components comprising the CRISPR (clustered regularly interspaced short palindromic repeats) system have elucidated the crystal structure of Cas9 proteins and the mechanism by which the Cas9-sgRNA complex recognizes and cleaves its target DNA. Yet the extent and functional implications of intrinsic disorder in the Cas9 protein have never been fully assessed. Here, we present a comprehensive computational analysis based on both sequence and structural data in an attempt to investigate the roles of IDPRs in the functioning of Cas9 proteins of different origin. We conclude that among the functional roles of IDPRs in Cas9 proteins are recognition of the target DNA and mediation of nucleic acid and protein binding.
Affiliation(s)
- Zhihua Du
- Department of Molecular Medicine, Morsani College of Medicine, University of South Florida, 12901 Bruce B. Downs Blvd. MDC07, Tampa, Florida, USA
| | | |
|