1
|
Zhang J, Zhou F, Liang X, Kurgan L. Accurate Prediction of Protein-Binding Residues in Protein Sequences Using SCRIBER. Methods Mol Biol 2025; 2867:247-260. [PMID: 39576586 DOI: 10.1007/978-1-0716-4196-5_15] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2024]
Abstract
Deciphering molecular-level mechanisms that govern protein-protein interactions (PPIs) relies in part on the accurate prediction of protein-binding partners and protein-binding residues. These predictions can be used to support a wide spectrum of applications that include development of PPI networks and protein docking programs, drug design studies, and investigations of molecular details that underlie certain diseases. Computational methods that predict protein-binding residues offer convenient, inexpensive, and relatively accurate data that can aid these efforts. We introduce and describe a user-friendly webserver for the SCRIBER method that conveniently provides state-of-the-art predictions of protein-binding residues and that minimizes cross-predictions, i.e., incorrect prediction of residues that bind other/non-protein ligands as protein binding. SCRIBER relies on a two-layer architecture that is specifically designed to reduce the cross-predictions. We motivate and explain this predictive architecture. We describe how to use the webserver, interact with its web interface, and collect, read, and understand results generated by SCRIBER. The SCRIBER webserver is available at http://biomine.cs.vcu.edu/servers/SCRIBER/ .
Collapse
Affiliation(s)
- Jian Zhang
- School of Computer and Information Technology, Xinyang Normal University, Xinyang, China.
| | - Feng Zhou
- School of Computer and Information Technology, Xinyang Normal University, Xinyang, China
| | - Xingchen Liang
- School of Computer and Information Technology, Xinyang Normal University, Xinyang, China
| | - Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA.
| |
Collapse
|
2
|
Zhao B, Basu S, Kurgan L. DescribePROT Database of Residue-Level Protein Structure and Function Annotations. Methods Mol Biol 2025; 2867:169-184. [PMID: 39576581 DOI: 10.1007/978-1-0716-4196-5_10] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2024]
Abstract
DescribePROT is a freely available online database of structural and functional descriptors of proteins at the amino acid level. It provides access to 13 diverse descriptors that include sequence conservation, putative secondary structure, solvent accessibility, intrinsic disorder, and signal peptides, and putative annotations of residues that interact with proteins, peptides and nucleic acids. These data can be used to elucidate protein functions, to support efforts to develop therapeutics, and to develop and evaluate future predictors of protein structure and function. DescribePROT includes 7.8 billion predictions for 1.4 million proteins from 83 complete proteomes of popular model organisms. This information can be downloaded at multiple levels of scope (entire database, specific organisms, and individual proteins) and can be interacted with using a graphical interface that simultaneously displays data on multiple descriptors. We describe the contents of this resource, provide directions on how to use its interface, and offer instructions on how to obtain and interact with the underlying data. Moreover, we briefly discuss plans for a future expansion of this database. DescribePROT is available at http://biomine.cs.vcu.edu/servers/DESCRIBEPROT/ .
Collapse
Affiliation(s)
- Bi Zhao
- Genomics program, College of Public Health, University of South Florida, Tampa, FL, USA
| | - Sushmita Basu
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
| | - Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA.
| |
Collapse
|
3
|
Wang K, Hu G, Wu Z, Kurgan L. Accurate and Fast Prediction of Intrinsic Disorder Using flDPnn. Methods Mol Biol 2025; 2867:201-218. [PMID: 39576583 DOI: 10.1007/978-1-0716-4196-5_12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2024]
Abstract
Intrinsically disordered proteins (IDPs) that include one or more intrinsically disordered regions (IDRs) are abundant across all domains of life and viruses and play numerous functional roles in various cellular processes. Due to a relatively low throughput and high cost of experimental techniques for identifying IDRs, there is a growing need for fast and accurate computational algorithms that accurately predict IDRs/IDPs from protein sequences. We describe one of the leading disorder predictors, flDPnn. Results from a recent community-organized Critical Assessment of Intrinsic Disorder (CAID) experiment show that flDPnn provides fast and state-of-the-art predictions of disorder, which are supplemented with the predictions of several major disorder functions. This chapter provides a practical guide to flDPnn, which includes a brief explanation of its predictive model, descriptions of its web server and standalone versions, and a case study that showcases how to read and understand flDPnn's predictions.
Collapse
Affiliation(s)
- Kui Wang
- NITFID, School of Statistics and Data Science, LPMC and KLMDASR, Nankai University, Tianjin, China
| | - Gang Hu
- NITFID, School of Statistics and Data Science, LPMC and KLMDASR, Nankai University, Tianjin, China
| | - Zhonghua Wu
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin, China
| | - Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA.
| |
Collapse
|
4
|
Zhang F, Kurgan L. Evaluation of predictions of disordered binding regions in the CAID2 experiment. Comput Struct Biotechnol J 2024; 27:78-88. [PMID: 39811792 PMCID: PMC11732247 DOI: 10.1016/j.csbj.2024.12.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/21/2024] [Revised: 12/12/2024] [Accepted: 12/13/2024] [Indexed: 01/16/2025] Open
Abstract
A large portion of the Intrinsically Disordered Regions (IDRs) in protein sequences interact with proteins, nucleic acids, and other types of ligands. Correspondingly, dozens of sequence-based predictors of binding IDRs were developed. A recently completed second community-based Critical Assessments of protein Intrinsic Disorder prediction (CAID2) evaluated 32 predictors of binding IDRs. However, CAID2 considered a rather narrow scenario by testing on 78 proteins with binding IDRs and not differentiating between different ligands, in spite that virtually all predictors target IDRs that interact with specific types of ligands. In that scenario, several intrinsic disorder predictors predict binding IDRs with accuracy equivalent to the best predictors of binding IDRs since large majority of IDRs in the 78 test proteins are binding. We substantially extended the CAID2's evaluation by using the entire CAID2 dataset of 348 proteins and considering several arguably more practical scenarios. We assessed whether predictors accurately differentiate binding IDRs from other types of IDRs and how they perform when predicting IDRs that interact with different ligand types. We found that intrinsic disorder predictors cannot accurately identify binding IDRs among other disordered regions, majority of the predictors of binding IDRs are ligand type agnostic (i.e., they cross predict binding in IDRs that interact with ligands that they do not cover), and only a handful of predictors of binding IDRs perform relatively well and generate reasonably low amounts of cross predictions. We also suggest a number of future research directions that would move this active field of research forward.
Collapse
Affiliation(s)
- Fuhao Zhang
- College of Information Engineering, Northwest A & F University, China
| | - Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
| |
Collapse
|
5
|
Xu S, Onoda A. Accurate and Fast Prediction of Intrinsically Disordered Protein by Multiple Protein Language Models and Ensemble Learning. J Chem Inf Model 2024; 64:2901-2911. [PMID: 37883249 DOI: 10.1021/acs.jcim.3c01202] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/28/2023]
Abstract
Intrinsically disordered proteins (IDPs) play a vital role in various biological processes and have attracted increasing attention in the past few decades. Predicting IDPs from the primary structures of proteins offers a rapid and facile means of protein analysis without necessitating crystal structures. In particular, machine learning methods have demonstrated their potential in this field. Recently, protein language models (PLMs) are emerging as a promising approach to extracting essential information from protein sequences and have been employed in protein modeling to utilize their advantages of precision and efficiency. In this article, we developed a novel IDP prediction method named IDP-ELM to predict the intrinsically disordered regions (IDRs) as well as their functions including disordered flexible linkers and disordered protein binding. This method utilizes high-dimensional representations extracted from several state-of-the-art PLMs and predicts IDRs by ensemble learning based on bidirectional recurrent neural networks. The performance of the method was evaluated on two independent test data sets from CAID (critical assessment of protein intrinsic disorder prediction) and CAID2, indicating notable improvements in terms of area under the receiver operating characteristic (AUC), Matthew's correlation coefficient (MCC), and F1 score. Moreover, IDP-ELM requires solely protein sequences as inputs and does not entail a time-consuming process of protein profile generation, which is a prerequisite for most existing state-of-the-art methods, enabling an accurate, fast, and convenient tool for proteome-level analysis. The corresponding reproducible source code and model weights are available at https://github.com/xu-shi-jie/idp-elm.
Collapse
Affiliation(s)
- Shijie Xu
- Graduate School of Environmental Science, Hokkaido University, Sapporo 060-0810, Japan
| | - Akira Onoda
- Graduate School of Environmental Science, Hokkaido University, Sapporo 060-0810, Japan
- Faculty of Environmental Earth Science, Hokkaido University, Sapporo 060-0810, Japan
| |
Collapse
|
6
|
Shahrajabian MH, Sun W. Characterization of Intrinsically Disordered Proteins in Healthy and Diseased States by Nuclear Magnetic Resonance. Rev Recent Clin Trials 2024; 19:176-188. [PMID: 38409704 DOI: 10.2174/0115748871271420240213064251] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/13/2023] [Revised: 11/10/2023] [Accepted: 12/13/2023] [Indexed: 02/28/2024]
Abstract
INTRODUCTION Intrinsically Disordered Proteins (IDPs) are active in different cellular procedures like ordered assembly of chromatin and ribosomes, interaction with membrane, protein, and ligand binding, molecular recognition, binding, and transportation via nuclear pores, microfilaments and microtubules process and disassembly, protein functions, RNA chaperone, and nucleic acid binding, modulation of the central dogma, cell cycle, and other cellular activities, post-translational qualification and substitute splicing, and flexible entropic linker and management of signaling pathways. METHODS The intrinsic disorder is a precise structural characteristic that permits IDPs/IDPRs to be involved in both one-to-many and many-to-one signaling. IDPs/IDPRs also exert some dynamical and structural ordering, being much less constrained in their activities than folded proteins. Nuclear magnetic resonance (NMR) spectroscopy is a major technique for the characterization of IDPs, and it can be used for dynamic and structural studies of IDPs. RESULTS AND CONCLUSION This review was carried out to discuss intrinsically disordered proteins and their different goals, as well as the importance and effectiveness of NMR in characterizing intrinsically disordered proteins in healthy and diseased states.
Collapse
Affiliation(s)
- Mohamad Hesam Shahrajabian
- National Key Laboratory of Agricultural Microbiology, Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China
| | - Wenli Sun
- National Key Laboratory of Agricultural Microbiology, Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China
| |
Collapse
|
7
|
Yang J, Cheng WX, Wu G, Sheng S, Zhang P. Prediction of folding patterns for intrinsic disordered protein. Sci Rep 2023; 13:20343. [PMID: 37990040 PMCID: PMC10663623 DOI: 10.1038/s41598-023-45969-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/05/2023] [Accepted: 10/26/2023] [Indexed: 11/23/2023] Open
Abstract
The conformation flexibility of natural protein causes both complexity and difficulty to understand the relationship between structure and function. The prediction of intrinsically disordered protein primarily is focusing on to disclose the regions with structural flexibility involving relevant biological functions and various diseases. The order of amino acids in protein sequence determines possible conformations, folding flexibility and biological function. Although many methods provided the information of intrinsically disordered protein (IDP), but the results are mainly limited to determine the locations of regions without knowledge of possible folding conformations. Here, the developed protein folding fingerprint adopted the protein folding variation matrix (PFVM) to reveal all possible folding patterns for the intrinsically disordered protein along its sequence. The PFVM integrally exhibited the intrinsically disordered protein with disordering regions, degree of disorder as well as folding pattern. The advantage of PFVM will not only provide rich information for IDP, but also may promote the study of protein folding problem.
Collapse
Affiliation(s)
- Jiaan Yang
- Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, Guangdong, China.
- Micro Biotech, Ltd., Shanghai, 200123, China.
| | - Wen-Xiang Cheng
- Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, Guangdong, China
| | - Gang Wu
- School of Basic Medicine, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, China
| | - Sitong Sheng
- HYK High-throughput Biotechnology Institute, Shenzhen, 518057, Guangdong, China
| | - Peng Zhang
- Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, Guangdong, China
| |
Collapse
|
8
|
Basu S, Gsponer J, Kurgan L. DEPICTER2: a comprehensive webserver for intrinsic disorder and disorder function prediction. Nucleic Acids Res 2023:7151337. [PMID: 37140058 DOI: 10.1093/nar/gkad330] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/15/2023] [Revised: 04/12/2023] [Accepted: 04/18/2023] [Indexed: 05/05/2023] Open
Abstract
Intrinsic disorder in proteins is relatively abundant in nature and essential for a broad spectrum of cellular functions. While disorder can be accurately predicted from protein sequences, as it was empirically demonstrated in recent community-organized assessments, it is rather challenging to collect and compile a comprehensive prediction that covers multiple disorder functions. To this end, we introduce the DEPICTER2 (DisorderEd PredictIon CenTER) webserver that offers convenient access to a curated collection of fast and accurate disorder and disorder function predictors. This server includes a state-of-the-art disorder predictor, flDPnn, and five modern methods that cover all currently predictable disorder functions: disordered linkers and protein, peptide, DNA, RNA and lipid binding. DEPICTER2 allows selection of any combination of the six methods, batch predictions of up to 25 proteins per request and provides interactive visualization of the resulting predictions. The webserver is freely available at http://biomine.cs.vcu.edu/servers/DEPICTER2/.
Collapse
Affiliation(s)
- Sushmita Basu
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
| | - Jörg Gsponer
- Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia, Canada
| | - Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
| |
Collapse
|
9
|
Zhang F, Li M, Zhang J, Kurgan L. HybridRNAbind: prediction of RNA interacting residues across structure-annotated and disorder-annotated proteins. Nucleic Acids Res 2023; 51:e25. [PMID: 36629262 PMCID: PMC10018345 DOI: 10.1093/nar/gkac1253] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/14/2022] [Revised: 11/22/2022] [Accepted: 12/15/2022] [Indexed: 01/12/2023] Open
Abstract
The sequence-based predictors of RNA-binding residues (RBRs) are trained on either structure-annotated or disorder-annotated binding regions. A recent study of predictors of protein-binding residues shows that they are plagued by high levels of cross-predictions (protein binding residues are predicted as nucleic acid binding) and that structure-trained predictors perform poorly for the disorder-annotated regions and vice versa. Consequently, we analyze a representative set of the structure and disorder trained predictors of RBRs to comprehensively assess quality of their predictions. Our empirical analysis that relies on a new and low-similarity benchmark dataset reveals that the structure-trained predictors of RBRs perform well for the structure-annotated proteins while the disorder-trained predictors provide accurate results for the disorder-annotated proteins. However, these methods work only modestly well on the opposite types of annotations, motivating the need for new solutions. Using an empirical approach, we design HybridRNAbind meta-model that generates accurate predictions and low amounts of cross-predictions when tested on data that combines structure and disorder-annotated RBRs. We release this meta-model as a convenient webserver which is available at https://www.csuligroup.com/hybridRNAbind/.
Collapse
Affiliation(s)
- Fuhao Zhang
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
| | - Min Li
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
| | - Jian Zhang
- School of Computer and Information Technology, Xinyang Normal University, Xinyang 464000, China
| | - Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
| |
Collapse
|
10
|
Computational prediction of disordered binding regions. Comput Struct Biotechnol J 2023; 21:1487-1497. [PMID: 36851914 PMCID: PMC9957716 DOI: 10.1016/j.csbj.2023.02.018] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/26/2022] [Revised: 02/08/2023] [Accepted: 02/08/2023] [Indexed: 02/12/2023] Open
Abstract
One of the key features of intrinsically disordered regions (IDRs) is their ability to interact with a broad range of partner molecules. Multiple types of interacting IDRs were identified including molecular recognition fragments (MoRFs), short linear sequence motifs (SLiMs), and protein-, nucleic acids- and lipid-binding regions. Prediction of binding IDRs in protein sequences is gaining momentum in recent years. We survey 38 predictors of binding IDRs that target interactions with a diverse set of partners, such as peptides, proteins, RNA, DNA and lipids. We offer a historical perspective and highlight key events that fueled efforts to develop these methods. These tools rely on a diverse range of predictive architectures that include scoring functions, regular expressions, traditional and deep machine learning and meta-models. Recent efforts focus on the development of deep neural network-based architectures and extending coverage to RNA, DNA and lipid-binding IDRs. We analyze availability of these methods and show that providing implementations and webservers results in much higher rates of citations/use. We also make several recommendations to take advantage of modern deep network architectures, develop tools that bundle predictions of multiple and different types of binding IDRs, and work on algorithms that model structures of the resulting complexes.
Collapse
|
11
|
Biró B, Zhao B, Kurgan L. Complementarity of the residue-level protein function and structure predictions in human proteins. Comput Struct Biotechnol J 2022; 20:2223-2234. [PMID: 35615015 PMCID: PMC9118482 DOI: 10.1016/j.csbj.2022.05.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/21/2022] [Revised: 05/02/2022] [Accepted: 05/02/2022] [Indexed: 11/24/2022] Open
Abstract
Sequence-based predictors of the residue-level protein function and structure cover a broad spectrum of characteristics including intrinsic disorder, secondary structure, solvent accessibility and binding to nucleic acids. They were catalogued and evaluated in numerous surveys and assessments. However, methods focusing on a given characteristic are studied separately from predictors of other characteristics, while they are typically used on the same proteins. We fill this void by studying complementarity of a representative collection of methods that target different predictions using a large, taxonomically consistent, and low similarity dataset of human proteins. First, we bridge the gap between the communities that develop structure-trained vs. disorder-trained predictors of binding residues. Motivated by a recent study of the protein-binding residue predictions, we empirically find that combining the structure-trained and disorder-trained predictors of the DNA-binding and RNA-binding residues leads to substantial improvements in predictive quality. Second, we investigate whether diverse predictors generate results that accurately reproduce relations between secondary structure, solvent accessibility, interaction sites, and intrinsic disorder that are present in the experimental data. Our empirical analysis concludes that predictions accurately reflect all combinations of these relations. Altogether, this study provides unique insights that support combining results produced by diverse residue-level predictors of protein function and structure.
Collapse
Affiliation(s)
- Bálint Biró
- Institute of Genetics and Biotechnology, Hungarian University of Agriculture and Life Sciences, Gödöllő, Hungary
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, United States
| | - Bi Zhao
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, United States
| | - Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, United States
| |
Collapse
|
12
|
Kurgan L. Resources for computational prediction of intrinsic disorder in proteins. Methods 2022; 204:132-141. [DOI: 10.1016/j.ymeth.2022.03.018] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/28/2022] [Revised: 03/25/2022] [Accepted: 03/29/2022] [Indexed: 12/26/2022] Open
|
13
|
Bondos SE, Dunker AK, Uversky VN. Intrinsically disordered proteins play diverse roles in cell signaling. Cell Commun Signal 2022; 20:20. [PMID: 35177069 PMCID: PMC8851865 DOI: 10.1186/s12964-022-00821-7] [Citation(s) in RCA: 83] [Impact Index Per Article: 27.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/22/2021] [Accepted: 12/11/2021] [Indexed: 11/29/2022] Open
Abstract
Signaling pathways allow cells to detect and respond to a wide variety of chemical (e.g. Ca2+ or chemokine proteins) and physical stimuli (e.g., sheer stress, light). Together, these pathways form an extensive communication network that regulates basic cell activities and coordinates the function of multiple cells or tissues. The process of cell signaling imposes many demands on the proteins that comprise these pathways, including the abilities to form active and inactive states, and to engage in multiple protein interactions. Furthermore, successful signaling often requires amplifying the signal, regulating or tuning the response to the signal, combining information sourced from multiple pathways, all while ensuring fidelity of the process. This sensitivity, adaptability, and tunability are possible, in part, due to the inclusion of intrinsically disordered regions in many proteins involved in cell signaling. The goal of this collection is to highlight the many roles of intrinsic disorder in cell signaling. Following an overview of resources that can be used to study intrinsically disordered proteins, this review highlights the critical role of intrinsically disordered proteins for signaling in widely diverse organisms (animals, plants, bacteria, fungi), in every category of cell signaling pathway (autocrine, juxtacrine, intracrine, paracrine, and endocrine) and at each stage (ligand, receptor, transducer, effector, terminator) in the cell signaling process. Thus, a cell signaling pathway cannot be fully described without understanding how intrinsically disordered protein regions contribute to its function. The ubiquitous presence of intrinsic disorder in different stages of diverse cell signaling pathways suggest that more mechanisms by which disorder modulates intra- and inter-cell signals remain to be discovered.
Collapse
Affiliation(s)
- Sarah E. Bondos
- Department of Molecular and Cellular Medicine, Texas A&M Health Science Center, College Station, TX 77843 USA
| | - A. Keith Dunker
- Center for Computational Biology and Bioinformatics, Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, Indianapolis, IN 46202 USA
| | - Vladimir N. Uversky
- Department of Molecular Medicine and USF Health Byrd Alzheimer’s Research Institute, Morsani College of Medicine, University of South Florida, Tampa, FL 33612 USA
- Institute for Biological Instrumentation of the Russian Academy of Sciences, Federal Research Center “Pushchino Scientific Center for Biological Research of the Russian Academy of Sciences”, Pushchino, Moscow Region, Russia 142290
| |
Collapse
|
14
|
Abstract
INTRODUCTION Intrinsic disorder prediction field develops, assesses, and deploys computational predictors of disorder in protein sequences and constructs and disseminates databases of these predictions. Over 40 years of research resulted in the release of numerous resources. AREAS COVERED We identify and briefly summarize the most comprehensive to date collection of over 100 disorder predictors. We focus on their predictive models, availability and predictive performance. We categorize and study them from a historical point of view to highlight informative trends. EXPERT OPINION We find a consistent trend of improvements in predictive quality as newer and more advanced predictors are developed. The original focus on machine learning methods has shifted to meta-predictors in early 2010s, followed by a recent transition to deep learning. The use of deep learners will continue in foreseeable future given recent and convincing success of these methods. Moreover, a broad range of resources that facilitate convenient collection of accurate disorder predictions is available to users. They include web servers and standalone programs for disorder prediction, servers that combine prediction of disorder and disorder functions, and large databases of pre-computed predictions. We also point to the need to address the shortage of accurate methods that predict disordered binding regions.
Collapse
Affiliation(s)
- Bi Zhao
- Department of Computer Science, Virginia Commonwealth University, Richmond, Virginia, USA
| | - Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, Virginia, USA
| |
Collapse
|
15
|
Zhang F, Zhao B, Shi W, Li M, Kurgan L. DeepDISOBind: accurate prediction of RNA-, DNA- and protein-binding intrinsically disordered residues with deep multi-task learning. Brief Bioinform 2021; 23:6461158. [PMID: 34905768 DOI: 10.1093/bib/bbab521] [Citation(s) in RCA: 41] [Impact Index Per Article: 10.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/02/2021] [Revised: 10/30/2021] [Accepted: 11/14/2021] [Indexed: 12/14/2022] Open
Abstract
Proteins with intrinsically disordered regions (IDRs) are common among eukaryotes. Many IDRs interact with nucleic acids and proteins. Annotation of these interactions is supported by computational predictors, but to date, only one tool that predicts interactions with nucleic acids was released, and recent assessments demonstrate that current predictors offer modest levels of accuracy. We have developed DeepDISOBind, an innovative deep multi-task architecture that accurately predicts deoxyribonucleic acid (DNA)-, ribonucleic acid (RNA)- and protein-binding IDRs from protein sequences. DeepDISOBind relies on an information-rich sequence profile that is processed by an innovative multi-task deep neural network, where subsequent layers are gradually specialized to predict interactions with specific partner types. The common input layer links to a layer that differentiates protein- and nucleic acid-binding, which further links to layers that discriminate between DNA and RNA interactions. Empirical tests show that this multi-task design provides statistically significant gains in predictive quality across the three partner types when compared to a single-task design and a representative selection of the existing methods that cover both disorder- and structure-trained tools. Analysis of the predictions on the human proteome reveals that DeepDISOBind predictions can be encoded into protein-level propensities that accurately predict DNA- and RNA-binding proteins and protein hubs. DeepDISOBind is available at https://www.csuligroup.com/DeepDISOBind/.
Collapse
Affiliation(s)
- Fuhao Zhang
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, 410083, China
| | - Bi Zhao
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, 23284, USA
| | - Wenbo Shi
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, 410083, China
| | - Min Li
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, 410083, China
| | - Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, 23284, USA
| |
Collapse
|
16
|
Giri R, Bhardwaj T, Shegane M, Gehi BR, Kumar P, Gadhave K, Oldfield CJ, Uversky VN. Understanding COVID-19 via comparative analysis of dark proteomes of SARS-CoV-2, human SARS and bat SARS-like coronaviruses. Cell Mol Life Sci 2021; 78:1655-1688. [PMID: 32712910 DOI: 10.1101/2020.03.13.990598] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/07/2020] [Revised: 07/03/2020] [Accepted: 07/17/2020] [Indexed: 05/18/2023]
Abstract
The recently emerged coronavirus designated as SARS-CoV-2 (also known as 2019 novel coronavirus (2019-nCoV) or Wuhan coronavirus) is a causative agent of coronavirus disease 2019 (COVID-19), which is rapidly spreading throughout the world now. More than 1.21 million cases of SARS-CoV-2 infection and more than 67,000 COVID-19-associated mortalities have been reported worldwide till the writing of this article, and these numbers are increasing every passing hour. The World Health Organization (WHO) has declared the SARS-CoV-2 spread as a global public health emergency and admitted COVID-19 as a pandemic now. Multiple sequence alignment data correlated with the already published reports on SARS-CoV-2 evolution indicated that this virus is closely related to the bat severe acute respiratory syndrome-like coronavirus (bat SARS-like CoV) and the well-studied human SARS coronavirus (SARS-CoV). The disordered regions in viral proteins are associated with the viral infectivity and pathogenicity. Therefore, in this study, we have exploited a set of complementary computational approaches to examine the dark proteomes of SARS-CoV-2, bat SARS-like, and human SARS CoVs by analysing the prevalence of intrinsic disorder in their proteins. According to our findings, SARS-CoV-2 proteome contains very significant levels of structural order. In fact, except for nucleocapsid, Nsp8, and ORF6, the vast majority of SARS-CoV-2 proteins are mostly ordered proteins containing less intrinsically disordered protein regions (IDPRs). However, IDPRs found in SARS-CoV-2 proteins are functionally important. For example, cleavage sites in its replicase 1ab polyprotein are found to be highly disordered, and almost all SARS-CoV-2 proteins contains molecular recognition features (MoRFs), which are intrinsic disorder-based protein-protein interaction sites that are commonly utilized by proteins for interaction with specific partners. The results of our extensive investigation of the dark side of SARS-CoV-2 proteome will have important implications in understanding the structural and non-structural biology of SARS or SARS-like coronaviruses.
Collapse
Affiliation(s)
- Rajanish Giri
- School of Basic Sciences, Indian Institute of Technology Mandi, VPO Kamand, Mandi, Himachal Pradesh, 175005, India.
| | - Taniya Bhardwaj
- School of Basic Sciences, Indian Institute of Technology Mandi, VPO Kamand, Mandi, Himachal Pradesh, 175005, India
| | - Meenakshi Shegane
- School of Basic Sciences, Indian Institute of Technology Mandi, VPO Kamand, Mandi, Himachal Pradesh, 175005, India
| | - Bhuvaneshwari R Gehi
- School of Basic Sciences, Indian Institute of Technology Mandi, VPO Kamand, Mandi, Himachal Pradesh, 175005, India
| | - Prateek Kumar
- School of Basic Sciences, Indian Institute of Technology Mandi, VPO Kamand, Mandi, Himachal Pradesh, 175005, India
| | - Kundlik Gadhave
- School of Basic Sciences, Indian Institute of Technology Mandi, VPO Kamand, Mandi, Himachal Pradesh, 175005, India
| | | | - Vladimir N Uversky
- Department of Molecular Medicine, Byrd Alzheimer's Research Institute, Morsani College of Medicine, University of South Florida, Tampa, FL, USA
- Laboratory of New Methods in Biology, Institute for Biological Instrumentation of the Russian Academy of Sciences, Federal Research Center "Pushchino Scientific Center for Biological Research of the Russian Academy of Sciences", Moscow region, Pushchino, 142290, Russia
| |
Collapse
|
17
|
Zhao B, Katuwawala A, Oldfield CJ, Dunker AK, Faraggi E, Gsponer J, Kloczkowski A, Malhis N, Mirdita M, Obradovic Z, Söding J, Steinegger M, Zhou Y, Kurgan L. DescribePROT: database of amino acid-level protein structure and function predictions. Nucleic Acids Res 2021; 49:D298-D308. [PMID: 33119734 PMCID: PMC7778963 DOI: 10.1093/nar/gkaa931] [Citation(s) in RCA: 50] [Impact Index Per Article: 12.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/14/2020] [Revised: 09/11/2020] [Accepted: 10/05/2020] [Indexed: 12/30/2022] Open
Abstract
We present DescribePROT, the database of predicted amino acid-level descriptors of structure and function of proteins. DescribePROT delivers a comprehensive collection of 13 complementary descriptors predicted using 10 popular and accurate algorithms for 83 complete proteomes that cover key model organisms. The current version includes 7.8 billion predictions for close to 600 million amino acids in 1.4 million proteins. The descriptors encompass sequence conservation, position specific scoring matrix, secondary structure, solvent accessibility, intrinsic disorder, disordered linkers, signal peptides, MoRFs and interactions with proteins, DNA and RNAs. Users can search DescribePROT by the amino acid sequence and the UniProt accession number and entry name. The pre-computed results are made available instantaneously. The predictions can be accesses via an interactive graphical interface that allows simultaneous analysis of multiple descriptors and can be also downloaded in structured formats at the protein, proteome and whole database scale. The putative annotations included by DescriPROT are useful for a broad range of studies, including: investigations of protein function, applied projects focusing on therapeutics and diseases, and in the development of predictors for other protein sequence descriptors. Future releases will expand the coverage of DescribePROT. DescribePROT can be accessed at http://biomine.cs.vcu.edu/servers/DESCRIBEPROT/.
Collapse
Affiliation(s)
- Bi Zhao
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
| | - Akila Katuwawala
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
| | | | - A Keith Dunker
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN, USA
| | - Eshel Faraggi
- Battelle Center for Mathematical Medicine at the Nationwide Children's Hospital, and Department of Pediatrics, The Ohio State University, Columbus, OH, USA
| | - Jörg Gsponer
- Michael Smith Laboratories, University of British Columbia, Vancouver, BC, Canada
| | - Andrzej Kloczkowski
- Battelle Center for Mathematical Medicine at the Nationwide Children's Hospital, and Department of Pediatrics, The Ohio State University, Columbus, OH, USA
| | - Nawar Malhis
- Michael Smith Laboratories, University of British Columbia, Vancouver, BC, Canada
| | - Milot Mirdita
- Quantitative and Computational Biology, Max Planck Institute for Biophysical Chemistry, Göttingen, Germany
| | - Zoran Obradovic
- Department of Computer and Information Sciences, Temple University, Philadelphia, PA, USA
| | - Johannes Söding
- Quantitative and Computational Biology, Max Planck Institute for Biophysical Chemistry, Göttingen, Germany
| | - Martin Steinegger
- School of Biological Sciences and Institute of Molecular Biology & Genetics, Seoul National University, Seoul, Republic of Korea
| | - Yaoqi Zhou
- Institute for Glycomics, Griffith University, Gold Coast, Queensland, Australia
| | - Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
| |
Collapse
|
18
|
Giri R, Bhardwaj T, Shegane M, Gehi BR, Kumar P, Gadhave K, Oldfield CJ, Uversky VN. Understanding COVID-19 via comparative analysis of dark proteomes of SARS-CoV-2, human SARS and bat SARS-like coronaviruses. Cell Mol Life Sci 2020; 78:1655-1688. [PMID: 32712910 PMCID: PMC7382329 DOI: 10.1007/s00018-020-03603-x] [Citation(s) in RCA: 71] [Impact Index Per Article: 14.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/07/2020] [Revised: 07/03/2020] [Accepted: 07/17/2020] [Indexed: 01/08/2023]
Abstract
The recently emerged coronavirus designated as SARS-CoV-2 (also known as 2019 novel coronavirus (2019-nCoV) or Wuhan coronavirus) is a causative agent of coronavirus disease 2019 (COVID-19), which is rapidly spreading throughout the world now. More than 1.21 million cases of SARS-CoV-2 infection and more than 67,000 COVID-19-associated mortalities have been reported worldwide till the writing of this article, and these numbers are increasing every passing hour. The World Health Organization (WHO) has declared the SARS-CoV-2 spread as a global public health emergency and admitted COVID-19 as a pandemic now. Multiple sequence alignment data correlated with the already published reports on SARS-CoV-2 evolution indicated that this virus is closely related to the bat severe acute respiratory syndrome-like coronavirus (bat SARS-like CoV) and the well-studied human SARS coronavirus (SARS-CoV). The disordered regions in viral proteins are associated with the viral infectivity and pathogenicity. Therefore, in this study, we have exploited a set of complementary computational approaches to examine the dark proteomes of SARS-CoV-2, bat SARS-like, and human SARS CoVs by analysing the prevalence of intrinsic disorder in their proteins. According to our findings, SARS-CoV-2 proteome contains very significant levels of structural order. In fact, except for nucleocapsid, Nsp8, and ORF6, the vast majority of SARS-CoV-2 proteins are mostly ordered proteins containing less intrinsically disordered protein regions (IDPRs). However, IDPRs found in SARS-CoV-2 proteins are functionally important. For example, cleavage sites in its replicase 1ab polyprotein are found to be highly disordered, and almost all SARS-CoV-2 proteins contains molecular recognition features (MoRFs), which are intrinsic disorder-based protein–protein interaction sites that are commonly utilized by proteins for interaction with specific partners. The results of our extensive investigation of the dark side of SARS-CoV-2 proteome will have important implications in understanding the structural and non-structural biology of SARS or SARS-like coronaviruses.
Collapse
Affiliation(s)
- Rajanish Giri
- School of Basic Sciences, Indian Institute of Technology Mandi, VPO Kamand, Mandi, Himachal Pradesh, 175005, India.
| | - Taniya Bhardwaj
- School of Basic Sciences, Indian Institute of Technology Mandi, VPO Kamand, Mandi, Himachal Pradesh, 175005, India
| | - Meenakshi Shegane
- School of Basic Sciences, Indian Institute of Technology Mandi, VPO Kamand, Mandi, Himachal Pradesh, 175005, India
| | - Bhuvaneshwari R Gehi
- School of Basic Sciences, Indian Institute of Technology Mandi, VPO Kamand, Mandi, Himachal Pradesh, 175005, India
| | - Prateek Kumar
- School of Basic Sciences, Indian Institute of Technology Mandi, VPO Kamand, Mandi, Himachal Pradesh, 175005, India
| | - Kundlik Gadhave
- School of Basic Sciences, Indian Institute of Technology Mandi, VPO Kamand, Mandi, Himachal Pradesh, 175005, India
| | | | - Vladimir N Uversky
- Department of Molecular Medicine, Byrd Alzheimer's Research Institute, Morsani College of Medicine, University of South Florida, Tampa, FL, USA.,Laboratory of New Methods in Biology, Institute for Biological Instrumentation of the Russian Academy of Sciences, Federal Research Center "Pushchino Scientific Center for Biological Research of the Russian Academy of Sciences", Moscow region, Pushchino, 142290, Russia
| |
Collapse
|
19
|
Palomino‐Hernandez O, Margreiter MA, Rossetti G. Challenges in RNA Regulation in Huntington's Disease: Insights from Computational Studies. Isr J Chem 2020. [DOI: 10.1002/ijch.202000021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]
Affiliation(s)
- Oscar Palomino‐Hernandez
- Computational Biomedicine, Institute of Neuroscience and Medicine (INM-9)/Instute for advanced simulations (IAS-5)Forschungszentrum Juelich 52425 Jülich Germany
- Faculty 1RWTH Aachen 52425 Aachen Germany
- Computation-based Science and Technology Research CenterThe Cyprus Institute Nicosia 2121 Cyprus
- Institute of Life ScienceThe Hebrew University of Jerusalem Jerusalem 91904 Israel
| | - Michael A. Margreiter
- Computational Biomedicine, Institute of Neuroscience and Medicine (INM-9)/Instute for advanced simulations (IAS-5)Forschungszentrum Juelich 52425 Jülich Germany
- Faculty 1RWTH Aachen 52425 Aachen Germany
| | - Giulia Rossetti
- Computational Biomedicine, Institute of Neuroscience and Medicine (INM-9)/Instute for advanced simulations (IAS-5)Forschungszentrum Juelich 52425 Jülich Germany
- Jülich Supercomputing Centre (JSC)Forschungszentrum Jülich 52425 Jülich Germany
- Department of Hematology, Oncology, Hemostaseology and Stem Cell Transplantation University Hospital AachenRWTH Aachen University Pauwelsstraße 30 52074 Aachen Germany
| |
Collapse
|