1. Zhao B, Basu S, Kurgan L. DescribePROT Database of Residue-Level Protein Structure and Function Annotations. Methods Mol Biol 2025; 2867:169-184. [PMID: 39576581 DOI: 10.1007/978-1-0716-4196-5_10] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2024]
Abstract
DescribePROT is a freely available online database of structural and functional descriptors of proteins at the amino acid level. It provides access to 13 diverse descriptors that include sequence conservation, putative secondary structure, solvent accessibility, intrinsic disorder, and signal peptides, and putative annotations of residues that interact with proteins, peptides and nucleic acids. These data can be used to elucidate protein functions, to support efforts to develop therapeutics, and to develop and evaluate future predictors of protein structure and function. DescribePROT includes 7.8 billion predictions for 1.4 million proteins from 83 complete proteomes of popular model organisms. This information can be downloaded at multiple levels of scope (entire database, specific organisms, and individual proteins) and can be interacted with using a graphical interface that simultaneously displays data on multiple descriptors. We describe the contents of this resource, provide directions on how to use its interface, and offer instructions on how to obtain and interact with the underlying data. Moreover, we briefly discuss plans for a future expansion of this database. DescribePROT is available at http://biomine.cs.vcu.edu/servers/DESCRIBEPROT/ .
Affiliation(s)
- Bi Zhao
  - Genomics program, College of Public Health, University of South Florida, Tampa, FL, USA
- Sushmita Basu
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
2. Wang K, Hu G, Wu Z, Kurgan L. Accurate and Fast Prediction of Intrinsic Disorder Using flDPnn. Methods Mol Biol 2025; 2867:201-218. [PMID: 39576583 DOI: 10.1007/978-1-0716-4196-5_12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2024]
Abstract
Intrinsically disordered proteins (IDPs) that include one or more intrinsically disordered regions (IDRs) are abundant across all domains of life and viruses and play numerous functional roles in various cellular processes. Due to a relatively low throughput and high cost of experimental techniques for identifying IDRs, there is a growing need for fast and accurate computational algorithms that accurately predict IDRs/IDPs from protein sequences. We describe one of the leading disorder predictors, flDPnn. Results from a recent community-organized Critical Assessment of Intrinsic Disorder (CAID) experiment show that flDPnn provides fast and state-of-the-art predictions of disorder, which are supplemented with the predictions of several major disorder functions. This chapter provides a practical guide to flDPnn, which includes a brief explanation of its predictive model, descriptions of its web server and standalone versions, and a case study that showcases how to read and understand flDPnn's predictions.
Affiliation(s)
- Kui Wang
  - NITFID, School of Statistics and Data Science, LPMC and KLMDASR, Nankai University, Tianjin, China
- Gang Hu
  - NITFID, School of Statistics and Data Science, LPMC and KLMDASR, Nankai University, Tianjin, China
- Zhonghua Wu
  - School of Mathematical Sciences and LPMC, Nankai University, Tianjin, China
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
3. Basu S, Kurgan L. Taxonomy-specific assessment of intrinsic disorder predictions at residue and region levels in higher eukaryotes, protists, archaea, bacteria and viruses. Comput Struct Biotechnol J 2024; 23:1968-1977. [PMID: 38765610 PMCID: PMC11098722 DOI: 10.1016/j.csbj.2024.04.059] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/05/2024] [Revised: 04/23/2024] [Accepted: 04/24/2024] [Indexed: 05/22/2024] Open
Abstract
Intrinsic disorder predictors were evaluated in several studies including the two large CAID experiments. However, these studies are biased towards eukaryotic proteins and focus primarily on the residue-level predictions. We provide first-of-its-kind assessment that comprehensively covers the taxonomy and evaluates predictions at the residue and disordered region levels. We curate a benchmark dataset that uniformly covers eukaryotic, archaeal, bacterial, and viral proteins. We find that predictive performance differs substantially across taxonomy, where viruses are predicted most accurately, followed by protists and higher eukaryotes, while bacterial and archaeal proteins suffer lower levels of accuracy. These trends are consistent across predictors. We also find that current tools, except for flDPnn, struggle with reproducing native distributions of the numbers and sizes of the disordered regions. Moreover, analysis of two variants of disorder predictions derived from the AlphaFold2 predicted structures reveals that they produce accurate residue-level propensities for archaea, bacteria and protists. However, they underperform for higher eukaryotes and generally struggle to accurately identify disordered regions. Our results motivate development of new predictors that target bacteria and archaea and which produce accurate results at both residue and region levels. We also stress the need to include the region-level assessments in future assessments.
Affiliation(s)
- Sushmita Basu
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
4. Peck Y, Pickering D, Mobli M, Liddell MJ, Wilson DT, Ruscher R, Ryan S, Buitrago G, McHugh C, Love NC, Pinlac T, Haertlein M, Kron MA, Loukas A, Daly NL. Solution structure of the N-terminal extension domain of a Schistosoma japonicum asparaginyl-tRNA synthetase. J Biomol Struct Dyn 2024; 42:7934-7944. [PMID: 37572327 DOI: 10.1080/07391102.2023.2241918] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2023] [Accepted: 07/24/2023] [Indexed: 08/14/2023]
Abstract
Several secreted proteins from helminths (parasitic worms) have been shown to have immunomodulatory activities. Asparaginyl-tRNA synthetases are abundantly secreted in the filarial nematode Brugia malayi (BmAsnRS) and the parasitic flatworm Schistosoma japonicum (SjAsnRS), indicating a possible immune function. The suggestion is supported by BmAsnRS alleviating disease symptoms in a T-cell transfer mouse model of colitis. This immunomodulatory function is potentially related to an N-terminal extension domain present in eukaryotic AsnRS proteins but few structure/function studies have been done on this domain. Here we have determined the three-dimensional solution structure of the N-terminal extension domain of SjAsnRS. A protein containing the 114 N-terminal amino acids of SjAsnRS was recombinantly expressed with isotopic labelling to allow structure determination using 3D NMR spectroscopy, and analysis of dynamics using NMR relaxation experiments. Structural comparisons of the N-terminal extension domain of SjAsnRS with filarial and human homologues highlight a high degree of variability in the β-hairpin region of these eukaryotic N-AsnRS proteins, but similarities in the disorder of the C-terminal regions. Limitations in PrDOS-based intrinsically disordered region (IDR) model predictions were also evident in this comparison. Empirical structural data such as that presented in our study for N-SjAsnRS will enhance the prediction of sequence-homology based structure modelling and prediction of IDRs in the future. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Yoshimi Peck
  - Australian Institute of Tropical Health and Medicine, James Cook University, Cairns, QLD, Australia
- Darren Pickering
  - Australian Institute of Tropical Health and Medicine, James Cook University, Cairns, QLD, Australia
- Mehdi Mobli
  - Centre for Advanced Imaging, The University of Queensland, St Lucia, QLD, Australia
- Michael J Liddell
  - College of Science and Engineering, James Cook University, Cairns, QLD, Australia
- David T Wilson
  - Australian Institute of Tropical Health and Medicine, James Cook University, Cairns, QLD, Australia
- Roland Ruscher
  - Australian Institute of Tropical Health and Medicine, James Cook University, Cairns, QLD, Australia
- Stephanie Ryan
  - Australian Institute of Tropical Health and Medicine, James Cook University, Cairns, QLD, Australia
- Geraldine Buitrago
  - Australian Institute of Tropical Health and Medicine, James Cook University, Cairns, QLD, Australia
  - Strathclyde Institute of Pharmacy and Biomedical Sciences, University of Strathclyde, Glasgow, UK
- Connor McHugh
  - Australian Institute of Tropical Health and Medicine, James Cook University, Cairns, QLD, Australia
- Theresa Pinlac
  - Department of Biochemistry, University of the Philippines, Manila, Philippines
- Michael A Kron
  - Department of Medicine, Division of Infectious Diseases, Medical College of Wisconsin, Milwaukee, WI, USA
- Alex Loukas
  - Australian Institute of Tropical Health and Medicine, James Cook University, Cairns, QLD, Australia
- Norelle L Daly
  - Australian Institute of Tropical Health and Medicine, James Cook University, Cairns, QLD, Australia
5. Song J, Kurgan L. Availability of web servers significantly boosts citations rates of bioinformatics methods for protein function and disorder prediction. Bioinformatics Advances 2023; 3:vbad184. [PMID: 38146538 PMCID: PMC10749743 DOI: 10.1093/bioadv/vbad184] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2023] [Revised: 12/08/2023] [Accepted: 12/15/2023] [Indexed: 12/27/2023]
Abstract
Motivation: Development of bioinformatics methods is a long, complex and resource-hungry process. Hundreds of these tools were released. While some methods are highly cited and used, many suffer relatively low citation rates. We empirically analyze a large collection of recently released methods in three diverse protein function and disorder prediction areas to identify key factors that contribute to increased citations. Results: We show that provision of a working web server significantly boosts citation rates. On average, methods with working web servers generate three times as many citations compared to tools that are available as only source code, have no code and no server, or are no longer available. This observation holds consistently across different research areas and publication years. We also find that differences in predictive performance are unlikely to impact citation rates. Overall, our empirical results suggest that a relatively low-cost investment into the provision and long-term support of web servers would substantially increase the impact of bioinformatics tools.
Affiliation(s)
- Jiangning Song
  - Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Clayton, VIC 3800, Australia
  - Monash Data Futures Institute, Monash University, Clayton, VIC 3800, Australia
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, United States
6. Kurgan L, Hu G, Wang K, Ghadermarzi S, Zhao B, Malhis N, Erdős G, Gsponer J, Uversky VN, Dosztányi Z. Tutorial: a guide for the selection of fast and accurate computational tools for the prediction of intrinsic disorder in proteins. Nat Protoc 2023; 18:3157-3172. [PMID: 37740110 DOI: 10.1038/s41596-023-00876-x] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 12/13/2022] [Accepted: 06/21/2023] [Indexed: 09/24/2023]
Abstract
Intrinsic disorder is instrumental for a wide range of protein functions, and its analysis, using computational predictions from primary structures, complements secondary and tertiary structure-based approaches. In this Tutorial, we provide an overview and comparison of 23 publicly available computational tools with complementary parameters useful for intrinsic disorder prediction, partly relying on results from the Critical Assessment of protein Intrinsic Disorder prediction experiment. We consider factors such as accuracy, runtime, availability and the need for functional insights. The selected tools are available as web servers and downloadable programs, offer state-of-the-art predictions and can be used in a high-throughput manner. We provide examples and instructions for the selected tools to illustrate practical aspects related to the submission, collection and interpretation of predictions, as well as the timing and their limitations. We highlight two predictors for intrinsically disordered proteins, flDPnn as accurate and fast and IUPred as very fast and moderately accurate, while suggesting ANCHOR2 and MoRFchibi as two of the best-performing predictors for intrinsically disordered region binding. We link these tools to additional resources, including databases of predictions and web servers that integrate multiple predictive methods. Altogether, this Tutorial provides a hands-on guide to comparatively evaluating multiple predictors, submitting and collecting their own predictions, and reading and interpreting results. It is suitable for experimentalists and computational biologists interested in accurately and conveniently identifying intrinsic disorder, facilitating the functional characterization of the rapidly growing collections of protein sequences.
Affiliation(s)
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
- Gang Hu
  - School of Statistics and Data Science, LPMC and KLMDASR, Nankai University, Tianjin, China
- Kui Wang
  - School of Statistics and Data Science, LPMC and KLMDASR, Nankai University, Tianjin, China
- Sina Ghadermarzi
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
- Bi Zhao
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
- Nawar Malhis
  - Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia, Canada
- Gábor Erdős
  - MTA-ELTE Momentum Bioinformatics Research Group, Department of Biochemistry, Eötvös Loránd University, Budapest, Hungary
- Jörg Gsponer
  - Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia, Canada
- Vladimir N Uversky
  - Department of Molecular Medicine, Morsani College of Medicine, University of South Florida, Tampa, FL, USA
  - Byrd Alzheimer's Center and Research Institute, Morsani College of Medicine, University of South Florida, Tampa, FL, USA
- Zsuzsanna Dosztányi
  - MTA-ELTE Momentum Bioinformatics Research Group, Department of Biochemistry, Eötvös Loránd University, Budapest, Hungary
7.
Abstract
There are over 100 computational predictors of intrinsic disorder. These methods predict amino acid-level propensities for disorder directly from protein sequences. The propensities can be used to annotate putative disordered residues and regions. This unit provides a practical and holistic introduction to the sequence-based intrinsic disorder prediction. We define intrinsic disorder, explain the format of computational prediction of disorder, and identify and describe several accurate predictors. We also introduce recently released databases of intrinsic disorder predictions and use an illustrative example to provide insights into how predictions should be interpreted and combined. Lastly, we summarize key experimental methods that can be used to validate computational predictions. © 2023 Wiley Periodicals LLC.
Affiliation(s)
- Vladimir N Uversky
  - Department of Molecular Medicine and USF Health Byrd Alzheimer's Research Institute, Morsani College of Medicine, University of South Florida, Tampa, Florida
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, Richmond, Virginia
8. Computational prediction of disordered binding regions. Comput Struct Biotechnol J 2023; 21:1487-1497. [PMID: 36851914 PMCID: PMC9957716 DOI: 10.1016/j.csbj.2023.02.018] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 10/26/2022] [Revised: 02/08/2023] [Accepted: 02/08/2023] [Indexed: 02/12/2023] Open
Abstract
One of the key features of intrinsically disordered regions (IDRs) is their ability to interact with a broad range of partner molecules. Multiple types of interacting IDRs were identified including molecular recognition fragments (MoRFs), short linear sequence motifs (SLiMs), and protein-, nucleic acids- and lipid-binding regions. Prediction of binding IDRs in protein sequences is gaining momentum in recent years. We survey 38 predictors of binding IDRs that target interactions with a diverse set of partners, such as peptides, proteins, RNA, DNA and lipids. We offer a historical perspective and highlight key events that fueled efforts to develop these methods. These tools rely on a diverse range of predictive architectures that include scoring functions, regular expressions, traditional and deep machine learning and meta-models. Recent efforts focus on the development of deep neural network-based architectures and extending coverage to RNA, DNA and lipid-binding IDRs. We analyze availability of these methods and show that providing implementations and webservers results in much higher rates of citations/use. We also make several recommendations to take advantage of modern deep network architectures, develop tools that bundle predictions of multiple and different types of binding IDRs, and work on algorithms that model structures of the resulting complexes.
9. Han B, Ren C, Wang W, Li J, Gong X. Computational Prediction of Protein Intrinsically Disordered Region Related Interactions and Functions. Genes (Basel) 2023; 14:432. [PMID: 36833360 PMCID: PMC9956190 DOI: 10.3390/genes14020432] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/31/2022] [Revised: 02/02/2023] [Accepted: 02/05/2023] [Indexed: 02/11/2023] Open
Abstract
Intrinsically Disordered Proteins (IDPs) and Regions (IDRs) exist widely. Although without well-defined structures, they participate in many important biological processes. In addition, they are also widely related to human diseases and have become potential targets in drug discovery. However, there is a big gap between the experimental annotations related to IDPs/IDRs and their actual number. In recent decades, the computational methods related to IDPs/IDRs have been developed vigorously, including predicting IDPs/IDRs, the binding modes of IDPs/IDRs, the binding sites of IDPs/IDRs, and the molecular functions of IDPs/IDRs according to different tasks. In view of the correlation between these predictors, we have reviewed these prediction methods uniformly for the first time, summarized their computational methods and predictive performance, and discussed some problems and perspectives.
Affiliation(s)
- Bingqing Han
  - Mathematical Intelligence Application Lab, Institute for Mathematical Sciences, Renmin University of China, Beijing 100872, China
- Chongjiao Ren
  - Mathematical Intelligence Application Lab, Institute for Mathematical Sciences, Renmin University of China, Beijing 100872, China
- Wenda Wang
  - Mathematical Intelligence Application Lab, Institute for Mathematical Sciences, Renmin University of China, Beijing 100872, China
- Jiashan Li
  - Mathematical Intelligence Application Lab, Institute for Mathematical Sciences, Renmin University of China, Beijing 100872, China
- Xinqi Gong
  - Mathematical Intelligence Application Lab, Institute for Mathematical Sciences, Renmin University of China, Beijing 100872, China
  - Beijing Academy of Intelligence, Beijing 100083, China
10. Compositional Bias of Intrinsically Disordered Proteins and Regions and Their Predictions. Biomolecules 2022; 12:biom12070888. [PMID: 35883444 PMCID: PMC9313023 DOI: 10.3390/biom12070888] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 05/27/2022] [Revised: 06/10/2022] [Accepted: 06/10/2022] [Indexed: 11/17/2022] Open
Abstract
Intrinsically disordered regions (IDRs) carry out many cellular functions and vary in length and placement in protein sequences. This diversity leads to variations in the underlying compositional biases, which were demonstrated for the short vs. long IDRs. We analyze compositional biases across four classes of disorder: fully disordered proteins; short IDRs; long IDRs; and binding IDRs. We identify three distinct biases: for the fully disordered proteins, the short IDRs and the long and binding IDRs combined. We also investigate compositional bias for putative disorder produced by leading disorder predictors and find that it is similar to the bias of the native disorder. Interestingly, the accuracy of disorder predictions across different methods is correlated with the correctness of the compositional bias of their predictions highlighting the importance of the compositional bias. The predictive quality is relatively low for the disorder classes with compositional bias that is the most different from the “generic” disorder bias, while being much higher for the classes with the most similar bias. We discover that different predictors perform best across different classes of disorder. This suggests that no single predictor is universally best and motivates the development of new architectures that combine models that target specific disorder classes.
11. Biró B, Zhao B, Kurgan L. Complementarity of the residue-level protein function and structure predictions in human proteins. Comput Struct Biotechnol J 2022; 20:2223-2234. [PMID: 35615015 PMCID: PMC9118482 DOI: 10.1016/j.csbj.2022.05.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2022] [Revised: 05/02/2022] [Accepted: 05/02/2022] [Indexed: 11/24/2022] Open
Abstract
Sequence-based predictors of the residue-level protein function and structure cover a broad spectrum of characteristics including intrinsic disorder, secondary structure, solvent accessibility and binding to nucleic acids. They were catalogued and evaluated in numerous surveys and assessments. However, methods focusing on a given characteristic are studied separately from predictors of other characteristics, while they are typically used on the same proteins. We fill this void by studying complementarity of a representative collection of methods that target different predictions using a large, taxonomically consistent, and low similarity dataset of human proteins. First, we bridge the gap between the communities that develop structure-trained vs. disorder-trained predictors of binding residues. Motivated by a recent study of the protein-binding residue predictions, we empirically find that combining the structure-trained and disorder-trained predictors of the DNA-binding and RNA-binding residues leads to substantial improvements in predictive quality. Second, we investigate whether diverse predictors generate results that accurately reproduce relations between secondary structure, solvent accessibility, interaction sites, and intrinsic disorder that are present in the experimental data. Our empirical analysis concludes that predictions accurately reflect all combinations of these relations. Altogether, this study provides unique insights that support combining results produced by diverse residue-level predictors of protein function and structure.
Affiliation(s)
- Bálint Biró
  - Institute of Genetics and Biotechnology, Hungarian University of Agriculture and Life Sciences, Gödöllő, Hungary
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA, United States
- Bi Zhao
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA, United States
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA, United States
12. McFadden WM, Yanowitz JL. idpr: A package for profiling and analyzing Intrinsically Disordered Proteins in R. PLoS One 2022; 17:e0266929. [PMID: 35436286 PMCID: PMC9015136 DOI: 10.1371/journal.pone.0266929] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 12/06/2021] [Accepted: 03/29/2022] [Indexed: 12/23/2022] Open
Abstract
Intrinsically disordered proteins (IDPs) and intrinsically disordered regions (IDRs) are proteins or protein-domains that do not have a single native structure, rather, they are a class of flexible peptides that can rapidly adopt multiple conformations. IDPs are quite abundant, and their dynamic characteristics provide unique advantages for various biological processes. The field of “unstructured biology” has emerged, in part, because of numerous computational studies that had identified the unique characteristics of IDPs and IDRs. The package ‘idpr’, short for Intrinsically Disordered Proteins in R, implements several R functions that match the established characteristics of IDPs to protein sequences of interest. This includes calculations of residue composition, charge-hydropathy relationships, and predictions of intrinsic disorder. Additionally, idpr integrates several amino acid substitution matrices and calculators to supplement IDP-based workflows. Overall, idpr aims to integrate tools for the computational analysis of IDPs within R, facilitating the analysis of these important, yet under-characterized, proteins. The idpr package can be downloaded from Bioconductor (https://bioconductor.org/packages/idpr/).
Affiliation(s)
- Judith L. Yanowitz
  - Magee-Womens Research Institute, Pittsburgh, PA, United States of America
  - Department of Obstetrics, Gynecology, and Reproductive Sciences, University of Pittsburgh School of Medicine, Pittsburgh, PA, United States of America
13. Zhao B, Kurgan L. Deep learning in prediction of intrinsic disorder in proteins. Comput Struct Biotechnol J 2022; 20:1286-1294. [PMID: 35356546 PMCID: PMC8927795 DOI: 10.1016/j.csbj.2022.03.003] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 01/10/2022] [Revised: 03/04/2022] [Accepted: 03/04/2022] [Indexed: 12/12/2022] Open
Abstract
Intrinsic disorder prediction is an active area that has developed over 100 predictors. We identify and investigate a recent trend towards the development of deep neural network (DNN)-based methods. The first DNN-based method was released in 2013 and since 2019 deep learners account for majority of the new disorder predictors. We find that the 13 currently available DNN-based predictors are diverse in their topologies, sizes of their networks and the inputs that they utilize. We empirically show that the deep learners are statistically more accurate than other types of disorder predictors using the blind test dataset from the recent community assessment of intrinsic disorder predictions (CAID). We also identify several well-rounded DNN-based predictors that are accurate, fast and/or conveniently available. The popularity, favorable predictive performance and architectural flexibility suggest that deep networks are likely to fuel the development of future disordered predictors. Novel hybrid designs of deep networks could be used to adequately accommodate for diversity of types and flavors of intrinsic disorder. We also discuss scarcity of the DNN-based methods for the prediction of disordered binding regions and the need to develop more accurate methods for this prediction.
Affiliation(s)
- Bi Zhao
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
14. Kurgan L. Resources for computational prediction of intrinsic disorder in proteins. Methods 2022; 204:132-141. [DOI: 10.1016/j.ymeth.2022.03.018] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 01/28/2022] [Revised: 03/25/2022] [Accepted: 03/29/2022] [Indexed: 12/26/2022] Open
15. Oldfield CJ, Peng Z, Kurgan L. Disordered RNA-Binding Region Prediction with DisoRDPbind. Methods Mol Biol 2021; 2106:225-239. [PMID: 31889261 DOI: 10.1007/978-1-0716-0231-7_14] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 12/12/2022]
Abstract
RNA chaperone activity is one of the many functions of intrinsically disordered regions (IDRs). IDRs function without the prerequisite of a stable structure. Instead, their functions arise from structural ensembles. A common theme in IDR function is molecular recognition; IDRs mediate interactions with other proteins, RNA, and DNA. Many computational methods are available to predict IDRs from protein sequence, but relatively few are available for predicting IDR functions. Available methods primarily focus on protein-protein interactions. DisoRDPbind was developed to predict several protein functions including interactions with RNA. This method is available as a user-friendly web interface, located at http://biomine.cs.vcu.edu/servers/DisoRDPbind/ . The development and architecture of DisoRDPbind is briefly presented, and its accuracy relative to other RNA-binding residue predictors is discussed. We explain usage of the web interface in detail and provide an example of prediction results and interpretation. While DisoRDPbind does not identify RNA chaperones directly, we provide a case study of an RNA chaperone, HCV core protein, as an example of the method's utility in the study of RNA chaperones.
Collapse
Affiliation(s)
| | - Zhenling Peng
- Center for Applied Mathematics, Tianjin University, Tianjin, People's Republic of China
| | - Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA.
| |
|
16
|
Hassan SS, Aljabali AAA, Panda PK, Ghosh S, Attrish D, Choudhury PP, Seyran M, Pizzol D, Adadi P, Abd El-Aziz TM, Soares A, Kandimalla R, Lundstrom K, Lal A, Azad GK, Uversky VN, Sherchan SP, Baetas-da-Cruz W, Uhal BD, Rezaei N, Chauhan G, Barh D, Redwan EM, Dayhoff GW, Bazan NG, Serrano-Aroca Á, El-Demerdash A, Mishra YK, Palu G, Takayama K, Brufsky AM, Tambuwala MM. A unique view of SARS-CoV-2 through the lens of ORF8 protein. Comput Biol Med 2021; 133:104380. [PMID: 33872970 PMCID: PMC8049180 DOI: 10.1016/j.compbiomed.2021.104380] [Citation(s) in RCA: 36] [Impact Index Per Article: 9.0] [Received: 03/02/2021] [Revised: 04/01/2021] [Accepted: 04/02/2021] [Indexed: 01/07/2023]
Abstract
Immune evasion is one of the unique characteristics of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) attributed to its ORF8 protein. This protein modulates the adaptive host immunity through down-regulation of MHC-1 (Major Histocompatibility Complex) molecules and innate immune responses by surpassing the host's interferon-mediated antiviral response. To understand the host's immune perspective in reference to the ORF8 protein, a comprehensive study of the ORF8 protein and the mutations it possesses has been performed. Chemical and structural properties of ORF8 proteins from different hosts, such as human, bat, and pangolin, suggest that the ORF8 of SARS-CoV-2 is much closer to ORF8 of Bat RaTG13-CoV than to that of Pangolin-CoV. Eighty-seven mutations across unique variants of ORF8 in SARS-CoV-2 can be grouped into four classes based on their predicted effects (Hussain et al., 2021) [1]. Based on the geo-locations and timescale of sample collection, a possible flow of mutations was built. Furthermore, conclusive flows of amalgamation of mutations were found upon sequence similarity analyses and consideration of the amino acid conservation phylogenies. Therefore, this study seeks to highlight the uniqueness of the rapidly evolving SARS-CoV-2 through the ORF8.
Affiliation(s)
- Sk Sarif Hassan
- Department of Mathematics, Pingla Thana Mahavidyalaya, Maligram, 721140, India
| | - Alaa A A Aljabali
- Department of Pharmaceutics and Pharmaceutical Technology, Yarmouk University-Faculty of Pharmacy, Irbid, 566, Jordan
| | - Pritam Kumar Panda
- Condensed Matter Theory Group, Materials Theory Division, Department of Physics and Astronomy, Uppsala University, Box 516, SE-751 20, Uppsala, Sweden
| | - Shinjini Ghosh
- Department of Biophysics, Molecular Biology and Bioinformatics, University of Calcutta, Kolkata, 700009, West Bengal, India
| | - Diksha Attrish
- Dr. B. R. Ambedkar Centre for Biomedical Research (ACBR), University of Delhi (North Campus), Delhi, 110007, India
| | - Pabitra Pal Choudhury
- Applied Statistics Unit, Indian Statistical Institute, Kolkata, 700108, West Bengal, India
| | - Murat Seyran
- Doctoral Studies in Natural and Technical Sciences (SPL 44), University of Vienna, Austria
| | - Damiano Pizzol
- Italian Agency for Development Cooperation - Khartoum, Sudan Street 33, Al Amarat, Sudan
| | - Parise Adadi
- Department of Food Science, University of Otago, Dunedin, 9054, New Zealand
| | - Tarek Mohamed Abd El-Aziz
- Zoology Department, Faculty of Science, Minia University, El-Minia, 61519, Egypt; Department of Cellular and Integrative Physiology, University of Texas Health Science Center at San Antonio, 7703 Floyd Curl Dr, San Antonio, TX, 78229-3900, USA
| | - Antonio Soares
- Department of Cellular and Integrative Physiology, University of Texas Health Science Center at San Antonio, 7703 Floyd Curl Dr, San Antonio, TX, 78229-3900, USA
| | - Ramesh Kandimalla
- CSIR-Indian Institute of Chemical Technology Uppal Road, Tarnaka, Hyderabad, 500007, Telangana State, India
| | | | - Amos Lal
- Division of Pulmonary and Critical Care Medicine, Mayo Clinic, Rochester, MN, USA
| | | | - Vladimir N Uversky
- Department of Molecular Medicine, Morsani College of Medicine, University of South Florida, Tampa, FL, 33612, USA
| | - Samendra P Sherchan
- Department of Environmental Health Sciences, Tulane University, New Orleans, LA, 70112, USA
| | - Wagner Baetas-da-Cruz
- Translational Laboratory in Molecular Physiology, Centre for Experimental Surgery, College of Medicine, Federal University of Rio de Janeiro (UFRJ), Rio de Janeiro, Brazil
| | - Bruce D Uhal
- Department of Physiology, Michigan State University, East Lansing, MI, 48824, USA
| | - Nima Rezaei
- Research Center for Immunodeficiencies, Pediatrics Center of Excellence, Children's Medical Center, Tehran University of Medical Sciences, Tehran, Iran and Network of Immunity in Infection, Malignancy and Autoimmunity (NIIMA), Universal Scientific Education and Research Network (USERN), Stockholm, Sweden
| | - Gaurav Chauhan
- School of Engineering and Sciences, Tecnologico de Monterrey, Av. Eugenio Garza Sada 2501, Sur, 64849, Monterrey, NL, Mexico Tecnológico De Monterrey, Campus Monterrey, Monterrey, Nuevo León, Mexico
| | - Debmalya Barh
- Centre for Genomics and Applied Gene Technology, Institute of Integrative Omics and Applied Biotechnology (IIOAB), PatnaPatna, India
| | - Elrashdy M Redwan
- King Abdulaziz University, Faculty of Science, Department of Biological Science, Saudi Arabia
| | - Guy W Dayhoff
- Department of Chemistry, College of Art and Sciences, University of South Florida, Tampa, FL, 33620, USA
| | - Nicolas G Bazan
- Neuroscience Center of Excellence, School of Medicine, Louisiana State University Health New Orleans, New Orleans, LA, 70112, USA
| | - Ángel Serrano-Aroca
- Biomaterials and Bioengineering Lab, Translational Research Centre San Alberto Magno, Catholic University of Valencia San Vicente Mártir, C/Guillem de Castro 94, 46001, Valencia, Spain
| | - Amr El-Demerdash
- Natural Products and Medicinal Chemistry Department, Institute de Chimie des Substances Naturelles, Gif-sur-Yvette, France
| | - Yogendra K Mishra
- University of Southern Denmark, Mads Clausen Institute, NanoSYD, Alsion 2, 6400 Sønderborg, Denmark
| | - Giorgio Palu
- Department of Molecular Medicine, University of Padova, Italy
| | - Kazuo Takayama
- Center for IPS Cell Research and Application, Kyoto University, Kyoto, 606-8397, Japan
| | - Adam M Brufsky
- University of Pittsburgh School of Medicine, Department of Medicine, Division of Hematology/Oncology, UPMC Hillman Cancer Center, Pittsburgh, PA, USA
| | - Murtaza M Tambuwala
- School of Pharmacy and Pharmaceutical Science, Ulster University, Coleraine BT52 1SA, Northern Ireland, UK.
| |
|
17
|
Katuwawala A, Ghadermarzi S, Hu G, Wu Z, Kurgan L. QUARTERplus: Accurate disorder predictions integrated with interpretable residue-level quality assessment scores. Comput Struct Biotechnol J 2021; 19:2597-2606. [PMID: 34025946 PMCID: PMC8122155 DOI: 10.1016/j.csbj.2021.04.066] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 03/09/2021] [Revised: 04/24/2021] [Accepted: 04/24/2021] [Indexed: 12/13/2022]
Abstract
A recent advance in the disorder prediction field is the development of the quality assessment (QA) scores. QA scores complement the propensities produced by the disorder predictors by identifying regions where these predictions are more likely to be correct. We develop, empirically test and release a new QA tool, QUARTERplus, that addresses several key drawbacks of the current QA method, QUARTER. QUARTERplus is the first solution that utilizes QA scores and the associated input disorder predictions to produce very accurate disorder predictions with the help of a modern deep learning meta-model. The deep neural network utilizes the QA scores to identify and fix the regions where the original/input disorder predictions are poor. More importantly, the accurate QUARTERplus predictions are accompanied by easy-to-interpret residue-level QA scores that reliably quantify their residue-level predictive quality. We provide these interpretable QA scores for QUARTERplus and 10 other popular disorder predictors. Empirical tests on a large and independent (low similarity) test dataset show that QUARTERplus predictions secure AUC = 0.93 and are statistically more accurate than the results of twelve state-of-the-art disorder predictors. We also demonstrate that the new QA scores produced by QUARTERplus are highly correlated with the actual predictive quality and that they can be effectively used to identify regions of correct disorder predictions. This feature empowers the users to easily identify which parts of the predictions generated by the modern disorder predictors are more trustworthy. QUARTERplus is available as a convenient webserver at http://biomine.cs.vcu.edu/servers/QUARTERplus/.
Affiliation(s)
- Akila Katuwawala
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
| | - Sina Ghadermarzi
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
| | - Gang Hu
- School of Statistics and Data Science, LPMC and KLMDASR, Nankai University, Tianjin 300071, China
| | - Zhonghua Wu
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China
| | - Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
| |
|
18
|
Identification of Intrinsically Disordered Protein Regions Based on Deep Neural Network-VGG16. ALGORITHMS 2021. [DOI: 10.3390/a14040107] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 01/12/2023]
Abstract
The accurate identification of intrinsically disordered proteins or protein regions is of great importance, as they are involved in critical biological processes and are related to various human diseases. In this paper, we develop a deep neural network that is based on the well-known VGG16. Our deep neural network is trained using 1450 proteins from the dataset DIS1616, and the trained neural network is tested on the remaining 166 proteins. Our trained neural network is also tested on the blind test sets R80 and MXD494 to further demonstrate the performance of our model. The MCC value of our trained deep neural network is 0.5132 on the test set DIS166, 0.5270 on the blind test set R80 and 0.4577 on the blind test set MXD494. All of these MCC values exceed the corresponding values of existing prediction methods.
|
19
|
Zhao B, Katuwawala A, Uversky VN, Kurgan L. IDPology of the living cell: intrinsic disorder in the subcellular compartments of the human cell. Cell Mol Life Sci 2021; 78:2371-2385. [PMID: 32997198 PMCID: PMC11071772 DOI: 10.1007/s00018-020-03654-0] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 06/08/2020] [Revised: 09/09/2020] [Accepted: 09/22/2020] [Indexed: 12/11/2022]
Abstract
Intrinsic disorder can be found in all proteomes of all kingdoms of life and in viruses, being particularly prevalent in the eukaryotes. We conduct a comprehensive analysis of the intrinsic disorder in the human proteins while mapping them into 24 compartments of the human cell. In agreement with previous studies, we show that human proteins are significantly enriched in disorder relative to a generic protein set that represents the protein universe. In fact, the fraction of proteins with long disordered regions and the average protein-level disorder content in the human proteome are about 3 times higher than in the protein universe. Furthermore, levels of intrinsic disorder in the majority of human subcellular compartments significantly exceed the average disorder content in the protein universe. Relative to the overall amount of disorder in the human proteome, proteins localized in the nucleus and cytoskeleton have significantly increased amounts of disorder, measured by both high disorder content and presence of multiple long intrinsically disordered regions. We empirically demonstrate that, on average, human proteins are assigned to 2.3 subcellular compartments, with proteins localized to few subcellular compartments being more disordered than the proteins that are localized to many compartments. Functionally, the disordered proteins localized in the most disorder-enriched subcellular compartments are primarily responsible for interactions with nucleic acids and protein partners. This is the first time that disorder is comprehensively mapped into the human cell. Our observations add a missing piece to the puzzle of functional disorder and its organization inside the cell.
Affiliation(s)
- Bi Zhao
- Department of Computer Science, Virginia Commonwealth University, 401 West Main Street, Room E4225, Richmond, VA, 23284, USA
| | - Akila Katuwawala
- Department of Computer Science, Virginia Commonwealth University, 401 West Main Street, Room E4225, Richmond, VA, 23284, USA
| | - Vladimir N Uversky
- Department of Molecular Medicine, USF Health Byrd Alzheimer's Research Institute, Morsani College of Medicine, University of South Florida, 12901 Bruce B. Downs Blvd. MDC07, Tampa, FL, 33612, USA.
- Laboratory of New Methods in Biology, Institute for Biological Instrumentation of the Russian Academy of Sciences, Federal Research Center "Pushchino Scientific Center for Biological Research of the Russian Academy of Sciences", Pushchino, Russia.
| | - Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, 401 West Main Street, Room E4225, Richmond, VA, 23284, USA.
| |
|
20
|
Zhao B, Katuwawala A, Oldfield CJ, Dunker AK, Faraggi E, Gsponer J, Kloczkowski A, Malhis N, Mirdita M, Obradovic Z, Söding J, Steinegger M, Zhou Y, Kurgan L. DescribePROT: database of amino acid-level protein structure and function predictions. Nucleic Acids Res 2021; 49:D298-D308. [PMID: 33119734 PMCID: PMC7778963 DOI: 10.1093/nar/gkaa931] [Citation(s) in RCA: 50] [Impact Index Per Article: 12.5] [Received: 08/14/2020] [Revised: 09/11/2020] [Accepted: 10/05/2020] [Indexed: 12/30/2022]
Abstract
We present DescribePROT, the database of predicted amino acid-level descriptors of structure and function of proteins. DescribePROT delivers a comprehensive collection of 13 complementary descriptors predicted using 10 popular and accurate algorithms for 83 complete proteomes that cover key model organisms. The current version includes 7.8 billion predictions for close to 600 million amino acids in 1.4 million proteins. The descriptors encompass sequence conservation, position specific scoring matrix, secondary structure, solvent accessibility, intrinsic disorder, disordered linkers, signal peptides, MoRFs and interactions with proteins, DNA and RNAs. Users can search DescribePROT by the amino acid sequence and the UniProt accession number and entry name. The pre-computed results are made available instantaneously. The predictions can be accessed via an interactive graphical interface that allows simultaneous analysis of multiple descriptors and can also be downloaded in structured formats at the protein, proteome and whole database scale. The putative annotations included in DescribePROT are useful for a broad range of studies, including investigations of protein function, applied projects focusing on therapeutics and diseases, and the development of predictors for other protein sequence descriptors. Future releases will expand the coverage of DescribePROT. DescribePROT can be accessed at http://biomine.cs.vcu.edu/servers/DESCRIBEPROT/.
Affiliation(s)
- Bi Zhao
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
| | - Akila Katuwawala
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
| | | | - A Keith Dunker
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN, USA
| | - Eshel Faraggi
- Battelle Center for Mathematical Medicine at the Nationwide Children's Hospital, and Department of Pediatrics, The Ohio State University, Columbus, OH, USA
| | - Jörg Gsponer
- Michael Smith Laboratories, University of British Columbia, Vancouver, BC, Canada
| | - Andrzej Kloczkowski
- Battelle Center for Mathematical Medicine at the Nationwide Children's Hospital, and Department of Pediatrics, The Ohio State University, Columbus, OH, USA
| | - Nawar Malhis
- Michael Smith Laboratories, University of British Columbia, Vancouver, BC, Canada
| | - Milot Mirdita
- Quantitative and Computational Biology, Max Planck Institute for Biophysical Chemistry, Göttingen, Germany
| | - Zoran Obradovic
- Department of Computer and Information Sciences, Temple University, Philadelphia, PA, USA
| | - Johannes Söding
- Quantitative and Computational Biology, Max Planck Institute for Biophysical Chemistry, Göttingen, Germany
| | - Martin Steinegger
- School of Biological Sciences and Institute of Molecular Biology & Genetics, Seoul National University, Seoul, Republic of Korea
| | - Yaoqi Zhou
- Institute for Glycomics, Griffith University, Gold Coast, Queensland, Australia
| | - Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
| |
|
21
|
Zhang F, Shi W, Zhang J, Zeng M, Li M, Kurgan L. PROBselect: accurate prediction of protein-binding residues from proteins sequences via dynamic predictor selection. Bioinformatics 2020; 36:i735-i744. [DOI: 10.1093/bioinformatics/btaa806] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Accepted: 09/07/2020] [Indexed: 12/13/2022]
Abstract
Motivation
Knowledge of protein-binding residues (PBRs) improves our understanding of protein−protein interactions, contributes to the prediction of protein functions and facilitates protein−protein docking calculations. While many sequence-based predictors of PBRs were published, they offer modest levels of predictive performance and most of them cross-predict residues that interact with other partners. One unexplored option to improve the predictive quality is to design consensus predictors that combine results produced by multiple methods.
Results
We empirically investigate predictive performance of a representative set of nine predictors of PBRs. We report substantial differences in predictive quality when these methods are used to predict individual proteins, which contrast with the dataset-level benchmarks that are currently used to assess and compare these methods. Our analysis provides new insights for the cross-prediction concern, dissects complementarity between predictors and demonstrates that predictive performance of the top methods depends on unique characteristics of the input protein sequence. Using these insights, we developed PROBselect, a first-of-its-kind consensus predictor of PBRs. Our design is based on the dynamic predictor selection at the protein level, where the selection relies on regression-based models that accurately estimate predictive performance of selected predictors directly from the sequence. Empirical assessment using a low-similarity test dataset shows that PROBselect provides significantly improved predictive quality when compared with the current predictors and conventional consensuses that combine residue-level predictions. Moreover, PROBselect informs the users about the expected predictive quality for the prediction generated from a given input protein.
Availability and implementation
PROBselect is available at http://bioinformatics.csu.edu.cn/PROBselect/home/index.
Supplementary information
Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Fuhao Zhang
- Hunan Provincial Key Laboratory on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
| | - Wenbo Shi
- Hunan Provincial Key Laboratory on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
| | - Jian Zhang
- School of Computer and Information Technology, Xinyang Normal University, Xinyang 464000, China
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
| | - Min Zeng
- Hunan Provincial Key Laboratory on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
| | - Min Li
- Hunan Provincial Key Laboratory on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
| | - Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
| |
|
22
|
Katuwawala A, Kurgan L. Comparative Assessment of Intrinsic Disorder Predictions with a Focus on Protein and Nucleic Acid-Binding Proteins. Biomolecules 2020; 10:E1636. [PMID: 33291838 PMCID: PMC7762010 DOI: 10.3390/biom10121636] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 10/17/2020] [Revised: 11/26/2020] [Accepted: 12/03/2020] [Indexed: 01/18/2023]
Abstract
With over 60 disorder predictors, users need help navigating the predictor selection task. We review 28 surveys of disorder predictors, showing that only 11 include assessment of predictive performance. We identify and address a few drawbacks of these past surveys. To this end, we release a novel benchmark dataset with reduced similarity to the training sets of the considered predictors. We use this dataset to perform a first-of-its-kind comparative analysis that targets two large functional families of disordered proteins that interact with proteins and with nucleic acids. We show that limiting sequence similarity between the benchmark and the training datasets has a substantial impact on predictive performance. We also demonstrate that predictive quality is sensitive to the use of the well-annotated order and inclusion of the fully structured proteins in the benchmark datasets, both of which should be considered in future assessments. We identify three predictors that provide favorable results using the new benchmark set. While we find that VSL2B offers the most accurate and robust results overall, ESpritz-DisProt and SPOT-Disorder perform particularly well for disordered proteins. Moreover, we find that predictions for the disordered protein-binding proteins suffer low predictive quality compared to generic disordered proteins and the disordered nucleic acids-binding proteins. This can be explained by the high disorder content of the disordered protein-binding proteins, which makes it difficult for the current methods to accurately identify ordered regions in these proteins. This finding motivates the development of a new generation of methods that would target these difficult-to-predict disordered proteins. We also discuss resources that support users in collecting and identifying high-quality disorder predictions.
Affiliation(s)
| | - Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA;
| |
|
23
|
Yan J, Cheng J, Kurgan L, Uversky VN. Structural and functional analysis of "non-smelly" proteins. Cell Mol Life Sci 2020; 77:2423-2440. [PMID: 31486849 PMCID: PMC11105052 DOI: 10.1007/s00018-019-03292-1] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 03/29/2019] [Revised: 08/21/2019] [Accepted: 08/28/2019] [Indexed: 01/09/2023]
Abstract
Cysteine and aromatic residues are major structure-promoting residues. We assessed the abundance, structural coverage, and functional characteristics of the "non-smelly" proteins, i.e., proteins that do not contain cysteine residues (C-depleted) or cysteine and aromatic residues (CFYWH-depleted), across 817 proteomes from all domains of life. The analysis revealed that although these proteomes contained significant levels of the C-depleted proteins, with prokaryotes being significantly more enriched in such proteins than eukaryotes, the CFYWH-depleted proteins were relatively rare, accounting for about 0.05% of proteomes. Furthermore, CFYWH-depleted proteins were virtually never found in PDB. Depletion in cysteine and in aromatic residues was associated with the substantially increased intrinsic disorder levels across all domains of life. Archaeal and eukaryotic organisms with higher levels of the C-depleted proteins were shown to have higher levels of the intrinsic disorder and lower levels of structural coverage. We also showed that the "non-smelly" proteins typically did not independently fold into monomeric structures, and instead, they fold by interacting with nucleic acids as constituents of the ribosome and nucleosome complexes. They were shown to be involved in translation, transcription, nucleosome assembly, transmembrane transport, and protein folding functions, all of which are known to be associated with the intrinsic disorder. Our data suggested that, in general, structure of monomeric proteins is crucially dependent on the presence of cysteine and aromatic residues.
Affiliation(s)
- Jing Yan
- Department of Electrical and Computer Engineering, University of Alberta, Edmonton, Canada
| | - Jianlin Cheng
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, USA
| | - Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, 401 West Main Street, Room E4225, Richmond, VA, 23284, USA.
| | - Vladimir N Uversky
- Department of Molecular Medicine and USF Health Byrd Alzheimer's Research Institute, Morsani College of Medicine, University of South Florida, 12901 Bruce B. Downs Blvd., MDC07, Tampa, FL, 33612, USA.
- Protein Research Group, Institute for Biological Instrumentation of the Russian Academy of Sciences, 142290, Pushchino, Moscow Region, Russia.
| |
|
24
|
Martins IC, Santos NC. Intrinsically disordered protein domains in flavivirus infection. Arch Biochem Biophys 2020; 683:108298. [PMID: 32045581 DOI: 10.1016/j.abb.2020.108298] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 10/21/2019] [Revised: 02/03/2020] [Accepted: 02/05/2020] [Indexed: 12/30/2022]
Abstract
Intrinsically disordered protein regions are at the core of biological processes and involved in key protein-ligand interactions. The Flavivirus proteins, of viruses of great biomedical importance such as Zika and dengue viruses, exemplify this. Several proteins of these viruses have disordered regions that are of the utmost importance for biological activity. Disordered proteins can adopt several conformations, each able to interact with and/or bind to different ligands. In fact, such interactions can help stabilize a particular fold. Moreover, by being promiscuous in the number of target molecules they can bind to, these protein regions increase the number of functions that their small proteome (10 proteins) can achieve. A folding energy waterfall better describes the protein folding landscape of these proteins. A disordered protein can be thought of as rolling down the folding energy cascade, in order "to fall, fold and function". This is the case of many viral protein regions, as seen in the flavivirus proteome. Given their small size, flaviviruses are a good model system for understanding the role of intrinsically disordered protein regions in viral function. Finally, studying these viruses' disordered protein regions will certainly contribute to the development of therapeutic approaches against such promising (yet challenging) targets.
Affiliation(s)
- Ivo C Martins
- Instituto de Medicina Molecular, Faculdade de Medicina, Universidade de Lisboa, Lisbon, Portugal.
| | - Nuno C Santos
- Instituto de Medicina Molecular, Faculdade de Medicina, Universidade de Lisboa, Lisbon, Portugal.
| |
|
25
|
Bhopatkar AA, Uversky VN, Rangachari V. Granulins modulate liquid-liquid phase separation and aggregation of the prion-like C-terminal domain of the neurodegeneration-associated protein TDP-43. J Biol Chem 2020; 295:2506-2519. [PMID: 31911437 DOI: 10.1074/jbc.ra119.011501] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 10/15/2019] [Revised: 01/02/2020] [Indexed: 12/13/2022]
Abstract
TAR DNA-binding protein 43 (TDP-43) has emerged as a key player in many neurodegenerative pathologies, including frontotemporal lobar degeneration (FTLD) and amyotrophic lateral sclerosis (ALS). Hallmarks of both FTLD and ALS are the toxic cytoplasmic inclusions of the prion-like C-terminal fragments of TDP-43 CTD (TDP-43 C-terminal domain), formed upon proteolytic cleavage of full-length TDP-43 in the nucleus and subsequent transport to the cytoplasm. Both full-length TDP-43 and its CTD are also known to form stress granules by coacervating with RNA in the cytoplasm during stress and may be involved in these pathologies. Furthermore, mutations in the PGRN gene, leading to haploinsufficiency and diminished function of progranulin (PGRN) protein, are strongly linked to FTLD and ALS. Recent reports have indicated that proteolytic processing of PGRN to smaller protein modules called granulins (GRNs) contributes to FTLD and ALS progression, with specific GRNs exacerbating TDP-43-induced cytotoxicity. Here we investigated the interactions between the proteolytic products of both TDP-43 and PGRN. Based on structural disorder and charge distributions, we hypothesized that GRN-3 and GRN-5 could interact with the TDP-43 CTD. We show that, under both reducing and oxidizing conditions, GRN-3 and GRN-5 interact with and differentially modulate TDP-43 CTD aggregation and/or liquid-liquid phase separation in vitro. GRN-3 promoted insoluble aggregates of the TDP-43 CTD while GRN-5 mediated liquid-liquid phase separation. These results constitute the first observation of an interaction between GRNs and TDP-43, suggesting a mechanism by which attenuated PGRN function could lead to familial FTLD or ALS.
Affiliation(s)
- Anukool A Bhopatkar
- Department of Chemistry and Biochemistry, School of Mathematics and Natural Sciences, University of Southern Mississippi, Hattiesburg, Mississippi 39406
| | - Vladimir N Uversky
- Department of Molecular Medicine and Byrd Alzheimer's Research Institute, Morsani College of Medicine, University of South Florida, Tampa, Florida 33620; Laboratory of New Methods in Biology, Institute for Biological Instrumentation, Russian Academy of Sciences, 142290 Pushchino, Moscow Region, Russia
| | - Vijayaraghavan Rangachari
- Department of Chemistry and Biochemistry, School of Mathematics and Natural Sciences, University of Southern Mississippi, Hattiesburg, Mississippi 39406.
| |
|
26
|
Oldfield CJ, Fan X, Wang C, Dunker AK, Kurgan L. Computational Prediction of Intrinsic Disorder in Protein Sequences with the disCoP Meta-predictor. Methods Mol Biol 2020; 2141:21-35. [PMID: 32696351 DOI: 10.1007/978-1-0716-0524-0_2] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 12/12/2022]
Abstract
Intrinsically disordered proteins are either entirely disordered or contain disordered regions in their native state. These proteins and regions function without the prerequisite of a stable structure and were found to be abundant across all kingdoms of life. Experimental annotation of disorder lags behind the rapidly growing number of sequenced proteins, motivating the development of computational methods that predict disorder in protein sequences. DisCoP is a user-friendly webserver that provides accurate sequence-based prediction of protein disorder. It relies on a meta-architecture in which the outputs generated by multiple disorder predictors are combined to improve predictive performance. The architecture of disCoP is presented, and its accuracy relative to several other disorder predictors is briefly discussed. We describe usage of the web interface and explain how to access and read results generated by this computational tool. We also provide an example of prediction results and interpretation. The disCoP webserver is publicly available at http://biomine.cs.vcu.edu/servers/disCoP/ .
Affiliation(s)
- Xiao Fan
- Department of Pediatrics, Columbia University, New York, NY, USA
- Chen Wang
- Department of Medicine, Columbia University, New York, NY, USA
- A Keith Dunker
- Department of Biochemistry and Molecular Biology, Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN, USA
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA.
|
27
|
Abstract
Intrinsically disordered regions (IDRs) are estimated to be highly abundant in nature. While only several thousand proteins are annotated with experimentally derived IDRs, computational methods can be used to predict IDRs for the millions of currently uncharacterized protein chains. Several dozen disorder predictors were developed over the last few decades. While some of these methods provide accurate predictions, unavoidably they also make some mistakes. Consequently, one of the challenges facing users of these methods is how to decide which predictions can be trusted and which are likely incorrect. This practical problem can be solved using quality assessment (QA) scores that predict correctness of the underlying (disorder) predictions at a residue level. We motivate and describe a first-of-its-kind toolbox of QA methods, QUARTER (QUality Assessment for pRotein inTrinsic disordEr pRedictions), which provides the scores for a diverse set of ten disorder predictors. QUARTER is available to the end users as a free and convenient webserver at http://biomine.cs.vcu.edu/servers/QUARTER/ . We briefly describe the predictive architecture of QUARTER and provide detailed instructions on how to use the webserver. We also explain how to interpret results produced by QUARTER with the help of a case study.
|
28
|
Katuwawala A, Oldfield CJ, Kurgan L. DISOselect: Disorder predictor selection at the protein level. Protein Sci 2020; 29:184-200. [PMID: 31642118 PMCID: PMC6933862 DOI: 10.1002/pro.3756] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 07/31/2019] [Revised: 10/16/2019] [Accepted: 10/17/2019] [Indexed: 12/27/2022]
Abstract
The intense interest in the intrinsically disordered proteins in the life science community, together with the remarkable advancements in predictive technologies, have given rise to the development of a large number of computational predictors of intrinsic disorder from protein sequence. While the growing number of predictors is a positive trend, we have observed a considerable difference in predictive quality among predictors for individual proteins. Furthermore, variable predictor performance is often inconsistent between predictors for different proteins, and the predictor that shows the best predictive performance depends on the unique properties of each protein sequence. We propose a computational approach, DISOselect, to estimate the predictive performance of 12 selected predictors for individual proteins based on their unique sequence-derived properties. This estimation informs the users about the expected predictive quality for a selected disorder predictor and can be used to recommend methods that are likely to provide the best quality predictions. Our solution does not depend on the results of any disorder predictor; the estimations are made based solely on the protein sequence. Our solution significantly improves predictive performance, as judged with a test set of 1,000 proteins, when compared to other alternatives. We have empirically shown that by using the recommended methods the overall predictive performance for a given set of proteins can be improved by a statistically significant margin. DISOselect is freely available for non-commercial users through the webserver at http://biomine.cs.vcu.edu/servers/DISOselect/.
Affiliation(s)
- Akila Katuwawala
- Department of Computer Science, Virginia Commonwealth University, Richmond, Virginia
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, Virginia
|
29
|
Vincent M, Schnell S. Disorder Atlas: Web-based software for the proteome-based interpretation of intrinsic disorder predictions. Comput Biol Chem 2019; 83:107090. [PMID: 31326853 PMCID: PMC7368971 DOI: 10.1016/j.compbiolchem.2019.107090] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 02/01/2018] [Revised: 12/05/2018] [Accepted: 07/10/2019] [Indexed: 11/18/2022]
Abstract
Intrinsically disordered proteins lack a stable three-dimensional structure under physiological conditions. While this property has gained considerable interest within the past two decades, disorder poses substantial challenges to experimental characterization efforts. In effect, numerous computational tools have been developed to predict disorder from primary sequences; however, interpreting the output of these algorithms remains a challenge. To begin to bridge this gap, we present Disorder Atlas, web-based software that facilitates the interpretation of intrinsic disorder predictions using proteome-based descriptive statistics. This service is also equipped to facilitate large-scale systematic exploratory searches for proteins encompassing disorder features of interest, and further allows users to browse the prevalence of multiple disorder features at the proteome level. As a result, Disorder Atlas provides a user-friendly tool that places algorithm-generated disorder predictions in the context of the proteome, thereby providing an instrument to compare the results of a query protein against predictions made for an entire population. Disorder Atlas currently supports ten eukaryotic proteomes and is freely available for non-commercial users at http://www.disorderatlas.org.
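As a rough illustration of placing one protein's predicted disorder in the context of a proteome, the sketch below computes a disorder content and its percentile rank within a reference population. This is not Disorder Atlas code: the 0.5 score threshold, the function names, and the toy values are all assumptions made for illustration.

```python
from bisect import bisect_right


def disorder_content(scores, threshold=0.5):
    """Fraction of residues whose predicted disorder score meets the threshold.

    The 0.5 cutoff is an assumed convention, not a value taken from the abstract.
    """
    return sum(s >= threshold for s in scores) / len(scores)


def percentile_rank(value, population):
    """Percentage of population values that are less than or equal to `value`."""
    ordered = sorted(population)
    return 100.0 * bisect_right(ordered, value) / len(ordered)


# Hypothetical per-residue disorder scores for a query protein.
query_scores = [0.9, 0.7, 0.2, 0.1]

# Hypothetical disorder contents of the other proteins in the proteome.
proteome_contents = [0.0, 0.1, 0.2, 0.4, 0.8]

query_content = disorder_content(query_scores)
query_percentile = percentile_rank(query_content, proteome_contents)
print(query_content, query_percentile)
```

With these toy numbers the query protein has a disorder content of 0.5, which is at or above four of the five reference proteins, so it sits at the 80th percentile of this made-up proteome.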
Affiliation(s)
- Michael Vincent
- Department of Molecular & Integrative Physiology, University of Michigan Medical School, Ann Arbor, MI, USA
- Santiago Schnell
- Department of Molecular & Integrative Physiology, University of Michigan Medical School, Ann Arbor, MI, USA; Department of Computational Medicine & Bioinformatics, University of Michigan Medical School, MI, USA.
|
30
|
Ghadermarzi S, Li X, Li M, Kurgan L. Sequence-Derived Markers of Drug Targets and Potentially Druggable Human Proteins. Front Genet 2019; 10:1075. [PMID: 31803227 PMCID: PMC6872670 DOI: 10.3389/fgene.2019.01075] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 05/30/2019] [Accepted: 10/09/2019] [Indexed: 12/16/2022] Open
Abstract
Recent research shows that the majority of the druggable human proteome is yet to be annotated and explored. Accurate identification of these unexplored druggable proteins would facilitate development, screening, repurposing, and repositioning of drugs, as well as prediction of new drug–protein interactions. We contrast the current drug targets against the datasets of non-druggable and possibly druggable proteins to formulate markers that could be used to identify druggable proteins. We focus on the markers that can be extracted from protein sequences or names/identifiers to ensure that they can be applied across the entire human proteome. These markers quantify key features covered in past works (topological features of PPIs, cellular functions, and subcellular locations) and several novel factors (intrinsic disorder, residue-level conservation, alternative splicing isoforms, domains, and sequence-derived solvent accessibility). We find that the possibly druggable proteins have a significantly higher abundance of alternative splicing isoforms, a relatively large number of domains, a higher degree of centrality in the protein-protein interaction networks, and lower numbers of conserved and surface residues, when compared with the non-druggable proteins. We show that the current drug targets and possibly druggable proteins share involvement in the catalytic and signaling functions. However, unlike the drug targets, the possibly druggable proteins participate in the metabolic and biosynthesis processes, are enriched in intrinsic disorder, interact with proteins and nucleic acids, and are localized across the cell. To sum up, we formulate several markers that can help with finding novel druggable human proteins and provide interesting insights into the cellular functions and subcellular locations of the current drug targets and potentially druggable proteins.
Affiliation(s)
- Sina Ghadermarzi
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, United States
- Xingyi Li
- School of Computer Science and Engineering, Central South University, Changsha, China
- Min Li
- School of Computer Science and Engineering, Central South University, Changsha, China
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, United States
|
31
|
Katuwawala A, Oldfield CJ, Kurgan L. Accuracy of protein-level disorder predictions. Brief Bioinform 2019; 21:1509-1522. [DOI: 10.1093/bib/bbz100] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Received: 05/06/2019] [Revised: 06/22/2019] [Accepted: 07/15/2019] [Indexed: 01/15/2023] Open
Abstract
Experimental annotations of intrinsic disorder are available for 0.1% of the 147 000 000 currently sequenced proteins. Over 60 sequence-based disorder predictors were developed to help bridge this gap. Current benchmarks of these methods assess predictive performance on datasets of proteins; however, predictions are often interpreted for individual proteins. We demonstrate that the protein-level predictive performance varies substantially from the dataset-level benchmarks. Thus, we perform a first-of-its-kind protein-level assessment for 13 popular disorder predictors using 6200 disorder-annotated proteins. We show that the protein-level distributions are substantially skewed toward high predictive quality while having long tails of poor predictions. Consequently, between 57% and 75% of proteins secure higher predictive performance than the currently used dataset-level assessment suggests, but as many as 30% of proteins that are located in the long tails suffer low predictive performance. These proteins typically have relatively high amounts of disorder, in contrast to the mostly structured proteins that are predicted accurately by all 13 methods. Interestingly, each predictor provides the most accurate results for some number of proteins, while the method that performs best at the dataset level is in fact the best for only about 30% of proteins. Moreover, the majority of proteins are predicted more accurately than the dataset-level performance of the most accurate tool by at least four disorder predictors. While these results suggest that disorder predictors outperform their current benchmark performance for the majority of proteins and that they complement each other, novel tools that accurately identify the hard-to-predict proteins and that make accurate predictions for these proteins are needed.
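The difference between dataset-level and protein-level assessment can be sketched with a toy calculation. The example below is only an illustration under stated assumptions: it uses plain residue-level accuracy as a stand-in metric (the abstract does not say which metric the study used), and the three proteins, their labels, and all function names are hypothetical.

```python
def accuracy(pred, truth):
    """Fraction of residues whose predicted label matches the annotation."""
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


def dataset_level_accuracy(proteins):
    """Pool the residues of all proteins and score them as one long chain."""
    pooled_pred = [p for pred, _ in proteins for p in pred]
    pooled_true = [t for _, truth in proteins for t in truth]
    return accuracy(pooled_pred, pooled_true)


def protein_level_accuracies(proteins):
    """Score every protein separately."""
    return [accuracy(pred, truth) for pred, truth in proteins]


# Hypothetical (prediction, annotation) pairs; 1 = disordered, 0 = structured.
toy_proteins = [
    ([0] * 8, [0] * 8),            # structured protein, predicted perfectly
    ([0] * 8, [0] * 8),            # structured protein, predicted perfectly
    ([0, 0, 1, 1], [1, 1, 1, 1]),  # fully disordered protein, half missed
]

pooled = dataset_level_accuracy(toy_proteins)
per_protein = protein_level_accuracies(toy_proteins)
share_above = sum(a > pooled for a in per_protein) / len(per_protein)
print(pooled, per_protein, share_above)
```

In this toy set the pooled accuracy is 18/20 = 0.9, yet two of the three proteins score 1.0 and the disorder-rich one scores 0.5: most proteins beat the pooled figure while one falls into a tail of poor predictions, which mirrors the qualitative pattern the abstract describes.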
Affiliation(s)
- Akila Katuwawala
- Department of Computer Science, Virginia Commonwealth University, USA
- Christopher J Oldfield
- Department of Computer Science, Virginia Commonwealth University, USA
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, USA
|
32
|
Liu Y, Wang X, Liu B. A comprehensive review and comparison of existing computational methods for intrinsically disordered protein and region prediction. Brief Bioinform 2019; 20:330-346. [PMID: 30657889 DOI: 10.1093/bib/bbx126] [Citation(s) in RCA: 95] [Impact Index Per Article: 15.8] [Received: 07/19/2017] [Indexed: 01/06/2023] Open
Abstract
Intrinsically disordered proteins and regions are widely distributed in proteins, which are associated with many biological processes and diseases. Accurate prediction of intrinsically disordered proteins and regions is critical for both basic research (such as protein structure and function prediction) and practical applications (such as drug development). During the past decades, many computational approaches have been proposed, which have greatly facilitated the development of this important field. Therefore, a comprehensive and updated review is highly required. In this regard, we give a review on the computational methods for intrinsically disordered protein and region prediction, especially focusing on the recent development in this field. These computational approaches are divided into four categories based on their methodologies, including physicochemical-based method, machine-learning-based method, template-based method and meta method. Furthermore, their advantages and disadvantages are also discussed. The performance of 40 state-of-the-art predictors is directly compared on the target proteins in the task of disordered region prediction in the 10th Critical Assessment of protein Structure Prediction. A more comprehensive performance comparison of 45 different predictors is conducted based on seven widely used benchmark data sets. Finally, some open problems and perspectives are discussed.
Affiliation(s)
- Yumeng Liu
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, China
- Xiaolong Wang
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, China
- Bin Liu
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, China
|
33
|
Permyakov SE, Vologzhannikova AS, Nemashkalova EL, Kazakov AS, Denesyuk AI, Denessiouk K, Baksheeva VE, Zamyatnin AA, Zernii EY, Uversky VN, Permyakov EA. Experimental Insight into the Structural and Functional Roles of the 'Black' and 'Gray' Clusters in Recoverin, a Calcium Binding Protein with Four EF-Hand Motifs. Molecules 2019; 24:E2494. [PMID: 31288444 PMCID: PMC6650976 DOI: 10.3390/molecules24132494] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/17/2019] [Revised: 07/03/2019] [Accepted: 07/07/2019] [Indexed: 11/17/2022] Open
Abstract
Recently, we have found that calcium binding proteins of the EF-hand superfamily (i.e., a large family of proteins containing the helix-loop-helix calcium binding motif, or EF-hand) contain two types of conserved clusters called cluster I ('black' cluster) and cluster II ('gray' cluster), which provide a supporting scaffold for the Ca2+ binding loops and contribute to the hydrophobic core of the EF-hand domains. Cluster I is more conserved and mostly incorporates aromatic amino acids, whereas cluster II includes a mix of aromatic, hydrophobic, and polar amino acids of different sizes. Recoverin is an EF-hand Ca2+-binding protein containing two 'black' clusters comprised of F35, F83, Y86 (N-terminal domain) and F106, E169, F172 (C-terminal domain) as well as two 'gray' clusters comprised of F70, Q46, F49 (N-terminal domain) and W156, K119, V122 (C-terminal domain). To understand the role of these residues in the structure and function of human recoverin, we sequentially substituted them for alanine and studied the resulting mutants by a set of biophysical methods. Under metal-free conditions, the 'black' cluster mutants (except for F35A and E169A) were characterized by an increase in the α-helical content, whereas the 'gray' cluster mutants (except for K119A) exhibited the opposite behavior. By contrast, in Ca2+-loaded mutants the α-helical content was always elevated. In the absence of calcium, the substitutions only slightly affected multimerization of recoverin regardless of their localization (except for K119A). Meanwhile, in the presence of calcium, mutations in the N-terminal domain of the protein significantly suppressed this process, indicating that surface properties of Ca2+-bound recoverin are highly affected by N-terminal cluster residues. The substitutions in C-terminal clusters generally reduced thermal stability of recoverin, with F172A ('black' cluster) as well as W156A and K119A ('gray' cluster) being the most efficacious in this respect. In contrast, the mutations in the N-terminal clusters caused less pronounced, differently directed changes in thermal stability of the protein. The substitutions of F172, W156, and K119 in the C-terminal domain of recoverin together with substitution of Q46 in its N-terminal domain provoked significant but diverse changes in free energy associated with Ca2+ binding to the protein: the mutant K119A demonstrated significantly improved calcium binding, whereas F172A and W156A showed a decrease in calcium affinity and Q46A exhibited no ion coordination in one of the Ca2+-binding sites. Most of the N-terminal cluster mutations suppressed membrane binding of recoverin and its inhibitory activity towards rhodopsin kinase (GRK1). Surprisingly, the mutant W156A aberrantly activated rhodopsin phosphorylation regardless of the presence of calcium. Taken together, these data confirm the scaffolding function of several cluster-forming residues and point to their critical role in supporting physiological activity of recoverin.
Affiliation(s)
- Sergey E Permyakov
- Institute for Biological Instrumentation of the Russian Academy of Sciences, Federal Research Center Pushchino Scientific Center for Biological Research of the Russian Academy of Sciences, 142290 Pushchino, Russia
- Alisa S Vologzhannikova
- Institute for Biological Instrumentation of the Russian Academy of Sciences, Federal Research Center Pushchino Scientific Center for Biological Research of the Russian Academy of Sciences, 142290 Pushchino, Russia
- Ekaterina L Nemashkalova
- Institute for Biological Instrumentation of the Russian Academy of Sciences, Federal Research Center Pushchino Scientific Center for Biological Research of the Russian Academy of Sciences, 142290 Pushchino, Russia
- Alexei S Kazakov
- Institute for Biological Instrumentation of the Russian Academy of Sciences, Federal Research Center Pushchino Scientific Center for Biological Research of the Russian Academy of Sciences, 142290 Pushchino, Russia
- Department of Biomedical Engineering, Pushchino State Institute of Natural Sciences, 142290 Pushchino, Russia
- Alexander I Denesyuk
- Institute for Biological Instrumentation of the Russian Academy of Sciences, Federal Research Center Pushchino Scientific Center for Biological Research of the Russian Academy of Sciences, 142290 Pushchino, Russia
- Structural Bioinformatics Laboratory, Biochemistry, Faculty of Science and Engineering, Åbo Akademi University, 20520 Turku, Finland
- Konstantin Denessiouk
- Structural Bioinformatics Laboratory, Biochemistry, Faculty of Science and Engineering, Åbo Akademi University, 20520 Turku, Finland
- Pharmaceutical Sciences Laboratory, Pharmacy, Faculty of Science and Engineering, Åbo Akademi University, 20520 Turku, Finland
- Viktoriia E Baksheeva
- Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, 119992 Moscow, Russia
- Andrey A Zamyatnin
- Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, 119992 Moscow, Russia
- Institute of Molecular Medicine, Sechenov First Moscow State Medical University, 119991 Moscow, Russia
- Evgeni Yu Zernii
- Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, 119992 Moscow, Russia
- Institute of Molecular Medicine, Sechenov First Moscow State Medical University, 119991 Moscow, Russia
- Vladimir N Uversky
- Institute for Biological Instrumentation of the Russian Academy of Sciences, Federal Research Center Pushchino Scientific Center for Biological Research of the Russian Academy of Sciences, 142290 Pushchino, Russia.
- Department of Molecular Medicine and USF Health Byrd Alzheimer's Research Institute, Morsani College of Medicine, University of South Florida, Tampa, FL 33612, USA.
- Eugene A Permyakov
- Institute for Biological Instrumentation of the Russian Academy of Sciences, Federal Research Center Pushchino Scientific Center for Biological Research of the Russian Academy of Sciences, 142290 Pushchino, Russia.
- Department of Biomedical Engineering, Pushchino State Institute of Natural Sciences, 142290 Pushchino, Russia.
|
34
|
Permyakova ME, Permyakov SE, Kazakov AS, Denesyuk AI, Denessiouk K, Uversky VN, Permyakov EA. Analyzing the structural and functional roles of residues from the 'black' and 'gray' clusters of human S100P protein. Cell Calcium 2019; 80:46-55. [PMID: 30953998 DOI: 10.1016/j.ceca.2019.03.008] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 02/21/2019] [Revised: 03/26/2019] [Accepted: 03/27/2019] [Indexed: 10/27/2022]
Abstract
Two highly conserved structural motifs are observed in members of the EF-hand family of calcium binding proteins. The motifs provide a supporting scaffold for the Ca2+ binding loops and contribute to the hydrophobic core of the EF-hand domain. Each structural motif represents a cluster of three amino acids called cluster I ('black' cluster) and cluster II ('gray' cluster). Cluster I is more conserved and mostly incorporates aromatic amino acids. In contrast, cluster II is noticeably less conserved and includes a mix of aromatic, hydrophobic, and polar amino acids of different sizes. In the human calcium binding S100P protein, these 'black' and 'gray' clusters include residues F15, F71, and F74 and L33, L58, and K30, respectively. To evaluate the effects of these clusters on structure and functionality of human S100P, we have performed Ala scanning. The resulting mutants were studied by a multiparametric approach that included circular dichroism, scanning calorimetry, dynamic light scattering, chemical crosslinking, and fluorescent probes. Spectrofluorimetric Ca2+-titration of wild type S100P showed that the S100P dimer has 1-2 strong calcium binding sites (K1 = 4 × 10⁶ M⁻¹) and two cooperative low affinity (K2 = 4 × 10⁴ M⁻¹) binding sites. Similarly, the S100P mutants possess two types of calcium binding sites. This analysis revealed that the alanine substitutions in clusters I and II caused comparable changes in the S100P functional properties. However, analysis of heat- or GuHCl-induced unfolding of these proteins showed that the alanine substitutions in cluster I caused a notably more pronounced decrease in protein stability than the alanine substitutions in cluster II. Contrary to literature data, the F15A substitution did not cause S100P dimer dissociation, indicating that F15 is not crucial for dimer stability. Overall, similar to parvalbumins, the S100P cluster I is more important for protein conformational stability than cluster II.
Affiliation(s)
- Maria E Permyakova
- Institute for Biological Instrumentation of the Russian Academy of Sciences, Pushchino, Moscow region, 142290, Russia.
- Sergei E Permyakov
- Institute for Biological Instrumentation of the Russian Academy of Sciences, Pushchino, Moscow region, 142290, Russia; Department of Biomedical Engineering, Pushchino State Institute of Natural Sciences, Pushchino, Moscow region, 142290, Russia.
- Alexei S Kazakov
- Institute for Biological Instrumentation of the Russian Academy of Sciences, Pushchino, Moscow region, 142290, Russia; Department of Biomedical Engineering, Pushchino State Institute of Natural Sciences, Pushchino, Moscow region, 142290, Russia
- Alexander I Denesyuk
- Institute for Biological Instrumentation of the Russian Academy of Sciences, Pushchino, Moscow region, 142290, Russia; Structural Bioinformatics Laboratory, Biochemistry, Faculty of Science and Engineering, Åbo Akademi University, Turku, 20520, Finland
- Konstantin Denessiouk
- Structural Bioinformatics Laboratory, Biochemistry, Faculty of Science and Engineering, Åbo Akademi University, Turku, 20520, Finland; Pharmaceutical Sciences Laboratory, Pharmacy, Faculty of Science and Engineering, Åbo Akademi University, Turku, 20520, Finland
- Vladimir N Uversky
- Institute for Biological Instrumentation of the Russian Academy of Sciences, Pushchino, Moscow region, 142290, Russia; Department of Molecular Medicine, USF Health Byrd Alzheimer's Research Institute, Morsani College of Medicine, University of South Florida, Tampa, FL, 33612, USA.
- Eugene A Permyakov
- Institute for Biological Instrumentation of the Russian Academy of Sciences, Pushchino, Moscow region, 142290, Russia; Department of Biomedical Engineering, Pushchino State Institute of Natural Sciences, Pushchino, Moscow region, 142290, Russia
|
35
|
Perdigão N, Rosa A. Dark Proteome Database: Studies on Dark Proteins. High Throughput 2019; 8:ht8020008. [PMID: 30934744 PMCID: PMC6630768 DOI: 10.3390/ht8020008] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 12/01/2018] [Revised: 03/12/2019] [Accepted: 03/15/2019] [Indexed: 12/27/2022] Open
Abstract
The dark proteome, as we define it, is the part of the proteome where 3D structure has not been observed either by homology modeling or by experimental characterization in the protein universe. Of the 550,116 proteins available in Swiss-Prot (as of July 2016), 43.2% of the eukarya universe and 49.2% of the virus universe are part of the dark proteome. In bacteria and archaea, the percentage of the dark proteome is significantly lower, at 12.6% and 13.3% respectively. In this work, we present a necessary step to complete the dark proteome picture by introducing the map of the dark proteome in the human and in other model organisms of special importance to mankind. The most significant result is that around 40% to 50% of the proteome of these organisms is still in the dark, where the higher percentages belong to higher eukaryotes (mouse and human organisms). Because the amount of darkness present in the human organism is more than 50%, deeper studies were made, including the identification of ‘dark’ genes that are responsible for the production of so-called dark proteins, as well as the identification of the ‘dark’ tissues where dark proteins are overrepresented, namely, the heart, cervical mucosa, and natural killer cells. This is a step forward in the direction of gaining a deeper knowledge of the human dark proteome.
Affiliation(s)
- Nelson Perdigão
- Instituto Superior Técnico, Universidade de Lisboa, 1049-001 Lisbon, Portugal.
- Instituto de Sistemas e Robótica, 1049-001 Lisbon, Portugal.
- Agostinho Rosa
- Instituto Superior Técnico, Universidade de Lisboa, 1049-001 Lisbon, Portugal.
- Instituto de Sistemas e Robótica, 1049-001 Lisbon, Portugal.
|
36
|
Vincent M, Uversky VN, Schnell S. On the Need to Develop Guidelines for Characterizing and Reporting Intrinsic Disorder in Proteins. Proteomics 2019; 19:e1800415. [PMID: 30793871 PMCID: PMC6571172 DOI: 10.1002/pmic.201800415] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 11/27/2018] [Revised: 02/05/2019] [Indexed: 01/02/2023]
Abstract
Since the early 2000s, numerous computational tools have been created and used to predict intrinsic disorder in proteins. At present, the output from these algorithms is difficult to interpret in the absence of standards or references for comparison. There are many reasons to establish a set of standard-based guidelines to evaluate computational protein disorder predictions. This viewpoint explores a handful of these reasons, including standardizing nomenclature to improve communication, rigor and reproducibility, and making it easier for newcomers to enter the field. An approach for reporting predicted disorder in single proteins with respect to whole proteomes is discussed. The suggestions are not intended to be formulaic; they should be viewed as a starting point to establish guidelines for interpreting and reporting computational protein disorder predictions.
Affiliation(s)
- Michael Vincent
- Interdisciplinary Biological Sciences, Northwestern University, Evanston, Illinois 60208, USA
- Vladimir N. Uversky
- Department of Molecular Medicine and USF Health Byrd Alzheimer’s Research Institute, Morsani College of Medicine, University of South Florida, Tampa, Florida 33612, USA
- Institute for Biological Instrumentation of the Russian Academy of Sciences, Pushchino 142290, Moscow region, Russia
- Santiago Schnell
- Department of Molecular & Integrative Physiology, University of Michigan Medical School, Ann Arbor, Michigan 48109, USA
- Department of Computational Medicine & Bioinformatics, University of Michigan Medical School, Michigan 48109, USA
|
37
|
Oldfield CJ, Uversky VN, Dunker AK, Kurgan L. Introduction to intrinsically disordered proteins and regions. Proteins 2019. [DOI: 10.1016/b978-0-12-816348-1.00001-6] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Indexed: 12/03/2022]
|
38
|
Gagliardi PA, Primo L. Irreversible Activation of Rho-activated Kinases Resulted from Evolution of Proteolytic Sites within Disordered Regions in Coiled-coil Domain. Mol Biol Evol 2018; 36:376-392. [DOI: 10.1093/molbev/msy229] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 01/30/2023] Open
Affiliation(s)
- Paolo Armando Gagliardi
- Department of Biology, Institute of Cell Biology, University of Bern, Baltzerstrasse 4, Bern, Switzerland
- Candiolo Cancer Institute-FPO IRCCS, Candiolo, Italy
| | - Luca Primo
- Candiolo Cancer Institute-FPO IRCCS, Candiolo, Italy
- Department of Oncology, University of Torino, Turin, Italy
| |
|
39
|
Hu G, Wang K, Song J, Uversky VN, Kurgan L. Taxonomic Landscape of the Dark Proteomes: Whole-Proteome Scale Interplay Between Structural Darkness, Intrinsic Disorder, and Crystallization Propensity. Proteomics 2018; 18:e1800243. [PMID: 30198635 DOI: 10.1002/pmic.201800243] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Received: 07/04/2018] [Revised: 08/30/2018] [Indexed: 12/14/2022]
Abstract
The growth rate of the protein sequence universe dramatically exceeds the speed of expansion of the protein structure universe, generating an immense dark proteome that includes proteins with unknown structure. A whole-proteome scale analysis of 5.4 million proteins from 987 proteomes in the three domains of life and viruses is performed to systematically dissect the interplay between structural coverage, degree of putative intrinsic disorder, and predicted propensity for structure determination. Archaean and Bacterial proteomes are found to have relatively high structural coverage and low amounts of disorder, whereas Eukaryotic and Viral proteomes are characterized by a broad spread of structural coverage and higher disorder levels. The analysis reveals that dark proteomes (i.e., proteomes containing high fractions of proteins with unknown structure) have significantly elevated amounts of intrinsic disorder and are predicted to be difficult to solve structurally. Although the majority of dark proteomes are of viral origin, many dark viral proteomes have at least modest crystallization propensity and only a handful of them are enriched in intrinsic disorder. The disorder, structural coverage, and propensity for structural determination are mapped onto a novel proteome-level sequence similarity network to analyze the interplay of these characteristics in the taxonomic landscape.
Affiliation(s)
- Gang Hu
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin, 300071, P. R. China
| | - Kui Wang
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin, 300071, P. R. China
| | - Jiangning Song
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia
| | - Vladimir N Uversky
- Department of Molecular Medicine and USF Health Byrd Alzheimer's Research Institute, Morsani College of Medicine, University of South Florida, Tampa, 33612, USA
- Institute for Biological Instrumentation, Russian Academy of Sciences, Pushchino, 142290, Russia
| | - Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, 23284, USA
| |
|
40
|
Meng F, Kurgan L. High-throughput prediction of disordered moonlighting regions in protein sequences. Proteins 2018; 86:1097-1110. [DOI: 10.1002/prot.25590] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Received: 06/11/2018] [Revised: 07/25/2018] [Accepted: 08/05/2018] [Indexed: 01/20/2023]
Affiliation(s)
- Fanchi Meng
- Department of Electrical and Computer Engineering, University of Alberta, Edmonton, Canada
| | - Lukasz Kurgan
- Department of Electrical and Computer Engineering, University of Alberta, Edmonton, Canada
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA
| |
|
41
|
Yruela I, Contreras-Moreira B, Dunker AK, Niklas KJ. Evolution of Protein Ductility in Duplicated Genes of Plants. Front Plant Sci 2018; 9:1216. [PMID: 30177944 PMCID: PMC6109787 DOI: 10.3389/fpls.2018.01216] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 05/04/2018] [Accepted: 07/30/2018] [Indexed: 06/08/2023]
Abstract
Previous work has shown that ductile/intrinsically disordered proteins (IDPs) and residues (IDRs) are found in all unicellular and multicellular organisms, wherein they are essential for basic cellular functions and complement the function of rigid proteins. In addition, computational studies of diverse phylogenetic lineages have revealed: (1) that protein ductility increases in concert with organismic complexity, and (2) that distributions of IDPs and IDRs along the chromosomes of plant species are non-random and correlate with variations in the rates of genetic recombination and chromosomal rearrangement. Here, we show that approximately 50% of aligned residues in paralogs across a spectrum of algae, bryophytes, monocots, and eudicots are IDRs and that a high proportion (ca. 60%) are in disordered segments greater than 30 residues. When three types of IDRs are distinguished (i.e., identical, similar, and variable IDRs), we find that species with large numbers of chromosomes and endoduplicated genes exhibit paralogous sequences with a higher frequency of identical IDRs, whereas species with small chromosome numbers exhibit paralogous sequences with a higher frequency of similar and variable IDRs. These results are interpreted to indicate that genome duplication events influence the distribution of IDRs along protein sequences and likely favor the presence of identical IDRs (compared to similar IDRs or variable IDRs). We discuss the evolutionary implications of gene duplication events in the context of ductile/disordered residues and segments, their conservation, and their effects on functionality.
Affiliation(s)
- Inmaculada Yruela
- Estación Experimental de Aula Dei, Consejo Superior de Investigaciones Científicas, Zaragoza, Spain
- Group of Biochemistry, Biophysics and Computational Biology, Joint Unit to CSIC, Institute for Biocomputation and Physics of Complex Systems (BIFI), University of Zaragoza, Zaragoza, Spain
| | - Bruno Contreras-Moreira
- Estación Experimental de Aula Dei, Consejo Superior de Investigaciones Científicas, Zaragoza, Spain
- Group of Biochemistry, Biophysics and Computational Biology, Joint Unit to CSIC, Institute for Biocomputation and Physics of Complex Systems (BIFI), University of Zaragoza, Zaragoza, Spain
- Fundación Agencia Aragonesa para la Investigación y el Desarrollo (ARAID), Zaragoza, Spain
| | - A. Keith Dunker
- Department of Biochemistry and Molecular Biology, Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN, United States
| | - Karl J. Niklas
- Plant Biology Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, United States
| |
|
42
|
Zhao Z, Peng Z, Yang J. Improving Sequence-Based Prediction of Protein–Peptide Binding Residues by Introducing Intrinsic Disorder and a Consensus Method. J Chem Inf Model 2018; 58:1459-1468. [DOI: 10.1021/acs.jcim.8b00019] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Indexed: 01/15/2023]
Affiliation(s)
- Zijuan Zhao
- Center for Applied Mathematics, Tianjin University, Tianjin 300072, China
| | - Zhenling Peng
- Center for Applied Mathematics, Tianjin University, Tianjin 300072, China
| | - Jianyi Yang
- School of Mathematical Sciences, Nankai University, Tianjin 300071, China
| |
|
43
|
Gao J, Yang Y, Zhou Y. Grid-based prediction of torsion angle probabilities of protein backbone and its application to discrimination of protein intrinsic disorder regions and selection of model structures. BMC Bioinformatics 2018; 19:29. [PMID: 29390958 PMCID: PMC5796405 DOI: 10.1186/s12859-018-2031-7] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 09/14/2017] [Accepted: 01/17/2018] [Indexed: 12/03/2022] Open
Abstract
BACKGROUND: Protein structure can be described by backbone torsion angles: rotational angles about the N-Cα bond (φ) and the Cα-C bond (ψ), or the angle between Cα(i-1)-Cα(i)-Cα(i+1) (θ) and the rotational angle about the Cα(i)-Cα(i+1) bond (τ). Thus, their accurate prediction is useful for structure prediction and model refinement. Early methods predicted torsion angles in a few discrete bins, whereas most recent methods have focused on prediction of angles in real, continuous values. Real value prediction, however, cannot provide information on the probabilities of predicted angles. RESULTS: Here, we propose to predict angles in fine grids of 5° by using deep learning neural networks. We found that this grid-based technique can yield 2-6% higher accuracy in predicting angles in the same 5° bin than the existing prediction techniques compared. We further demonstrate the usefulness of predicted probabilities at given angle bins in discrimination of intrinsically disordered regions and in selection of protein models. CONCLUSIONS: The proposed method may be useful for characterizing protein structure and disorder. The method is available at http://sparks-lab.org/server/SPIDER2/ as part of the SPIDER2 package.
Affiliation(s)
- Jianzhao Gao
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin, 300071 People’s Republic of China
| | - Yuedong Yang
- School of Data and Computer Science, Sun Yat-sen University, Guangzhou, 510000 People’s Republic of China
| | - Yaoqi Zhou
- Institute for Glycomics and School of Information and Communication Technology, Griffith University, Parklands Dr, Southport, QLD 4222 Australia
| |
|
44
|
Carugo O. Hydrophobicity diversity in globular and nonglobular proteins measured with the Gini index. Protein Eng Des Sel 2017; 30:781-784. [DOI: 10.1093/protein/gzx060] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 08/22/2017] [Accepted: 11/14/2017] [Indexed: 01/23/2023] Open
Affiliation(s)
- Oliviero Carugo
- Department of Structural and Computational Biology, Max F. Perutz Laboratories, University of Vienna, Campus Vienna Biocenter 5, 1030 Vienna, Austria, and Department of Chemistry, University of Pavia, viale Taramelli 12, 27100 Pavia, Italy
| |
|
45
|
Uversky VN. Intrinsic Disorder, Protein-Protein Interactions, and Disease. Adv Protein Chem Struct Biol 2017; 110:85-121. [PMID: 29413001 DOI: 10.1016/bs.apcsb.2017.06.005] [Citation(s) in RCA: 83] [Impact Index Per Article: 10.4] [Indexed: 01/12/2023]
Abstract
It is now recognized that biologically active proteins without stable tertiary structure (known as intrinsically disordered proteins, IDPs) and hybrid proteins containing ordered domains and intrinsically disordered protein regions (IDPRs) are important players found in any given proteome. These IDPs/IDPRs possess functions that complement the functional repertoire of their ordered counterparts, being commonly related to recognition, as well as control and regulation of various signaling pathways. They are interaction masters, being able to utilize a wide spectrum of interaction mechanisms, ranging from induced folding, to formation of fuzzy complexes where significant levels of disorder are preserved, to polyvalent stochastic interactions playing crucial roles in the liquid-liquid phase transitions leading to the formation of proteinaceous membrane-less organelles. IDPs/IDPRs are themselves tightly controlled via various means, including alternative splicing, precisely controlled expression and degradation, binding to specific partners, and posttranslational modifications. Distortions in the regulation and control of IDPs/IDPRs, as well as their aberrant interactivity, are commonly associated with various human diseases. This review presents some aspects of the intrinsic disorder-based functionality and dysfunctionality, paying special attention to normal and pathological protein-protein interactions.
Affiliation(s)
- Vladimir N Uversky
- USF Health Byrd Alzheimer's Research Institute, Morsani College of Medicine, University of South Florida, Tampa, FL, United States
- Institute for Biological Instrumentation, Russian Academy of Sciences, Pushchino, Moscow Region, Russia
| |
|