351
|
Wu Z, Hu G, Yang J, Peng Z, Uversky VN, Kurgan L. In various protein complexes, disordered protomers have large per-residue surface areas and area of protein-, DNA- and RNA-binding interfaces. FEBS Lett 2015; 589:2561-9. [DOI: 10.1016/j.febslet.2015.08.014] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.2] [Received: 06/28/2015] [Revised: 07/31/2015] [Accepted: 08/03/2015] [Indexed: 11/28/2022]
|
352
|
Disorder Prediction Methods, Their Applicability to Different Protein Targets and Their Usefulness for Guiding Experimental Studies. Int J Mol Sci 2015; 16:19040-54. [PMID: 26287166 PMCID: PMC4581285 DOI: 10.3390/ijms160819040] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.9] [Received: 05/21/2015] [Revised: 07/15/2015] [Accepted: 08/04/2015] [Indexed: 12/13/2022] [Open Access]
Abstract
The role and function of a given protein are dependent on its structure. In recent years, however, numerous studies have highlighted the importance of unstructured, or disordered, regions in governing a protein’s function. Disordered proteins have been found to play important roles in pivotal cellular functions, such as DNA binding and signalling cascades. Studying proteins with extended disordered regions is often problematic as they can be challenging to express, purify and crystallise. This means that interpretable experimental data on protein disorder is hard to generate. As a result, predictive computational tools have been developed with the aim of predicting the level and location of disorder within a protein. Currently, over 60 prediction servers exist, utilizing different methods for classifying disorder and different training sets. Here we review several well-performing, publicly available prediction methods, comparing their application and discussing how disorder prediction servers can be used to aid the experimental solution of protein structure. The use of disorder prediction methods allows us to adopt a more targeted approach to experimental studies by accurately identifying the boundaries of ordered protein domains so that they may be investigated separately, thereby increasing the likelihood of their successful experimental solution.
|
353
|
Tusnády GE, Dobson L, Tompa P. Disordered regions in transmembrane proteins. Biochimica et Biophysica Acta - Biomembranes 2015; 1848:2839-48. [PMID: 26275590 DOI: 10.1016/j.bbamem.2015.08.002] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Received: 05/21/2015] [Revised: 07/28/2015] [Accepted: 08/09/2015] [Indexed: 11/18/2022]
Abstract
The functions of transmembrane proteins in living cells are widespread; they range from various transport processes to energy production, from cell-cell adhesion to communication. Structurally, they are highly ordered in their membrane-spanning regions, but may contain disordered regions in the cytosolic and extra-cytosolic parts. In this study, we have investigated the disordered regions in transmembrane proteins by a stringent definition of disordered residues on the largest currently available experimental dataset, and show a significant correlation between the spatial distributions of positively charged residues and disordered regions. This finding suggests a new role of disordered regions in transmembrane proteins by providing structural flexibility for stabilizing interactions with negatively charged head groups of the lipid molecules. We also find a preference of structural disorder in the terminal, as opposed to loop, regions in transmembrane proteins, and survey the respective functions involved in recruiting other proteins or mediating allosteric signaling effects. Finally, we critically compare disorder prediction methods on our transmembrane protein set. While there are no major differences between these methods using the usual statistics, such as per-residue accuracies, Matthews correlation coefficients, etc., substantial differences can be found regarding the spatial distribution of the predicted disordered regions. We conclude that a predictor optimized for transmembrane proteins would be of high value to the field of structural disorder.
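The "usual statistics" named in this abstract can be illustrated with a minimal sketch. It assumes binary per-residue labels (1 = disordered, 0 = ordered); the function names and the example labels are hypothetical and are not taken from the cited study.

```python
import math


def confusion_counts(true_labels, predicted_labels):
    """Count TP, TN, FP, FN for binary per-residue labels (1 = disordered)."""
    tp = tn = fp = fn = 0
    for truth, prediction in zip(true_labels, predicted_labels):
        if truth and prediction:
            tp += 1
        elif not truth and not prediction:
            tn += 1
        elif prediction:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def per_residue_accuracy(tp, tn, fp, fn):
    """Fraction of residues whose order/disorder state is predicted correctly."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def matthews_cc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when it is undefined."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denominator if denominator else 0.0


# Hypothetical eight-residue example, for illustration only.
observed = [1, 1, 0, 0, 0, 0, 1, 0]
predicted = [1, 0, 0, 0, 1, 0, 1, 0]
counts = confusion_counts(observed, predicted)
accuracy = per_residue_accuracy(*counts)
mcc = matthews_cc(*counts)
```

Both numbers summarize agreement over all residues at once, so two predictors can score similarly here while placing their predicted disordered regions in different parts of the chain, which is the kind of difference the abstract reports.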
Affiliation(s)
- Gábor E Tusnády
  - Institute of Enzymology, RCNS, HAS, Magyar Tudósok körútja 2, 1117 Budapest, Hungary
- László Dobson
  - Institute of Enzymology, RCNS, HAS, Magyar Tudósok körútja 2, 1117 Budapest, Hungary
- Peter Tompa
  - Institute of Enzymology, RCNS, HAS, Magyar Tudósok körútja 2, 1117 Budapest, Hungary; VIB Structural Biology Research Center, VUB, Building E, Pleinlaan 2, 1050 Brussels, Belgium
|
354
|
Varadi M, Vranken W, Guharoy M, Tompa P. Computational approaches for inferring the functions of intrinsically disordered proteins. Front Mol Biosci 2015; 2:45. [PMID: 26301226 PMCID: PMC4525029 DOI: 10.3389/fmolb.2015.00045] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.1] [Received: 05/29/2015] [Accepted: 07/21/2015] [Indexed: 01/09/2023] [Open Access]
Abstract
Intrinsically disordered proteins (IDPs) are ubiquitously involved in cellular processes and often implicated in human pathological conditions. The critical biological roles of these proteins, despite not adopting a well-defined fold, encouraged structural biologists to revisit their views on the protein structure-function paradigm. Unfortunately, investigating the characteristics and describing the structural behavior of IDPs is far from trivial, and inferring the function(s) of a disordered protein region remains a major challenge. Computational methods have proven particularly relevant for studying IDPs: on the sequence level their dependence on distinct characteristics determined by the local amino acid context makes sequence-based prediction algorithms viable and reliable tools for large scale analyses, while on the structure level the in silico integration of fundamentally different experimental data types is essential to describe the behavior of a flexible protein chain. Here, we offer an overview of the latest developments and computational techniques that aim to uncover how protein function is connected to intrinsic disorder.
Affiliation(s)
- Mihaly Varadi
  - Flemish Institute of Biotechnology, Brussels, Belgium; Department of Structural Biology, VIB, Vrije Universiteit Brussels, Brussels, Belgium
- Wim Vranken
  - Flemish Institute of Biotechnology, Brussels, Belgium; Department of Structural Biology, VIB, Vrije Universiteit Brussels, Brussels, Belgium; ULB-VUB - Interuniversity Institute of Bioinformatics in Brussels (IB)2, Brussels, Belgium
- Mainak Guharoy
  - Flemish Institute of Biotechnology, Brussels, Belgium; Department of Structural Biology, VIB, Vrije Universiteit Brussels, Brussels, Belgium
- Peter Tompa
  - Flemish Institute of Biotechnology, Brussels, Belgium; Department of Structural Biology, VIB, Vrije Universiteit Brussels, Brussels, Belgium
|
355
|
Minervini G, Quaglia F, Tosatto SCE. Insights into the proline hydroxylase (PHD) family, molecular evolution and its impact on human health. Biochimie 2015; 116:114-24. [PMID: 26187473 DOI: 10.1016/j.biochi.2015.07.009] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 05/27/2015] [Accepted: 07/12/2015] [Indexed: 12/18/2022]
Abstract
PHDs (proline hydroxylases) are a small protein family found in all organisms, considered the central regulator of the molecular hypoxia response due to PHDs being completely inactivated under low oxygen concentration. At physiological oxygen concentration, PHDs drive the degradation of the HIF-1α (hypoxia-inducible factor 1-α), which is responsible for upregulating the expression of genes involved in the cellular response to hypoxia. Hypoxia is a common feature of most tumors, in particular during metastasis development. Indeed, cancer reacts by activating pathways promoting new blood vessel formation and activating strategies aimed to improve survival. In this scenario, the PHD family regulates the activation of HIF-1α and cell-cycle regulation. Several PHD mutations were found in cancer patients, underlining their importance for human health. Here, we propose a Bayesian model able to predict the pathological effect of human PHD mutations and their correlation with cancer outcome. The model was developed through an integrative in silico approach, where data collected from the literature has been coupled with sequence evolution and structural analysis. The model was used to assess 135 human PHD variants. Finally, bioinformatics characterization was used to demonstrate how few amino acid changes are able to explain the functional specialization of PHD family members and their physiological role in human health.
Affiliation(s)
- Giovanni Minervini
  - Department of Biomedical Sciences, University of Padua, Viale G. Colombo 3, 35121, Padova, Italy
- Federica Quaglia
  - Department of Biomedical Sciences, University of Padua, Viale G. Colombo 3, 35121, Padova, Italy
- Silvio C E Tosatto
  - Department of Biomedical Sciences, University of Padua, Viale G. Colombo 3, 35121, Padova, Italy
|
356
|
An Overview of Practical Applications of Protein Disorder Prediction and Drive for Faster, More Accurate Predictions. Int J Mol Sci 2015. [PMID: 26198229 PMCID: PMC4519904 DOI: 10.3390/ijms160715384] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Indexed: 12/22/2022] [Open Access]
Abstract
Protein disordered regions are segments of a protein chain that do not adopt a stable structure. Thus far, a variety of protein disorder prediction methods have been developed and have been widely used, not only in traditional bioinformatics domains, including protein structure prediction, protein structure determination and function annotation, but also in many other biomedical fields. The relationship between intrinsically-disordered proteins and some human diseases has played a significant role in disorder prediction in disease identification and epidemiological investigations. Disordered proteins can also serve as potential targets for drug discovery with an emphasis on the disordered-to-ordered transition in the disordered binding regions, and this has led to substantial research in drug discovery or design based on protein disordered region prediction. Furthermore, protein disorder prediction has also been applied to healthcare by predicting the disease risk of mutations in patients and studying the mechanistic basis of diseases. As the applications of disorder prediction increase, so too does the need to make quick and accurate predictions. To fill this need, we also present a new approach to predict protein residue disorder using wide sequence windows that is applicable on the genomic scale.
|
357
|
Wang Z, Yang Q, Li T, Cong P. DisoMCS: Accurately Predicting Protein Intrinsically Disordered Regions Using a Multi-Class Conservative Score Approach. PLoS One 2015; 10:e0128334. [PMID: 26090958 PMCID: PMC4474717 DOI: 10.1371/journal.pone.0128334] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 12/21/2014] [Accepted: 04/26/2015] [Indexed: 11/21/2022] [Open Access]
Abstract
The precise prediction of protein intrinsically disordered regions, which play a crucial role in biological processes, is a necessary prerequisite to further the understanding of the principles and mechanisms of protein function. Here, we propose a novel predictor, DisoMCS, which is a more accurate predictor of protein intrinsically disordered regions. DisoMCS is based on an original multi-class conservative score (MCS) obtained by sequence-order/disorder alignment. Initially, near-disorder regions are defined as fragments located at either terminus of an ordered region where it connects to a disordered region. Then the multi-class conservative score is generated by sequence alignment against a known structure database and represented as order, near-disorder and disorder conservative scores. The MCS of each amino acid has three elements: order, near-disorder and disorder profiles. Finally, the MCS is exploited as features to identify disordered regions in sequences. DisoMCS utilizes a non-redundant data set as the training set, MCS and predicted secondary structure as features, and a conditional random field as the classification algorithm. In predicted near-disorder regions, a residue is classified as ordered or disordered according to the optimized decision threshold. DisoMCS was evaluated by cross-validation, large-scale prediction, independent tests and CASP (Critical Assessment of Techniques for Protein Structure Prediction) tests. All results confirmed that DisoMCS was very competitive in terms of accuracy of prediction when compared with well-established publicly available disordered region predictors. The results also indicated that our approach was more accurate when a query has higher homology with the knowledge database.
Affiliation(s)
- Zhiheng Wang
  - Department of Chemistry, Tongji University, Shanghai, China
- Qianqian Yang
  - Department of Chemistry, Tongji University, Shanghai, China
- Tonghua Li
  - Department of Chemistry, Tongji University, Shanghai, China
- Peisheng Cong
  - Department of Chemistry, Tongji University, Shanghai, China
|
358
|
Identifying Similar Patterns of Structural Flexibility in Proteins by Disorder Prediction and Dynamic Programming. Int J Mol Sci 2015; 16:13829-49. [PMID: 26086829 PMCID: PMC4490526 DOI: 10.3390/ijms160613829] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 05/01/2015] [Revised: 06/03/2015] [Accepted: 06/05/2015] [Indexed: 12/31/2022] [Open Access]
Abstract
Computational methods are prevailing in identifying protein intrinsic disorder. The results from predictors are often given as per-residue disorder scores. The scores describe the disorder propensity of amino acids of a protein and can be further represented as a disorder curve. Many proteins share similar patterns in their disorder curves. The similar patterns are often associated with similar functions and evolutionary origins. Therefore, finding and characterizing specific patterns of disorder curves provides a unique and attractive perspective of studying the function of intrinsically disordered proteins. In this study, we developed a new computational tool named IDalign using dynamic programming. This tool is able to identify similar patterns among disorder curves, as well as to present the distribution of intrinsic disorder in query proteins. The disorder-based information generated by IDalign is significantly different from the information retrieved from classical sequence alignments. This tool can also be used to infer functions of disordered regions and disordered proteins. The web server of IDalign is available at (http://labs.cas.usf.edu/bioinfo/service.html).
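The general idea of comparing two disorder curves by dynamic programming can be sketched as a global alignment over per-residue scores. This is an illustration only and not the IDalign algorithm: the similarity function (0.5 minus the absolute score difference), the linear gap penalty, and the function name are all assumptions made for this example.

```python
def align_disorder_curves(curve_a, curve_b, gap_penalty=0.5):
    """Global alignment score of two per-residue disorder curves.

    Each curve is a list of disorder scores in [0, 1]. The similarity
    function and the gap penalty are illustrative assumptions.
    """
    n, m = len(curve_a), len(curve_b)
    # score[i][j] holds the best score for aligning the first i residues
    # of curve_a with the first j residues of curve_b.
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = -gap_penalty * i
    for j in range(1, m + 1):
        score[0][j] = -gap_penalty * j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Positive when the two disorder scores differ by less than 0.5.
            similarity = 0.5 - abs(curve_a[i - 1] - curve_b[j - 1])
            score[i][j] = max(
                score[i - 1][j - 1] + similarity,  # align residue i with j
                score[i - 1][j] - gap_penalty,     # gap in curve_b
                score[i][j - 1] - gap_penalty,     # gap in curve_a
            )
    return score[n][m]


# Hypothetical curves, for illustration only.
identical = align_disorder_curves([0.1, 0.9, 0.5], [0.1, 0.9, 0.5])
shuffled = align_disorder_curves([0.1, 0.9, 0.5], [0.9, 0.1, 0.5])
```

Because the score depends only on disorder values and never on residue identity, two proteins with unrelated sequences can still align well, which is consistent with the abstract's point that disorder-based information differs from classical sequence alignment.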
|
359
|
Faust O, Bigman L, Friedler A. A role of disordered domains in regulating protein oligomerization and stability. Chem Commun (Camb) 2015; 50:10797-800. [PMID: 25054624 DOI: 10.1039/c4cc03863k] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Indexed: 12/11/2022]
Abstract
Intrinsically disordered proteins (IDPs) or regions (IDRs) in proteins hold many functions but their biological roles are still not fully understood. Here we describe a new role of such regions. Using the HIV-1 Rev protein, we show that disordered domains have a role in maintaining the correct oligomeric state and the thermodynamic stability of proteins.
Affiliation(s)
- Ofrah Faust
  - Institute of Chemistry, The Hebrew University of Jerusalem, The Edmond J. Safra Campus, Givat Ram 91904, Jerusalem, Israel
|
360
|
Varadi M, Guharoy M, Zsolyomi F, Tompa P. DisCons: a novel tool to quantify and classify evolutionary conservation of intrinsic protein disorder. BMC Bioinformatics 2015; 16:153. [PMID: 25968230 PMCID: PMC4427981 DOI: 10.1186/s12859-015-0592-2] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Received: 12/03/2014] [Accepted: 04/23/2015] [Indexed: 12/03/2022] [Open Access]
Abstract
Background: Analyzing the amino acid sequence of an intrinsically disordered protein (IDP) in an evolutionary context can yield novel insights on the functional role of disordered regions and sequence element(s). However, in the case of many IDPs, the lack of evolutionary conservation of the primary sequence can hamper the study of functionality, because the conservation of their disorder profile and ensuing function(s) may not appear in a traditional analysis of the evolutionary history of the protein. Results: Here we present DisCons (Disorder Conservation), a novel pipelined tool that combines the quantification of sequence- and disorder conservation to classify disordered residue positions. According to this scheme, the most interesting categories (for functional purposes) are constrained disordered residues and flexible disordered residues. The former residues show conservation of both the sequence and the property of disorder and are associated mainly with specific binding functionalities (e.g., short, linear motifs, SLiMs), whereas the latter class corresponds to segments where disorder as a feature is important for function as opposed to the identity of the underlying sequence (e.g., entropic chains and linkers). DisCons therefore helps with elucidating the function(s) arising from the disordered state by analyzing individual proteins as well as large-scale proteomics datasets. Conclusions: DisCons is an openly accessible sequence analysis tool that identifies and highlights structurally disordered segments of proteins where the conformational flexibility is conserved across homologs, and therefore potentially functional. The tool is freely available both as a web application and as stand-alone source code hosted at http://pedb.vib.be/discons.
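The two categories highlighted in this abstract can be sketched as a simple decision rule over two per-position conservation values. This is not the DisCons implementation: the 0.5 cutoffs, the 0-to-1 conservation scale, the function name, and the catch-all "other" label are assumptions made for illustration.

```python
def classify_position(sequence_conservation, disorder_conservation,
                      sequence_cutoff=0.5, disorder_cutoff=0.5):
    """Classify one alignment position from two conservation values in [0, 1].

    The cutoffs are hypothetical; the abstract does not state how the
    categories are thresholded.
    """
    if disorder_conservation >= disorder_cutoff:
        if sequence_conservation >= sequence_cutoff:
            # Sequence and disorder both conserved (e.g. binding motifs).
            return "constrained disorder"
        # Disorder conserved while the sequence varies (e.g. linkers).
        return "flexible disorder"
    # Categories not described in the abstract are lumped together here.
    return "other"


# Hypothetical (sequence conservation, disorder conservation) pairs.
positions = [(0.9, 0.8), (0.2, 0.9), (0.7, 0.1)]
labels = [classify_position(seq, dis) for seq, dis in positions]
```

Keeping the two conservation measures separate is what lets a position count as functionally interesting even when its sequence is poorly conserved.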
Affiliation(s)
- Mihaly Varadi
  - VIB Structural Biology Research Center (SBRC), Brussels, Belgium; Vrije Universiteit Brussel, Brussels, Belgium
- Mainak Guharoy
  - VIB Structural Biology Research Center (SBRC), Brussels, Belgium; Vrije Universiteit Brussel, Brussels, Belgium
- Peter Tompa
  - VIB Structural Biology Research Center (SBRC), Brussels, Belgium; Vrije Universiteit Brussel, Brussels, Belgium; Institute of Enzymology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Budapest, Hungary
|
362
|
Uversky VN. Unreported intrinsic disorder in proteins: Disorder emergency room. Intrinsically Disordered Proteins 2015; 3:e1010999. [PMID: 28232885 DOI: 10.1080/21690707.2015.1010999] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 11/24/2014] [Revised: 01/01/2014] [Accepted: 11/24/2014] [Indexed: 10/23/2022]
Abstract
This article continues an "Unreported Intrinsic Disorder in Proteins" series, the goal of which is to expose some interesting cases of missed (or overlooked, or ignored) disorder in proteins. The need for this series is justified by the observation that, although protein intrinsic disorder is widely accepted by the scientific community, there are still numerous instances in which appreciation of this phenomenon is absent. This results in an avalanche of research papers that discuss intrinsically disordered proteins (or hybrid proteins with ordered and disordered regions) without recognizing that they are discussing such proteins. Articles in the "Unreported Intrinsic Disorder in Proteins" series provide a fast fix for some of the recent noticeable disorder overlooks.
Affiliation(s)
- Vladimir N Uversky
  - Department of Molecular Medicine and USF Health Byrd Alzheimer Research Institute, Morsani College of Medicine, University of South Florida, Tampa, FL, USA; Biology Department, Faculty of Science, King Abdulaziz University, Jeddah, Kingdom of Saudi Arabia; Laboratory of Structural Dynamics, Stability and Folding of Proteins, Institute of Cytology, Russian Academy of Sciences, St. Petersburg, Russia
|
363
|
Chetty S, Bhakat S, Martin AJM, Soliman MES. Multi-drug resistance profile of PR20 HIV-1 protease is attributed to distorted conformational and drug binding landscape: molecular dynamics insights. J Biomol Struct Dyn 2015; 34:135-51. [PMID: 25671669 DOI: 10.1080/07391102.2015.1018326] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Indexed: 10/24/2022]
Abstract
The PR20 HIV-1 protease, a variant with 20 mutations, exhibits high levels of multi-drug resistance; however, to date, there has been no report detailing the impact of these 20 mutations on the conformational and drug binding landscape at a molecular level. In this report, we demonstrate the first account of a comprehensive study designed to elaborate on the impact of these mutations on the dynamic features as well as drug binding and resistance profile, using extensive molecular dynamics analyses. Comparative MD simulations for the wild-type and PR20 HIV proteases, starting from bound and unbound conformations in each case, were performed. Results showed that the apo conformation of the PR20 variant of the HIV protease displayed a tendency to remain in the open conformation for a longer period of time when compared to the wild type. This led to a phenomenon in which the inhibitor seated at the active site of PR20 tends to diffuse away from the binding site, leading to a significant change in inhibitor-protein association. Calculating the per-residue fluctuation (RMSF) and radius of gyration further validated these findings. MM/GBSA showed that the occurrence of 20 mutations led to a drop in the calculated binding free energies (ΔGbind) by ~25.17 kcal/mol and ~5 kcal/mol for p2-NC, a natural peptide substrate, and darunavir, respectively, when compared to wild type. Furthermore, the residue interaction network showed a diminished inter-residue hydrogen bond network and changes in inter-residue connections as a result of these mutations. The increased conformational flexibility in PR20 as a result of loss of intra- and inter-molecular hydrogen bond interactions and other prominent binding forces led to a loss of protease grip on ligand. It is interesting to note that the difference in conformational flexibility between PR20 and WT conformations was much higher in the case of substrate-bound conformation as compared to DRV. Thus, developing analogues of DRV by retaining its key pharmacophore features will be the way forward in the search for novel protease inhibitors against multi-drug resistant strains.
Affiliation(s)
- Sarentha Chetty
  - Molecular Modelling and Drug Design Research Group, School of Health Sciences, University of Kwazulu-Natal, Westville, Durban 4000, South Africa
- Soumendranath Bhakat
  - Molecular Modelling and Drug Design Research Group, School of Health Sciences, University of Kwazulu-Natal, Westville, Durban 4000, South Africa
- Alberto J M Martin
  - Computational Biology Lab, Fundación Ciencia & Vida, Santiago, Chile; Facultad de Ciencias, Centro Interdisciplinario de Neurociencia de Valparaíso, Universidad de Valparaíso, Valparaíso, Chile
- Mahmoud E S Soliman
  - Molecular Modelling and Drug Design Research Group, School of Health Sciences, University of Kwazulu-Natal, Westville, Durban 4000, South Africa
|
364
|
Frege T, Uversky VN. Intrinsically disordered proteins in the nucleus of human cells. Biochem Biophys Rep 2015; 1:33-51. [PMID: 29124132 PMCID: PMC5668563 DOI: 10.1016/j.bbrep.2015.03.003] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.7] [Received: 02/08/2015] [Accepted: 03/11/2015] [Indexed: 12/16/2022] [Open Access]
Abstract
Intrinsically disordered proteins are known to perform a variety of important functions such as macromolecular recognition, promiscuous binding, and signaling. They are crucial players in various cellular pathways and processes, where they often have key regulatory roles. Among vital cellular processes intimately linked to the intrinsically disordered proteins is transcription, an intricate biological performance predominantly developing inside the cell nucleus. With this work, we gathered information about proteins that exist in various compartments and sub-nuclear bodies of the nucleus of the human cells, with the goal of identifying which ones are highly disordered and which functions are ascribed to the disordered nuclear proteins.
Affiliation(s)
- Telma Frege
  - Department of Molecular Medicine, Morsani College of Medicine, University of South Florida, Tampa, FL 33612, USA
  - GenomeNext LLC, 175 South 3rd Street, Suite 200, Columbus OH 43215, USA
- Vladimir N. Uversky
  - Department of Molecular Medicine, Morsani College of Medicine, University of South Florida, Tampa, FL 33612, USA
  - USF Health Byrd Alzheimer's Research Institute, Morsani College of Medicine, University of South Florida, Tampa, FL 33612, USA
  - Department of Biology, Faculty of Science, King Abdulaziz University, P.O. Box 80203, Jeddah 21589, Saudi Arabia
  - Institute for Biological Instrumentation, Russian Academy of Sciences, 142290 Pushchino, Moscow Region, Russia
  - Laboratory of Structural Dynamics, Stability and Folding of Proteins, Institute of Cytology, Russian Academy of Sciences, St. Petersburg, Russia
  - Correspondence to: Department of Molecular Medicine, Morsani College of Medicine, University of South Florida, 12901 Bruce B. Downs Boulevard, MDC07, Tampa, FL 33612, USA. Tel.: +1 813 974 5816; fax: +1 813 974 7357.
|
365
|
Abstract
Intrinsically disordered proteins (IDPs) are important components of the cellular signalling machinery, allowing the same polypeptide to undertake different interactions with different consequences. IDPs are subject to combinatorial post-translational modifications and alternative splicing, adding complexity to regulatory networks and providing a mechanism for tissue-specific signalling. These proteins participate in the assembly of signalling complexes and in the dynamic self-assembly of membrane-less nuclear and cytoplasmic organelles. Experimental, computational and bioinformatic analyses combine to identify and characterize disordered regions of proteins, leading to a greater appreciation of their widespread roles in biological processes.
|
366
|
Liu S, Cai X, Wu J, Cong Q, Chen X, Li T, Du F, Ren J, Wu YT, Grishin NV, Chen ZJ. Phosphorylation of innate immune adaptor proteins MAVS, STING, and TRIF induces IRF3 activation. Science 2015; 347:aaa2630. [PMID: 25636800 DOI: 10.1126/science.aaa2630] [Citation(s) in RCA: 1376] [Impact Index Per Article: 137.6] [Indexed: 12/19/2022]
Abstract
During virus infection, the adaptor proteins MAVS and STING transduce signals from the cytosolic nucleic acid sensors RIG-I and cGAS, respectively, to induce type I interferons (IFNs) and other antiviral molecules. Here we show that MAVS and STING harbor two conserved serine and threonine clusters that are phosphorylated by the kinases IKK and/or TBK1 in response to stimulation. Phosphorylated MAVS and STING then bind to a positively charged surface of interferon regulatory factor 3 (IRF3) and thereby recruit IRF3 for its phosphorylation and activation by TBK1. We further show that TRIF, an adaptor protein in Toll-like receptor signaling, activates IRF3 through a similar phosphorylation-dependent mechanism. These results reveal that phosphorylation of innate adaptor proteins is an essential and conserved mechanism that selectively recruits IRF3 to activate the type I IFN pathway.
Affiliation(s)
- Siqi Liu
- Department of Molecular Biology, University of Texas Southwestern Medical Center, Dallas, TX 75390-9148, USA
- Xin Cai
- Department of Molecular Biology, University of Texas Southwestern Medical Center, Dallas, TX 75390-9148, USA
- Jiaxi Wu
- Department of Molecular Biology, University of Texas Southwestern Medical Center, Dallas, TX 75390-9148, USA
- Qian Cong
- Departments of Biophysics and Biochemistry, University of Texas Southwestern Medical Center, Dallas, TX 75390-9148, USA
- Xiang Chen
- Department of Molecular Biology, University of Texas Southwestern Medical Center, Dallas, TX 75390-9148, USA. Howard Hughes Medical Institute (HHMI), University of Texas Southwestern Medical Center, Dallas, TX 75390-9148, USA
- Tuo Li
- Department of Molecular Biology, University of Texas Southwestern Medical Center, Dallas, TX 75390-9148, USA
- Fenghe Du
- Department of Molecular Biology, University of Texas Southwestern Medical Center, Dallas, TX 75390-9148, USA. Howard Hughes Medical Institute (HHMI), University of Texas Southwestern Medical Center, Dallas, TX 75390-9148, USA
- Junyao Ren
- Department of Molecular Biology, University of Texas Southwestern Medical Center, Dallas, TX 75390-9148, USA
- You-Tong Wu
- Department of Molecular Biology, University of Texas Southwestern Medical Center, Dallas, TX 75390-9148, USA
- Nick V Grishin
- Departments of Biophysics and Biochemistry, University of Texas Southwestern Medical Center, Dallas, TX 75390-9148, USA. Howard Hughes Medical Institute (HHMI), University of Texas Southwestern Medical Center, Dallas, TX 75390-9148, USA
- Zhijian J Chen
- Department of Molecular Biology, University of Texas Southwestern Medical Center, Dallas, TX 75390-9148, USA. Howard Hughes Medical Institute (HHMI), University of Texas Southwestern Medical Center, Dallas, TX 75390-9148, USA
|
367
|
Singh GP. Association between intrinsic disorder and serine/threonine phosphorylation in Mycobacterium tuberculosis. PeerJ 2015; 3:e724. [PMID: 25648268 PMCID: PMC4304846 DOI: 10.7717/peerj.724] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 10/29/2014] [Accepted: 12/21/2014] [Indexed: 01/28/2023] Open
Abstract
Serine/threonine phosphorylation is an important mechanism that is involved in the regulation of protein function. In eukaryotes, phosphorylation occurs predominantly in intrinsically disordered regions of proteins. Though serine/threonine phosphorylation and protein disorder are much less prevalent in prokaryotes, some bacteria have high levels of serine/threonine phosphorylation and disorder, including the medically important M. tuberculosis. Here I show that serine/threonine phosphorylation sites in M. tuberculosis are highly enriched in intrinsically disordered regions, indicating similarity in the substrate recognition mechanisms of eukaryotic and M. tuberculosis kinases. Serine/threonine phosphorylation has been linked to the pathogenicity and survival of M. tuberculosis. Thus, a better understanding of how its kinases recognize their substrates could have important implications in understanding and controlling the biology of this deadly pathogen. These results also indicate that the association between serine/threonine phosphorylation and disorder is not a feature restricted to eukaryotes.
Affiliation(s)
- Gajinder Pal Singh
- School of Biotechnology, KIIT University, Patia, Bhubaneswar, Odisha, India
|
368
|
Abstract
Intrinsically disordered proteins and protein regions (IDPs/IDRs) do not adopt a well-defined folded structure under physiological conditions. Instead, these proteins exist as heterogeneous and dynamical conformational ensembles. IDPs are widespread in eukaryotic proteomes and are involved in fundamental biological processes, mostly related to regulation and signaling. At the same time, disordered regions often pose significant challenges to the structure determination process, which generally requires highly homogeneous protein samples. In this book chapter, we provide a brief overview of protein disorder, describe various bioinformatics resources that have been developed in recent years for its characterization, and give a general outline of their applications in various types of structural genomics projects. Traditionally, disordered segments were filtered out to optimize the yield of structure determination pipelines. However, it is becoming increasingly clear that the structural characterization of proteins cannot be complete without the incorporation of intrinsically disordered regions.
Affiliation(s)
- Marco Punta
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
|
369
|
Peng Z, Yan J, Fan X, Mizianty MJ, Xue B, Wang K, Hu G, Uversky VN, Kurgan L. Exceptionally abundant exceptions: comprehensive characterization of intrinsic disorder in all domains of life. Cell Mol Life Sci 2015; 72:137-51. [PMID: 24939692 PMCID: PMC11113594 DOI: 10.1007/s00018-014-1661-9] [Citation(s) in RCA: 297] [Impact Index Per Article: 29.7] [Received: 04/04/2014] [Revised: 05/29/2014] [Accepted: 05/30/2014] [Indexed: 02/02/2023]
Abstract
Recent years witnessed increased interest in intrinsically disordered proteins and regions. These proteins and regions are abundant and possess unique structural features and a broad functional repertoire that complements ordered proteins. However, modern studies on the abundance and functions of intrinsically disordered proteins and regions are relatively limited in size and scope of their analysis. To fill this gap, we performed a broad and detailed computational analysis of over 6 million proteins from 59 archaea, 471 bacterial, 110 eukaryotic and 325 viral proteomes. We used arguably more accurate consensus-based disorder predictions, and for the first time comprehensively characterized intrinsic disorder at proteomic and protein levels from all significant perspectives, including abundance, cellular localization, functional roles, evolution, and impact on structural coverage. We show that intrinsic disorder is more abundant and has a unique profile in eukaryotes. We map disorder into archaea, bacterial and eukaryotic cells, and demonstrate that it is preferentially located in some cellular compartments. Functional analysis that considers over 1,200 annotations shows that certain functions are exclusively implemented by intrinsically disordered proteins and regions, and that some of them are specific to certain domains of life. We reveal that disordered regions are often targets for various post-translational modifications, but primarily in the eukaryotes and viruses. Using a phylogenetic tree for 14 eukaryotic and 112 bacterial species, we analyzed relations between disorder, sequence conservation and evolutionary speed. We provide a complete analysis that clearly shows that intrinsic disorder is exceptionally and uniquely abundant in each domain of life.
Affiliation(s)
- Zhenling Peng
- Department of Electrical and Computer Engineering, University of Alberta, Edmonton, Canada
- Jing Yan
- Department of Electrical and Computer Engineering, University of Alberta, Edmonton, Canada
- Xiao Fan
- Department of Electrical and Computer Engineering, University of Alberta, Edmonton, Canada
- Marcin J. Mizianty
- Department of Electrical and Computer Engineering, University of Alberta, Edmonton, Canada
- Bin Xue
- Department of Cell Biology, Microbiology and Molecular Biology, College of Fine Arts and Sciences, University of South Florida, 33612 Tampa, USA
- Kui Wang
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin, People’s Republic of China
- Gang Hu
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin, People’s Republic of China
- Vladimir N. Uversky
- Department of Molecular Medicine, Byrd Alzheimer’s Research Institute, College of Medicine, University of South Florida, 33612 Tampa, USA
- Institute for Biological Instrumentation, Russian Academy of Sciences, Moscow Region, 142290 Pushchino, Russia
- Lukasz Kurgan
- Department of Electrical and Computer Engineering, University of Alberta, Edmonton, Canada
|
370
|
Spencer M, Eickholt J, Cheng J. A Deep Learning Network Approach to ab initio Protein Secondary Structure Prediction. IEEE/ACM Trans Comput Biol Bioinform 2015; 12:103-12. [PMID: 25750595 PMCID: PMC4348072 DOI: 10.1109/tcbb.2014.2343960] [Citation(s) in RCA: 138] [Impact Index Per Article: 13.8] [Indexed: 05/12/2023]
Abstract
Ab initio protein secondary structure (SS) predictions are utilized to generate tertiary structure predictions, which are increasingly demanded due to the rapid discovery of proteins. Although recent developments have slightly exceeded previous methods of SS prediction, accuracy has stagnated around 80 percent and many wonder if prediction cannot be advanced beyond this ceiling. Disciplines that have traditionally employed neural networks are experimenting with novel deep learning techniques in attempts to stimulate progress. Since neural networks have historically played an important role in SS prediction, we wanted to determine whether deep learning could contribute to the advancement of this field as well. We developed an SS predictor that makes use of the position-specific scoring matrix generated by PSI-BLAST and deep learning network architectures, which we call DNSS. Graphical processing units and CUDA software optimize the deep network architecture and efficiently train the deep networks. Optimal parameters for the training process were determined, and a workflow comprising three separately trained deep networks was constructed in order to make refined predictions. This deep learning network approach was used to predict SS for a fully independent test dataset of 198 proteins, achieving a Q3 accuracy of 80.7 percent and a Sov accuracy of 74.2 percent.
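The Q3 figure quoted in this abstract is the per-residue accuracy over three secondary-structure states. A minimal sketch of how it is commonly computed, assuming predictions and observations are given as equal-length strings over the labels H (helix), E (strand) and C (coil); the function name and the example strings are hypothetical and are not taken from DNSS:

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Return the fraction of residues whose 3-state label matches.

    Both arguments are strings over the alphabet H/E/C, one character
    per residue. This is an illustrative helper, not code from DNSS.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have equal length")
    if not observed:
        raise ValueError("sequences must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


# Hypothetical 17-residue example: two residues are mislabelled.
observed = "CCHHHHHHCCEEEEECC"
predicted = "CCHHHHHCCCEEEECCC"
score = q3_accuracy(predicted, observed)
print(f"Q3 = {score:.3f}")
```

Q3 treats every residue equally, so it does not reward getting whole segments right; that is why segment-overlap measures such as Sov are usually reported alongside it, as in the abstract above.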
Affiliation(s)
- Matt Spencer
- Informatics Institute, University of Missouri, Columbia, MO 65211.
- Jesse Eickholt
- Department of Computer Science, Central Michigan University, Mount Pleasant, MI 48859.
- Jianlin Cheng
- Department of Computer Science, University of Missouri, Columbia, MO 65211.
|
371
|
A critical evaluation of in silico methods for detection of membrane protein intrinsic disorder. Biophys J 2014; 106:1638-49. [PMID: 24739163 DOI: 10.1016/j.bpj.2014.02.025] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Received: 08/29/2013] [Revised: 02/03/2014] [Accepted: 02/25/2014] [Indexed: 11/23/2022] Open
Abstract
Intrinsically disordered regions in proteins possess important biological roles including transcriptional regulation, molecular recognition, and provision of sites for posttranslational modification. In three-dimensional crystallization of both soluble and membrane proteins, identification and removal of disordered regions is often necessary for obtaining crystals possessing sufficient long-range order for structure determination. Disordered regions can be identified experimentally, with techniques such as limited proteolysis coupled with mass spectrometry, or computationally, by using disorder prediction programs, of which many are available. Although these programs use various methods to predict disorder from a protein's primary sequence, they all were developed using information derived from soluble protein structures. Therefore, their performance and accuracy when applied to integral membrane proteins remained an open question. We evaluated the performance of 13 disorder prediction programs on a dataset containing 343 membrane proteins, and upon subdatasets containing only α-helical or β-barrel proteins. These programs were ranked using multiple metrics, including metrics specifically created for membrane proteins. Analysis of these data shows a clear distinction between programs that accurately predict disordered regions in membrane proteins and programs which perform poorly, and allows for the robust integration of in silico disorder prediction into our PSI:Biology membrane protein structural genomics pipeline.
|
372
|
Kumari B, Kumar R, Kumar M. Low complexity and disordered regions of proteins have different structural and amino acid preferences. Mol Biosyst 2014; 11:585-94. [PMID: 25468592 DOI: 10.1039/c4mb00425f] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.8] [Indexed: 12/31/2022]
Abstract
Low complexity regions (LCRs) or non-random regions of a few amino acids are abundantly present in proteins. LCRs are traditionally considered as floppy structures with high solvent accessibility. Thus little attention was paid to them for structural studies. However LCRs have been found to contain information relevant to protein structure and various important functions. The present study is an attempt to understand the structural trend of LCRs. Here we report a study conducted to understand the structural trend, solvent accessibility and amino acid preferences of LCRs. The results show that LCRs might attain any type of secondary structure; however, the helix is frequently seen, whereas sheets occur rarely. We also found that LCRs are not always exposed on the surface. We found insignificant contribution of trans-membrane helices to the overall helix content. The LCRs having a secondary structure have different enrichment and depletion of amino acids from LCRs without a secondary structure and disordered protein sequences. However, LCRs of NMR structures showed compositional and functional similarity to the disordered regions of proteins. We also noted that in ∼3/4 LCRs, the entire amino acid did not have a single structural class, but rather an ensemble of more than one secondary structure, which indicates that they are found at places where structure transition occurs. Overall analysis suggests that the overall protein sequence has a greater influence on the structural and sequence enrichment rather than only the local amino acid composition of LCRs.
Affiliation(s)
- Bandana Kumari
- Department of Biophysics, University of Delhi South Campus, New Delhi, India.
|
373
|
Uversky VN, Kuznetsova IM, Turoverov KK, Zaslavsky B. Intrinsically disordered proteins as crucial constituents of cellular aqueous two phase systems and coacervates. FEBS Lett 2014; 589:15-22. [PMID: 25436423 DOI: 10.1016/j.febslet.2014.11.028] [Citation(s) in RCA: 191] [Impact Index Per Article: 17.4] [Received: 08/17/2014] [Revised: 10/10/2014] [Accepted: 11/19/2014] [Indexed: 12/25/2022]
Abstract
Here, we hypothesize that intrinsically disordered proteins (IDPs) serve as important drivers of the intracellular liquid-liquid phase separations that generate various membrane-less organelles. This hypothesis is supported by the overwhelming abundance of IDPs in these organelles. Assembly and disassembly of these organelles are controlled by changes in the concentrations of IDPs, their posttranslational modifications, binding of specific partners, and changes in the pH and/or temperature of the solution. Each resulting phase provides a distinct solvent environment for other solutes leading to their unequal distribution within phases. The specificity and efficiency of such partitioning is determined by the nature of the IDP(s) and defines "targeted" enrichment of specific molecules in the resulting membrane-less organelles that determines their specific activities.
Affiliation(s)
- Vladimir N Uversky
- Department of Molecular Medicine and USF Health Byrd Alzheimer's Research Institute, Morsani College of Medicine, University of South Florida, Tampa, FL, USA; Institute for Biological Instrumentation, Russian Academy of Sciences, Pushchino, Moscow Region, Russian Federation; Biology Department, Faculty of Science, King Abdulaziz University, P.O. Box 80203, Jeddah 21589, Saudi Arabia; Laboratory of Structural Dynamics, Stability and Folding of Proteins, Institute of Cytology, Russian Academy of Sciences, St. Petersburg, Russian Federation.
- Irina M Kuznetsova
- Laboratory of Structural Dynamics, Stability and Folding of Proteins, Institute of Cytology, Russian Academy of Sciences, St. Petersburg, Russian Federation; St. Petersburg State Polytechnical University, St. Petersburg, Russian Federation
- Konstantin K Turoverov
- Laboratory of Structural Dynamics, Stability and Folding of Proteins, Institute of Cytology, Russian Academy of Sciences, St. Petersburg, Russian Federation; St. Petersburg State Polytechnical University, St. Petersburg, Russian Federation
- Boris Zaslavsky
- AnalizaDx Inc., 3615 Superior Ave., Suite 4407B, Cleveland, OH 44114, USA
|
374
|
Drobnak I, Braselmann E, Chaney JL, Leyton DL, Bernstein HD, Lithgow T, Luirink J, Nataro JP, Clark PL. Of linkers and autochaperones: an unambiguous nomenclature to identify common and uncommon themes for autotransporter secretion. Mol Microbiol 2014; 95:1-16. [PMID: 25345653 DOI: 10.1111/mmi.12838] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.5] [Accepted: 10/21/2014] [Indexed: 01/02/2023]
Abstract
Autotransporter (AT) proteins provide a diverse array of important virulence functions to Gram-negative bacterial pathogens, and have also been adapted for protein surface display applications. The 'autotransporter' moniker refers to early models that depicted these proteins facilitating their own translocation across the bacterial outer membrane. Although translocation is less autonomous than originally proposed, AT protein segments upstream of the C-terminal transmembrane β-barrel have nevertheless consistently been found to contribute to efficient translocation and/or folding of the N-terminal virulence region (the 'passenger'). However, defining the precise secretion functions of these AT regions has been complicated by the use of multiple overlapping and ambiguous terms to define AT sequence, structural, and functional features, including 'autochaperone', 'linker' and 'junction'. Moreover, the precise definitions and boundaries of these features vary among ATs and even among research groups, leading to an overall murky picture of the contributions of specific features to translocation. Here we propose a unified, unambiguous nomenclature for AT structural, functional and conserved sequence features, based on explicit criteria. Applied to 16 well-studied AT proteins, this nomenclature reveals new commonalities for translocation but also highlights that the autochaperone function is less closely associated with a conserved sequence element than previously believed.
Affiliation(s)
- Igor Drobnak
- Department of Chemistry and Biochemistry, University of Notre Dame, Notre Dame, IN, 46556, USA
|
375
|
Eickholt J, Wang Z. PCP-ML: protein characterization package for machine learning. BMC Res Notes 2014; 7:810. [PMID: 25406415 PMCID: PMC4246511 DOI: 10.1186/1756-0500-7-810] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 01/27/2014] [Accepted: 10/31/2014] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Machine Learning (ML) has a number of demonstrated applications in protein prediction tasks such as protein structure prediction. To speed further development of machine learning based tools and their release to the community, we have developed a package which characterizes several aspects of a protein commonly used for protein prediction tasks with machine learning. FINDINGS A number of software libraries and modules exist for handling protein related data. The package we present in this work, PCP-ML, is unique in its small footprint and emphasis on machine learning. Its primary focus is on characterizing various aspects of a protein through sets of numerical data. The generated data can then be used with machine learning tools and/or techniques. PCP-ML is very flexible in how the generated data is formatted and as a result is compatible with a variety of existing machine learning packages. Given its small size, it can be directly packaged and distributed with community developed tools for protein prediction tasks. CONCLUSIONS Source code and example programs are available under a BSD license at http://mlid.cps.cmich.edu/eickh1jl/tools/PCPML/. The package is implemented in C++ and accessible as a Python module.
Affiliation(s)
- Jesse Eickholt
- Department of Computer Science, Central Michigan University, Mount Pleasant, MI 48859, USA.
|
376
|
Barman RK, Saha S, Das S. Prediction of interactions between viral and host proteins using supervised machine learning methods. PLoS One 2014; 9:e112034. [PMID: 25375323 PMCID: PMC4223108 DOI: 10.1371/journal.pone.0112034] [Citation(s) in RCA: 56] [Impact Index Per Article: 5.1] [Received: 04/14/2014] [Accepted: 10/11/2014] [Indexed: 12/15/2022] Open
Abstract
BACKGROUND Viral-host protein-protein interaction plays a vital role in pathogenesis, since it defines viral infection of the host and regulation of the host proteins. Identification of key viral-host protein-protein interactions (PPIs) has great implications for therapeutics. METHODS In this study, a systematic attempt has been made to predict viral-host PPIs by integrating different features, including domain-domain association, network topology and sequence information, using viral-host PPIs from VirusMINT. Three well-known supervised machine learning methods that are commonly used in the prediction of PPIs (SVM, Naïve Bayes and Random Forest) were evaluated using five-fold cross-validation. RESULTS Out of 44 descriptors, the best features were found to be domain-domain association and the methionine, serine and valine amino acid composition of viral proteins. In this study, the SVM-based method achieved a better sensitivity (67%) than Naïve Bayes (37.49%) and Random Forest (55.66%). However, the specificity of Naïve Bayes was the highest (99.52%) as compared with SVM (74%) and Random Forest (89.08%). Overall, SVM and Random Forest achieved accuracies of 71% and 72.41%, respectively. The proposed SVM-based method was evaluated on a blind dataset and attained a sensitivity of 64%, specificity of 83%, and accuracy of 74%. In addition, unknown potential targets of hepatitis B virus-human and hepatitis E virus-human PPIs have been predicted through the proposed SVM model and validated by gene ontology enrichment analysis. Our proposed model shows that hepatitis B virus "C protein" binds to membrane docking protein, while "X protein" and "P protein" interact with cell-killing and metabolic process proteins, respectively. CONCLUSION The proposed method can predict large-scale interspecies viral-human PPIs. The nature and function of unknown viral proteins (HBV and HEV) and their interacting host protein partners were identified using the optimised SVM model.
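The sensitivity, specificity and accuracy values reported in this abstract all derive from the four cells of a binary confusion matrix. A minimal sketch of those definitions follows; the function name and the counts in the example are hypothetical, chosen for illustration only, and are not data from the study:

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Compute standard binary-classification metrics from raw counts.

    tp/fn count interacting pairs predicted correctly/incorrectly;
    tn/fp count non-interacting pairs predicted correctly/incorrectly.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative example")
    return {
        "sensitivity": tp / (tp + fn),  # true-positive rate (recall)
        "specificity": tn / (tn + fp),  # true-negative rate
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical counts for a balanced test set of 100 pairs.
metrics = binary_metrics(tp=40, fp=5, tn=45, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

On a balanced set, accuracy is the mean of sensitivity and specificity, which is why a classifier with very high specificity but low sensitivity (as reported for Naïve Bayes above) can still post a moderate overall accuracy.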
Affiliation(s)
- Ranjan Kumar Barman
- Biomedical Informatics Centre, National Institute of Cholera and Enteric Diseases, Kolkata, West Bengal, India
- Sudipto Saha
- Bioinformatics Centre, Bose Institute, Kolkata, West Bengal, India
- Santasabuj Das
- Biomedical Informatics Centre, National Institute of Cholera and Enteric Diseases, Kolkata, West Bengal, India
- Division of Clinical Medicine, National Institute of Cholera and Enteric Diseases, Kolkata, West Bengal, India
|
377
|
Potenza E, Di Domenico T, Walsh I, Tosatto SCE. MobiDB 2.0: an improved database of intrinsically disordered and mobile proteins. Nucleic Acids Res 2014; 43:D315-20. [PMID: 25361972 PMCID: PMC4384034 DOI: 10.1093/nar/gku982] [Citation(s) in RCA: 155] [Impact Index Per Article: 14.1] [Indexed: 02/07/2023] Open
Abstract
MobiDB (http://mobidb.bio.unipd.it/) is a database of intrinsically disordered and mobile proteins. Intrinsically disordered regions are key for the function of numerous proteins. Here we provide a new version of MobiDB, a centralized source aimed at providing the most complete picture on different flavors of disorder in protein structures covering all UniProt sequences (currently over 80 million). The database features three levels of annotation: manually curated, indirect and predicted. Manually curated data is extracted from the DisProt database. Indirect data is inferred from PDB structures that are considered an indication of intrinsic disorder. The 10 predictors currently included (three ESpritz flavors, two IUPred flavors, two DisEMBL flavors, GlobPlot, VSL2b and JRONN) enable MobiDB to provide disorder annotations for every protein in absence of more reliable data. The new version also features a consensus annotation and classification for long disordered regions. In order to complement the disorder annotations, MobiDB features additional annotations from external sources. Annotations from the UniProt database include post-translational modifications and linear motifs. Pfam annotations are displayed in graphical form and are link-enabled, allowing the user to visit the corresponding Pfam page for further information. Experimental protein–protein interactions from STRING are also classified for disorder content.
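The abstract mentions a consensus annotation built from several predictors but does not state the combination rule. The sketch below illustrates one simple possibility, a per-residue majority vote; this rule, the function name, the `D`/`S` encoding and the threshold are all assumptions made for illustration and should not be read as MobiDB's actual procedure:

```python
def consensus_disorder(predictions: list[str], min_fraction: float = 0.5) -> str:
    """Combine per-residue disorder calls by majority vote.

    Each string in `predictions` comes from one predictor and uses
    'D' for disordered and 'S' for structured residues. A residue is
    called disordered when MORE than `min_fraction` of predictors agree.
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must cover the same residues")
    n_predictors = len(predictions)
    consensus = []
    for column in zip(*predictions):  # one column per residue
        votes = sum(call == "D" for call in column)
        consensus.append("D" if votes / n_predictors > min_fraction else "S")
    return "".join(consensus)


# Three hypothetical predictors over a six-residue sequence.
calls = ["DDDSSS", "DDSSSD", "DSSSDD"]
print(consensus_disorder(calls))
```

Raising `min_fraction` trades coverage for confidence: fewer residues are annotated as disordered, but each call is backed by more predictors.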
Affiliation(s)
- Emilio Potenza
- Department of Biomedical Sciences, University of Padua, 35131 Padova, Italy
- Tomás Di Domenico
- Department of Biomedical Sciences, University of Padua, 35131 Padova, Italy
- Ian Walsh
- Department of Biomedical Sciences, University of Padua, 35131 Padova, Italy
- Silvio C E Tosatto
- Department of Biomedical Sciences, University of Padua, 35131 Padova, Italy
|
378
|
Structural protein reorganization and fold emergence investigated through amino acid sequence permutations. Amino Acids 2014; 47:147-52. [PMID: 25331423 DOI: 10.1007/s00726-014-1849-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/08/2014] [Accepted: 09/29/2014] [Indexed: 10/24/2022]
Abstract
Correlation between random amino acid sequences and protein folds suggests that proteins autonomously evolved the most stable folds, with stability and function evolving subsequently, suggesting the existence of common protein ancestors from which all modern proteins evolved. To test this hypothesis, we shuffled the sequences of 10 natural proteins and obtained 40 different and apparently unrelated folds. Our results suggest that shuffled sequences are sufficiently stable and may act as a basis to evolve functional proteins. The common secondary structure of modern proteins is well represented by a small set of permuted sequences, which also show the emergence of intrinsic disorder and aggregation-prone stretches of the polypeptide chain.
|
379
|
Walsh I, Giollo M, Di Domenico T, Ferrari C, Zimmermann O, Tosatto SCE. Comprehensive large-scale assessment of intrinsic protein disorder. Bioinformatics 2014; 31:201-8. [PMID: 25246432 DOI: 10.1093/bioinformatics/btu625] [Citation(s) in RCA: 128] [Impact Index Per Article: 11.6] [Indexed: 12/23/2022]
Abstract
MOTIVATION Intrinsically disordered regions are key for the function of numerous proteins. Due to the difficulties in experimental disorder characterization, many computational predictors have been developed with various disorder flavors. Their performance is generally measured on small sets mainly from experimentally solved structures, e.g. Protein Data Bank (PDB) chains. MobiDB has only recently started to collect disorder annotations from multiple experimental structures. RESULTS MobiDB annotates disorder for UniProt sequences, allowing us to conduct the first large-scale assessment of fast disorder predictors on 25 833 different sequences with X-ray crystallographic structures. In addition to a comprehensive ranking of predictors, this analysis produced the following interesting observations. (i) The predictors cluster according to their disorder definition, with a consensus giving more confidence. (ii) Previous assessments appear over-reliant on data annotated at the PDB chain level and performance is lower on entire UniProt sequences. (iii) Long disordered regions are harder to predict. (iv) Depending on the structural and functional types of the proteins, differences in prediction performance of up to 10% are observed. AVAILABILITY The datasets are available from Web site at URL: http://mobidb.bio.unipd.it/lsd. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ian Walsh
- Department of Biomedical Sciences, Department of Information Engineering, University of Padua, Via Gradenigo 6, 35121 Padova, Italy and Institute for Advanced Simulation, Forschungszentrum Juelich, Wilhelm-Johnen-Str., 52425 Juelich, Germany
- Manuel Giollo
- Department of Biomedical Sciences, Department of Information Engineering, University of Padua, Via Gradenigo 6, 35121 Padova, Italy and Institute for Advanced Simulation, Forschungszentrum Juelich, Wilhelm-Johnen-Str., 52425 Juelich, Germany
- Tomás Di Domenico
- Department of Biomedical Sciences, Department of Information Engineering, University of Padua, Via Gradenigo 6, 35121 Padova, Italy and Institute for Advanced Simulation, Forschungszentrum Juelich, Wilhelm-Johnen-Str., 52425 Juelich, Germany
- Carlo Ferrari
- Department of Biomedical Sciences, Department of Information Engineering, University of Padua, Via Gradenigo 6, 35121 Padova, Italy and Institute for Advanced Simulation, Forschungszentrum Juelich, Wilhelm-Johnen-Str., 52425 Juelich, Germany
- Olav Zimmermann
- Department of Biomedical Sciences, Department of Information Engineering, University of Padua, Via Gradenigo 6, 35121 Padova, Italy and Institute for Advanced Simulation, Forschungszentrum Juelich, Wilhelm-Johnen-Str., 52425 Juelich, Germany
- Silvio C E Tosatto
- Department of Biomedical Sciences, Department of Information Engineering, University of Padua, Via Gradenigo 6, 35121 Padova, Italy and Institute for Advanced Simulation, Forschungszentrum Juelich, Wilhelm-Johnen-Str., 52425 Juelich, Germany
|
380
|
Improved prediction of residue flexibility by embedding optimized amino acid grouping into RSA-based linear models. Amino Acids 2014; 46:2665-80. [DOI: 10.1007/s00726-014-1817-9] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 04/16/2014] [Accepted: 07/21/2014] [Indexed: 11/26/2022]
|
381
|
Cilia E, Pancsa R, Tompa P, Lenaerts T, Vranken WF. From protein sequence to dynamics and disorder with DynaMine. Nat Commun 2014; 4:2741. [PMID: 24225580 DOI: 10.1038/ncomms3741] [Citation(s) in RCA: 118] [Impact Index Per Article: 10.7] [Received: 05/13/2013] [Accepted: 10/10/2013] [Indexed: 11/09/2022] Open
Abstract
Protein function and dynamics are closely related; however, accurate dynamics information is difficult to obtain. Here based on a carefully assembled data set derived from experimental data for proteins in solution, we quantify backbone dynamics properties on the amino-acid level and develop DynaMine--a fast, high-quality predictor of protein backbone dynamics. DynaMine uses only protein sequence information as input and shows great potential in distinguishing regions of different structural organization, such as folded domains, disordered linkers, molten globules and pre-structured binding motifs of different sizes. It also identifies disordered regions within proteins with an accuracy comparable to the most sophisticated existing predictors, without depending on prior disorder knowledge or three-dimensional structural information. DynaMine provides molecular biologists with an important new method that grasps the dynamical characteristics of any protein of interest, as we show here for human p53 and E1A from human adenovirus 5.
Affiliation(s)
- Elisa Cilia
- [1] MLG, Département d'Informatique, Université Libre de Bruxelles, Boulevard du Triomphe, CP 212, 1050 Brussels, Belgium [2] Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, La Plaine Campus, Triomflaan, BC building, 6th floor, CP 263, 1050 Brussels, Belgium
|
382
|
Exceptionally abundant exceptions: comprehensive characterization of intrinsic disorder in all domains of life. Cell Mol Life Sci 2014. [PMID: 24939692 DOI: 10.1007/s00018-014-1661-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 01/25/2023]
Abstract
Recent years witnessed increased interest in intrinsically disordered proteins and regions. These proteins and regions are abundant and possess unique structural features and a broad functional repertoire that complements ordered proteins. However, modern studies on the abundance and functions of intrinsically disordered proteins and regions are relatively limited in size and scope of their analysis. To fill this gap, we performed a broad and detailed computational analysis of over 6 million proteins from 59 archaea, 471 bacterial, 110 eukaryotic and 325 viral proteomes. We used arguably more accurate consensus-based disorder predictions, and for the first time comprehensively characterized intrinsic disorder at proteomic and protein levels from all significant perspectives, including abundance, cellular localization, functional roles, evolution, and impact on structural coverage. We show that intrinsic disorder is more abundant and has a unique profile in eukaryotes. We map disorder into archaea, bacterial and eukaryotic cells, and demonstrate that it is preferentially located in some cellular compartments. Functional analysis that considers over 1,200 annotations shows that certain functions are exclusively implemented by intrinsically disordered proteins and regions, and that some of them are specific to certain domains of life. We reveal that disordered regions are often targets for various post-translational modifications, but primarily in the eukaryotes and viruses. Using a phylogenetic tree for 14 eukaryotic and 112 bacterial species, we analyzed relations between disorder, sequence conservation and evolutionary speed. We provide a complete analysis that clearly shows that intrinsic disorder is exceptionally and uniquely abundant in each domain of life.
|
383
|
Walsh I, Seno F, Tosatto SCE, Trovato A. PASTA 2.0: an improved server for protein aggregation prediction. Nucleic Acids Res 2014; 42:W301-7. [PMID: 24848016 PMCID: PMC4086119 DOI: 10.1093/nar/gku399] [Citation(s) in RCA: 346] [Impact Index Per Article: 31.5] [Indexed: 01/16/2023] Open
Abstract
The formation of amyloid aggregates upon protein misfolding is related to several devastating degenerative diseases. The propensities of different protein sequences to aggregate into amyloids, how they are enhanced by pathogenic mutations, the presence of aggregation hot spots stabilizing pathological interactions, the establishing of cross-amyloid interactions between co-aggregating proteins, all rely at the molecular level on the stability of the amyloid cross-beta structure. Our redesigned server, PASTA 2.0, provides a versatile platform where all of these different features can be easily predicted on a genomic scale given input sequences. The server provides other pieces of information, such as intrinsic disorder and secondary structure predictions, that complement the aggregation data. The PASTA 2.0 energy function evaluates the stability of putative cross-beta pairings between different sequence stretches. It was re-derived on a larger dataset of globular protein domains. The resulting algorithm was benchmarked on comprehensive peptide and protein test sets, leading to improved, state-of-the-art results with more amyloid forming regions correctly detected at high specificity. The PASTA 2.0 server can be accessed at http://protein.bio.unipd.it/pasta2/.
Affiliation(s)
- Ian Walsh
- Department of Biomedical Sciences, University of Padova, Padova I-35131, Italy
- Flavio Seno
- INFN, Padova Section, and Department of Physics and Astronomy 'G. Galilei', University of Padova, Padova I-35121, Italy
- Silvio C E Tosatto
- Department of Biomedical Sciences, University of Padova, Padova I-35131, Italy
- Antonio Trovato
- INFN, Padova Section, and Department of Physics and Astronomy 'G. Galilei', University of Padova, Padova I-35121, Italy
|
384
|
Giollo M, Martin AJM, Walsh I, Ferrari C, Tosatto SCE. NeEMO: a method using residue interaction networks to improve prediction of protein stability upon mutation. BMC Genomics 2014; 15 Suppl 4:S7. [PMID: 25057121 PMCID: PMC4083412 DOI: 10.1186/1471-2164-15-s4-s7] [Citation(s) in RCA: 84] [Impact Index Per Article: 7.6] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The rapid growth of un-annotated missense variants poses challenges requiring novel strategies for their interpretation. From the thermodynamic point of view, amino acid changes can lead to a change in the internal energy of a protein and induce structural rearrangements. This is of great relevance for the study of diseases and protein design, justifying the development of prediction methods for variant-induced stability changes. RESULTS Here we propose NeEMO, a tool for the evaluation of stability changes using an effective representation of proteins based on residue interaction networks (RINs). RINs are used to extract useful features describing interactions of the mutant amino acid with its structural environment. Benchmarking shows NeEMO to be very effective, allowing reliable predictions in different parts of the protein such as β-strands and buried residues. Validation on a previously published independent dataset shows that NeEMO has a Pearson correlation coefficient of 0.77 and a standard error of 1 kcal/mol, outperforming nine recent methods. The NeEMO web server is freely accessible at http://protein.bio.unipd.it/neemo/. CONCLUSIONS NeEMO offers an innovative and reliable tool for the annotation of amino acid changes. A key contribution is the RIN representation, which can be used for modeling proteins and their interactions effectively. Interestingly, the approach is very general, and can motivate the development of a new family of RIN-based protein structure analyzers. NeEMO may suggest innovative strategies for bioinformatics tools beyond protein stability prediction.
|
385
|
Cilia E, Pancsa R, Tompa P, Lenaerts T, Vranken WF. The DynaMine webserver: predicting protein dynamics from sequence. Nucleic Acids Res 2014; 42:W264-70. [PMID: 24728994 PMCID: PMC4086073 DOI: 10.1093/nar/gku270] [Citation(s) in RCA: 120] [Impact Index Per Article: 10.9] [Indexed: 12/12/2022] Open
Abstract
Protein dynamics are important for understanding protein function. Unfortunately, accurate protein dynamics information is difficult to obtain: here we present the DynaMine webserver, which provides predictions for the fast backbone movements of proteins directly from their amino-acid sequence. DynaMine rapidly produces a profile describing the statistical potential for such movements at residue-level resolution. The predicted values have meaning on an absolute scale and go beyond the traditional binary classification of residues as ordered or disordered, thus allowing for direct dynamics comparisons between protein regions. Through this webserver, we provide molecular biologists with an efficient and easy to use tool for predicting the dynamical characteristics of any protein of interest, even in the absence of experimental observations. The prediction results are visualized and can be directly downloaded. The DynaMine webserver, including instructive examples describing the meaning of the profiles, is available at http://dynamine.ibsquare.be.
Affiliation(s)
- Elisa Cilia
- MLG, Computer Science Department, Université Libre de Bruxelles (ULB), Brussels, Belgium Interuniversity Institute of Bioinformatics in Brussels (IB), ULB-VUB, Brussels, Belgium
- Rita Pancsa
- Structural Biology Brussels, Vrije Universiteit Brussel (VUB), Brussels, Belgium Department of Structural Biology, VIB, Brussels, Belgium
- Peter Tompa
- Interuniversity Institute of Bioinformatics in Brussels (IB), ULB-VUB, Brussels, Belgium Structural Biology Brussels, Vrije Universiteit Brussel (VUB), Brussels, Belgium Department of Structural Biology, VIB, Brussels, Belgium Institute of Enzymology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Budapest, Hungary
- Tom Lenaerts
- MLG, Computer Science Department, Université Libre de Bruxelles (ULB), Brussels, Belgium Interuniversity Institute of Bioinformatics in Brussels (IB), ULB-VUB, Brussels, Belgium AI-Lab, Computer Science Department, Vrije Universiteit Brussel, Brussels, Belgium
- Wim F Vranken
- Interuniversity Institute of Bioinformatics in Brussels (IB), ULB-VUB, Brussels, Belgium Structural Biology Brussels, Vrije Universiteit Brussel (VUB), Brussels, Belgium Department of Structural Biology, VIB, Brussels, Belgium
|
386
|
Walsh I, Di Domenico T, Tosatto SCE. RUBI: rapid proteomic-scale prediction of lysine ubiquitination and factors influencing predictor performance. Amino Acids 2013; 46:853-62. [PMID: 24363213 DOI: 10.1007/s00726-013-1645-3] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.8] [Received: 09/07/2013] [Accepted: 12/11/2013] [Indexed: 11/25/2022]
Abstract
Post-translational modification of protein lysines was recently shown to be a common feature of eukaryotic organisms. The ubiquitin modification is regarded as a versatile regulatory mechanism with many important cellular roles. Large-scale datasets are becoming available for H. sapiens ubiquitination. However, using current experimental techniques the vast majority of their sites remain unidentified and in silico tools may offer an alternative. Here, we introduce Rapid UBIquitination (RUBI) a sequence-based ubiquitination predictor designed for rapid application on a genome scale. RUBI was constructed using an iterative approach. At each iteration, important factors which influenced performance and its usability were investigated. The final RUBI model has an AUC of 0.868 on a large cross-validation set and is shown to outperform other available methods on independent sets. Predicted intrinsic disorder is shown to be weakly anti-correlated to ubiquitination for the H. sapiens dataset and improves performance slightly. RUBI predicts the number of ubiquitination sites correctly within three sites for ca. 80% of the tested proteins. The average potentially ubiquitinated proteome fraction is predicted to be at least 25% across a variety of model organisms, including several thousand possible H. sapiens proteins awaiting experimental characterization. RUBI can accurately predict ubiquitination on unseen examples and has a signal across different eukaryotic organisms. The factors which influenced the construction of RUBI could also be tested in other post-translational modification predictors. One of the more interesting factors is the influence of intrinsic protein disorder on ubiquitinated lysines where residues with low disorder probability are preferred.
Affiliation(s)
- Ian Walsh
- Department of Biology, University of Padua, Viale G. Colombo 3, 35131, Padua, Italy
|
387
|
Mao W, Cong P, Wang Z, Lu L, Zhu Z, Li T. NMRDSP: an accurate prediction of protein shape strings from NMR chemical shifts and sequence data. PLoS One 2013; 8:e83532. [PMID: 24376713 PMCID: PMC3871590 DOI: 10.1371/journal.pone.0083532] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 07/27/2013] [Accepted: 11/04/2013] [Indexed: 11/28/2022] Open
Abstract
A shape string is a structural sequence and an important representation of protein backbone conformations. Nuclear magnetic resonance chemical shifts correlate strongly with local protein structure and are exploited, in conjunction with computational approaches, to predict protein structures. Here we demonstrate a novel approach, NMRDSP, which can accurately predict the protein shape string based on nuclear magnetic resonance chemical shifts and structural profiles obtained from sequence data. NMRDSP uses six chemical shifts (HA, H, N, CA, CB and C) and eight elements of structure profiles as features, a non-redundant set (1,003 entries) as the training set, and a conditional random field as the classification algorithm. For an independent testing set (203 entries), we achieved an accuracy of 75.8% for S8 (the eight-state accuracy) and 87.8% for S3 (the three-state accuracy). This is higher than when using only chemical shifts or only sequence data, and confirms that the chemical shift and the structure profile are significant features for shape string prediction and that their combination markedly improves the accuracy of the predictor. We have constructed the NMRDSP web server and believe it could provide a solid platform for predicting other protein structures and functions. The NMRDSP web server is freely available at http://cal.tongji.edu.cn/NMRDSP/index.jsp.
Affiliation(s)
- Wusong Mao
- Department of Chemistry, Tongji University, Shanghai, China
- Peisheng Cong
- Department of Chemistry, Tongji University, Shanghai, China
- * E-mail: (PC); (TL)
- Zhiheng Wang
- Department of Chemistry, Tongji University, Shanghai, China
- Longjian Lu
- Department of Chemistry, Tongji University, Shanghai, China
- Zhongliang Zhu
- Department of Chemistry, Tongji University, Shanghai, China
- Tonghua Li
- Department of Chemistry, Tongji University, Shanghai, China
- * E-mail: (PC); (TL)
|
388
|
Becker J, Maes F, Wehenkel L. On the encoding of proteins for disordered regions prediction. PLoS One 2013; 8:e82252. [PMID: 24358161 PMCID: PMC3864923 DOI: 10.1371/journal.pone.0082252] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 07/11/2013] [Accepted: 10/21/2013] [Indexed: 12/02/2022] Open
Abstract
Disordered regions, i.e., regions of proteins that do not adopt a stable three-dimensional structure, have been shown to play various and critical roles in many biological processes. Predicting and understanding their formation is therefore a key sub-problem of protein structure and function inference. A wide range of machine learning approaches have been developed to automatically predict disordered regions of proteins. One key factor of the success of these methods is the way in which protein information is encoded into features. Recently, we have proposed a systematic methodology to study the relevance of various feature encodings in the context of disulfide connectivity pattern prediction. In the present paper, we adapt this methodology to the problem of predicting disordered regions and assess it on proteins from the 10th CASP competition, as well as on a very large subset of proteins extracted from PDB. Our results, obtained with ensembles of extremely randomized trees, highlight a novel feature function encoding the proximity of residues according to their accessibility to the solvent, which is playing the second most important role in the prediction of disordered regions, just after evolutionary information. Furthermore, even though our approach treats each residue independently, our results are very competitive in terms of accuracy with respect to the state-of-the-art. A web-application is available at http://m24.giga.ulg.ac.be:81/x3Disorder.
Affiliation(s)
- Julien Becker
- Bioinformatics and Modeling, GIGA-Research, University of Liege, Liege, Belgium
- Francis Maes
- Department of Electrical Engineering and Computer Science, Montefiore Institute, University of Liege, Liege, Belgium
- Declaratieve Talen en Artificiele Intelligentie, Departement Computerwetenschappen, University of Leuven, Leuven, Belgium
- Louis Wehenkel
- Department of Electrical Engineering and Computer Science, Montefiore Institute, University of Liege, Liege, Belgium
- * E-mail:
|
389
|
Mahani A, Henriksson J, Wright APH. Origins of Myc proteins--using intrinsic protein disorder to trace distant relatives. PLoS One 2013; 8:e75057. [PMID: 24086436 PMCID: PMC3782479 DOI: 10.1371/journal.pone.0075057] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.8] [Received: 01/21/2013] [Accepted: 08/09/2013] [Indexed: 01/10/2023] Open
Abstract
Mammalian Myc proteins are important determinants of cell proliferation as well as the undifferentiated state of stem cells and their activity is frequently deregulated in cancer. Based mainly on conservation in the C-terminal DNA-binding and dimerization domain, Myc-like proteins have been reported in many simpler organisms within and outside the Metazoa but they have not been found in fungi or plants. Several important signature motifs defining mammalian Myc proteins are found in the N-terminal domain but the extent to which these are found in the Myc-like proteins from simpler organisms is not well established. The extent of N-terminal signature sequence conservation would give important insights about the evolution of Myc proteins and their current function in mammalian physiology and disease. In a systematic study of Myc-like proteins we show that N-terminal signature motifs are not readily detectable in individual Myc-like proteins from invertebrates but that weak similarities to Myc boxes 1 and 2 can be found in the N-termini of the simplest Metazoa as well as the unicellular choanoflagellate, Monosiga brevicollis, using multiple protein alignments. Phylogenetic support for the connections of these proteins to established Myc proteins is however poor. We show that the pattern of predicted protein disorder along the length of Myc proteins can be used as a complementary approach to making dendrograms of Myc proteins that aids the classification of Myc proteins. This suggests that the pattern of disorder within Myc proteins is more conserved through evolution than their amino acid sequence. In the disorder-based dendrograms the Myc-like proteins from simpler organisms, including M. brevicollis, are connected to established Myc proteins with a higher degree of certainty. 
Our results suggest that protein disorder based dendrograms may be of general significance for studying distant relationships between proteins, such as transcription factors, that have high levels of intrinsic disorder.
Affiliation(s)
- Amir Mahani
- Department of Laboratory Medicine and Center for Biosciences, Karolinska Institute, Huddinge, Sweden
- Johan Henriksson
- Department of Laboratory Medicine and Center for Biosciences, Karolinska Institute, Huddinge, Sweden
- Anthony P. H. Wright
- Department of Laboratory Medicine and Center for Biosciences, Karolinska Institute, Huddinge, Sweden
- * E-mail:
|
390
|
Peng Z, Mizianty MJ, Kurgan L. Genome-scale prediction of proteins with long intrinsically disordered regions. Proteins 2013; 82:145-58. [PMID: 23798504 DOI: 10.1002/prot.24348] [Citation(s) in RCA: 86] [Impact Index Per Article: 7.2] [Received: 05/02/2013] [Accepted: 06/06/2013] [Indexed: 12/24/2022]
Abstract
Proteins with long disordered regions (LDRs), defined as having 30 or more consecutive disordered residues, are abundant in eukaryotes, and these regions are recognized as a distinct class of biologically functional domains. LDRs facilitate various cellular functions and are important for target selection in structural genomics. Motivated by the lack of methods that directly predict proteins with LDRs, we designed Super-fast predictor of proteins with Long Intrinsically DisordERed regions (SLIDER). SLIDER utilizes logistic regression that takes an empirically chosen set of numerical features, which consider selected physicochemical properties of amino acids, sequence complexity, and amino acid composition, as its inputs. Empirical tests show that SLIDER offers competitive predictive performance combined with low computational cost. It outperforms, by at least a modest margin, a comprehensive set of modern disorder predictors (that can indirectly predict LDRs) and is 16 times faster than the best currently available disorder predictor. Utilizing our time-efficient predictor, we characterized abundance and functional roles of proteins with LDRs over 110 eukaryotic proteomes. Similar to related studies, we found that eukaryotes have many (on average 30.3%) proteins with LDRs, with the majority of proteomes having between 25 and 40%, where higher abundance is characteristic of proteomes that have larger proteins. Our first-of-its-kind large-scale functional analysis shows that these proteins are enriched in a number of cellular functions and processes including certain binding events, regulation of catalytic activities, cellular component organization, biogenesis, biological regulation, and some metabolic and developmental processes. A webserver that implements SLIDER is available at http://biomine.ece.ualberta.ca/SLIDER/.
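The LDR definition in this abstract (30 or more consecutive disordered residues) is easy to apply to an existing per-residue annotation. The sketch below does only that; it is not SLIDER, which predicts LDR-containing proteins directly from sequence features. The function names and the 'D'/'O' string encoding are assumptions for illustration.

```python
from itertools import groupby

LDR_MIN_LENGTH = 30  # threshold stated in the abstract


def longest_disordered_run(disorder: str, disordered_symbol: str = "D") -> int:
    """Length of the longest stretch of consecutive disordered residues."""
    return max(
        (sum(1 for _ in run)
         for symbol, run in groupby(disorder)
         if symbol == disordered_symbol),
        default=0,  # no disordered residues at all
    )


def has_ldr(disorder: str, min_length: int = LDR_MIN_LENGTH) -> bool:
    """True if the annotation contains at least one long disordered region."""
    return longest_disordered_run(disorder) >= min_length


# Hypothetical annotations: one 30-residue run versus two 29-residue runs.
print(has_ldr("O" * 10 + "D" * 30 + "O" * 5))   # True
print(has_ldr("D" * 29 + "O" + "D" * 29))       # False
```

The second example shows why total disorder content is not a substitute for the LDR criterion: 58 of 59 residues are disordered, yet no single run reaches 30.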
Affiliation(s)
- Zhenling Peng
- Department of Electrical and Computer Engineering, University of Alberta, Edmonton, Canada
|
391
|
Jin F, Liu Z. Inherent relationships among different biophysical prediction methods for intrinsically disordered proteins. Biophys J 2013; 104:488-95. [PMID: 23442871 DOI: 10.1016/j.bpj.2012.12.012] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Received: 09/21/2012] [Revised: 12/08/2012] [Accepted: 12/10/2012] [Indexed: 11/17/2022] Open
Abstract
Intrinsically disordered proteins do not have stable secondary and/or tertiary structures but still function. More than 50 prediction methods have been developed and inherent relationships may be expected to exist among them. To investigate this, we conducted molecular simulations and algorithmic analyses on a minimal coarse-grained polypeptide model and discovered a common basis for the charge-hydropathy plot and packing-density algorithms that was verified by correlation analysis. The correlation analysis approach was applied to realistic datasets, which revealed correlations among some physical-chemical properties (charge-hydropathy plot, packing density, pairwise energy). The correlations indicated that these biophysical methods find a projected direction to discriminate ordered and disordered proteins. The optimized projection was determined and the ultimate accuracy limit of the existing algorithms is discussed.
Affiliation(s)
- Fan Jin
- College of Chemistry and Molecular Engineering, Center for Quantitative Biology, and Beijing National Laboratory for Molecular Sciences, Peking University, Beijing, China
|
392
|
Wei Q, Dunbrack RL. The role of balanced training and testing data sets for binary classifiers in bioinformatics. PLoS One 2013; 8:e67863. [PMID: 23874456 PMCID: PMC3706434 DOI: 10.1371/journal.pone.0067863] [Citation(s) in RCA: 149] [Impact Index Per Article: 12.4] [Received: 11/10/2012] [Accepted: 05/23/2013] [Indexed: 12/03/2022] Open
Abstract
Training and testing of conventional machine learning models on binary classification problems depend on the proportions of the two outcomes in the relevant data sets. This may be especially important in practical terms when real-world applications of the classifier are either highly imbalanced or occur in unknown proportions. Intuitively, it may seem sensible to train machine learning models on data similar to the target data in terms of proportions of the two binary outcomes. However, we show that this is not the case using the example of prediction of deleterious and neutral phenotypes of human missense mutations in human genome data, for which the proportion of the binary outcome is unknown. Our results indicate that using balanced training data (50% neutral and 50% deleterious) results in the highest balanced accuracy (the average of True Positive Rate and True Negative Rate), Matthews correlation coefficient, and area under ROC curves, no matter what the proportions of the two phenotypes are in the testing data. Besides balancing the data by undersampling the majority class, other techniques in machine learning include oversampling the minority class, interpolating minority-class data points and various penalties for misclassifying the minority class. However, these techniques are not commonly used in either the missense phenotype prediction problem or in the prediction of disordered residues in proteins, where the imbalance problem is substantial. The appropriate approach depends on the amount of available data and the specific problem at hand.
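Two of the measures named in this abstract can be computed directly from a confusion matrix. The sketch below uses the abstract's definition of balanced accuracy (the average of true positive rate and true negative rate) and the standard formula for the Matthews correlation coefficient. The function names and the example counts are made up for illustration.

```python
import math


def balanced_accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Average of true positive rate and true negative rate."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in the test data")
    tpr = tp / (tp + fn)
    tnr = tn / (tn + fp)
    return (tpr + tnr) / 2


def matthews_corrcoef(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews correlation coefficient; 0.0 when the denominator vanishes."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Hypothetical imbalanced test set: 10 positives, 90 negatives.
# A classifier that labels everything negative is 90% accurate,
# but both measures expose it as uninformative.
print(balanced_accuracy(tp=0, fp=0, tn=90, fn=10))  # 0.5
print(matthews_corrcoef(tp=0, fp=0, tn=90, fn=10))  # 0.0
```

This is why such measures are preferred over plain accuracy when the two classes occur in very different proportions, as with disordered versus ordered residues.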
Affiliation(s)
- Qiong Wei
- Institute for Cancer Research, Fox Chase Cancer Center, Philadelphia, Pennsylvania, United States of America
- Roland L. Dunbrack
- Institute for Cancer Research, Fox Chase Cancer Center, Philadelphia, Pennsylvania, United States of America
|
393
|
Yan J, Mizianty MJ, Filipow PL, Uversky VN, Kurgan L. RAPID: fast and accurate sequence-based prediction of intrinsic disorder content on proteomic scale. Biochim Biophys Acta 2013; 1834:1671-80. [PMID: 23732563 DOI: 10.1016/j.bbapap.2013.05.022] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.9] [Received: 03/25/2013] [Accepted: 05/22/2013] [Indexed: 11/24/2022]
Abstract
Recent research on protein intrinsic disorder was stimulated by the availability of accurate computational predictors. However, most of these methods are relatively slow, especially considering proteome-scale applications, and were shown to produce relatively large errors when estimating disorder at the protein- (in contrast to residue-) level, which is defined by the fraction/content of disordered residues. To this end, we propose a novel support vector Regression-based Accurate Predictor of Intrinsic Disorder (RAPID). Key advantages of RAPID are speed (prediction of an average-size eukaryotic proteome takes <1 h on a modern desktop computer); sophisticated design (multiple, complementary information sources that are aggregated over an input chain are combined using feature selection); and high-quality and robust predictive performance. Empirical tests on two diverse benchmark datasets reveal that RAPID's predictive performance compares favorably to a comprehensive set of state-of-the-art disorder and disorder content predictors. Drawing on high speed and good predictive quality, RAPID was used to perform large-scale characterization of disorder in 200+ fully sequenced eukaryotic proteomes. Our analysis reveals interesting relations of disorder with structural coverage and chain length, and an unusual distribution of fully disordered chains. We also performed a comprehensive (using 56000+ annotated chains, which doubles the scope of previous studies) investigation of cellular functions and localizations that are enriched in disorder in the human proteome. RAPID, which allows for batch (proteome-wide) predictions, is available as a web server at http://biomine.ece.ualberta.ca/RAPID/.
Affiliation(s)
- Jing Yan
- Department of Electrical and Computer Engineering, University of Alberta, Edmonton, Canada
|
394
|
Di Domenico T, Walsh I, Tosatto SCE. Analysis and consensus of currently available intrinsic protein disorder annotation sources in the MobiDB database. BMC Bioinformatics 2013; 14 Suppl 7:S3. [PMID: 23815411 PMCID: PMC3633070 DOI: 10.1186/1471-2105-14-s7-s3] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.4] [Indexed: 12/12/2022] Open
Abstract
BACKGROUND Intrinsic protein disorder is becoming an increasingly important topic in protein science. During the last few years, intrinsically disordered proteins (IDPs) have been shown to play a role in many important biological processes, e.g. protein signalling and regulation. This has sparked a need to better understand and characterize different types of IDPs, their functions and roles. Our recently published database, MobiDB, provides a centralized resource for accessing and analysing intrinsic protein disorder annotations. RESULTS Here, we present a thorough description and analysis of the data made available by MobiDB, providing descriptive statistics on the various available annotation sources. Version 1.2.1 of the database contains annotations for ca. 4,500,000 UniProt sequences, covering all eukaryotic proteomes. In addition, we describe a novel consensus annotation calculation and its related weighting scheme. The comparison between disorder information sources highlights how the MobiDB consensus captures the main features of intrinsic disorder and correlates well with manually curated datasets. Finally, we demonstrate the annotation of 13 eukaryotic model organisms through MobiDB's datasets, and of an example protein through the interactive user interface. CONCLUSIONS MobiDB is a central resource for intrinsic disorder research, containing both experimental data and predictions. In the future it will be expanded to include additional information for all known proteins.
Affiliation(s)
- Tomás Di Domenico
- Department of Biology, University of Padova, Viale G. Colombo 3, 35131 Padova, Italy
|
395
|
Deiana A, Giansanti A. Tuning the precision of predictors to reduce overestimation of protein disorder over large datasets. J Bioinform Comput Biol 2013; 11:1250023. [DOI: 10.1142/s0219720012500230] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 01/10/2023]
Abstract
This is a study on the precision of four known protein disorder predictors, ranked among the best-performing ones: DISOPRED2, PONDR VSL2B, IUPred and ESpritz. We address here the problem of a systematic overestimation of the number of disordered proteins recognized through the use of these predictors, considered as a standard. Some of these predictors, used with their default settings, have a low precision, implying a tendency to overestimate the occurrence of disordered proteins in genome-wide surveys. Moreover, different predictors often disagree on the evaluation of individual proteins. To cope with this problem, and in order to propose a simple procedure that enhances precision based on precision-recall curves, we re-tuned the discriminative thresholds of the predictors by training and cross-validating their performance on a curated dataset. After re-tuning, both the disagreement among predictors and the tendency to overestimate the occurrence of disordered proteins are reduced. This is shown in a dedicated study over the human proteome and a set of cancer-related human proteins, with no a priori disorder annotation. Simple quantitative estimates suggest that the occurrence of disorder among cancer-related proteins and other similar large-scale surveys has been overestimated in the past.
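The threshold re-tuning idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' procedure: the function names, the toy scores and labels, and the target precision of 0.75 are all hypothetical, and cross-validation is omitted. It only shows how a decision threshold could be picked from precision-recall values so that a required precision is reached.

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall when scores >= threshold are called disordered."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def tune_threshold(scores, labels, min_precision):
    """Return (threshold, precision, recall) for the lowest candidate
    threshold whose precision reaches min_precision, or None."""
    for t in sorted(set(scores)):
        p, r = precision_recall(scores, labels, t)
        if p >= min_precision:
            return t, p, r
    return None


# Hypothetical predictor scores and reference labels (1 = disordered).
scores = [0.1, 0.3, 0.4, 0.55, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 1, 0, 1, 0, 1, 1]

result = tune_threshold(scores, labels, min_precision=0.75)
print(result)
```

Scanning thresholds from low to high keeps as much recall as possible while still meeting the precision target, which is one simple way to trade recall for precision.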
Affiliation(s)
- Antonio Deiana
- Physics Department, Sapienza University of Rome, Rome, Italy
- Andrea Giansanti
- Physics Department, Sapienza University of Rome, Rome, Italy
- INFN, Sezione di Roma1, Roma, 00185, Italy
|
396
|
Xue B, Uversky VN. Structural characterizations of phosphorylatable residues in transmembrane proteins from Arabidopsis thaliana. Intrinsically Disordered Proteins 2013; 1:e25713. [PMID: 28516016 PMCID: PMC5424800 DOI: 10.4161/idp.25713] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 04/26/2013] [Revised: 06/28/2013] [Accepted: 07/10/2013] [Indexed: 12/26/2022]
Abstract
Phosphorylation is a common post-translational modification that plays important roles in a wide range of biochemical and cellular processes. Many enzymes and receptors can be switched “on” or “off” by conformational changes induced by phosphorylation. The phosphorylation process is mediated by a family of enzymes called kinases. Currently, more than 1,000 different kinases have been identified in the Arabidopsis thaliana proteome. Kinases interact with each other and with many regulatory proteins, forming phosphorylation networks. These phosphorylation networks modulate the signaling processes and control the functions of cells. Normally, kinases phosphorylate serines, threonines, and tyrosines. However, in many proteins, not all of these 3 types of amino acids can be phosphorylated. Therefore, identifying the phosphorylation sites and the possible phosphorylation events is very important in decoding the processes of regulation and the function of phosphorylation networks. In this study, we applied computational and bioinformatics tools to characterize the association between phosphorylation events and structural properties of corresponding proteins by analyzing more than 50 transmembrane proteins from Arabidopsis thaliana. In addition to the previously established conclusion that phosphorylation sites are closely associated with intrinsic disorder, we found that the phosphorylation process may also be affected by solvent accessibility of phosphorylation sites and further promoted by neighboring modification events.
Affiliation(s)
- Bin Xue
- Department of Molecular Medicine; Morsani College of Medicine; University of South Florida; Tampa, FL USA
- Vladimir N Uversky
- Department of Molecular Medicine; Morsani College of Medicine; University of South Florida; Tampa, FL USA
- USF Health Byrd Alzheimer's Research Institute; Morsani College of Medicine; University of South Florida; Tampa, FL USA
- Institute for Biological Instrumentation; Russian Academy of Sciences; Moscow Region, Russia
|
397
|
Mizianty MJ, Peng Z, Kurgan L. MFDp2: Accurate predictor of disorder in proteins by fusion of disorder probabilities, content and profiles. Intrinsically Disordered Proteins 2013; 1:e24428. [PMID: 28516009 PMCID: PMC5424793 DOI: 10.4161/idp.24428] [Citation(s) in RCA: 79] [Impact Index Per Article: 6.6] [Received: 03/18/2013] [Accepted: 03/23/2013] [Indexed: 11/28/2022]
Abstract
Intrinsically disordered proteins (IDPs) are either entirely disordered or contain disordered regions in their native state. IDPs were found to be abundant in complex organisms and implicated in numerous cellular processes. Experimental annotation of disorder lags behind the rapidly growing sizes of the protein databases, and thus computational methods are used to close this gap and to investigate the disorder. MFDp2 is a novel content-rich and user-friendly web server for sequence-based prediction of protein disorder that builds upon our residue-level disorder predictor MFDp and chain-level disorder content predictor DisCon. It applies novel post-processing filters and uses sequence alignment to improve predictive quality. Using a new benchmark data set, which has reduced sequence identity to corresponding training data sets, MFDp2 is shown to provide competitive predictive quality when compared with MFDp and a comprehensive set of 13 other state-of-the-art predictors, including publicly available versions of the top predictors from CASP9. Our server obtains the highest Matthews Correlation Coefficient (MCC) and the second best Area Under the receiver operating characteristic Curve (AUC). In addition to the disorder predictions, our server also outputs well-described sequence-derived information that allows profiling the predicted disorder. We conveniently visualize sequence conservation, predicted secondary structure, relative solvent accessibility and alignments to chains with annotated disorder. We allow predictions for multiple proteins at the same time, and each prediction can be downloaded as a text-based (parsable) file. The web server, which includes help pages and a tutorial, is freely available at biomine.ece.ualberta.ca/MFDp2/.
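For readers unfamiliar with the Matthews Correlation Coefficient mentioned in this abstract, the following minimal Python sketch computes it from per-residue binary predictions. The helper names and the toy prediction and annotation vectors are hypothetical illustrations and do not come from the MFDp2 benchmark.

```python
import math


def confusion_counts(predicted, actual):
    """Count true/false positives and negatives for binary labels (1 = disordered)."""
    tp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 1)
    tn = sum(1 for p, a in zip(predicted, actual) if p == 0 and a == 0)
    fp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 0)
    fn = sum(1 for p, a in zip(predicted, actual) if p == 0 and a == 1)
    return tp, tn, fp, fn


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient; returns 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical per-residue predictions and reference annotations.
predicted = [1, 1, 0, 0, 1, 0, 0, 0]
actual = [1, 0, 0, 0, 1, 1, 0, 0]

counts = confusion_counts(predicted, actual)
value = mcc(*counts)
print(counts, round(value, 3))
```

MCC ranges from -1 to 1 and uses all four cells of the confusion matrix, which makes it a common choice when disordered residues are much rarer than ordered ones.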
Affiliation(s)
- Marcin J Mizianty
- Department of Electrical and Computer Engineering; University of Alberta; Edmonton, AB Canada
- Zhenling Peng
- Department of Electrical and Computer Engineering; University of Alberta; Edmonton, AB Canada
- Lukasz Kurgan
- Department of Electrical and Computer Engineering; University of Alberta; Edmonton, AB Canada
|
398
|
Fan X, Kurgan L. Accurate prediction of disorder in protein chains with a comprehensive and empirically designed consensus. J Biomol Struct Dyn 2013; 32:448-64. [DOI: 10.1080/07391102.2013.775969] [Citation(s) in RCA: 136] [Impact Index Per Article: 11.3] [Indexed: 10/27/2022]
|
399
|
Eickholt J, Cheng J. DNdisorder: predicting protein disorder using boosting and deep networks. BMC Bioinformatics 2013; 14:88. [PMID: 23497251 PMCID: PMC3599628 DOI: 10.1186/1471-2105-14-88] [Citation(s) in RCA: 66] [Impact Index Per Article: 5.5] [Received: 07/26/2012] [Accepted: 02/28/2013] [Indexed: 11/23/2022]
Abstract
Background: A number of proteins contain regions which do not adopt a stable tertiary structure in their native state. Such regions, known as disordered regions, have been shown to participate in many vital cell functions and are increasingly being examined as drug targets. Results: This work presents a new sequence-based approach for the prediction of protein disorder. The method uses boosted ensembles of deep networks to make predictions and participated in the CASP10 experiment. In a 10-fold cross-validation procedure on a dataset of 723 proteins, the method achieved an average balanced accuracy of 0.82 and an area under the ROC curve of 0.90. These results are achieved in part by a boosting procedure which is able to steadily increase balanced accuracy and the area under the ROC curve over several rounds. The method also compared competitively when evaluated against a number of state-of-the-art disorder predictors on CASP9 and CASP10 benchmark datasets. Conclusions: DNdisorder is available as a web service at http://iris.rnet.missouri.edu/dndisorder/.
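The two evaluation measures quoted in this abstract, balanced accuracy and area under the ROC curve, can be illustrated with a minimal Python sketch. The function names, the 0.5 decision threshold and the toy data are hypothetical and are not taken from the DNdisorder evaluation; both functions assume that each class occurs at least once.

```python
def balanced_accuracy(scores, labels, threshold=0.5):
    """Mean of sensitivity and specificity at a fixed decision threshold."""
    tp = tn = fp = fn = 0
    for s, y in zip(scores, labels):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        elif predicted == 1 and y == 0:
            fp += 1
        else:
            fn += 1
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly,
    counting ties as half a correct pair."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical per-residue disorder scores and reference labels (1 = disordered).
scores = [0.1, 0.3, 0.4, 0.55, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 1, 0, 1, 0, 1, 1]

ba = balanced_accuracy(scores, labels)
auc = roc_auc(scores, labels)
print(ba, auc)
```

Balanced accuracy depends on the chosen threshold, whereas the AUC summarizes ranking quality across all thresholds, which is why the two are often reported together.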
Affiliation(s)
- Jesse Eickholt
- Department of Computer Science, University of Missouri, Columbia, MO 65211, USA
|
400
|
Howell M, Green R, Killeen A, Wedderburn L, Picascio V, Rabionet A, Peng Z, Larina M, Xue B, Kurgan L, Uversky VN. Not that rigid midgets and not so flexible giants: on the abundance and roles of intrinsic disorder in short and long proteins. J Biol Syst 2013. [DOI: 10.1142/s0218339012400086] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Indexed: 11/18/2022]
Abstract
Intrinsically disordered proteins or proteins with disordered regions are very common in nature. These proteins have numerous biological functions which are complementary to the biological activities of traditional ordered proteins. A noticeable difference in the amino acid sequences encoding long and short disordered regions was found and this difference was used in the development of length-dependent predictors of intrinsic disorder. In this study, we analyze the scaling of intrinsic disorder in eukaryotic proteins and investigate the presence of length-dependent functions attributed to proteins containing long disordered regions.
Affiliation(s)
- Mark Howell
- Department of Molecular Medicine, College of Medicine, University of South Florida, Tampa, FL 33612, USA
- Ryan Green
- Department of Molecular Medicine, College of Medicine, University of South Florida, Tampa, FL 33612, USA
- Alexis Killeen
- Department of Molecular Medicine, College of Medicine, University of South Florida, Tampa, FL 33612, USA
- Lamar Wedderburn
- Department of Molecular Medicine, College of Medicine, University of South Florida, Tampa, FL 33612, USA
- Vincent Picascio
- Department of Molecular Medicine, College of Medicine, University of South Florida, Tampa, FL 33612, USA
- Alejandro Rabionet
- Department of Molecular Medicine, College of Medicine, University of South Florida, Tampa, FL 33612, USA
- Zhenling Peng
- Department of Electrical and Computer Engineering, University of Alberta, Edmonton, Alberta T6G 2V4, Canada
- Maya Larina
- Department of Mathematics and Informatics, College of Medical Biochemistry, Volgograd State Medical University, 400131 Volgograd, Russia
- Bin Xue
- Department of Molecular Medicine, College of Medicine, University of South Florida, Tampa, FL 33612, USA
- Lukasz Kurgan
- Department of Electrical and Computer Engineering, University of Alberta, Edmonton, Alberta T6G 2V4, Canada
- Vladimir N. Uversky
- Department of Molecular Medicine, USF Health Byrd Alzheimer's Research Institute, College of Medicine, University of South Florida, Tampa, FL 33612, USA
- Institute for Biological Instrumentation, Russian Academy of Sciences, 142290 Pushchino, Moscow Region, Russia
|