201
Oberti M, Vaisman II. cnnAlpha: Protein disordered regions prediction by reduced amino acid alphabets and convolutional neural networks. Proteins 2020; 88:1472-1481. [PMID: 32535960 DOI: 10.1002/prot.25966] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 01/03/2019] [Revised: 11/18/2019] [Accepted: 06/06/2020] [Indexed: 12/23/2022]
Abstract
Intrinsically disordered regions (IDRs) play an important role in key biological processes and are closely related to human diseases. IDRs have great potential to serve as targets for drug discovery, most notably in disordered binding regions. Accurate prediction of IDRs is challenging because their genome-wide occurrence and the low ratio of disordered residues make them difficult targets for traditional classification techniques. Existing computational methods mostly rely on sequence profiles to improve accuracy, which is time-consuming and computationally expensive. This article describes an ab initio, sequence-only prediction method, based on reduced amino acid alphabets and convolutional neural networks (CNNs), that tries to overcome the challenge of accurate prediction posed by IDRs. We experiment with six different 3-letter reduced alphabets. We argue that the dimensional reduction in the input alphabet facilitates the detection of complex patterns within the sequence by the convolutional step. Experimental results show that our proposed IDR predictor performs at the same level as, or outperforms, other state-of-the-art methods in the same class, achieving an accuracy of 0.76 and an AUC of 0.85 on the publicly available Critical Assessment of protein Structure Prediction dataset (CASP10). Therefore, our method is suitable for proteome-wide disorder prediction, yielding similar or better accuracy than existing approaches at a faster speed.
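To make the input encoding in this abstract concrete, the following is a minimal sketch of mapping a protein sequence onto a 3-letter reduced alphabet and one-hot encoding it, which is the kind of tensor a convolutional layer could consume. The hydrophobic/polar/charged grouping and the function names `reduce_sequence` and `one_hot` are assumptions made for illustration; they are not one of the six alphabets evaluated in the paper, and no network is built here.

```python
# Hypothetical 3-letter grouping of the 20 standard amino acids.
# This is an illustrative assumption, NOT an alphabet taken from the paper.
GROUPS = {
    "H": "AVLIMFWC",  # hydrophobic
    "P": "GSTNQYP",   # polar or small
    "C": "DEKRH",     # charged
}

# Lookup table: amino acid -> reduced letter.
REDUCED = {aa: letter for letter, members in GROUPS.items() for aa in members}

# Fixed column order for the one-hot encoding: ['C', 'H', 'P'].
LETTERS = sorted(GROUPS)


def reduce_sequence(seq):
    """Rewrite a sequence of standard amino acids in the reduced alphabet."""
    return "".join(REDUCED[aa] for aa in seq.upper())


def one_hot(reduced):
    """Encode a reduced sequence as a list of 3-element 0/1 rows."""
    return [[1 if ch == letter else 0 for letter in LETTERS] for ch in reduced]


if __name__ == "__main__":
    reduced = reduce_sequence("MKDEGSW")
    print(reduced)
    print(one_hot(reduced))
```

With three input channels instead of twenty, each convolutional filter sees a much smaller input space, which is the dimensional reduction the abstract argues for.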
Affiliation(s)
- Mauricio Oberti
  - School of Systems Biology, George Mason University, Manassas, Virginia, USA; Novartis Institutes for BioMedical Research, Cambridge, Massachusetts, USA
- Iosif I Vaisman
  - School of Systems Biology, George Mason University, Manassas, Virginia, USA

202
Wang L, Zhang R. Towards Computational Models of Identifying Protein Ubiquitination Sites. Curr Drug Targets 2020; 20:565-578. [PMID: 30246637 DOI: 10.2174/1389450119666180924150202] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 07/31/2018] [Revised: 08/29/2018] [Accepted: 09/04/2018] [Indexed: 12/25/2022]
Abstract
Ubiquitination is an important post-translational modification (PTM) process for the regulation of protein functions, which is associated with cancer, cardiovascular and other diseases. Recent initiatives have focused on the detection of potential ubiquitination sites with the aid of physicochemical test approaches in conjunction with the application of computational methods. The identification of ubiquitination sites using laboratory tests is especially susceptible to the temporality and reversibility of the ubiquitination processes, and is also costly and time-consuming. It has been demonstrated that computational methods are effective in extracting potential rules or inferences from biological sequence collections. Up to the present, the computational strategy has been one of the critical research approaches that have been applied for the identification of ubiquitination sites, and currently, there are numerous state-of-the-art computational methods that have been developed from machine learning and statistical analysis to undertake such work. In the present study, the construction of benchmark datasets is summarized, together with feature representation methods, feature selection approaches and the classifiers involved in several previous publications. In an attempt to explore pertinent development trends for the identification of ubiquitination sites, an independent test dataset was constructed and the predicting results obtained from five prediction tools are reported here, together with some related discussions.
Affiliation(s)
- Lidong Wang
  - College of Science, Dalian Maritime University, Dalian, China
- Ruijun Zhang
  - College of Science, Dalian Maritime University, Dalian, China

203
Genomic Analysis of Intrinsically Disordered Proteins in the Genus Camelus. Int J Mol Sci 2020; 21:ijms21114010. [PMID: 32503351 PMCID: PMC7312968 DOI: 10.3390/ijms21114010] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 04/23/2020] [Revised: 05/14/2020] [Accepted: 05/18/2020] [Indexed: 12/11/2022]
Abstract
Intrinsically disordered proteins/regions (IDPs/IDRs) fail to fold completely into 3D structures, but have major roles in determining protein function. While natively disordered proteins/regions have been found to fulfill a wide variety of primary cellular roles, the functions of many disordered proteins in numerous species remain to be uncovered. Here, we perform the first large-scale study of IDPs/IDRs in the genus Camelus, one of the most important mammalian genera in Asia and North Africa, in order to explore the biological roles of these proteins. The study includes the prediction of disordered proteins/regions in Camelus species and in humans using multiple state-of-the-art prediction tools. Additionally, we provide a comparative analysis of Camelus and Homo sapiens IDPs/IDRs to highlight the distinctive use of disorder in each genus. Our findings indicate that the human proteome is more disordered than the Camelus proteome. Gene Ontology analysis also revealed that Camelus IDPs are enriched in glutathione catabolism and lactose biosynthesis.

204
Yan J, Cheng J, Kurgan L, Uversky VN. Structural and functional analysis of "non-smelly" proteins. Cell Mol Life Sci 2020; 77:2423-2440. [PMID: 31486849 PMCID: PMC11105052 DOI: 10.1007/s00018-019-03292-1] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 03/29/2019] [Revised: 08/21/2019] [Accepted: 08/28/2019] [Indexed: 01/09/2023]
Abstract
Cysteine and aromatic residues are major structure-promoting residues. We assessed the abundance, structural coverage, and functional characteristics of the "non-smelly" proteins, i.e., proteins that do not contain cysteine residues (C-depleted) or cysteine and aromatic residues (CFYWH-depleted), across 817 proteomes from all domains of life. The analysis revealed that although these proteomes contained significant levels of the C-depleted proteins, with prokaryotes being significantly more enriched in such proteins than eukaryotes, the CFYWH-depleted proteins were relatively rare, accounting for about 0.05% of proteomes. Furthermore, CFYWH-depleted proteins were virtually never found in PDB. Depletion in cysteine and in aromatic residues was associated with the substantially increased intrinsic disorder levels across all domains of life. Archaeal and eukaryotic organisms with higher levels of the C-depleted proteins were shown to have higher levels of the intrinsic disorder and lower levels of structural coverage. We also showed that the "non-smelly" proteins typically did not independently fold into monomeric structures, and instead, they fold by interacting with nucleic acids as constituents of the ribosome and nucleosome complexes. They were shown to be involved in translation, transcription, nucleosome assembly, transmembrane transport, and protein folding functions, all of which are known to be associated with the intrinsic disorder. Our data suggested that, in general, structure of monomeric proteins is crucially dependent on the presence of cysteine and aromatic residues.
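The two depletion classes defined in this abstract can be expressed as a simple sequence check. The sketch below is an illustration under stated assumptions, not the authors' pipeline: the function name `depletion_class` and the returned labels are hypothetical, and sequences are assumed to be plain one-letter amino acid strings.

```python
def depletion_class(seq):
    """Classify a protein sequence by the residue types it lacks.

    "CFYWH-depleted": contains no C, F, Y, W or H residues.
    "C-depleted": contains no cysteine, but at least one of F, Y, W, H.
    "not depleted": contains at least one cysteine.
    """
    residues = set(seq.upper())
    if not residues & set("CFYWH"):
        return "CFYWH-depleted"
    if "C" not in residues:
        return "C-depleted"
    return "not depleted"


if __name__ == "__main__":
    for seq in ("MKKLLAEG", "MKFLAEG", "MCKLAEG"):
        print(seq, depletion_class(seq))
```

Note that every CFYWH-depleted protein is also C-depleted by definition; the function reports the stricter class first.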
Affiliation(s)
- Jing Yan
  - Department of Electrical and Computer Engineering, University of Alberta, Edmonton, Canada
- Jianlin Cheng
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, USA
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, 401 West Main Street, Room E4225, Richmond, VA, 23284, USA
- Vladimir N Uversky
  - Department of Molecular Medicine and USF Health Byrd Alzheimer's Research Institute, Morsani College of Medicine, University of South Florida, 12901 Bruce B. Downs Blvd., MDC07, Tampa, FL, 33612, USA
  - Protein Research Group, Institute for Biological Instrumentation of the Russian Academy of Sciences, 142290, Pushchino, Moscow Region, Russia

205
You K, Huang Q, Yu C, Shen B, Sevilla C, Shi M, Hermjakob H, Chen Y, Li T. PhaSepDB: a database of liquid-liquid phase separation related proteins. Nucleic Acids Res 2020; 48:D354-D359. [PMID: 31584089 PMCID: PMC6943039 DOI: 10.1093/nar/gkz847] [Citation(s) in RCA: 139] [Impact Index Per Article: 27.8] [Received: 08/04/2019] [Revised: 09/12/2019] [Accepted: 10/01/2019] [Indexed: 12/31/2022]
Abstract
It is widely appreciated that liquid–liquid phase separation (LLPS) underlies the formation of membraneless organelles, which function to concentrate proteins and nucleic acids. In the past few decades, major efforts have been devoted to identifying phase separation associated proteins and elucidating their functions. To better utilize the knowledge dispersed in the published literature, we developed PhaSepDB (http://db.phasep.pro/), a manually curated database of phase separation associated proteins. Currently, PhaSepDB includes 2914 non-redundant proteins localized in different organelles, curated from published literature and databases. PhaSepDB provides a protein summary, publication references and sequence features for phase separation associated proteins. The sequence features that reflect LLPS behavior are also available for other human protein candidates. The online database provides a convenient interface for the research community to browse, search and download phase separation associated proteins. As a centralized resource, we believe PhaSepDB will facilitate the future study of phase separation.
Affiliation(s)
- Kaiqiang You
  - Department of Biomedical Informatics, Peking University Health Science Center, Beijing 100191, China
- Qi Huang
  - Department of Biomedical Informatics, Peking University Health Science Center, Beijing 100191, China
- Chunyu Yu
  - Department of Biomedical Informatics, Peking University Health Science Center, Beijing 100191, China
- Boyan Shen
  - Department of Biomedical Informatics, Peking University Health Science Center, Beijing 100191, China
- Cristoffer Sevilla
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Minglei Shi
  - MOE Key Laboratory of Bioinformatics, Bioinformatics Division and Center for Synthetic & Systems Biology, BNRist, Department of Automation, Tsinghua University, Beijing 100084, China
- Henning Hermjakob
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK; State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Life Omics, Beijing 102206, China
- Yang Chen
  - MOE Key Laboratory of Bioinformatics, Bioinformatics Division and Center for Synthetic & Systems Biology, BNRist, Department of Automation, Tsinghua University, Beijing 100084, China
- Tingting Li
  - Department of Biomedical Informatics, Peking University Health Science Center, Beijing 100191, China

206
Horvath A, Miskei M, Ambrus V, Vendruscolo M, Fuxreiter M. Sequence-based prediction of protein binding mode landscapes. PLoS Comput Biol 2020; 16:e1007864. [PMID: 32453748 PMCID: PMC7304629 DOI: 10.1371/journal.pcbi.1007864] [Citation(s) in RCA: 43] [Impact Index Per Article: 8.6] [Received: 02/13/2020] [Revised: 06/19/2020] [Accepted: 04/09/2020] [Indexed: 02/04/2023]
Abstract
Interactions between disordered proteins involve a wide range of changes in the structure and dynamics of the partners involved. These changes can be classified in terms of binding modes, which include disorder-to-order (DO) transitions, when proteins fold upon binding, as well as disorder-to-disorder (DD) transitions, when the conformational heterogeneity is maintained in the bound states. Furthermore, systematic studies of these interactions are revealing that proteins may exhibit different binding modes with different partners. Proteins that exhibit this context-dependent binding can be referred to as fuzzy proteins. Here we investigate the amino acid code for fuzzy binding in terms of the entropy of the probability distribution of transitions towards decreasing order. We implement these entropy calculations in the FuzPred (http://protdyn-fuzpred.org) algorithm to predict the range of context-dependent binding modes of proteins from their amino acid sequences. As we illustrate through a variety of examples, this method identifies those binding sites that are sensitive to the cellular context or post-translational modifications, and may serve as regulatory points of cellular pathways.

Great advances have been made in the last several decades in deciphering how the behavior of proteins is encoded in their amino acid sequences. A variety of sequence-based prediction methods have been developed to estimate a wide range of properties of proteins, including secondary structure propensity, native state structures, preference for being disordered and tendency to aggregate. Much less is known, however, about the rules that regulate the conformational changes of proteins upon binding. In particular, many proteins change their binding modes upon interacting with different partners, or as a consequence of post-translational modifications or changes in the cellular milieu. Here we address the problem of how amino acid sequences can encode different binding modes depending on their binding partners, and describe the FuzPred method of predicting context-dependent binding modes.
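The entropy idea mentioned in this abstract can be illustrated with the standard Shannon entropy of a discrete probability distribution. This is a generic sketch, not the FuzPred formula: the function name `shannon_entropy` and the example probabilities over binding modes are hypothetical. The intuition is that a residue whose binding-mode probabilities are spread evenly (high entropy) is more context-dependent than one dominated by a single mode (low entropy).

```python
import math


def shannon_entropy(probs):
    """Shannon entropy in bits of a discrete probability distribution.

    Zero-probability outcomes contribute nothing to the sum.
    """
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return -sum(p * math.log2(p) for p in probs if p > 0)


if __name__ == "__main__":
    # Hypothetical per-residue probabilities for two binding modes
    # (for example disorder-to-order vs disorder-to-disorder).
    print(shannon_entropy([0.5, 0.5]))  # evenly split between modes
    print(shannon_entropy([0.9, 0.1]))  # dominated by one mode
```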
Affiliation(s)
- Attila Horvath
  - MTA-DE Laboratory of Protein Dynamics, Department of Biochemistry and Molecular Biology, University of Debrecen, Debrecen, Hungary
  - The John Curtin School of Medical Research, The Australian National University, Canberra, Australia
- Marton Miskei
  - MTA-DE Laboratory of Protein Dynamics, Department of Biochemistry and Molecular Biology, University of Debrecen, Debrecen, Hungary
- Viktor Ambrus
  - MTA-DE Laboratory of Protein Dynamics, Department of Biochemistry and Molecular Biology, University of Debrecen, Debrecen, Hungary
- Michele Vendruscolo
  - Centre for Misfolding Diseases, Department of Chemistry, University of Cambridge, Cambridge, United Kingdom
- Monika Fuxreiter
  - MTA-DE Laboratory of Protein Dynamics, Department of Biochemistry and Molecular Biology, University of Debrecen, Debrecen, Hungary

207
Intrinsic Disorder in Tetratricopeptide Repeat Proteins. Int J Mol Sci 2020; 21:ijms21103709. [PMID: 32466138 PMCID: PMC7279152 DOI: 10.3390/ijms21103709] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 04/21/2020] [Revised: 05/12/2020] [Accepted: 05/22/2020] [Indexed: 12/27/2022]
Abstract
Among the realm of repeat containing proteins that commonly serve as “scaffolds” promoting protein-protein interactions, there is a family of proteins containing between 2 and 20 tetratricopeptide repeats (TPRs), which are functional motifs consisting of 34 amino acids. The most distinguishing feature of TPR domains is their ability to stack continuously one upon the other, with these stacked repeats being able to affect interaction with binding partners either sequentially or in combination. It is known that many repeat-containing proteins are characterized by high levels of intrinsic disorder, and that many protein tandem repeats can be intrinsically disordered. Furthermore, it seems that TPR-containing proteins share many characteristics with hybrid proteins containing ordered domains and intrinsically disordered protein regions. However, there has not been a systematic analysis of the intrinsic disorder status of TPR proteins. To fill this gap, we analyzed 166 human TPR proteins to determine the degree to which proteins containing TPR motifs are affected by intrinsic disorder. Our analysis revealed that these proteins are characterized by different levels of intrinsic disorder and contain functional disordered regions that are utilized for protein-protein interactions and often serve as targets of various posttranslational modifications.

208
Gadhave K, Kumar P, Kapuganti SK, Uversky VN, Giri R. Unstructured Biology of Proteins from Ubiquitin-Proteasome System: Roles in Cancer and Neurodegenerative Diseases. Biomolecules 2020; 10:E796. [PMID: 32455657 PMCID: PMC7278180 DOI: 10.3390/biom10050796] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 04/16/2020] [Revised: 05/17/2020] [Accepted: 05/19/2020] [Indexed: 12/14/2022]
Abstract
The 26S proteasome is a large (~2.5 MDa) protein complex consisting of at least 33 different subunits and many other components, which form the ubiquitin proteasomal system (UPS), an ATP-dependent protein degradation system in the cell. UPS serves as an essential component of the cellular protein surveillance machinery, and its dysfunction leads to cancer, neurodegenerative and immunological disorders. Importantly, the functions and regulations of proteins are governed by the combination of ordered regions, intrinsically disordered protein regions (IDPRs) and molecular recognition features (MoRFs). The structure-function relationships of UPS components have not been identified completely; therefore, in this study, we have carried out the functional intrinsic disorder and MoRF analysis for potential neurodegenerative disease and anti-cancer targets of this pathway. Our report shows the presence of significant intrinsic disorder and disorder-based binding regions in several UPS proteins, such as extraproteasomal polyubiquitin receptors (UBQLN1 and UBQLN2), proteasome-associated polyubiquitin receptors (ADRM1 and PSMD4), deubiquitinating enzymes (DUBs) (ATXN3 and USP14), and ubiquitinating enzymes (E2 (UBE2R2) and E3 (STUB1) enzymes). We believe this study will have implications for the conformation-specific roles of different regions of these proteins. This will lead to a better understanding of the molecular basis of UPS-associated diseases.
Affiliation(s)
- Kundlik Gadhave
  - School of Basic Sciences, Indian Institute of Technology Mandi, VPO Kamand, Himachal Pradesh 175005, India
- Prateek Kumar
  - School of Basic Sciences, Indian Institute of Technology Mandi, VPO Kamand, Himachal Pradesh 175005, India
- Shivani K. Kapuganti
  - School of Basic Sciences, Indian Institute of Technology Mandi, VPO Kamand, Himachal Pradesh 175005, India
- Vladimir N. Uversky
  - Department of Molecular Medicine and Byrd Alzheimer’s Research Institute, Morsani College of Medicine, University of South Florida, Tampa, FL 33620, USA
  - Institute for Biological Instrumentation of the Russian Academy of Sciences, Federal Research Center “Pushchino Scientific Center for Biological Research of the Russian Academy of Sciences”, Pushchino, 142290 Moscow, Russia
- Rajanish Giri
  - School of Basic Sciences, Indian Institute of Technology Mandi, VPO Kamand, Himachal Pradesh 175005, India

209
Hu G, Wu Z, Oldfield CJ, Wang C, Kurgan L. Quality assessment for the putative intrinsic disorder in proteins. Bioinformatics 2020; 35:1692-1700. [PMID: 30329008 DOI: 10.1093/bioinformatics/bty881] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 06/07/2018] [Revised: 09/19/2018] [Accepted: 10/15/2018] [Indexed: 11/15/2022]
Abstract
MOTIVATION: While putative intrinsic disorder is widely used, none of the predictors provides quality assessment (QA) scores. QA scores estimate the likelihood that predictions are correct at a residue level and have been applied in other bioinformatics areas. We recently reported that QA scores derived from putative disorder propensities perform relatively poorly for native disordered residues. Here we design and validate a general approach to construct QA predictors for disorder predictions.

RESULTS: The QUARTER (QUality Assessment for pRotein inTrinsic disordEr pRedictions) toolbox of methods accommodates a diverse set of ten disorder predictors. It builds upon several innovative design elements including use and scaling of selected physicochemical properties of the input sequence, post-processing of disorder propensity scores, and a feature selection that optimizes the predictive models to a specific disorder predictor. We empirically establish that each one of these elements contributes to the overall predictive performance of our tool and that QUARTER's outputs significantly outperform QA scores derived from the outputs generated by the disorder predictors. The best performing QA scores for a single disorder predictor identify 13% of residues that are predicted with 98% precision. QA scores computed by combining results of the ten disorder predictors cover 40% of residues with 95% precision. Case studies are used to show how to interpret the QA scores. QA scores based on the high precision combined predictions are applied to analyze disorder in the human proteome.

AVAILABILITY AND IMPLEMENTATION: http://biomine.cs.vcu.edu/servers/QUARTER/.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
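The "coverage at a given precision" figures quoted in this abstract can be read as follows: pick a QA score threshold, keep only residues scoring at or above it, then report what fraction of all residues was kept (coverage) and what fraction of the kept predictions is correct (precision). The sketch below illustrates that reading with made-up numbers; the function name `precision_and_coverage` and the data are assumptions and do not come from QUARTER.

```python
def precision_and_coverage(qa_scores, is_correct, threshold):
    """Return (precision, coverage) for residues with QA score >= threshold.

    qa_scores:  per-residue QA scores (higher means more trustworthy).
    is_correct: per-residue booleans, True if the disorder prediction
                for that residue matched the native annotation.
    """
    if len(qa_scores) != len(is_correct) or not qa_scores:
        raise ValueError("inputs must be non-empty and of equal length")
    selected = [ok for score, ok in zip(qa_scores, is_correct) if score >= threshold]
    coverage = len(selected) / len(qa_scores)
    precision = sum(selected) / len(selected) if selected else 0.0
    return precision, coverage


if __name__ == "__main__":
    # Hypothetical scores and correctness flags for four residues.
    scores = [0.90, 0.80, 0.40, 0.95]
    correct = [True, False, True, True]
    print(precision_and_coverage(scores, correct, threshold=0.75))
```

Raising the threshold typically trades coverage for precision, which is why the abstract reports the two numbers as a pair.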
Affiliation(s)
- Gang Hu
  - School of Mathematical Sciences and LPMC, Nankai University, Tianjin, People's Republic of China
- Zhonghua Wu
  - School of Mathematical Sciences and LPMC, Nankai University, Tianjin, People's Republic of China
- Chen Wang
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA

210
Getting to Know Your Neighbor: Protein Structure Prediction Comes of Age with Contextual Machine Learning. J Comput Biol 2020; 27:796-814. [DOI: 10.1089/cmb.2019.0193] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Indexed: 12/12/2022]

211
Folding and structural polymorphism of p53 C-terminal domain: One peptide with many conformations. Arch Biochem Biophys 2020; 684:108342. [DOI: 10.1016/j.abb.2020.108342] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 01/16/2020] [Revised: 02/20/2020] [Accepted: 03/11/2020] [Indexed: 11/19/2022]

212
Badierah RA, Uversky VN, Redwan EM. Dancing with Trojan horses: an interplay between the extracellular vesicles and viruses. J Biomol Struct Dyn 2020; 39:3034-3060. [DOI: 10.1080/07391102.2020.1756409] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Indexed: 02/06/2023]
Affiliation(s)
- Raied A. Badierah
  - Biological Science Department, Faculty of Science, King Abdulaziz University, Jeddah, Saudi Arabia
  - Molecular Diagnostic Laboratory, King Abdulaziz University Hospital, King Abdulaziz University, Jeddah, Saudi Arabia
- Vladimir N. Uversky
  - Biological Science Department, Faculty of Science, King Abdulaziz University, Jeddah, Saudi Arabia
  - Department of Molecular Medicine, Morsani College of Medicine, University of South Florida, Tampa, FL, USA
  - Laboratory of New Methods in Biology, Institute for Biological Instrumentation, Russian Academy of Sciences, Federal Research Center ‘Pushchino Scientific Center for Biological Research of the Russian Academy of Sciences’, Pushchino, Moscow Region, Russia
- Elrashdy M. Redwan
  - Biological Science Department, Faculty of Science, King Abdulaziz University, Jeddah, Saudi Arabia

213
Aggarwal S, Banerjee SK, Talukdar NC, Yadav AK. Post-translational Modification Crosstalk and Hotspots in Sirtuin Interactors Implicated in Cardiovascular Diseases. Front Genet 2020; 11:356. [PMID: 32425973 PMCID: PMC7204943 DOI: 10.3389/fgene.2020.00356] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 12/17/2019] [Accepted: 03/24/2020] [Indexed: 01/07/2023]
Abstract
Sirtuins are protein deacetylases that play a protective role in cardiovascular diseases (CVDs), as well as many other diseases. Absence of sirtuins can lead to hyperacetylation of both nuclear and mitochondrial proteins, leading to metabolic dysregulation. Protein post-translational modifications (PTMs) are known to crosstalk with each other to bring about complex phenotypic outcomes. Various PTM types, such as acetylation, ubiquitination, and phosphorylation, drive transcriptional regulation and metabolism, but such crosstalks are poorly understood. We integrated protein–protein interactions (PPI) and PTMs from several databases to compile information on 1,251 sirtuin-interacting proteins, of which 544 are associated with cardiac diseases. Based on the ∼100,000 PTM sites obtained for sirtuin interactors, we observed that the frequency of PTM sites (83 per protein), as well as PTM types (five per protein), is higher than the global average for the human proteome. We found that ∼60–70% of PTM sites fall into ordered regions. Approximately 83% of the sirtuin interactors contained at least one competitive crosstalk (in situ) site, with half of the sites occurring in CVD-associated proteins. A large proportion of identified crosstalk sites were observed for acetylation and ubiquitination competition. We identified 614 proteins containing PTM hotspots (≥5 PTM sites) and 133 proteins containing crosstalk hotspots (≥3 crosstalk sites). We observed that a large proportion of disease-associated sequence variants were found in PTM motifs of CVD proteins. We identified seven proteins (TP53, LMNA, MAPT, ATP2A2, NCL, APEX1, and HIST1H3A) containing disease-associated variants in PTM and crosstalk hotspots. This is the first comprehensive bioinformatics analysis on sirtuin interactors with respect to PTMs and their crosstalks. This study forms a platform for generating interesting hypotheses that can be tested for a deeper mechanistic understanding gained or derived from big-data analytics.
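One simple reading of a competitive (in situ) crosstalk site is a residue position annotated with two or more different PTM types, for example a lysine that can be either acetylated or ubiquitinated. The sketch below finds such positions from a list of annotations. It is an illustration only: the function name `crosstalk_sites`, the `(position, ptm_type)` input format, and the example data are hypothetical, and the abstract does not specify how hotspots are delimited, so the ≥3 cutoff quoted above is not applied here.

```python
from collections import defaultdict


def crosstalk_sites(ptms):
    """Return sorted positions that carry at least two distinct PTM types.

    ptms: iterable of (position, ptm_type) annotations for one protein.
    """
    types_at = defaultdict(set)
    for position, ptm_type in ptms:
        types_at[position].add(ptm_type)
    return sorted(pos for pos, types in types_at.items() if len(types) >= 2)


if __name__ == "__main__":
    # Hypothetical annotations for a single protein.
    annotations = [
        (10, "acetylation"),
        (10, "ubiquitination"),
        (25, "phosphorylation"),
        (40, "acetylation"),
        (40, "acetylation"),  # duplicate record, still one PTM type
    ]
    print(crosstalk_sites(annotations))
```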
Affiliation(s)
- Suruchi Aggarwal
  - Translational Health Science and Technology Institute, NCR Biotech Science Cluster, Faridabad, India; Division of Life Sciences, Institute of Advanced Study in Science and Technology, Guwahati, India; Department of Molecular Biology and Biotechnology, Cotton University, Guwahati, India
- Sanjay K Banerjee
  - Translational Health Science and Technology Institute, NCR Biotech Science Cluster, Faridabad, India
- Narayan Chandra Talukdar
  - Division of Life Sciences, Institute of Advanced Study in Science and Technology, Guwahati, India; Department of Molecular Biology and Biotechnology, Cotton University, Guwahati, India
- Amit Kumar Yadav
  - Translational Health Science and Technology Institute, NCR Biotech Science Cluster, Faridabad, India

214
Kubatova N, Pyper DJ, Jonker HRA, Saxena K, Remmel L, Richter C, Brantl S, Evguenieva‐Hackenberg E, Hess WR, Klug G, Marchfelder A, Soppa J, Streit W, Mayzel M, Orekhov VY, Fuxreiter M, Schmitz RA, Schwalbe H. Rapid Biophysical Characterization and NMR Spectroscopy Structural Analysis of Small Proteins from Bacteria and Archaea. Chembiochem 2020; 21:1178-1187. [PMID: 31705614 PMCID: PMC7217052 DOI: 10.1002/cbic.201900677] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 11/05/2019] [Indexed: 01/08/2023]
Abstract
Proteins encoded by small open reading frames (sORFs) have a widespread occurrence in diverse microorganisms and can be of high functional importance. However, due to annotation biases and their technically challenging direct detection, these small proteins have been overlooked for a long time and were only recently rediscovered. The currently rapidly growing number of such proteins requires efficient methods to investigate their structure-function relationship. Herein, a method is presented for fast determination of the conformational properties of small proteins. Their small size makes them perfectly amenable for solution-state NMR spectroscopy. NMR spectroscopy can provide detailed information about their conformational states (folded, partially folded, and unstructured). In the context of the priority program on small proteins funded by the German research foundation (SPP2002), 27 small proteins from 9 different bacterial and archaeal organisms have been investigated. It is found that most of these small proteins are unstructured or partially folded. Bioinformatics tools predict that some of these unstructured proteins can potentially fold upon complex formation. A protocol for fast NMR spectroscopy structure elucidation is described for the small proteins that adopt a persistently folded structure by implementation of new NMR technologies, including automated resonance assignment and nonuniform sampling in combination with targeted acquisition.
Affiliation(s)
- Nina Kubatova
  - Institute for Organic Chemistry and Chemical Biology, Center for Biomolecular Magnetic Resonance (BMRZ), Johann Wolfgang Goethe University, Max-von-Laue-Strasse 7, 60438 Frankfurt/Main, Germany
- Dennis J. Pyper
  - Institute for Organic Chemistry and Chemical Biology, Center for Biomolecular Magnetic Resonance (BMRZ), Johann Wolfgang Goethe University, Max-von-Laue-Strasse 7, 60438 Frankfurt/Main, Germany
- Hendrik R. A. Jonker
  - Institute for Organic Chemistry and Chemical Biology, Center for Biomolecular Magnetic Resonance (BMRZ), Johann Wolfgang Goethe University, Max-von-Laue-Strasse 7, 60438 Frankfurt/Main, Germany
- Krishna Saxena
  - Institute for Organic Chemistry and Chemical Biology, Center for Biomolecular Magnetic Resonance (BMRZ), Johann Wolfgang Goethe University, Max-von-Laue-Strasse 7, 60438 Frankfurt/Main, Germany
- Laura Remmel
  - Institute for Organic Chemistry and Chemical Biology, Center for Biomolecular Magnetic Resonance (BMRZ), Johann Wolfgang Goethe University, Max-von-Laue-Strasse 7, 60438 Frankfurt/Main, Germany
- Christian Richter
  - Institute for Organic Chemistry and Chemical Biology, Center for Biomolecular Magnetic Resonance (BMRZ), Johann Wolfgang Goethe University, Max-von-Laue-Strasse 7, 60438 Frankfurt/Main, Germany
- Sabine Brantl
  - AG Bakteriengenetik, Matthias-Schleiden-Institut, Philosophenweg 12, 07743 Jena, Germany
- Elena Evguenieva‐Hackenberg
  - Institute for Microbiology and Molecular Biology, Justus Liebig University Giessen, Heinrich-Buff-Ring 26, 35392 Giessen, Germany
- Wolfgang R. Hess
  - Faculty of Biology, Genetics and Experimental Bioinformatics, Albert Ludwigs University Freiburg, Schänzlestrasse 1, 79104 Freiburg, Germany
- Gabriele Klug
  - Institute for Microbiology and Molecular Biology, Justus Liebig University Giessen, Heinrich-Buff-Ring 26, 35392 Giessen, Germany
- Jörg Soppa
  - Institute for Molecular Biosciences, Johann Wolfgang Goethe University, Max-von-Laue-Strasse 9, 60438 Frankfurt am Main, Germany
- Wolfgang Streit
  - Department of Microbiology and Biotechnology, University of Hamburg, Ohnhorststrasse 18, 22609 Hamburg, Germany
- Maxim Mayzel
  - Swedish NMR Centre, University of Gothenburg, P. O. Box 465, 40530 Gothenburg, Sweden
- Vladislav Y. Orekhov
  - Swedish NMR Centre, University of Gothenburg, P. O. Box 465, 40530 Gothenburg, Sweden
  - Department of Chemistry and Molecular Biology, University of Gothenburg, Kemigården 4, 41296 Gothenburg, Sweden
- Monika Fuxreiter
  - MTA-DE Laboratory of Protein Dynamics, Department of Biochemistry and Molecular Biology, University of Debrecen, Nagyerdei krt 98, 4032 Debrecen, Hungary
- Ruth A. Schmitz
  - Institute for General Microbiology, Christian Albrechts University Kiel, Am Botanischen Garten 1–9, 24118 Kiel, Germany
- Harald Schwalbe
  - Institute for Organic Chemistry and Chemical Biology, Center for Biomolecular Magnetic Resonance (BMRZ), Johann Wolfgang Goethe University, Max-von-Laue-Strasse 7, 60438 Frankfurt/Main, Germany
215
|
Camponeschi I, Damasco A, Uversky VN, Giuliani A, Bianchi MM. Phenotypic suppression caused by resonance with light-dark cycles indicates the presence of a 24-hours oscillator in yeast and suggests a new role of intrinsically disordered protein regions as internal mediators. J Biomol Struct Dyn 2020; 39:2490-2501. [PMID: 32223547 DOI: 10.1080/07391102.2020.1749133] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 12/16/2022]
Abstract
The mutual interaction between environment and life is a main topic of biological sciences. An interesting aspect of this interaction is the existence of biological rhythms spanning all the levels of organisms from bacteria to humans. On the other hand, the existence of a coupling between external oscillatory stimuli and the adaptation and evolution rate of biological systems is still an unexplored issue. Here we demonstrate a substantial increase of heritable phenotypic changes in yeast, an organism lacking a photoreception system, when growing at 12 h light/dark cycles, compared with either stable dark (or light) or non-12 + 12 h cycling. The model system was a yeast strain lacking a gene whose product is at the crossroad of many different physiological regulations, so ruling out any simple explanation in terms of increase in reverse gene mutations. The abundance of intrinsically disordered protein regions (IDPRs) in both the deleted gene product and its vast ensemble of interactors supports the hypothesis that resonance with the environmental cycle might be mediated by intrinsic disorder-driven interactions of protein molecules. This result invites speculation about the effect of environment/biological resonance phenomena in evolution and the role of protein intrinsically disordered regions as internal mediators. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Ilaria Camponeschi
- Department of Biology and Biotechnology 'Charles Darwin', Sapienza Università di Roma, Roma, Italy
- Achille Damasco
- Department of Physics 'Ettore Pancini', Università di Napoli Federico II, Napoli, Italy
- Vladimir N Uversky
- Department of Molecular Medicine and Byrd Alzheimer's Research Institute, Morsani College of Medicine, University of South Florida, Tampa, FL, USA; Laboratory of New Methods in Biology, Institute for Biological Instrumentation of the Russian Academy of Sciences, Federal Research Center "Pushchino Scientific Center for Biological Research of the Russian Academy of Sciences", Pushchino, Russia
- Alessandro Giuliani
- Department of Environment and Health, Istituto Superiore di Sanità, Roma, Italy
- Michele M Bianchi
- Department of Biology and Biotechnology 'Charles Darwin', Sapienza Università di Roma, Roma, Italy
|
216
|
A New Census of Protein Tandem Repeats and Their Relationship with Intrinsic Disorder. Genes (Basel) 2020; 11:genes11040407. [PMID: 32283633 PMCID: PMC7230257 DOI: 10.3390/genes11040407] [Citation(s) in RCA: 40] [Impact Index Per Article: 8.0] [Received: 03/09/2020] [Revised: 03/29/2020] [Accepted: 04/01/2020] [Indexed: 12/31/2022] Open
Abstract
Protein tandem repeats (TRs) are often associated with immunity-related functions and diseases. Since the last census of protein TRs in 1999, the number of curated proteins has increased more than seven-fold and new TR prediction methods have been published. TRs appear to be enriched with intrinsic disorder and vice versa. The significance and the biological reasons for this association are unknown. Here, we characterize protein TRs across all kingdoms of life and their overlap with intrinsic disorder in unprecedented detail. Using state-of-the-art prediction methods, we estimate that 50.9% of proteins contain at least one TR, often located at the sequence flanks. A positive linear correlation between the proportion of TRs and the protein length was observed universally, with Eukaryotes in general having more TRs, but when the difference in length is taken into account the difference is quite small. TRs were enriched with disorder-promoting amino acids and were inside intrinsically disordered regions. Many such TRs were homorepeats. Our results support that TRs mostly originate by duplication and are involved in essential functions such as transcription processes, structural organization, electron transport and iron-binding. In viruses, TRs are found in proteins essential for virulence.
|
217
|
Boon M, De Zitter E, De Smet J, Wagemans J, Voet M, Pennemann FL, Schalck T, Kuznedelov K, Severinov K, Van Meervelt L, De Maeyer M, Lavigne R. 'Drc', a structurally novel ssDNA-binding transcription regulator of N4-related bacterial viruses. Nucleic Acids Res 2020; 48:445-459. [PMID: 31724707 PMCID: PMC7145618 DOI: 10.1093/nar/gkz1048] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 09/18/2019] [Revised: 10/16/2019] [Accepted: 10/23/2019] [Indexed: 12/22/2022] Open
Abstract
Bacterial viruses encode a vast number of ORFan genes that lack similarity to any other known proteins. Here, we present a 2.20 Å crystal structure of N4-related Pseudomonas virus LUZ7 ORFan gp14, and elucidate its function. We demonstrate that gp14, termed here as Drc (ssDNA-binding RNA Polymerase Cofactor), preferentially binds single-stranded DNA, yet contains a structural fold distinct from other ssDNA-binding proteins (SSBs). By comparison with other SSB folds and creation of truncation and amino acid substitution mutants, we provide the first evidence for the binding mechanism of this unique fold. From a biological perspective, Drc interacts with the phage-encoded RNA Polymerase complex (RNAPII), implying a functional role as an SSB required for the transition from early to middle gene transcription during phage infection. Similar to the coliphage N4 gp2 protein, Drc likely binds locally unwound middle promoters and recruits the phage RNA polymerase. However, unlike gp2, Drc does not seem to need an additional cofactor for promoter melting. A comparison among N4-related phage genera highlights the evolutionary diversity of SSB proteins in an otherwise conserved transcription regulation mechanism.
Affiliation(s)
- Maarten Boon
- Department of Biosystems, Laboratory of Gene Technology, KU Leuven, Leuven 3001, Belgium
- Elke De Zitter
- Department of Chemistry, Biomolecular Architecture, KU Leuven, Leuven 3001, Belgium
- Jeroen De Smet
- Department of Biosystems, Laboratory of Gene Technology, KU Leuven, Leuven 3001, Belgium
- Jeroen Wagemans
- Department of Biosystems, Laboratory of Gene Technology, KU Leuven, Leuven 3001, Belgium
- Marleen Voet
- Department of Biosystems, Laboratory of Gene Technology, KU Leuven, Leuven 3001, Belgium
- Friederike L Pennemann
- Department of Biosystems, Laboratory of Gene Technology, KU Leuven, Leuven 3001, Belgium
- Thomas Schalck
- Department of Biosystems, Laboratory of Gene Technology, KU Leuven, Leuven 3001, Belgium
- Luc Van Meervelt
- Department of Chemistry, Biomolecular Architecture, KU Leuven, Leuven 3001, Belgium
- Marc De Maeyer
- Department of Chemistry, Laboratory of Biomolecular Modelling and Design, KU Leuven, Leuven 3001, Belgium
- Rob Lavigne
- Department of Biosystems, Laboratory of Gene Technology, KU Leuven, Leuven 3001, Belgium
|
218
|
Hanson J, Paliwal KK, Litfin T, Zhou Y. SPOT-Disorder2: Improved Protein Intrinsic Disorder Prediction by Ensembled Deep Learning. Genomics Proteomics & Bioinformatics 2020; 17:645-656. [PMID: 32173600 PMCID: PMC7212484 DOI: 10.1016/j.gpb.2019.01.004] [Citation(s) in RCA: 101] [Impact Index Per Article: 20.2] [Received: 10/26/2018] [Revised: 01/18/2019] [Accepted: 02/15/2019] [Indexed: 01/13/2023]
Abstract
Intrinsically disordered or unstructured proteins (or regions in proteins) have been found to be important in a wide range of biological functions and implicated in many diseases. Due to the high cost and low efficiency of experimental determination of intrinsic disorder and the exponential increase of unannotated protein sequences, developing complementary computational prediction methods has been an active area of research for several decades. Here, we employed an ensemble of deep Squeeze-and-Excitation residual inception and long short-term memory (LSTM) networks for predicting protein intrinsic disorder with input from evolutionary information and predicted one-dimensional structural properties. The method, called SPOT-Disorder2, offers substantial and consistent improvement not only over our previous technique based on LSTM networks alone, but also over other state-of-the-art techniques in three independent tests with different ratios of disordered to ordered amino acid residues, and for sequences with either rich or limited evolutionary information. More importantly, semi-disordered regions predicted in SPOT-Disorder2 are more accurate in identifying molecular recognition features (MoRFs) than methods directly designed for MoRFs prediction. SPOT-Disorder2 is available as a web server and as a standalone program at https://sparks-lab.org/server/spot-disorder2/.
Affiliation(s)
- Jack Hanson
- Signal Processing Laboratory, Griffith University, Brisbane 4111, Australia
- Kuldip K Paliwal
- Signal Processing Laboratory, Griffith University, Brisbane 4111, Australia
- Thomas Litfin
- School of Information and Communication Technology, Griffith University, Gold Coast 4222, Australia
- Yaoqi Zhou
- School of Information and Communication Technology, Griffith University, Gold Coast 4222, Australia; Institute for Glycomics, Griffith University, Gold Coast 4222, Australia
|
219
|
Sequence-Based Prediction of Fuzzy Protein Interactions. J Mol Biol 2020; 432:2289-2303. [DOI: 10.1016/j.jmb.2020.02.017] [Citation(s) in RCA: 64] [Impact Index Per Article: 12.8] [Received: 12/06/2019] [Revised: 01/24/2020] [Accepted: 02/14/2020] [Indexed: 12/31/2022]
|
220
|
Reading Targeted DNA Damage in the Active Demethylation Pathway: Role of Accessory Domains of Eukaryotic AP Endonucleases and Thymine-DNA Glycosylases. J Mol Biol 2020:S0022-2836(19)30720-X. [DOI: 10.1016/j.jmb.2019.12.020] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 09/09/2019] [Revised: 11/24/2019] [Accepted: 12/05/2019] [Indexed: 01/07/2023]
|
221
|
Permyakov SE, Yundina EN, Kazakov AS, Permyakova ME, Uversky VN, Permyakov EA. Mouse S100G protein exhibits properties characteristic of a calcium sensor. Cell Calcium 2020; 87:102185. [PMID: 32114281 DOI: 10.1016/j.ceca.2020.102185] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 07/19/2019] [Revised: 02/10/2020] [Accepted: 02/21/2020] [Indexed: 01/09/2023]
Abstract
Bovine S100 G (calbindin D9k, a small Ca2+-binding protein of the EF-hand superfamily) is considered a calcium buffer protein; i.e., the binding of Ca2+ practically does not change its general conformation. A set of experimental approaches has been used to study structural properties of apo- and Ca2+-loaded forms of mouse S100 G (81.4% identity in amino acid sequence with bovine S100 G). This analysis revealed that, in contrast to bovine S100 G, the removal of calcium ions increases the α-helix content of mouse S100 G protein and enhances its accessibility to digestion by α-chymotrypsin. Furthermore, mouse apo-S100 G is characterized by a decreased surface hydrophobicity and reduced tendency for oligomerization. Such behavior is typical of calcium sensor proteins. The apo-state of mouse S100 G still has a rather compact structure, which can be cooperatively unfolded by temperature and GdnHCl. Computational analysis of amino acid sequences of S100 G proteins shows that these proteins could be in a disordered state upon removal of the bound calcium ions. The experimental data show that, although mouse apo-S100 G is flexible compared to the Ca2+-loaded state, the apo-form is not completely disordered and preserves some cooperatively melting structure. The origin of the unexpectedly high stability of mouse S100 G can be rationalized by an exceptionally strong association of its N- and C-terminal parts containing the EF-hands I and II, respectively.
Affiliation(s)
- Sergei E Permyakov
- Institute for Biological Instrumentation of the Russian Academy of Sciences, Pushchino, Moscow Region, 142290, Russia
- Elena N Yundina
- Institute for Biological Instrumentation of the Russian Academy of Sciences, Pushchino, Moscow Region, 142290, Russia
- Alexei S Kazakov
- Institute for Biological Instrumentation of the Russian Academy of Sciences, Pushchino, Moscow Region, 142290, Russia
- Maria E Permyakova
- Institute for Biological Instrumentation of the Russian Academy of Sciences, Pushchino, Moscow Region, 142290, Russia
- Vladimir N Uversky
- Institute for Biological Instrumentation of the Russian Academy of Sciences, Pushchino, Moscow Region, 142290, Russia; Department of Molecular Medicine and USF Health Byrd Alzheimer's Research Institute, Morsani College of Medicine, University of South Florida, Tampa, FL, 33612, USA
- Eugene A Permyakov
- Institute for Biological Instrumentation of the Russian Academy of Sciences, Pushchino, Moscow Region, 142290, Russia
|
222
|
Torrisi M, Pollastri G, Le Q. Deep learning methods in protein structure prediction. Comput Struct Biotechnol J 2020; 18:1301-1310. [PMID: 32612753 PMCID: PMC7305407 DOI: 10.1016/j.csbj.2019.12.011] [Citation(s) in RCA: 132] [Impact Index Per Article: 26.4] [Received: 10/15/2019] [Revised: 12/19/2019] [Accepted: 12/20/2019] [Indexed: 01/01/2023] Open
Abstract
Protein Structure Prediction is a central topic in Structural Bioinformatics. Since the '60s statistical methods, followed by increasingly complex Machine Learning and recently Deep Learning methods, have been employed to predict protein structural information at various levels of detail. In this review, we briefly introduce the problem of protein structure prediction and essential elements of Deep Learning (such as Convolutional Neural Networks, Recurrent Neural Networks and basic feed-forward Neural Networks they are founded on), after which we discuss the evolution of predictive methods for one-dimensional and two-dimensional Protein Structure Annotations, from the simple statistical methods of the early days, to the computationally intensive highly-sophisticated Deep Learning algorithms of the last decade. In the process, we review the growth of the databases these algorithms are based on, and how this has impacted our ability to leverage knowledge about evolution and co-evolution to achieve improved predictions. We conclude this review outlining the current role of Deep Learning techniques within the wider pipelines to predict protein structures and trying to anticipate what challenges and opportunities may arise next.
Affiliation(s)
- Mirko Torrisi
- School of Computer Science, University College Dublin, Ireland
- Quan Le
- Centre for Applied Data Analytics Research, University College Dublin, Ireland
|
223
|
Mandaci SY, Caliskan M, Sariaslan MF, Uversky VN, Coskuner‐Weber O. Epitope region identification challenges of intrinsically disordered proteins in neurodegenerative diseases: Secondary structure dependence of α‐synuclein on simulation techniques and force field parameters. Chem Biol Drug Des 2020; 96:659-667. [DOI: 10.1111/cbdd.13662] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 10/16/2019] [Revised: 12/14/2019] [Accepted: 12/24/2019] [Indexed: 12/12/2022]
Affiliation(s)
- Sunay Y. Mandaci
- Molecular Biotechnology, Turkish‐German University, Istanbul, Turkey
- Murat Caliskan
- Molecular Biotechnology, Turkish‐German University, Istanbul, Turkey
- Vladimir N. Uversky
- Department of Molecular Medicine, USF Health Byrd Alzheimer's Research Institute, Morsani College of Medicine, University of South Florida, Tampa, FL, USA
- Laboratory of New Methods in Biology, Institute for Biological Instrumentation of the Russian Academy of Sciences, Federal Research Center “Pushchino Scientific Center for Biological Research of the Russian Academy of Sciences”, Pushchino, Russia
|
224
|
O’Brien KT, Mooney C, Lopez C, Pollastri G, Shields DC. Prediction of polyproline II secondary structure propensity in proteins. Royal Society Open Science 2020; 7:191239. [PMID: 32218953 PMCID: PMC7029904 DOI: 10.1098/rsos.191239] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 07/17/2019] [Accepted: 12/04/2019] [Indexed: 05/29/2023]
Abstract
Background: The polyproline II helix (PPIIH) is an extended protein left-handed secondary structure that usually but not necessarily involves prolines. Short PPIIHs are frequently, but not exclusively, found in disordered protein regions, where they may interact with peptide-binding domains. However, no readily usable software is available to predict this state. Results: We developed PPIIPRED to predict polyproline II helix secondary structure from protein sequences, using bidirectional recurrent neural networks trained on known three-dimensional structures with dihedral angle filtering. The performance of the method was evaluated in an external validation set. In addition to proline, PPIIPRED favours amino acids whose side chains extend from the backbone (Leu, Met, Lys, Arg, Glu, Gln), as well as Ala and Val. Utility for individual residue predictions is restricted by the rarity of the PPIIH feature compared to structurally common features. Conclusion: The software, available at http://bioware.ucd.ie/PPIIPRED, is useful in large-scale studies, such as evolutionary analyses of PPIIH, or computationally reducing large datasets of candidate binding peptides for further experimental validation.
Affiliation(s)
- Kevin T. O’Brien
- School of Medicine, University College Dublin, Dublin, Ireland
- Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Dublin, Ireland
- Catherine Mooney
- School of Computer Science, University College Dublin, Dublin, Ireland
- Cyril Lopez
- School of Medicine, University College Dublin, Dublin, Ireland
- Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Dublin, Ireland
- Gianluca Pollastri
- School of Computer Science, University College Dublin, Dublin, Ireland
- Institute for Discovery, University College Dublin, Dublin, Ireland
- Denis C. Shields
- School of Medicine, University College Dublin, Dublin, Ireland
- Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Dublin, Ireland
|
225
|
Oldfield CJ, Fan X, Wang C, Dunker AK, Kurgan L. Computational Prediction of Intrinsic Disorder in Protein Sequences with the disCoP Meta-predictor. Methods Mol Biol 2020; 2141:21-35. [PMID: 32696351 DOI: 10.1007/978-1-0716-0524-0_2] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 12/12/2022]
Abstract
Intrinsically disordered proteins are either entirely disordered or contain disordered regions in their native state. These proteins and regions function without the prerequisite of a stable structure and were found to be abundant across all kingdoms of life. Experimental annotation of disorder lags behind the rapidly growing number of sequenced proteins, motivating the development of computational methods that predict disorder in protein sequences. DisCoP is a user-friendly webserver that provides accurate sequence-based prediction of protein disorder. It relies on a meta-architecture in which the outputs generated by multiple disorder predictors are combined to improve predictive performance. The architecture of disCoP is presented, and its accuracy relative to several other disorder predictors is briefly discussed. We describe usage of the web interface and explain how to access and read results generated by this computational tool. We also provide an example of prediction results and interpretation. The disCoP webserver is publicly available at http://biomine.cs.vcu.edu/servers/disCoP/.
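The abstract does not say how disCoP combines the outputs of its input predictors, so the following is only a generic sketch of the meta-predictor idea: a weighted average of per-residue scores followed by a cutoff. The function name `consensus_disorder`, the uniform default weights, and the 0.5 threshold are all illustrative assumptions, not disCoP's actual scheme.

```python
def consensus_disorder(score_tracks, weights=None, threshold=0.5):
    """Combine per-residue disorder scores from several predictors.

    score_tracks: equal-length lists of scores in [0, 1], one list per predictor.
    weights: optional per-predictor weights (hypothetical; uniform by default).
    Returns (combined_scores, binary_labels); True marks a residue called disordered.
    """
    if not score_tracks:
        raise ValueError("need at least one score track")
    n = len(score_tracks[0])
    if any(len(track) != n for track in score_tracks):
        raise ValueError("all score tracks must cover the same residues")
    if weights is None:
        weights = [1.0] * len(score_tracks)
    if len(weights) != len(score_tracks):
        raise ValueError("need one weight per score track")
    total = sum(weights)
    # Weighted mean of the predictors' scores at each residue position.
    combined = [
        sum(w * track[i] for w, track in zip(weights, score_tracks)) / total
        for i in range(n)
    ]
    labels = [score >= threshold for score in combined]
    return combined, labels


# Toy input: three made-up predictors scoring a three-residue sequence.
tracks = [
    [0.9, 0.2, 0.6],
    [0.7, 0.4, 0.2],
    [0.8, 0.3, 0.1],
]
scores, labels = consensus_disorder(tracks)
```

With the toy input, only the first residue has a mean score above the cutoff, so it is the only one labelled disordered. Real meta-predictors typically learn the combination from annotated data rather than averaging with fixed weights.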
Affiliation(s)
- Xiao Fan
- Department of Pediatrics, Columbia University, New York, NY, USA
- Chen Wang
- Department of Medicine, Columbia University, New York, NY, USA
- A Keith Dunker
- Department of Biochemistry and Molecular Biology, Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN, USA
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
|
226
|
Oldfield CJ, Peng Z, Uversky VN, Kurgan L. Codon selection reduces GC content bias in nucleic acids encoding for intrinsically disordered proteins. Cell Mol Life Sci 2020; 77:149-160. [PMID: 31175370 PMCID: PMC11104855 DOI: 10.1007/s00018-019-03166-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 01/03/2019] [Revised: 05/14/2019] [Accepted: 05/28/2019] [Indexed: 02/06/2023]
Abstract
Protein-coding nucleic acids exhibit composition and codon biases between sequences coding for intrinsically disordered regions (IDRs) and those coding for structured regions. IDRs are regions of proteins that are folding self-insufficient and which function without the prerequisite of folded structure. Several authors have investigated composition bias or codon selection in regions encoding for IDRs, primarily in Eukaryota, and concluded that elevated GC content is the result of the biased amino acid composition of IDRs. We substantively extend previous work by examining GC content in regions encoding IDRs, from 44 species in Eukaryota, Archaea, and Bacteria, spanning a wide range of GC content. We confirm that regions coding for IDRs show a significantly elevated GC content, even across all domains of life. Although this is largely attributable to the amino acid composition bias of IDRs, we show that this bias is independent of the overall GC content and, most importantly, we are the first to observe that GC content bias in IDRs is significantly different than expected from IDR amino acid composition alone. We empirically find compensatory codon selection that reduces the observed GC content bias in IDRs. This selection is dependent on the overall GC content of the organism. The codon selection bias manifests as use of infrequent, AT-rich codons in encoding IDRs. Further, we find these relationships to be independent of the intrinsic disorder prediction method used, and independent of estimated translation efficiency. These observations are consistent with the previous work, and we speculate on whether the observed biases are causal or symptomatic of other driving forces.
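The quantity compared in this abstract, GC content of coding sequence for disordered versus structured regions, can be illustrated with a minimal sketch. It assumes a coding sequence aligned to a per-residue disorder annotation (one flag per codon); the function names and the toy sequence are hypothetical and are not taken from the paper.

```python
from __future__ import annotations


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a nucleotide sequence (0.0 for empty input)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_by_region(cds: str, disordered: list[bool]) -> tuple[float, float]:
    """Return (GC of codons for disordered residues, GC of codons for ordered residues).

    disordered[i] flags residue i, whose codon is cds[3*i : 3*i + 3].
    """
    if len(cds) < 3 * len(disordered):
        raise ValueError("coding sequence is shorter than the annotation")
    idr_codons, ordered_codons = [], []
    for i, flag in enumerate(disordered):
        codon = cds[3 * i : 3 * i + 3]
        (idr_codons if flag else ordered_codons).append(codon)
    return gc_content("".join(idr_codons)), gc_content("".join(ordered_codons))


# Toy example: four codons (GGC CCG ATG AAA), the first two flagged as disordered.
idr_gc, ordered_gc = gc_by_region("GGCCCGATGAAA", [True, True, False, False])
```

In the toy example the two disordered codons are entirely G/C, while the ordered codons contain a single G among six bases. Separating the effect of codon choice from amino acid composition, as the abstract describes, would additionally require comparing observed GC content against an expectation derived from the amino acids alone.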
Affiliation(s)
- Christopher J Oldfield
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, 23284, USA
- Zhenling Peng
- Center for Applied Mathematics, Tianjin University, Tianjin, 300072, China
- Vladimir N Uversky
- Department of Molecular Medicine and USF Health Byrd Alzheimer's Research Institute, Morsani College of Medicine, University of South Florida, Tampa, FL, 33612, USA
- Institute for Biological Instrumentation, Russian Academy of Sciences, 142290, Pushchino, Moscow Region, Russia
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, 23284, USA
|
227
|
Abstract
Intrinsically disordered proteins (IDPs) and regions (IDRs) are commonly found in all proteomes analyzed so far. These proteins/regions are subject to numerous posttranslational modifications (PTMs) and alternative splicing, are involved in a wide range of cellular functions, and often facilitate protein-protein interactions (PPIs). Some of these proteins contain molecular recognition features (MoRFs), which are IDRs that bind to partner proteins and undergo disorder-to-order transitions. Although many IDPs/IDRs can fold upon binding, a large fraction of these proteins are known to maintain significant amounts of disorder in their bound states. Being well-recognized interaction specialists, IDPs/IDRs can participate in one-to-many and many-to-one interactions, where one IDP/IDR binds to multiple partners potentially gaining very different structures in the bound state, or where multiple unrelated IDPs/IDRs bind to one partner. As a result, IDPs frequently serve as hubs (i.e., proteins with many links) in complex PPI networks. The goal of this chapter is to describe computational and bioinformatics tools that can be used to look at the disorder status of proteins within a given PPI network and also to gain some knowledge on the disorder-based functionality of the members of this network. To this end, we describe the use of the UniProt and DisProt databases, several databases generating PPI networks (BioGRID, IntAct, DIP, MINT, HPRD, APID, KEGG, and STRING), Composition profiler, some tools for per-residue disorder prediction (PONDR® VLXT, PONDR® VL3, PONDR® VSL2, PONDR-FIT, and IUPred), the binary disorder classifiers CH-plot and CDF-plot and their combined CH-CDF analysis, web-based tools for visualizing the disorder distribution in a query protein (D2P2 and MobiDB), as well as some tools for evaluating the disorder-based functionality of proteins (ANCHOR, MoRFpred, DEPP, and ModPred).
Affiliation(s)
- Vladimir N Uversky
- Department of Molecular Medicine, Morsani College of Medicine, University of South Florida, Tampa, FL, USA; USF Health Byrd Alzheimer's Research Institute, Morsani College of Medicine, University of South Florida, Tampa, FL, USA; Laboratory of New Methods in Biology, Institute for Biological Instrumentation, Russian Academy of Sciences, Pushchino, Moscow Region, Russian Federation
|
228
|
Abstract
Intrinsically disordered regions (IDRs) are estimated to be highly abundant in nature. While only several thousand proteins are annotated with experimentally derived IDRs, computational methods can be used to predict IDRs for the millions of currently uncharacterized protein chains. Several dozen disorder predictors were developed over the last few decades. While some of these methods provide accurate predictions, unavoidably they also make some mistakes. Consequently, one of the challenges facing users of these methods is how to decide which predictions can be trusted and which are likely incorrect. This practical problem can be solved using quality assessment (QA) scores that predict correctness of the underlying (disorder) predictions at a residue level. We motivate and describe a first-of-its-kind toolbox of QA methods, QUARTER (QUality Assessment for pRotein inTrinsic disordEr pRedictions), which provides the scores for a diverse set of ten disorder predictors. QUARTER is available to the end users as a free and convenient webserver at http://biomine.cs.vcu.edu/servers/QUARTER/. We briefly describe the predictive architecture of QUARTER and provide detailed instructions on how to use the webserver. We also explain how to interpret results produced by QUARTER with the help of a case study.
|
229
|
Katuwawala A, Oldfield CJ, Kurgan L. DISOselect: Disorder predictor selection at the protein level. Protein Sci 2020; 29:184-200. [PMID: 31642118 PMCID: PMC6933862 DOI: 10.1002/pro.3756] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 07/31/2019] [Revised: 10/16/2019] [Accepted: 10/17/2019] [Indexed: 12/27/2022]
Abstract
The intense interest in intrinsically disordered proteins in the life science community, together with the remarkable advancements in predictive technologies, has given rise to the development of a large number of computational predictors of intrinsic disorder from protein sequence. While the growing number of predictors is a positive trend, we have observed a considerable difference in predictive quality among predictors for individual proteins. Furthermore, variable predictor performance is often inconsistent between predictors for different proteins, and the predictor that shows the best predictive performance depends on the unique properties of each protein sequence. We propose a computational approach, DISOselect, to estimate the predictive performance of 12 selected predictors for individual proteins based on their unique sequence-derived properties. This estimation informs the users about the expected predictive quality for a selected disorder predictor and can be used to recommend methods that are likely to provide the best quality predictions. Our solution does not depend on the results of any disorder predictor; the estimations are made based solely on the protein sequence. Our solution significantly improves predictive performance, as judged with a test set of 1,000 proteins, when compared to other alternatives. We have empirically shown that by using the recommended methods the overall predictive performance for a given set of proteins can be improved by a statistically significant margin. DISOselect is freely available for non-commercial users through the webserver at http://biomine.cs.vcu.edu/servers/DISOselect/.
Affiliation(s)
- Akila Katuwawala
- Department of Computer Science, Virginia Commonwealth University, Richmond, Virginia
| | | | - Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, Virginia
| |
|
230
|
Rodriguez G, Orris B, Majumdar A, Bhat S, Stivers JT. Macromolecular crowding induces compaction and DNA binding in the disordered N-terminal domain of hUNG2. DNA Repair (Amst) 2019; 86:102764. [PMID: 31855846 DOI: 10.1016/j.dnarep.2019.102764] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 10/21/2019] [Revised: 11/25/2019] [Accepted: 12/04/2019] [Indexed: 11/15/2022]
Abstract
Many human DNA repair proteins have disordered domains at their N- or C-termini with poorly defined biological functions. We recently reported that the partially structured N-terminal domain (NTD) of human uracil DNA glycosylase 2 (hUNG2) functions to enhance DNA translocation in crowded environments and also targets the enzyme to single-stranded/double-stranded DNA junctions. To understand the structural basis for these effects, we now report high-resolution heteronuclear NMR studies of the isolated NTD in the presence and absence of an inert macromolecular crowding agent (PEG8K). Compared to dilute buffer, we find that crowding reduces the degrees of freedom for the structural ensemble, increases the order of a PCNA binding motif and dramatically promotes binding of the NTD to DNA through a conformational selection mechanism. These findings shed new light on the function of this disordered domain in the context of the crowded nuclear environment.
Affiliation(s)
- Gaddiel Rodriguez
- Department of Pharmacology and Molecular Sciences, The Johns Hopkins University School of Medicine, 725 North Wolfe Street, Baltimore, MD 21205, United States
| | - Benjamin Orris
- Department of Pharmacology and Molecular Sciences, The Johns Hopkins University School of Medicine, 725 North Wolfe Street, Baltimore, MD 21205, United States
| | - Ananya Majumdar
- Biomolecular NMR Center, Johns Hopkins University, Baltimore, MD 21218, United States
| | - Shridhar Bhat
- Department of Pharmacology and Molecular Sciences, The Johns Hopkins University School of Medicine, 725 North Wolfe Street, Baltimore, MD 21205, United States
| | - James T Stivers
- Department of Pharmacology and Molecular Sciences, The Johns Hopkins University School of Medicine, 725 North Wolfe Street, Baltimore, MD 21205, United States.
| |
|
231
|
Dubreuil B, Matalon O, Levy ED. Protein Abundance Biases the Amino Acid Composition of Disordered Regions to Minimize Non-functional Interactions. J Mol Biol 2019; 431:4978-4992. [PMID: 31442477 PMCID: PMC6941228 DOI: 10.1016/j.jmb.2019.08.008] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Received: 05/07/2019] [Revised: 08/07/2019] [Accepted: 08/10/2019] [Indexed: 02/07/2023]
Abstract
In eukaryotes, disordered regions cover up to 50% of proteomes and mediate fundamental cellular processes. In contrast to globular domains, where about half of the amino acids are buried in the protein interior, disordered regions show higher solvent accessibility, which makes them prone to engage in non-functional interactions. Such interactions are exacerbated by the law of mass action, prompting the question of how they are minimized in abundant proteins. We find that interaction propensity or "stickiness" of disordered regions negatively correlates with their cellular abundance, both in yeast and human. Strikingly, considering yeast proteins where a large fraction of the sequence is disordered, the correlation between stickiness and abundance reaches R=-0.55. Beyond this global amino-acid composition bias, we identify three rules by which amino-acid composition of disordered regions adjusts with high abundance. First, lysines are preferred over arginines, consistent with the latter amino acid being stickier than the former. Second, compensatory effects exist, whereby a sticky region can be tolerated if it is compensated by a distal non-sticky region. Third, such compensation requires a lower average stickiness at the same abundance when compared to a scenario where stickiness is homogeneous throughout the sequence. We validate these rules experimentally, employing them as different strategies to rescue an otherwise sticky protein fragment from aggregation. Our results highlight that non-functional interactions represent a significant constraint in cellular systems and reveal simple rules by which protein sequences adapt to that constraint. Data from this work are deposited in Figshare, at https://doi.org/10.6084/m9.figshare.8068937.v3.
Affiliation(s)
- Benjamin Dubreuil
- Department of Structural Biology, Weizmann Institute of Science, Rehovot 7610001, Israel
| | - Or Matalon
- Department of Structural Biology, Weizmann Institute of Science, Rehovot 7610001, Israel
| | - Emmanuel D Levy
- Department of Structural Biology, Weizmann Institute of Science, Rehovot 7610001, Israel.
| |
|
232
|
Choura M, Rebaï A, Hanin M. Proteome-wide analysis of protein disorder in Triticum aestivum and Hordeum vulgare. Comput Biol Chem 2019; 84:107138. [PMID: 31767506 DOI: 10.1016/j.compbiolchem.2019.107138] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 02/15/2019] [Revised: 09/10/2019] [Accepted: 10/01/2019] [Indexed: 11/15/2022]
Abstract
There has been an increasing interest in Intrinsically Disordered Proteins (IDPs) ever since it was proven that they are ubiquitous and involved in key cellular functions. Interestingly, they have shown a large abundance in complete proteomes. In the current study, we have conducted the first large-scale study of the repertoire of IDPs in Triticum aestivum and Hordeum vulgare proteomes, in order to get insight into the biological roles of IDPs in both species. Results show that proteins in T. aestivum are significantly more disordered than those of H. vulgare. Moreover, the data revealed that DNA/RNA binding domains, co-factors, heme, metal ions binding domains, ATP/GTP binding proteins, ligands, linker domains and repeats, other domains typical to transcription factors such as zinc finger, F-box domain, homeodomain-like, l-domain like and chaperones, are predominantly present and co-occur in disordered proteins in T. aestivum and H. vulgare. The Gene Ontology analysis revealed that IDPs in T. aestivum and H. vulgare are mainly involved in regulation of cellular and biological processes upon response to stress. In future, this study may provide valuable information while considering IDPs in understanding the organism complexity and environmental adaptation.
Affiliation(s)
- Mouna Choura
- Biotechnology and Plant Improvement Laboratory, Center of Biotechnology of Sfax, University of Sfax, Route Sidi Mansour Km 6, P.O. Box 1177, 3018, Sfax, Tunisia.
| | - Ahmed Rebaï
- Laboratory of Molecular and Cellular Screening Processes, Center of Biotechnology of Sfax, University of Sfax, Route Sidi Mansour Km 6, P.O. Box 1177, 3018, Sfax, Tunisia.
| | - Moez Hanin
- Biotechnology and Plant Improvement Laboratory, Center of Biotechnology of Sfax, University of Sfax, Route Sidi Mansour Km 6, P.O. Box 1177, 3018, Sfax, Tunisia; Plant Physiology and Functional Genomics Research Unit, Institute of Biotechnology, University of Sfax, BP "1175", 3038 Sfax, Tunisia; Institute of Biotechnology, University of Sfax, BP "1175", 3038, Sfax, Tunisia.
| |
|
233
|
Zsolyomi F, Ambrus V, Fuxreiter M. Patterns of Dynamics Comprise a Conserved Evolutionary Trait. J Mol Biol 2019; 432:497-507. [PMID: 31783068 DOI: 10.1016/j.jmb.2019.11.007] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 07/11/2019] [Revised: 11/04/2019] [Accepted: 11/13/2019] [Indexed: 11/30/2022]
Abstract
The importance of protein dynamics in function may suggest an evolutionary selection on large-scale protein motions. Here we systematically studied the dynamic characteristics in 2221 protein domains (58477 sequences) of the Pfam database. We defined the patterns of dynamics (PODs) based on the estimated NMR order parameters and the predicted degree of disorder, and found a significant correlation between them in families of both structured and disordered protein domains. We demonstrate that conservation of dynamic patterns frequently exceeds conservation of sequence and is comparable to the patterns of hydropathy and nonspecific interaction potential. Similarity of dynamic patterns is weakly correlated to structure similarity and to the degree of disorder. We illustrate that POD alignments could be applied to sequentially divergent or intrinsically disordered regions. We propose that patterns of dynamics comprise a conserved evolutionary trait, which could be used to infer evolutionary relationships as an alternative to sequence and structure.
Affiliation(s)
- F Zsolyomi
- MTA-DE Laboratory of Protein Dynamics, Department of Biochemistry and Molecular Biology, University of Debrecen, Hungary
| | - V Ambrus
- MTA-DE Laboratory of Protein Dynamics, Department of Biochemistry and Molecular Biology, University of Debrecen, Hungary
| | - M Fuxreiter
- MTA-DE Laboratory of Protein Dynamics, Department of Biochemistry and Molecular Biology, University of Debrecen, Hungary.
| |
|
234
|
Raimondi D, Orlando G, Vranken WF, Moreau Y. Exploring the limitations of biophysical propensity scales coupled with machine learning for protein sequence analysis. Sci Rep 2019; 9:16932. [PMID: 31729443 PMCID: PMC6858301 DOI: 10.1038/s41598-019-53324-w] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 07/05/2019] [Accepted: 10/25/2019] [Indexed: 11/21/2022]
Abstract
Machine learning (ML) is ubiquitous in bioinformatics, due to its versatility. One of the most crucial aspects to consider while training an ML model is to carefully select the optimal feature encoding for the problem at hand. Biophysical propensity scales are widely adopted in structural bioinformatics because they describe amino acid properties that are intuitively relevant for many structural and functional aspects of proteins, and are thus commonly used as input features for ML methods. In this paper we reproduce three classical structural bioinformatics prediction tasks to investigate the main assumptions about the use of propensity scales as input features for ML methods. We investigate their usefulness with different randomization experiments and we show that their effectiveness varies among the ML methods used and the tasks. We show that while linear methods are more dependent on the feature encoding, the specific biophysical meaning of the features is less relevant for non-linear methods. Moreover, we show that even among linear ML methods, the simpler one-hot encoding can surprisingly outperform the “biologically meaningful” scales. We also show that feature selection performed with non-linear ML methods may not be able to distinguish between randomized and “real” propensity scales by properly prioritizing the latter. Finally, we show that learning problem-specific embeddings could be a simple, assumptions-free and optimal way to perform feature learning/engineering for structural bioinformatics tasks.
Affiliation(s)
| | - Gabriele Orlando
- Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, 1050, Brussels, Belgium
| | - Wim F Vranken
- Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, 1050, Brussels, Belgium.,Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, 1050, Belgium
| | - Yves Moreau
- ESAT-STADIUS, KU Leuven, 3001, Leuven, Belgium.
| |
|
235
|
Heidari-Japelaghi R, Haddad R, Valizadeh M, Dorani-Uliaie E, Jalali-Javaran M. Elastin-like polypeptide fusions for high-level expression and purification of human IFN-γ in Escherichia coli. Anal Biochem 2019; 585:113401. [DOI: 10.1016/j.ab.2019.113401] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 06/19/2019] [Revised: 07/23/2019] [Accepted: 08/19/2019] [Indexed: 01/18/2023]
|
236
|
El Hadidy N, Uversky VN. Intrinsic Disorder of the BAF Complex: Roles in Chromatin Remodeling and Disease Development. Int J Mol Sci 2019; 20:5260. [PMID: 31652801 PMCID: PMC6862534 DOI: 10.3390/ijms20215260] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 09/20/2019] [Revised: 10/12/2019] [Accepted: 10/21/2019] [Indexed: 12/13/2022]
Abstract
The two-meter-long DNA is compressed into chromatin in the nucleus of every cell, which serves as a significant barrier to transcription. Therefore, for processes such as replication and transcription to occur, the highly compacted chromatin must be relaxed, and the processes required for chromatin reorganization for the aim of replication or transcription are controlled by ATP-dependent nucleosome remodelers. One of the most highly studied remodelers of this kind is the BRG1- or BRM-associated factor complex (BAF complex, also known as SWItch/sucrose non-fermentable (SWI/SNF) complex), which is crucial for the regulation of gene expression and differentiation in eukaryotes. Chromatin remodeling complex BAF is characterized by a highly polymorphic structure, containing from four to 17 subunits encoded by 29 genes. The aim of this paper is to provide an overview of the role of BAF complex in chromatin remodeling and also to use literature mining and a set of computational and bioinformatics tools to analyze structural properties, intrinsic disorder predisposition, and functionalities of its subunits, along with the description of the relations of different BAF complex subunits to the pathogenesis of various human diseases.
Affiliation(s)
- Nashwa El Hadidy
- Department of Molecular Medicine, Morsani College of Medicine, University of South Florida, 12901 Bruce B. Downs Blvd. MDC07, Tampa, FL 33612, USA.
| | - Vladimir N Uversky
- Department of Molecular Medicine, Morsani College of Medicine, University of South Florida, 12901 Bruce B. Downs Blvd. MDC07, Tampa, FL 33612, USA.
- Laboratory of New Methods in Biology, Institute for Biological Instrumentation of the Russian Academy of Sciences, Federal Research Center "Pushchino Scientific Center for Biological Research of the Russian Academy of Sciences", Pushchino, 142290 Moscow Region, Russia.
| |
|
237
|
Davey NE, Babu MM, Blackledge M, Bridge A, Capella-Gutierrez S, Dosztanyi Z, Drysdale R, Edwards RJ, Elofsson A, Felli IC, Gibson TJ, Gutmanas A, Hancock JM, Harrow J, Higgins D, Jeffries CM, Le Mercier P, Mészáros B, Necci M, Notredame C, Orchard S, Ouzounis CA, Pancsa R, Papaleo E, Pierattelli R, Piovesan D, Promponas VJ, Ruch P, Rustici G, Romero P, Sarntivijai S, Saunders G, Schuler B, Sharan M, Shields DC, Sussman JL, Tedds JA, Tompa P, Turewicz M, Vondrasek J, Vranken WF, Wallace BA, Wichapong K, Tosatto SCE. An intrinsically disordered proteins community for ELIXIR. F1000Res 2019; 8. [PMID: 31824649 PMCID: PMC6880265 DOI: 10.12688/f1000research.20136.1] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Accepted: 09/18/2019] [Indexed: 01/20/2023]
Abstract
Intrinsically disordered proteins (IDPs) and intrinsically disordered regions (IDRs) are now recognised as major determinants in cellular regulation. This white paper presents a roadmap for future e-infrastructure developments in the field of IDP research within the ELIXIR framework. The goal of these developments is to drive the creation of high-quality tools and resources to support the identification, analysis and functional characterisation of IDPs. The roadmap is the result of a workshop titled “An intrinsically disordered protein user community proposal for ELIXIR” held at the University of Padua. The workshop, and further consultation with the members of the wider IDP community, identified the key priority areas for the roadmap including the development of standards for data annotation, storage and dissemination; integration of IDP data into the ELIXIR Core Data Resources; and the creation of benchmarking criteria for IDP-related software. Here, we discuss these areas of priority, how they can be implemented in cooperation with the ELIXIR platforms, and their connections to existing ELIXIR Communities and international consortia. The article provides a preliminary blueprint for an IDP Community in ELIXIR and is an appeal to identify and involve new stakeholders.
Affiliation(s)
- Norman E Davey
- Division of Cancer Biology, Institute of Cancer Research, UK, London, SW3 6JB, UK
| | - M Madan Babu
- MRC Laboratory of Molecular Biology, Cambridge, CB2 0QH, UK
| | - Martin Blackledge
- Institut de Biologie Structurale, Université Grenoble Alpes, Grenoble, 38000, France
| | - Alan Bridge
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Geneva, Switzerland
| | | | - Zsuzsanna Dosztanyi
- Department of Biochemistry, Eötvös Loránd University, Budapest, H-1117, Hungary
| | | | - Richard J Edwards
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW, Australia
| | - Arne Elofsson
- Department of Biochemistry and Biophysics and Science for Life Laboratory, Stockholm University, Stockholm, Sweden
| | - Isabella C Felli
- Department of Chemistry and CERM "Ugo Schiff", University of Florence, Florence, Italy
| | - Toby J Gibson
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
| | - Aleksandras Gutmanas
- Protein Data Bank in Europe, European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Cambridge, CB10 1SD, UK
| | - John M Hancock
- ELIXIR Hub, Wellcome Genome Campus, Cambridge, CB10 1SD, UK
| | - Jen Harrow
- ELIXIR Hub, Wellcome Genome Campus, Cambridge, CB10 1SD, UK
| | - Desmond Higgins
- Conway Institute of Biomolecular & Biomedical Research, University College Dublin, Belfield, Dublin, D4, Ireland
| | - Cy M Jeffries
- European Molecular Biology Laboratory, Hamburg, Germany
| | - Philippe Le Mercier
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Geneva, Switzerland
| | - Balint Mészáros
- Department of Biochemistry, Eötvös Loránd University, Budapest, H-1117, Hungary
| | - Marco Necci
- Department of Biomedical Sciences, University of Padua, Padua, Italy
| | - Cedric Notredame
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, 08003, Spain.,Universitat Pompeu Fabra (UPF), Barcelona, Spain
| | - Sandra Orchard
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Cambridge, CB10 1SD, UK
| | - Christos A Ouzounis
- BCPL-CPERI, Centre for Research & Technology Hellas (CERTH), Thessalonica, 57001, Greece
| | - Rita Pancsa
- Institute of Enzymology, Research Centre for Natural Sciences of the Hungarian Academy of Sciences, Budapest, H-1117, Hungary
| | - Elena Papaleo
- Computational Biology Laboratory, Danish Cancer Society Research Center, Copenhagen, 2100, Denmark
| | - Roberta Pierattelli
- Department of Chemistry and CERM "Ugo Schiff", University of Florence, Florence, Italy
| | - Damiano Piovesan
- Department of Biomedical Sciences, University of Padua, Padua, Italy
| | - Vasilis J Promponas
- Bioinformatics Research Laboratory, Department of Biological Sciences, University of Cyprus, Nicosia, CY-1678, Cyprus
| | - Patrick Ruch
- HES-SO/HEG and SIB Text Mining, Swiss Institute of Bioinformatics, Geneva, Switzerland
| | - Gabriella Rustici
- Department of Genetics, University of Cambridge, Cambridge, CB2 3EH, UK
| | - Pedro Romero
- University of Wisconsin-Madison, Madison, WI, 53706-1544, USA
| | | | - Gary Saunders
- ELIXIR Hub, Wellcome Genome Campus, Cambridge, CB10 1SD, UK
| | - Benjamin Schuler
- Department of Biochemistry, University of Zurich, Zurich, Switzerland
| | - Malvika Sharan
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
| | - Denis C Shields
- Conway Institute of Biomolecular & Biomedical Research, University College Dublin, Belfield, Dublin, D4, Ireland
| | - Joel L Sussman
- Department of Structural Biology and the Israel Structural Proteomics Center (ISPC), Weizmann Institute of Science, Reḥovot, 7610001, Israel
| | | | - Peter Tompa
- VIB Center for Structural Biology (CSB), VIB Flemish Institute for Biotechnology, Brussels, 1050, Belgium
| | - Michael Turewicz
- Faculty of Medicine, Medizinisches Proteom-Center, Ruhr University Bochum, GesundheitsCampus 4, Bochum, 44801, Germany
| | - Jiri Vondrasek
- Institute of Organic Chemistry and Biochemistry, CAS, Prague, Czech Republic
| | - Wim F Vranken
- VUB/ULB Interuniversity Institute of Bioinformatics in Brussels and Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, B-1050, Belgium
| | - Bonnie Ann Wallace
- Institute of Structural and Molecular Biology, Birkbeck College, University of London, London, WC1H 0HA, UK
| | - Kanin Wichapong
- Department of Biochemistry, Cardiovascular Research Institute Maastricht (CARIM), Maastricht University, Maastricht, The Netherlands
| | | |
|
238
|
Katuwawala A, Oldfield CJ, Kurgan L. Accuracy of protein-level disorder predictions. Brief Bioinform 2019; 21:1509-1522. [DOI: 10.1093/bib/bbz100] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Received: 05/06/2019] [Revised: 06/22/2019] [Accepted: 07/15/2019] [Indexed: 01/15/2023]
Abstract
Experimental annotations of intrinsic disorder are available for 0.1% of the 147 000 000 currently sequenced proteins. Over 60 sequence-based disorder predictors were developed to help bridge this gap. Current benchmarks of these methods assess predictive performance on datasets of proteins; however, predictions are often interpreted for individual proteins. We demonstrate that the protein-level predictive performance varies substantially from the dataset-level benchmarks. Thus, we perform a first-of-its-kind protein-level assessment for 13 popular disorder predictors using 6200 disorder-annotated proteins. We show that the protein-level distributions are substantially skewed toward high predictive quality while having long tails of poor predictions. Consequently, between 57% and 75% of proteins secure higher predictive performance than the currently used dataset-level assessment suggests, but as many as 30% of proteins that are located in the long tails suffer low predictive performance. These proteins typically have relatively high amounts of disorder, in contrast to the mostly structured proteins that are predicted accurately by all 13 methods. Interestingly, each predictor provides the most accurate results for some number of proteins, while the method that performs best at the dataset level is in fact the best for only about 30% of proteins. Moreover, the majority of proteins are predicted more accurately than the dataset-level performance of the most accurate tool by at least four disorder predictors. While these results suggest that disorder predictors outperform their current benchmark performance for the majority of proteins and that they complement each other, novel tools that accurately identify the hard-to-predict proteins and that make accurate predictions for these proteins are needed.
Affiliation(s)
- Akila Katuwawala
- Department of Computer Science, Virginia Commonwealth University, USA
| | - Christopher J Oldfield
- Department of Computer Science, Virginia Commonwealth University, USA
| | - Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, USA
| |
|
239
|
Trushina NI, Bakota L, Mulkidjanian AY, Brandt R. The Evolution of Tau Phosphorylation and Interactions. Front Aging Neurosci 2019; 11:256. [PMID: 31619983 PMCID: PMC6759874 DOI: 10.3389/fnagi.2019.00256] [Citation(s) in RCA: 54] [Impact Index Per Article: 9.0] [Received: 06/28/2019] [Accepted: 08/28/2019] [Indexed: 12/18/2022]
Abstract
Tau is a neuronal microtubule-associated protein (MAP) that is involved in the regulation of axonal microtubule assembly. However, as a protein with intrinsically disordered regions (IDRs), tau also interacts with many other partners in addition to microtubules. Phosphorylation at selected sites modulates tau's various intracellular interactions and regulates the properties of IDRs. In Alzheimer's disease (AD) and other tauopathies, tau exhibits pathologically increased phosphorylation (hyperphosphorylation) at selected sites and aggregates into neurofibrillary tangles (NFTs). By bioinformatics means, we tested the hypothesis that the sequence of tau has changed during vertebrate evolution in a way that novel interactions developed and the phosphorylation pattern was also affected, which made tau prone to the development of tauopathies. We report that distinct regions of tau show functional specialization in their molecular interactions. We found that tau's amino-terminal region, which is involved in biological processes related to "membrane organization" and "regulation of apoptosis," exhibited a strong evolutionary increase in protein disorder, providing the basis for the development of novel interactions. We observed that the predicted phosphorylation sites have changed during evolution in a region-specific manner, and in some cases the overall number of phosphorylation sites increased owing to the formation of clusters of phosphorylatable residues. In contrast, disease-specific hyperphosphorylated sites remained highly conserved. The data indicate that novel, non-microtubule related tau interactions developed during evolution and suggest that the biological processes, which are mediated by these interactions, are of pathological relevance. Furthermore, the data indicate that predicted phosphorylation sites in some regions of tau, including a cluster of phosphorylatable residues in the alternatively spliced exon 2, have changed during evolution. In view of the "antagonistic pleiotropy hypothesis," it may be worthwhile to consider disease-associated phosphosites with low evolutionary conservation as relevant biomarkers.
Affiliation(s)
| | - Lidia Bakota
- Department of Neurobiology, University of Osnabrück, Osnabrück, Germany
| | - Armen Y Mulkidjanian
- Department of Physics, University of Osnabrück, Osnabrück, Germany.,School of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Moscow, Russia.,A.N. Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, Moscow, Russia
| | - Roland Brandt
- Department of Neurobiology, University of Osnabrück, Osnabrück, Germany.,Center for Cellular Nanoanalytics, University of Osnabrück, Osnabrück, Germany.,Institute of Cognitive Science, University of Osnabrück, Osnabrück, Germany
| |
|
240
|
Dragićević MB, Paunović DM, Bogdanović MD, Todorović SI, Simonović AD. ragp: Pipeline for mining of plant hydroxyproline-rich glycoproteins with implementation in R. Glycobiology 2019; 30:cwz072. [PMID: 31508799 DOI: 10.1093/glycob/cwz072] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 11/19/2018] [Revised: 06/19/2019] [Accepted: 08/29/2019] [Indexed: 11/12/2022]
Abstract
Hydroxyproline-rich glycoproteins (HRGPs) are one of the most complex families of macromolecules found in plants, due to the diversity of glycans decorating the protein backbone, as well as the heterogeneity of the protein backbones. While this diversity is responsible for a wide array of physiological functions associated with HRGPs, it hinders attempts for homology based identification. Current approaches, based on identifying sequences with characteristic motifs and biased amino acid composition, are limited to prototypical sequences. Ragp is an R package for mining and analysis of HRGPs, with emphasis on arabinogalactan proteins. The ragp filtering pipeline exploits one of the HRGPs key features, the presence of hydroxyprolines which represent glycosylation sites. Main package features include prediction of proline hydroxylation sites, amino acid motif and bias analyses, efficient communication with web servers for prediction of N-terminal signal peptides, glycosylphosphatidylinositol modification sites and disordered regions and the ability to annotate sequences through hmmscan and subsequent GO enrichment, based on predicted Pfam domains. As such, ragp extends R's rich ecosystem for high-throughput sequence data analyses. The ragp R package is available under the MIT Open Source license and is freely available to download from GitHub at: https://github.com/missuse/ragp.
Affiliation(s)
- Milan B Dragićević
- Institute for Biological Research "Siniša Stanković", Department of Plant Physiology, Bul. Despota Stefana 142, University of Belgrade, 11000 Belgrade, Serbia
| | - Danijela M Paunović
- Institute for Biological Research "Siniša Stanković", Department of Plant Physiology, Bul. Despota Stefana 142, University of Belgrade, 11000 Belgrade, Serbia
| | - Milica D Bogdanović
- Institute for Biological Research "Siniša Stanković", Department of Plant Physiology, Bul. Despota Stefana 142, University of Belgrade, 11000 Belgrade, Serbia
| | - Slađana I Todorović
- Institute for Biological Research "Siniša Stanković", Department of Plant Physiology, Bul. Despota Stefana 142, University of Belgrade, 11000 Belgrade, Serbia
| | - Ana D Simonović
- Institute for Biological Research "Siniša Stanković", Department of Plant Physiology, Bul. Despota Stefana 142, University of Belgrade, 11000 Belgrade, Serbia
| |
|
241
|
Piovesan D, Tabaro F, Paladin L, Necci M, Micetic I, Camilloni C, Davey N, Dosztányi Z, Mészáros B, Monzon AM, Parisi G, Schad E, Sormanni P, Tompa P, Vendruscolo M, Vranken WF, Tosatto SCE. MobiDB 3.0: more annotations for intrinsic disorder, conformational diversity and interactions in proteins. Nucleic Acids Res 2018; 46:D471-D476. [PMID: 29136219 PMCID: PMC5753340 DOI: 10.1093/nar/gkx1071] [Citation(s) in RCA: 156] [Impact Index Per Article: 26.0] [Received: 09/22/2017] [Accepted: 10/19/2017] [Indexed: 01/30/2023]
Abstract
The MobiDB (URL: mobidb.bio.unipd.it) database of protein disorder and mobility annotations has been significantly updated and upgraded since its last major renewal in 2014. Several curated datasets for intrinsic disorder and folding upon binding have been integrated from specialized databases. The indirect evidence has also been expanded to better capture information available in the PDB, such as high temperature residues in X-ray structures and overall conformational diversity. Novel nuclear magnetic resonance chemical shift data provides an additional experimental information layer on conformational dynamics. Predictions have been expanded to provide new types of annotation on backbone rigidity, secondary structure preference and disordered binding regions. MobiDB 3.0 contains information for the complete UniProt protein set and synchronization has been improved by covering all UniParc sequences. An advanced search function allows the creation of a wide array of custom-made datasets for download and further analysis. A large amount of information and cross-links to more specialized databases are intended to make MobiDB the central resource for the scientific community working on protein intrinsic disorder and mobility.
Affiliation(s)
- Damiano Piovesan
- Department of Biomedical Sciences, University of Padua, via U. Bassi 58/b, 35131 Padua, Italy
- Francesco Tabaro
- Department of Biomedical Sciences, University of Padua, via U. Bassi 58/b, 35131 Padua, Italy; Institute of Biosciences and Medical Technology, Arvo Ylpön katu 34, 33520 Tampere, Finland
- Lisanna Paladin
- Department of Biomedical Sciences, University of Padua, via U. Bassi 58/b, 35131 Padua, Italy
- Marco Necci
- Department of Biomedical Sciences, University of Padua, via U. Bassi 58/b, 35131 Padua, Italy; Department of Agricultural Sciences, University of Udine, via Palladio 8, 33100 Udine, Italy; Fondazione Edmund Mach, Via E. Mach 1, 38010 S. Michele all'Adige, Italy
- Ivan Micetic
- Department of Biomedical Sciences, University of Padua, via U. Bassi 58/b, 35131 Padua, Italy
- Carlo Camilloni
- Department of Biosciences, University of Milan, 20133 Milano, Italy
- Norman Davey
- Conway Institute of Biomolecular & Biomedical Research, University College Dublin, Belfield, Dublin 4, Ireland; UCD School of Medicine & Medical Science, University College Dublin, Belfield, Dublin 4, Ireland
- Zsuzsanna Dosztányi
- MTA-ELTE Lendület Bioinformatics Research Group, Department of Biochemistry, Eötvös Loránd University, 1/c Pázmány Péter sétány, H-1117, Budapest, Hungary
- Bálint Mészáros
- MTA-ELTE Lendület Bioinformatics Research Group, Department of Biochemistry, Eötvös Loránd University, 1/c Pázmány Péter sétány, H-1117, Budapest, Hungary; Institute of Enzymology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, PO Box 7, H-1518 Budapest, Hungary
- Alexander M Monzon
- Structural Bioinformatics Group, Department of Science and Technology, National University of Quilmes, CONICET, Roque Saenz Pena 182, Bernal B1876BXD, Argentina
- Gustavo Parisi
- Structural Bioinformatics Group, Department of Science and Technology, National University of Quilmes, CONICET, Roque Saenz Pena 182, Bernal B1876BXD, Argentina
- Eva Schad
- Institute of Enzymology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, PO Box 7, H-1518 Budapest, Hungary
- Pietro Sormanni
- Department of Chemistry, University of Cambridge, Cambridge CB2 1EW, UK
- Peter Tompa
- Institute of Enzymology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, PO Box 7, H-1518 Budapest, Hungary; Structural Biology Brussels, Vrije Universiteit Brussel (VUB), Brussels 1050, Belgium; VIB-VUB Center for Structural Biology, Flanders Institute for Biotechnology (VIB), Brussels 1050, Belgium
- Wim F Vranken
- Structural Biology Brussels, Vrije Universiteit Brussel (VUB), Brussels 1050, Belgium; VIB-VUB Center for Structural Biology, Flanders Institute for Biotechnology (VIB), Brussels 1050, Belgium; Interuniversity Institute of Bioinformatics in Brussels, ULB/VUB, 1050 Brussels, Belgium
- Silvio C E Tosatto
- Department of Biomedical Sciences, University of Padua, via U. Bassi 58/b, 35131 Padua, Italy; CNR Institute of Neuroscience, via U. Bassi 58/b, 35131 Padua, Italy
242
Liu Y, Wang X, Liu B. A comprehensive review and comparison of existing computational methods for intrinsically disordered protein and region prediction. Brief Bioinform 2019; 20:330-346. [PMID: 30657889 DOI: 10.1093/bib/bbx126] [Citation(s) in RCA: 111] [Impact Index Per Article: 18.5] [Received: 07/19/2017] [Indexed: 01/06/2023]
Abstract
Intrinsically disordered proteins and regions are widespread and are associated with many biological processes and diseases. Accurate prediction of intrinsically disordered proteins and regions is critical for both basic research (such as protein structure and function prediction) and practical applications (such as drug development). Over the past decades, many computational approaches have been proposed, and they have greatly advanced this field. A comprehensive and up-to-date review is therefore needed. Here we review the computational methods for intrinsically disordered protein and region prediction, focusing on recent developments. These approaches are divided into four categories according to their methodology: physicochemical-based methods, machine-learning-based methods, template-based methods and meta methods. Their advantages and disadvantages are also discussed. The performance of 40 state-of-the-art predictors is directly compared on the target proteins of the disordered region prediction task in the 10th Critical Assessment of protein Structure Prediction. A more comprehensive performance comparison of 45 different predictors is conducted on seven widely used benchmark data sets. Finally, some open problems and perspectives are discussed.
Affiliation(s)
- Yumeng Liu
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, China
- Xiaolong Wang
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, China
- Bin Liu
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, China
243
Zamora-Briseño JA, Pereira-Santana A, Reyes-Hernández SJ, Castaño E, Rodríguez-Zapata LC. Global Dynamics in Protein Disorder during Maize Seed Development. Genes (Basel) 2019; 10:502. [PMID: 31262071 PMCID: PMC6678312 DOI: 10.3390/genes10070502] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 04/21/2019] [Revised: 06/24/2019] [Accepted: 06/25/2019] [Indexed: 01/31/2023]
Abstract
Intrinsic protein disorder is a physicochemical attribute of proteins that lack a stable three-dimensional structure; such proteins are collectively known as intrinsically disordered proteins (IDPs). Interestingly, several IDPs have been associated with protective functions in plants and with their response to external stimuli. To correlate the modulation of IDP content with developmental progression in the seed, we describe the expression of transcripts according to the disorder content of the proteins they encode during seed development, from early embryogenesis to the beginning of the desiccation tolerance acquisition stage. We found that the total expression of transcripts encoding structured proteins is highly increased during the middle phase. However, the relative content of protein disorder increases as seed development progresses. We identified several intrinsically disordered transcription factors that seem to play important roles throughout seed development. We also detected a gene cluster encoding IDPs at the end of the late phase, which coincides with the beginning of the acquisition of desiccation tolerance. In conclusion, the expression pattern of IDPs is highly dependent on the developmental stage, and there is a general reduction in the expression of transcripts encoding structured proteins as seed development progresses. We propose maize seeds as a model to study the regulation of protein disorder in plant development and its involvement in the acquisition of desiccation tolerance in plants.
Affiliation(s)
- Jesús Alejandro Zamora-Briseño
- Unidad de Biotecnología, Centro de Investigación Científica de Yucatán, Calle 43, número 130, Chuburná de Hidalgo, CP 97205, Mérida, Yucatán, México
- Alejandro Pereira-Santana
- Centro de Investigación y Asistencia en Tecnología y Diseño del estado de Jalisco. División de Biotecnología Industrial. Camino Arenero 1227, El Bajío, Zapopan, Jalisco. C.P. 45019
- Sandi Julissa Reyes-Hernández
- Unidad de Biotecnología, Centro de Investigación Científica de Yucatán, Calle 43, número 130, Chuburná de Hidalgo, CP 97205, Mérida, Yucatán, México
- Enrique Castaño
- Unidad de Bioquímica y Biología Molecular de Plantas, Centro de Investigación Científica de Yucatán, Calle 43, número 130, Chuburná de Hidalgo, CP 97205, Mérida, Yucatán, México
- Luis Carlos Rodríguez-Zapata
- Unidad de Biotecnología, Centro de Investigación Científica de Yucatán, Calle 43, número 130, Chuburná de Hidalgo, CP 97205, Mérida, Yucatán, México
244
Han C, Cui C, Xing X, Lu Z, Zhang J, Liu J, Zhang Y. Functions of intrinsic disorder in proteins involved in DNA demethylation during pre-implantation embryonic development. Int J Biol Macromol 2019; 136:962-979. [PMID: 31229544 DOI: 10.1016/j.ijbiomac.2019.06.143] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 06/04/2019] [Revised: 06/18/2019] [Accepted: 06/19/2019] [Indexed: 01/21/2023]
Abstract
DNA demethylation is involved in many biological processes during pre-implantation embryonic development in mammals. To date, the complicated mechanism of DNA demethylation is still not fully understood. The ten-eleven translocation family proteins (TET3, TET1 and TET2), thymine DNA glycosylase (TDG) and DNA methyltransferase 1 (DNMT1) are considered the major enzymes of DNA demethylation in pre-implantation embryos. TET3, TET1, TET2, TDG, and DNMT1 contain abundant intrinsically disordered protein regions (IDPRs), which contribute to increasing the functional diversity of proteins. We therefore explored DNA demethylation in pre-implantation embryos from the intrinsic disorder perspective. These five biological macromolecules all have DNA demethylation-related functional domains. They can work together to carry out DNA demethylation in pre-implantation embryos through complex protein-protein interaction networks. Intrinsic disorder analysis showed that these proteins are partially intrinsically disordered. Many identifiable disorder-based DNA-binding sites, protein-binding sites and post-translational modification sites are located in the intrinsically disordered regions, and point mutations in the IDPRs that cause DNA demethylation deficiency could significantly change the local disorder propensity of these proteins. To the best of our knowledge, this work provides a new viewpoint for studying the mechanism of DNA methylation reprogramming during mammalian pre-implantation embryonic development.
Affiliation(s)
- Chengquan Han
- Key Laboratory of Animal Biotechnology of the Ministry of Agriculture, College of Veterinary Medicine, Northwest A&F University, Yangling 712100, Shaanxi, China
- Chenchen Cui
- Key Laboratory of Animal Biotechnology of the Ministry of Agriculture, College of Veterinary Medicine, Northwest A&F University, Yangling 712100, Shaanxi, China
- Xupeng Xing
- Key Laboratory of Animal Biotechnology of the Ministry of Agriculture, College of Veterinary Medicine, Northwest A&F University, Yangling 712100, Shaanxi, China
- Zhenzhen Lu
- Key Laboratory of Animal Biotechnology of the Ministry of Agriculture, College of Veterinary Medicine, Northwest A&F University, Yangling 712100, Shaanxi, China
- Jingcheng Zhang
- Key Laboratory of Animal Biotechnology of the Ministry of Agriculture, College of Veterinary Medicine, Northwest A&F University, Yangling 712100, Shaanxi, China
- Jun Liu
- Key Laboratory of Animal Biotechnology of the Ministry of Agriculture, College of Veterinary Medicine, Northwest A&F University, Yangling 712100, Shaanxi, China
- Yong Zhang
- Key Laboratory of Animal Biotechnology of the Ministry of Agriculture, College of Veterinary Medicine, Northwest A&F University, Yangling 712100, Shaanxi, China
245
Identification of Intrinsically Disordered Proteins and Regions by Length-Dependent Predictors Based on Conditional Random Fields. Mol Ther Nucleic Acids 2019; 17:396-404. [PMID: 31307006 PMCID: PMC6626971 DOI: 10.1016/j.omtn.2019.06.004] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 02/25/2019] [Revised: 06/06/2019] [Accepted: 06/07/2019] [Indexed: 01/24/2023]
Abstract
Accurate identification of intrinsically disordered proteins/regions (IDPs/IDRs) is critical for predicting protein structure and function. Previous studies have shown that IDRs of different lengths have different characteristics, and several classification-based predictors have been proposed for predicting different types of IDRs. Compared with these classification-based predictors, the previously proposed predictor IDP-CRF, a sequence labeling model based on conditional random fields (CRFs), exhibits state-of-the-art performance for predicting IDPs/IDRs. Motivated by these methods, we propose a predictor called IDP-FSP, which is an ensemble of three CRF-based predictors called IDP-FSP-L, IDP-FSP-S, and IDP-FSP-G. These three predictors are specially designed to predict long, short, and generic disordered regions, respectively, and they are constructed from different features. To the best of our knowledge, IDP-FSP is the first predictor that combines a sequence labeling algorithm with IDRs of different lengths. Experimental results on two independent test datasets show that IDP-FSP achieves better or at least comparable predictive performance relative to 26 existing state-of-the-art methods in this field, demonstrating the effectiveness of IDP-FSP.
246
Paladin L, Piovesan D, Tosatto SCE. SODA: prediction of protein solubility from disorder and aggregation propensity. Nucleic Acids Res 2017; 45:W236-W240. [PMID: 28505312 PMCID: PMC7059794 DOI: 10.1093/nar/gkx412] [Citation(s) in RCA: 43] [Impact Index Per Article: 7.2] [Received: 03/07/2017] [Accepted: 05/09/2017] [Indexed: 01/08/2023]
Abstract
Solubility is an important, albeit not well understood, feature determining protein behavior. It is of paramount importance in protein engineering, where similar folded proteins may behave in very different ways in solution. Here we present SODA, a novel method to predict the changes of protein solubility based on several physico-chemical properties of the protein. SODA uses the propensity of the protein sequence to aggregate as well as intrinsic disorder, plus hydrophobicity and secondary structure preferences to estimate changes in solubility. It has been trained and benchmarked on two different datasets. The comparison to other recently published methods shows that SODA has state-of-the-art performance and is particularly well suited to predict mutations decreasing solubility. The method is fast, returning results for single mutations in seconds. A usage example estimating the full repertoire of mutations for a human germline antibody highlights several solubility hotspots on the surface. The web server, complete with RESTful interface and extensive help, can be accessed from URL: http://protein.bio.unipd.it/soda.
Affiliation(s)
- Lisanna Paladin
- Department of Biomedical Sciences, University of Padua, Viale G. Colombo 3, 35121 Padova, Italy
- Damiano Piovesan
- Department of Biomedical Sciences, University of Padua, Viale G. Colombo 3, 35121 Padova, Italy
- Silvio C E Tosatto
- Department of Biomedical Sciences, University of Padua, Viale G. Colombo 3, 35121 Padova, Italy; CNR Institute of Neuroscience, Viale G. Colombo 3, 35121 Padova, Italy
247
Structural and functional impact of non-synonymous SNPs in the CST complex subunit TEN1: structural genomics approach. Biosci Rep 2019; 39:BSR20190312. [PMID: 31028137 PMCID: PMC6522806 DOI: 10.1042/bsr20190312] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 02/04/2019] [Revised: 04/01/2019] [Accepted: 04/03/2019] [Indexed: 12/21/2022]
Abstract
The TEN1 protein is a key component of the CST complex, which is implicated in maintaining telomere homeostasis and provides stability to the eukaryotic genome. Mutations in the TEN1 gene have a higher chance of deleterious impact; thus, interpreting these mutations and their impact on structure, stability, and function is important. Here, we have investigated the structural and functional consequences of nsSNPs in the TEN1 gene. A wide array of sequence- and structure-based computational prediction tools was employed to assess the effects of 78 nsSNPs on the structure and function of the TEN1 protein and to identify the deleterious nsSNPs. These deleterious or destabilizing nsSNPs are scattered throughout the structure of TEN1. However, major mutations were observed in the α1-helix (residues 12–16) and β5-strand (residues 88–96). We further observed that mutations in the C-terminal region had a higher tendency to form aggregates. In-depth structural analysis of these mutations reveals that their pathogenicity is driven mainly by larger structural changes caused by alterations in non-covalent interactions. This work provides a blueprint to pinpoint the possible consequences of pathogenic mutations in the CST complex subunit TEN1.
248
Orlando G, Raimondi D, Tabaro F, Codicè F, Moreau Y, Vranken WF. Computational identification of prion-like RNA-binding proteins that form liquid phase-separated condensates. Bioinformatics 2019; 35:4617-4623. [DOI: 10.1093/bioinformatics/btz274] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Received: 12/14/2018] [Revised: 04/06/2019] [Accepted: 04/12/2019] [Indexed: 11/13/2022]
Abstract
Motivation
Eukaryotic cells contain different membrane-delimited compartments, which are crucial for the biochemical reactions necessary to sustain cell life. Recent studies showed that cells can also trigger the formation of membraneless organelles composed of phase-separated proteins to respond to various stimuli. These condensates provide new ways to control reactions, and phase-separation proteins (PSPs) are thus revolutionizing how cellular organization is conceived. The small number of experimentally validated proteins, and the difficulty in discovering them, remain bottlenecks in PSP research.
Results
Here we present PSPer, the first in-silico screening tool for prion-like RNA-binding PSPs. We show that it can prioritize PSPs among proteins containing similar RNA-binding domains, intrinsically disordered regions and prions. PSPer is thus suitable to screen proteomes, identifying the most likely PSPs for further experimental investigation. Moreover, its predictions are fully interpretable in the sense that it assigns specific functional regions to the predicted proteins, providing valuable information for experimental investigation of targeted mutations on these regions. Finally, we show that it can estimate the ability of artificially designed proteins to form condensates (r=−0.87), thus providing an in-silico screening tool for protein design experiments.
Availability and implementation
PSPer is available at bio2byte.com/psp.
Supplementary information
Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Gabriele Orlando
- Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, 1050 Brussels, Belgium
- Structural Biology Brussels, Department of Bioengineering Sciences, Vrije Universiteit Brussel, 1050 Brussels, Belgium
- Francesco Tabaro
- Institute of Biosciences and Medical Technology, Tampere 33520, Finland
- Francesco Codicè
- Department of Computer Science and Engineering, University of Bologna, Bologna 40127, Italy
- Yves Moreau
- ESAT-STADIUS, KU Leuven, Leuven 3001, Belgium
- Wim F Vranken
- Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, 1050 Brussels, Belgium
- Structural Biology Brussels, Department of Bioengineering Sciences, Vrije Universiteit Brussel, 1050 Brussels, Belgium
- Center for Structural Biology, VIB, 1050 Brussels, Belgium
249
Improved measures for evolutionary conservation that exploit taxonomy distances. Nat Commun 2019; 10:1556. [PMID: 30952844 PMCID: PMC6450959 DOI: 10.1038/s41467-019-09583-2] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Received: 04/19/2018] [Accepted: 03/19/2019] [Indexed: 11/30/2022]
Abstract
Selective pressures on protein-coding regions that provide fitness advantages can lead to the regions' fixation and conservation in genome duplications and speciation events. Consequently, conservation analyses relying on sequence similarities are exploited by a myriad of applications across all biosciences to identify functionally important protein regions. While very potent, existing conservation measures based on multiple sequence alignments are so pervasive that improvements to solutions of many problems have become incremental. We introduce a new framework for evolutionary conservation with measures that exploit taxonomy distances across species. Results show that our taxonomy-based framework comfortably outperforms existing conservation measures in identifying deleterious variants observed in the human population, including variants located in non-abundant sequence domains such as intrinsically disordered regions. The predictive power of our approach emphasizes that the phenotypic effects of sequence variants can be taxonomy-level specific and thus, conservation needs to be interpreted accordingly. Information on protein sequence variability and conservation can be leveraged to identify functionally important regions. Here, the authors develop new conservation measures that exploit taxonomy distances and LIST, a tool for predicting deleteriousness of human variants.
250
Nielsen JT, Mulder FAA. Quality and bias of protein disorder predictors. Sci Rep 2019; 9:5137. [PMID: 30914747 PMCID: PMC6435736 DOI: 10.1038/s41598-019-41644-w] [Citation(s) in RCA: 64] [Impact Index Per Article: 10.7] [Received: 12/04/2018] [Accepted: 03/13/2019] [Indexed: 02/03/2023]
Abstract
Disorder in proteins is vital for biological function, yet it is challenging to characterize. Therefore, methods for predicting protein disorder from sequence are fundamental. Currently, predictors are trained and evaluated using data from X-ray structures or from various biochemical or spectroscopic data. However, the prediction accuracy of disorder predictors is not calibrated, nor is it established whether predictors are intrinsically biased towards one of the extremes of the order-disorder axis. We therefore generated and validated a comprehensive experimental benchmarking set of site-specific and continuous disorder, using deposited NMR chemical shift data. This novel experimental data collection is fully appropriate and represents the full spectrum of disorder. We subsequently analyzed the performance of 26 widely used disorder prediction methods and found that these vary noticeably. At the same time, a distinct bias for over-predicting order was identified for some algorithms. Our analysis has important implications for the validity and the interpretation of protein disorder, as utilized, for example, in assessing the content of disorder in proteomes.
Affiliation(s)
- Jakob T Nielsen
- Interdisciplinary Nanoscience Center (iNANO), Aarhus University, Gustav Wieds Vej 14, 8000, Aarhus C, Denmark
- Department of Chemistry, Aarhus University, Langelandsgade 140, 8000, Aarhus C, Denmark
- Frans A A Mulder
- Interdisciplinary Nanoscience Center (iNANO), Aarhus University, Gustav Wieds Vej 14, 8000, Aarhus C, Denmark
- Department of Chemistry, Aarhus University, Langelandsgade 140, 8000, Aarhus C, Denmark