1
Sharma B, Mattaparthi VSK. Prediction of interface between regions of varying degrees of order or disorderness in intrinsically disordered proteins from dihedral angles. J Biomol Struct Dyn 2025; 43:3005-3015. [PMID: 38116756 DOI: 10.1080/07391102.2023.2294837] [Received: 04/21/2023] [Accepted: 12/06/2023] [Indexed: 12/21/2023]
Abstract
Intrinsically disordered proteins (IDPs) are proteins that do not form uniquely defined three-dimensional (3-D) structures. Experimental research on IDPs is difficult since they go against the traditional protein structure-function paradigm. Although there are several predictors of disorder based on amino acid sequences, very few are based on the 3-D structures of proteins. Dihedral angles have a significant role in predicting protein structure because they establish a protein's backbone, which, coupled with its side chains, establishes its overall shape. Here, we have carried out atomistic Molecular Dynamics (MD) simulations on four different proteins: one ordered protein (Monellin), two partially disordered proteins (p53-TAD and Amyloid beta (Aβ1-42) peptide), and one completely disordered protein (Histatin 5). The MD simulation trajectories for the four proteins were used to conduct dihedral angle (ϕ and ψ) analysis. Then, the average dihedral angles for each of the residues were calculated and plotted against the residue index. We noticed steep rises or falls in the average ϕ value at certain locations in the plot. These sudden shifts in the average ϕ value reflect the interface between regions of varying degrees of order or disorderness in intrinsically disordered proteins. Using this method, the probable conformer of a protein with a higher degree of disorder can be found among the ensembles of structures sampled during the MD simulations. The results of our study offer new insight into precisely identifying regions of various degrees of disorder in intrinsically disordered proteins.
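The analysis described in this abstract (average ϕ per residue, then look for steep rises or falls along the residue index) can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function names, the 60-degree threshold, and the toy angle values are all assumptions, and a plain arithmetic mean is used because the abstract speaks of "average dihedral angles" (for angles that wrap around ±180°, a circular mean may be more appropriate).

```python
import numpy as np

def mean_phi_per_residue(phi):
    """Arithmetic mean of phi (degrees) over frames.

    phi has shape (n_frames, n_residues); the result has shape (n_residues,).
    """
    return np.asarray(phi, dtype=float).mean(axis=0)

def steep_changes(mean_phi, threshold=60.0):
    """Indices i where |mean_phi[i+1] - mean_phi[i]| exceeds threshold.

    The threshold is a hypothetical choice for illustration only.
    """
    jumps = np.abs(np.diff(mean_phi))
    return np.nonzero(jumps > threshold)[0]

# Toy data: 3 frames x 6 residues (made-up values, not simulation output).
phi = np.array([
    [-60, -62, -58, 50, 55, 52],
    [-64, -60, -62, 54, 49, 56],
    [-59, -61, -60, 52, 55, 51],
])

mean_phi = mean_phi_per_residue(phi)
boundaries = steep_changes(mean_phi)
print(mean_phi)
print(boundaries)  # candidate interfaces lie between residue i and i + 1
```

In practice the `phi` array would come from an MD trajectory analysis tool; here it is hard-coded so the sketch stays self-contained.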
Affiliation(s)
- Babli Sharma
  - Molecular Modelling and Simulation Laboratory, Department of Molecular Biology and Biotechnology, Tezpur University, Assam, India
- Venkata Satish Kumar Mattaparthi
  - Molecular Modelling and Simulation Laboratory, Department of Molecular Biology and Biotechnology, Tezpur University, Assam, India
2
Zhang O, Liu ZH, Forman-Kay JD, Head-Gordon T. Deep Learning of Proteins with Local and Global Regions of Disorder. ARXIV 2025:arXiv:2502.11326v2. [PMID: 40034137 PMCID: PMC11875298] [Indexed: 03/05/2025]
Abstract
Although machine learning has transformed protein structure prediction of folded protein ground states with remarkable accuracy, intrinsically disordered proteins and regions (IDPs/IDRs) are defined by diverse and dynamical structural ensembles that are predicted with low confidence by algorithms such as AlphaFold. We present a new machine learning method, IDPForge (Intrinsically Disordered Protein, FOlded and disordered Region GEnerator), that exploits a transformer protein language diffusion model to create all-atom IDP ensembles and IDR disordered ensembles that maintain the folded domains. IDPForge does not require sequence-specific training, back transformations from coarse-grained representations, or ensemble reweighting, as in general the created IDP/IDR conformational ensembles show good agreement with solution experimental data, and options for biasing with experimental restraints are provided if desired. We envision that IDPForge with these diverse capabilities will facilitate integrative and structural studies for proteins that contain intrinsic disorder.
3
Nawn D, Hassan SS, Hromić-Jahjefendić A, Bhattacharya T, Basu P, Redwan EM, Barh D, Andrade BS, Aljabali AA, Serrano-Aroca Á, Lundstrom K, Tambuwala MM, Uversky VN. Molecular genomic insights into melanoma associated proteins PRAME and BAP1. J Biomol Struct Dyn 2025:1-31. [PMID: 40084617 DOI: 10.1080/07391102.2025.2475228] [Received: 09/20/2024] [Accepted: 02/06/2025] [Indexed: 03/16/2025]
Abstract
Melanoma, a globally prevalent skin cancer with over 325,000 new cases annually, necessitates a comprehensive understanding of its molecular components. This study looks at the PRAME (cutaneous melanoma-associated antigen) and BAP1 (gene controlling gene-environment interactions) proteins. Both PRAME and BAP1 are associated with critical genomic alterations that significantly influence melanoma progression and patient outcomes. PRAME is overexpressed in various cancers, especially uveal melanoma (UM), where high levels correlate with poor prognosis and genomic instability linked to chromosome 8q12 alterations. Meanwhile, mutations in BAP1 contribute to increased genomic instability and a higher risk of metastasis in UM, highlighting its importance as a key prognostic marker in tumorigenesis. Established approaches along with features proposed in this work are used to investigate sequence conservation, polyglutamic acid presence, intrinsic disorder of proteins, and polar-nonpolar residue arrangement. PRAME and BAP1 conserved residues highlight their critical roles in protein function and interaction. Sequence invariance indicates the possibility of functional relevance and evolutionary conservation. PRAME has enhanced intrinsic disorder and flexibility, whereas BAP1 has changed disorder-promoting residue sequences. Polyglutamic acid strings are found in both proteins, emphasizing their modulatory involvement in protein interactions. The ratios and spatial arrangement of amino acids have a profound influence on interactions and gene dysregulation. This work contributes to a better knowledge of the two melanoma-associated proteins, viz. PRAME and BAP1, by unraveling their structural and functional complexities.
Affiliation(s)
- Debaleena Nawn
  - Department of Computer Science and Engineering, Adamas University, Jagannathpur, Kolkata, West Bengal, India
- Sk Sarif Hassan
  - Department of Mathematics, Pingla Thana Mahavidyalaya, Maligram, Paschim Medinipur, West Bengal, India
- Altijana Hromić-Jahjefendić
  - Department of Genetics and Bioengineering, Faculty of Engineering and Natural Sciences, International University of Sarajevo, Sarajevo, Bosnia and Herzegovina
- Tanishta Bhattacharya
  - Developmental Genetics (Dept III), Max Planck Institute for Heart and Lung Research, Bad Nauheim, Germany
- Pallab Basu
  - School of Physics, University of the Witwatersrand, Johannesburg, Braamfontein, South Africa
  - Adjunct Faculty, Woxsen School of Sciences, Woxsen University, Hyderabad, Telangana, India
- Elrashdy M Redwan
  - Biological Science Department, Faculty of Science, King Abdulaziz University, Jeddah, Saudi Arabia
  - Protein Research Department, Therapeutic and Protective Proteins Laboratory, Genetic Engineering and Biotechnology Research Institute, City of Scientific Research and Technological Applications, New Borg EL-Arab, Alexandria, Egypt
- Debmalya Barh
  - Institute of Integrative Omics and Applied Biotechnology (IIOAB), Nonakuri, Purba Medinipur, India
  - Department of Genetics, Ecology and Evolution, Institute of Biological Sciences, Federal University of Minas Gerais, Belo Horizonte, Brazil
- Bruno Silva Andrade
  - Department of Biological Sciences, Laboratory of Bioinformatics and Computational Chemistry, State University of Southwest of Bahia (UESB), Jequié, Brazil
- Alaa A Aljabali
  - Department of Pharmaceutics and Pharmaceutical Technology, Faculty of Pharmacy, Yarmouk University, Irbid, Jordan
- Ángel Serrano-Aroca
  - Biomaterials and Bioengineering Lab, Centro de Investigación Traslacional San Alberto Magno, Universidad Católica de Valencia San Vicente Mártir, Valencia, Spain
- Vladimir N Uversky
  - Department of Molecular Medicine, Morsani College of Medicine, University of South Florida, Tampa, FL, USA
4
Xie J, Jin X, Wei H, Sun S, Liu Y. IDP-EDL: enhancing intrinsically disordered protein prediction by combining protein language model and ensemble deep learning. Brief Bioinform 2025; 26:bbaf182. [PMID: 40254833 PMCID: PMC12009716 DOI: 10.1093/bib/bbaf182] [Received: 10/10/2024] [Revised: 02/26/2025] [Accepted: 03/30/2025] [Indexed: 04/22/2025]
Abstract
Identification of intrinsically disordered regions (IDRs) in proteins is essential for understanding fundamental cellular processes. The IDRs can be divided into long disordered regions (LDRs) and short disordered regions (SDRs) according to their lengths. In previous studies, most computational methods ignored the differences between LDRs and SDRs, and therefore failed to capture the different patterns of LDRs and SDRs. In this study, we propose IDP-EDL, an ensemble of three predictors. The component predictors were first built on a pretrained protein language model, with task-specific fine-tuning applied for short, long, and generic disordered regions. A meta predictor was then trained to integrate the three task-specific predictors into the final predictor. The results of experiments show that task-specific supervised fine-tuning can capture the different features of LDRs and SDRs and that IDP-EDL can achieve stable performance on datasets with different ratios of LDRs and SDRs. More importantly, IDP-EDL can match or even surpass the performance of other existing state-of-the-art predictors on independent test sets. IDP-EDL is available at https://github.com/joestarXjx/IDP-EDL.
Affiliation(s)
- Junxi Xie
  - College of Big Data and Internet, Shenzhen Technology University, 3002 Lantian Road, Pingshan District, Shenzhen, Guangdong 518118, China
- Xiaopeng Jin
  - College of Big Data and Internet, Shenzhen Technology University, 3002 Lantian Road, Pingshan District, Shenzhen, Guangdong 518118, China
- Hang Wei
  - School of Computer Science and Technology, Xidian University, South Campus: 266 Xinglong Section of Xifeng Road, Xi’an, Shaanxi 710126, North Campus: No. 2 South Taibai Road, Xi’an, Shaanxi 710071, China
- SaiSai Sun
  - School of Computer Science and Technology, Xidian University, South Campus: 266 Xinglong Section of Xifeng Road, Xi’an, Shaanxi 710126, North Campus: No. 2 South Taibai Road, Xi’an, Shaanxi 710071, China
- Yumeng Liu
  - College of Big Data and Internet, Shenzhen Technology University, 3002 Lantian Road, Pingshan District, Shenzhen, Guangdong 518118, China
5
Chakraborty A, Hussain A, Sabnam N. Uncovering the structural stability of Magnaporthe oryzae effectors: a secretome-wide in silico analysis. J Biomol Struct Dyn 2025; 43:1701-1722. [PMID: 38109060 DOI: 10.1080/07391102.2023.2292795] [Received: 05/07/2023] [Accepted: 11/23/2023] [Indexed: 12/19/2023]
Abstract
Rice blast, caused by the ascomycete fungus Magnaporthe oryzae, is a deadly disease and a major threat to global food security. The pathogen secretes small proteinaceous effectors, virulence factors, inside the host to manipulate and perturb the host immune system, allowing the pathogen to colonize and establish a successful infection. While the molecular functions of several effectors are characterized, very little is known about the structural stability of these effectors. We analyzed a total of 554 small secretory proteins (SSPs) from the M. oryzae secretome to decipher key features of intrinsic disorder (ID) and the structural dynamics of the selected putative effectors through thorough and systematic in silico studies. Our results suggest that out of the total SSPs, 66% were predicted as effector proteins, released either into the apoplast or cytoplasm of the host cell. Of these, 68% were found to be intrinsically disordered effector proteins (IDEPs). Among the six distinct classes of disordered effectors, we observed peculiar relationships between the localization of several effectors in the apoplast or cytoplasm and the degree of disorder. We determined the degree of structural disorder and its impact on protein foldability across all the putative small secretory effector proteins from the blast pathogen, further validated by molecular dynamics simulation studies. This study provides definite clues toward unraveling the mystery behind the importance of structural distortions in effectors and their impact on plant-pathogen interactions. The study of these dynamical segments may help identify new effectors as well. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Afzal Hussain
  - Department of Bioinformatics, Maulana Azad National Institute of Technology, Bhopal, India
- Nazmiara Sabnam
  - Department of Life Sciences, Presidency University, Kolkata, India
6
Zhang J, Zhou F, Liang X, Kurgan L. Accurate Prediction of Protein-Binding Residues in Protein Sequences Using SCRIBER. Methods Mol Biol 2025; 2867:247-260. [PMID: 39576586 DOI: 10.1007/978-1-0716-4196-5_15] [Indexed: 11/24/2024]
Abstract
Deciphering molecular-level mechanisms that govern protein-protein interactions (PPIs) relies in part on the accurate prediction of protein-binding partners and protein-binding residues. These predictions can be used to support a wide spectrum of applications that include development of PPI networks and protein docking programs, drug design studies, and investigations of molecular details that underlie certain diseases. Computational methods that predict protein-binding residues offer convenient, inexpensive, and relatively accurate data that can aid these efforts. We introduce and describe a user-friendly webserver for the SCRIBER method that conveniently provides state-of-the-art predictions of protein-binding residues and that minimizes cross-predictions, i.e., incorrect prediction of residues that bind other/non-protein ligands as protein binding. SCRIBER relies on a two-layer architecture that is specifically designed to reduce the cross-predictions. We motivate and explain this predictive architecture. We describe how to use the webserver, interact with its web interface, and collect, read, and understand results generated by SCRIBER. The SCRIBER webserver is available at http://biomine.cs.vcu.edu/servers/SCRIBER/ .
Affiliation(s)
- Jian Zhang
  - School of Computer and Information Technology, Xinyang Normal University, Xinyang, China.
- Feng Zhou
  - School of Computer and Information Technology, Xinyang Normal University, Xinyang, China
- Xingchen Liang
  - School of Computer and Information Technology, Xinyang Normal University, Xinyang, China
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA.
7
Zhao B, Basu S, Kurgan L. DescribePROT Database of Residue-Level Protein Structure and Function Annotations. Methods Mol Biol 2025; 2867:169-184. [PMID: 39576581 DOI: 10.1007/978-1-0716-4196-5_10] [Indexed: 11/24/2024]
Abstract
DescribePROT is a freely available online database of structural and functional descriptors of proteins at the amino acid level. It provides access to 13 diverse descriptors that include sequence conservation, putative secondary structure, solvent accessibility, intrinsic disorder, and signal peptides, and putative annotations of residues that interact with proteins, peptides and nucleic acids. These data can be used to elucidate protein functions, to support efforts to develop therapeutics, and to develop and evaluate future predictors of protein structure and function. DescribePROT includes 7.8 billion predictions for 1.4 million proteins from 83 complete proteomes of popular model organisms. This information can be downloaded at multiple levels of scope (entire database, specific organisms, and individual proteins) and can be interacted with using a graphical interface that simultaneously displays data on multiple descriptors. We describe the contents of this resource, provide directions on how to use its interface, and offer instructions on how to obtain and interact with the underlying data. Moreover, we briefly discuss plans for a future expansion of this database. DescribePROT is available at http://biomine.cs.vcu.edu/servers/DESCRIBEPROT/ .
Affiliation(s)
- Bi Zhao
  - Genomics program, College of Public Health, University of South Florida, Tampa, FL, USA
- Sushmita Basu
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA.
8
Peng Z, Wu H, Luo Y, Kurgan L. Prediction of Disordered Linkers Using APOD. Methods Mol Biol 2025; 2867:219-231. [PMID: 39576584 DOI: 10.1007/978-1-0716-4196-5_13] [Indexed: 11/24/2024]
Abstract
Intrinsically disordered linkers (DLs) connect protein domains and structural elements within domains and facilitate allosteric regulation. Computational studies suggest that thousands of proteins have DLs. Since there are only about 250 proteins with manually curated DL annotations (DisProt database ver. 9.3), computational approaches that make accurate predictions of DLs from the protein sequences are essential for reducing this annotation gap. To this end, we recently released the Accurate Predictor Of DLs (APOD) method. Empirical tests show that APOD achieves Area Under the ROC Curve (AUC) of 0.82 and Matthews Correlation Coefficient (MCC) of 0.42 on a low-similarity test dataset. We implement APOD as a freely available and convenient web server at https://yanglab.qd.sdu.edu.cn/APOD/ . This web server takes a protein sequence as the input and outputs an easy-to-parse prediction result, with the entire prediction process done on the server side. We also provide a standalone version of APOD for users who want to process large datasets of sequences. This version must be installed and run locally on the end user's computer. In this chapter, we overview APOD, explain how to locate and use the web server and the standalone implementation, and discuss how to read and interpret APOD's outputs. We also demonstrate utility of APOD based on a case study protein.
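The two metrics quoted in this abstract, AUC and MCC, follow standard definitions and can be computed from per-residue scores and binary labels as in the sketch below. The function names and the toy scores are assumptions made for illustration; nothing here reproduces APOD itself or its reported values.

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

def auc(scores, labels):
    """Area under the ROC curve as the probability that a random positive
    outscores a random negative (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy per-residue propensities and labels (made-up values).
scores = [0.9, 0.8, 0.3, 0.1]
labels = [1, 0, 1, 0]
print(auc(scores, labels))
print(mcc(tp=5, tn=5, fp=0, fn=0))
```

The pairwise AUC is quadratic in the number of residues, which is fine for a sketch; a rank-based formula or a library routine would be preferable for large test sets.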
Affiliation(s)
- Zhenling Peng
  - Research Center for Mathematics and Interdisciplinary Sciences, Shandong University, Qingdao, China.
  - Frontier Science Center for Nonlinear Expectations, Ministry of Education, Shandong University, Qingdao, China.
- Haiyan Wu
  - Research Center for Mathematics and Interdisciplinary Sciences, Shandong University, Qingdao, China
- Yuxian Luo
  - Research Center for Mathematics and Interdisciplinary Sciences, Shandong University, Qingdao, China
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA.
9
Zhang F, Kurgan L. Evaluation of predictions of disordered binding regions in the CAID2 experiment. Comput Struct Biotechnol J 2024; 27:78-88. [PMID: 39811792 PMCID: PMC11732247 DOI: 10.1016/j.csbj.2024.12.009] [Received: 10/21/2024] [Revised: 12/12/2024] [Accepted: 12/13/2024] [Indexed: 01/16/2025]
Abstract
A large portion of the Intrinsically Disordered Regions (IDRs) in protein sequences interact with proteins, nucleic acids, and other types of ligands. Correspondingly, dozens of sequence-based predictors of binding IDRs were developed. A recently completed second community-based Critical Assessment of protein Intrinsic Disorder prediction (CAID2) evaluated 32 predictors of binding IDRs. However, CAID2 considered a rather narrow scenario by testing on 78 proteins with binding IDRs and not differentiating between different ligands, even though virtually all predictors target IDRs that interact with specific types of ligands. In that scenario, several intrinsic disorder predictors predict binding IDRs with accuracy equivalent to the best predictors of binding IDRs, since a large majority of IDRs in the 78 test proteins are binding. We substantially extended CAID2's evaluation by using the entire CAID2 dataset of 348 proteins and considering several arguably more practical scenarios. We assessed whether predictors accurately differentiate binding IDRs from other types of IDRs and how they perform when predicting IDRs that interact with different ligand types. We found that intrinsic disorder predictors cannot accurately identify binding IDRs among other disordered regions, that the majority of the predictors of binding IDRs are ligand type agnostic (i.e., they cross predict binding in IDRs that interact with ligands that they do not cover), and that only a handful of predictors of binding IDRs perform relatively well and generate reasonably low amounts of cross predictions. We also suggest a number of future research directions that would move this active field of research forward.
Affiliation(s)
- Fuhao Zhang
  - College of Information Engineering, Northwest A & F University, China
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
10
Mollaei P, Sadasivam D, Guntuboina C, Barati Farimani A. IDP-Bert: Predicting Properties of Intrinsically Disordered Proteins Using Large Language Models. J Phys Chem B 2024; 128:12030-12037. [PMID: 39586094 DOI: 10.1021/acs.jpcb.4c02507] [Indexed: 11/27/2024]
Abstract
Intrinsically disordered proteins (IDPs) constitute a large and structureless class of proteins with significant functions. The existence of IDPs challenges the conventional notion that the biological functions of proteins rely on their three-dimensional structures. Despite lacking well-defined spatial arrangements, they exhibit diverse biological functions, influencing cellular processes and shedding light on disease mechanisms. However, it is expensive to run experiments or simulations to characterize this class of proteins. Consequently, we designed an ML model that relies solely on amino acid sequences. In this study, we introduce the IDP-Bert model, a deep-learning architecture leveraging Transformers and Protein Language Models to map sequences directly to IDP properties. Our experiments demonstrate accurate predictions of IDP properties, including Radius of Gyration, end-to-end Decorrelation Time, and Heat Capacity.
Affiliation(s)
- Parisa Mollaei
  - Department of Mechanical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- Danush Sadasivam
  - Department of Chemical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- Chakradhar Guntuboina
  - Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- Amir Barati Farimani
  - Department of Mechanical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
  - Department of Biomedical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
  - Machine Learning Department, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
11
Desai H, Andrews KH, Bergersen KV, Ofori S, Yu F, Shikwana F, Arbing MA, Boatner LM, Villanueva M, Ung N, Reed EF, Nesvizhskii AI, Backus KM. Chemoproteogenomic stratification of the missense variant cysteinome. Nat Commun 2024; 15:9284. [PMID: 39468056 PMCID: PMC11519605 DOI: 10.1038/s41467-024-53520-x] [Received: 08/28/2023] [Accepted: 10/15/2024] [Indexed: 10/30/2024]
Abstract
Cancer genomes are rife with genetic variants; one key outcome of this variation is widespread gain-of-cysteine mutations. These acquired cysteines can be both driver mutations and sites targeted by precision therapies. However, despite their ubiquity, nearly all acquired cysteines remain unidentified via chemoproteomics; identification is a critical step to enable functional analysis, including assessment of potential druggability and susceptibility to oxidation. Here, we pair cysteine chemoproteomics (a technique that enables proteome-wide pinpointing of functional, redox sensitive, and potentially druggable residues) with genomics to reveal the hidden landscape of cysteine genetic variation. Our chemoproteogenomics platform integrates chemoproteomic, whole exome, and RNA-seq data, with a customized two-stage false discovery rate (FDR) error controlled proteomic search, which is further enhanced with a user-friendly FragPipe interface. Chemoproteogenomics analysis reveals that cysteine acquisition is a ubiquitous feature of both healthy and cancer genomes that is further elevated in the context of decreased DNA repair. Reference cysteines proximal to missense variants are also found to be pervasive, supporting heretofore untapped opportunities for variant-specific chemical probe development campaigns. As chemoproteogenomics is further distinguished by sample-matched combinatorial variant databases and is compatible with redox proteomics and small molecule screening, we expect widespread utility in guiding proteoform-specific biology and therapeutic discovery.
Affiliation(s)
- Heta Desai
  - Biological Chemistry Department, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
  - Molecular Biology Institute, UCLA, Los Angeles, CA, USA
- Katrina H Andrews
  - Biological Chemistry Department, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
- Kristina V Bergersen
  - Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
- Samuel Ofori
  - Biological Chemistry Department, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
- Fengchao Yu
  - Department of Pathology, University of Michigan, Ann Arbor, MI, USA
- Flowreen Shikwana
  - Biological Chemistry Department, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
  - Department of Chemistry and Biochemistry, UCLA, Los Angeles, CA, USA
- Mark A Arbing
  - Biological Chemistry Department, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
  - UCLA-DOE Institute for Genomics and Proteomics, UCLA, Los Angeles, CA, USA
- Lisa M Boatner
  - Biological Chemistry Department, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
  - Department of Chemistry and Biochemistry, UCLA, Los Angeles, CA, USA
- Miranda Villanueva
  - Biological Chemistry Department, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
  - Molecular Biology Institute, UCLA, Los Angeles, CA, USA
- Nicholas Ung
  - Biological Chemistry Department, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
- Elaine F Reed
  - Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
- Alexey I Nesvizhskii
  - Department of Pathology, University of Michigan, Ann Arbor, MI, USA
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Keriann M Backus
  - Biological Chemistry Department, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA.
  - Molecular Biology Institute, UCLA, Los Angeles, CA, USA.
  - Department of Chemistry and Biochemistry, UCLA, Los Angeles, CA, USA.
  - UCLA-DOE Institute for Genomics and Proteomics, UCLA, Los Angeles, CA, USA.
  - Eli and Edythe Broad Center of Regenerative Medicine and Stem Cell Research, UCLA, Los Angeles, CA, USA.
  - Jonsson Comprehensive Cancer Center, UCLA, Los Angeles, CA, USA.
12
Malebary SJ, Alromema N. iDLB-Pred: identification of disordered lipid binding residues in protein sequences using convolutional neural network. Sci Rep 2024; 14:24724. [PMID: 39433833 PMCID: PMC11494137 DOI: 10.1038/s41598-024-75700-x] [Received: 04/25/2024] [Accepted: 10/08/2024] [Indexed: 10/23/2024]
Abstract
Proteins, nucleic acids, and lipids all interact with intrinsically disordered protein regions. Lipid-binding regions are involved in a variety of biological processes as well as a number of human illnesses. The expanding body of experimental evidence for these interactions and the dearth of techniques to predict them from the protein sequence serve as driving forces. Although large-scale laboratory techniques are considered essential for studying binding residues, they are time consuming and costly, making it challenging for researchers to identify lipid-binding residues. As a result, computational techniques are being explored as an alternative strategy to overcome this difficulty. To predict disordered lipid-binding residues (DLBRs), we propose the iDLB-Pred predictor, which applies feature extraction techniques to a benchmark dataset to identify relevant patterns and information. Various classification techniques, including deep learning methods such as Convolutional Neural Networks (CNNs), Deep Neural Networks (DNNs), Multilayer Perceptrons (MLPs), Recurrent Neural Networks (RNNs), Long Short-Term Memory (LSTM) networks, and Gated Recurrent Units (GRUs), were employed for model training. The proposed model, iDLB-Pred, was rigorously validated using metrics such as accuracy, sensitivity, specificity, and Matthews correlation coefficient. The results demonstrate the predictor's exceptional performance, achieving accuracy rates of 81% on an independent dataset and 86% in 10-fold cross-validation.
Affiliation(s)
- Sharaf J Malebary
  - Department of Information Technology, Faculty of Computing and Information Technology-Rabigh, King Abdulaziz University, P.O. Box 344, 21911, Rabigh, Saudi Arabia.
- Nashwan Alromema
  - Department of Computer Science, Faculty of Computing and Information Technology-Rabigh, King Abdulaziz University, P.O. Box 344, 21911, Rabigh, Saudi Arabia
13
Coskuner-Weber O. Structures prediction and replica exchange molecular dynamics simulations of α-synuclein: A case study for intrinsically disordered proteins. Int J Biol Macromol 2024; 276:133813. [PMID: 38996889 DOI: 10.1016/j.ijbiomac.2024.133813] [Received: 06/10/2024] [Revised: 07/08/2024] [Accepted: 07/09/2024] [Indexed: 07/14/2024]
Abstract
In recent years, a variety of three-dimensional structure prediction tools, including AlphaFold2, AlphaFold3, I-TASSER, C-I-TASSER, Phyre2, ESMFold, and RoseTTAFold, have been employed in the investigation of intrinsically disordered proteins. However, a comprehensive validation of these tools specifically for intrinsically disordered proteins has yet to be conducted. In this study, we utilize AlphaFold2, AlphaFold3, I-TASSER, C-I-TASSER, Phyre2, ESMFold, and RoseTTAFold to predict the structure of a model intrinsically disordered protein, α-synuclein. Additionally, extensive replica exchange molecular dynamics simulations of the intrinsically disordered protein are conducted. The resulting structures from both the structure prediction tools and the replica exchange molecular dynamics simulations are analyzed for radius of gyration, secondary and tertiary structure properties, as well as Cα and Hα chemical shift values. A comparison of the obtained results with experimental data reveals that replica exchange molecular dynamics simulations provide results in excellent agreement with experimental observations. However, none of the structure prediction tools utilized in this study can fully capture the structural characteristics of the model intrinsically disordered protein. This study shows that a cluster of ensembles is required for intrinsically disordered proteins. Artificial-intelligence-based structure prediction tools such as AlphaFold3 and C-I-TASSER could benefit from stochastic sampling or Monte Carlo simulations for generating an ensemble of structures for intrinsically disordered proteins.
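Radius of gyration is one of the quantities this abstract uses to compare predicted structures with simulated ensembles. The sketch below shows the standard mass-weighted definition and an ensemble average over frames. It is an illustration only: the function names and the toy coordinates are assumptions, and a real analysis would read coordinates from trajectory files with a dedicated library.

```python
import math


def radius_of_gyration(coords, masses=None):
    """Mass-weighted radius of gyration of one structure (list of xyz tuples)."""
    if masses is None:
        masses = [1.0] * len(coords)  # assumption: equal weights if none given
    total_mass = sum(masses)
    # Centre of mass, one component at a time.
    center = [
        sum(m * c[k] for m, c in zip(masses, coords)) / total_mass
        for k in range(3)
    ]
    # Mass-weighted mean squared distance from the centre of mass.
    weighted_sq = sum(
        m * sum((c[k] - center[k]) ** 2 for k in range(3))
        for m, c in zip(masses, coords)
    )
    return math.sqrt(weighted_sq / total_mass)


def mean_radius_of_gyration(frames):
    """Average Rg over an ensemble of structures (e.g. simulation frames)."""
    values = [radius_of_gyration(frame) for frame in frames]
    return sum(values) / len(values)


# Toy data: four points on a square, then the same square scaled by 2.
compact = [(1, 1, 0), (1, -1, 0), (-1, 1, 0), (-1, -1, 0)]
expanded = [(2 * x, 2 * y, 2 * z) for x, y, z in compact]

rg_compact = radius_of_gyration(compact)
rg_ensemble = mean_radius_of_gyration([compact, expanded])
print(rg_compact, rg_ensemble)
```

A single predicted structure yields one Rg value, whereas an ensemble yields a distribution, which is one way to see why the abstract argues that disordered proteins need ensembles rather than single models.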
Affiliation(s)
- Orkid Coskuner-Weber
- Turkish-German University, Molecular Biotechnology, Sahinkaya Caddesi, No. 106, Beykoz, Istanbul 34820, Turkey.
14
Jahn LR, Marquet C, Heinzinger M, Rost B. Protein embeddings predict binding residues in disordered regions. Sci Rep 2024; 14:13566. [PMID: 38866950 PMCID: PMC11169622 DOI: 10.1038/s41598-024-64211-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2024] [Accepted: 06/06/2024] [Indexed: 06/14/2024] Open
Abstract
The identification of protein binding residues helps to understand their biological processes as protein function is often defined through ligand binding, such as to other proteins, small molecules, ions, or nucleotides. Methods predicting binding residues often err for intrinsically disordered proteins or regions (IDPs/IDPRs), often also referred to as molecular recognition features (MoRFs). Here, we presented a novel machine learning (ML) model trained to specifically predict binding regions in IDPRs. The proposed model, IDBindT5, leveraged embeddings from the protein language model (pLM) ProtT5 to reach a balanced accuracy of 57.2 ± 3.6% (95% confidence interval). Assessed on the same data set, this did not differ at the 95% CI from the state-of-the-art (SOTA) methods ANCHOR2 and DeepDISOBind that rely on expert-crafted features and evolutionary information from multiple sequence alignments (MSAs). Assessed on other data, methods such as SPOT-MoRF reached higher MCCs. IDBindT5's SOTA predictions are much faster than other methods, easily enabling full-proteome analyses. Our findings emphasize the potential of pLMs as a promising approach for exploring and predicting features of disordered proteins. The model and a comprehensive manual are publicly available at https://github.com/jahnl/binding_in_disorder .
Affiliation(s)
- Laura R Jahn
- School of Computation, Information, and Technology (CIT), Department of Informatics, Bioinformatics and Computational Biology, TUM (Technical University of Munich), 85748, Garching/Munich, Germany
- Céline Marquet
- School of Computation, Information, and Technology (CIT), Department of Informatics, Bioinformatics and Computational Biology, TUM (Technical University of Munich), 85748, Garching/Munich, Germany.
- Michael Heinzinger
- School of Computation, Information, and Technology (CIT), Department of Informatics, Bioinformatics and Computational Biology, TUM (Technical University of Munich), 85748, Garching/Munich, Germany
- Burkhard Rost
- School of Computation, Information, and Technology (CIT), Department of Informatics, Bioinformatics and Computational Biology, TUM (Technical University of Munich), 85748, Garching/Munich, Germany
- Institute for Advanced Study (TUM-IAS), Lichtenbergstr. 2a, 85748, Garching/Munich, Germany
- TUM School of Life Sciences Weihenstephan (TUM-WZW), Alte Akademie 8, Freising, Germany
15
Ghosh D, Biswas A, Radhakrishna M. Advanced computational approaches to understand protein aggregation. BIOPHYSICS REVIEWS 2024; 5:021302. [PMID: 38681860 PMCID: PMC11045254 DOI: 10.1063/5.0180691] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2023] [Accepted: 03/18/2024] [Indexed: 05/01/2024]
Abstract
Protein aggregation is a widespread phenomenon implicated in debilitating diseases like Alzheimer's, Parkinson's, and cataracts, presenting complex hurdles for the field of molecular biology. In this review, we explore the evolving realm of computational methods and bioinformatics tools that have revolutionized our comprehension of protein aggregation. Beginning with a discussion of the multifaceted challenges associated with understanding this process and emphasizing the critical need for precise predictive tools, we highlight how computational techniques have become indispensable for understanding protein aggregation. We focus on molecular simulations, notably molecular dynamics (MD) simulations, spanning from atomistic to coarse-grained levels, which have emerged as pivotal tools in unraveling the complex dynamics governing protein aggregation in diseases such as cataracts, Alzheimer's, and Parkinson's. MD simulations provide microscopic insights into protein interactions and the subtleties of aggregation pathways, with advanced techniques like replica exchange molecular dynamics, Metadynamics (MetaD), and umbrella sampling enhancing our understanding by probing intricate energy landscapes and transition states. We delve into specific applications of MD simulations, elucidating the chaperone mechanism underlying cataract formation using Markov state modeling and the intricate pathways and interactions driving the toxic aggregate formation in Alzheimer's and Parkinson's disease. Transitioning, we highlight how computational techniques, including bioinformatics, sequence analysis, structural data, machine learning algorithms, and artificial intelligence, have become indispensable for predicting protein aggregation propensity and locating aggregation-prone regions within protein sequences.
Throughout our exploration, we underscore the symbiotic relationship between computational approaches and empirical data, which has paved the way for potential therapeutic strategies against protein aggregation-related diseases. In conclusion, this review offers a comprehensive overview of advanced computational methodologies and bioinformatics tools that have catalyzed breakthroughs in unraveling the molecular basis of protein aggregation, with significant implications for clinical interventions, standing at the intersection of computational biology and experimental research.
Affiliation(s)
- Deepshikha Ghosh
- Department of Biological Sciences and Engineering, Indian Institute of Technology (IIT) Gandhinagar, Palaj, Gujarat 382355, India
- Anushka Biswas
- Department of Chemical Engineering, Indian Institute of Technology (IIT) Gandhinagar, Palaj, Gujarat 382355, India
16
Patel RA, Webb MA. Data-Driven Design of Polymer-Based Biomaterials: High-throughput Simulation, Experimentation, and Machine Learning. ACS APPLIED BIO MATERIALS 2024; 7:510-527. [PMID: 36701125 DOI: 10.1021/acsabm.2c00962] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Indexed: 01/27/2023]
Abstract
Polymers, with the capacity to tunably alter properties and response based on manipulation of their chemical characteristics, are attractive components in biomaterials. Nevertheless, their potential as functional materials is also inhibited by their complexity, which complicates rational or brute-force design and realization. In recent years, machine learning has emerged as a useful tool for facilitating materials design via efficient modeling of structure-property relationships in the chemical domain of interest. In this Spotlight, we discuss the emergence of data-driven design of polymers that can be deployed in biomaterials with particular emphasis on complex copolymer systems. We outline recent developments, as well as our own contributions and takeaways, related to high-throughput data generation for polymer systems, methods for surrogate modeling by machine learning, and paradigms for property optimization and design. Throughout this discussion, we highlight key aspects of successful strategies and other considerations that will be relevant to the future design of polymer-based biomaterials with target properties.
Affiliation(s)
- Roshan A Patel
- Department of Chemical and Biological Engineering, Princeton University, Princeton, New Jersey 08540, United States
- Michael A Webb
- Department of Chemical and Biological Engineering, Princeton University, Princeton, New Jersey 08540, United States
17
Pang Y, Liu B. DisoFLAG: accurate prediction of protein intrinsic disorder and its functions using graph-based interaction protein language model. BMC Biol 2024; 22:3. [PMID: 38166858 PMCID: PMC10762911 DOI: 10.1186/s12915-023-01803-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/25/2023] [Accepted: 12/15/2023] [Indexed: 01/05/2024] Open
Abstract
Intrinsically disordered proteins and regions (IDPs/IDRs) are functionally important proteins and regions that exist as highly dynamic conformations under natural physiological conditions. IDPs/IDRs exhibit a broad range of molecular functions, and their functions involve binding interactions with partners and remaining native structural flexibility. The rapid increase in the number of proteins in sequence databases and the diversity of disordered functions challenge existing computational methods for predicting protein intrinsic disorder and disordered functions. A disordered region interacts with different partners to perform multiple functions, and these disordered functions exhibit different dependencies and correlations. In this study, we introduce DisoFLAG, a computational method that leverages a graph-based interaction protein language model (GiPLM) for jointly predicting disorder and its multiple potential functions. GiPLM integrates protein semantic information based on pre-trained protein language models into graph-based interaction units to enhance the correlation of the semantic representation of multiple disordered functions. The DisoFLAG predictor takes amino acid sequences as the only inputs and provides predictions of intrinsic disorder and six disordered functions for proteins, including protein-binding, DNA-binding, RNA-binding, ion-binding, lipid-binding, and flexible linker. We evaluated the predictive performance of DisoFLAG following the Critical Assessment of protein Intrinsic Disorder (CAID) experiments, and the results demonstrated that DisoFLAG offers accurate and comprehensive predictions of disordered functions, extending the current coverage of computationally predicted disordered function categories. The standalone package and web server of DisoFLAG have been established to provide accurate prediction tools for intrinsic disorders and their associated functions.
Affiliation(s)
- Yihe Pang
- School of Computer Science and Technology, Beijing Institute of Technology, No. 5, South Zhongguancun Street, Beijing, Haidian District, 100081, China
- Bin Liu
- School of Computer Science and Technology, Beijing Institute of Technology, No. 5, South Zhongguancun Street, Beijing, Haidian District, 100081, China.
- Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, No. 5, South Zhongguancun Street, Beijing, Haidian District, 100081, China.
18
Krishnan D, Babu S, Raju R, Veettil MV, Prasad TSK, Abhinand CS. Epstein-Barr Virus: Human Interactome Reveals New Molecular Insights into Viral Pathogenesis for Potential Therapeutics and Antiviral Drug Discovery. OMICS : A JOURNAL OF INTEGRATIVE BIOLOGY 2024; 28:32-44. [PMID: 38190109 DOI: 10.1089/omi.2023.0241] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/09/2024]
Abstract
Host-virus Protein-Protein Interactions (PPIs) play pivotal roles in biological processes crucial for viral pathogenesis and by extension, inform antiviral drug discovery and therapeutics innovations. Despite efforts to develop the Epstein-Barr virus (EBV)-host PPI network, there remain significant knowledge gaps and a limited number of interacting human proteins deciphered. Furthermore, understanding the dynamics of the EBV-host PPI network in the distinct lytic and latent viral stages remains elusive. In this study, we report a comprehensive map of the EBV-human protein interactions, encompassing 1752 human and 61 EBV proteins by integrating data from the public repository HPIDB (v3.0) as well as curated high-throughput proteomic data from the literature. To address the stage-specific nature of EBV infection, we generated two detailed subset networks representing the latent and lytic stages, comprising 747 and 481 human proteins, respectively. Functional and pathway enrichment analysis of these subsets uncovered the profound impact of EBV proteins on cancer. The identification of highly connected proteins and the characterization of intrinsically disordered and cancer-related proteins provide valuable insights into potential therapeutic targets. Moreover, the exploration of drug-protein interactions revealed notable associations between hub proteins and anticancer drugs, offering novel perspectives for controlling EBV pathogenesis. This study represents, to the best of our knowledge, the first comprehensive investigation of the two distinct stages of EBV infection using high-throughput datasets. This makes a contribution to our understanding of EBV-host interactions and provides a foundation for future drug discovery and therapeutic interventions.
Affiliation(s)
- Deepak Krishnan
- Centre for Systems Biology and Molecular Medicine (CSBMM), Yenepoya Research Centre (YRC), Yenepoya (Deemed to be University), Mangalore, India
- Sreeranjini Babu
- Centre for Systems Biology and Molecular Medicine (CSBMM), Yenepoya Research Centre (YRC), Yenepoya (Deemed to be University), Mangalore, India
- Rajesh Raju
- Centre for Integrative Omics Data Science (CIODS), Yenepoya (Deemed to be University), Mangalore, India
- Chandran S Abhinand
- Centre for Systems Biology and Molecular Medicine (CSBMM), Yenepoya Research Centre (YRC), Yenepoya (Deemed to be University), Mangalore, India
19
Longfield SF, Mollazade M, Wallis TP, Gormal RS, Joensuu M, Wark JR, van Waardenberg AJ, Small C, Graham ME, Meunier FA, Martínez-Mármol R. Tau forms synaptic nano-biomolecular condensates controlling the dynamic clustering of recycling synaptic vesicles. Nat Commun 2023; 14:7277. [PMID: 37949856 PMCID: PMC10638352 DOI: 10.1038/s41467-023-43130-4] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 07/02/2022] [Accepted: 11/01/2023] [Indexed: 11/12/2023] Open
Abstract
Neuronal communication relies on the release of neurotransmitters from various populations of synaptic vesicles. Despite displaying vastly different release probabilities and mobilities, the reserve and recycling pool of vesicles co-exist within a single cluster suggesting that small synaptic biomolecular condensates could regulate their nanoscale distribution. Here, we performed a large-scale activity-dependent phosphoproteome analysis of hippocampal neurons in vitro and identified Tau as a highly phosphorylated and disordered candidate protein. Single-molecule super-resolution microscopy revealed that Tau undergoes liquid-liquid phase separation to generate presynaptic nanoclusters whose density and number are regulated by activity. This activity-dependent diffusion process allows Tau to translocate into the presynapse where it forms biomolecular condensates, to selectively control the mobility of recycling vesicles. Tau, therefore, forms presynaptic nano-biomolecular condensates that regulate the nanoscale organization of synaptic vesicles in an activity-dependent manner.
Affiliation(s)
- Shanley F Longfield
- Clem Jones Centre for Ageing Dementia Research (CJCADR), Queensland Brain Institute (QBI), The University of Queensland; St Lucia Campus, Brisbane, QLD, 4072, Australia
- Mahdie Mollazade
- Clem Jones Centre for Ageing Dementia Research (CJCADR), Queensland Brain Institute (QBI), The University of Queensland; St Lucia Campus, Brisbane, QLD, 4072, Australia
- Tristan P Wallis
- Clem Jones Centre for Ageing Dementia Research (CJCADR), Queensland Brain Institute (QBI), The University of Queensland; St Lucia Campus, Brisbane, QLD, 4072, Australia
- Rachel S Gormal
- Clem Jones Centre for Ageing Dementia Research (CJCADR), Queensland Brain Institute (QBI), The University of Queensland; St Lucia Campus, Brisbane, QLD, 4072, Australia
- Merja Joensuu
- Clem Jones Centre for Ageing Dementia Research (CJCADR), Queensland Brain Institute (QBI), The University of Queensland; St Lucia Campus, Brisbane, QLD, 4072, Australia
- Australian Institute for Bioengineering and Nanotechnology, The University of Queensland; St Lucia Campus, Brisbane, QLD, 4072, Australia
- Jesse R Wark
- Synapse Proteomics, Children's Medical Research Institute (CMRI), The University of Sydney, 214 Hawkesbury Road, Westmead, NSW, 2145, Australia
- Christopher Small
- Clem Jones Centre for Ageing Dementia Research (CJCADR), Queensland Brain Institute (QBI), The University of Queensland; St Lucia Campus, Brisbane, QLD, 4072, Australia
- Mark E Graham
- Synapse Proteomics, Children's Medical Research Institute (CMRI), The University of Sydney, 214 Hawkesbury Road, Westmead, NSW, 2145, Australia
- Frédéric A Meunier
- Clem Jones Centre for Ageing Dementia Research (CJCADR), Queensland Brain Institute (QBI), The University of Queensland; St Lucia Campus, Brisbane, QLD, 4072, Australia.
- School of Biomedical Science, The University of Queensland; St Lucia Campus, Brisbane, QLD, 4072, Australia.
- Ramón Martínez-Mármol
- Clem Jones Centre for Ageing Dementia Research (CJCADR), Queensland Brain Institute (QBI), The University of Queensland; St Lucia Campus, Brisbane, QLD, 4072, Australia.
20
Kurgan L, Hu G, Wang K, Ghadermarzi S, Zhao B, Malhis N, Erdős G, Gsponer J, Uversky VN, Dosztányi Z. Tutorial: a guide for the selection of fast and accurate computational tools for the prediction of intrinsic disorder in proteins. Nat Protoc 2023; 18:3157-3172. [PMID: 37740110 DOI: 10.1038/s41596-023-00876-x] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 12/13/2022] [Accepted: 06/21/2023] [Indexed: 09/24/2023]
Abstract
Intrinsic disorder is instrumental for a wide range of protein functions, and its analysis, using computational predictions from primary structures, complements secondary and tertiary structure-based approaches. In this Tutorial, we provide an overview and comparison of 23 publicly available computational tools with complementary parameters useful for intrinsic disorder prediction, partly relying on results from the Critical Assessment of protein Intrinsic Disorder prediction experiment. We consider factors such as accuracy, runtime, availability and the need for functional insights. The selected tools are available as web servers and downloadable programs, offer state-of-the-art predictions and can be used in a high-throughput manner. We provide examples and instructions for the selected tools to illustrate practical aspects related to the submission, collection and interpretation of predictions, as well as the timing and their limitations. We highlight two predictors for intrinsically disordered proteins, flDPnn as accurate and fast and IUPred as very fast and moderately accurate, while suggesting ANCHOR2 and MoRFchibi as two of the best-performing predictors for intrinsically disordered region binding. We link these tools to additional resources, including databases of predictions and web servers that integrate multiple predictive methods. Altogether, this Tutorial provides a hands-on guide to comparatively evaluating multiple predictors, submitting and collecting their own predictions, and reading and interpreting results. It is suitable for experimentalists and computational biologists interested in accurately and conveniently identifying intrinsic disorder, facilitating the functional characterization of the rapidly growing collections of protein sequences.
Affiliation(s)
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA.
- Gang Hu
- School of Statistics and Data Science, LPMC and KLMDASR, Nankai University, Tianjin, China
- Kui Wang
- School of Statistics and Data Science, LPMC and KLMDASR, Nankai University, Tianjin, China
- Sina Ghadermarzi
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
- Bi Zhao
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
- Nawar Malhis
- Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia, Canada
- Gábor Erdős
- MTA-ELTE Momentum Bioinformatics Research Group, Department of Biochemistry, Eötvös Loránd University, Budapest, Hungary
- Jörg Gsponer
- Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia, Canada.
- Vladimir N Uversky
- Department of Molecular Medicine, Morsani College of Medicine, University of South Florida, Tampa, FL, USA.
- Byrd Alzheimer's Center and Research Institute, Morsani College of Medicine, University of South Florida, Tampa, FL, USA.
- Zsuzsanna Dosztányi
- MTA-ELTE Momentum Bioinformatics Research Group, Department of Biochemistry, Eötvös Loránd University, Budapest, Hungary.
21
Jeffery CJ. Current successes and remaining challenges in protein function prediction. FRONTIERS IN BIOINFORMATICS 2023; 3:1222182. [PMID: 37576715 PMCID: PMC10415035 DOI: 10.3389/fbinf.2023.1222182] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 05/13/2023] [Accepted: 07/03/2023] [Indexed: 08/15/2023] Open
Abstract
In recent years, improvements in protein function prediction methods have led to increased success in annotating protein sequences. However, the functions of over 30% of protein-coding genes remain unknown for many sequenced genomes. Protein functions vary widely, from catalyzing chemical reactions to binding DNA or RNA or forming structures in the cell, and some types of functions are challenging to predict due to the physical features associated with those functions. Other complications in understanding protein functions arise due to the fact that many proteins have more than one function or very small differences in sequence or structure that correspond to different functions. We will discuss some of the recent developments in predicting protein functions and some of the remaining challenges.
Affiliation(s)
- Constance J. Jeffery
- Department of Biological Sciences, University of Illinois at Chicago, Chicago, IL, United States
22
Zhang Y, Liu X, Chen J. Re-Balancing Replica Exchange with Solute Tempering for Sampling Dynamic Protein Conformations. J Chem Theory Comput 2023; 19:1602-1614. [PMID: 36791464 PMCID: PMC10795075 DOI: 10.1021/acs.jctc.2c01139] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 02/17/2023]
Abstract
Replica exchange with solute tempering (REST) is a highly effective variant of replica exchange for enhanced sampling in explicit solvent simulations of biomolecules. By scaling the Hamiltonian for a selected "solute" region of the system, REST effectively applies tempering only to the degrees of freedom of interest but not the rest of the system ("solvent"), allowing fewer replicas for covering the same temperature range. A key consideration of REST is how the solute-solvent interactions are scaled together with the solute-solute interactions. Here, we critically evaluate the performance of the latest REST2 protocol for sampling large-scale conformation fluctuations of intrinsically disordered proteins (IDPs). The results show that REST2 promotes artificial protein conformational collapse at high effective temperatures, which seems to be a designed feature originally to promote the sampling of reversible folding of small proteins. The collapse is particularly severe with larger IDPs, leading to replica segregation in the effective temperature space and hindering effective sampling of large-scale conformational changes. We propose that the scaling of the solute-solvent interactions can be treated as free parameters in REST, which can be tuned to control the solute conformational properties (e.g., chain expansion) at different effective temperatures and achieve more effective sampling. To this end, we derive a new REST3 protocol, where the strengths of the solute-solvent van der Waals interactions are recalibrated to reproduce the levels of protein chain expansion at high effective temperatures. The efficiency of REST3 is examined using two IDPs with nontrivial local and long-range structural features, including the p53 N-terminal domain and the kinase inducible transactivation domain of transcription factor CREB. 
The results suggest that REST3 leads to a much more efficient temperature random walk and improved sampling efficiency, which also further reduces the number of replicas required. Nonetheless, our analysis also reveals significant challenges of relying on tempering alone for sampling large-scale conformational fluctuations of disordered proteins. It is likely that more efficient sampling protocols will require incorporating more sophisticated Hamiltonian replica exchange schemes in addition to tempering.
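The Hamiltonian scaling described in this abstract can be illustrated with the commonly cited REST2 energy function, in which solute-solute terms are scaled by β_m/β_0 and solute-solvent terms by its square root. The sketch below is an assumption-laden illustration, not the authors' implementation: the function name, the `pw_scale` argument, and the energy values are hypothetical, and `pw_scale` only illustrates that the solute-solvent factor can be treated as a tunable parameter. It does not reproduce the REST3 calibration.

```python
import math


def rest_effective_energy(e_pp, e_pw, e_ww, t0, tm, pw_scale=None):
    """Effective potential energy of a replica at effective temperature tm.

    e_pp: solute-solute energy, e_pw: solute-solvent energy,
    e_ww: solvent-solvent energy, t0: temperature of the unscaled replica.
    """
    lam = t0 / tm  # equals beta_m / beta_0
    if pw_scale is None:
        # REST2-style choice (as commonly stated): sqrt(beta_m / beta_0).
        pw_scale = math.sqrt(lam)
    # Solvent-solvent interactions are left unscaled in REST.
    return lam * e_pp + pw_scale * e_pw + e_ww


# Hypothetical energies (arbitrary units), for illustration only.
e_base = rest_effective_energy(-100.0, -200.0, -1000.0, t0=300.0, tm=300.0)
e_hot = rest_effective_energy(-100.0, -200.0, -1000.0, t0=300.0, tm=600.0)
print(e_base, e_hot)
```

At tm equal to t0 the function returns the unscaled total energy, so the lowest replica samples the physical ensemble; at higher effective temperatures only the solute-related terms are weakened.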
Affiliation(s)
- Yumeng Zhang
- Department of Chemistry, University of Massachusetts, Amherst, MA 01003, USA
- Xiaorong Liu
- Corresponding Authors: (XL), (JC), Phone: (413) 545-3386 (JC)
- Jianhan Chen
- Department of Chemistry, University of Massachusetts, Amherst, MA 01003, USA
23
Kouros CE, Makri V, Ouzounis CA, Chasapi A. Disease association and comparative genomics of compositional bias in human proteins. F1000Res 2023; 12:198. [PMID: 37082000 PMCID: PMC10111144 DOI: 10.12688/f1000research.129929.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 02/02/2023] [Indexed: 02/22/2023] Open
Abstract
Background: The evolutionary rate of disordered proteins varies greatly due to the lack of structural constraints. So far, few studies have investigated the presence/absence patterns of intrinsically disordered regions (IDRs) across phylogenies in conjunction with human disease. In this study, we report a genome-wide analysis of compositional bias association with disease in human proteins and their taxonomic distribution. Methods: The human genome protein set provided by the Ensembl database was annotated and analysed with respect to both disease associations and the detection of compositional bias. The Uniprot Reference Proteome dataset, containing 11297 proteomes, was used as the target dataset for the comparative genomics of a well-defined subset of the Human Genome, including 100 characteristic, compositionally biased proteins, some linked to disease. Results: Cross-evaluation of compositional bias and disease-association in the human genome reveals a significant bias towards low complexity regions in disease-associated genes, with charged, hydrophilic amino acids appearing as over-represented. The phylogenetic profiling of 17 disease-associated, low complexity proteins across 11297 proteomes captures characteristic taxonomic distribution patterns. Conclusions: This is the first time that a combined genome-wide analysis of low complexity, disease-association and taxonomic distribution of human proteins is reported, covering structural, functional, and evolutionary properties. The reported framework can form the basis for large-scale, follow-up projects, encompassing the entire human genome and all known gene-disease associations.
Affiliation(s)
- Christos E. Kouros
- BCCB-AIIA, School of Informatics, Aristotle University of Thessaloniki, Thessaloniki, Greece
- Vasiliki Makri
- BCCB-AIIA, School of Informatics, Aristotle University of Thessaloniki, Thessaloniki, Greece
- Christos A. Ouzounis
- BCCB-AIIA, School of Informatics, Aristotle University of Thessaloniki, Thessaloniki, Greece
- BCPL, Chemical Process & Energy Resources Institute, Centre for Research & Technology Hellas (CERTH), Thessaloniki, Greece
- Anastasia Chasapi
- BCPL, Chemical Process & Energy Resources Institute, Centre for Research & Technology Hellas (CERTH), Thessaloniki, Greece
24
Kouros CE, Makri V, Ouzounis CA, Chasapi A. Disease association and comparative genomics of compositional bias in human proteins. F1000Res 2023; 12:198. [PMID: 37082000 PMCID: PMC10111144 DOI: 10.12688/f1000research.129929.2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 04/12/2023] [Indexed: 04/25/2023] Open
Abstract
Background: The evolutionary rate of disordered protein regions varies greatly due to the lack of structural constraints. So far, few studies have investigated the presence/absence patterns of compositional bias, indicative of disorder, across phylogenies in conjunction with human disease. In this study, we report a genome-wide analysis of compositional bias association with disease in human proteins and their taxonomic distribution. Methods: The human genome protein set provided by the Ensembl database was annotated and analysed with respect to both disease associations and the detection of compositional bias. The Uniprot Reference Proteome dataset, containing 11297 proteomes, was used as the target dataset for the comparative genomics of a well-defined subset of the Human Genome, including 100 characteristic, compositionally biased proteins, some linked to disease. Results: Cross-evaluation of compositional bias and disease-association in the human genome reveals a significant bias towards biased regions in disease-associated genes, with charged, hydrophilic amino acids appearing as over-represented. The phylogenetic profiling of 17 disease-associated proteins with compositional bias across 11297 proteomes captures characteristic taxonomic distribution patterns. Conclusions: This is the first time that a combined genome-wide analysis of compositional bias, disease-association and taxonomic distribution of human proteins is reported, covering structural, functional, and evolutionary properties. The reported framework can form the basis for large-scale, follow-up projects, encompassing the entire human genome and all known gene-disease associations.
Affiliation(s)
- Christos E. Kouros
  - BCCB-AIIA, School of Informatics, Aristotle University of Thessaloniki, Thessaloniki, Greece
- Vasiliki Makri
  - BCCB-AIIA, School of Informatics, Aristotle University of Thessaloniki, Thessaloniki, Greece
- Christos A. Ouzounis
  - BCCB-AIIA, School of Informatics, Aristotle University of Thessaloniki, Thessaloniki, Greece
  - BCPL, Chemical Process & Energy Resources Institute, Centre for Research & Technology Hellas (CERTH), Thessaloniki, Greece
- Anastasia Chasapi
  - BCPL, Chemical Process & Energy Resources Institute, Centre for Research & Technology Hellas (CERTH), Thessaloniki, Greece
|
25
|
Gingerich MA, Liu X, Chai B, Pearson GL, Vincent MP, Stromer T, Zhu J, Sidarala V, Renberg A, Sahu D, Klionsky DJ, Schnell S, Soleimanpour SA. An intrinsically disordered protein region encoded by the human disease gene CLEC16A regulates mitophagy. Autophagy 2023; 19:525-543. [PMID: 35604110 PMCID: PMC9851259 DOI: 10.1080/15548627.2022.2080383] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 01/22/2023] Open
Abstract
CLEC16A regulates mitochondrial health through mitophagy and is associated with over 20 human diseases. However, the key structural and functional regions of CLEC16A, and their relevance for human disease, remain unknown. Here, we report that a disease-associated CLEC16A variant lacks a C-terminal intrinsically disordered protein region (IDPR) that is critical for mitochondrial quality control. IDPRs comprise nearly half of the human proteome, yet their mechanistic roles in human disease are poorly understood. Using carbon-detect NMR, we find that the CLEC16A C terminus lacks secondary structure, validating the presence of an IDPR. Loss of the CLEC16A C-terminal IDPR in vivo impairs mitophagy, mitochondrial function, and glucose-stimulated insulin secretion, ultimately causing glucose intolerance. Deletion of the CLEC16A C-terminal IDPR increases CLEC16A ubiquitination and degradation, thus impairing assembly of the mitophagy regulatory machinery. Importantly, CLEC16A stability is dependent on proline bias within the C-terminal IDPR, but not amino acid sequence order or charge. Together, we elucidate how an IDPR in CLEC16A regulates mitophagy and implicate pathogenic human gene variants that disrupt IDPRs as novel contributors to diabetes and other CLEC16A-associated diseases. Abbreviations: CAS: carbon-detect amino-acid specific; IDPR: intrinsically disordered protein region; MEFs: mouse embryonic fibroblasts; NMR: nuclear magnetic resonance.
Affiliation(s)
- Morgan A. Gingerich
  - Department of Internal Medicine and Division of Metabolism, Endocrinology & Diabetes, University of Michigan, Ann Arbor, MI, USA
  - Program in Cellular and Molecular Biology, University of Michigan, Ann Arbor, MI, USA
- Xueying Liu
  - Department of Internal Medicine and Division of Metabolism, Endocrinology & Diabetes, University of Michigan, Ann Arbor, MI, USA
  - Department of Cardiology, Fuwai Hospital, National Center for Cardiovascular Diseases, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Biaoxin Chai
  - Department of Internal Medicine and Division of Metabolism, Endocrinology & Diabetes, University of Michigan, Ann Arbor, MI, USA
- Gemma L. Pearson
  - Department of Internal Medicine and Division of Metabolism, Endocrinology & Diabetes, University of Michigan, Ann Arbor, MI, USA
- Michael P. Vincent
  - Department of Molecular and Integrative Physiology, University of Michigan, Ann Arbor, MI, USA
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Tracy Stromer
  - Department of Internal Medicine and Division of Metabolism, Endocrinology & Diabetes, University of Michigan, Ann Arbor, MI, USA
- Jie Zhu
  - Department of Internal Medicine and Division of Metabolism, Endocrinology & Diabetes, University of Michigan, Ann Arbor, MI, USA
- Vaibhav Sidarala
  - Department of Internal Medicine and Division of Metabolism, Endocrinology & Diabetes, University of Michigan, Ann Arbor, MI, USA
- Aaron Renberg
  - Department of Internal Medicine and Division of Metabolism, Endocrinology & Diabetes, University of Michigan, Ann Arbor, MI, USA
- Debashish Sahu
  - BioNMR Core Facility, Life Sciences Institute, University of Michigan, Ann Arbor, MI, USA
- Daniel J. Klionsky
  - Life Sciences Institute and Department of Molecular, Cellular, and Developmental Biology, University of Michigan, Ann Arbor, MI, USA
- Santiago Schnell
  - Department of Molecular and Integrative Physiology, University of Michigan, Ann Arbor, MI, USA
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Scott A. Soleimanpour
  - Department of Internal Medicine and Division of Metabolism, Endocrinology & Diabetes, University of Michigan, Ann Arbor, MI, USA
  - Department of Molecular and Integrative Physiology, University of Michigan, Ann Arbor, MI, USA
  - Medicine Service, Endocrinology and Metabolism Section, VA Ann Arbor Health Care System, Ann Arbor, MI, USA
  - Contact: Scott A. Soleimanpour, Department of Internal Medicine and Division of Metabolism, Endocrinology & Diabetes, University of Michigan, Wall Street, Brehm Tower Room, Ann Arbor, MI, USA
|
26
|
Helmy M, Selvarajoo K. Application of GeneCloudOmics: Transcriptomic Data Analytics for Synthetic Biology. Methods Mol Biol 2023; 2553:221-263. [PMID: 36227547 DOI: 10.1007/978-1-0716-2617-7_12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]
Abstract
Research in synthetic biology and metabolic engineering requires a deep understanding of the function and regulation of complex pathway genes. This can be achieved through gene expression profiling, which quantifies the transcriptome-wide expression under any condition, such as a cell development stage, mutant, disease, or treatment with a drug. Expression profiling is usually done using high-throughput techniques such as RNA sequencing (RNA-Seq) or microarray. Although the two methods are based on different technical approaches, both provide quantitative measures of the expression levels of thousands of genes. The expression levels of the genes are compared under different conditions to identify the differentially expressed genes (DEGs), the genes with different expression levels under different conditions. DEGs, usually numbering in the thousands, are then investigated using bioinformatics and data analytic tools to infer and compare their functional roles between conditions. Dealing with such large datasets, therefore, requires intensive data processing and analyses to ensure their quality and produce results that are statistically sound. Thus, there is a need for deep statistical and bioinformatics knowledge to deal with high-throughput gene expression data. This represents a barrier for wet-lab biologists with limited computational, programming, and data analytic skills, preventing them from realizing the full potential of the data. In this chapter, we present a step-by-step protocol to perform transcriptome analysis using GeneCloudOmics, a cloud-based web server that provides an end-to-end platform for high-throughput gene expression analysis.
Affiliation(s)
- Mohamed Helmy
  - Bioinformatics Institute, Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
  - Department of Computer Science, Lakehead University, Thunder Bay, ON, Canada
- Kumar Selvarajoo
  - Bioinformatics Institute, Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
  - Singapore Institute of Food and Biotechnology Innovation, Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
  - NUS Synthetic Biology for Clinical and Technological Innovation (SynCTI), National University of Singapore, Singapore, Singapore
  - School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
|
27
|
Babaian A, Edgar R. Ribovirus classification by a polymerase barcode sequence. PeerJ 2022; 10:e14055. [PMID: 36258794 PMCID: PMC9573346 DOI: 10.7717/peerj.14055] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 04/26/2022] [Accepted: 08/24/2022] [Indexed: 01/19/2023] Open
Abstract
RNA viruses encoding a polymerase gene (riboviruses) dominate the known eukaryotic virome. High-throughput sequencing is revealing a wealth of new riboviruses known only from sequence, precluding classification by traditional taxonomic methods. Sequence classification is often based on polymerase sequences, but standardised methods to support this approach are currently lacking. To address this need, we describe the polymerase palmprint, a segment of the palm sub-domain robustly delineated by well-conserved catalytic motifs. We present an algorithm, Palmscan, which identifies palmprints in nucleotide and amino acid sequences; PALMdb, a collection of palmprints derived from public sequence databases; and palmID, a public website implementing palmprint identification, search, and annotation. Together, these methods demonstrate a proof-of-concept workflow for high-throughput characterisation of RNA viruses, paving the path for the continued rapid growth in RNA virus discovery anticipated in the coming decade.
Affiliation(s)
- Artem Babaian
  - St Edmunds College, Cambridge, United Kingdom
  - Department of Haematology, University of Cambridge, Cambridge, United Kingdom
- Robert Edgar
  - Corte Madera, California, United States of America
|
28
|
Roterman I, Stapor K, Fabian P, Konieczny L. New insights into disordered proteins and regions according to the FOD-M model. PLoS One 2022; 17:e0275300. [PMID: 36215254 PMCID: PMC9550084 DOI: 10.1371/journal.pone.0275300] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2022] [Accepted: 09/13/2022] [Indexed: 11/18/2022] Open
Abstract
A collection of intrinsically disordered proteins (IDPs) having regions with the status of intrinsically disordered (IDR) according to the DisProt database was analyzed from the point of view of the structure of the hydrophobic core in the structural unit (chain/domain). The analysis includes all the Homo sapiens as well as Mus musculus proteins present in the DisProt database for which the structure is available. In the analysis, the fuzzy oil drop modified model (FOD-M) was used, taking into account the external force field, modified by the presence of other factors apart from polar water, influencing protein structuring. The paper presents an alternative to secondary-structure-based classification of intrinsically disordered regions (IDRs). The basis of our classification is the ordering of the hydrophobic core as calculated by the FOD-M model, resulting in FOD-ordered or FOD-unordered IDRs.
Affiliation(s)
- Irena Roterman
  - Department of Bioinformatics and Telemedicine, Jagiellonian University, Medical College, Kraków, Poland
- Katarzyna Stapor
  - Faculty of Automatic, Department of Applied Informatics, Electronics and Computer Science, Silesian University of Technology, Gliwice, Poland
- Piotr Fabian
  - Faculty of Automatic, Electronics and Computer Science, Department of Algorithmics and Software, Silesian University of Technology, Gliwice, Poland
- Leszek Konieczny
  - Chair of Medical Biochemistry, Jagiellonian University, Medical College, Kraków, Poland
|
29
|
Roesgaard MA, Lundsgaard JE, Newcombe EA, Jacobsen NL, Pesce F, Tranchant EE, Lindemose S, Prestel A, Hartmann-Petersen R, Lindorff-Larsen K, Kragelund BB. Deciphering the Alphabet of Disorder-Glu and Asp Act Differently on Local but Not Global Properties. Biomolecules 2022; 12:1426. [PMID: 36291634 PMCID: PMC9599281 DOI: 10.3390/biom12101426] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 08/16/2022] [Revised: 09/21/2022] [Accepted: 09/28/2022] [Indexed: 12/29/2022] Open
Abstract
Compared to folded proteins, the sequences of intrinsically disordered proteins (IDPs) are enriched in polar and charged amino acids. Glutamate is one of the most enriched amino acids in IDPs, while the chemically similar amino acid aspartate is less enriched. So far, the underlying functional differences between glutamates and aspartates in IDPs remain poorly understood. In this study, we examine the differential effects of aspartate and glutamates in IDPs by comparing the function and conformational ensemble of glutamate and aspartate variants of the disordered protein Dss1, using a range of assays, including interaction studies, nuclear magnetic resonance spectroscopy, small-angle X-ray scattering and molecular dynamics simulation. First, we analyze the sequences of the rapidly growing database of experimentally verified IDPs (DisProt) and show that glutamate enrichment is not caused by a taxonomy bias in IDPs. From analyses of local and global structural properties as well as cell growth and protein-protein interactions using a model acidic IDP from yeast and three Glu/Asp variants, we find that while the Glu/Asp variants support similar function and global dimensions, the variants differ in their binding affinities and population of local transient structural elements. We speculate that these local structural differences may play roles in functional diversity, where glutamates can support increased helicity, important for folding and binding, while aspartates support extended structures and form helical caps, as well as playing more relevant roles in, e.g., transactivation domains and ion-binding.
|
30
|
Avery C, Patterson J, Grear T, Frater T, Jacobs DJ. Protein Function Analysis through Machine Learning. Biomolecules 2022; 12:1246. [PMID: 36139085 PMCID: PMC9496392 DOI: 10.3390/biom12091246] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 07/16/2022] [Revised: 08/22/2022] [Accepted: 08/31/2022] [Indexed: 11/16/2022] Open
Abstract
Machine learning (ML) has been an important part of the computational biology arsenal for elucidating protein function for decades. With the recent burgeoning of novel ML methods and applications, new ML approaches have been incorporated into many areas of computational biology dealing with protein function. We examine how ML has been integrated into a wide range of computational models to improve prediction accuracy and gain a better understanding of protein function. The applications discussed are protein structure prediction, protein engineering using sequence modifications to achieve stability and druggability characteristics, molecular docking in terms of protein-ligand binding, including allosteric effects, protein-protein interactions and protein-centric drug discovery. To quantify the mechanisms underlying protein function, a holistic approach that takes structure, flexibility, stability, and dynamics into account is required, as these aspects become inseparable through their interdependence. Another key component of protein function is conformational dynamics, which often manifest as protein kinetics. Computational methods that use ML to generate representative conformational ensembles and quantify differences in conformational ensembles important for function are included in this review. Future opportunities are highlighted for each of these topics.
Affiliation(s)
- Chris Avery
  - Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
- John Patterson
  - Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
- Tyler Grear
  - Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
  - Department of Physics and Optical Science, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
- Theodore Frater
  - Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
- Donald J. Jacobs
  - Department of Physics and Optical Science, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
|
31
|
Horn JM, Zhu Y, Ahn SY, Obermeyer AC. Self-assembly of globular proteins with intrinsically disordered protein polyelectrolytes and block copolymers. Soft Matter 2022; 18:5759-5769. [PMID: 35912826 PMCID: PMC9446422 DOI: 10.1039/d2sm00415a] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 06/12/2023]
Abstract
Intrinsically disordered polypeptides are a versatile class of materials, combining the biocompatibility of peptides with the disordered structure and diverse phase behaviors of synthetic polymers. Synthetic polyelectrolytes are capable of complex phase behavior when mixed with oppositely charged polyelectrolytes, facilitating nanoparticle formation and bulk phase separation. However, there has been limited exploration of intrinsically disordered protein polyelectrolytes as potential bio-based replacements for synthetic polyelectrolytes. Here, we produce negatively charged, intrinsically disordered polypeptides capable of high-yield expression in E. coli, and use these intrinsically disordered peptides to produce entirely protein-based polyelectrolyte complexes. The complexes display rich phase behavior, showing sensitivity to charge density, salt concentration, temperature, and charge fraction. We characterize this behavior through a combination of turbidity assays, dynamic light scattering, and transmission electron microscopy. The robust expression profile and stimuli-responsive phase behavior of the intrinsically disordered peptides demonstrate their potential as easily producible, biocompatible substitutes for synthetic polyelectrolytes.
Affiliation(s)
- Justin M Horn
  - Department of Chemical Engineering, Columbia University, New York, NY 10027, USA
- Yuncan Zhu
  - Department of Chemical Engineering, Columbia University, New York, NY 10027, USA
- So Yeon Ahn
  - Department of Chemical Engineering, Columbia University, New York, NY 10027, USA
- Allie C Obermeyer
  - Department of Chemical Engineering, Columbia University, New York, NY 10027, USA
|
32
|
Roca-Martinez J, Lazar T, Gavalda-Garcia J, Bickel D, Pancsa R, Dixit B, Tzavella K, Ramasamy P, Sanchez-Fornaris M, Grau I, Vranken WF. Challenges in describing the conformation and dynamics of proteins with ambiguous behavior. Front Mol Biosci 2022; 9:959956. [PMID: 35992270 PMCID: PMC9382080 DOI: 10.3389/fmolb.2022.959956] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 06/02/2022] [Accepted: 06/27/2022] [Indexed: 11/13/2022] Open
Abstract
Traditionally, our understanding of how proteins operate and how evolution shapes them is based on two main data sources: the overall protein fold and the protein amino acid sequence. However, a significant part of the proteome shows highly dynamic and/or structurally ambiguous behavior, which cannot be correctly represented by the traditional fixed set of static coordinates. Representing such protein behaviors remains challenging and necessarily involves a complex interpretation of conformational states, including probabilistic descriptions. Relating protein dynamics and multiple conformations to their function as well as their physiological context (e.g., post-translational modifications and subcellular localization), therefore, remains elusive for much of the proteome, with studies to investigate the effect of protein dynamics relying heavily on computational models. We here investigate the possibility of delineating three classes of protein conformational behavior: order, disorder, and ambiguity. These definitions are explored based on three different datasets, using interpretable machine learning from a set of features, from AlphaFold2 to sequence-based predictions, to understand the overlap and differences between these datasets. This forms the basis for a discussion on the current limitations in describing the behavior of dynamic and ambiguous proteins.
Affiliation(s)
- Joel Roca-Martinez
  - Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, Belgium
  - Interuniversity Institute of Bioinformatics in Brussels, VUB/ULB, Brussels, Belgium
- Tamas Lazar
  - Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, Belgium
  - VIB-VUB Center for Structural Biology, Brussels, Belgium
- Jose Gavalda-Garcia
  - Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, Belgium
  - Interuniversity Institute of Bioinformatics in Brussels, VUB/ULB, Brussels, Belgium
- David Bickel
  - Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, Belgium
  - Interuniversity Institute of Bioinformatics in Brussels, VUB/ULB, Brussels, Belgium
- Rita Pancsa
  - Research Centre for Natural Sciences, Institute of Enzymology, Budapest, Hungary
- Bhawna Dixit
  - Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, Belgium
  - Interuniversity Institute of Bioinformatics in Brussels, VUB/ULB, Brussels, Belgium
  - IBiTech-Biommeda, Universiteit Gent, Gent, Belgium
- Konstantina Tzavella
  - Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, Belgium
  - Interuniversity Institute of Bioinformatics in Brussels, VUB/ULB, Brussels, Belgium
- Pathmanaban Ramasamy
  - Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, Belgium
  - Interuniversity Institute of Bioinformatics in Brussels, VUB/ULB, Brussels, Belgium
  - VIB-UGent Center for Medical Biotechnology, Universiteit Gent, Gent, Belgium
- Maite Sanchez-Fornaris
  - Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, Belgium
  - Interuniversity Institute of Bioinformatics in Brussels, VUB/ULB, Brussels, Belgium
  - Department of Computer Sciences, University of Camagüey, Camagüey, Cuba
- Isel Grau
  - Information Systems, Eindhoven University of Technology, Eindhoven, Netherlands
- Wim F. Vranken
  - Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, Belgium
  - Interuniversity Institute of Bioinformatics in Brussels, VUB/ULB, Brussels, Belgium
|
33
|
Ramasamy P, Vandermarliere E, Vranken WF, Martens L. Panoramic Perspective on Human Phosphosites. J Proteome Res 2022; 21:1894-1915. [PMID: 35793420 DOI: 10.1021/acs.jproteome.2c00164] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
Abstract
Protein phosphorylation is the most common reversible post-translational modification of proteins and is key in the regulation of many cellular processes. Due to this importance, phosphorylation is extensively studied, resulting in the availability of a large amount of mass spectrometry-based phospho-proteomics data. Here, we leverage the information in these large-scale phospho-proteomics data sets, as contained in Scop3P, to analyze and characterize proteome-wide protein phosphorylation sites (P-sites). First, we set out to differentiate correctly observed P-sites from false-positive sites using five complementary site properties. We then describe the context of these P-sites in terms of the protein structure, solvent accessibility, structural transitions and disorder, and biophysical properties. We also investigate the relative prevalence of disease-linked mutations on and around P-sites. Moreover, we assess the structural dynamics of P-sites in their phosphorylated and unphosphorylated states. As a result, we show how large-scale reprocessing of available proteomics experiments can enable a more reliable view on proteome-wide P-sites. Furthermore, adding the structural context of proteins around P-sites helps uncover possible conformational switches upon phosphorylation. Moreover, by placing sites in different biophysical contexts, we show the differential preference in protein dynamics at phosphorylated sites when compared to the nonphosphorylated counterparts.
Affiliation(s)
- Pathmanaban Ramasamy
  - VIB-UGent Center for Medical Biotechnology, VIB, 9000 Ghent, Belgium
  - Department of Biomolecular Medicine, Faculty of Health Sciences and Medicine, Ghent University, 9000 Ghent, Belgium
  - Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, 1050 Brussels, Belgium
  - Structural Biology Brussels, Vrije Universiteit Brussel, 1050 Brussels, Belgium
  - Centre for Structural Biology, VIB, 1050 Brussels, Belgium
- Wim F Vranken
  - Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, 1050 Brussels, Belgium
  - Structural Biology Brussels, Vrije Universiteit Brussel, 1050 Brussels, Belgium
  - Centre for Structural Biology, VIB, 1050 Brussels, Belgium
- Lennart Martens
  - VIB-UGent Center for Medical Biotechnology, VIB, 9000 Ghent, Belgium
  - Department of Biomolecular Medicine, Faculty of Health Sciences and Medicine, Ghent University, 9000 Ghent, Belgium
|
34
|
Quaglia F, Salladini E, Carraro M, Minervini G, Tosatto SCE, Le Mercier P. SARS-CoV-2 variants preferentially emerge at intrinsically disordered protein sites helping immune evasion. FEBS J 2022; 289:4240-4250. [PMID: 35108439 PMCID: PMC9542094 DOI: 10.1111/febs.16379] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 11/26/2021] [Revised: 01/21/2022] [Accepted: 01/31/2022] [Indexed: 12/13/2022]
Abstract
The SARS‐CoV‐2 pandemic is maintained by the emergence of successive variants, highlighting the flexibility of the protein sequences of the virus. We show that experimentally determined intrinsically disordered regions (IDRs) are abundant in the SARS‐CoV‐2 viral proteins, making up to 28% of disorder content for the S1 subunit of spike and up to 51% for the nucleoprotein, with the vast majority of mutations occurring in the 13 major variants mapped to these IDRs. Strikingly, antigenic sites are enriched in IDRs, in the receptor‐binding domain (RBD) and in the N‐terminal domain (NTD), suggesting a key role of structural flexibility in the antigenicity of the SARS‐CoV‐2 protein surface. Mutations occurring in the S1 subunit and nucleoprotein (N) IDRs are critical for immune evasion and antibody escape, suggesting potential additional implications for vaccines and monoclonal therapeutic strategies. Overall, this suggests the presence of variable regions on S1 and N protein surfaces, which confer sequence and antigenic flexibility to the virus without altering its protein functions.
Affiliation(s)
- Federica Quaglia
  - Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council (CNR-IBIOM), Bari, Italy
  - Department of Biomedical Sciences, University of Padova, Italy
- Marco Carraro
  - Department of Biomedical Sciences, University of Padova, Italy
- Philippe Le Mercier
  - Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Geneva, Switzerland
|
35
|
Compositional Bias of Intrinsically Disordered Proteins and Regions and Their Predictions. Biomolecules 2022; 12:888. [PMID: 35883444 PMCID: PMC9313023 DOI: 10.3390/biom12070888] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 05/27/2022] [Revised: 06/10/2022] [Accepted: 06/10/2022] [Indexed: 11/17/2022] Open
Abstract
Intrinsically disordered regions (IDRs) carry out many cellular functions and vary in length and placement in protein sequences. This diversity leads to variations in the underlying compositional biases, which were demonstrated for the short vs. long IDRs. We analyze compositional biases across four classes of disorder: fully disordered proteins; short IDRs; long IDRs; and binding IDRs. We identify three distinct biases: for the fully disordered proteins, the short IDRs and the long and binding IDRs combined. We also investigate compositional bias for putative disorder produced by leading disorder predictors and find that it is similar to the bias of the native disorder. Interestingly, the accuracy of disorder predictions across different methods is correlated with the correctness of the compositional bias of their predictions highlighting the importance of the compositional bias. The predictive quality is relatively low for the disorder classes with compositional bias that is the most different from the “generic” disorder bias, while being much higher for the classes with the most similar bias. We discover that different predictors perform best across different classes of disorder. This suggests that no single predictor is universally best and motivates the development of new architectures that combine models that target specific disorder classes.
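The abstract above does not state how compositional bias is quantified. As a minimal illustrative sketch only, one common convention is assumed here: the relative difference between the amino acid fractions of a target set (for example, disordered regions) and those of a reference set. The function names `composition` and `relative_bias`, and the choice of formula, are hypothetical and are not taken from the cited article.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(sequences):
    """Return the fraction of each standard amino acid over all sequences."""
    counts = Counter()
    for seq in sequences:
        # Non-standard symbols (X, gaps, etc.) are ignored.
        counts.update(aa for aa in seq.upper() if aa in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard residues found")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def relative_bias(target, reference):
    """Assumed definition: (f_target - f_reference) / f_reference per residue.

    Positive values mean enrichment in the target set, negative values mean
    depletion. Residues absent from the reference set are skipped to avoid
    division by zero.
    """
    f_t = composition(target)
    f_r = composition(reference)
    return {
        aa: (f_t[aa] - f_r[aa]) / f_r[aa]
        for aa in AMINO_ACIDS
        if f_r[aa] > 0
    }
```

Under this assumed definition, a value of 1.0 for a residue would mean it is twice as frequent in the target set as in the reference set.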
|
36
|
Konkankit C, Rackovsky S. The dynamic basis of structural order in proteins. Proteins 2022; 90:1115-1118. [PMID: 34981860 PMCID: PMC9007817 DOI: 10.1002/prot.26296] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/25/2021] [Revised: 12/26/2021] [Accepted: 12/29/2021] [Indexed: 01/21/2023]
Abstract
We compare the sequences of folded and intrinsically disordered proteins (IDPs), using bioinformatic methods recently developed to study protein dynamic properties. We demonstrate that the two classes of sequences are organized in diametrically opposite ways with respect to long-length-scale dynamic properties. We further demonstrate a statistically significant difference between the amino acid compositions of folded and disordered proteins, which is expressed in dynamic properties. Our results indicate that the long-length-scale properties of sequences are critical in determining whether proteins are able to fold, and, more generally, that they are central to an understanding of protein physics. They further provide a physical basis for the empirically observed differences in amino acid composition between folded and IDPs.
Affiliation(s)
- Chilaluck Konkankit
  - Department of Chemistry and Chemical Biology, Baker Laboratory, Cornell University, Ithaca, New York, USA
- S Rackovsky
  - Department of Chemistry and Chemical Biology, Baker Laboratory, Cornell University, Ithaca, New York, USA
  - Department of Biochemistry and Biophysics, School of Medicine and Dentistry, University of Rochester, Rochester, New York, USA
|
37
|
Wilson CJ, Choy WY, Karttunen M. AlphaFold2: A Role for Disordered Protein/Region Prediction? Int J Mol Sci 2022; 23:4591. [PMID: 35562983 PMCID: PMC9104326 DOI: 10.3390/ijms23094591] [Citation(s) in RCA: 72] [Impact Index Per Article: 24.0] [Received: 03/27/2022] [Revised: 04/18/2022] [Accepted: 04/19/2022] [Indexed: 01/27/2023] Open
Abstract
The development of AlphaFold2 marked a paradigm-shift in the structural biology community. Herein, we assess the ability of AlphaFold2 to predict disordered regions against traditional sequence-based disorder predictors. We find that AlphaFold2 performs well at discriminating disordered regions, but also note that the disorder predictor one constructs from an AlphaFold2 structure determines accuracy. In particular, a naïve, but non-trivial assumption that residues assigned to helices, strands, and H-bond stabilized turns are likely ordered and all other residues are disordered results in a dramatic overestimation in disorder; conversely, the predicted local distance difference test (pLDDT) provides an excellent measure of residue-wise disorder. Furthermore, by employing molecular dynamics (MD) simulations, we note an interesting relationship between the pLDDT and secondary structure, that may explain our observations and suggests a broader application of the pLDDT for characterizing the local dynamics of intrinsically disordered proteins and regions (IDPs/IDRs).
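As a rough illustration of the pLDDT-based idea described in this abstract, the sketch below flags residues as disordered when their pLDDT falls below a cutoff and then groups them into contiguous segments. The cutoff of 70, the function names, and the input values are assumptions made for illustration; they are not taken from the paper.

```python
from typing import List, Tuple


def disorder_from_plddt(plddt: List[float], cutoff: float = 70.0) -> List[bool]:
    """Flag each residue as disordered when its pLDDT is below the cutoff.

    The default cutoff of 70 is an assumption for this sketch, not a value
    reported in the abstract.
    """
    return [score < cutoff for score in plddt]


def disordered_segments(flags: List[bool], min_len: int = 1) -> List[Tuple[int, int]]:
    """Group flagged residues into (start, end) segments, 1-based and inclusive."""
    segments: List[Tuple[int, int]] = []
    start = None  # 0-based index where the current disordered run began
    for i, flag in enumerate(flags):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_len:
                segments.append((start + 1, i))
            start = None
    # Close a run that extends to the last residue.
    if start is not None and len(flags) - start >= min_len:
        segments.append((start + 1, len(flags)))
    return segments


# Hypothetical per-residue pLDDT values for a seven-residue peptide.
example_plddt = [95.0, 90.0, 60.0, 40.0, 45.0, 88.0, 30.0]
example_flags = disorder_from_plddt(example_plddt)
example_segments = disordered_segments(example_flags)
```

In practice the per-residue pLDDT values would be read from the B-factor column of an AlphaFold2 model file; that parsing step is left out here to keep the sketch self-contained.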
Affiliation(s)
- Carter J. Wilson
- Department of Mathematics, The University of Western Ontario, 1151 Richmond Street, London, ON N6A 5B7, Canada
- Centre for Advanced Materials and Biomaterials Research, The University of Western Ontario, 1151 Richmond Street, London, ON N6A 5B7, Canada
- Wing-Yiu Choy
- Department of Biochemistry, The University of Western Ontario, 1151 Richmond Street, London, ON N6A 5C1, Canada
- Mikko Karttunen
- Centre for Advanced Materials and Biomaterials Research, The University of Western Ontario, 1151 Richmond Street, London, ON N6A 5B7, Canada
- Department of Physics and Astronomy, The University of Western Ontario, 1151 Richmond Street, London, ON N6A 5B7, Canada
- Department of Chemistry, The University of Western Ontario, 1151 Richmond Street, London, ON N6A 3K7, Canada
|
38
|
Abstract
In-cell structural biology aims at extracting structural information about proteins or nucleic acids in their native, cellular environment. This emerging field holds great promise and is already providing new facts and outlooks of interest at both fundamental and applied levels. NMR spectroscopy has important contributions on this stage: It brings information on a broad variety of nuclei at the atomic scale, which ensures its great versatility and uniqueness. Here, we detail the methods, the fundamental knowledge, and the applications in biomedical engineering related to in-cell structural biology by NMR. We finally propose a brief overview of the main other techniques in the field (EPR, smFRET, cryo-ET, etc.) to draw some advisable developments for in-cell NMR. In the era of large-scale screenings and deep learning, both accurate and qualitative experimental evidence are as essential as ever to understand the interior life of cells. In-cell structural biology by NMR spectroscopy can generate such a knowledge, and it does so at the atomic scale. This review is meant to deliver comprehensive but accessible information, with advanced technical details and reflections on the methods, the nature of the results, and the future of the field.
Affiliation(s)
- Francois-Xavier Theillet
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198 Gif-sur-Yvette, France
|
39
|
Li H, Pang Y, Liu B, Yu L. MoRF-FUNCpred: Molecular Recognition Feature Function Prediction Based on Multi-Label Learning and Ensemble Learning. Front Pharmacol 2022; 13:856417. [PMID: 35350759 PMCID: PMC8957949 DOI: 10.3389/fphar.2022.856417] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/17/2022] [Accepted: 02/14/2022] [Indexed: 01/13/2023]
Abstract
Intrinsically disordered regions (IDRs), which lack stable structure, are important for protein structure and function. Some IDRs undergo a disorder-to-order transition upon binding to molecular partners; these regions are called molecular recognition features (MoRFs). MoRFs have five main functions: molecular recognition assembler (MoR_assembler), molecular recognition chaperone (MoR_chaperone), molecular recognition display sites (MoR_display_sites), molecular recognition effector (MoR_effector), and molecular recognition scavenger (MoR_scavenger). Research on the functions of MoRFs is important for pharmaceutical development and for understanding disease pathogenesis. However, existing computational methods can only predict the MoRFs in proteins and fail to distinguish their different functions. In this paper, we treat MoRF function prediction as a multi-label learning task and solve it with the Binary Relevance (BR) strategy. We then use Support Vector Machine (SVM), Logistic Regression (LR), Decision Tree (DT), and Random Forest (RF) as base models to construct MoRF-FUNCpred through ensemble learning. Experimental results show that MoRF-FUNCpred performs well for MoRF function prediction. To the best of our knowledge, MoRF-FUNCpred is the first predictor of MoRF functions. Availability and Implementation: The stand-alone package of MoRF-FUNCpred can be accessed from https://github.com/LiangYu-Xidian/MoRF-FUNCpred.
Affiliation(s)
- Haozheng Li
- School of Computer Science and Technology, Xidian University, Xi'an, China
- Yihe Pang
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
- Bin Liu
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
- Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing, China
- Liang Yu
- School of Computer Science and Technology, Xidian University, Xi'an, China
|
40
|
Secretory quality control constrains functional selection-associated protein structure innovation. Commun Biol 2022; 5:268. [PMID: 35338247 PMCID: PMC8956723 DOI: 10.1038/s42003-022-03220-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/14/2021] [Accepted: 03/03/2022] [Indexed: 12/26/2022]
Abstract
Biophysical models suggest a dominant role of structural over functional constraints in shaping protein evolution. Selection on structural constraints is linked closely to expression levels of proteins, which together with structure-associated activities determine in vivo functions of proteins. Here we show that despite the up to two orders of magnitude differences in levels of C-reactive protein (CRP) in distinct species, the in vivo functions of CRP are paradoxically conserved. Such a pronounced level-function mismatch cannot be explained by activities associated with the conserved native structure, but is coupled to hidden activities associated with the unfolded, activated conformation. This is not the result of selection on structural constraints like foldability and stability, but is achieved by folding determinants-mediated functional selection that keeps a confined carrier structure to pass the stringent eukaryotic quality control on secretion. Further analysis suggests a folding threshold model which may partly explain the mismatch between the vast sequence space and the limited structure space of proteins. The mismatch in the conserved structure but different expression levels of C-reactive protein (CRP) in distinct species is reconciled by functional selection on hidden activities of unfolded CRPs.
|
41
|
Zhao J, Wang Z. Identifying Intrinsically Disordered Protein Regions through a Deep Neural Network with Three Novel Sequence Features. Life (Basel) 2022; 12:life12030345. [PMID: 35330096 PMCID: PMC8950681 DOI: 10.3390/life12030345] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2022] [Revised: 02/22/2022] [Accepted: 02/23/2022] [Indexed: 11/26/2022]
Abstract
The fast, reliable, and accurate identification of IDPRs is essential, as in recent years it has come to be recognized more and more that IDPRs have a wide impact on many important physiological processes, such as molecular recognition and molecular assembly, the regulation of transcription and translation, protein phosphorylation, cellular signal transduction, etc. For the sake of cost-effectiveness, it is imperative to develop computational approaches for identifying IDPRs. In this study, a deep neural structure where a variant VGG19 is situated between two MLP networks is developed for identifying IDPRs. Furthermore, for the first time, three novel sequence features—i.e., persistent entropy and the probabilities associated with two and three consecutive amino acids of the protein sequence—are introduced for identifying IDPRs. The simulation results show that our neural structure either performs considerably better than other known methods or, when relying on a much smaller training set, attains a similar performance. Our deep neural structure, which exploits the VGG19 structure, is effective for identifying IDPRs. Furthermore, three novel sequence features—i.e., the persistent entropy and the probabilities associated with two and three consecutive amino acids of the protein sequence—could be used as valuable sequence features in the further development of identifying IDPRs.
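One plausible reading of "the probabilities associated with two and three consecutive amino acids" is the relative frequency of each overlapping dipeptide and tripeptide in a sequence. The sketch below computes those frequencies; the function name, the overlapping-window definition, and the example sequence are assumptions for illustration, and the paper's exact feature definition may differ.

```python
from collections import Counter
from typing import Dict


def kmer_probabilities(sequence: str, k: int) -> Dict[str, float]:
    """Return the relative frequency of each overlapping k-mer in the sequence.

    Assumed definition: count of a k-mer divided by the number of
    overlapping windows of length k.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    total = len(sequence) - k + 1  # number of overlapping windows
    if total <= 0:
        return {}
    counts = Counter(sequence[i:i + k] for i in range(total))
    return {kmer: n / total for kmer, n in counts.items()}


# Hypothetical four-residue sequence used only to exercise the function.
dipeptides = kmer_probabilities("AAGA", 2)
tripeptides = kmer_probabilities("AAGA", 3)
```

Such per-sequence frequencies could then be mapped back onto residues or windows before being passed to a classifier; that step depends on the model architecture and is not sketched here.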
|
42
|
Muronetz VI, Pozdyshev DV, Medvedeva MV, Sevostyanova IA. Potential Effect of Post-Transcriptional Substitutions of Tyrosine for Cysteine Residues on Transformation of Amyloidogenic Proteins. BIOCHEMISTRY. BIOKHIMIIA 2022; 87:170-178. [PMID: 35508908 DOI: 10.1134/s0006297922020080] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/25/2021] [Revised: 01/12/2022] [Accepted: 01/12/2022] [Indexed: 06/14/2023]
Abstract
The review considers the reasons and consequences of post-transcriptional tyrosine substitutions for cysteine residues. Main attention is paid to the Tyr/Cys substitutions that arise during gene expression in bacterial systems at the stage of protein translation as a result of misrecognition of the similar mRNA codons. Notably, translation errors generally occur relatively rarely - from 10⁻⁴ to 10⁻³ errors per codon for E. coli cells, but in some cases the error rate increases significantly. For example, this is typical for certain pairs of codons, when the culture conditions change or in the presence of antibiotics. Thus, with overproduction of the recombinant human alpha-synuclein in E. coli cells, the content of the mutant form with the replacement of Tyr136 (UAC codon) with a cysteine residue (UGC codon) can reach 50%. Possible reasons for the increased production of alpha-synuclein with the Tyr136Cys substitution are considered, as well as consequences of the presence of mutant forms in preparations of amyloidogenic proteins when studying their pathological transformation in vitro. A separate section is devoted to the Tyr/Cys substitutions occurring due to mRNA editing by adenosine deaminases, which is typical for eukaryotic organisms, and the possible role of this process in the amyloid transformation of proteins associated with neurodegenerative diseases.
Affiliation(s)
- Vladimir I Muronetz
- Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, Moscow, 119991, Russia
- Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Moscow, 119991, Russia
- Denis V Pozdyshev
- Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, Moscow, 119991, Russia
- Maria V Medvedeva
- Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Moscow, 119991, Russia
- Irina A Sevostyanova
- Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, Moscow, 119991, Russia
|
43
|
Kjaergaard M. Estimation of Effective Concentrations Enforced by Complex Linker Architectures from Conformational Ensembles. Biochemistry 2022; 61:171-182. [DOI: 10.1021/acs.biochem.1c00737] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 12/30/2022]
Affiliation(s)
- Magnus Kjaergaard
- Department of Molecular Biology and Genetics, Aarhus University, Aarhus 8000, Denmark
- The Danish Research Institute for Translational Neuroscience (DANDRITE), Nordic EMBL Partnership for Molecular Medicine, Aarhus University, Aarhus 8000, Denmark
- Center for Proteins in Memory - PROMEMO, Danish National Research Foundation, Aarhus University, Aarhus 8000, Denmark
|
44
|
de Brevern AG, Rebehmed J. Current status of PTMs structural databases: applications, limitations and prospects. Amino Acids 2022; 54:575-590. [PMID: 35020020 DOI: 10.1007/s00726-021-03119-z] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 08/26/2021] [Accepted: 12/20/2021] [Indexed: 12/11/2022]
Abstract
Protein 3D structures, determined by their amino acid sequences, are the support of major crucial biological functions. Post-translational modifications (PTMs) play an essential role in regulating these functions by altering the physicochemical properties of proteins. By virtue of their importance, several PTM databases have been developed and released in decades, but very few of these databases incorporate real 3D structural data. Since PTMs influence the function of the protein and their aberrant states are frequently implicated in human diseases, providing structural insights to understand the influence and dynamics of PTMs is crucial for unraveling the underlying processes. This review is dedicated to the current status of databases providing 3D structural data on PTM sites in proteins. Some of these databases are general, covering multiple types of PTMs in different organisms, while others are specific to one particular type of PTM, class of proteins or organism. The importance of these databases is illustrated with two major types of in silico applications: predicting PTM sites in proteins using machine learning approaches and investigating protein structure-function relationships involving PTMs. Finally, these databases suffer from multiple problems and care must be taken when analyzing the PTMs data.
Affiliation(s)
- Alexandre G de Brevern
- Université de Paris, INSERM, UMR_S 1134, DSIMB, 75739, Paris, France
- Université de la Réunion, INSERM, UMR_S 1134, DSIMB, 97715, Saint-Denis de La Réunion, France
- Laboratoire d'Excellence GR-Ex, 75739, Paris, France
- Joseph Rebehmed
- Department of Computer Science and Mathematics, Lebanese American University, Beirut, Lebanon
|
45
|
Nadendla S, Jackson R, Munro J, Quaglia F, Mészáros B, Olley D, Hobbs ET, Goralski SM, Chibucos M, Mungall CJ, Tosatto SCE, Erill I, Giglio MG. ECO: the Evidence and Conclusion Ontology, an update for 2022. Nucleic Acids Res 2022; 50:D1515-D1521. [PMID: 34986598 PMCID: PMC8728134 DOI: 10.1093/nar/gkab1025] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 09/15/2021] [Revised: 10/12/2021] [Accepted: 10/18/2021] [Indexed: 11/12/2022]
Abstract
The Evidence and Conclusion Ontology (ECO) is a community resource that provides an ontology of terms used to capture the type of evidence that supports biomedical annotations and assertions. Consistent capture of evidence information with ECO allows tracking of annotation provenance, establishment of quality control measures, and evidence-based data mining. ECO is in use by dozens of data repositories and resources with both specific and general areas of focus. ECO is continually being expanded and enhanced in response to user requests as well as our aim to adhere to community best-practices for ontology development. The ECO support team engages in multiple collaborations with other ontologies and annotating groups. Here we report on recent updates to the ECO ontology itself as well as associated resources that are available through this project. ECO project products are freely available for download from the project website (https://evidenceontology.org/) and GitHub (https://github.com/evidenceontology/evidenceontology). ECO is released into the public domain under a CC0 1.0 Universal license.
Affiliation(s)
- Suvarna Nadendla
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, Maryland, USA
- Rebecca Jackson
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, Maryland, USA
- James Munro
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, Maryland, USA
- Federica Quaglia
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council (CNR-IBIOM), Bari, Italy
- Department of Biomedical Sciences, University of Padova, Padova, Italy
- Bálint Mészáros
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg 69117, Germany
- Dustin Olley
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, Maryland, USA
- Elizabeth T Hobbs
- Department of Biological Sciences, University of Maryland Baltimore County, Baltimore, Maryland, United States
- Stephen M Goralski
- Department of Biological Sciences, University of Maryland Baltimore County, Baltimore, Maryland, United States
- Marcus Chibucos
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, Maryland, USA
- Christopher John Mungall
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Lab, Berkeley, California, USA
- Ivan Erill
- Department of Biological Sciences, University of Maryland Baltimore County, Baltimore, Maryland, United States
- Michelle G Giglio
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, Maryland, USA
|
46
|
Monzon AM, Piovesan D, Fuxreiter M. Molecular Determinants of Selectivity in Disordered Complexes May Shed Light on Specificity in Protein Condensates. Biomolecules 2022; 12:biom12010092. [PMID: 35053240 PMCID: PMC8773858 DOI: 10.3390/biom12010092] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/01/2021] [Revised: 12/22/2021] [Accepted: 12/25/2021] [Indexed: 02/01/2023]
Abstract
Biomolecular condensates challenge the classical concepts of molecular recognition. The variable composition and heterogeneous conformations of liquid-like protein droplets are bottlenecks for high-resolution structural studies. To obtain atomistic insights into the organization of these assemblies, here we have characterized the conformational ensembles of specific disordered complexes, including those of droplet-driving proteins. First, we found that these specific complexes exhibit a high degree of conformational heterogeneity. Second, we found that residues forming contacts at the interface also sample many conformations. Third, we found that different patterns of contacting residues form the specific interface. In addition, we observed a wide range of sequence motifs mediating disordered interactions, including charged, hydrophobic and polar contacts. These results demonstrate that selective recognition can be realized by variable patterns of weakly defined interaction motifs in many different binding configurations. We propose that these principles also play roles in determining the selectivity of biomolecular condensates.
Affiliation(s)
- Alexander Miguel Monzon
- Department of Biomedical Sciences, University of Padova, 35131 Padova, Italy
- Damiano Piovesan
- Department of Biomedical Sciences, University of Padova, 35131 Padova, Italy
- Monika Fuxreiter
- Department of Biomedical Sciences, University of Padova, 35131 Padova, Italy
- Department of Biochemistry and Molecular Biology, University of Debrecen, 4032 Debrecen, Hungary
|
47
|
Tamburrini KC, Pesce G, Nilsson J, Gondelaud F, Kajava AV, Berrin JG, Longhi S. Predicting Protein Conformational Disorder and Disordered Binding Sites. Methods Mol Biol 2022; 2449:95-147. [PMID: 35507260 DOI: 10.1007/978-1-0716-2095-3_4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
In the last two decades it has become increasingly evident that a large number of proteins adopt either a fully or a partially disordered conformation. Intrinsically disordered proteins are ubiquitous proteins that fulfill essential biological functions while lacking a stable 3D structure. Their conformational heterogeneity is encoded by the amino acid sequence, thereby allowing intrinsically disordered proteins or regions to be recognized based on their sequence properties. The identification of disordered regions facilitates the functional annotation of proteins and is instrumental for delineating boundaries of protein domains amenable to crystallization. This chapter focuses on the methods currently employed for predicting protein disorder and identifying intrinsically disordered binding sites.
Affiliation(s)
- Ketty C Tamburrini
- Aix Marseille Univ, CNRS, Architecture et Fonction des Macromolécules Biologiques, AFMB, UMR 7257, Marseille, France
- INRAE, Aix Marseille Univ, Biodiversité et Biotechnologie Fongiques (BBF), UMR 1163, Marseille, France
- Giulia Pesce
- Aix Marseille Univ, CNRS, Architecture et Fonction des Macromolécules Biologiques, AFMB, UMR 7257, Marseille, France
- Juliet Nilsson
- Aix Marseille Univ, CNRS, Architecture et Fonction des Macromolécules Biologiques, AFMB, UMR 7257, Marseille, France
- Frank Gondelaud
- Aix Marseille Univ, CNRS, Architecture et Fonction des Macromolécules Biologiques, AFMB, UMR 7257, Marseille, France
- Andrey V Kajava
- Centre de Recherche en Biologie cellulaire de Montpellier, UMR 5237, CNRS, Université Montpellier, Montpellier, France
- Jean-Guy Berrin
- INRAE, Aix Marseille Univ, Biodiversité et Biotechnologie Fongiques (BBF), UMR 1163, Marseille, France
- Sonia Longhi
- Aix Marseille Univ, CNRS, Architecture et Fonction des Macromolécules Biologiques, AFMB, UMR 7257, Marseille, France
|
48
|
Dependence of Protein Structure on Environment: FOD Model Applied to Membrane Proteins. MEMBRANES 2021; 12:membranes12010050. [PMID: 35054576 PMCID: PMC8778870 DOI: 10.3390/membranes12010050] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 11/16/2021] [Revised: 12/13/2021] [Accepted: 12/28/2021] [Indexed: 11/17/2022]
Abstract
The natural environment of proteins is the polar aquatic environment and the hydrophobic (amphipathic) environment of the membrane. The fuzzy oil drop model (FOD) used to characterize water-soluble proteins, as well as its modified version FOD-M, enables a mathematical description of the presence and influence of diverse environments on protein structure. The present work characterized the structures of membrane proteins, including those that act as channels, and a water-soluble protein for contrast. The purpose of the analysis was to verify the possibility that an external force field can be used in the simulation of the protein-folding process, taking into account the diverse nature of the environment that guarantees a structure showing biological activity.
|
49
|
Papadopoulos C, Callebaut I, Gelly JC, Hatin I, Namy O, Renard M, Lespinet O, Lopes A. Intergenic ORFs as elementary structural modules of de novo gene birth and protein evolution. Genome Res 2021; 31:2303-2315. [PMID: 34810219 PMCID: PMC8647833 DOI: 10.1101/gr.275638.121] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 04/13/2021] [Accepted: 09/23/2021] [Indexed: 01/08/2023]
Abstract
The noncoding genome plays an important role in de novo gene birth and in the emergence of genetic novelty. Nevertheless, how noncoding sequences' properties could promote the birth of novel genes and shape the evolution and the structural diversity of proteins remains unclear. Therefore, by combining different bioinformatic approaches, we characterized the fold potential diversity of the amino acid sequences encoded by all intergenic open reading frames (ORFs) of S. cerevisiae with the aim of (1) exploring whether the structural states' diversity of proteomes is already present in noncoding sequences, and (2) estimating the potential of the noncoding genome to produce novel protein bricks that could either give rise to novel genes or be integrated into pre-existing proteins, thus participating in protein structure diversity and evolution. We showed that amino acid sequences encoded by most yeast intergenic ORFs contain the elementary building blocks of protein structures. Moreover, they encompass the large structural state diversity of canonical proteins, with the majority predicted as foldable. Then, we investigated the early stages of de novo gene birth by reconstructing the ancestral sequences of 70 yeast de novo genes and characterized the sequence and structural properties of intergenic ORFs with a strong translation signal. This enabled us to highlight sequence and structural factors determining de novo gene emergence. Finally, we showed a strong correlation between the fold potential of de novo proteins and one of their ancestral amino acid sequences, reflecting the relationship between the noncoding genome and the protein structure universe.
Affiliation(s)
- Chris Papadopoulos
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198 Gif-sur-Yvette, France
- Isabelle Callebaut
- Sorbonne Université, Muséum National d'Histoire Naturelle, UMR CNRS 7590, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, 75005 Paris, France
- Jean-Christophe Gelly
- Université de Paris, Biologie Intégrée du Globule Rouge, UMR_S1134, BIGR, INSERM, F-75015 Paris, France
- Laboratoire d'Excellence GR-Ex, 75015 Paris, France
- Institut National de la Transfusion Sanguine, F-75015 Paris, France
- Isabelle Hatin
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198 Gif-sur-Yvette, France
- Olivier Namy
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198 Gif-sur-Yvette, France
- Maxime Renard
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198 Gif-sur-Yvette, France
- Olivier Lespinet
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198 Gif-sur-Yvette, France
- Anne Lopes
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198 Gif-sur-Yvette, France
|
50
|
Helmy M, Agrawal R, Ali J, Soudy M, Bui TT, Selvarajoo K. GeneCloudOmics: A Data Analytic Cloud Platform for High-Throughput Gene Expression Analysis. FRONTIERS IN BIOINFORMATICS 2021; 1:693836. [PMID: 36303746 PMCID: PMC9581002 DOI: 10.3389/fbinf.2021.693836] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 05/07/2021] [Accepted: 10/14/2021] [Indexed: 11/18/2022]
Abstract
Gene expression profiling techniques, such as DNA microarray and RNA-Sequencing, have provided significant impact on our understanding of biological systems. They contribute to almost all aspects of biomedical research, including studying developmental biology, host-parasite relationships, disease progression and drug effects. However, the high-throughput data generations present challenges for many wet experimentalists to analyze and take full advantage of such rich and complex data. Here we present GeneCloudOmics, an easy-to-use web server for high-throughput gene expression analysis that extends the functionality of our previous ABioTrans with several new tools, including protein datasets analysis, and a web interface. GeneCloudOmics allows both microarray and RNA-Seq data analysis with a comprehensive range of data analytics tools in one package that no other current standalone software or web-based tool can do. In total, GeneCloudOmics provides the user access to 23 different data analytical and bioinformatics tasks including reads normalization, scatter plots, linear/non-linear correlations, PCA, clustering (hierarchical, k-means, t-SNE, SOM), differential expression analyses, pathway enrichments, evolutionary analyses, pathological analyses, and protein-protein interaction (PPI) identifications. Furthermore, GeneCloudOmics allows the direct import of gene expression data from the NCBI Gene Expression Omnibus database. The user can perform all tasks rapidly through an intuitive graphical user interface that overcomes the hassle of coding, installing tools/packages/libraries and dealing with operating systems compatibility and version issues, complications that make data analysis tasks challenging for biologists. Thus, GeneCloudOmics is a one-stop open-source tool for gene expression data analysis and visualization. It is freely available at http://combio-sifbi.org/GeneCloudOmics.
Affiliation(s)
- Mohamed Helmy
- Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
- Department of Computer Science, Lakehead University, Thunder Bay, ON, Canada
- Rahul Agrawal
- Department of Geology and Geophysics, Indian Institute of Technology (IIT) Kharagpur, Kharagpur, India
- Javed Ali
- Department of Geology and Geophysics, Indian Institute of Technology (IIT) Kharagpur, Kharagpur, India
- Mohamed Soudy
- Proteomics and Metabolomics Unit, Children Cancer Hospital (CCHE-57357), Cairo, Egypt
- Thuy Tien Bui
- Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
- Kumar Selvarajoo
- Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
- Singapore Institute of Food and Biotechnology Innovation (SIFBI), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
- Synthetic Biology for Clinical and Technological Innovation (SynCTI), National University of Singapore (NUS), Singapore, Singapore
- *Correspondence: Kumar Selvarajoo
|