1
Willis Chow CF, Scheremetjew M, Moon H, Ghosh S, Hadarovich A, Hersemann L, Toth-Petroczy A. SHARK: web server for alignment-free homology assessment for intrinsically disordered and unalignable protein regions. Nucleic Acids Res 2025:gkaf408. [PMID: 40396357 DOI: 10.1093/nar/gkaf408] [Received: 02/13/2025] [Revised: 04/09/2025] [Accepted: 05/02/2025] [Indexed: 05/22/2025] Open
Abstract
Whereas alignment has been fundamental to sequence-based assessments of protein homology, it is ineffective for intrinsically disordered regions (IDRs) due to their lowered sequence conservation and unique sequence properties. Here, we present a web server implementation of SHARK (bio-shark.org), an alignment-free algorithm for homology classification that compares the overall amino acid composition and short regions (k-mers) shared between sequences (SHARK-scores). The output of such k-mer-based comparisons is used by SHARK-dive, a machine learning classifier to detect homology between unalignable, disordered sequences. SHARK-web provides sequence-versus-database assessment of protein sequence homology akin to conventional tools such as BLAST and HMMER. Additionally, we provide precomputed sets of IDR sequences from 16 model organism proteomes facilitating searches against species-specific IDR-omes. SHARK-dive offers superior overall homology detection performance to BLAST and HMMER, driven by a large increase in sensitivity to low sequence identity homologs, and can be used to facilitate the study of sequence-function relationships in disordered, difficult-to-align regions.
Affiliation(s)
- Chi Fung Willis Chow
  - Max Planck Institute of Molecular Cell Biology and Genetics, Pfotenhauerstrasse 108, 01307 Dresden, Germany
  - Center for Systems Biology Dresden, Pfotenhauerstrasse 108, 01307 Dresden, Germany
  - Cluster of Excellence Physics of Life, TU Dresden, 01062 Dresden, Germany
- Maxim Scheremetjew
  - Max Planck Institute of Molecular Cell Biology and Genetics, Pfotenhauerstrasse 108, 01307 Dresden, Germany
  - Center for Systems Biology Dresden, Pfotenhauerstrasse 108, 01307 Dresden, Germany
- HongKee Moon
  - Max Planck Institute of Molecular Cell Biology and Genetics, Pfotenhauerstrasse 108, 01307 Dresden, Germany
  - Center for Systems Biology Dresden, Pfotenhauerstrasse 108, 01307 Dresden, Germany
- Soumyadeep Ghosh
  - Max Planck Institute of Molecular Cell Biology and Genetics, Pfotenhauerstrasse 108, 01307 Dresden, Germany
  - Center for Systems Biology Dresden, Pfotenhauerstrasse 108, 01307 Dresden, Germany
- Anna Hadarovich
  - Max Planck Institute of Molecular Cell Biology and Genetics, Pfotenhauerstrasse 108, 01307 Dresden, Germany
  - Center for Systems Biology Dresden, Pfotenhauerstrasse 108, 01307 Dresden, Germany
- Lena Hersemann
  - Max Planck Institute of Molecular Cell Biology and Genetics, Pfotenhauerstrasse 108, 01307 Dresden, Germany
  - Center for Systems Biology Dresden, Pfotenhauerstrasse 108, 01307 Dresden, Germany
- Agnes Toth-Petroczy
  - Max Planck Institute of Molecular Cell Biology and Genetics, Pfotenhauerstrasse 108, 01307 Dresden, Germany
  - Center for Systems Biology Dresden, Pfotenhauerstrasse 108, 01307 Dresden, Germany
  - Cluster of Excellence Physics of Life, TU Dresden, 01062 Dresden, Germany
2
Han KS, Kim HK, Kim MH, Pak MH, Pak SJ, Choe MM, Kim CS. PredIDR2: Improving accuracy of protein intrinsic disorder prediction by updating deep convolutional neural network and supplementing DisProt data. Int J Biol Macromol 2025; 306:141801. [PMID: 40054813 DOI: 10.1016/j.ijbiomac.2025.141801] [Received: 12/06/2024] [Revised: 03/03/2025] [Accepted: 03/04/2025] [Indexed: 05/11/2025]
Abstract
Intrinsically disordered proteins (IDPs) or regions (IDRs) are widespread in proteomes, involved in several important biological processes, and implicated in many diseases. Many computational methods for IDR prediction are being developed to decrease the gap between the low speed of experimental determination of annotated proteins and the rapid increase of non-annotated proteins, and their performance is blindly tested by the community-driven experiment, the Critical Assessment of protein Intrinsic Disorder (CAID). In this paper, we developed the PredIDR2 series, an updated version of PredIDR tested in CAID2, in order to accurately predict intrinsically disordered regions from protein sequence. It includes four methods that differ in the input features and in the mode used to produce the negative samples of the training set. The PredIDR2 series (AUC_ROC = 0.952) performs remarkably better than our previous PredIDR (AUC_ROC = 0.933) on the Disorder-PDB dataset of CAID2, which seems to be mainly attributable to the introduction of a new deep convolutional neural network and the augmentation of the training data, especially from the DisProt database. The PredIDR2 series outperforms the state-of-the-art IDR prediction methods that participated in CAID2 in terms of AUC_ROC, AUC_PR and DC_mae and belongs to the seven top-performing methods in terms of MCC. The PredIDR2 series can be freely used through the CAID Prediction Portal available at https://caid.idpcentral.org/portal or downloaded as a Singularity container from https://biocomputingup.it/shared/caid-predictors/.
Affiliation(s)
- Kun-Sop Han
  - University of Sciences, Pyongyang, Democratic People's Republic of Korea
- Ha-Kyong Kim
  - Branch of Biotechnology, State Academy of Sciences, Pyongyang, Democratic People's Republic of Korea
- Myong-Hyok Kim
  - University of Sciences, Pyongyang, Democratic People's Republic of Korea
- Myong-Hyon Pak
  - University of Sciences, Pyongyang, Democratic People's Republic of Korea
- Song-Jin Pak
  - University of Sciences, Pyongyang, Democratic People's Republic of Korea
- Mun-Myong Choe
  - University of Science and Technology, Pyongyang, Democratic People's Republic of Korea
- Chol-Song Kim
  - University of Sciences, Pyongyang, Democratic People's Republic of Korea
3
Rahman MM, Zamakhaeva S, Rush JS, Chaton CT, Kenner CW, Hla YM, Tsui HCT, Uversky VN, Winkler ME, Korotkov KV, Korotkova N. Glycosylation of serine/threonine-rich intrinsically disordered regions of membrane-associated proteins in streptococci. Nat Commun 2025; 16:4011. [PMID: 40301326 PMCID: PMC12041528 DOI: 10.1038/s41467-025-58692-8] [Received: 06/27/2024] [Accepted: 03/31/2025] [Indexed: 05/01/2025] Open
Abstract
Proteins harboring intrinsically disordered regions (IDRs) lacking stable secondary or tertiary structures are abundant across the three domains of life. These regions have not been systematically studied in prokaryotes. Here, our genome-wide analysis identifies extracytoplasmic serine/threonine-rich IDRs in several biologically important membrane-associated proteins in streptococci. We demonstrate that these IDRs are glycosylated with glucose by glycosyltransferases GtrB and PgtC2 in Streptococcus pyogenes and Streptococcus pneumoniae, and with N-acetylgalactosamine by a Pgf-dependent mechanism in Streptococcus mutans. The absence of glycosylation leads to a defect in biofilm formation under ethanol-stressed conditions in S. mutans. We link this phenotype to the C-terminal IDR of the post-translocation chaperone PrsA. Our data reveal that O-linked glycosylation protects the IDR-containing proteins from proteolytic degradation and is critical for the biological function of PrsA in biofilm formation.
Affiliation(s)
- Mohammad M Rahman
  - Department of Microbiology, Immunology and Molecular Genetics, University of Kentucky, Lexington, Kentucky, USA
- Svetlana Zamakhaeva
  - Department of Microbiology, Immunology and Molecular Genetics, University of Kentucky, Lexington, Kentucky, USA
- Jeffrey S Rush
  - Department of Molecular and Cellular Biochemistry, University of Kentucky, Lexington, Kentucky, USA
- Catherine T Chaton
  - Department of Molecular and Cellular Biochemistry, University of Kentucky, Lexington, Kentucky, USA
- Cameron W Kenner
  - Department of Microbiology, Immunology and Molecular Genetics, University of Kentucky, Lexington, Kentucky, USA
- Yin Mon Hla
  - Department of Biology, Indiana University Bloomington, Bloomington, IN, USA
- Vladimir N Uversky
  - Department of Molecular Medicine, Morsani College of Medicine, University of South Florida, Tampa, FL, USA
- Malcolm E Winkler
  - Department of Biology, Indiana University Bloomington, Bloomington, IN, USA
- Konstantin V Korotkov
  - Department of Molecular and Cellular Biochemistry, University of Kentucky, Lexington, Kentucky, USA
- Natalia Korotkova
  - Department of Microbiology, Immunology and Molecular Genetics, University of Kentucky, Lexington, Kentucky, USA
  - Department of Molecular and Cellular Biochemistry, University of Kentucky, Lexington, Kentucky, USA
4
Shukla S, Lastorka SS, Uversky VN. Intrinsic Disorder and Phase Separation Coordinate Exocytosis, Motility, and Chromatin Remodeling in the Human Acrosomal Proteome. Proteomes 2025; 13:16. [PMID: 40407495 PMCID: PMC12101322 DOI: 10.3390/proteomes13020016] [Received: 01/15/2025] [Revised: 04/23/2025] [Accepted: 04/25/2025] [Indexed: 05/26/2025] Open
Abstract
Intrinsic disorder refers to protein regions that lack a fixed three-dimensional structure under physiological conditions, enabling conformational plasticity. This flexibility allows for diverse functions, including transient interactions, signaling, and phase separation via disorder-to-order transitions upon binding. Our study focused on investigating the role of intrinsic disorder and liquid-liquid phase separation (LLPS) in the human acrosome, a sperm-specific organelle essential for fertilization. Using computational prediction models, network analysis, Structural Classification of Proteins (SCOP) functional assessments, and Gene Ontology, we analyzed 250 proteins within the acrosomal proteome. Our bioinformatic analysis yielded 97 proteins with high levels (>30%) of structural disorder. Further analysis of functional enrichment identified associations between disordered regions overlapping with SCOP domains and critical acrosomal processes, including vesicle trafficking, membrane fusion, and enzymatic activation. Examples of disordered SCOP domains include the PLC-like phosphodiesterase domain, the t-SNARE domain, and the P-domain of calnexin/calreticulin. Protein-protein interaction networks revealed acrosomal proteins as hubs in tightly interconnected systems, emphasizing their functional importance. LLPS propensity modeling determined that over 30% of these proteins are high-probability LLPS drivers (>60%), underscoring their role in dynamic compartmentalization. Proteins such as myristoylated alanine-rich C-kinase substrate and nuclear transition protein 2 exhibited both high LLPS propensities and high levels of structural disorder. A significant relationship (p < 0.0001, R² = 0.649) was observed between the level of intrinsic disorder and LLPS propensity, showing the role of disorder in facilitating phase separation. 
Overall, these findings provide insights into how intrinsic disorder and LLPS contribute to the structural adaptability and functional precision required for fertilization, with implications for understanding disorders associated with the human acrosome reaction.
Affiliation(s)
- Shivam Shukla
  - Department of Integrative Biology, College of Arts and Sciences, University of South Florida-St. Petersburg, 140 7th Ave. South, St. Petersburg, FL 33701, USA
- Sean S. Lastorka
  - Department of Molecular Medicine, Morsani College of Medicine, University of South Florida, Tampa, FL 33612, USA
- Vladimir N. Uversky
  - Department of Molecular Medicine, Morsani College of Medicine, University of South Florida, Tampa, FL 33612, USA
  - USF Health Byrd Alzheimer’s Research Institute, Morsani College of Medicine, University of South Florida, Tampa, FL 33612, USA
5
Vangala VNP, Uversky VN. Intrinsic disorder in protein interaction networks linking cancer with metabolic diseases. Comput Biol Chem 2025; 118:108493. [PMID: 40319601 DOI: 10.1016/j.compbiolchem.2025.108493] [Received: 01/20/2025] [Revised: 04/20/2025] [Accepted: 04/24/2025] [Indexed: 05/07/2025]
Abstract
Complex diseases are usually driven by numerous proteins that operate as intricate network systems. Deciphering of their signals is required for more advanced level understanding of the cellular processes driven by protein interactions. Therefore, placing diseases into a framework, where they can be viewed together, represents an important and fruitful approach. The goal of this study was to assess the intrinsic disorder present in the proteins forming PPI networks linking cancer with different human diseases. To this end, we used a set of bioinformatics tools to explore intrinsic disorder and liquid-liquid phase separation predispositions of 340 proteins reported earlier by Hirsch et al. (Cancer Cell (2010) 17(4), 348-361; doi: 10.1016/j.ccr.2010.01.022) as differently expressed in common chronic diseases, such as cancer, obesity, diabetes, and cardiovascular diseases. We further examined selected proteins from this set for their interactability and intrinsic disorder-based functionality. Our analyses demonstrated that intrinsically disordered proteins and proteins with intrinsically disordered regions may act as active network promoters and operate as highly connected hubs, which further enables them to play key roles in the disease pathways. The study also indicated that differently expressed proteins involved in disease progression could be characterized by diverse degrees of intrinsic disorder and LLPS propensity. We hope that our findings in combination with the outputs of the proteomic and functional genomic analyses contain essential clues that can be used in further medical research leading to the design of better therapies.
Affiliation(s)
- Veda Naga Priya Vangala
  - Department of Molecular Medicine, Morsani College of Medicine, University of South Florida, Tampa, FL 33612, USA
- Vladimir N Uversky
  - Department of Molecular Medicine, Morsani College of Medicine, University of South Florida, Tampa, FL 33612, USA
  - USF Health Byrd Alzheimer's Research Institute, Morsani College of Medicine, University of South Florida, Tampa, FL 33612, USA
6
Villanueva RA, Loyola A. The Intrinsically Disordered Region of HBx and Virus-Host Interactions: Uncovering New Therapeutic Approaches for HBV and Cancer. Int J Mol Sci 2025; 26:3552. [PMID: 40332052 PMCID: PMC12026620 DOI: 10.3390/ijms26083552] [Received: 02/20/2025] [Revised: 04/02/2025] [Accepted: 04/07/2025] [Indexed: 05/08/2025] Open
Abstract
Human viral infections remain a significant global health challenge, contributing to a substantial number of cancer cases worldwide. Among them, infections with oncoviruses such as hepatitis B virus (HBV) and hepatitis C virus (HCV) are key drivers of hepatocellular carcinoma (HCC). Despite the availability of an effective HBV vaccine since the 1980s, millions remain chronically infected due to the persistence of covalently closed circular DNA (cccDNA) as a reservoir in hepatocytes. Current antiviral therapies, including nucleos(t)ide analogs and interferon, effectively suppress viral replication but fail to eliminate cccDNA, underscoring the urgent need for innovative therapeutic strategies. Direct-acting antiviral agents (DAAs), which have revolutionized HCV treatment with high cure rates, offer a promising model for HBV therapy. A particularly attractive target is the intrinsically disordered region (IDR) of the HBx protein, which regulates cccDNA transcription, viral replication, and oncogenesis by interacting with key host proteins. DAAs targeting these interactions could inhibit viral persistence, suppress oncogenic signaling, and overcome treatment resistance. This review highlights the potential of HBx-directed DAAs to complement existing therapies, offering renewed hope for a functional HBV cure and reduced cancer risk.
Affiliation(s)
- Rodrigo A. Villanueva
  - Centro Científico y Tecnológico de Excelencia Ciencia & Vida, Fundación Ciencia & Vida, Santiago 8580702, Chile
- Alejandra Loyola
  - Centro Científico y Tecnológico de Excelencia Ciencia & Vida, Fundación Ciencia & Vida, Santiago 8580702, Chile
  - Facultad de Ciencias, Universidad San Sebastián, Santiago 7510602, Chile
7
Rahman MM, Zamakhaeva S, Rush JS, Chaton CT, Kenner CW, Hla YM, Tsui HCT, Uversky VN, Winkler ME, Korotkov KV, Korotkova N. Glycosylation of serine/threonine-rich intrinsically disordered regions of membrane-associated proteins in streptococci. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2025:2024.05.05.592596. [PMID: 38746434 PMCID: PMC11092751 DOI: 10.1101/2024.05.05.592596] [Indexed: 05/16/2024]
Abstract
Proteins harboring intrinsically disordered regions (IDRs) lacking stable secondary or tertiary structures are abundant across the three domains of life. These regions have not been systematically studied in prokaryotes. Our genome-wide analysis identifies extracytoplasmic serine/threonine-rich IDRs in several biologically important membrane-associated proteins in streptococci. We demonstrate that these IDRs are glycosylated with glucose by glycosyltransferases GtrB and PgtC2 in Streptococcus pyogenes and Streptococcus pneumoniae, and with N-acetylgalactosamine by a Pgf-dependent mechanism in Streptococcus mutans. The absence of glycosylation leads to a defect in biofilm formation under ethanol-stressed conditions in S. mutans. We link this phenotype to the C-terminal IDR of the post-translocation chaperone PrsA. Our data reveal that O-linked glycosylation protects the IDR-containing proteins from proteolytic degradation and is critical for the biological function of PrsA in biofilm formation.
Affiliation(s)
- Mohammad M. Rahman
  - Department of Microbiology, Immunology and Molecular Genetics, University of Kentucky, Lexington, Kentucky, USA
- Svetlana Zamakhaeva
  - Department of Microbiology, Immunology and Molecular Genetics, University of Kentucky, Lexington, Kentucky, USA
- Jeffrey S. Rush
  - Department of Molecular and Cellular Biochemistry, University of Kentucky, Lexington, Kentucky, USA
- Catherine T. Chaton
  - Department of Molecular and Cellular Biochemistry, University of Kentucky, Lexington, Kentucky, USA
- Cameron W. Kenner
  - Department of Microbiology, Immunology and Molecular Genetics, University of Kentucky, Lexington, Kentucky, USA
- Yin Mon Hla
  - Department of Biology, Indiana University Bloomington, Bloomington, Indiana, USA
- Vladimir N. Uversky
  - Department of Molecular Medicine, Morsani College of Medicine, University of South Florida, Tampa, Florida, USA
- Malcolm E. Winkler
  - Department of Biology, Indiana University Bloomington, Bloomington, Indiana, USA
- Konstantin V. Korotkov
  - Department of Molecular and Cellular Biochemistry, University of Kentucky, Lexington, Kentucky, USA
- Natalia Korotkova
  - Department of Microbiology, Immunology and Molecular Genetics, University of Kentucky, Lexington, Kentucky, USA
  - Department of Molecular and Cellular Biochemistry, University of Kentucky, Lexington, Kentucky, USA
8
Mughal F, Caetano-Anollés G. Evolution of intrinsic disorder in the structural domains of viral and cellular proteomes. Sci Rep 2025; 15:2878. [PMID: 39843714 PMCID: PMC11754631 DOI: 10.1038/s41598-025-86045-4] [Received: 10/04/2024] [Accepted: 01/07/2025] [Indexed: 01/24/2025] Open
Abstract
Intrinsically disordered regions are flexible regions that complement the typical structured regions of proteins. Little is known however about their evolution. Here we leverage a comparative and evolutionary genomics approach to analyze intrinsic disorder in the structural domains of thousands of proteomes. Our analysis revealed that viral and cellular proteomes employ similar strategies to increase disorder but achieve different goals. Viral proteomes evolve disorder for economy of genomic material and multifunctionality. On the other hand, cellular proteomes evolve disorder to advance functionality with increasing genomic complexity. Remarkably, phylogenomic analysis of intrinsic disorder showed that ancient domains were ordered and that disorder evolved as a benefit acquired later in evolution. Evolutionary chronologies of domains indexed with disorder levels and distributions across Archaea, Bacteria, Eukarya and viruses revealed six evolutionary phases, the oldest two harboring only ordered and moderate disorder domains. A biphasic spectrum of disorder versus proteome makeup captured the dichotomy in the evolutionary trajectories of viral and cellular ancestors, one following reductive evolution driven by viral spread of molecular wealth and the other following expansive evolutionary trends to advance functionality through massive domain-forming co-option of disordered loop regions.
Affiliation(s)
- Fizza Mughal
  - Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, IL, 61801, USA
- Gustavo Caetano-Anollés
  - Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, IL, 61801, USA
  - C.R. Woese Institute for Genomic Biology, University of Illinois, Urbana, IL, 61801, USA
9
Han KS, Song SR, Pak MH, Kim CS, Ri CP, Del Conte A, Piovesan D. PredIDR: Accurate prediction of protein intrinsic disorder regions using deep convolutional neural network. Int J Biol Macromol 2025; 284:137665. [PMID: 39571839 DOI: 10.1016/j.ijbiomac.2024.137665] [Received: 09/10/2024] [Revised: 10/29/2024] [Accepted: 11/13/2024] [Indexed: 12/02/2024]
Abstract
The involvement of protein intrinsic disorder in essential biological processes is well known in structural biology. However, experimental methods for detecting intrinsic structural disorder and directly measuring highly dynamic behavior of protein structure are limited. To address this issue, several computational methods to predict intrinsic disorder from protein sequences have been developed, and their performance is evaluated by the Critical Assessment of protein Intrinsic Disorder (CAID). In this paper, we describe a new computational method, PredIDR, which provides accurate prediction of intrinsically disordered regions in proteins, mimicking experimental X-ray missing residues. Indeed, missing residues in the Protein Data Bank (PDB) were used as positive examples to train a deep convolutional neural network which produces two types of output for short and long regions. PredIDR took part in the second round of CAID and was as accurate as the top state-of-the-art IDR prediction methods. PredIDR can be freely used through the CAID Prediction Portal available at https://caid.idpcentral.org/portal or downloaded as a Singularity container from https://biocomputingup.it/shared/caid-predictors/.
Affiliation(s)
- Kun-Sop Han
  - University of Sciences, Pyongyang, Democratic People's Republic of Korea
- Se-Ryong Song
  - Branch of Biotechnology, State Academy of Sciences, Pyongyang, Democratic People's Republic of Korea
- Myong-Hyon Pak
  - University of Sciences, Pyongyang, Democratic People's Republic of Korea
- Chol-Song Kim
  - University of Sciences, Pyongyang, Democratic People's Republic of Korea
- Chol-Pyok Ri
  - University of Sciences, Pyongyang, Democratic People's Republic of Korea
- Alessio Del Conte
  - Department of Biomedical Sciences, University of Padova, Padova, Italy
- Damiano Piovesan
  - Department of Biomedical Sciences, University of Padova, Padova, Italy
10
Song J, Kurgan L. Two decades of advances in sequence-based prediction of MoRFs, disorder-to-order transitioning binding regions. Expert Rev Proteomics 2025; 22:1-9. [PMID: 39789785 DOI: 10.1080/14789450.2025.2451715] [Received: 10/31/2024] [Revised: 12/20/2024] [Accepted: 12/26/2024] [Indexed: 01/12/2025]
Abstract
INTRODUCTION: Molecular recognition features (MoRFs) are regions in protein sequences that undergo induced folding upon binding partner molecules. MoRFs are common in nature and can be predicted from sequences based on their distinctive sequence signatures. AREAS COVERED: We overview 20 years of progress in the sequence-based prediction of MoRFs which resulted in the development of 25 predictors of MoRFs that interact with proteins, peptides, and lipids. These methods range from simple discriminant analysis to sophisticated deep transformer networks that use protein language models. They generate relatively accurate predictions as evidenced by the results of a recently published community-driven assessment. EXPERT OPINION: MoRFs prediction is a mature field of research that is poised to continue at a steady pace in the foreseeable future. We anticipate further expansion of the scope of MoRF predictions to additional partner molecules, such as nucleic acids, and continued use of recent machine learning advances. Other future efforts should concentrate on improving availability of MoRF predictions by releasing, maintaining, and popularizing web servers and by depositing MoRF predictions to large databases of protein structure and function predictions. Furthermore, accurate MoRF predictions should be coupled with the equally accurate prediction and modeling of the resulting structures of complexes.
Affiliation(s)
- Jiangning Song
  - Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC, Australia
  - Monash Data Futures Institute, Monash University, Melbourne, VIC, Australia
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
11
Wang K, Hu G, Wu Z, Kurgan L. Accurate and Fast Prediction of Intrinsic Disorder Using flDPnn. Methods Mol Biol 2025; 2867:201-218. [PMID: 39576583 DOI: 10.1007/978-1-0716-4196-5_12] [Indexed: 11/24/2024]
Abstract
Intrinsically disordered proteins (IDPs) that include one or more intrinsically disordered regions (IDRs) are abundant across all domains of life and viruses and play numerous functional roles in various cellular processes. Due to a relatively low throughput and high cost of experimental techniques for identifying IDRs, there is a growing need for fast and accurate computational algorithms that accurately predict IDRs/IDPs from protein sequences. We describe one of the leading disorder predictors, flDPnn. Results from a recent community-organized Critical Assessment of Intrinsic Disorder (CAID) experiment show that flDPnn provides fast and state-of-the-art predictions of disorder, which are supplemented with the predictions of several major disorder functions. This chapter provides a practical guide to flDPnn, which includes a brief explanation of its predictive model, descriptions of its web server and standalone versions, and a case study that showcases how to read and understand flDPnn's predictions.
Affiliation(s)
- Kui Wang
  - NITFID, School of Statistics and Data Science, LPMC and KLMDASR, Nankai University, Tianjin, China
- Gang Hu
  - NITFID, School of Statistics and Data Science, LPMC and KLMDASR, Nankai University, Tianjin, China
- Zhonghua Wu
  - School of Mathematical Sciences and LPMC, Nankai University, Tianjin, China
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
12
Peng Z, Wu H, Luo Y, Kurgan L. Prediction of Disordered Linkers Using APOD. Methods Mol Biol 2025; 2867:219-231. [PMID: 39576584 DOI: 10.1007/978-1-0716-4196-5_13] [Indexed: 11/24/2024]
Abstract
Intrinsically disordered linkers (DLs) connect protein domains and structural elements within domains and facilitate allosteric regulation. Computational studies suggest that thousands of proteins have DLs. Since there are only about 250 proteins with manually curated DL annotations (DisProt database ver. 9.3), computational approaches that make accurate predictions of DLs from the protein sequences are essential for reducing this annotation gap. To this end, we recently released the Accurate Predictor Of DLs (APOD) method. Empirical tests show that APOD achieves Area Under the ROC Curve (AUC) of 0.82 and Matthews Correlation Coefficient (MCC) of 0.42 on a low-similarity test dataset. We implement APOD as a freely available and convenient web server at https://yanglab.qd.sdu.edu.cn/APOD/. This web server takes a protein sequence as the input and outputs an easy-to-parse prediction result, with the entire prediction process done on the server side. We also provide a standalone version of APOD for users who want to process large datasets of sequences. This version must be installed and run locally on the end user's computer. In this chapter, we overview APOD, explain how to locate and use the web server and the standalone implementation, and discuss how to read and interpret APOD's outputs. We also demonstrate utility of APOD based on a case study protein.
Affiliation(s)
- Zhenling Peng
  - Research Center for Mathematics and Interdisciplinary Sciences, Shandong University, Qingdao, China
  - Frontier Science Center for Nonlinear Expectations, Ministry of Education, Shandong University, Qingdao, China
- Haiyan Wu
  - Research Center for Mathematics and Interdisciplinary Sciences, Shandong University, Qingdao, China
- Yuxian Luo
  - Research Center for Mathematics and Interdisciplinary Sciences, Shandong University, Qingdao, China
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
Collapse
|
13
|
Zhang F, Kurgan L. Evaluation of predictions of disordered binding regions in the CAID2 experiment. Comput Struct Biotechnol J 2024; 27:78-88. [PMID: 39811792 PMCID: PMC11732247 DOI: 10.1016/j.csbj.2024.12.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2024] [Revised: 12/12/2024] [Accepted: 12/13/2024] [Indexed: 01/16/2025] Open
Abstract
A large portion of the Intrinsically Disordered Regions (IDRs) in protein sequences interact with proteins, nucleic acids, and other types of ligands. Correspondingly, dozens of sequence-based predictors of binding IDRs were developed. The recently completed second community-based Critical Assessment of protein Intrinsic Disorder prediction (CAID2) evaluated 32 predictors of binding IDRs. However, CAID2 considered a rather narrow scenario by testing on 78 proteins with binding IDRs and not differentiating between different ligands, even though virtually all predictors target IDRs that interact with specific types of ligands. In that scenario, several intrinsic disorder predictors predict binding IDRs with accuracy equivalent to the best predictors of binding IDRs, since the large majority of IDRs in the 78 test proteins are binding. We substantially extended the CAID2 evaluation by using the entire CAID2 dataset of 348 proteins and considering several arguably more practical scenarios. We assessed whether predictors accurately differentiate binding IDRs from other types of IDRs and how they perform when predicting IDRs that interact with different ligand types. We found that intrinsic disorder predictors cannot accurately identify binding IDRs among other disordered regions, that the majority of the predictors of binding IDRs are ligand type agnostic (i.e., they cross predict binding in IDRs that interact with ligands that they do not cover), and that only a handful of predictors of binding IDRs perform relatively well and generate reasonably low amounts of cross predictions. We also suggest a number of future research directions that would move this active field of research forward.
Affiliation(s)
- Fuhao Zhang
  - College of Information Engineering, Northwest A & F University, China
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
|
14
|
Basu S, Kurgan L. Taxonomy-specific assessment of intrinsic disorder predictions at residue and region levels in higher eukaryotes, protists, archaea, bacteria and viruses. Comput Struct Biotechnol J 2024; 23:1968-1977. [PMID: 38765610 PMCID: PMC11098722 DOI: 10.1016/j.csbj.2024.04.059] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/05/2024] [Revised: 04/23/2024] [Accepted: 04/24/2024] [Indexed: 05/22/2024] Open
Abstract
Intrinsic disorder predictors were evaluated in several studies including the two large CAID experiments. However, these studies are biased towards eukaryotic proteins and focus primarily on residue-level predictions. We provide a first-of-its-kind assessment that comprehensively covers the taxonomy and evaluates predictions at the residue and disordered region levels. We curate a benchmark dataset that uniformly covers eukaryotic, archaeal, bacterial, and viral proteins. We find that predictive performance differs substantially across taxonomy, where viruses are predicted most accurately, followed by protists and higher eukaryotes, while bacterial and archaeal proteins suffer lower levels of accuracy. These trends are consistent across predictors. We also find that current tools, except for flDPnn, struggle with reproducing native distributions of the numbers and sizes of the disordered regions. Moreover, analysis of two variants of disorder predictions derived from the AlphaFold2 predicted structures reveals that they produce accurate residue-level propensities for archaea, bacteria and protists. However, they underperform for higher eukaryotes and generally struggle to accurately identify disordered regions. Our results motivate development of new predictors that target bacteria and archaea and that produce accurate results at both residue and region levels. We also stress the need to include region-level evaluation in future assessments.
Affiliation(s)
- Sushmita Basu
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
|
15
|
Pettitt AJ, Shukla VK, Figueiredo AM, Newton LS, McCarthy S, Tabor AB, Heller GT, Lorenz CD, Hansen DF. An integrative characterization of proline cis and trans conformers in a disordered peptide. Biophys J 2024; 123:3798-3811. [PMID: 39340152 PMCID: PMC11560310 DOI: 10.1016/j.bpj.2024.09.028] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2024] [Revised: 09/11/2024] [Accepted: 09/25/2024] [Indexed: 09/30/2024] Open
Abstract
Intrinsically disordered proteins (IDPs) often contain proline residues that undergo cis/trans isomerization. While molecular dynamics (MD) simulations have the potential to fully characterize the proline cis and trans subensembles, they are limited by the slow timescales of isomerization and force field inaccuracies. NMR spectroscopy can report on ensemble-averaged observables for both the cis-proline and trans-proline states, but a full atomistic characterization of these conformers is challenging. Given the importance of proline cis/trans isomerization for influencing the conformational sampling of disordered proteins, we employed a combination of all-atom MD simulations with enhanced sampling (metadynamics), NMR, and small-angle x-ray scattering (SAXS) to characterize the two subensembles of the ORF6 C-terminal region (ORF6CTR) from SARS-CoV-2 corresponding to the proline-57 (P57) cis and trans states. We performed MD simulations in three distinct force fields: AMBER03ws, AMBER99SB-disp, and CHARMM36m, which are all optimized for disordered proteins. Each simulation was run for an accumulated time of 180-220 μs until convergence was reached, as assessed by blocking analysis. Good agreement was observed between the cis-P57 populations predicted from metadynamics simulations in AMBER03ws and the populations obtained from experimental NMR data. Moreover, we observed good agreement between the radius of gyration predicted from the metadynamics simulations in AMBER03ws and that measured using SAXS. Our findings suggest that both the cis-P57 and trans-P57 conformations of ORF6CTR are extremely dynamic and that interdisciplinary approaches combining both multiscale computations and experiments offer avenues to explore highly dynamic states that cannot be reliably characterized by either approach in isolation.
Affiliation(s)
- Alice J Pettitt
  - Department of Structural and Molecular Biology, Division of Biosciences, London, United Kingdom; Department of Engineering, Faculty of Natural, Mathematical and Engineering Sciences, King's College London, London, United Kingdom; The Francis Crick Institute, London, United Kingdom
- Vaibhav Kumar Shukla
  - Department of Structural and Molecular Biology, Division of Biosciences, London, United Kingdom; The Francis Crick Institute, London, United Kingdom
- Lydia S Newton
  - Department of Structural and Molecular Biology, Division of Biosciences, London, United Kingdom
- Stephen McCarthy
  - Department of Chemistry, Faculty of Mathematical and Physical Sciences, London, United Kingdom
- Alethea B Tabor
  - Department of Chemistry, Faculty of Mathematical and Physical Sciences, London, United Kingdom
- Gabriella T Heller
  - Department of Structural and Molecular Biology, Division of Biosciences, London, United Kingdom
- Christian D Lorenz
  - Department of Engineering, Faculty of Natural, Mathematical and Engineering Sciences, King's College London, London, United Kingdom
- D Flemming Hansen
  - Department of Structural and Molecular Biology, Division of Biosciences, London, United Kingdom; The Francis Crick Institute, London, United Kingdom
|
16
|
Hurali DT, Banerjee M, Ballal A. Unravelling the involvement of protein disorder in cyanobacterial stress responses. Int J Biol Macromol 2024; 277:133934. [PMID: 39025183 DOI: 10.1016/j.ijbiomac.2024.133934] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2024] [Revised: 07/09/2024] [Accepted: 07/15/2024] [Indexed: 07/20/2024]
Abstract
This study explored the involvement of Intrinsically Disordered Proteins (IDPs) in the cyanobacterial stress response. IDPs possess distinct physicochemical properties, which allow them to execute diverse functions. Anabaena PCC 7120, the model photosynthetic, nitrogen-fixing cyanobacterium, encodes 688 proteins (11 % of the total proteome) with at least one intrinsically disordered region (IDR). Of these, 130 proteins that showed >30 % overall disorder were designated as IDPs. Physicochemical analysis showed these IDPs to adopt shapes ranging from 'globular' to 'tadpole-like'. Upon exposure to NaCl, 41 IDP-encoding genes were found to be differentially expressed. Surprisingly, most of these were induced, indicating the importance of IDP accumulation in overcoming salt stress. Subsequently, six IDPs were identified to be induced by multiple stresses (salt, ammonium and selenite). Interestingly, the presence of these six multiple-stress-induced IDPs was conserved in filamentous cyanobacteria. Utilizing the experimental proteomic data of Anabaena, these six IDPs were found to interact with many proteins involved in diverse pathways, underscoring their physiological importance as protein hubs. This study lays the framework for IDP-related research in Anabaena by (a) identifying, as well as physicochemically characterizing, all the disordered proteins and (b) uncovering a subset of IDPs that are likely to be critical in adaptation to environmental stresses.
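The IDP designation described in this abstract (more than 30 % of a protein's residues disordered) can be sketched in Python as follows. The per-residue boolean flags and both function names are hypothetical; the abstract does not state which disorder predictor or data format the authors used.

```python
from typing import Sequence


def disorder_fraction(flags: Sequence[bool]) -> float:
    """Fraction of residues flagged as disordered (True = disordered)."""
    if not flags:
        raise ValueError("sequence has no residues")
    return sum(flags) / len(flags)


def is_idp(flags: Sequence[bool], threshold: float = 0.30) -> bool:
    """Designate a protein as an IDP when overall disorder exceeds the threshold.

    The strict '>' mirrors the '>30 %' criterion quoted in the abstract.
    """
    return disorder_fraction(flags) > threshold


# Toy example: 4 of 10 residues disordered, so the fraction is 0.4.
example = [True] * 4 + [False] * 6
print(is_idp(example))
```

A protein with exactly 30 % disordered residues is not designated an IDP under this strict comparison.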
Affiliation(s)
- Deepak T Hurali
  - Molecular Biology Division, Bhabha Atomic Research Centre, Mumbai 400085, India; Homi Bhabha National Institute, Anushakti Nagar, Mumbai 400094, India
- Manisha Banerjee
  - Molecular Biology Division, Bhabha Atomic Research Centre, Mumbai 400085, India; Homi Bhabha National Institute, Anushakti Nagar, Mumbai 400094, India
- Anand Ballal
  - Molecular Biology Division, Bhabha Atomic Research Centre, Mumbai 400085, India; Homi Bhabha National Institute, Anushakti Nagar, Mumbai 400094, India
|
17
|
Wang K, Hu G, Basu S, Kurgan L. flDPnn2: Accurate and Fast Predictor of Intrinsic Disorder in Proteins. J Mol Biol 2024; 436:168605. [PMID: 39237195 DOI: 10.1016/j.jmb.2024.168605] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/03/2024] [Revised: 04/16/2024] [Accepted: 05/04/2024] [Indexed: 09/07/2024]
Abstract
Prediction of intrinsic disorder in protein sequences is an active research area, with well over 100 predictors released to date. These efforts are motivated by the functional importance and high abundance of intrinsic disorder, combined with relatively low amounts of experimental annotations. The disorder predictors are periodically evaluated by independent assessors in the Critical Assessment of protein Intrinsic Disorder prediction (CAID) experiments. The recently completed CAID2 experiment assessed close to 40 state-of-the-art methods, demonstrating that some of them produce accurate results. In particular, the flDPnn2 method, which is the successor of flDPnn that performed well in the CAID1 experiment, secured the overall most accurate results on the Disorder-NOX dataset in CAID2. flDPnn2 implements a number of improvements when compared to its predecessor, including changes to the inputs, increased size of the deep network model that we retrained on a larger training set, and addition of an alignment module. Using results from CAID2, we show that flDPnn2 produces accurate predictions very quickly, modestly improving over the accuracy of flDPnn and reducing the runtime by half, to about 27 s per protein. flDPnn2 is freely available as a convenient web server at http://biomine.cs.vcu.edu/servers/flDPnn2/.
Affiliation(s)
- Kui Wang
  - NITFID, School of Statistics and Data Science, LPMC and KLMDASR, Nankai University, Tianjin, China
- Gang Hu
  - NITFID, School of Statistics and Data Science, LPMC and KLMDASR, Nankai University, Tianjin, China
- Sushmita Basu
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
|
18
|
Panrat T, Phongdara A, Wuthisathid K, Meemetta W, Phiwsaiya K, Vanichviriyakit R, Senapin S, Sangsuriya P. Structural modelling and preventive strategy targeting of WSSV hub proteins to combat viral infection in shrimp Penaeus monodon. PLoS One 2024; 19:e0307976. [PMID: 39074084 DOI: 10.1371/journal.pone.0307976] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2024] [Accepted: 07/15/2024] [Indexed: 07/31/2024] Open
Abstract
White spot syndrome virus (WSSV) presents a considerable peril to the aquaculture sector, leading to notable financial consequences on a global scale. Previous studies have identified hub proteins, including WSSV051 and WSSV517, as essential binding elements in the protein interaction network of WSSV. This work further investigates the functional structures and potential applications of WSSV hub complexes in managing WSSV infection. Using computational methodologies, we generated comprehensive three-dimensional (3D) representations of hub proteins along with their three mutual binding counterparts, elucidating crucial interaction locations. The results of our study indicate that the WSSV051 hub protein demonstrates higher binding energy than WSSV517. Moreover, a unique motif, denoted as "S-S-x(5)-S-x(2)-P," was discovered among the binding proteins. This pattern may contribute to the detection of partners by the hub proteins of WSSV. An antiviral strategy targeting WSSV hub proteins was demonstrated through the oral administration of dual hub double-stranded RNAs to the black tiger shrimp, Penaeus monodon, followed by a challenge assay. The findings demonstrate a decrease in shrimp mortality and a cessation of WSSV multiplication. In conclusion, our research unveils the structural features and dynamic interactions of hub complexes, shedding light on their significance in the WSSV protein network. This highlights the potential of hub protein-based interventions to mitigate the impact of WSSV infection in aquaculture.
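The motif quoted in this abstract is written in PROSITE-style notation, where x(n) stands for any n residues. A minimal Python sketch of scanning a sequence for it with a regular expression is given below; the function name and the example sequence are made up for illustration, and this is not the authors' pipeline.

```python
import re
from typing import List

# S-S-x(5)-S-x(2)-P translated to a regex: two serines, any five residues,
# a serine, any two residues, then a proline (11 residues in total).
# The lookahead makes overlapping occurrences visible to finditer.
MOTIF = re.compile(r"(?=SS[A-Z]{5}S[A-Z]{2}P)")


def find_motif(sequence: str) -> List[int]:
    """Return 0-based start positions of every occurrence of the motif."""
    return [m.start() for m in MOTIF.finditer(sequence.upper())]


# Hypothetical sequence containing one occurrence starting at index 2.
print(find_motif("MKSSAAAAASAAPGG"))
```

Using `[A-Z]` rather than `.` for the wildcard positions is a design choice that avoids matching across gap or whitespace characters in loosely formatted input.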
Affiliation(s)
- Tanate Panrat
  - Prince of Songkla University International College, Prince of Songkla University, Hatyai Campus, Songkhla, Thailand
  - Center for Genomics and Bioinformatics Research, Faculty of Science, Prince of Songkla University, Songkhla, Thailand
- Amornrat Phongdara
  - Center for Genomics and Bioinformatics Research, Faculty of Science, Prince of Songkla University, Songkhla, Thailand
- Kitti Wuthisathid
  - Center of Excellence for Shrimp Molecular Biology and Biotechnology (Centex Shrimp), Faculty of Science, Mahidol University, Bangkok, Thailand
- Watcharachai Meemetta
  - Center of Excellence for Shrimp Molecular Biology and Biotechnology (Centex Shrimp), Faculty of Science, Mahidol University, Bangkok, Thailand
- Kornsunee Phiwsaiya
  - Center of Excellence for Shrimp Molecular Biology and Biotechnology (Centex Shrimp), Faculty of Science, Mahidol University, Bangkok, Thailand
  - National Center for Genetic Engineering and Biotechnology (BIOTEC), National Science and Technology Development Agency (NSTDA), Pathum Thani, Thailand
- Rapeepun Vanichviriyakit
  - Center of Excellence for Shrimp Molecular Biology and Biotechnology (Centex Shrimp), Faculty of Science, Mahidol University, Bangkok, Thailand
  - Department of Anatomy, Faculty of Science, Mahidol University, Bangkok, Thailand
- Saengchan Senapin
  - Center of Excellence for Shrimp Molecular Biology and Biotechnology (Centex Shrimp), Faculty of Science, Mahidol University, Bangkok, Thailand
  - National Center for Genetic Engineering and Biotechnology (BIOTEC), National Science and Technology Development Agency (NSTDA), Pathum Thani, Thailand
- Pakkakul Sangsuriya
  - Aquatic Molecular Genetics and Biotechnology Research Team, BIOTEC, NSTDA, Pathum Thani, Thailand
|
19
|
Teekas L, Sharma S, Vijay N. Terminal regions of a protein are a hotspot for low complexity regions and selection. Open Biol 2024; 14:230439. [PMID: 38862022 DOI: 10.1098/rsob.230439] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2023] [Accepted: 05/13/2024] [Indexed: 06/13/2024] Open
Abstract
Volatile low complexity regions (LCRs) are a novel source of adaptive variation, functional diversification and evolutionary novelty. An interplay of selection and mutation governs the composition and length of low complexity regions. High %GC and mutations provide length variability because of mechanisms like replication slippage. Owing to the complex dynamics between selection and mutation, we need a better understanding of their coexistence. Our findings underscore that positively selected sites (PSS) and low complexity regions prefer the terminal regions of genes, co-occurring in most Tetrapoda clades. We observed that positively selected sites within a gene have position-specific roles. Central-positively selected site genes primarily participate in defence responses, whereas terminal-positively selected site genes exhibit non-specific functions. Low complexity region-containing genes in the Tetrapoda clade exhibit a significantly higher %GC and lower ω (dN/dS: non-synonymous substitution rate/synonymous substitution rate) compared with genes without low complexity regions. This lower ω implies that despite providing rapid functional diversity, low complexity region-containing genes are subjected to intense purifying selection. Furthermore, we observe that low complexity regions consistently display ubiquitous prevalence at lower purity levels, but exhibit a preference for specific positions within a gene as the purity of the low complexity region stretch increases, implying a composition-dependent evolutionary role. Our findings collectively contribute to the understanding of how genetic diversity and adaptation are shaped by the interplay of selection and low complexity regions in the Tetrapoda clade.
Affiliation(s)
- Lokdeep Teekas
  - Computational Evolutionary Genomics Lab, Department of Biological Sciences, IISER Bhopal, Bhauri, Madhya Pradesh, India
- Sandhya Sharma
  - Computational Evolutionary Genomics Lab, Department of Biological Sciences, IISER Bhopal, Bhauri, Madhya Pradesh, India
- Nagarjun Vijay
  - Computational Evolutionary Genomics Lab, Department of Biological Sciences, IISER Bhopal, Bhauri, Madhya Pradesh, India
|
20
|
Xu S, Onoda A. Accurate and Fast Prediction of Intrinsically Disordered Protein by Multiple Protein Language Models and Ensemble Learning. J Chem Inf Model 2024; 64:2901-2911. [PMID: 37883249 DOI: 10.1021/acs.jcim.3c01202] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Indexed: 10/28/2023]
Abstract
Intrinsically disordered proteins (IDPs) play a vital role in various biological processes and have attracted increasing attention in the past few decades. Predicting IDPs from the primary structures of proteins offers a rapid and facile means of protein analysis without necessitating crystal structures. In particular, machine learning methods have demonstrated their potential in this field. Recently, protein language models (PLMs) have emerged as a promising approach to extracting essential information from protein sequences and have been employed in protein modeling for their precision and efficiency. In this article, we developed a novel IDP prediction method named IDP-ELM to predict intrinsically disordered regions (IDRs) as well as their functions, including disordered flexible linkers and disordered protein binding. This method utilizes high-dimensional representations extracted from several state-of-the-art PLMs and predicts IDRs by ensemble learning based on bidirectional recurrent neural networks. The performance of the method was evaluated on two independent test data sets from CAID (critical assessment of protein intrinsic disorder prediction) and CAID2, indicating notable improvements in terms of area under the receiver operating characteristic curve (AUC), Matthews correlation coefficient (MCC), and F1 score. Moreover, IDP-ELM requires solely protein sequences as inputs and does not entail a time-consuming process of protein profile generation, which is a prerequisite for most existing state-of-the-art methods, enabling an accurate, fast, and convenient tool for proteome-level analysis. The corresponding reproducible source code and model weights are available at https://github.com/xu-shi-jie/idp-elm.
Affiliation(s)
- Shijie Xu
  - Graduate School of Environmental Science, Hokkaido University, Sapporo 060-0810, Japan
- Akira Onoda
  - Graduate School of Environmental Science, Hokkaido University, Sapporo 060-0810, Japan
  - Faculty of Environmental Earth Science, Hokkaido University, Sapporo 060-0810, Japan
|
21
|
Gentili PL. The Conformational Contribution to Molecular Complexity and Its Implications for Information Processing in Living Beings and Chemical Artificial Intelligence. Biomimetics (Basel) 2024; 9:121. [PMID: 38392167 PMCID: PMC10886813 DOI: 10.3390/biomimetics9020121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/25/2023] [Revised: 02/16/2024] [Accepted: 02/17/2024] [Indexed: 02/24/2024] Open
Abstract
This work highlights the relevant contribution of conformational stereoisomers to the complexity and functions of any molecular compound. Conformers have the same molecular and structural formulas but different orientations of the atoms in the three-dimensional space. Moving from one conformer to another is possible without breaking covalent bonds. The interconversion is usually feasible through the thermal energy available in ordinary conditions. The behavior of most biopolymers, such as enzymes, antibodies, RNA, and DNA, is understandable if we consider that each exists as an ensemble of conformers. Each conformational collection confers multi-functionality and adaptability to the single biopolymers. The conformational distribution of any biopolymer has the features of a fuzzy set. Hence, every compound that exists as an ensemble of conformers allows the molecular implementation of a fuzzy set. Since proteins, DNA, and RNA work as fuzzy sets, it is fair to say that life's logic is fuzzy. The power of processing fuzzy logic makes living beings capable of swift decisions in environments dominated by uncertainty and vagueness. These performances can be implemented in chemical robots, which are confined molecular assemblies mimicking unicellular organisms: they are supposed to help humans "colonise" the molecular world to defeat diseases in living beings and fight pollution in the environment.
Affiliation(s)
- Pier Luigi Gentili
  - Department of Chemistry, Biology, and Biotechnology, Università degli Studi di Perugia, 06123 Perugia, Italy
|
22
|
Vieira MFM, Hernandez G, Zhong Q, Arbesú M, Veloso T, Gomes T, Martins ML, Monteiro H, Frazão C, Frankel G, Zanzoni A, Cordeiro TN. The pathogen-encoded signalling receptor Tir exploits host-like intrinsic disorder for infection. Commun Biol 2024; 7:179. [PMID: 38351154 PMCID: PMC10864410 DOI: 10.1038/s42003-024-05856-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2022] [Accepted: 01/26/2024] [Indexed: 02/16/2024] Open
Abstract
The translocated intimin receptor (Tir) is an essential type III secretion system (T3SS) effector of attaching and effacing pathogens contributing to the global foodborne disease burden. Tir acts as a cell-surface receptor in host cells, rewiring intracellular processes by targeting multiple host proteins. We investigated the molecular basis for Tir's binding diversity in signalling, finding that Tir is a disordered protein with host-like binding motifs. Unexpectedly, so are several other T3SS effectors. By an integrative approach, we reveal that Tir dimerises via an antiparallel OB-fold within a highly disordered N-terminal cytosolic domain. It also has a long disordered C-terminal cytosolic domain partially structured at host-like motifs that bind lipids. Membrane affinity depends on lipid composition and phosphorylation, highlighting a previously unrecognised host interaction impacting Tir-induced actin polymerisation and cell death. Furthermore, multi-site tyrosine phosphorylation enables Tir to engage host SH2 domains in a multivalent fuzzy complex, consistent with Tir's scaffolding role and binding promiscuity. Our findings provide insights into the intracellular Tir domains, highlighting the ability of T3SS effectors to exploit host-like protein disorder as a strategy for host evasion.
Affiliation(s)
- Marta F M Vieira
  - Instituto de Tecnologia Química e Biológica António Xavier, Universidade Nova de Lisboa, Av. da República, Oeiras, Portugal
- Guillem Hernandez
  - Instituto de Tecnologia Química e Biológica António Xavier, Universidade Nova de Lisboa, Av. da República, Oeiras, Portugal
- Qiyun Zhong
  - Department of Life Sciences, Imperial College London, South Kensington Campus, London, UK
- Miguel Arbesú
  - Department of NMR-supported Structural Biology, Leibniz-Forschungsinstitut für Molekulare Pharmakologie, Berlin, Germany
  - InstaDeep Ltd, 5 Merchant Square, London, UK
- Tiago Veloso
  - Instituto de Tecnologia Química e Biológica António Xavier, Universidade Nova de Lisboa, Av. da República, Oeiras, Portugal
- Tiago Gomes
  - Instituto de Tecnologia Química e Biológica António Xavier, Universidade Nova de Lisboa, Av. da República, Oeiras, Portugal
- Maria L Martins
  - Instituto de Tecnologia Química e Biológica António Xavier, Universidade Nova de Lisboa, Av. da República, Oeiras, Portugal
- Hugo Monteiro
  - Instituto de Tecnologia Química e Biológica António Xavier, Universidade Nova de Lisboa, Av. da República, Oeiras, Portugal
- Carlos Frazão
  - Instituto de Tecnologia Química e Biológica António Xavier, Universidade Nova de Lisboa, Av. da República, Oeiras, Portugal
- Gad Frankel
  - Department of Life Sciences, Imperial College London, South Kensington Campus, London, UK
- Andreas Zanzoni
  - Aix-Marseille Université, Inserm, TAGC, UMR_S1090, Marseille, France
- Tiago N Cordeiro
  - Instituto de Tecnologia Química e Biológica António Xavier, Universidade Nova de Lisboa, Av. da República, Oeiras, Portugal
|
23
|
Kruglikov A, Xia X. Mesophiles vs. Thermophiles: Untangling the Hot Mess of Intrinsically Disordered Proteins and Growth Temperature of Bacteria. Int J Mol Sci 2024; 25:2000. [PMID: 38396678 PMCID: PMC10889376 DOI: 10.3390/ijms25042000] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2024] [Revised: 01/31/2024] [Accepted: 02/05/2024] [Indexed: 02/25/2024] Open
Abstract
The dynamic structures and varying functions of intrinsically disordered proteins (IDPs) have made them fascinating subjects in molecular biology. Investigating IDP abundance in different bacterial species is crucial for understanding adaptive strategies in diverse environments. Notably, thermophilic bacteria have lower IDP abundance than mesophiles, and a negative correlation with optimal growth temperature (OGT) has been observed. However, the factors driving these trends are yet to be fully understood. We examined the types of IDPs present in both mesophiles and thermophiles alongside those unique to just mesophiles. The shared group of IDPs exhibits similar disorder levels in the two groups of species, suggesting that certain IDPs unique to mesophiles may contribute to the observed decrease in IDP abundance as OGT increases. Subsequently, we used quasi-independent contrasts to explore the relationship between OGT and IDP abundance evolution. Interestingly, we found no significant relationship between OGT and IDP abundance contrasts, suggesting that the evolution of lower IDP abundance in thermophiles may not be solely linked to OGT. This study provides a foundation for future research into the intricate relationship between IDP evolution and environmental adaptation. Our findings support further research on the adaptive significance of intrinsic disorder in bacterial species.
Affiliation(s)
- Alibek Kruglikov
  - Department of Biology, University of Ottawa, 30 Marie Curie, Station A, P.O. Box 450, Ottawa, ON K1N 6N5, Canada
- Xuhua Xia
  - Department of Biology, University of Ottawa, 30 Marie Curie, Station A, P.O. Box 450, Ottawa, ON K1N 6N5, Canada
  - Ottawa Institute of Systems Biology, Ottawa, ON K1H 8M5, Canada
|
24
|
Zhang J, Basu S, Kurgan L. HybridDBRpred: improved sequence-based prediction of DNA-binding amino acids using annotations from structured complexes and disordered proteins. Nucleic Acids Res 2024; 52:e10. [PMID: 38048333 PMCID: PMC10810184 DOI: 10.1093/nar/gkad1131] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2023] [Accepted: 11/10/2023] [Indexed: 12/06/2023] Open
Abstract
Current predictors of DNA-binding residues (DBRs) from protein sequences belong to two distinct groups: those trained on binding annotations extracted from structured protein-DNA complexes (structure-trained) and those trained on intrinsically disordered proteins (disorder-trained). We complete the first empirical analysis of predictive performance across the structure- and disorder-annotated proteins for a representative collection of ten predictors. The majority of the structure-trained tools perform well on the structure-annotated proteins while doing relatively poorly on the disorder-annotated proteins, and vice versa. Several methods make accurate predictions for the structure-annotated proteins or the disorder-annotated proteins, but none performs highly accurately for both annotation types. Moreover, most predictors make excessive cross-predictions for the disorder-annotated proteins, where residues that interact with non-DNA ligand types are predicted as DBRs. Motivated by these results, we design, validate and deploy an innovative meta-model, hybridDBRpred, that uses a deep transformer network to combine predictions generated by the three best current predictors. HybridDBRpred provides accurate predictions and low levels of cross-predictions across the two annotation types, and is statistically more accurate than each of the ten tools and baseline meta-predictors that rely on averaging and logistic regression. We deploy hybridDBRpred as a convenient web server at http://biomine.cs.vcu.edu/servers/hybridDBRpred/ and provide the corresponding source code at https://github.com/jianzhang-xynu/hybridDBRpred.
Affiliation(s)
- Jian Zhang
- School of Computer and Information Technology, Xinyang Normal University, Xinyang 464000, PR China
- Sushmita Basu
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
|
25
|
Wang W, Shuai Y, Yang Q, Zhang F, Zeng M, Li M. A comprehensive computational benchmark for evaluating deep learning-based protein function prediction approaches. Brief Bioinform 2024; 25:bbae050. [PMID: 38388682 PMCID: PMC10883809 DOI: 10.1093/bib/bbae050] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2023] [Revised: 01/17/2024] [Accepted: 01/26/2024] [Indexed: 02/24/2024] Open
Abstract
Proteins play an important role in life activities and are the basic units for performing functions. Accurately annotating functions to proteins is crucial for understanding the intricate mechanisms of life and developing effective treatments for complex diseases. Traditional biological experiments struggle to keep pace with the growing number of known proteins. With the development of high-throughput sequencing technology, a wide variety of biological data provides the possibility to accurately predict protein functions by computational methods. Consequently, many computational methods have been proposed. Due to the diversity of application scenarios, it is necessary to conduct a comprehensive evaluation of these computational methods to determine the suitability of each algorithm for specific cases. In this study, we present a comprehensive benchmark, BeProf, to process data and evaluate representative computational methods. We first collect the latest datasets and analyze the data characteristics. Then, we investigate and summarize 17 state-of-the-art computational methods. Finally, we propose a novel comprehensive evaluation metric, design eight application scenarios and evaluate the performance of existing methods on these scenarios. Based on the evaluation, we provide practical recommendations for different scenarios, enabling users to select the most suitable method for their specific needs. All of these servers can be obtained from https://csuligroup.com/BEPROF and https://github.com/CSUBioGroup/BEPROF.
Affiliation(s)
- Wenkang Wang
- School of Computer Science and Engineering, Central South University, 932 South Lushan Road, Yuelu District, Changsha 410083, China
- Yunyan Shuai
- School of Computer Science and Engineering, Central South University, 932 South Lushan Road, Yuelu District, Changsha 410083, China
- Qiurong Yang
- School of Computer Science and Engineering, Central South University, 932 South Lushan Road, Yuelu District, Changsha 410083, China
- Fuhao Zhang
- School of Computer Science and Engineering, Central South University, 932 South Lushan Road, Yuelu District, Changsha 410083, China
- Min Zeng
- School of Computer Science and Engineering, Central South University, 932 South Lushan Road, Yuelu District, Changsha 410083, China
- Min Li
- School of Computer Science and Engineering, Central South University, 932 South Lushan Road, Yuelu District, Changsha 410083, China
|
26
|
Leblanc S, Yala F, Provencher N, Lucier JF, Levesque M, Lapointe X, Jacques JF, Fournier I, Salzet M, Ouangraoua A, Scott MS, Boisvert FM, Brunet MA, Roucou X. OpenProt 2.0 builds a path to the functional characterization of alternative proteins. Nucleic Acids Res 2024; 52:D522-D528. [PMID: 37956315 PMCID: PMC10767855 DOI: 10.1093/nar/gkad1050] [Citation(s) in RCA: 18] [Impact Index Per Article: 18.0] [Received: 09/25/2023] [Revised: 10/20/2023] [Accepted: 10/23/2023] [Indexed: 11/15/2023] Open
Abstract
The OpenProt proteogenomic resource (https://www.openprot.org/) provides users with a complete and freely accessible set of non-canonical or alternative open reading frames (AltORFs) within the transcriptome of various species, as well as functional annotations of the corresponding protein sequences not found in standard databases. Enhancements in this update are largely the result of user feedback and include the prediction of structure, subcellular localization, and intrinsic disorder, using cutting-edge algorithms based on machine learning techniques. The mass spectrometry pipeline now integrates a machine learning-based peptide rescoring method to improve peptide identification. We continue to help users explore this cryptic proteome by providing OpenCustomDB, a tool that enables users to build their own customized protein databases, and OpenVar, a genomic annotator including genetic variants within AltORFs and protein sequences. A new interface improves the visualization of all functional annotations, including a spectral viewer and the prediction of multicoding genes. All data on OpenProt are freely available and downloadable. Overall, OpenProt continues to establish itself as an important resource for the exploration and study of new proteins.
Affiliation(s)
- Sébastien Leblanc
- Department of Biochemistry and Functional Genomics, Université de Sherbrooke, 3201 Jean Mignault, Sherbrooke, QC J1E 4K8, Canada
- Feriel Yala
- Department of Biochemistry and Functional Genomics, Université de Sherbrooke, 3201 Jean Mignault, Sherbrooke, QC J1E 4K8, Canada
- Nicolas Provencher
- Department of Biochemistry and Functional Genomics, Université de Sherbrooke, 3201 Jean Mignault, Sherbrooke, QC J1E 4K8, Canada
- Jean-François Lucier
- Center for Computational Science, Université de Sherbrooke, Sherbrooke, QC J1K 2R1, Canada
- Department of Biology, Université de Sherbrooke, Sherbrooke, QC J1K 2R1, Canada
- Maxime Levesque
- Center for Computational Science, Université de Sherbrooke, Sherbrooke, QC J1K 2R1, Canada
- Xavier Lapointe
- Department of Pediatrics, Medical Genetics Service, Université de Sherbrooke, Sherbrooke, QC J1H 5N4, Canada
- Jean-Francois Jacques
- Department of Biochemistry and Functional Genomics, Université de Sherbrooke, 3201 Jean Mignault, Sherbrooke, QC J1E 4K8, Canada
- Isabelle Fournier
- INSERM U1192, Laboratoire Protéomique, Réponse Inflammatoire & Spectrométrie de Masse (PRISM), Université de Lille, F-59000 Lille, France
- Michel Salzet
- INSERM U1192, Laboratoire Protéomique, Réponse Inflammatoire & Spectrométrie de Masse (PRISM), Université de Lille, F-59000 Lille, France
- Aïda Ouangraoua
- Informatics Department, Université de Sherbrooke, Sherbrooke, QC J1K 2R1, Canada
- Michelle S Scott
- Department of Biochemistry and Functional Genomics, Université de Sherbrooke, 3201 Jean Mignault, Sherbrooke, QC J1E 4K8, Canada
- Centre de Recherche du Centre Hospitalier Universitaire de Sherbrooke (CRCHUS), Sherbrooke, QC J1H 5N4, Canada
- François-Michel Boisvert
- Centre de Recherche du Centre Hospitalier Universitaire de Sherbrooke (CRCHUS), Sherbrooke, QC J1H 5N4, Canada
- Department of Immunology and Cellular Biology, Université de Sherbrooke, Sherbrooke, QC J1E 4K8, Canada
- Marie A Brunet
- Centre de Recherche du Centre Hospitalier Universitaire de Sherbrooke (CRCHUS), Sherbrooke, QC J1H 5N4, Canada
- Department of Pediatrics, Medical Genetics Service, Université de Sherbrooke, Sherbrooke, QC J1H 5N4, Canada
- Xavier Roucou
- Department of Biochemistry and Functional Genomics, Université de Sherbrooke, 3201 Jean Mignault, Sherbrooke, QC J1E 4K8, Canada
- Centre de Recherche du Centre Hospitalier Universitaire de Sherbrooke (CRCHUS), Sherbrooke, QC J1H 5N4, Canada
|
27
|
Uversky VN. Functional unfoldomics: Roles of intrinsic disorder in protein (multi)functionality. Adv Protein Chem Struct Biol 2023; 138:179-210. [PMID: 38220424 DOI: 10.1016/bs.apcsb.2023.11.001] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 01/16/2024]
Abstract
Intrinsically disordered proteins (IDPs), which are functional proteins without stable tertiary structure, and hybrid proteins containing ordered domains and intrinsically disordered regions (IDRs) constitute prominent parts of all proteomes collectively known as unfoldomes. IDPs/IDRs exist as highly dynamic structural ensembles of rapidly interconverting conformations and are characterized by exceptional structural heterogeneity, where their different parts are (dis)ordered to different degrees, and their overall structure represents a complex mosaic of foldons, inducible foldons, inducible morphing foldons, non-foldons, semifoldons, and even unfoldons. Despite their lack of unique 3D structures, IDPs/IDRs play crucial roles in the control of various biological processes and the regulation of different cellular pathways and are commonly involved in recognition and signaling, indicating that the disorder-based functional repertoire is complementary to the functions of ordered proteins. Furthermore, IDPs/IDRs are frequently multifunctional, and this multifunctionality is defined by their structural flexibility and heterogeneity. The intrinsic disorder phenomenon is at the root of the structure-function continuum model, where the structure continuum is defined by the presence of differently (dis)ordered regions, and the function continuum arises from the ability of all these differently (dis)ordered parts to have different functions. In their everyday life, IDPs/IDRs utilize a broad spectrum of interaction mechanisms, thereby acting as interaction specialists. They are crucial for the biogenesis of numerous proteinaceous membrane-less organelles driven by liquid-liquid phase separation. This review introduces functional unfoldomics by presenting some aspects of the intrinsic disorder-based functionality.
Affiliation(s)
- Vladimir N Uversky
- Department of Molecular Medicine and USF Health Byrd Alzheimer's Research Institute, Morsani College of Medicine, University of South Florida, Tampa, FL, United States.
|
28
|
Basu S, Hegedűs T, Kurgan L. CoMemMoRFPred: Sequence-based Prediction of MemMoRFs by Combining Predictors of Intrinsic Disorder, MoRFs and Disordered Lipid-binding Regions. J Mol Biol 2023; 435:168272. [PMID: 37709009 DOI: 10.1016/j.jmb.2023.168272] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2023] [Revised: 09/01/2023] [Accepted: 09/07/2023] [Indexed: 09/16/2023]
Abstract
Molecular recognition features (MoRFs) are a commonly occurring type of intrinsically disordered regions (IDRs) that undergo disorder-to-order transition upon binding to partner molecules. We focus on recently characterized and functionally important membrane-binding MoRFs (MemMoRFs). Motivated by the lack of computational tools that predict MemMoRFs, we use a dataset of experimentally annotated MemMoRFs to conceptualize, design, evaluate and release an accurate sequence-based predictor. We rely on state-of-the-art tools that predict residues that possess key characteristics of MemMoRFs, such as intrinsic disorder, disorder-to-order transition and lipid-binding. We identify and combine results from three tools that include flDPnn for the disorder prediction, DisoLipPred for the prediction of disordered lipid-binding regions, and MoRFCHiBiLight for the prediction of disorder-to-order transitioning protein binding regions. Our empirical analysis demonstrates that combining results produced by these three methods generates accurate predictions of MemMoRFs. We also show that use of a smoothing operator produces predictions that closely mimic the number and sizes of the native MemMoRF regions. The resulting CoMemMoRFPred method is available as an easy-to-use webserver at http://biomine.cs.vcu.edu/servers/CoMemMoRFPred. This tool will aid future studies of MemMoRFs in the context of exploring their abundance, cellular functions, and roles in pathologic phenomena.
Affiliation(s)
- Sushmita Basu
- Department of Computer Science, Virginia Commonwealth University, USA
- Tamás Hegedűs
- Department of Biophysics and Radiation Biology, Semmelweis University, Budapest, Hungary; ELKH-SE Biophysical Virology Research Group, Eötvös Loránd Research Network, Budapest, Hungary
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, USA.
|
29
|
Kurgan L, Hu G, Wang K, Ghadermarzi S, Zhao B, Malhis N, Erdős G, Gsponer J, Uversky VN, Dosztányi Z. Tutorial: a guide for the selection of fast and accurate computational tools for the prediction of intrinsic disorder in proteins. Nat Protoc 2023; 18:3157-3172. [PMID: 37740110 DOI: 10.1038/s41596-023-00876-x] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 12/13/2022] [Accepted: 06/21/2023] [Indexed: 09/24/2023]
Abstract
Intrinsic disorder is instrumental for a wide range of protein functions, and its analysis, using computational predictions from primary structures, complements secondary and tertiary structure-based approaches. In this Tutorial, we provide an overview and comparison of 23 publicly available computational tools with complementary parameters useful for intrinsic disorder prediction, partly relying on results from the Critical Assessment of protein Intrinsic Disorder prediction experiment. We consider factors such as accuracy, runtime, availability and the need for functional insights. The selected tools are available as web servers and downloadable programs, offer state-of-the-art predictions and can be used in a high-throughput manner. We provide examples and instructions for the selected tools to illustrate practical aspects related to the submission, collection and interpretation of predictions, as well as the timing and their limitations. We highlight two predictors for intrinsically disordered proteins, flDPnn as accurate and fast and IUPred as very fast and moderately accurate, while suggesting ANCHOR2 and MoRFchibi as two of the best-performing predictors for intrinsically disordered region binding. We link these tools to additional resources, including databases of predictions and web servers that integrate multiple predictive methods. Altogether, this Tutorial provides a hands-on guide to comparatively evaluating multiple predictors, submitting and collecting their own predictions, and reading and interpreting results. It is suitable for experimentalists and computational biologists interested in accurately and conveniently identifying intrinsic disorder, facilitating the functional characterization of the rapidly growing collections of protein sequences.
Affiliation(s)
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA.
- Gang Hu
- School of Statistics and Data Science, LPMC and KLMDASR, Nankai University, Tianjin, China
- Kui Wang
- School of Statistics and Data Science, LPMC and KLMDASR, Nankai University, Tianjin, China
- Sina Ghadermarzi
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
- Bi Zhao
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
- Nawar Malhis
- Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia, Canada
- Gábor Erdős
- MTA-ELTE Momentum Bioinformatics Research Group, Department of Biochemistry, Eötvös Loránd University, Budapest, Hungary
- Jörg Gsponer
- Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia, Canada.
- Vladimir N Uversky
- Department of Molecular Medicine, Morsani College of Medicine, University of South Florida, Tampa, FL, USA.
- Byrd Alzheimer's Center and Research Institute, Morsani College of Medicine, University of South Florida, Tampa, FL, USA.
- Zsuzsanna Dosztányi
- MTA-ELTE Momentum Bioinformatics Research Group, Department of Biochemistry, Eötvös Loránd University, Budapest, Hungary.
|
30
|
Pang Y, Liu B. IDP-LM: Prediction of protein intrinsic disorder and disorder functions based on language models. PLoS Comput Biol 2023; 19:e1011657. [PMID: 37992088 DOI: 10.1371/journal.pcbi.1011657] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/31/2022] [Revised: 12/06/2023] [Accepted: 11/03/2023] [Indexed: 11/24/2023] Open
Abstract
Intrinsically disordered proteins (IDPs) and regions (IDRs) are a class of functionally important proteins and regions that lack stable three-dimensional structures under native physiological conditions. They participate in critical biological processes and thus are associated with the pathogenesis of many severe human diseases. Identifying the IDPs/IDRs and their functions will be helpful for a comprehensive understanding of protein structures and functions, and inform studies of rational drug design. Over the past decades, the exponential growth in the number of proteins with sequence information has deepened the gap between uncharacterized and annotated disordered sequences. Protein language models have recently demonstrated their powerful abilities to capture complex structural and functional information from the enormous quantity of unlabelled protein sequences, providing opportunities to apply protein language models to uncover the intrinsic disorders and their biological properties from the amino acid sequences. In this study, we proposed a computational predictor called IDP-LM for predicting intrinsic disorder and disorder functions by leveraging the pre-trained protein language models. IDP-LM takes the embeddings extracted from three pre-trained protein language models as the exclusive inputs, including ProtBERT, ProtT5 and a disorder-specific language model (IDP-BERT). The ablation analysis showed that IDP-BERT provided fine-grained feature representations of disorder, and that the combination of the three language models is the key to the performance improvement of IDP-LM. The evaluation results on independent test datasets demonstrated that IDP-LM provided high-quality prediction results for intrinsic disorder and four common disordered functions.
Affiliation(s)
- Yihe Pang
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
- Bin Liu
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
- Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing, China
|
31
|
Nussinov R, Liu Y, Zhang W, Jang H. Protein conformational ensembles in function: roles and mechanisms. RSC Chem Biol 2023; 4:850-864. [PMID: 37920394 PMCID: PMC10619138 DOI: 10.1039/d3cb00114h] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 06/30/2023] [Accepted: 09/02/2023] [Indexed: 11/04/2023] Open
Abstract
The sequence-structure-function paradigm has dominated twentieth century molecular biology. The paradigm tacitly stipulated that for each sequence there exists a single, well-organized protein structure. Yet, to sustain cell life, function requires (i) that there be more than a single structure, (ii) that there be switching between the structures, and (iii) that the structures be incompletely organized. These fundamental tenets called for an updated sequence-conformational ensemble-function paradigm. The powerful energy landscape idea, which is the foundation of modernized molecular biology, imported the conformational ensemble framework from physics and chemistry. This framework embraces the recognition that proteins are dynamic and are always interconverting between conformational states with varying energies. The more stable the conformation, the more populated it is. The changes in the populations of the states are required for cell life. As an example, in vivo, under physiological conditions, wild type kinases commonly populate their more stable "closed", inactive, conformations. However, there are minor populations of the "open", ligand-free states. Upon their stabilization, e.g., by high affinity interactions or mutations, their ensembles shift to occupy the active states. Here we discuss the role of conformational propensities in function. We provide multiple examples of diverse systems, including protein kinases, lipid kinases, and Ras GTPases, discuss diverse conformational mechanisms, and provide a broad outlook on protein ensembles in the cell. We propose that the number of molecules in the active state (inactive for repressors) determines protein function, and that the dynamic, relative conformational propensities, rather than the rigid structures, are the hallmark of cell life.
Affiliation(s)
- Ruth Nussinov
- Computational Structural Biology Section, Frederick National Laboratory for Cancer Research, Frederick, MD 21702, USA
- Department of Human Molecular Genetics and Biochemistry, Sackler School of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
- Cancer Innovation Laboratory, National Cancer Institute, Frederick, MD 21702, USA
- Yonglan Liu
- Cancer Innovation Laboratory, National Cancer Institute, Frederick, MD 21702, USA
- Wengang Zhang
- Cancer Innovation Laboratory, National Cancer Institute, Frederick, MD 21702, USA
- Hyunbum Jang
- Computational Structural Biology Section, Frederick National Laboratory for Cancer Research, Frederick, MD 21702, USA
- Cancer Innovation Laboratory, National Cancer Institute, Frederick, MD 21702, USA
|
32
|
Yue C, Zhang C, Zhang R, Yuan J. Tethered particle motion of the adaptation enzyme CheR in bacterial chemotaxis. iScience 2023; 26:107950. [PMID: 37817931 PMCID: PMC10561060 DOI: 10.1016/j.isci.2023.107950] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2023] [Revised: 07/25/2023] [Accepted: 09/14/2023] [Indexed: 10/12/2023] Open
Abstract
Bacteria perform chemotactic adaptation by sequential modification of multiple modifiable sites on chemoreceptors through stochastic action of tethered adaptation enzymes (CheR and CheB). To study the molecular kinetics of this process, we measured the response to different concentrations of MeAsp for the Tar-only Escherichia coli strain. We found a strong dependence of the methylation rate on the methylation level and established a new mechanism of adaptation kinetics due to tethered particle motion of the methylation enzyme CheR. Experiments with various lengths of the C-terminal flexible chain in the Tar receptor further validated this mechanism. The tethered particle motion resulted in a CheR concentration gradient that ensures encounter-rate matching of the sequential modifiable sites. An analytical model of multisite catalytic reaction showed that this enables robustness of methylation to fluctuations in receptor activity or cell-to-cell variations in the expression of adaptation enzymes and reduces the variation in methylation level among individual receptors.
Affiliation(s)
- Caijuan Yue
- Hefei National Laboratory for Physical Sciences at the Microscale, and Department of Physics, University of Science and Technology of China, Hefei, Anhui 230026, China
- Chi Zhang
- Hefei National Laboratory for Physical Sciences at the Microscale, and Department of Physics, University of Science and Technology of China, Hefei, Anhui 230026, China
- Rongjing Zhang
- Hefei National Laboratory for Physical Sciences at the Microscale, and Department of Physics, University of Science and Technology of China, Hefei, Anhui 230026, China
- Junhua Yuan
- Hefei National Laboratory for Physical Sciences at the Microscale, and Department of Physics, University of Science and Technology of China, Hefei, Anhui 230026, China
|
33
|
Djulbegovic M, Taylor Gonzalez DJ, Antonietti M, Uversky VN, Shields CL, Karp CL. Intrinsic disorder may drive the interaction of PROS1 and MERTK in uveal melanoma. Int J Biol Macromol 2023; 250:126027. [PMID: 37506796 PMCID: PMC11182630 DOI: 10.1016/j.ijbiomac.2023.126027] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 11/28/2022] [Revised: 07/23/2023] [Accepted: 07/25/2023] [Indexed: 07/30/2023]
Abstract
BACKGROUND Class 2 uveal melanomas are associated with the inactivation of the BRCA1 (breast cancer type 1 susceptibility protein)-associated protein 1 (BAP1) gene. Inactivation of BAP1 promotes the upregulation of vitamin K-dependent protein S (PROS1), which interacts with the tyrosine-protein kinase Mer (MERTK) receptor on M2 macrophages to induce an immunosuppressive environment. METHODS We simulated the interaction of PROS1 with MERTK with ColabFold. We evaluated PROS1 and MERTK for the presence of intrinsically disordered protein regions (IDPRs) and disorder-to-order (DOT) regions to understand their protein-protein interaction (PPI). We first evaluated the structure of each protein with AlphaFold. We then analyzed specific sequence-based features of PROS1 and MERTK with a suite of bioinformatics tools. RESULTS With high-resolution, moderate confidence, we successfully modeled the interaction between PROS1 and MERTK (predicted local distance difference test score (pLDDT) = 70.68). Our structural analysis qualitatively demonstrated IDPRs (i.e., spaghetti-like entities) in PROS1 and MERTK. PROS1 was 23.37% disordered, and MERTK was 23.09% disordered, classifying them as moderately disordered and flexible proteins. PROS1 was significantly enriched in cysteine, the most order-promoting residue (p-value < 0.05). Our IUPred analysis demonstrated that there are two disorder-to-order transition (DOT) regions in PROS1. MERTK was significantly enriched in proline, the most disorder-promoting residue (p-value < 0.05), but did not contain DOT regions. Our STRING analysis demonstrated that the PPI network between PROS1 and MERTK is more complex than their assumed one-to-one binding (p-value < 2.0 × 10^-6). CONCLUSION Our findings present a novel prediction for the interaction between PROS1 and MERTK. Our findings show that PROS1 and MERTK contain elements of intrinsic disorder. PROS1 has two DOT regions that are attractive immunotherapy targets. We recommend that IDPRs and DOT regions found in PROS1 and MERTK be considered when developing immunotherapies targeting this PPI.
Affiliation(s)
- Mak Djulbegovic
- Bascom Palmer Eye Institute, University of Miami, Miami, FL, USA
- Vladimir N Uversky
- Department of Molecular Medicine and USF Health Byrd Alzheimer's Research Institute, Morsani College of Medicine, University of South Florida, Tampa, FL 33612, USA
- Carol L Shields
- Ocular Oncology Service, Wills Eye Hospital, Thomas Jefferson University, Philadelphia, PA, USA
- Carol L Karp
- Bascom Palmer Eye Institute, University of Miami, Miami, FL, USA.
|
34
|
Antonietti M, Gonzalez DJT, Djulbegovic M, Dayhoff GW, Uversky VN, Shields CL, Karp CL. Intrinsic disorder in PRAME and its role in uveal melanoma. Cell Commun Signal 2023; 21:222. [PMID: 37626310 PMCID: PMC10463658 DOI: 10.1186/s12964-023-01197-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/26/2023] [Accepted: 06/13/2023] [Indexed: 08/27/2023] Open
Abstract
INTRODUCTION The PReferentially expressed Antigen in MElanoma (PRAME) protein has been shown to be an independent biomarker for increased risk of metastasis in Class 1 uveal melanomas (UM). Intrinsically disordered proteins and regions of proteins (IDPs/IDPRs) are proteins that do not have a well-defined three-dimensional structure and have been linked to neoplastic development. Our study aimed to evaluate the presence of intrinsic disorder in PRAME and the role these structureless regions have in PRAME(+) Class 1 UM. METHODS We conducted a bioinformatics study to characterize PRAME's propensity for intrinsic disorder. We first used the AlphaFold tool to qualitatively assess the protein structure of PRAME. Then we used the Compositional Profiler and a set of per-residue intrinsic disorder predictors to quantify the intrinsic disorder. The Database of Disordered Protein Prediction (D2P2) platform, IUPred, FuzDrop, flDPnn, AUCpred, SPOT-Disorder2, and metapredict V2 allowed us to evaluate the potential functional disorder of PRAME. Additionally, we used the Search Tool for the Retrieval of Interacting Genes (STRING) to analyze PRAME's potential interactions with other proteins. RESULTS Our structural analysis showed that PRAME contains intrinsically disordered protein regions (IDPRs), which are structureless and flexible. We found that PRAME is significantly enriched with serine (p-value < 0.05), a disorder-promoting amino acid. PRAME was found to have an average disorder score of 16.49% (i.e., moderately disordered) across six per-residue intrinsic disorder predictors. Our IUPred analysis revealed the presence of disorder-to-order transition (DOT) regions in PRAME near the C-terminus of the protein (residues 475-509). The D2P2 platform predicted a region from approximately residue 140 to residue 175 to be highly concentrated with post-translational modifications (PTMs). FuzDrop predicted the PTM hot spot of PRAME to be a droplet-promoting region and an aggregation hotspot. Finally, our analysis using the STRING tool revealed that PRAME has significantly more interactions with other proteins than expected for randomly selected proteins of the same size, with the ability to interact with 84 different partners (STRING analysis result: p-value < 1.0 × 10^-16; model confidence: 0.400). CONCLUSION Our study revealed that PRAME has IDPRs that are possibly linked to its functionality in the context of Class 1 UM. The regions of functionality (i.e., DOT regions, PTM sites, droplet-promoting regions, and aggregation hotspots) are localized to regions of high levels of disorder. PRAME has a complex protein-protein interaction (PPI) network that may be secondary to the structureless features of the polypeptide. Our findings contribute to our understanding of UM and suggest that IDPRs and DOT regions in PRAME may be targeted in developing new therapies for this aggressive cancer.
Affiliation(s)
- Michael Antonietti
- Bascom Palmer Eye Institute, University of Miami, 900 NW 17th Street, Miami, FL, 33136, USA
- Mak Djulbegovic
- Bascom Palmer Eye Institute, University of Miami, 900 NW 17th Street, Miami, FL, 33136, USA
- Guy W Dayhoff
- Department of Chemistry, College of Arts and Sciences, University of South Florida, Tampa, FL, 33612, USA
- Vladimir N Uversky
- Department of Molecular Medicine and USF Health Byrd Alzheimer's Research Institute, Morsani College of Medicine, University of South Florida, Tampa, FL, 33612, USA
- Carol L Shields
- Ocular Oncology Service, Wills Eye Hospital, Thomas Jefferson University, Philadelphia, PA, USA
- Carol L Karp
- Bascom Palmer Eye Institute, University of Miami, 900 NW 17th Street, Miami, FL, 33136, USA.
|
35
|
Viola G, Floriani F, Barracchia CG, Munari F, D'Onofrio M, Assfalg M. Ultrasmall Gold Nanoparticles as Clients of Biomolecular Condensates. Chemistry 2023; 29:e202301274. [PMID: 37293933 DOI: 10.1002/chem.202301274] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/21/2023] [Revised: 05/29/2023] [Accepted: 06/09/2023] [Indexed: 06/10/2023]
Abstract
Liquid-liquid phase separation (LLPS) of biopolymers to form condensates is a widespread phenomenon in living cells. Agents that target or alter condensation can help uncover elusive physiological and pathological mechanisms. Owing to their unique material properties and modes of interaction with biomolecules, nanoparticles represent attractive condensate-targeting agents. Our work focused on elucidating the interaction between ultrasmall gold nanoparticles (usGNPs) and diverse types of condensates of tau, a representative phase-separating protein associated with neurodegenerative disorders. usGNPs attract considerable interest in the biomedical community due to unique features, including emergent optical properties and good cell penetration. We explored the interaction of usGNPs with reconstituted self-condensates of tau, two-component tau/polyanion coacervates, and three-component tau/RNA/alpha-synuclein coacervates. The usGNPs were found to concentrate into condensed liquid droplets, consistent with the formation of dynamic client (nanoparticle) - scaffold (tau) interactions, and were observable thanks to their intrinsic luminescence. Furthermore, usGNPs were able to promote LLPS of a protein domain that is unable to phase separate on its own. Our study demonstrates the ability of usGNPs to interact with and illuminate protein condensates. We anticipate that nanoparticles will have broad applicability as nanotracers to interrogate phase separation, and as nanoactuators controlling the formation and dissolution of condensates.
Affiliation(s)
- Giovanna Viola
- Department of Biotechnology, University of Verona, 37134, Verona, Italy
- Fulvio Floriani
- Department of Biotechnology, University of Verona, 37134, Verona, Italy
- Francesca Munari
- Department of Biotechnology, University of Verona, 37134, Verona, Italy
- Michael Assfalg
- Department of Biotechnology, University of Verona, 37134, Verona, Italy
|
36
|
Mambetsariev I, Fricke J, Gruber SB, Tan T, Babikian R, Kim P, Vishnubhotla P, Chen J, Kulkarni P, Salgia R. Clinical Network Systems Biology: Traversing the Cancer Multiverse. J Clin Med 2023; 12:4535. [PMID: 37445570 PMCID: PMC10342467 DOI: 10.3390/jcm12134535] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/12/2023] [Revised: 06/29/2023] [Accepted: 07/01/2023] [Indexed: 07/15/2023] Open
Abstract
In recent decades, cancer biology and medicine have ushered in a new age of precision medicine through high-throughput approaches that led to the development of novel targeted therapies and immunotherapies for different cancers. The availability of multifaceted high-throughput omics data has revealed that cancer, beyond its genomic heterogeneity, is a complex system of microenvironments, sub-clonal tumor populations, and a variety of other cell types that impinge on the genetic and non-genetic mechanisms underlying the disease. Thus, a systems approach to cancer biology has become instrumental in identifying the key components of tumor initiation, progression, and the eventual emergence of drug resistance. Through the union of clinical medicine and basic sciences, there has been a revolution in the development and approval of cancer therapeutic drug options including tyrosine kinase inhibitors, antibody-drug conjugates, and immunotherapy. This 'Team Medicine' approach within the cancer systems biology framework can be further improved upon through the development of high-throughput clinical trial models that utilize machine learning models, rapid sample processing to grow patient tumor cell cultures, test multiple therapeutic options and assign appropriate therapy to individual patients quickly and efficiently. The integration of systems biology into the clinical network would allow for rapid advances in personalized medicine that are often hindered by a lack of drug development and drug testing.
Affiliation(s)
- Isa Mambetsariev
- Department of Medical Oncology and Therapeutic Research, City of Hope National Medical Center, Duarte, CA 91010, USA
- Jeremy Fricke
- Department of Medical Oncology and Therapeutic Research, City of Hope National Medical Center, Duarte, CA 91010, USA
- Stephen B. Gruber
- Department of Medical Oncology and Therapeutic Research, City of Hope National Medical Center, Duarte, CA 91010, USA
- Tingting Tan
- Department of Medical Oncology and Therapeutic Research, City of Hope National Medical Center, Duarte, CA 91010, USA
- Razmig Babikian
- Department of Medical Oncology and Therapeutic Research, City of Hope National Medical Center, Duarte, CA 91010, USA
- Pauline Kim
- Department of Pharmacy, City of Hope National Medical Center, Duarte, CA 91010, USA
- Priya Vishnubhotla
- Department of Medical Oncology and Therapeutic Research, City of Hope National Medical Center, Duarte, CA 91010, USA
- Department of Medical Oncology, City of Hope Atlanta, Newnan, GA 30265, USA
- Jianjun Chen
- Department of Systems Biology, City of Hope National Medical Center, Duarte, CA 91010, USA
- Prakash Kulkarni
- Department of Medical Oncology and Therapeutic Research, City of Hope National Medical Center, Duarte, CA 91010, USA
- Department of Systems Biology, City of Hope National Medical Center, Duarte, CA 91010, USA
- Ravi Salgia
- Department of Medical Oncology and Therapeutic Research, City of Hope National Medical Center, Duarte, CA 91010, USA
|
37
|
Zhao B, Ghadermarzi S, Kurgan L. Comparative evaluation of AlphaFold2 and disorder predictors for prediction of intrinsic disorder, disorder content and fully disordered proteins. Comput Struct Biotechnol J 2023; 21:3248-3258. [PMID: 38213902 PMCID: PMC10782001 DOI: 10.1016/j.csbj.2023.06.001] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 02/28/2023] [Revised: 05/31/2023] [Accepted: 06/01/2023] [Indexed: 01/13/2024] Open
Abstract
We expand studies of AlphaFold2 (AF2) in the context of intrinsic disorder prediction by comparing it against a broad selection of 20 accurate, popular and recently released disorder predictors. We use a 25% larger benchmark dataset with 646 proteins and cover protein-level predictions of disorder content and fully disordered proteins. AF2-based disorder predictions secure a relatively high Area Under receiver operating characteristic Curve (AUC) of 0.77 but are statistically outperformed by several modern disorder predictors that secure AUCs around 0.8 with a median runtime of about 20 s compared to 1200 s for AF2. Moreover, AF2 provides modestly accurate predictions of fully disordered proteins (F1 = 0.59 vs. 0.91 for the best disorder predictor) and disorder content (mean absolute error of 0.21 vs. 0.15). AF2 also generates statistically more accurate disorder predictions for about 20% of proteins that have relatively short sequences and a few disordered regions that tend to be located at the sequence termini, and which lack disordered protein-binding regions. Interestingly, AF2 and the most accurate disorder predictors rely on deep neural networks, suggesting that these models are useful for protein structure and disorder predictions.
Affiliation(s)
- Bi Zhao
- Genomics program, College of Public Health, University of South Florida, Tampa, FL, United States
- Sina Ghadermarzi
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, United States
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, United States
|
38
|
Di Ianni A, Tüting C, Kipping M, Ihling CH, Köppen J, Iacobucci C, Arlt C, Kastritis PL, Sinz A. Structural assessment of the full-length wild-type tumor suppressor protein p53 by mass spectrometry-guided computational modeling. Sci Rep 2023; 13:8497. [PMID: 37231156 DOI: 10.1038/s41598-023-35437-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/16/2023] [Accepted: 05/18/2023] [Indexed: 05/27/2023] Open
Abstract
The tetrameric tumor suppressor p53 represents a great challenge for 3D-structural analysis due to its high degree of intrinsic disorder (ca. 40%). We aim to shed light on the structural and functional roles of p53's C-terminal region in the full-length, wild-type human p53 tetramer and their importance for DNA binding. For this, we employed complementary techniques of structural mass spectrometry (MS) in an integrated approach with computational modeling. Our results show no major conformational differences in p53 between DNA-bound and DNA-free states, but reveal a substantial compaction of p53's C-terminal region. This supports the proposed mechanism of unspecific DNA binding to the C-terminal region of p53 prior to transcription initiation by specific DNA binding to the core domain of p53. The synergies between complementary structural MS techniques and computational modeling as pursued in our integrative approach are envisioned to serve as a general strategy for studying intrinsically disordered proteins (IDPs) and intrinsically disordered regions (IDRs).
Affiliation(s)
- Alessio Di Ianni
- Department of Pharmaceutical Chemistry and Bioanalytics, Institute of Pharmacy, Martin Luther University Halle-Wittenberg, Kurt-Mothes-Str. 3, 06120, Halle (Saale), Germany
- Center for Structural Mass Spectrometry, Martin Luther University Halle-Wittenberg, Kurt-Mothes-Str. 3, 06120, Halle (Saale), Germany
- Christian Tüting
- ZIK HALOmem and Institute of Biochemistry and Biotechnology, Martin Luther University Halle-Wittenberg, Kurt-Mothes-Str. 3, 06120, Halle (Saale), Germany
- Marc Kipping
- Department of Pharmaceutical Chemistry and Bioanalytics, Institute of Pharmacy, Martin Luther University Halle-Wittenberg, Kurt-Mothes-Str. 3, 06120, Halle (Saale), Germany
- Center for Structural Mass Spectrometry, Martin Luther University Halle-Wittenberg, Kurt-Mothes-Str. 3, 06120, Halle (Saale), Germany
- Christian H Ihling
- Department of Pharmaceutical Chemistry and Bioanalytics, Institute of Pharmacy, Martin Luther University Halle-Wittenberg, Kurt-Mothes-Str. 3, 06120, Halle (Saale), Germany
- Center for Structural Mass Spectrometry, Martin Luther University Halle-Wittenberg, Kurt-Mothes-Str. 3, 06120, Halle (Saale), Germany
- Janett Köppen
- Department of Pharmaceutical Chemistry and Bioanalytics, Institute of Pharmacy, Martin Luther University Halle-Wittenberg, Kurt-Mothes-Str. 3, 06120, Halle (Saale), Germany
- Center for Structural Mass Spectrometry, Martin Luther University Halle-Wittenberg, Kurt-Mothes-Str. 3, 06120, Halle (Saale), Germany
- Claudio Iacobucci
- Department of Pharmaceutical Chemistry and Bioanalytics, Institute of Pharmacy, Martin Luther University Halle-Wittenberg, Kurt-Mothes-Str. 3, 06120, Halle (Saale), Germany
- Center for Structural Mass Spectrometry, Martin Luther University Halle-Wittenberg, Kurt-Mothes-Str. 3, 06120, Halle (Saale), Germany
- Department of Physical and Chemical Sciences, University of L'Aquila, Via Vetoio, Coppito, 67100, L'Aquila, Italy
- Christian Arlt
- Department of Pharmaceutical Chemistry and Bioanalytics, Institute of Pharmacy, Martin Luther University Halle-Wittenberg, Kurt-Mothes-Str. 3, 06120, Halle (Saale), Germany
- Center for Structural Mass Spectrometry, Martin Luther University Halle-Wittenberg, Kurt-Mothes-Str. 3, 06120, Halle (Saale), Germany
- Panagiotis L Kastritis
- ZIK HALOmem and Institute of Biochemistry and Biotechnology, Martin Luther University Halle-Wittenberg, Kurt-Mothes-Str. 3, 06120, Halle (Saale), Germany
- Andrea Sinz
- Department of Pharmaceutical Chemistry and Bioanalytics, Institute of Pharmacy, Martin Luther University Halle-Wittenberg, Kurt-Mothes-Str. 3, 06120, Halle (Saale), Germany
- Center for Structural Mass Spectrometry, Martin Luther University Halle-Wittenberg, Kurt-Mothes-Str. 3, 06120, Halle (Saale), Germany
|
39
|
Basu S, Gsponer J, Kurgan L. DEPICTER2: a comprehensive webserver for intrinsic disorder and disorder function prediction. Nucleic Acids Res 2023:7151337. [PMID: 37140058 DOI: 10.1093/nar/gkad330] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 03/15/2023] [Revised: 04/12/2023] [Accepted: 04/18/2023] [Indexed: 05/05/2023] Open
Abstract
Intrinsic disorder in proteins is relatively abundant in nature and essential for a broad spectrum of cellular functions. While disorder can be accurately predicted from protein sequences, as it was empirically demonstrated in recent community-organized assessments, it is rather challenging to collect and compile a comprehensive prediction that covers multiple disorder functions. To this end, we introduce the DEPICTER2 (DisorderEd PredictIon CenTER) webserver that offers convenient access to a curated collection of fast and accurate disorder and disorder function predictors. This server includes a state-of-the-art disorder predictor, flDPnn, and five modern methods that cover all currently predictable disorder functions: disordered linkers and protein, peptide, DNA, RNA and lipid binding. DEPICTER2 allows selection of any combination of the six methods, batch predictions of up to 25 proteins per request and provides interactive visualization of the resulting predictions. The webserver is freely available at http://biomine.cs.vcu.edu/servers/DEPICTER2/.
Affiliation(s)
- Sushmita Basu
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
- Jörg Gsponer
- Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia, Canada
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
|
40
|
Estevens R, Mil-Homens D, Fialho AM. In-Silico Analysis Highlights the Existence in Members of Burkholderia cepacia Complex of a New Class of Adhesins Possessing Collagen-like Domains. Microorganisms 2023; 11:1118. [PMID: 37317093 DOI: 10.3390/microorganisms11051118] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2023] [Revised: 04/18/2023] [Accepted: 04/22/2023] [Indexed: 06/16/2023] Open
Abstract
Burkholderia cenocepacia is a multi-drug-resistant lung pathogen. This species synthesizes various virulence factors, among which cell-surface components (adhesins) are critical for establishing contact with host cells. The first part of this work focuses on the current knowledge about the adhesion molecules described in this species. In the second part, through in silico approaches, we perform a comprehensive analysis of a group of unique bacterial proteins possessing collagen-like domains (CLDs) that are strikingly overrepresented in the Burkholderia species, representing a new putative class of adhesins. We identified 75 CLD-containing proteins in Burkholderia cepacia complex (Bcc) members (Bcc-CLPs). The phylogenetic analysis of Bcc-CLPs revealed the evolution of the core domain denominated "Bacterial collagen-like, middle region". Our analysis remarkably shows that these proteins are formed by extensive sets of compositionally biased residues located within intrinsically disordered regions (IDRs). Here, we discuss how IDR functions may increase their efficiency as adhesion factors. Finally, we provide an analysis of a set of five homologs identified in B. cenocepacia J2315. Thus, we propose the existence in Bcc of a new type of adhesion factor distinct from the described collagen-like proteins (CLPs) found in Gram-positive bacteria.
Affiliation(s)
- Ricardo Estevens
- Department of Bioengineering, Instituto Superior Técnico, University of Lisbon, Av. Rovisco Pais, 1049-001 Lisbon, Portugal
- Dalila Mil-Homens
- Department of Bioengineering, Instituto Superior Técnico, University of Lisbon, Av. Rovisco Pais, 1049-001 Lisbon, Portugal
- Institute for Bioengineering and Biosciences (iBB), Instituto Superior Técnico, University of Lisbon, Av. Rovisco Pais, 1049-001 Lisbon, Portugal
- Institute for Health and Bioeconomy (i4HB), Instituto Superior Técnico, University of Lisbon, Av. Rovisco Pais, 1049-001 Lisbon, Portugal
- Arsenio M Fialho
- Department of Bioengineering, Instituto Superior Técnico, University of Lisbon, Av. Rovisco Pais, 1049-001 Lisbon, Portugal
- Institute for Bioengineering and Biosciences (iBB), Instituto Superior Técnico, University of Lisbon, Av. Rovisco Pais, 1049-001 Lisbon, Portugal
- Institute for Health and Bioeconomy (i4HB), Instituto Superior Técnico, University of Lisbon, Av. Rovisco Pais, 1049-001 Lisbon, Portugal
|
41
|
Bruley A, Bitard-Feildel T, Callebaut I, Duprat E. A sequence-based foldability score combined with AlphaFold2 predictions to disentangle the protein order/disorder continuum. Proteins 2023; 91:466-484. [PMID: 36306150 DOI: 10.1002/prot.26441] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 08/01/2022] [Revised: 10/14/2022] [Accepted: 10/18/2022] [Indexed: 11/11/2022]
Abstract
Order and disorder govern protein functions, but there is a great diversity in disorder, from regions that are, and stay, fully disordered to conditional order. This diversity is still difficult to decipher even though it is encoded in the amino acid sequences. Here, we developed an analytic Python package, named pyHCA, to estimate the foldability of a protein segment from its amino acid sequence alone, based on a measure of its density in regular secondary structures associated with hydrophobic clusters, as defined by the hydrophobic cluster analysis (HCA) approach. The tool was designed by optimizing the separation between foldable segments from databases of disorder (DisProt) and order (SCOPe [soluble domains] and OPM [transmembrane domains]). It allows the user to specify the ratio between order, embodied by regular secondary structures (either participating in the hydrophobic core of well-folded 3D structures or conditionally formed in intrinsically disordered regions), and disorder. We illustrated the relevance of pyHCA with several examples and applied it to the sequences of the proteomes of 21 species ranging from prokaryotes and archaea to unicellular and multicellular eukaryotes, for which structure models are provided in the AlphaFold protein structure database. Cases of low-confidence scores related to disorder were distinguished from those of sequences that we identified as foldable but are still excluded from accurate modeling by AlphaFold2 due to a lack of sequence homologs or to compositional biases. Overall, our approach is complementary to AlphaFold2, providing guides to map structural innovations through evolutionary processes, at proteome and gene scales.
Affiliation(s)
- Apolline Bruley
- Sorbonne Université, Muséum National d'Histoire Naturelle, UMR CNRS 7590, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, Paris, France
- Tristan Bitard-Feildel
- Sorbonne Université, Muséum National d'Histoire Naturelle, UMR CNRS 7590, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, Paris, France
- Isabelle Callebaut
- Sorbonne Université, Muséum National d'Histoire Naturelle, UMR CNRS 7590, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, Paris, France
- Elodie Duprat
- Sorbonne Université, Muséum National d'Histoire Naturelle, UMR CNRS 7590, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, Paris, France
|
42
|
Kovács D, Bodor A. The influence of random-coil chemical shifts on the assessment of structural propensities in folded proteins and IDPs. RSC Adv 2023; 13:10182-10203. [PMID: 37006359 PMCID: PMC10065145 DOI: 10.1039/d3ra00977g] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/13/2023] [Accepted: 03/15/2023] [Indexed: 04/03/2023] Open
Abstract
In studying secondary structural propensities of proteins by nuclear magnetic resonance (NMR) spectroscopy, secondary chemical shifts (SCSs) serve as the primary atomic scale observables. For SCS calculation, the selection of an appropriate random coil chemical shift (RCCS) dataset is a crucial step, especially when investigating intrinsically disordered proteins (IDPs). The scientific literature abounds in such datasets; however, the effect of choosing one over the others in a concrete application has not yet been studied thoroughly and systematically. Here, we review the available RCCS prediction methods and, to compare them, conduct statistical inference by means of the nonparametric sum of ranking differences and comparison of ranks to random numbers (SRD-CRRN) method. We try to find the RCCS predictors best representing the general consensus regarding secondary structural propensities. The existence and magnitude of the resulting differences in secondary structure determination under varying sample conditions (temperature, pH) are demonstrated and discussed for globular proteins and especially IDPs.
Affiliation(s)
- Dániel Kovács
- ELTE, Eötvös Loránd University, Institute of Chemistry, Analytical and BioNMR Laboratory Pázmány Péter sétány 1/A Budapest 1117 Hungary
- Eötvös Loránd University, Hevesy György PhD School of Chemistry Pázmány Péter sétány 1/A Budapest 1117 Hungary
- Andrea Bodor
- ELTE, Eötvös Loránd University, Institute of Chemistry, Analytical and BioNMR Laboratory Pázmány Péter sétány 1/A Budapest 1117 Hungary
|
43
|
Zhang F, Li M, Zhang J, Kurgan L. HybridRNAbind: prediction of RNA interacting residues across structure-annotated and disorder-annotated proteins. Nucleic Acids Res 2023; 51:e25. [PMID: 36629262 PMCID: PMC10018345 DOI: 10.1093/nar/gkac1253] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/14/2022] [Revised: 11/22/2022] [Accepted: 12/15/2022] [Indexed: 01/12/2023] Open
Abstract
The sequence-based predictors of RNA-binding residues (RBRs) are trained on either structure-annotated or disorder-annotated binding regions. A recent study of predictors of protein-binding residues shows that they are plagued by high levels of cross-predictions (protein-binding residues are predicted as nucleic acid-binding) and that structure-trained predictors perform poorly for the disorder-annotated regions and vice versa. Consequently, we analyze a representative set of the structure-trained and disorder-trained predictors of RBRs to comprehensively assess the quality of their predictions. Our empirical analysis, which relies on a new, low-similarity benchmark dataset, reveals that the structure-trained predictors of RBRs perform well for the structure-annotated proteins while the disorder-trained predictors provide accurate results for the disorder-annotated proteins. However, these methods work only modestly well on the opposite types of annotations, motivating the need for new solutions. Using an empirical approach, we design the HybridRNAbind meta-model, which generates accurate predictions and low amounts of cross-predictions when tested on data that combines structure-annotated and disorder-annotated RBRs. We release this meta-model as a convenient webserver, which is available at https://www.csuligroup.com/hybridRNAbind/.
Affiliation(s)
- Fuhao Zhang
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Min Li
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Jian Zhang
- School of Computer and Information Technology, Xinyang Normal University, Xinyang 464000, China
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
|
44
|
Computational prediction of disordered binding regions. Comput Struct Biotechnol J 2023; 21:1487-1497. [PMID: 36851914 PMCID: PMC9957716 DOI: 10.1016/j.csbj.2023.02.018] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 10/26/2022] [Revised: 02/08/2023] [Accepted: 02/08/2023] [Indexed: 02/12/2023] Open
Abstract
One of the key features of intrinsically disordered regions (IDRs) is their ability to interact with a broad range of partner molecules. Multiple types of interacting IDRs have been identified, including molecular recognition fragments (MoRFs), short linear sequence motifs (SLiMs), and protein-, nucleic acid- and lipid-binding regions. Prediction of binding IDRs in protein sequences has gained momentum in recent years. We survey 38 predictors of binding IDRs that target interactions with a diverse set of partners, such as peptides, proteins, RNA, DNA and lipids. We offer a historical perspective and highlight key events that fueled efforts to develop these methods. These tools rely on a diverse range of predictive architectures that include scoring functions, regular expressions, traditional and deep machine learning and meta-models. Recent efforts focus on the development of deep neural network-based architectures and extending coverage to RNA, DNA and lipid-binding IDRs. We analyze the availability of these methods and show that providing implementations and webservers results in much higher rates of citations/use. We also make several recommendations to take advantage of modern deep network architectures, develop tools that bundle predictions of multiple and different types of binding IDRs, and work on algorithms that model structures of the resulting complexes.
|
45
|
Kouros CE, Makri V, Ouzounis CA, Chasapi A. Disease association and comparative genomics of compositional bias in human proteins. F1000Res 2023; 12:198. [PMID: 37082000 PMCID: PMC10111144 DOI: 10.12688/f1000research.129929.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 02/02/2023] [Indexed: 02/22/2023] Open
Abstract
Background: The evolutionary rate of disordered proteins varies greatly due to the lack of structural constraints. So far, few studies have investigated the presence/absence patterns of intrinsically disordered regions (IDRs) across phylogenies in conjunction with human disease. In this study, we report a genome-wide analysis of compositional bias association with disease in human proteins and their taxonomic distribution. Methods: The human genome protein set provided by the Ensembl database was annotated and analysed with respect to both disease associations and the detection of compositional bias. The Uniprot Reference Proteome dataset, containing 11297 proteomes, was used as the target dataset for the comparative genomics of a well-defined subset of the Human Genome, including 100 characteristic, compositionally biased proteins, some linked to disease. Results: Cross-evaluation of compositional bias and disease-association in the human genome reveals a significant bias towards low complexity regions in disease-associated genes, with charged, hydrophilic amino acids appearing as over-represented. The phylogenetic profiling of 17 disease-associated, low complexity proteins across 11297 proteomes captures characteristic taxonomic distribution patterns. Conclusions: This is the first time that a combined genome-wide analysis of low complexity, disease-association and taxonomic distribution of human proteins is reported, covering structural, functional, and evolutionary properties. The reported framework can form the basis for large-scale, follow-up projects, encompassing the entire human genome and all known gene-disease associations.
Affiliation(s)
- Christos E. Kouros
- BCCB-AIIA, School of Informatics, Aristotle University of Thessaloniki, Thessaloniki, Greece
- Vasiliki Makri
- BCCB-AIIA, School of Informatics, Aristotle University of Thessaloniki, Thessaloniki, Greece
- Christos A. Ouzounis
- BCCB-AIIA, School of Informatics, Aristotle University of Thessaloniki, Thessaloniki, Greece
- BCPL, Chemical Process & Energy Resources Institute, Centre for Research & Technology Hellas (CERTH), Thessaloniki, Greece
- Anastasia Chasapi
- BCPL, Chemical Process & Energy Resources Institute, Centre for Research & Technology Hellas (CERTH), Thessaloniki, Greece
|
46
|
Kouros CE, Makri V, Ouzounis CA, Chasapi A. Disease association and comparative genomics of compositional bias in human proteins. F1000Res 2023; 12:198. [PMID: 37082000 PMCID: PMC10111144 DOI: 10.12688/f1000research.129929.2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 04/12/2023] [Indexed: 04/25/2023] Open
Abstract
Background: The evolutionary rate of disordered protein regions varies greatly due to the lack of structural constraints. So far, few studies have investigated the presence/absence patterns of compositional bias, indicative of disorder, across phylogenies in conjunction with human disease. In this study, we report a genome-wide analysis of compositional bias association with disease in human proteins and their taxonomic distribution. Methods: The human genome protein set provided by the Ensembl database was annotated and analysed with respect to both disease associations and the detection of compositional bias. The Uniprot Reference Proteome dataset, containing 11297 proteomes, was used as the target dataset for the comparative genomics of a well-defined subset of the Human Genome, including 100 characteristic, compositionally biased proteins, some linked to disease. Results: Cross-evaluation of compositional bias and disease-association in the human genome reveals a significant bias towards biased regions in disease-associated genes, with charged, hydrophilic amino acids appearing as over-represented. The phylogenetic profiling of 17 disease-associated proteins with compositional bias across 11297 proteomes captures characteristic taxonomic distribution patterns. Conclusions: This is the first time that a combined genome-wide analysis of compositional bias, disease-association and taxonomic distribution of human proteins is reported, covering structural, functional, and evolutionary properties. The reported framework can form the basis for large-scale, follow-up projects, encompassing the entire human genome and all known gene-disease associations.
Affiliation(s)
- Christos E. Kouros
- BCCB-AIIA, School of Informatics, Aristotle University of Thessaloniki, Thessaloniki, Greece
- Vasiliki Makri
- BCCB-AIIA, School of Informatics, Aristotle University of Thessaloniki, Thessaloniki, Greece
- Christos A. Ouzounis
- BCCB-AIIA, School of Informatics, Aristotle University of Thessaloniki, Thessaloniki, Greece
- BCPL, Chemical Process & Energy Resources Institute, Centre for Research & Technology Hellas (CERTH), Thessaloniki, Greece
- Anastasia Chasapi
- BCPL, Chemical Process & Energy Resources Institute, Centre for Research & Technology Hellas (CERTH), Thessaloniki, Greece
|
47
|
Han B, Ren C, Wang W, Li J, Gong X. Computational Prediction of Protein Intrinsically Disordered Region Related Interactions and Functions. Genes (Basel) 2023; 14:432. [PMID: 36833360 PMCID: PMC9956190 DOI: 10.3390/genes14020432] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/31/2022] [Revised: 02/02/2023] [Accepted: 02/05/2023] [Indexed: 02/11/2023] Open
Abstract
Intrinsically Disordered Proteins (IDPs) and Regions (IDRs) exist widely. Although they lack well-defined structures, they participate in many important biological processes. They are also widely related to human diseases and have become potential targets in drug discovery. However, there is a big gap between the experimental annotations related to IDPs/IDRs and their actual number. In recent decades, computational methods related to IDPs/IDRs have been developed vigorously for different tasks, including predicting IDPs/IDRs, the binding modes of IDPs/IDRs, the binding sites of IDPs/IDRs, and the molecular functions of IDPs/IDRs. In view of the correlation between these predictors, we have reviewed these prediction methods uniformly for the first time, summarized their computational methods and predictive performance, and discussed some problems and perspectives.
Affiliation(s)
- Bingqing Han
- Mathematical Intelligence Application Lab, Institute for Mathematical Sciences, Renmin University of China, Beijing 100872, China
- Chongjiao Ren
- Mathematical Intelligence Application Lab, Institute for Mathematical Sciences, Renmin University of China, Beijing 100872, China
- Wenda Wang
- Mathematical Intelligence Application Lab, Institute for Mathematical Sciences, Renmin University of China, Beijing 100872, China
- Jiashan Li
- Mathematical Intelligence Application Lab, Institute for Mathematical Sciences, Renmin University of China, Beijing 100872, China
- Xinqi Gong
- Mathematical Intelligence Application Lab, Institute for Mathematical Sciences, Renmin University of China, Beijing 100872, China
- Beijing Academy of Intelligence, Beijing 100083, China
|
48
|
Peng Z, Li Z, Meng Q, Zhao B, Kurgan L. CLIP: accurate prediction of disordered linear interacting peptides from protein sequences using co-evolutionary information. Brief Bioinform 2023; 24:6858950. [PMID: 36458437 DOI: 10.1093/bib/bbac502] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 06/14/2022] [Revised: 09/30/2022] [Accepted: 10/24/2022] [Indexed: 12/04/2022] Open
Abstract
One of the key features of intrinsically disordered regions (IDRs) is the facilitation of protein-protein and protein-nucleic acid interactions. These disordered binding regions include molecular recognition features (MoRFs), short linear motifs (SLiMs) and longer binding domains. The vast majority of current predictors of disordered binding regions target MoRFs, with a handful of methods that predict SLiMs and disordered protein-binding domains. A new and broader class of disordered binding regions, linear interacting peptides (LIPs), was introduced recently and applied in the MobiDB resource. LIPs are segments in protein sequences that undergo a disorder-to-order transition upon binding to a protein or a nucleic acid, and they cover MoRFs, SLiMs and disordered protein-binding domains. Although current predictors of MoRFs and disordered protein-binding regions could be used to identify some LIPs, there are no dedicated sequence-based predictors of LIPs. To this end, we introduce CLIP, a new predictor of LIPs that utilizes a robust logistic regression model to combine three complementary types of inputs: co-evolutionary information derived from multiple sequence alignments, physicochemical profiles and disorder predictions. Ablation analysis suggests that the co-evolutionary information is particularly useful for this prediction and that combining the three inputs provides substantial improvements when compared to using these inputs individually. Comparative empirical assessments using low-similarity test datasets reveal that CLIP secures an area under the receiver operating characteristic curve (AUC) of 0.8 and substantially improves over the results produced by the closest current tools that predict MoRFs and disordered protein-binding regions. The webserver of CLIP is freely available at http://biomine.cs.vcu.edu/servers/CLIP/ and the standalone code can be downloaded from http://yanglab.qd.sdu.edu.cn/download/CLIP/.
Affiliation(s)
- Zhenling Peng
- Research Center for Mathematics and Interdisciplinary Sciences, Shandong University, Qingdao, 266237, China; Frontier Science Center for Nonlinear Expectations, Ministry of Education, Qingdao, 266237, China
- Zixia Li
- Center for Applied Mathematics, Tianjin University, Tianjin, 300072, China
- Qiaozhen Meng
- College of Intelligence and Computing, Tianjin University, Tianjin, 300072, China
- Bi Zhao
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
|
49
|
Evans R, Ramisetty S, Kulkarni P, Weninger K. Illuminating Intrinsically Disordered Proteins with Integrative Structural Biology. Biomolecules 2023; 13:124. [PMID: 36671509 PMCID: PMC9856150 DOI: 10.3390/biom13010124] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 12/01/2022] [Revised: 01/01/2023] [Accepted: 01/04/2023] [Indexed: 01/11/2023] Open
Abstract
Intense study of intrinsically disordered proteins (IDPs) did not begin in earnest until the late 1990s when a few groups, working independently, convinced the community that these 'weird' proteins could have important functions. Over the past two decades, it has become clear that IDPs play critical roles in a multitude of biological phenomena with prominent examples including coordination in signaling hubs, enabling gene regulation, and regulating ion channels, just to name a few. One contributing factor that delayed appreciation of IDP functional significance is the experimental difficulty in characterizing their dynamic conformations. The combined application of multiple methods, termed integrative structural biology, has emerged as an essential approach to understanding IDP phenomena. Here, we review some of the recent applications of the integrative structural biology philosophy to study IDPs.
Affiliation(s)
- Rachel Evans
- Department of Physics, North Carolina State University, Raleigh, NC 27695, USA
- Sravani Ramisetty
- Department of Medical Oncology and Therapeutics Research, City of Hope National Medical Center, Duarte, CA 91010, USA
- Prakash Kulkarni
- Department of Medical Oncology and Therapeutics Research, City of Hope National Medical Center, Duarte, CA 91010, USA
- Department of Systems Biology, City of Hope National Medical Center, Duarte, CA 91010, USA
- Keith Weninger
- Department of Physics, North Carolina State University, Raleigh, NC 27695, USA
|
50
|
Molteni C, Forni D, Cagliani R, Mozzi A, Clerici M, Sironi M. Evolution of the orthopoxvirus core genome. Virus Res 2023; 323:198975. [PMID: 36280003 PMCID: PMC9586335 DOI: 10.1016/j.virusres.2022.198975] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2022] [Revised: 10/17/2022] [Accepted: 10/18/2022] [Indexed: 11/07/2022]
Abstract
Orthopoxviruses comprise several relevant pathogens, including the causative agent of smallpox and monkeypox virus. Analysis of orthopoxvirus genome evolution has mainly focused on gene gains/losses. We instead analyzed core genes, which are conserved in all orthopoxviruses. We show that, despite their strong constraint, some genes involved in viral morphogenesis and transcription/replication were targets of pervasive positive selection, which was relatively uncommon in immunomodulatory genes. However, at least three of the positively selected genes, E3L, A24R, and H3L, might have evolved in response to immune selection. Episodic positive selection was particularly common on the internal branches of the orthopox phylogeny and on the monkeypox virus lineage. The latter showed evidence of episodic positive selection at the D14L gene, which encodes a modulator of complement activation (MOPICE). Notably, two genes (B1R and A33R) targeted by episodic selection on more than one branch are involved in forms of intra-genomic conflict. Finally, we found that, in orthopoxvirus proteomes, intrinsically disordered regions (IDRs) tend to be less constrained and are common targets of positive selection. Extension of our analysis to all poxviruses showed no evidence that the IDR fraction differs with host range. Conversely, we found a strong effect of base composition, which was, however, not sufficient to explain IDR fraction. We thus suggest that, in poxviruses, the IDR fraction is maintained by modulating GC content to accommodate disorder-promoting codons. Overall, our data provide novel insight into orthopoxvirus evolution and provide a list of genes and sites that are expected to modulate viral phenotypes.
Affiliation(s)
- Cristian Molteni
- Scientific Institute IRCCS E. MEDEA, Bioinformatics, Bosisio Parini, Italy
- Diego Forni
- Scientific Institute IRCCS E. MEDEA, Bioinformatics, Bosisio Parini, Italy
- Rachele Cagliani
- Scientific Institute IRCCS E. MEDEA, Bioinformatics, Bosisio Parini, Italy
- Alessandra Mozzi
- Scientific Institute IRCCS E. MEDEA, Bioinformatics, Bosisio Parini, Italy
- Mario Clerici
- University of Milan, Milan, Italy; Don C. Gnocchi Foundation ONLUS, IRCCS, Milan, Italy
- Manuela Sironi
- Scientific Institute IRCCS E. MEDEA, Bioinformatics, Bosisio Parini, Italy
|