1. Qin C, Wang YL, Zheng J, Wan XB, Fan XJ. Current perspectives in drug targeting intrinsically disordered proteins and biomolecular condensates. BMC Biol 2025; 23:118. [PMID: 40325419; PMCID: PMC12054275; DOI: 10.1186/s12915-025-02214-x] [Received: 07/19/2024] [Accepted: 04/14/2025] [Indexed: 05/07/2025] Open
Abstract
Intrinsically disordered proteins (IDPs) and biomolecular condensates are critical for cellular processes and physiological functions. Abnormal biomolecular condensates can cause diseases such as cancer and neurodegenerative disorders. IDPs, including intrinsically disordered regions (IDRs), were previously considered undruggable due to their lack of stable binding pockets. However, recent evidence indicates that targeting them can influence cellular processes. This review explores current strategies to target IDPs and biomolecular condensates, potential improvements, and the challenges and opportunities in this evolving field.
Affiliation(s)
- Caolitao Qin
  - Department of Radiation Oncology, The Sixth Affiliated Hospital, Sun Yat-Sen University, Guangzhou, Guangdong, 510655, People's Republic of China
  - GuangDong Provincial Key Laboratory of Colorectal and Pelvic Floor Diseases, The Sixth Affiliated Hospital, Sun Yat-Sen University, Guangzhou, Guangdong, 510655, People's Republic of China
- Yun-Long Wang
  - Department of Radiation Oncology, The Sixth Affiliated Hospital, Sun Yat-Sen University, Guangzhou, Guangdong, 510655, People's Republic of China
  - GuangDong Provincial Key Laboratory of Colorectal and Pelvic Floor Diseases, The Sixth Affiliated Hospital, Sun Yat-Sen University, Guangzhou, Guangdong, 510655, People's Republic of China
- Jian Zheng
  - Department of Radiation Oncology, The Sixth Affiliated Hospital, Sun Yat-Sen University, Guangzhou, Guangdong, 510655, People's Republic of China
  - GuangDong Provincial Key Laboratory of Colorectal and Pelvic Floor Diseases, The Sixth Affiliated Hospital, Sun Yat-Sen University, Guangzhou, Guangdong, 510655, People's Republic of China
- Xiang-Bo Wan
  - Department of Radiation Oncology, The Sixth Affiliated Hospital, Sun Yat-Sen University, Guangzhou, Guangdong, 510655, People's Republic of China
  - Provincial Key Laboratory of Radiation Medicine in Henan, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan, 450052, People's Republic of China
- Xin-Juan Fan
  - Department of Pathology, The Sixth Affiliated Hospital, Sun Yat-Sen University, Guangzhou, Guangdong, 510655, People's Republic of China
  - GuangDong Provincial Key Laboratory of Colorectal and Pelvic Floor Diseases, The Sixth Affiliated Hospital, Sun Yat-Sen University, Guangzhou, Guangdong, 510655, People's Republic of China
2. Han KS, Kim HK, Kim MH, Pak MH, Pak SJ, Choe MM, Kim CS. PredIDR2: Improving accuracy of protein intrinsic disorder prediction by updating deep convolutional neural network and supplementing DisProt data. Int J Biol Macromol 2025; 306:141801. [PMID: 40054813; DOI: 10.1016/j.ijbiomac.2025.141801] [Received: 12/06/2024] [Revised: 03/03/2025] [Accepted: 03/04/2025] [Indexed: 05/11/2025]
Abstract
Intrinsically disordered proteins (IDPs) or regions (IDRs) are widespread in proteomes, involved in several important biological processes, and implicated in many diseases. Many computational methods for IDR prediction are being developed to narrow the gap between the slow experimental annotation of proteins and the rapid increase in non-annotated proteins, and their performance is blindly tested by the community-driven experiment, the Critical Assessment of protein Intrinsic Disorder (CAID). In this paper, we developed the PredIDR2 series, an updated version of PredIDR that was tested in CAID2, in order to accurately predict intrinsically disordered regions from protein sequence. It includes four methods that differ in the input features and in how the negative samples of the training set are produced. The PredIDR2 series (AUC_ROC = 0.952) performs remarkably better than our previous PredIDR (AUC_ROC = 0.933) on the Disorder-PDB dataset of CAID2, which seems to be mainly attributable to the introduction of a new deep convolutional neural network and the augmentation of the training data, especially from the DisProt database. The PredIDR2 series outperforms the state-of-the-art IDR prediction methods that participated in CAID2 in terms of AUC_ROC, AUC_PR and DC_mae and is among the seven top-performing methods in terms of MCC. The PredIDR2 series can be freely used through the CAID Prediction Portal available at https://caid.idpcentral.org/portal or downloaded as a Singularity container from https://biocomputingup.it/shared/caid-predictors/.
Affiliation(s)
- Kun-Sop Han
  - University of Sciences, Pyongyang, Democratic People's Republic of Korea
- Ha-Kyong Kim
  - Branch of Biotechnology, State Academy of Sciences, Pyongyang, Democratic People's Republic of Korea
- Myong-Hyok Kim
  - University of Sciences, Pyongyang, Democratic People's Republic of Korea
- Myong-Hyon Pak
  - University of Sciences, Pyongyang, Democratic People's Republic of Korea
- Song-Jin Pak
  - University of Sciences, Pyongyang, Democratic People's Republic of Korea
- Mun-Myong Choe
  - University of Science and Technology, Pyongyang, Democratic People's Republic of Korea
- Chol-Song Kim
  - University of Sciences, Pyongyang, Democratic People's Republic of Korea
3. Zheng S. Navigating the unstructured by evaluating AlphaFold's efficacy in predicting missing residues and structural disorder in proteins. PLoS One 2025; 20:e0313812. [PMID: 40131945; PMCID: PMC11936262; DOI: 10.1371/journal.pone.0313812] [Received: 11/01/2024] [Accepted: 02/18/2025] [Indexed: 03/27/2025] Open
Abstract
The study investigated regions with undefined structures, known as "missing" segments in X-ray crystallography and cryo-electron microscopy (Cryo-EM) data, by assessing their predicted structural confidence and disorder scores. Utilizing a comprehensive dataset from the Protein Data Bank (PDB), residues were categorized as "modeled", "hard missing" and "soft missing" based on their visibility in structural datasets. Key features were determined, including a confidence score predicted local distance difference test (pLDDT) from AlphaFold2, an advanced structural prediction tool, and a disorder score from IUPred, a traditional disorder prediction method. To enhance prediction performance for unstructured residues, we employed a Long Short-Term Memory (LSTM) model, integrating both scores with amino acid sequences. Notable patterns such as composition, region lengths and prediction scores were observed in unstructured residues and regions identified through structural experiments over our studied period. Our findings also indicate that "hard missing" residues often align with low confidence scores, whereas "soft missing" residues exhibit dynamic behavior that can complicate predictions. The incorporation of pLDDT, IUPred scores, and sequence data into the LSTM model has improved the differentiation between structured and unstructured residues, particularly for shorter unstructured regions. This research elucidates the relationship between established computational predictions and experimental structural data, enhancing our ability to target structurally significant areas for research and guiding experimental designs toward functionally relevant regions.
Affiliation(s)
- Sen Zheng
  - Bio-Electron Microscopy Facility, iHuman Institution, ShanghaiTech University, Shanghai, China
4. Samulevich ML, Carman LE, Shamilov R, Aneskievich BJ. Conformational Analyses of the AHD1-UBAN Region of TNIP1 Highlight Key Amino Acids for Interaction with Ubiquitin. Biomolecules 2025; 15:453. [PMID: 40149990; PMCID: PMC11940065; DOI: 10.3390/biom15030453] [Received: 01/31/2025] [Revised: 03/10/2025] [Accepted: 03/14/2025] [Indexed: 03/29/2025] Open
Abstract
Tumor necrosis factor α (TNFα)-induced protein 3 (TNFAIP3)-interacting protein 1 (TNIP1) is genetically and functionally linked to limiting auto-immune and inflammatory responses. We have shown that TNIP1 (alias A20-binding inhibitor of NF-κB 1, ABIN1), functioning as a hub location to coordinate other proteins in repressing inflammatory signaling, aligns with biophysical traits indicative of its being an intrinsically disordered protein (IDP). IDPs move through a repertoire of three-dimensional structures rather than being in one set conformation. Here we employed bioinformatic analysis and biophysical interventions via amino acid mutations to assess and alter, respectively, conformational flexibility along a crucial region of TNIP1, encompassing the ABIN homology domain 1 and ubiquitin-binding domain in ABIN proteins and NEMO (AHD1-UBAN), by purposeful replacement of key residues. In vitro secondary structure measurements were mostly in line with, but not necessarily to the same degree as, expected results from in silico assessments. Notably, changes in single amino acids outside of the ubiquitin-binding region for gain-of-order effects had consequences along the length of the AHD1-UBAN propagating to that region. This is evidenced by differences in recognition of the partner protein polyubiquitin ≥ 28 residues away, depending on the mutation site, from the previously identified key binding site. These findings serve to demonstrate the role of conformational flexibility in protein partner recognition by TNIP1, thus identifying key amino acids likely to impact the molecular dynamics involved in TNIP1 repression of inflammatory signaling at large.
Affiliation(s)
- Michael L. Samulevich
  - Graduate Program in Pharmacology & Toxicology, University of Connecticut, Storrs, CT 06269-3092, USA
- Liam E. Carman
  - Graduate Program in Pharmacology & Toxicology, University of Connecticut, Storrs, CT 06269-3092, USA
- Rambon Shamilov
  - Graduate Program in Pharmacology & Toxicology, University of Connecticut, Storrs, CT 06269-3092, USA
- Brian J. Aneskievich
  - Department of Pharmaceutical Sciences, School of Pharmacy, University of Connecticut, Storrs, CT 06269-3092, USA
5. Erdős G, Deutsch N, Dosztányi Z. AIUPred - Binding: Energy Embedding to Identify Disordered Binding Regions. J Mol Biol 2025:169071. [PMID: 40133781; DOI: 10.1016/j.jmb.2025.169071] [Received: 11/29/2024] [Revised: 02/25/2025] [Accepted: 03/03/2025] [Indexed: 03/27/2025]
Abstract
Intrinsically disordered regions (IDRs) play critical roles in various cellular processes, often mediating interactions through disordered binding regions that transition to ordered states. Experimental characterization of these functional regions is highly challenging, underscoring the need for fast and accurate computational tools. Despite their importance, predicting disordered binding regions remains a significant challenge due to limitations in existing datasets and methodologies. In this study, we introduce AIUPred-binding, a novel prediction tool leveraging a high dimensional mathematical representation of structural energies - we call energy embedding - and pathogenicity scores from AlphaMissense. By employing a transfer learning approach, AIUPred-binding demonstrates improved accuracy in identifying functional sites within IDRs. Our results highlight the tool's ability to discern subtle features within disordered regions, addressing biases and other challenges associated with manually curated datasets. We present AIUPred-binding integrated into the AIUPred web framework as a versatile and efficient resource for understanding the functional roles of IDRs. AIUPred-binding is freely accessible at https://aiupred.elte.hu.
Affiliation(s)
- Gábor Erdős
  - Department of Biochemistry, Eötvös Loránd University, Pázmány Péter stny 1/c, Budapest H-1117, Hungary
- Norbert Deutsch
  - Department of Biochemistry, Eötvös Loránd University, Pázmány Péter stny 1/c, Budapest H-1117, Hungary
- Zsuzsanna Dosztányi
  - Department of Biochemistry, Eötvös Loránd University, Pázmány Péter stny 1/c, Budapest H-1117, Hungary
6. Chaurasiya D, Mondal R, Lahiri T, Tripathi A, Ghinmine T. IDPpred: a new sequence-based predictor for identification of intrinsically disordered protein with enhanced accuracy. J Biomol Struct Dyn 2025; 43:957-965. [PMID: 38079339; DOI: 10.1080/07391102.2023.2290615] [Received: 07/27/2023] [Accepted: 11/15/2023] [Indexed: 01/01/2025]
Abstract
Discovery of intrinsically disordered proteins (IDPs) and protein hybrids that contain both intrinsically disordered protein regions (IDPRs) and ordered regions has changed the sequence-structure-function paradigm of proteins. These proteins, which lack a persistently fixed structure, are found in all organisms and play vital roles in various biological processes. Some of them are considered potential drug targets due to their overrepresentation in pathophysiological processes. The major bottlenecks for characterizing such proteins are their occasional overexpression, the difficulty of obtaining them in purified homogeneous form, and the challenge of investigating them experimentally. Sequence-based prediction of intrinsic disorder remains a useful strategy, especially for many large-scale proteomic investigations. However, the worst accuracy still occurs for short disordered regions with fewer than ten residues, for residues close to order-disorder boundaries, for regions that undergo coupled folding and binding in the presence of a partner, and for prediction of fully disordered proteins. Annotation of fully disordered proteins mostly relies on the far-UV circular dichroism experiment, which gives overall secondary structure composition without residue-level resolution. Current methods, including those using secondary structure information, failed to predict half of the target IDPs correctly in the recent Critical Assessment of protein Intrinsic Disorder prediction (CAID) experiment. This study utilized profiles of random sequential appearance of physicochemical properties of amino acids and random sequential appearance of order- and disorder-promoting amino acids in proteins, together with the existing CIDER feature, for the prediction of IDPs from sequence input. Our method was found to significantly outperform the existing predictors across different datasets. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Deepak Chaurasiya
  - Department of Applied Sciences, Indian Institute of Information Technology, Prayagraj, UP, India
- Rajkrishna Mondal
  - Department of Biotechnology, Nagaland University, Dimapur, Nagaland, India
- Tapobrata Lahiri
  - Department of Applied Sciences, Indian Institute of Information Technology, Prayagraj, UP, India
- Asmita Tripathi
  - Department of Applied Sciences, Indian Institute of Information Technology, Prayagraj, UP, India
- Tejas Ghinmine
  - Department of Applied Sciences, Indian Institute of Information Technology, Prayagraj, UP, India
7. Han KS, Song SR, Pak MH, Kim CS, Ri CP, Del Conte A, Piovesan D. PredIDR: Accurate prediction of protein intrinsic disorder regions using deep convolutional neural network. Int J Biol Macromol 2025; 284:137665. [PMID: 39571839; DOI: 10.1016/j.ijbiomac.2024.137665] [Received: 09/10/2024] [Revised: 10/29/2024] [Accepted: 11/13/2024] [Indexed: 12/02/2024]
Abstract
The involvement of protein intrinsic disorder in essential biological processes is well known in structural biology. However, experimental methods for detecting intrinsic structural disorder and directly measuring the highly dynamic behavior of protein structure are limited. To address this issue, several computational methods to predict intrinsic disorder from protein sequences have been developed, and their performance is evaluated by the Critical Assessment of protein Intrinsic Disorder (CAID). In this paper, we describe a new computational method, PredIDR, which provides accurate prediction of intrinsically disordered regions in proteins, mimicking experimental X-ray missing residues. Indeed, missing residues in the Protein Data Bank (PDB) were used as positive examples to train a deep convolutional neural network which produces two types of output for short and long regions. PredIDR took part in the second round of CAID and was as accurate as the top state-of-the-art IDR prediction methods. PredIDR can be freely used through the CAID Prediction Portal available at https://caid.idpcentral.org/portal or downloaded as a Singularity container from https://biocomputingup.it/shared/caid-predictors/.
Affiliation(s)
- Kun-Sop Han
  - University of Sciences, Pyongyang, Democratic People's Republic of Korea
- Se-Ryong Song
  - Branch of Biotechnology, State Academy of Sciences, Pyongyang, Democratic People's Republic of Korea
- Myong-Hyon Pak
  - University of Sciences, Pyongyang, Democratic People's Republic of Korea
- Chol-Song Kim
  - University of Sciences, Pyongyang, Democratic People's Republic of Korea
- Chol-Pyok Ri
  - University of Sciences, Pyongyang, Democratic People's Republic of Korea
- Alessio Del Conte
  - Department of Biomedical Sciences, University of Padova, Padova, Italy
- Damiano Piovesan
  - Department of Biomedical Sciences, University of Padova, Padova, Italy
8. Song J, Kurgan L. Two decades of advances in sequence-based prediction of MoRFs, disorder-to-order transitioning binding regions. Expert Rev Proteomics 2025; 22:1-9. [PMID: 39789785; DOI: 10.1080/14789450.2025.2451715] [Received: 10/31/2024] [Revised: 12/20/2024] [Accepted: 12/26/2024] [Indexed: 01/12/2025]
Abstract
INTRODUCTION: Molecular recognition features (MoRFs) are regions in protein sequences that undergo induced folding upon binding partner molecules. MoRFs are common in nature and can be predicted from sequences based on their distinctive sequence signatures. AREAS COVERED: We overview 20 years of progress in the sequence-based prediction of MoRFs which resulted in the development of 25 predictors of MoRFs that interact with proteins, peptides, and lipids. These methods range from simple discriminant analysis to sophisticated deep transformer networks that use protein language models. They generate relatively accurate predictions as evidenced by the results of a recently published community-driven assessment. EXPERT OPINION: MoRFs prediction is a mature field of research that is poised to continue at a steady pace in the foreseeable future. We anticipate further expansion of the scope of MoRF predictions to additional partner molecules, such as nucleic acids, and continued use of recent machine learning advances. Other future efforts should concentrate on improving availability of MoRF predictions by releasing, maintaining, and popularizing web servers and by depositing MoRF predictions to large databases of protein structure and function predictions. Furthermore, accurate MoRF predictions should be coupled with the equally accurate prediction and modeling of the resulting structures of complexes.
Affiliation(s)
- Jiangning Song
  - Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC, Australia
  - Monash Data Futures Institute, Monash University, Melbourne, VIC, Australia
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
9. Zhao B, Basu S, Kurgan L. DescribePROT Database of Residue-Level Protein Structure and Function Annotations. Methods Mol Biol 2025; 2867:169-184. [PMID: 39576581; DOI: 10.1007/978-1-0716-4196-5_10] [Indexed: 11/24/2024]
Abstract
DescribePROT is a freely available online database of structural and functional descriptors of proteins at the amino acid level. It provides access to 13 diverse descriptors that include sequence conservation, putative secondary structure, solvent accessibility, intrinsic disorder, and signal peptides, and putative annotations of residues that interact with proteins, peptides and nucleic acids. These data can be used to elucidate protein functions, to support efforts to develop therapeutics, and to develop and evaluate future predictors of protein structure and function. DescribePROT includes 7.8 billion predictions for 1.4 million proteins from 83 complete proteomes of popular model organisms. This information can be downloaded at multiple levels of scope (entire database, specific organisms, and individual proteins) and can be interacted with using a graphical interface that simultaneously displays data on multiple descriptors. We describe the contents of this resource, provide directions on how to use its interface, and offer instructions on how to obtain and interact with the underlying data. Moreover, we briefly discuss plans for a future expansion of this database. DescribePROT is available at http://biomine.cs.vcu.edu/servers/DESCRIBEPROT/ .
Affiliation(s)
- Bi Zhao
  - Genomics program, College of Public Health, University of South Florida, Tampa, FL, USA
- Sushmita Basu
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
10. Wang K, Hu G, Wu Z, Kurgan L. Accurate and Fast Prediction of Intrinsic Disorder Using flDPnn. Methods Mol Biol 2025; 2867:201-218. [PMID: 39576583; DOI: 10.1007/978-1-0716-4196-5_12] [Indexed: 11/24/2024]
Abstract
Intrinsically disordered proteins (IDPs) that include one or more intrinsically disordered regions (IDRs) are abundant across all domains of life and viruses and play numerous functional roles in various cellular processes. Due to a relatively low throughput and high cost of experimental techniques for identifying IDRs, there is a growing need for fast and accurate computational algorithms that accurately predict IDRs/IDPs from protein sequences. We describe one of the leading disorder predictors, flDPnn. Results from a recent community-organized Critical Assessment of Intrinsic Disorder (CAID) experiment show that flDPnn provides fast and state-of-the-art predictions of disorder, which are supplemented with the predictions of several major disorder functions. This chapter provides a practical guide to flDPnn, which includes a brief explanation of its predictive model, descriptions of its web server and standalone versions, and a case study that showcases how to read and understand flDPnn's predictions.
Affiliation(s)
- Kui Wang
  - NITFID, School of Statistics and Data Science, LPMC and KLMDASR, Nankai University, Tianjin, China
- Gang Hu
  - NITFID, School of Statistics and Data Science, LPMC and KLMDASR, Nankai University, Tianjin, China
- Zhonghua Wu
  - School of Mathematical Sciences and LPMC, Nankai University, Tianjin, China
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
11. Zhang F, Kurgan L. Evaluation of predictions of disordered binding regions in the CAID2 experiment. Comput Struct Biotechnol J 2024; 27:78-88. [PMID: 39811792; PMCID: PMC11732247; DOI: 10.1016/j.csbj.2024.12.009] [Received: 10/21/2024] [Revised: 12/12/2024] [Accepted: 12/13/2024] [Indexed: 01/16/2025] Open
Abstract
A large portion of the Intrinsically Disordered Regions (IDRs) in protein sequences interact with proteins, nucleic acids, and other types of ligands. Correspondingly, dozens of sequence-based predictors of binding IDRs were developed. A recently completed second community-based Critical Assessment of protein Intrinsic Disorder prediction (CAID2) evaluated 32 predictors of binding IDRs. However, CAID2 considered a rather narrow scenario by testing on 78 proteins with binding IDRs and not differentiating between different ligands, even though virtually all predictors target IDRs that interact with specific types of ligands. In that scenario, several intrinsic disorder predictors predict binding IDRs with accuracy equivalent to the best predictors of binding IDRs, since the large majority of IDRs in the 78 test proteins are binding. We substantially extended CAID2's evaluation by using the entire CAID2 dataset of 348 proteins and considering several arguably more practical scenarios. We assessed whether predictors accurately differentiate binding IDRs from other types of IDRs and how they perform when predicting IDRs that interact with different ligand types. We found that intrinsic disorder predictors cannot accurately identify binding IDRs among other disordered regions, that the majority of the predictors of binding IDRs are ligand-type agnostic (i.e., they cross-predict binding in IDRs that interact with ligands that they do not cover), and that only a handful of predictors of binding IDRs perform relatively well and generate reasonably low amounts of cross-predictions. We also suggest a number of future research directions that would move this active field of research forward.
Affiliation(s)
- Fuhao Zhang
  - College of Information Engineering, Northwest A & F University, China
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
12. Erdős G, Dosztányi Z. Deep learning for intrinsically disordered proteins: From improved predictions to deciphering conformational ensembles. Curr Opin Struct Biol 2024; 89:102950. [PMID: 39522439; DOI: 10.1016/j.sbi.2024.102950] [Received: 08/01/2024] [Revised: 09/19/2024] [Accepted: 10/16/2024] [Indexed: 11/16/2024]
Abstract
Intrinsically disordered proteins (IDPs) lack a stable three-dimensional structure under physiological conditions, challenging traditional structure-based prediction methods. This review explores how modern deep learning approaches, which have revolutionized structure prediction for globular proteins, have impacted protein disorder predictions. We highlight the role of community-driven efforts in curating data and assessing state-of-the-art, which have been crucial in advancing the field. We also review state-of-the-art methods utilizing deep learning techniques, highlighting innovative approaches. We also address advancements in characterizing protein conformational ensembles directly from sequence data using novel machine learning methods.
Affiliation(s)
- Gábor Erdős
  - Department of Biochemistry, Eötvös Loránd University, Pázmány Péter stny 1/c, Budapest H-1117, Hungary
- Zsuzsanna Dosztányi
  - Department of Biochemistry, Eötvös Loránd University, Pázmány Péter stny 1/c, Budapest H-1117, Hungary
13. Wang K, Hu G, Basu S, Kurgan L. flDPnn2: Accurate and Fast Predictor of Intrinsic Disorder in Proteins. J Mol Biol 2024; 436:168605. [PMID: 39237195; DOI: 10.1016/j.jmb.2024.168605] [Received: 01/03/2024] [Revised: 04/16/2024] [Accepted: 05/04/2024] [Indexed: 09/07/2024]
Abstract
Prediction of the intrinsic disorder in protein sequences is an active research area, with well over 100 predictors that were released to date. These efforts are motivated by the functional importance and high levels of abundance of intrinsic disorder, combined with relatively low amounts of experimental annotations. The disorder predictors are periodically evaluated by independent assessors in the Critical Assessment of protein Intrinsic Disorder prediction (CAID) experiments. The recently completed CAID2 experiment assessed close to 40 state-of-the-art methods demonstrating that some of them produce accurate results. In particular, flDPnn2 method, which is the successor of flDPnn that performed well in the CAID1 experiment, secured the overall most accurate results on the Disorder-NOX dataset in CAID2. flDPnn2 implements a number of improvements when compared to its predecessor including changes to the inputs, increased size of the deep network model that we retrained on a larger training set, and addition of an alignment module. Using results from CAID2, we show that flDPnn2 produces accurate predictions very quickly, modestly improving over the accuracy of flDPnn and reducing the runtime by half, to about 27 s per protein. flDPnn2 is freely available as a convenient web server at http://biomine.cs.vcu.edu/servers/flDPnn2/.
Affiliation(s)
- Kui Wang
  - NITFID, School of Statistics and Data Science, LPMC and KLMDASR, Nankai University, Tianjin, China
- Gang Hu
  - NITFID, School of Statistics and Data Science, LPMC and KLMDASR, Nankai University, Tianjin, China
- Sushmita Basu
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
14. Mouland AJ, Chau BA, Uversky VN. Methodological approaches to studying phase separation and HIV-1 replication: Current and future perspectives. Methods 2024; 229:147-155. [PMID: 39002735; DOI: 10.1016/j.ymeth.2024.07.002] [Received: 12/25/2023] [Revised: 06/26/2024] [Accepted: 07/11/2024] [Indexed: 07/15/2024] Open
Abstract
This article reviews tried-and-tested methodologies that have been employed in the first studies on phase separating properties of structural, RNA-binding and catalytic proteins of HIV-1. These are described here to stimulate interest for any who may want to initiate similar studies on virus-mediated liquid-liquid phase separation. Such studies serve to better understand the life cycle and pathogenesis of viruses and open the door to new therapeutics.
Affiliation(s)
- Andrew J Mouland
- Department of Medicine, McGill University, Montreal, Quebec, Canada.
| | - Bao-An Chau
- Department of Medicine, McGill University, Montreal, Quebec, Canada
| | - Vladimir N Uversky
- Department of Molecular Medicine and Byrd Alzheimer's Research Institute, Morsani College of Medicine, University of South Florida, Tampa, FL, USA.
| |
|
15
|
Roden CA, Gladfelter AS. Experimental Considerations for the Evaluation of Viral Biomolecular Condensates. Annu Rev Virol 2024; 11:105-124. [PMID: 39326881 DOI: 10.1146/annurev-virology-093022-010014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/28/2024]
Abstract
Biomolecular condensates are nonmembrane-bound assemblies of biological polymers such as protein and nucleic acids. An increasingly accepted paradigm across the viral tree of life is (a) that viruses form biomolecular condensates and (b) that the formation is required for the virus. Condensates can promote viral replication by promoting packaging, genome compaction, membrane bending, and co-opting of host translation. This review is primarily concerned with exploring methodologies for assessing virally encoded biomolecular condensates. The goal of this review is to provide an experimental framework for virologists to consider when designing experiments to (a) identify viral condensates and their components, (b) reconstitute condensation cell free from minimal components, (c) ask questions about what conditions lead to condensation, (d) map these questions back to the viral life cycle, and (e) design and test inhibitors/modulators of condensation as potential therapeutics. This experimental framework attempts to integrate virology, cell biology, and biochemistry approaches.
Affiliation(s)
- Christine A Roden
- Department of Cell Biology, Duke University School of Medicine, Durham, North Carolina, USA;
| | - Amy S Gladfelter
- Department of Cell Biology, Duke University School of Medicine, Durham, North Carolina, USA;
| |
|
16
|
Peck Y, Pickering D, Mobli M, Liddell MJ, Wilson DT, Ruscher R, Ryan S, Buitrago G, McHugh C, Love NC, Pinlac T, Haertlein M, Kron MA, Loukas A, Daly NL. Solution structure of the N-terminal extension domain of a Schistosoma japonicum asparaginyl-tRNA synthetase. J Biomol Struct Dyn 2024; 42:7934-7944. [PMID: 37572327 DOI: 10.1080/07391102.2023.2241918] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2023] [Accepted: 07/24/2023] [Indexed: 08/14/2023]
Abstract
Several secreted proteins from helminths (parasitic worms) have been shown to have immunomodulatory activities. Asparaginyl-tRNA synthetases are abundantly secreted in the filarial nematode Brugia malayi (BmAsnRS) and the parasitic flatworm Schistosoma japonicum (SjAsnRS), indicating a possible immune function. The suggestion is supported by BmAsnRS alleviating disease symptoms in a T-cell transfer mouse model of colitis. This immunomodulatory function is potentially related to an N-terminal extension domain present in eukaryotic AsnRS proteins but few structure/function studies have been done on this domain. Here we have determined the three-dimensional solution structure of the N-terminal extension domain of SjAsnRS. A protein containing the 114 N-terminal amino acids of SjAsnRS was recombinantly expressed with isotopic labelling to allow structure determination using 3D NMR spectroscopy, and analysis of dynamics using NMR relaxation experiments. Structural comparisons of the N-terminal extension domain of SjAsnRS with filarial and human homologues highlight a high degree of variability in the β-hairpin region of these eukaryotic N-AsnRS proteins, but similarities in the disorder of the C-terminal regions. Limitations in PrDOS-based intrinsically disordered region (IDR) model predictions were also evident in this comparison. Empirical structural data such as that presented in our study for N-SjAsnRS will enhance the prediction of sequence-homology based structure modelling and prediction of IDRs in the future. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Yoshimi Peck
- Australian Institute of Tropical Health and Medicine, James Cook University, Cairns, QLD, Australia
| | - Darren Pickering
- Australian Institute of Tropical Health and Medicine, James Cook University, Cairns, QLD, Australia
| | - Mehdi Mobli
- Centre for Advanced Imaging, The University of Queensland, St Lucia, QLD, Australia
| | - Michael J Liddell
- College of Science and Engineering, James Cook University, Cairns, QLD, Australia
| | - David T Wilson
- Australian Institute of Tropical Health and Medicine, James Cook University, Cairns, QLD, Australia
| | - Roland Ruscher
- Australian Institute of Tropical Health and Medicine, James Cook University, Cairns, QLD, Australia
| | - Stephanie Ryan
- Australian Institute of Tropical Health and Medicine, James Cook University, Cairns, QLD, Australia
| | - Geraldine Buitrago
- Australian Institute of Tropical Health and Medicine, James Cook University, Cairns, QLD, Australia
- Strathclyde Institute of Pharmacy and Biomedical Sciences, University of Strathclyde, Glasgow, UK
| | - Connor McHugh
- Australian Institute of Tropical Health and Medicine, James Cook University, Cairns, QLD, Australia
| | | | - Theresa Pinlac
- Department of Biochemistry, University of the Philippines, Manila, Philippines
| | | | - Michael A Kron
- Department of Medicine, Division of Infectious Diseases, Medical College of Wisconsin, Milwaukee, WI, USA
| | - Alex Loukas
- Australian Institute of Tropical Health and Medicine, James Cook University, Cairns, QLD, Australia
| | - Norelle L Daly
- Australian Institute of Tropical Health and Medicine, James Cook University, Cairns, QLD, Australia
| |
|
17
|
Khan RU, Kumar R, Haq AU, Khan I, Shabaz M, Khan F. Blockchain-Based Trusted Tracking Smart Sensing Network to Prevent the Spread of Infectious Diseases. Ing Rech Biomed 2024; 45:100829. [DOI: 10.1016/j.irbm.2024.100829] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 12/09/2024]
|
18
|
Song J, Kurgan L. Availability of web servers significantly boosts citations rates of bioinformatics methods for protein function and disorder prediction. BIOINFORMATICS ADVANCES 2023; 3:vbad184. [PMID: 38146538 PMCID: PMC10749743 DOI: 10.1093/bioadv/vbad184] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2023] [Revised: 12/08/2023] [Accepted: 12/15/2023] [Indexed: 12/27/2023]
Abstract
Motivation: Development of bioinformatics methods is a long, complex and resource-hungry process. Hundreds of these tools have been released. While some methods are highly cited and used, many suffer from relatively low citation rates. We empirically analyze a large collection of recently released methods in three diverse protein function and disorder prediction areas to identify key factors that contribute to increased citations. Results: We show that provision of a working web server significantly boosts citation rates. On average, methods with working web servers generate three times as many citations as tools that are available only as source code, have no code and no server, or are no longer available. This observation holds consistently across different research areas and publication years. We also find that differences in predictive performance are unlikely to impact citation rates. Overall, our empirical results suggest that a relatively low-cost investment into the provision and long-term support of web servers would substantially increase the impact of bioinformatics tools.
Affiliation(s)
- Jiangning Song
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Clayton, VIC 3800, Australia
- Monash Data Futures Institute, Monash University, Clayton, VIC 3800, Australia
| | - Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, United States
| |
|
19
|
Basu S, Hegedűs T, Kurgan L. CoMemMoRFPred: Sequence-based Prediction of MemMoRFs by Combining Predictors of Intrinsic Disorder, MoRFs and Disordered Lipid-binding Regions. J Mol Biol 2023; 435:168272. [PMID: 37709009 DOI: 10.1016/j.jmb.2023.168272] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2023] [Revised: 09/01/2023] [Accepted: 09/07/2023] [Indexed: 09/16/2023]
Abstract
Molecular recognition features (MoRFs) are a commonly occurring type of intrinsically disordered regions (IDRs) that undergo disorder-to-order transition upon binding to partner molecules. We focus on recently characterized and functionally important membrane-binding MoRFs (MemMoRFs). Motivated by the lack of computational tools that predict MemMoRFs, we use a dataset of experimentally annotated MemMoRFs to conceptualize, design, evaluate and release an accurate sequence-based predictor. We rely on state-of-the-art tools that predict residues that possess key characteristics of MemMoRFs, such as intrinsic disorder, disorder-to-order transition and lipid-binding. We identify and combine results from three tools that include flDPnn for the disorder prediction, DisoLipPred for the prediction of disordered lipid-binding regions, and MoRFCHiBiLight for the prediction of disorder-to-order transitioning protein binding regions. Our empirical analysis demonstrates that combining results produced by these three methods generates accurate predictions of MemMoRFs. We also show that use of a smoothing operator produces predictions that closely mimic the number and sizes of the native MemMoRF regions. The resulting CoMemMoRFPred method is available as an easy-to-use webserver at http://biomine.cs.vcu.edu/servers/CoMemMoRFPred. This tool will aid future studies of MemMoRFs in the context of exploring their abundance, cellular functions, and roles in pathologic phenomena.
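The combine-then-smooth idea in this abstract can be sketched as follows. This is a minimal illustration, assuming each of the three tools yields a per-residue score in [0, 1]; the plain averaging rule, the window size and the 0.5 threshold are hypothetical choices made for the example and are not the published CoMemMoRFPred procedure.

```python
from typing import List, Tuple


def combine_scores(disorder: List[float], lipid: List[float], morf: List[float]) -> List[float]:
    """Average three per-residue scores (hypothetical combination rule)."""
    if not (len(disorder) == len(lipid) == len(morf)):
        raise ValueError("score lists must have equal length")
    return [(d + l + m) / 3.0 for d, l, m in zip(disorder, lipid, morf)]


def smooth(scores: List[float], window: int = 3) -> List[float]:
    """Centered moving average; windows are truncated at the sequence ends."""
    half = window // 2
    smoothed = []
    for i in range(len(scores)):
        lo = max(0, i - half)
        hi = min(len(scores), i + half + 1)
        smoothed.append(sum(scores[lo:hi]) / (hi - lo))
    return smoothed


def call_regions(scores: List[float], threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Return (start, end) pairs, end exclusive, for runs of scores >= threshold."""
    regions = []
    start = None
    for i, score in enumerate(scores):
        if score >= threshold and start is None:
            start = i
        elif score < threshold and start is not None:
            regions.append((start, i))
            start = None
    if start is not None:
        regions.append((start, len(scores)))
    return regions
```

Smoothing before thresholding suppresses isolated single-residue spikes and merges nearby hits, which is one plausible way a smoothing operator could bring the number and size of predicted regions closer to those of native regions.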
Affiliation(s)
- Sushmita Basu
- Department of Computer Science, Virginia Commonwealth University, USA
| | - Tamás Hegedűs
- Department of Biophysics and Radiation Biology, Semmelweis University, Budapest, Hungary; ELKH-SE Biophysical Virology Research Group, Eötvös Loránd Research Network, Budapest, Hungary
| | - Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, USA.
| |
|
20
|
Kurgan L, Hu G, Wang K, Ghadermarzi S, Zhao B, Malhis N, Erdős G, Gsponer J, Uversky VN, Dosztányi Z. Tutorial: a guide for the selection of fast and accurate computational tools for the prediction of intrinsic disorder in proteins. Nat Protoc 2023; 18:3157-3172. [PMID: 37740110 DOI: 10.1038/s41596-023-00876-x] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 12/13/2022] [Accepted: 06/21/2023] [Indexed: 09/24/2023]
Abstract
Intrinsic disorder is instrumental for a wide range of protein functions, and its analysis, using computational predictions from primary structures, complements secondary and tertiary structure-based approaches. In this Tutorial, we provide an overview and comparison of 23 publicly available computational tools with complementary parameters useful for intrinsic disorder prediction, partly relying on results from the Critical Assessment of protein Intrinsic Disorder prediction experiment. We consider factors such as accuracy, runtime, availability and the need for functional insights. The selected tools are available as web servers and downloadable programs, offer state-of-the-art predictions and can be used in a high-throughput manner. We provide examples and instructions for the selected tools to illustrate practical aspects related to the submission, collection and interpretation of predictions, as well as the timing and their limitations. We highlight two predictors for intrinsically disordered proteins, flDPnn as accurate and fast and IUPred as very fast and moderately accurate, while suggesting ANCHOR2 and MoRFchibi as two of the best-performing predictors for intrinsically disordered region binding. We link these tools to additional resources, including databases of predictions and web servers that integrate multiple predictive methods. Altogether, this Tutorial provides a hands-on guide to comparatively evaluating multiple predictors, submitting and collecting their own predictions, and reading and interpreting results. It is suitable for experimentalists and computational biologists interested in accurately and conveniently identifying intrinsic disorder, facilitating the functional characterization of the rapidly growing collections of protein sequences.
Affiliation(s)
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA.
| | - Gang Hu
- School of Statistics and Data Science, LPMC and KLMDASR, Nankai University, Tianjin, China
| | - Kui Wang
- School of Statistics and Data Science, LPMC and KLMDASR, Nankai University, Tianjin, China
| | - Sina Ghadermarzi
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
| | - Bi Zhao
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
| | - Nawar Malhis
- Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia, Canada
| | - Gábor Erdős
- MTA-ELTE Momentum Bioinformatics Research Group, Department of Biochemistry, Eötvös Loránd University, Budapest, Hungary
| | - Jörg Gsponer
- Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia, Canada.
| | - Vladimir N Uversky
- Department of Molecular Medicine, Morsani College of Medicine, University of South Florida, Tampa, FL, USA.
- Byrd Alzheimer's Center and Research Institute, Morsani College of Medicine, University of South Florida, Tampa, FL, USA.
| | - Zsuzsanna Dosztányi
- MTA-ELTE Momentum Bioinformatics Research Group, Department of Biochemistry, Eötvös Loránd University, Budapest, Hungary.
| |
|
21
|
Wilson C, Lewis KA, Fitzkee NC, Hough LE, Whitten ST. ParSe 2.0: A web tool to identify drivers of protein phase separation at the proteome level. Protein Sci 2023; 32:e4756. [PMID: 37574757 PMCID: PMC10464302 DOI: 10.1002/pro.4756] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2023] [Revised: 08/09/2023] [Accepted: 08/10/2023] [Indexed: 08/15/2023]
Abstract
We have developed an algorithm, ParSe, which accurately identifies from the primary sequence those protein regions likely to exhibit physiological phase separation behavior. Originally, ParSe was designed to test the hypothesis that, for flexible proteins, phase separation potential is correlated to hydrodynamic size. While our results were consistent with that idea, we also found that many different descriptors could successfully differentiate between three classes of protein regions: folded, intrinsically disordered, and phase-separating intrinsically disordered. Consequently, numerous combinations of amino acid property scales can be used to make robust predictions of protein phase separation. Built from that finding, ParSe 2.0 uses an optimal set of property scales to predict domain-level organization and compute a sequence-based prediction of phase separation potential. The algorithm is fast enough to scan the whole of the human proteome in minutes on a single computer and is equally or more accurate than other published predictors in identifying proteins and regions within proteins that drive phase separation. Here, we describe a web application for ParSe 2.0 that may be accessed through a browser by visiting https://stevewhitten.github.io/Parse_v2_FASTA to quickly identify phase-separating proteins within large sequence sets, or by visiting https://stevewhitten.github.io/Parse_v2_web to evaluate individual protein sequences.
Affiliation(s)
- Colorado Wilson
- Department of Chemistry and Biochemistry, Texas State University, San Marcos, Texas, USA
- Present address: Department of Pharmacology and Toxicology, Sealy Center for Structural Biology and Molecular Biophysics, University of Texas Medical Branch, Galveston, Texas, USA
| | - Karen A. Lewis
- Department of Chemistry and Biochemistry, Texas State University, San Marcos, Texas, USA
| | - Nicholas C. Fitzkee
- Department of Chemistry, Mississippi State University, Mississippi State, Mississippi, USA
| | - Loren E. Hough
- Department of Physics, University of Colorado Boulder, Boulder, Colorado, USA
- BioFrontiers Institute, University of Colorado Boulder, Boulder, Colorado, USA
| | - Steven T. Whitten
- Department of Chemistry and Biochemistry, Texas State University, San Marcos, Texas, USA
| |
|
22
|
Zhao B, Ghadermarzi S, Kurgan L. Comparative evaluation of AlphaFold2 and disorder predictors for prediction of intrinsic disorder, disorder content and fully disordered proteins. Comput Struct Biotechnol J 2023; 21:3248-3258. [PMID: 38213902 PMCID: PMC10782001 DOI: 10.1016/j.csbj.2023.06.001] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 02/28/2023] [Revised: 05/31/2023] [Accepted: 06/01/2023] [Indexed: 01/13/2024] Open
Abstract
We expand studies of AlphaFold2 (AF2) in the context of intrinsic disorder prediction by comparing it against a broad selection of 20 accurate, popular and recently released disorder predictors. We use a 25% larger benchmark dataset with 646 proteins and cover protein-level predictions of disorder content and fully disordered proteins. AF2-based disorder predictions secure a relatively high Area Under receiver operating characteristic Curve (AUC) of 0.77 and are statistically outperformed by several modern disorder predictors that secure AUCs around 0.8 with a median runtime of about 20 s compared to 1200 s for AF2. Moreover, AF2 provides modestly accurate predictions of fully disordered proteins (F1 = 0.59 vs. 0.91 for the best disorder predictor) and disorder content (mean absolute error of 0.21 vs. 0.15). AF2 also generates statistically more accurate disorder predictions for about 20% of proteins that have relatively short sequences and a few disordered regions that tend to be located at the sequence termini, and which lack disordered protein-binding regions. Interestingly, AF2 and the most accurate disorder predictors rely on deep neural networks, suggesting that these models are useful for protein structure and disorder predictions.
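The protein-level metrics named in this abstract (disorder content, mean absolute error, F1) can be illustrated with a short sketch. It assumes binary per-residue labels (1 = disordered) and boolean protein-level labels for "fully disordered"; the function names are hypothetical, and the benchmark's own definitions and thresholds are not reproduced here.

```python
from typing import List, Sequence


def disorder_content(labels: Sequence[int]) -> float:
    """Fraction of residues labelled disordered (1) in a binary per-residue list."""
    if not labels:
        raise ValueError("empty sequence")
    return sum(labels) / len(labels)


def mean_absolute_error(predicted: List[float], observed: List[float]) -> float:
    """Mean absolute difference between predicted and observed disorder content."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(predicted)


def f1_score(predicted: List[bool], observed: List[bool]) -> float:
    """F1 for protein-level binary labels (True = fully disordered)."""
    tp = sum(1 for p, o in zip(predicted, observed) if p and o)
    fp = sum(1 for p, o in zip(predicted, observed) if p and not o)
    fn = sum(1 for p, o in zip(predicted, observed) if not p and o)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For example, a protein with per-residue labels `[1, 1, 0, 0]` has a disorder content of 0.5, and the mean absolute error is then averaged over all proteins in the benchmark.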
Affiliation(s)
- Bi Zhao
- Genomics program, College of Public Health, University of South Florida, Tampa, FL, United States
| | - Sina Ghadermarzi
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, United States
| | - Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, United States
| |
|
23
|
Computational prediction of disordered binding regions. Comput Struct Biotechnol J 2023; 21:1487-1497. [PMID: 36851914 PMCID: PMC9957716 DOI: 10.1016/j.csbj.2023.02.018] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 10/26/2022] [Revised: 02/08/2023] [Accepted: 02/08/2023] [Indexed: 02/12/2023] Open
Abstract
One of the key features of intrinsically disordered regions (IDRs) is their ability to interact with a broad range of partner molecules. Multiple types of interacting IDRs have been identified, including molecular recognition fragments (MoRFs), short linear sequence motifs (SLiMs), and protein-, nucleic acid- and lipid-binding regions. Prediction of binding IDRs in protein sequences has gained momentum in recent years. We survey 38 predictors of binding IDRs that target interactions with a diverse set of partners, such as peptides, proteins, RNA, DNA and lipids. We offer a historical perspective and highlight key events that fueled efforts to develop these methods. These tools rely on a diverse range of predictive architectures that include scoring functions, regular expressions, traditional and deep machine learning and meta-models. Recent efforts focus on the development of deep neural network-based architectures and extending coverage to RNA, DNA and lipid-binding IDRs. We analyze availability of these methods and show that providing implementations and webservers results in much higher rates of citations/use. We also make several recommendations to take advantage of modern deep network architectures, develop tools that bundle predictions of multiple and different types of binding IDRs, and work on algorithms that model structures of the resulting complexes.
|
24
|
Han B, Ren C, Wang W, Li J, Gong X. Computational Prediction of Protein Intrinsically Disordered Region Related Interactions and Functions. Genes (Basel) 2023; 14:432. [PMID: 36833360 PMCID: PMC9956190 DOI: 10.3390/genes14020432] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/31/2022] [Revised: 02/02/2023] [Accepted: 02/05/2023] [Indexed: 02/11/2023] Open
Abstract
Intrinsically Disordered Proteins (IDPs) and Regions (IDRs) are widespread. Although they lack well-defined structures, they participate in many important biological processes. They are also widely associated with human diseases and have become potential targets in drug discovery. However, there is a large gap between the number of experimentally annotated IDPs/IDRs and their actual number. In recent decades, computational methods for IDPs/IDRs have been developed vigorously for different tasks, including prediction of IDPs/IDRs, their binding modes, their binding sites, and their molecular functions. In view of the correlation between these predictors, we have reviewed these prediction methods uniformly for the first time, summarized their computational methods and predictive performance, and discussed some problems and perspectives.
Affiliation(s)
- Bingqing Han
- Mathematical Intelligence Application Lab, Institute for Mathematical Sciences, Renmin University of China, Beijing 100872, China
| | - Chongjiao Ren
- Mathematical Intelligence Application Lab, Institute for Mathematical Sciences, Renmin University of China, Beijing 100872, China
| | - Wenda Wang
- Mathematical Intelligence Application Lab, Institute for Mathematical Sciences, Renmin University of China, Beijing 100872, China
| | - Jiashan Li
- Mathematical Intelligence Application Lab, Institute for Mathematical Sciences, Renmin University of China, Beijing 100872, China
| | - Xinqi Gong
- Mathematical Intelligence Application Lab, Institute for Mathematical Sciences, Renmin University of China, Beijing 100872, China
- Beijing Academy of Intelligence, Beijing 100083, China
| |
|
25
|
Peng Z, Li Z, Meng Q, Zhao B, Kurgan L. CLIP: accurate prediction of disordered linear interacting peptides from protein sequences using co-evolutionary information. Brief Bioinform 2023; 24:6858950. [PMID: 36458437 DOI: 10.1093/bib/bbac502] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 06/14/2022] [Revised: 09/30/2022] [Accepted: 10/24/2022] [Indexed: 12/04/2022] Open
Abstract
One of the key features of intrinsically disordered regions (IDRs) is facilitation of protein-protein and protein-nucleic acid interactions. These disordered binding regions include molecular recognition features (MoRFs), short linear motifs (SLiMs) and longer binding domains. The vast majority of current predictors of disordered binding regions target MoRFs, with a handful of methods that predict SLiMs and disordered protein-binding domains. A new and broader class of disordered binding regions, linear interacting peptides (LIPs), was introduced recently and applied in the MobiDB resource. LIPs are segments in protein sequences that undergo disorder-to-order transition upon binding to a protein or a nucleic acid, and they cover MoRFs, SLiMs and disordered protein-binding domains. Although current predictors of MoRFs and disordered protein-binding regions could be used to identify some LIPs, there are no dedicated sequence-based predictors of LIPs. To this end, we introduce CLIP, a new predictor of LIPs that utilizes a robust logistic regression model to combine three complementary types of inputs: co-evolutionary information derived from multiple sequence alignments, physicochemical profiles and disorder predictions. Ablation analysis suggests that the co-evolutionary information is particularly useful for this prediction and that combining the three inputs provides substantial improvements when compared to using these inputs individually. Comparative empirical assessments using low-similarity test datasets reveal that CLIP secures an area under the receiver operating characteristic curve (AUC) of 0.8 and substantially improves over the results produced by the closest current tools that predict MoRFs and disordered protein-binding regions. The webserver of CLIP is freely available at http://biomine.cs.vcu.edu/servers/CLIP/ and the standalone code can be downloaded from http://yanglab.qd.sdu.edu.cn/download/CLIP/.
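As a rough illustration of how a logistic regression model can merge three input types into one per-residue propensity, consider the sketch below. The single-number features, the weights and the bias are placeholders chosen for the example; they are assumptions and do not reflect CLIP's trained model or its actual feature encoding.

```python
import math
from typing import Tuple


def logistic_combine(
    coevolution: float,
    physchem: float,
    disorder: float,
    weights: Tuple[float, float, float] = (1.0, 1.0, 1.0),  # hypothetical, untrained
    bias: float = 0.0,  # hypothetical
) -> float:
    """Map a weighted sum of three per-residue features to a score in (0, 1)."""
    z = bias + weights[0] * coevolution + weights[1] * physchem + weights[2] * disorder
    return 1.0 / (1.0 + math.exp(-z))
```

In a trained model the weights would be fitted on annotated residues, and their relative magnitudes would indicate how much each input type contributes, which is the kind of question an ablation analysis addresses.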
Affiliation(s)
- Zhenling Peng
- Research Center for Mathematics and Interdisciplinary Sciences, Shandong University, Qingdao, 266237, China; Frontier Science Center for Nonlinear Expectations, Ministry of Education, Qingdao, 266237, China
| | - Zixia Li
- Center for Applied Mathematics, Tianjin University, Tianjin, 300072, China
| | - Qiaozhen Meng
- College of Intelligence and Computing, Tianjin University, Tianjin, 300072, China
| | - Bi Zhao
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
| | - Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
| |
|
26
|
Sun B, Kekenes-Huskey PM. Myofilament-associated proteins with intrinsic disorder (MAPIDs) and their resolution by computational modeling. Q Rev Biophys 2023; 56:e2. [PMID: 36628457 PMCID: PMC11070111 DOI: 10.1017/s003358352300001x] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 01/12/2023]
Abstract
The cardiac sarcomere is a cellular structure in the heart that enables muscle cells to contract. Dozens of proteins belong to the cardiac sarcomere and work in tandem to generate force and adapt to demands on cardiac output. Intriguingly, the majority of these proteins have significant intrinsic disorder that contributes to their functions, yet the biophysics of these intrinsically disordered regions (IDRs) has been characterized in limited detail. In this review, we first enumerate these myofilament-associated proteins with intrinsic disorder (MAPIDs) and recent biophysical studies to characterize their IDRs. Second, we summarize the biophysics governing IDR properties and the state of the art in computational tools for MAPID identification and characterization of their conformational ensembles. We conclude with an overview of future computational approaches toward broadening the understanding of intrinsic disorder in the cardiac sarcomere.
Affiliation(s)
- Bin Sun
- Research Center for Pharmacoinformatics (The State-Province Key Laboratories of Biomedicine-Pharmaceutics of China), Department of Medicinal Chemistry and Natural Medicine Chemistry, College of Pharmacy, Harbin Medical University, Harbin 150081, China
| | | |
|
27
|
Artificial Neural Networks for the Prediction of Monkeypox Outbreak. Trop Med Infect Dis 2022; 7:tropicalmed7120424. [PMID: 36548679 PMCID: PMC9783768 DOI: 10.3390/tropicalmed7120424] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/09/2022] [Revised: 11/23/2022] [Accepted: 11/24/2022] [Indexed: 12/13/2022] Open
Abstract
While the world is still struggling to recover from the harm caused by the widespread COVID-19 pandemic, the monkeypox virus now poses a new threat of becoming a pandemic. Although it is not as dangerous or infectious as COVID-19, new cases of the disease are nevertheless being reported daily from many countries. In this study, we have used public datasets provided by the European Centre for Disease Prevention and Control for developing a prediction model for the spread of the monkeypox outbreak to and throughout the USA, Germany, the UK, France and Canada. We have used certain effective neural network models for this purpose. The novelty of this study is that a neural network model for a time series monkeypox dataset is developed and compared with LSTM and GRU models using an adaptive moment estimation (ADAM) optimizer. The Levenberg-Marquardt (LM) learning technique is used to develop and validate a single hidden layer artificial neural network (ANN) model. Different ANN model architectures with varying numbers of hidden layer neurons were trained, and the K-fold cross-validation early stopping validation approach was employed to identify the optimum structure with the best generalization potential. In the regression analysis, our ANN model gives a good R-value of almost 99%, the LSTM model gives almost 98% and the GRU model gives almost 98%. These three model fits demonstrated that there was a good agreement between the experimental data and the forecasted values. The results of our experiments show that the ANN model performs better than the other methods on the collected monkeypox dataset in all five countries. To the best of the authors' knowledge, this is the first report that has used ANN, LSTM and GRU to predict a monkeypox outbreak in all five countries.
|
28
|
Intrinsically Disordered Proteins: An Overview. Int J Mol Sci 2022; 23:ijms232214050. [PMID: 36430530 PMCID: PMC9693201 DOI: 10.3390/ijms232214050] [Citation(s) in RCA: 51] [Impact Index Per Article: 17.0] [Received: 09/04/2022] [Revised: 11/07/2022] [Accepted: 11/08/2022] [Indexed: 11/16/2022] Open
Abstract
Many proteins and protein segments cannot attain a single stable three-dimensional structure under physiological conditions; instead, they adopt multiple interconverting conformational states. Such intrinsically disordered proteins or protein segments are highly abundant across proteomes, and are involved in various effector functions. This review focuses on different aspects of disordered proteins and disordered protein regions, which form the basis of the so-called "Disorder-function paradigm" of proteins. Additionally, various experimental approaches and computational tools used for characterizing disordered regions in proteins are discussed. Finally, the role of disordered proteins in diseases and their utility as potential drug targets are explored.
|
29
|
He J, Turzo SBA, Seffernick JT, Kim SS, Lindert S. Prediction of Intrinsic Disorder Using Rosetta ResidueDisorder and AlphaFold2. J Phys Chem B 2022; 126:8439-8446. [PMID: 36251522 DOI: 10.1021/acs.jpcb.2c05508] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/11/2023]
Abstract
The combination of deep learning and sequence data has transformed protein structure prediction and modeling, evidenced in the success of AlphaFold (AF). For this reason, many methods have been developed to take advantage of this success in areas where inaccurate structural modeling may limit computational predictiveness. For example, many methods have been developed to predict protein intrinsic disorder from sequence, including our Rosetta ResidueDisorder (RRD) approach. Intrinsically disordered regions in proteins are parts of the sequence that do not form ordered, folded structures under typical physiological conditions. In the original implementation of RRD, Rosetta ab initio models were generated, and disordered regions were predicted based on residue scores (disordered residues typically exist in regions of unfavorable scores). In this work, we show that by (i) replacing the ab initio modeling with AF (using the same scoring and disorder assignment approach) and (ii) updating the score function, the predictiveness improved significantly. Residues were better ranked by the order/disorder, evidenced by an improvement in receiver operating characteristic area-under-the-curve from 0.69 to 0.78 on a large (229 protein) and balanced data set (relatively even ordered versus disordered residues). Finally, the binary prediction accuracy also improved from 62% to 74% on the same data set. Our results show that the combined AF-RRD approach was as good as or better than all existing methods by these metrics (AF-RRD had the highest prediction accuracy).
Affiliation(s)
- Jiadi He
- Department of Chemistry and Biochemistry, Ohio State University, Columbus, Ohio 43210, United States
| | - Sm Bargeen Alam Turzo
- Department of Chemistry and Biochemistry, Ohio State University, Columbus, Ohio 43210, United States
| | - Justin T Seffernick
- Department of Chemistry and Biochemistry, Ohio State University, Columbus, Ohio 43210, United States
| | - Stephanie S Kim
- School of Biological Sciences, Seoul National University, Seoul 08826, South Korea
| | - Steffen Lindert
- Department of Chemistry and Biochemistry, Ohio State University, Columbus, Ohio 43210, United States
|
30
|
Polanco C, Uversky VN, Huberman A, Vargas-Alarcón G, Castañón González JA, Buhse T, Hernández Lemus E, Rios Castro M, López Oliva EJ, Solís Nájera SE. Bioinformatics-based Characterization of the Sequence Variability of Zika Virus Polyprotein and Envelope Protein (E). Evol Bioinform Online 2022; 18:11769343221130730. [PMCID: PMC9623037 DOI: 10.1177/11769343221130730] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/22/2022] [Accepted: 09/12/2022] [Indexed: 11/17/2022] Open
Abstract
Background: Zika virus, which is widely spread and infects humans through the bites of Aedes albopictus and Aedes aegypti female mosquitoes, represents a serious global health issue. Objective: The objective of the present study is to computationally characterize Zika virus polyproteins (UniProt Name: PRO_0000443018 [residues 1-3423], PRO_0000445659 [residues 1-3423] and PRO_0000435828 [residues 1-3419]) and their envelope proteins using their physico-chemical properties. Methods: To achieve this, the Polarity Index Method (PIM) profile and the Protein Intrinsic Disorder Predisposition (PIDP) profile of 3 main groups of proteins were evaluated: structural proteins extracted from specific Databases, Zika virus polyproteins, and their envelope proteins (E) extracted from UniProt Database. Once the PIM profile of the Zika virus envelope proteins (E) was obtained and since the Zika virus polyproteins were also identified with this profile, the proteins defined as “reviewed proteins” extracted from the UniProt Database were searched for the similar PIM profile. Finally, the difference between the PIM profiles of the Zika virus polyproteins and their envelope proteins (E) was tested using 2 non-parametric statistical tests. Results: It was found and tested that the PIM profile is an efficient discriminant that allows obtaining a “computational fingerprint” of each Zika virus polyprotein from its envelope protein (E). Conclusion: PIM profile represents a computational tool, which can be used to effectively discover Zika virus polyproteins from Databases, from their envelope proteins (E) sequences.
Affiliation(s)
- Carlos Polanco
- Department of Electromechanical Instrumentation, Instituto Nacional de Cardiología “Ignacio Chávez,” México City, México
- Department of Mathematics, Faculty of Sciences, Universidad Nacional Autónoma de México, México City, México
- Carlos Polanco, Department of Electromechanical Instrumentation, Instituto Nacional de Cardiología “Ignacio Chávez,” Juan Badiano 1 Tlalpan, México City 14800, México.
| | - Vladimir N Uversky
- Department of Molecular Medicine and USF Health Byrd Alzheimer’s Research Institute, Morsani College of Medicine, University of South Florida, Tampa, FL, USA
- Protein Research Group, Institute for Biological Instrumentation of the Russian Academy of Sciences, Federal Research Center “Pushchino Scientific Center for Biological Research of the Russian Academy of Sciences,” Pushchino, Moscow Region, Russia
| | - Alberto Huberman
- Department of Biochemistry, Instituto Nacional de Ciencias Médicas y Nutrición “Salvador Zubirán”, México City, México
| | | | | | - Thomas Buhse
- Chemical Research Center, Universidad Autónoma del Estado de Morelos, Cuernavaca, Morelos, México
| | - Enrique Hernández Lemus
- Department of Computational Genomics, Instituto Nacional de Medicina Genómica, México City, México
| | - Martha Rios Castro
- Department of Electromechanical Instrumentation, Instituto Nacional de Cardiología “Ignacio Chávez,” México City, México
| | - Erika Jeannette López Oliva
- Department of Electromechanical Instrumentation, Instituto Nacional de Cardiología “Ignacio Chávez,” México City, México
|
31
|
Fang Y, Yang Y, Liu C. New feature extraction from phylogenetic profiles improved the performance of pathogen-host interactions. Front Cell Infect Microbiol 2022; 12:931072. [PMID: 35982784 PMCID: PMC9378789 DOI: 10.3389/fcimb.2022.931072] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/28/2022] [Accepted: 07/11/2022] [Indexed: 11/13/2022] Open
Abstract
Motivation: The understanding of pathogen-host interactions (PHIs) is essential and challenging research because this potentially provides the mechanism of molecular interactions between different organisms. The experimental exploration of PHI is time-consuming and labor-intensive, and computational approaches are playing a crucial role in discovering new unknown PHIs between different organisms. Although it has been proposed that most machine learning (ML)–based methods predict PHI, these methods are all based on the structure-based information extracted from the sequence for prediction. The selection of feature values is critical to improving the performance of predicting PHI using ML. Results: This work proposed a new method to extract features from phylogenetic profiles as evolutionary information for predicting PHI. The performance of our approach is better than that of structure-based and ML-based PHI prediction methods. The five different extract models proposed by our approach combined with structure-based information significantly improved the performance of PHI, suggesting that combining phylogenetic profile features and structure-based methods could be applied to the exploration of PHI and discover new unknown biological relativity. Availability and implementation: The KPP method is implemented in the Java language and is available at https://github.com/yangfangs/KPP.
Affiliation(s)
- Yang Fang
- Key Laboratory of Bio-Resources and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, China
- Department of Laboratory Medicine, Third Affiliated Hospital of Zhengzhou University, Zhengzhou, China
| | - Yi Yang
- Key Laboratory of Bio-Resources and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, China
- *Correspondence: Chengcheng Liu; Yi Yang
| | - Chengcheng Liu
- State Key Laboratory of Oral Diseases, Department of Periodontics, National Clinical Research Center for Oral Diseases, West China School & Hospital of Stomatology, Sichuan University, Chengdu, China
- *Correspondence: Chengcheng Liu; Yi Yang
|
32
|
Bezerra RP, Conniff AS, Uversky VN. Comparative study of structures and functional motifs in lectins from the commercially important photosynthetic microorganisms. Biochimie 2022; 201:63-74. [PMID: 35839918 DOI: 10.1016/j.biochi.2022.07.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/27/2022] [Revised: 06/17/2022] [Accepted: 07/08/2022] [Indexed: 11/26/2022]
Abstract
Photosynthetic microorganisms, specifically cyanobacteria and microalgae, can synthesize a vast array of biologically active molecules, such as lectins, that have great potential for various biotechnological and biomedical applications. However, since the structures of these proteins are not well established, likely due to the presence of intrinsically disordered regions, our ability to better understand their functionality is hampered. We embarked on a study of the carbohydrate recognition domain (CRD), intrinsically disordered regions (IDRs), amino acid composition, and functional motifs in lectins from cyanobacteria of the genus Arthrospira and microalgae of the genera Chlorella and Dunaliella using a combination of bioinformatics techniques. This search revealed the presence of five distinctive CRD types differently distributed between the genera. Most CRDs displayed a group-specific distribution, except for C. sorokiniana, which possesses a distinctive CRD, probably due to its specific lifestyle. We also found that all CRDs contain short IDRs. The bacterial lectin of the prokaryote Arthrospira showed lower intrinsic disorder and proline content when compared to the lectins from the eukaryotic microalgae (Chlorella and Dunaliella). Among the important functions predicted in all lectins were several specific motifs, which directly interact with proteins involved in cell-cycle control and which may be used for pharmaceutical purposes. Since the aforementioned properties of each type of lectin were investigated in silico, they need experimental confirmation. The results of our study provide an overview of the distribution of CRDs, IDRs, and functional motifs within lectins from the commercially important microalgae.
Affiliation(s)
- Raquel P Bezerra
- Department of Morphology and Animal Physiology, Federal Rural University of Pernambuco-UFRPE, Dom Manoel de Medeiros Ave, Recife, PE, 52171-900, Brazil.
| | - Amanda S Conniff
- Department of Medical Engineering, Morsani College of Medicine and College of Engineering, University of South Florida, Tampa, FL, 33612, USA.
| | - Vladimir N Uversky
- Department of Molecular Medicine and USF Health Byrd Alzheimer's Institute, Morsani College of Medicine, University of South Florida, Tampa, FL, 33612, USA.
|
33
|
Compositional Bias of Intrinsically Disordered Proteins and Regions and Their Predictions. Biomolecules 2022; 12:biom12070888. [PMID: 35883444 PMCID: PMC9313023 DOI: 10.3390/biom12070888] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 05/27/2022] [Revised: 06/10/2022] [Accepted: 06/10/2022] [Indexed: 11/17/2022] Open
Abstract
Intrinsically disordered regions (IDRs) carry out many cellular functions and vary in length and placement in protein sequences. This diversity leads to variations in the underlying compositional biases, which were demonstrated for the short vs. long IDRs. We analyze compositional biases across four classes of disorder: fully disordered proteins; short IDRs; long IDRs; and binding IDRs. We identify three distinct biases: for the fully disordered proteins, the short IDRs and the long and binding IDRs combined. We also investigate compositional bias for putative disorder produced by leading disorder predictors and find that it is similar to the bias of the native disorder. Interestingly, the accuracy of disorder predictions across different methods is correlated with the correctness of the compositional bias of their predictions highlighting the importance of the compositional bias. The predictive quality is relatively low for the disorder classes with compositional bias that is the most different from the “generic” disorder bias, while being much higher for the classes with the most similar bias. We discover that different predictors perform best across different classes of disorder. This suggests that no single predictor is universally best and motivates the development of new architectures that combine models that target specific disorder classes.
|
34
|
Binder JL, Berendzen J, Stevens AO, He Y, Wang J, Dokholyan NV, Oprea TI. AlphaFold illuminates half of the dark human proteins. Curr Opin Struct Biol 2022; 74:102372. [PMID: 35439658 PMCID: PMC10669925 DOI: 10.1016/j.sbi.2022.102372] [Citation(s) in RCA: 40] [Impact Index Per Article: 13.3] [Received: 10/27/2021] [Revised: 03/02/2022] [Accepted: 03/13/2022] [Indexed: 01/05/2023]
Abstract
We investigate the use of confidence scores to evaluate the accuracy of a given AlphaFold (AF2) protein model for drug discovery. Prediction of accuracy is improved by not considering confidence scores below 80 due to the effects of disorder. On a set of recent crystal structures, 95% are likely to have accurate folds. Conformational discordance in the training set has a much more significant effect on accuracy than sequence divergence. We propose criteria for models and residues that are possibly useful for virtual screening. Based on these criteria, AF2 provides models for half of understudied (dark) human proteins and two-thirds of residues in those models.
Affiliation(s)
- Jessica L Binder
- Translational Informatics Division, Department of Internal Medicine, University of New Mexico, Albuquerque, NM 87131, USA. https://twitter.com/@jessicamaine
| | - Joel Berendzen
- Translational Informatics Division, Department of Internal Medicine, University of New Mexico, Albuquerque, NM 87131, USA
| | - Amy O Stevens
- Department of Chemistry and Chemical Biology, University of New Mexico, Albuquerque, NM 87131, USA
| | - Yi He
- Translational Informatics Division, Department of Internal Medicine, University of New Mexico, Albuquerque, NM 87131, USA; Department of Chemistry and Chemical Biology, University of New Mexico, Albuquerque, NM 87131, USA
| | - Jian Wang
- Department of Pharmacology, Department of Biochemistry and Molecular Biology, Penn State University College of Medicine, Hershey, PA 17033, USA
| | - Nikolay V Dokholyan
- Department of Pharmacology, Department of Biochemistry and Molecular Biology, Penn State University College of Medicine, Hershey, PA 17033, USA; Department of Chemistry and Department of Biomedical Engineering, Pennsylvania State University, University Park, PA 16802, United States
| | - Tudor I Oprea
- Translational Informatics Division, Department of Internal Medicine, University of New Mexico, Albuquerque, NM 87131, USA; UNM Comprehensive Cancer Center, Albuquerque, NM, USA; Department of Rheumatology and Inflammation Research, Institute of Medicine, Sahlgrenska Academy at the University of Gothenburg, Gothenburg, Sweden; Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark.
|
35
|
Biró B, Zhao B, Kurgan L. Complementarity of the residue-level protein function and structure predictions in human proteins. Comput Struct Biotechnol J 2022; 20:2223-2234. [PMID: 35615015 PMCID: PMC9118482 DOI: 10.1016/j.csbj.2022.05.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2022] [Revised: 05/02/2022] [Accepted: 05/02/2022] [Indexed: 11/24/2022] Open
Abstract
Sequence-based predictors of the residue-level protein function and structure cover a broad spectrum of characteristics including intrinsic disorder, secondary structure, solvent accessibility and binding to nucleic acids. They were catalogued and evaluated in numerous surveys and assessments. However, methods focusing on a given characteristic are studied separately from predictors of other characteristics, while they are typically used on the same proteins. We fill this void by studying complementarity of a representative collection of methods that target different predictions using a large, taxonomically consistent, and low similarity dataset of human proteins. First, we bridge the gap between the communities that develop structure-trained vs. disorder-trained predictors of binding residues. Motivated by a recent study of the protein-binding residue predictions, we empirically find that combining the structure-trained and disorder-trained predictors of the DNA-binding and RNA-binding residues leads to substantial improvements in predictive quality. Second, we investigate whether diverse predictors generate results that accurately reproduce relations between secondary structure, solvent accessibility, interaction sites, and intrinsic disorder that are present in the experimental data. Our empirical analysis concludes that predictions accurately reflect all combinations of these relations. Altogether, this study provides unique insights that support combining results produced by diverse residue-level predictors of protein function and structure.
Affiliation(s)
- Bálint Biró
- Institute of Genetics and Biotechnology, Hungarian University of Agriculture and Life Sciences, Gödöllő, Hungary
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, United States
| | - Bi Zhao
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, United States
| | - Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, United States
|
36
|
Predicting protein intrinsically disordered regions by applying natural language processing practices. Soft Comput 2022. [DOI: 10.1007/s00500-022-07085-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
|
37
|
Orlando G, Raimondi D, Codice F, Tabaro F, Vranken W. Prediction of disordered regions in proteins with recurrent Neural Networks and protein dynamics. J Mol Biol 2022; 434:167579. [DOI: 10.1016/j.jmb.2022.167579] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/09/2021] [Revised: 03/21/2022] [Accepted: 03/31/2022] [Indexed: 10/18/2022]
|
38
|
Hassan SS, Choudhury PP, Dayhoff GW, Aljabali AAA, Uhal BD, Lundstrom K, Rezaei N, Pizzol D, Adadi P, Lal A, Soares A, Mohamed Abd El-Aziz T, Brufsky AM, Azad GK, Sherchan SP, Baetas-da-Cruz W, Takayama K, Serrano-Aroca Á, Chauhan G, Palu G, Mishra YK, Barh D, Santana Silva RJ, Andrade BS, Azevedo V, Góes-Neto A, Bazan NG, Redwan EM, Tambuwala M, Uversky VN. The importance of accessory protein variants in the pathogenicity of SARS-CoV-2. Arch Biochem Biophys 2022; 717:109124. [PMID: 35085577 PMCID: PMC8785432 DOI: 10.1016/j.abb.2022.109124] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 11/02/2021] [Revised: 01/15/2022] [Accepted: 01/17/2022] [Indexed: 01/16/2023]
Abstract
The coronavirus disease 2019 (COVID-19) is caused by the Severe Acute Respiratory Syndrome Coronavirus-2 (SARS-CoV-2) with an estimated fatality rate of less than 1%. The SARS-CoV-2 accessory proteins ORF3a, ORF6, ORF7a, ORF7b, ORF8, and ORF10 possess putative functions to manipulate host immune mechanisms. These involve interferons, which appear as a consensus function, immune signaling receptor NLRP3 (NLR family pyrin domain-containing 3) inflammasome, and inflammatory cytokines such as interleukin 1β (IL-1β) and are critical in COVID-19 pathology. Outspread variations of each of the six accessory proteins were observed across six continents of all complete SARS-CoV-2 proteomes based on the data reported before November 2020. A decreasing order of percentage of unique variations in the accessory proteins was determined as ORF3a > ORF8 > ORF7a > ORF6 > ORF10 > ORF7b across all continents. The highest and lowest unique variations of ORF3a were observed in South America and Oceania, respectively. These findings suggest that the wide variations in accessory proteins seem to affect the pathogenicity of SARS-CoV-2.
Affiliation(s)
- Sk Sarif Hassan
- Department of Mathematics, Pingla Thana Mahavidyalaya, Maligram, 721140, India.
| | - Pabitra Pal Choudhury
- Applied Statistics Unit, Indian Statistical Institute, Kolkata, 700108, West Bengal, India
| | - Guy W Dayhoff
- Department of Chemistry, College of Art and Sciences, University of South Florida, Tampa, FL, 33620, USA
| | - Alaa A A Aljabali
- Department of Pharmaceutics and Pharmaceutical Technology, Yarmouk University-Faculty of Pharmacy, Irbid, 566, Jordan
| | - Bruce D Uhal
- Department of Physiology, Michigan State University, East Lansing, MI, 48824, USA
| | | | - Nima Rezaei
- Research Center for Immunodeficiencies, Pediatrics Center of Excellence, Children's Medical Center, Tehran University of Medical Sciences, Tehran, Iran; Network of Immunity in Infection, Malignancy and Autoimmunity (NIIMA), Universal Scientific Education and Research Network (USERN), Stockholm, Sweden
| | - Damiano Pizzol
- Italian Agency for Development Cooperation - Khartoum, Sudan Street 33, Al Amarat, Sudan
| | - Parise Adadi
- Department of Food Science, University of Otago, Dunedin, 9054, New Zealand
| | - Amos Lal
- Division of Pulmonary and Critical Care Medicine, Mayo Clinic, Rochester, MN, USA
| | - Antonio Soares
- Department of Cellular and Integrative Physiology, University of Texas Health Science Center at San Antonio, 7703 Floyd Curl Dr, San Antonio, TX, 78229-3900, USA
| | - Tarek Mohamed Abd El-Aziz
- Department of Cellular and Integrative Physiology, University of Texas Health Science Center at San Antonio, 7703 Floyd Curl Dr, San Antonio, TX, 78229-3900, USA; Zoology Department, Faculty of Science, Minia University, El-Minia, 61519, Egypt
| | - Adam M Brufsky
- University of Pittsburgh School of Medicine, Department of Medicine, Division of Hematology/Oncology, UPMC Hillman Cancer Center, Pittsburgh, PA, USA
| | | | - Samendra P Sherchan
- Department of Environmental Health Sciences, Tulane University, New Orleans, LA, 70112, USA
| | - Wagner Baetas-da-Cruz
- Translational Laboratory in Molecular Physiology, Centre for Experimental Surgery, College of Medicine, Federal University of Rio de Janeiro (UFRJ), Rio de Janeiro, Brazil
| | - Kazuo Takayama
- Center for iPS Cell Research and Application, Kyoto University, Japan
| | - Ángel Serrano-Aroca
- Biomaterial and Bioengineering Lab, Translational Research Centre San Alberto Magno, Catholic University of Valencia San Vicente Mártir, c/Guillem de Castro 94, 46001, Valencia, Spain
| | - Gaurav Chauhan
- School of Engineering and Sciences, Tecnologico de Monterrey, Av. Eugenio Garza Sada 2501 Sur, 64849, Monterrey, Nuevo León, Mexico
| | - Giorgio Palu
- Department of Molecular Medicine, University of Padova, Via Gabelli 63, 35121, Padova, Italy
| | - Yogendra Kumar Mishra
- University of Southern Denmark, Mads Clausen Institute, NanoSYD, Alsion 2, 6400, Sønderborg, Denmark
| | - Debmalya Barh
- Centre for Genomics and Applied Gene Technology, Institute of Integrative Omics and Applied Biotechnology (IIOAB), Nonakuri, Purba Medinipur, WB, India; Departamento de Genética, Ecologia e Evolucao, Instituto de Cîencias Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
| | - Raner José Santana Silva
- Departamento de Ciencias Biologicas (DCB), Programa de Pos-Graduacao em Genetica e Biologia Molecular (PPGGBM), Universidade Estadual de Santa Cruz (UESC), Rodovia Ilheus-Itabuna, km 16, 45662-900, Ilheus, BA, Brazil
| | - Bruno Silva Andrade
- Laboratório de Bioinformática e Química Computacional, Departamento de Ciências Biológicas, Universidade Estadual do Sudoeste da Bahia (UESB), Jequié, 45206-190, Brazil
| | - Vasco Azevedo
- Departamento de Genética, Ecologia e Evolucao, Instituto de Cîencias Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
| | - Aristóteles Góes-Neto
- Laboratório de Biologia Molecular e Computacional de Fungos, Departamento de Microbiologia, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais (UFMG), Belo Horizonte, Minas Gerais, Brazil
| | - Nicolas G Bazan
- Neuroscience Center of Excellence, School of Medicine, LSU Health New Orleans, New Orleans, LA, 70112, USA
| | - Elrashdy M Redwan
- King Abdulaziz University, Faculty of Science, Department of Biological Science, Saudi Arabia
| | - Murtaza Tambuwala
- School of Pharmacy and Pharmaceutical Science, Ulster University, Coleraine, BT52 1SA, Northern Ireland, UK
| | - Vladimir N Uversky
- Department of Molecular Medicine, Morsani College of Medicine, University of South Florida, Tampa, FL, 33612, USA; Center for Molecular Mechanisms of Aging and Age-Related Diseases, Moscow Institute of Physics and Technology, Institutskiy pereulok, 9, Dolgoprudny, 141700, Moscow region, Russia.
|
39
|
Zhao B, Kurgan L. Deep learning in prediction of intrinsic disorder in proteins. Comput Struct Biotechnol J 2022; 20:1286-1294. [PMID: 35356546 PMCID: PMC8927795 DOI: 10.1016/j.csbj.2022.03.003] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 01/10/2022] [Revised: 03/04/2022] [Accepted: 03/04/2022] [Indexed: 12/12/2022] Open
Abstract
Intrinsic disorder prediction is an active area that has developed over 100 predictors. We identify and investigate a recent trend towards the development of deep neural network (DNN)-based methods. The first DNN-based method was released in 2013 and since 2019 deep learners account for the majority of the new disorder predictors. We find that the 13 currently available DNN-based predictors are diverse in their topologies, sizes of their networks and the inputs that they utilize. We empirically show that the deep learners are statistically more accurate than other types of disorder predictors using the blind test dataset from the recent community assessment of intrinsic disorder predictions (CAID). We also identify several well-rounded DNN-based predictors that are accurate, fast and/or conveniently available. The popularity, favorable predictive performance and architectural flexibility suggest that deep networks are likely to fuel the development of future disorder predictors. Novel hybrid designs of deep networks could be used to adequately accommodate the diversity of types and flavors of intrinsic disorder. We also discuss the scarcity of DNN-based methods for the prediction of disordered binding regions and the need to develop more accurate methods for this prediction.
Affiliation(s)
- Bi Zhao
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
| | - Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
|
40
|
Kurgan L. Resources for computational prediction of intrinsic disorder in proteins. Methods 2022; 204:132-141. [DOI: 10.1016/j.ymeth.2022.03.018] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 01/28/2022] [Revised: 03/25/2022] [Accepted: 03/29/2022] [Indexed: 12/26/2022] Open
|
41
|
Bondos SE, Dunker AK, Uversky VN. Intrinsically disordered proteins play diverse roles in cell signaling. Cell Commun Signal 2022; 20:20. [PMID: 35177069 PMCID: PMC8851865 DOI: 10.1186/s12964-022-00821-7] [Citation(s) in RCA: 83] [Impact Index Per Article: 27.7] [Received: 09/22/2021] [Accepted: 12/11/2021] [Indexed: 11/29/2022] Open
Abstract
Signaling pathways allow cells to detect and respond to a wide variety of chemical (e.g., Ca2+ or chemokine proteins) and physical stimuli (e.g., shear stress, light). Together, these pathways form an extensive communication network that regulates basic cell activities and coordinates the function of multiple cells or tissues. The process of cell signaling imposes many demands on the proteins that comprise these pathways, including the abilities to form active and inactive states, and to engage in multiple protein interactions. Furthermore, successful signaling often requires amplifying the signal, regulating or tuning the response to the signal, and combining information sourced from multiple pathways, all while ensuring fidelity of the process. This sensitivity, adaptability, and tunability are possible, in part, due to the inclusion of intrinsically disordered regions in many proteins involved in cell signaling. The goal of this collection is to highlight the many roles of intrinsic disorder in cell signaling. Following an overview of resources that can be used to study intrinsically disordered proteins, this review highlights the critical role of intrinsically disordered proteins for signaling in widely diverse organisms (animals, plants, bacteria, fungi), in every category of cell signaling pathway (autocrine, juxtacrine, intracrine, paracrine, and endocrine) and at each stage (ligand, receptor, transducer, effector, terminator) in the cell signaling process. Thus, a cell signaling pathway cannot be fully described without understanding how intrinsically disordered protein regions contribute to its function. The ubiquitous presence of intrinsic disorder in different stages of diverse cell signaling pathways suggests that more mechanisms by which disorder modulates intra- and inter-cell signals remain to be discovered.
Affiliation(s)
- Sarah E. Bondos
- Department of Molecular and Cellular Medicine, Texas A&M Health Science Center, College Station, TX 77843 USA
| | - A. Keith Dunker
- Center for Computational Biology and Bioinformatics, Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, Indianapolis, IN 46202 USA
| | - Vladimir N. Uversky
- Department of Molecular Medicine and USF Health Byrd Alzheimer’s Research Institute, Morsani College of Medicine, University of South Florida, Tampa, FL 33612 USA
- Institute for Biological Instrumentation of the Russian Academy of Sciences, Federal Research Center “Pushchino Scientific Center for Biological Research of the Russian Academy of Sciences”, Pushchino, Moscow Region, Russia 142290
| |
Collapse
|
42
|
Polanco C, Uversky VN, Dayhoff GW, Huberman A, Buhse T, Márquez MF, Vargas-Alarcón G, Castañón-González JA, Andrés L, Díaz-González JL, González-Bañales K. Bioinformatics-Based Characterization of Proteins Related to SARS-CoV-2 Using the Polarity Index Method® (PIM®) and Intrinsic Disorder Predisposition. CURR PROTEOMICS 2022. [DOI: 10.2174/1570164618666210106114606] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
Abstract
Background:
The global outbreak of the 2019 novel Coronavirus Disease (COVID-19) caused by the infection with the Severe Acute Respiratory Syndrome Coronavirus-2 (SARS-CoV-2), which appeared in China at the end of 2019, signifies a major public health issue at the current time.
Objective:
The objective of the present study is to characterize the physicochemical properties of the SARS-CoV-2 proteins at a residues level, and to generate a “bioinformatics fingerprint” in the form of a “PIM® profile” created for each sequence utilizing the Polarity Index Method® (PIM®), suitable for the identification of these proteins.
Methods:
Two different bioinformatics approaches were used to analyze sequence characteristics of these proteins at the residues level, an in-house bioinformatics system PIM®, and a set of the commonly used algorithms for the prediction of protein intrinsic disorder predisposition, such as PONDR® VLXT, PONDR® VL3, PONDR® VSL2, PONDR® FIT, IUPred_short and IUPred_long. The PIM® profile was generated for four SARS-CoV-2 structural proteins and compared with the corresponding profiles of the SARS-CoV-2 non-structural proteins, SARS-CoV-2 putative proteins, SARS-CoV proteins, MERS-CoV proteins, sets of bacterial, fungal, and viral proteins, cell-penetrating peptides, and a set of intrinsically disordered proteins. We also searched for the UniProt proteins with PIM® profiles similar to those of SARS-CoV-2 structural, non-structural, and putative proteins.
Results:
We show that SARS-CoV-2 structural, non-structural, and putative proteins are characterized by a unique PIM® profile. A total of 1736 proteins were identified from the 562,253 “reviewed” proteins from the UniProt database, whose PIM® profile was similar to that of the SARS-CoV-2 structural, non-structural, and putative proteins.
Conclusion:
The PIM® profile represents an important characteristic that might be useful for the identification of proteins similar to SARS-CoV-2 proteins.
Affiliation(s)
- Carlos Polanco
  - Department of Electromechanical Instrumentation, Instituto Nacional de Cardiología “Ignacio Chávez”, México City 14800, México
  - Department of Mathematics, Faculty of Sciences, Universidad Nacional Autónoma de México, México City 04510, México
- Vladimir N. Uversky
  - Department of Molecular Medicine and USF Health Byrd Alzheimer's Research Institute, Morsani College of Medicine, University of South Florida, Tampa, FL 33647, USA
  - Protein Research Group, Institute for Biological Instrumentation of the Russian Academy of Sciences, Federal Research Center “Pushchino Scientific Center for Biological Research of the Russian Academy of Sciences”, 142290 Pushchino, Moscow region, Russia
- Guy W. Dayhoff
  - Department of Molecular Medicine and USF Health Byrd Alzheimer's Research Institute, Morsani College of Medicine, University of South Florida, Tampa, FL 33647, USA
- Alberto Huberman
  - Department of Biochemistry, Instituto Nacional de Ciencias Médicas y Nutrición Salvador Zubirán, C.P. 14080 México City, México
- Thomas Buhse
  - Centro de Investigaciones Químicas, Universidad Autónoma del Estado de Morelos, Cuernavaca Morelos 62209, México
- Manlio F. Márquez
  - Subdirección de Investigación Clínica, Instituto Nacional de Cardiología “Ignacio Chávez”, México City 14800, México
- Gilberto Vargas-Alarcón
  - Dirección de Investigación, Instituto Nacional de Cardiología “Ignacio Chávez”, México City 14800, México
- Leire Andrés
  - Department of Pathology, Hospital de Cruces, 48903, Barakaldo, Spain
- Juan Luciano Díaz-González
  - Department of Computer Sciences, Instituto de Ciencias Nucleares, Universidad Nacional Autónoma de México, México City 04510, México
- Karina González-Bañales
  - Department of Mathematics, Faculty of Sciences, Universidad Nacional Autónoma de México, México City 04510, México
43
Polanco C, Uversky VN, Vargas-Alarcón G, Buhse T, Huberman A, Márquez MF, Andrés L. Characterization of Proteins from Putative Human DNA and RNA Viruses. CURR PROTEOMICS 2022. [DOI: 10.2174/1570164618666210212123850] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
Abstract
Background:
In the vast variety of viruses known, there is a particular interest in those transmitted to humans and whose ability to disseminate represents a significant public health issue.
Objective:
The present study’s objective is to bioinformatically characterize the proteins of the two main divisions of viruses, RNA-viruses and DNA-viruses.
Methods:
In this work, a set of in-house computational programs was used to calculate the polarity/charge profiles and intrinsic disorder predisposition profiles of the proteins of several groups of viruses representing both types extracted from UniProt database. The efficiency of these computational programs was statistically verified.
Results:
It was found that the polarity/charge profile of the proteins is, in most cases, an efficient discriminant that allows the re-creation of the taxonomy known for both viral groups. Additionally, the entire set of "reviewed" proteins in UniProt database was analyzed to find proteins with the polarity/charge profiles similar to those obtained for each viral group. This search revealed a substantial number of proteins with such polarity-charge profiles.
Conclusion:
Polarity/charge profile represents a physicochemical metric, which is easy to calculate, and which can be used to effectively identify viral groups from their protein sequences.
Affiliation(s)
- Carlos Polanco
  - Department of Electromechanical Instrumentation, Instituto Nacional de Cardiología “Ignacio Chávez”, México City 14800, México
  - Department of Mathematics, Faculty of Sciences, Universidad Nacional Autónoma de México, México City 04510, México
- Vladimir N. Uversky
  - Department of Molecular Medicine and USF Health Byrd Alzheimer's Research Institute, Morsani College of Medicine, University of South Florida, Tampa, FL 33647, USA
  - Protein Research Group, Institute for Biological Instrumentation of the Russian Academy of Sciences, Federal Research Center “Pushchino Scientific Center for Biological Research of the Russian Academy of Sciences”, 142290 Pushchino, Moscow region, Russia
- Gilberto Vargas-Alarcón
  - Research Center, Instituto Nacional de Cardiología “Ignacio Chávez”, México City 14800, México
- Thomas Buhse
  - Chemical Research Center, Universidad Autónoma del Estado de Morelos, Cuernavaca Morelos 62209, México
- Alberto Huberman
  - Department of Biochemistry, Instituto Nacional de Ciencias Médicas y Nutrición Salvador Zubirán, C.P. 14080 México City, México
- Manlio F. Márquez
  - Clinical Research Center, Instituto Nacional de Cardiología “Ignacio Chávez”, México City 14800, México
- Leire Andrés
  - Department of Pathology, Hospital de Cruces, 48903, Barakaldo, Spain
44
Tamburrini KC, Pesce G, Nilsson J, Gondelaud F, Kajava AV, Berrin JG, Longhi S. Predicting Protein Conformational Disorder and Disordered Binding Sites. Methods Mol Biol 2022; 2449:95-147. [PMID: 35507260 DOI: 10.1007/978-1-0716-2095-3_4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
In the last two decades it has become increasingly evident that a large number of proteins adopt either a fully or a partially disordered conformation. Intrinsically disordered proteins are ubiquitous proteins that fulfill essential biological functions while lacking a stable 3D structure. Their conformational heterogeneity is encoded by the amino acid sequence, thereby allowing intrinsically disordered proteins or regions to be recognized based on their sequence properties. The identification of disordered regions facilitates the functional annotation of proteins and is instrumental for delineating boundaries of protein domains amenable to crystallization. This chapter focuses on the methods currently employed for predicting protein disorder and identifying intrinsically disordered binding sites.
Affiliation(s)
- Ketty C Tamburrini
  - Aix Marseille Univ, CNRS, Architecture et Fonction des Macromolécules Biologiques, AFMB, UMR 7257, Marseille, France
  - INRAE, Aix Marseille Univ, Biodiversité et Biotechnologie Fongiques (BBF), UMR 1163, Marseille, France
- Giulia Pesce
  - Aix Marseille Univ, CNRS, Architecture et Fonction des Macromolécules Biologiques, AFMB, UMR 7257, Marseille, France
- Juliet Nilsson
  - Aix Marseille Univ, CNRS, Architecture et Fonction des Macromolécules Biologiques, AFMB, UMR 7257, Marseille, France
- Frank Gondelaud
  - Aix Marseille Univ, CNRS, Architecture et Fonction des Macromolécules Biologiques, AFMB, UMR 7257, Marseille, France
- Andrey V Kajava
  - Centre de Recherche en Biologie cellulaire de Montpellier, UMR 5237, CNRS, Université Montpellier, Montpellier, France
- Jean-Guy Berrin
  - INRAE, Aix Marseille Univ, Biodiversité et Biotechnologie Fongiques (BBF), UMR 1163, Marseille, France
- Sonia Longhi
  - Aix Marseille Univ, CNRS, Architecture et Fonction des Macromolécules Biologiques, AFMB, UMR 7257, Marseille, France.
45
Hassan SS, Lundstrom K, Serrano-Aroca Á, Adadi P, Aljabali AAA, Redwan EM, Lal A, Kandimalla R, El-Aziz TMA, Pal Choudhury P, Azad GK, Sherchan SP, Chauhan G, Tambuwala M, Takayama K, Barh D, Palu G, Basu P, Uversky VN. Emergence of unique SARS-CoV-2 ORF10 variants and their impact on protein structure and function. Int J Biol Macromol 2022; 194:128-143. [PMID: 34863825 PMCID: PMC8635690 DOI: 10.1016/j.ijbiomac.2021.11.151] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 09/25/2021] [Revised: 11/18/2021] [Accepted: 11/22/2021] [Indexed: 02/07/2023]
Abstract
The devastating impact of the ongoing coronavirus disease 2019 (COVID-19) on public health, caused by the Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) has made targeting the COVID-19 pandemic a top priority in medical research and pharmaceutical development. Surveillance of SARS-CoV-2 mutations is essential for the comprehension of SARS-CoV-2 variant diversity and their impact on virulence and pathogenicity. The SARS-CoV-2 open reading frame 10 (ORF10) protein interacts with multiple human proteins CUL2, ELOB, ELOC, MAP7D1, PPT1, RBX1, THTPA, TIMM8B, and ZYG11B expressed in lung tissue. Mutations and co-occurring mutations in the emerging SARS-CoV-2 ORF10 variants are expected to impact the severity of the virus and its associated consequences. In this article, we highlight 128 single mutations and 35 co-occurring mutations in the unique SARS-CoV-2 ORF10 variants. The possible predicted effects of these mutations and co-occurring mutations on the secondary structure of ORF10 variants and host protein interactomes are presented. The findings highlight the possible effects of mutations and co-occurring mutations on the emerging 140 ORF10 unique variants from secondary structure and intrinsic protein disorder perspectives.
Affiliation(s)
- Sk Sarif Hassan
  - Department of Mathematics, Pingla Thana Mahavidyalaya, Maligram, Paschim Medinipur 721140, West Bengal, India.
- Ángel Serrano-Aroca
  - Biomaterials and Bioengineering Lab, Centro de Investigacion Traslacional San Alberto Magno, Universidad Catolica de Valencia San Vicente Martir, c/Guillem de Castro, 94, 46001 Valencia, Valencia, Spain.
- Parise Adadi
  - Department of Food Science, University of Otago, Dunedin 9054, New Zealand
- Alaa A A Aljabali
  - Department of Pharmaceutics and Pharmaceutical Technology, Yarmouk University, Faculty of Pharmacy, Irbid 566, Jordan.
- Elrashdy M Redwan
  - Biological Science Department, Faculty of Science, King Abdulaziz University, Jeddah, Saudi Arabia; Therapeutic and Protective Proteins Laboratory, Protein Research Department, Genetic Engineering and Biotechnology Research Institute, City of Scientific Research and Technological Applications, New Borg EL-Arab 21934, Alexandria, Egypt.
- Amos Lal
  - Department of Medicine, Division of Pulmonary and Critical Care Medicine, Mayo Clinic, Rochester, MN, USA
- Ramesh Kandimalla
  - Applied Biology, CSIR-Indian Institute of Chemical Technology, Uppal Road, Tarnaka, Hyderabad 500007, Telangana, India; Department of Biochemistry, Kakatiya Medical College, Warangal, Telangana, India
- Tarek Mohamed Abd El-Aziz
  - Department of Cellular and Integrative Physiology, University of Texas Health Science Center at San Antonio, 7703 Floyd Curl Dr, San Antonio, TX 78229-3900, USA; Zoology Department, Faculty of Science, Minia University, El-Minia 61519, Egypt.
- Pabitra Pal Choudhury
  - Indian Statistical Institute, Applied Statistics Unit, 203 B T Road, Kolkata 700108, India.
- Samendra P Sherchan
  - Department of Environmental Health Sciences, Tulane University, New Orleans, LA, 70112, USA.
- Gaurav Chauhan
  - School of Engineering and Sciences, Tecnologico de Monterrey, 64849 Monterrey, Nuevo Leon, Mexico.
- Murtaza Tambuwala
  - School of Pharmacy and Pharmaceutical Science, Ulster University, Coleraine BT52 1SA, Northern Ireland, UK.
- Kazuo Takayama
  - Center for iPS Cell Research and Application, Kyoto University, Kyoto 6068507, Japan.
- Debmalya Barh
  - Centre for Genomics and Applied Gene Technology, Institute of Integrative Omics and Applied Biotechnology (IIOAB), Nonakuri, Purba Medinipur 721172, West Bengal, India; Department of Genetics, Ecology and Evolution, Institute of Biological Sciences, Federal University of Minas Gerais, Belo Horizonte 31270-901, Brazil.
- Giorgio Palu
  - Department of Molecular Medicine, University of Padova, Via Gabelli 63, 35121 Padova, Italy.
- Pallab Basu
  - School of Physics, University of the Witwatersrand, Johannesburg, Braamfontein 2000, 721140, South Africa.
- Vladimir N Uversky
  - Department of Molecular Medicine, Morsani College of Medicine, University of South Florida, Tampa, FL 33612, USA.
46
Abstract
Introduction:
The intrinsic disorder prediction field develops, assesses, and deploys computational predictors of disorder in protein sequences, and constructs and disseminates databases of these predictions. Over 40 years of research has resulted in the release of numerous resources.
Areas covered:
We identify and briefly summarize the most comprehensive collection to date of over 100 disorder predictors. We focus on their predictive models, availability and predictive performance. We categorize and study them from a historical point of view to highlight informative trends.
Expert opinion:
We find a consistent trend of improvements in predictive quality as newer and more advanced predictors are developed. The original focus on machine learning methods shifted to meta-predictors in the early 2010s, followed by a recent transition to deep learning. The use of deep learners will continue for the foreseeable future given the recent and convincing success of these methods. Moreover, a broad range of resources that facilitate convenient collection of accurate disorder predictions is available to users. They include web servers and standalone programs for disorder prediction, servers that combine prediction of disorder and disorder functions, and large databases of pre-computed predictions. We also point to the need to address the shortage of accurate methods that predict disordered binding regions.
Affiliation(s)
- Bi Zhao
  - Department of Computer Science, Virginia Commonwealth University, Richmond, Virginia, USA
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, Richmond, Virginia, USA
47
Katuwawala A, Zhao B, Kurgan L. DisoLipPred: accurate prediction of disordered lipid-binding residues in protein sequences with deep recurrent networks and transfer learning. Bioinformatics 2021; 38:115-124. [PMID: 34487138 DOI: 10.1093/bioinformatics/btab640] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 05/31/2021] [Revised: 08/05/2021] [Accepted: 09/02/2021] [Indexed: 02/03/2023] Open
Abstract
Motivation:
Intrinsically disordered protein regions interact with proteins, nucleic acids and lipids. Regions that bind lipids are implicated in a wide spectrum of cellular functions and several human diseases. Motivated by the growing amount of experimental data for these interactions and the lack of tools that can predict them from the protein sequence, we develop DisoLipPred, the first predictor of the disordered lipid-binding residues (DLBRs).
Results:
DisoLipPred relies on a deep bidirectional recurrent network that implements three innovative features: transfer learning, a bypass module that sidesteps predictions for putative structured residues, and expanded inputs that cover physicochemical properties associated with the protein-lipid interactions. Ablation analysis shows that these features drive the predictive quality of DisoLipPred. Tests on an independent test dataset and the yeast proteome reveal that DisoLipPred generates accurate results and that none of the related existing tools can be used to indirectly identify DLBRs. We also show that DisoLipPred's predictions complement the results generated by predictors of the transmembrane regions. Altogether, we conclude that DisoLipPred provides high-quality predictions of DLBRs that complement the currently available methods.
Availability and implementation:
DisoLipPred's webserver is available at http://biomine.cs.vcu.edu/servers/DisoLipPred/.
Supplementary information:
Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Akila Katuwawala
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
- Bi Zhao
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
48
Zhang F, Zhao B, Shi W, Li M, Kurgan L. DeepDISOBind: accurate prediction of RNA-, DNA- and protein-binding intrinsically disordered residues with deep multi-task learning. Brief Bioinform 2021; 23:6461158. [PMID: 34905768 DOI: 10.1093/bib/bbab521] [Citation(s) in RCA: 41] [Impact Index Per Article: 10.3] [Received: 08/02/2021] [Revised: 10/30/2021] [Accepted: 11/14/2021] [Indexed: 12/14/2022] Open
Abstract
Proteins with intrinsically disordered regions (IDRs) are common among eukaryotes. Many IDRs interact with nucleic acids and proteins. Annotation of these interactions is supported by computational predictors, but to date, only one tool that predicts interactions with nucleic acids was released, and recent assessments demonstrate that current predictors offer modest levels of accuracy. We have developed DeepDISOBind, an innovative deep multi-task architecture that accurately predicts deoxyribonucleic acid (DNA)-, ribonucleic acid (RNA)- and protein-binding IDRs from protein sequences. DeepDISOBind relies on an information-rich sequence profile that is processed by an innovative multi-task deep neural network, where subsequent layers are gradually specialized to predict interactions with specific partner types. The common input layer links to a layer that differentiates protein- and nucleic acid-binding, which further links to layers that discriminate between DNA and RNA interactions. Empirical tests show that this multi-task design provides statistically significant gains in predictive quality across the three partner types when compared to a single-task design and a representative selection of the existing methods that cover both disorder- and structure-trained tools. Analysis of the predictions on the human proteome reveals that DeepDISOBind predictions can be encoded into protein-level propensities that accurately predict DNA- and RNA-binding proteins and protein hubs. DeepDISOBind is available at https://www.csuligroup.com/DeepDISOBind/.
Affiliation(s)
- Fuhao Zhang
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, 410083, China
- Bi Zhao
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA, 23284, USA
- Wenbo Shi
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, 410083, China
- Min Li
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, 410083, China
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA, 23284, USA
49
Paiz EA, Allen JH, Correia JJ, Fitzkee NC, Hough LE, Whitten ST. Beta turn propensity and a model polymer scaling exponent identify intrinsically disordered phase-separating proteins. J Biol Chem 2021; 297:101343. [PMID: 34710373 PMCID: PMC8592878 DOI: 10.1016/j.jbc.2021.101343] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 07/28/2021] [Revised: 10/19/2021] [Accepted: 10/20/2021] [Indexed: 12/14/2022] Open
Abstract
The complex cellular milieu can spontaneously demix, or phase separate, in a process controlled in part by intrinsically disordered (ID) proteins. A protein's propensity to phase separate is thought to be driven by a preference for protein-protein over protein-solvent interactions. The hydrodynamic size of monomeric proteins, as quantified by the polymer scaling exponent (v), is driven by a similar balance. We hypothesized that mean v, as predicted by protein sequence, would be smaller for proteins with a strong propensity to phase separate. To test this hypothesis, we analyzed protein databases containing subsets of proteins that are folded, disordered, or disordered and known to spontaneously phase separate. We find that the phase-separating disordered proteins, on average, had lower calculated values of v compared with their non-phase-separating counterparts. Moreover, these proteins had a higher sequence-predicted propensity for β-turns. Using a simple, surface area-based model, we propose a physical mechanism for this difference: transient β-turn structures reduce the desolvation penalty of forming a protein-rich phase and increase exposure of atoms involved in π/sp2 valence electron interactions. By this mechanism, β-turns could act as energetically favored nucleation points, which may explain the increased propensity for turns in ID regions (IDRs) utilized biologically for phase separation. Phase-separating IDRs, non-phase-separating IDRs, and folded regions could be distinguished by combining v and β-turn propensity. Finally, we propose a new algorithm, ParSe (partition sequence), for predicting phase-separating protein regions, and which is able to accurately identify folded, disordered, and phase-separating protein regions based on the primary sequence.
Affiliation(s)
- Elisia A Paiz
  - Department of Chemistry and Biochemistry, Texas State University, San Marcos, Texas, USA
- Jeffre H Allen
  - Department of Biochemistry, University of Colorado Boulder, Boulder, Colorado, USA
- John J Correia
  - Department of Cell and Molecular Biology, University of Mississippi Medical Center, Jackson, Mississippi, USA
- Nicholas C Fitzkee
  - Department of Chemistry, Mississippi State University, Mississippi State, Mississippi, USA
- Loren E Hough
  - Department of Physics, University of Colorado Boulder, Boulder, Colorado, USA; BioFrontiers Institute, University of Colorado Boulder, Boulder, Colorado, USA.
- Steven T Whitten
  - Department of Chemistry and Biochemistry, Texas State University, San Marcos, Texas, USA.
50
Boyko KV, Rosenkranz EA, Smith DM, Miears HL, Oueld es cheikh M, Lund MZ, Young JC, Reardon PN, Okon M, Smirnov SL, Antos JM. Sortase-mediated segmental labeling: A method for segmental assignment of intrinsically disordered regions in proteins. PLoS One 2021; 16:e0258531. [PMID: 34710113 PMCID: PMC8553144 DOI: 10.1371/journal.pone.0258531] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 12/31/2020] [Accepted: 09/29/2021] [Indexed: 11/18/2022] Open
Abstract
A significant number of proteins possess sizable intrinsically disordered regions (IDRs). Due to the dynamic nature of IDRs, NMR spectroscopy is often the tool of choice for characterizing these segments. However, the application of NMR to IDRs is often hindered by their instability, spectral overlap and resonance assignment difficulties. Notably, these challenges increase considerably with the size of the IDR. In response to these issues, here we report the use of sortase-mediated ligation (SML) for segmental isotopic labeling of IDR-containing samples. Specifically, we have developed a ligation strategy involving a key segment of the large IDR and adjacent folded headpiece domain comprising the C-terminus of A. thaliana villin 4 (AtVLN4). This procedure significantly reduces the complexity of NMR spectra and enables group identification of signals arising from the labeled IDR fragment, a process we refer to as segmental assignment. The validity of our segmental assignment approach is corroborated by backbone residue-specific assignment of the IDR using a minimal set of standard heteronuclear NMR methods. Using segmental assignment, we further demonstrate that the IDR region adjacent to the headpiece exhibits nonuniform spectral alterations in response to temperature. Subsequent residue-specific characterization revealed two segments within the IDR that responded to temperature in markedly different ways. Overall, this study represents an important step toward the selective labeling and probing of target segments within much larger IDR contexts. Additionally, the approach described offers significant savings in NMR recording time, a valuable advantage for the study of unstable IDRs, their binding interfaces, and functional mechanisms.
Affiliation(s)
- Kristina V. Boyko
  - Department of Chemistry, Western Washington University, Bellingham, Washington, United States of America
- Erin A. Rosenkranz
  - Department of Chemistry, Western Washington University, Bellingham, Washington, United States of America
- Derrick M. Smith
  - Department of Chemistry, Western Washington University, Bellingham, Washington, United States of America
- Heather L. Miears
  - Department of Chemistry, Western Washington University, Bellingham, Washington, United States of America
- Melissa Oueld es cheikh
  - Department of Chemistry, Western Washington University, Bellingham, Washington, United States of America
- Micah Z. Lund
  - Department of Chemistry, Western Washington University, Bellingham, Washington, United States of America
- Jeffery C. Young
  - Department of Biology, Western Washington University, Bellingham, Washington, United States of America
- Patrick N. Reardon
  - Oregon State University NMR Facility, Oregon State University, Corvallis, Oregon, United States of America
- Mark Okon
  - Department of Biochemistry and Molecular Biology, Department of Chemistry, Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia, Canada
- Serge L. Smirnov
  - Department of Chemistry, Western Washington University, Bellingham, Washington, United States of America
- John M. Antos
  - Department of Chemistry, Western Washington University, Bellingham, Washington, United States of America