1
Berkeley R, Plonski AP, Phan TM, Grohe K, Becker L, Wegner S, Herzik MA, Mittal J, Debelouchina GT. Capturing the Conformational Heterogeneity of HSPB1 Chaperone Oligomers at Atomic Resolution. J Am Chem Soc 2025; 147:15181-15194. [PMID: 40146081 PMCID: PMC12063158 DOI: 10.1021/jacs.4c18668] [Received: 12/30/2024] [Revised: 03/16/2025] [Accepted: 03/18/2025] [Indexed: 03/28/2025]
Abstract
Small heat shock proteins (sHSPs), including HSPB1, are essential regulators of cellular proteostasis that interact with unfolded and partially folded proteins to prevent aberrant misfolding and aggregation. These proteins fulfill a similar role in biological condensates, where they interact with intrinsically disordered proteins to modulate their liquid-liquid and liquid-to-solid phase transitions. Characterizing the sHSP structure, dynamics, and client interactions is challenging due to their partially disordered nature, their tendency to form polydisperse oligomers, and their diverse range of clients. In this work, we leverage various biophysical methods, including fast ¹H-based magic angle spinning (MAS) NMR spectroscopy, molecular dynamics (MD) simulations, and modeling, to shed new light on the structure and dynamics of HSPB1 oligomers. Using split-intein-mediated segmental labeling, we provide unambiguous evidence that in the oligomer context, the N-terminal domain (NTD) of HSPB1 is rigid and adopts an ensemble of heterogeneous conformations, the α-Crystallin domain (ACD) forms dimers and experiences multiple distinct local environments, while the C-terminal domain (CTD) remains highly dynamic. Our computational models suggest that the NTDs participate in extensive NTD-NTD and NTD-ACD interactions and are sequestered within the oligomer interior. We further demonstrate that HSPB1 higher order oligomers disassemble into smaller oligomeric species in the presence of a client protein and that an accessible NTD is essential for HSPB1 partitioning into condensates and interactions with client proteins. Our integrated approach provides a high-resolution view of the complex oligomeric landscape of HSPB1 and sheds light on the elusive network of interactions that underlies the function of HSPB1 in biological condensates.
Affiliation(s)
- Raymond F. Berkeley
  - Department of Chemistry and Biochemistry, University of California San Diego, La Jolla, California 92093, United States
- Alexander P. Plonski
  - Department of Chemistry and Biochemistry, University of California San Diego, La Jolla, California 92093, United States
- Tien M. Phan
  - Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, Texas 77843, United States
- Kristof Grohe
  - Bruker BioSpin GmbH & Co. KG, Ettlingen 76275, Germany
- Lukas Becker
  - Bruker BioSpin GmbH & Co. KG, Ettlingen 76275, Germany
- Mark A. Herzik
  - Department of Chemistry and Biochemistry, University of California San Diego, La Jolla, California 92093, United States
- Jeetain Mittal
  - Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, Texas 77843, United States
  - Department of Chemistry, Texas A&M University, College Station, Texas 77843, United States
  - Interdisciplinary Graduate Program in Genetics and Genomics, Texas A&M University, College Station, Texas 77843, United States
- Galia T. Debelouchina
  - Department of Chemistry and Biochemistry, University of California San Diego, La Jolla, California 92093, United States
2
Han KS, Kim HK, Kim MH, Pak MH, Pak SJ, Choe MM, Kim CS. PredIDR2: Improving accuracy of protein intrinsic disorder prediction by updating deep convolutional neural network and supplementing DisProt data. Int J Biol Macromol 2025; 306:141801. [PMID: 40054813 DOI: 10.1016/j.ijbiomac.2025.141801] [Received: 12/06/2024] [Revised: 03/03/2025] [Accepted: 03/04/2025] [Indexed: 05/11/2025]
Abstract
Intrinsically disordered proteins (IDPs) or regions (IDRs) are widespread in proteomes, are involved in several important biological processes, and are implicated in many diseases. Many computational methods for IDR prediction are being developed to close the gap between the slow pace of experimental annotation and the rapid growth in the number of non-annotated proteins, and their performance is blindly tested in the community-driven Critical Assessment of protein Intrinsic Disorder (CAID) experiment. In this paper, we developed the PredIDR2 series, an updated version of the PredIDR method tested in CAID2, to accurately predict intrinsically disordered regions from protein sequence. It includes four methods that differ in their input features and in how the negative samples of the training set are produced. The PredIDR2 series (AUC_ROC = 0.952) performs remarkably better than our previous PredIDR (AUC_ROC = 0.933) on the Disorder-PDB dataset of CAID2. This improvement seems to be mainly attributable to the introduction of a new deep convolutional neural network and the augmentation of the training data, especially from the DisProt database. The PredIDR2 series outperforms the state-of-the-art IDR prediction methods that participated in CAID2 in terms of AUC_ROC, AUC_PR and DC_mae, and ranks among the seven top-performing methods in terms of MCC. The PredIDR2 series can be freely used through the CAID Prediction Portal available at https://caid.idpcentral.org/portal or downloaded as a Singularity container from https://biocomputingup.it/shared/caid-predictors/.
Affiliation(s)
- Kun-Sop Han
  - University of Sciences, Pyongyang, Democratic People's Republic of Korea
- Ha-Kyong Kim
  - Branch of Biotechnology, State Academy of Sciences, Pyongyang, Democratic People's Republic of Korea
- Myong-Hyok Kim
  - University of Sciences, Pyongyang, Democratic People's Republic of Korea
- Myong-Hyon Pak
  - University of Sciences, Pyongyang, Democratic People's Republic of Korea
- Song-Jin Pak
  - University of Sciences, Pyongyang, Democratic People's Republic of Korea
- Mun-Myong Choe
  - University of Science and Technology, Pyongyang, Democratic People's Republic of Korea
- Chol-Song Kim
  - University of Sciences, Pyongyang, Democratic People's Republic of Korea
3
Sharma KK, Raghuvamsi PV, Aik DYK, Marzinek JK, Bond PJ, Wohland T. Structural flexibility in the ordered domain of the dengue virus strain 2 capsid protein is critical for chaperoning viral RNA replication. Cell Mol Life Sci 2025; 82:184. [PMID: 40293525 PMCID: PMC12037954 DOI: 10.1007/s00018-025-05712-x] [Received: 01/20/2025] [Revised: 04/06/2025] [Accepted: 04/11/2025] [Indexed: 04/30/2025]
Abstract
Viral replication necessitates intricate nucleic acid rearrangements, including annealing and strand displacement, to achieve the functional structure of the viral RNA. Often a single RNA chaperone performs these seemingly incompatible functions. This raises the question of which structural and dynamic features of such chaperones govern distinct RNA rearrangements. While cationic intrinsically disordered regions promote annealing by playing a charge-screening role, how the same chaperone mediates strand displacement remains elusive. Here, we investigate the annealing and strand displacement of the 5' upstream AUG region (5UAR) as chaperoned by the Dengue virus strain 2 capsid protein (Denv2C), used as a model RNA chaperone. Through single-molecule analysis and molecular simulations, we demonstrate that Denv2C regulates nucleic acid melting, folding, annealing, and strand displacement via flexibility in its ordered region. A mutation that renders the Denv2C ordered region rigid converts Denv2C into a mere annealer. Our findings underscore the role of Denv2C's disordered region as a "macromolecular counterion" during RNA annealing, while a flexible ordered region is crucial for effective strand displacement.
Affiliation(s)
- Kamal K Sharma
  - Centre for Bioimaging Sciences, National University of Singapore, 14 Science Drive 4, Singapore, 117557, Singapore
  - Department of Biological Sciences, National University of Singapore, 14 Science Drive 4, Singapore, 117543, Singapore
- Palur Venkata Raghuvamsi
  - Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01 Matrix, Singapore, 138671, Republic of Singapore
- Daniel Y K Aik
  - Centre for Bioimaging Sciences, National University of Singapore, 14 Science Drive 4, Singapore, 117557, Singapore
  - Department of Chemistry, National University of Singapore, 3 Science Drive 3, Singapore, 117543, Singapore
- Jan K Marzinek
  - Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01 Matrix, Singapore, 138671, Republic of Singapore
- Peter J Bond
  - Department of Biological Sciences, National University of Singapore, 14 Science Drive 4, Singapore, 117543, Singapore
  - Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01 Matrix, Singapore, 138671, Republic of Singapore
- Thorsten Wohland
  - Centre for Bioimaging Sciences, National University of Singapore, 14 Science Drive 4, Singapore, 117557, Singapore
  - Department of Biological Sciences, National University of Singapore, 14 Science Drive 4, Singapore, 117543, Singapore
  - Department of Chemistry, National University of Singapore, 3 Science Drive 3, Singapore, 117543, Singapore
4
Alanazi W, Meng D, Pollastri G. Advancements in one-dimensional protein structure prediction using machine learning and deep learning. Comput Struct Biotechnol J 2025; 27:1416-1430. [PMID: 40242292 PMCID: PMC12002955 DOI: 10.1016/j.csbj.2025.04.005] [Received: 01/13/2025] [Revised: 04/01/2025] [Accepted: 04/02/2025] [Indexed: 04/18/2025]
Abstract
The accurate prediction of protein structures remains a cornerstone challenge in structural bioinformatics, essential for understanding the intricate relationship between protein sequence, structure, and function. Recent advancements in Machine Learning (ML) and Deep Learning (DL) have revolutionized this field, offering innovative approaches to tackle one-dimensional (1D) protein structure annotations, including secondary structure, solvent accessibility, and intrinsic disorder. This review highlights the evolution of predictive methodologies, from early machine learning models to sophisticated deep learning frameworks that integrate sequence embeddings and pretrained language models. Key advancements, such as AlphaFold's transformative impact on structure prediction and the rise of protein language models (PLMs), have enabled unprecedented accuracy in capturing sequence-structure relationships. Furthermore, we explore the role of specialized datasets, benchmarking competitions, and multimodal integration in shaping state-of-the-art prediction models. By addressing challenges in data quality, scalability, interpretability, and task-specific optimization, this review underscores the transformative impact of ML, DL, and PLMs on 1D protein prediction while providing insights into emerging trends and future directions in this rapidly evolving field.
Affiliation(s)
- Wafa Alanazi
  - School of Computer Science, University College Dublin, Belfield, Dublin D04 C1P1, Ireland
  - Department of Computer Science, College of Science, Northern Border University, Arar, Saudi Arabia
- Di Meng
  - School of Computer Science, University College Dublin, Belfield, Dublin D04 C1P1, Ireland
- Gianluca Pollastri
  - School of Computer Science, University College Dublin, Belfield, Dublin D04 C1P1, Ireland
5
Antonietti M, Kim CK, Granack S, Hadzijahic N, Taylor Gonzalez DJ, Herskowitz WR, Uversky VN, Djulbegovic MB. An Analysis of Intrinsic Protein Disorder in Antimicrobial Peptides. Protein J 2025; 44:175-191. [PMID: 39979561 PMCID: PMC11937183 DOI: 10.1007/s10930-025-10253-0] [Accepted: 01/31/2025] [Indexed: 02/22/2025]
Abstract
Antibiotic resistance, driven by the rise of pathogens like VRE and MRSA, poses a global health threat, prompting the exploration of antimicrobial peptides (AMPs) as alternatives to traditional antibiotics. AMPs, known for their broad-spectrum activity and structural flexibility, share characteristics with intrinsically disordered proteins, which lack a rigid structure and play diverse roles in cellular processes. This study aims to quantify the intrinsic disorder and liquid-liquid phase separation (LLPS) propensity in AMPs, advancing our understanding of their antimicrobial mechanisms and potential therapeutic applications. To investigate the propensity for intrinsic disorder and LLPS in AMPs, we compared the AMPs to the human proteome. The AMP sequences were retrieved from the AMP database (APD3), while the human proteome was obtained from the UniProt database. We analyzed amino acid composition using the Composition Profiler tool and assessed intrinsic disorder using various predictors, including PONDR® and IUPred, through the Rapid Intrinsic Disorder Analysis Online (RIDAO) platform. For LLPS propensity, we employed FuzDrop, and FuzPred was used to predict context-dependent binding behaviors. Statistical analyses, such as ANOVA and χ² tests, were performed to determine the significance of observed differences between the two groups. We analyzed over 3000 AMPs and 20,000 human proteins to investigate differences in amino acid composition, intrinsic disorder, and LLPS potential. Composition analysis revealed distinct differences in amino acid abundance, with AMPs showing an enrichment in both order-promoting and disorder-promoting amino acids compared to the human proteome. Intrinsic disorder analysis, performed using a range of predictors, consistently demonstrated that AMPs exhibit higher levels of predicted disorder than human proteins, with significant differences confirmed by statistical tests. 
LLPS analysis, conducted using FuzDrop, showed that AMPs had a lower overall propensity for LLPS compared to human proteins, although specific subsets of AMPs exhibited high LLPS potential. Additionally, redox-dependent disorder predictions highlighted significant differences in how AMPs and human proteins respond to oxidative conditions, further suggesting functional divergences between the two proteomes. CH-CDF plot analysis revealed that AMPs and human proteins occupy distinct structural categories, with AMPs showing a greater proportion of highly disordered proteins compared to the human proteome. These findings underscore key molecular differences between AMPs and human proteins, with implications for their antimicrobial activity and potential therapeutic applications. Our study reveals that AMPs possess a significantly higher degree of intrinsic disorder and that specific subsets exhibit LLPS potential, distinguishing them from the human proteome. These molecular characteristics likely contribute to their antimicrobial function and adaptability, offering valuable insights for developing novel therapeutic strategies to combat antibiotic resistance.
Affiliation(s)
- Colin K Kim
  - Bascom Palmer Eye Institute, University of Miami, Miami, FL, USA
- Sydney Granack
  - Department of Molecular Medicine and USF Health Byrd Alzheimer's Research Institute, Morsani College of Medicine, University of South Florida, Tampa, FL, USA
- David J Taylor Gonzalez
  - Hamilton Eye Institute, University of Tennessee Health and Science Center, Memphis, United States
- Vladimir N Uversky
  - Department of Molecular Medicine and USF Health Byrd Alzheimer's Research Institute, Morsani College of Medicine, University of South Florida, Tampa, FL, USA
- Mak B Djulbegovic
  - Wills Eye Hospital, Thomas Jefferson University, Philadelphia, PA, USA
6
Malhotra Y, John J, Yadav D, Sharma D, Vanshika, Rawal K, Mishra V, Chaturvedi N. Advancements in protein structure prediction: A comparative overview of AlphaFold and its derivatives. Comput Biol Med 2025; 188:109842. [PMID: 39970826 DOI: 10.1016/j.compbiomed.2025.109842] [Received: 11/23/2024] [Revised: 02/07/2025] [Accepted: 02/10/2025] [Indexed: 02/21/2025]
Abstract
This review provides a comprehensive analysis of AlphaFold (AF) and its derivatives (AF2 and AF3) in protein structure prediction. These tools have revolutionized structural biology with their highly accurate predictions, driving progress in protein modeling, drug discovery, and the study of protein dynamics. Their exceptional accuracy has redefined our understanding of protein folding, enabling groundbreaking advances in protein design and disease research. The review also discusses future integration with experimental techniques. In addition, their achievements, features, architectures, important case studies, and noteworthy effects in biology and medicine are evaluated. Although AF2 is a relatively recent innovation, many studies have already highlighted its diverse applications. The review also reports the limitations of AF2 that led to the introduction of AF3. AF3 is a substantial improvement, as it provides precise predictions of the structures and interactions of proteins, DNA, RNA, and ligands, thereby aiding understanding at the molecular level. By addressing current challenges and forecasting future developments, this work underscores the lasting significance of AF in reshaping the scientific landscape of protein research.
Affiliation(s)
- Yuktika Malhotra
  - Amity Institute of Biotechnology, Amity University, Noida, Uttar Pradesh, 201303, India
- Jerry John
  - Amity Institute of Biotechnology, Amity University, Noida, Uttar Pradesh, 201303, India
- Deepika Yadav
  - Amity Institute of Biotechnology, Amity University, Noida, Uttar Pradesh, 201303, India
- Deepshikha Sharma
  - Amity Institute of Biotechnology, Amity University, Noida, Uttar Pradesh, 201303, India
- Vanshika
  - Amity Institute of Biotechnology, Amity University, Noida, Uttar Pradesh, 201303, India
- Kamal Rawal
  - Amity Institute of Biotechnology, Amity University, Noida, Uttar Pradesh, 201303, India
- Vaibhav Mishra
  - Amity Institute of Microbial Technology, Amity University, Uttar Pradesh, 201303, India
- Navaneet Chaturvedi
  - Amity Institute of Biotechnology, Amity University, Noida, Uttar Pradesh, 201303, India
7
Zheng S. Navigating the unstructured by evaluating alphafold's efficacy in predicting missing residues and structural disorder in proteins. PLoS One 2025; 20:e0313812. [PMID: 40131945 PMCID: PMC11936262 DOI: 10.1371/journal.pone.0313812] [Received: 11/01/2024] [Accepted: 02/18/2025] [Indexed: 03/27/2025]
Abstract
The study investigated regions with undefined structures, known as "missing" segments in X-ray crystallography and cryo-electron microscopy (cryo-EM) data, by assessing their predicted structural confidence and disorder scores. Using a comprehensive dataset from the Protein Data Bank (PDB), residues were categorized as "modeled", "hard missing" and "soft missing" based on their visibility in structural datasets. Key features were determined, including the predicted local distance difference test (pLDDT) confidence score from AlphaFold2, an advanced structure prediction tool, and a disorder score from IUPred, a traditional disorder prediction method. To enhance prediction performance for unstructured residues, we employed a Long Short-Term Memory (LSTM) model, integrating both scores with amino acid sequences. Notable patterns in composition, region lengths and prediction scores were observed in unstructured residues and regions identified through structural experiments over the studied period. Our findings also indicate that "hard missing" residues often align with low confidence scores, whereas "soft missing" residues exhibit dynamic behavior that can complicate predictions. Incorporating pLDDT, IUPred scores, and sequence data into the LSTM model improved the differentiation between structured and unstructured residues, particularly for shorter unstructured regions. This research elucidates the relationship between established computational predictions and experimental structural data, enhancing our ability to target structurally significant areas for research and guiding experimental designs toward functionally relevant regions.
Affiliation(s)
- Sen Zheng
  - Bio-Electron Microscopy Facility, iHuman Institution, ShanghaiTech University, Shanghai, China
8
Erdős G, Deutsch N, Dosztányi Z. AIUPred - Binding: Energy Embedding to Identify Disordered Binding Regions. J Mol Biol 2025:169071. [PMID: 40133781 DOI: 10.1016/j.jmb.2025.169071] [Received: 11/29/2024] [Revised: 02/25/2025] [Accepted: 03/03/2025] [Indexed: 03/27/2025]
Abstract
Intrinsically disordered regions (IDRs) play critical roles in various cellular processes, often mediating interactions through disordered binding regions that transition to ordered states. Experimental characterization of these functional regions is highly challenging, underscoring the need for fast and accurate computational tools. Despite their importance, predicting disordered binding regions remains a significant challenge due to limitations in existing datasets and methodologies. In this study, we introduce AIUPred-binding, a novel prediction tool leveraging a high-dimensional mathematical representation of structural energies (which we call energy embedding) and pathogenicity scores from AlphaMissense. By employing a transfer learning approach, AIUPred-binding demonstrates improved accuracy in identifying functional sites within IDRs. Our results highlight the tool's ability to discern subtle features within disordered regions, addressing biases and other challenges associated with manually curated datasets. We present AIUPred-binding, integrated into the AIUPred web framework, as a versatile and efficient resource for understanding the functional roles of IDRs. AIUPred-binding is freely accessible at https://aiupred.elte.hu.
Affiliation(s)
- Gábor Erdős
  - Department of Biochemistry, Eötvös Loránd University, Pázmány Péter stny 1/c, Budapest H-1117, Hungary
- Norbert Deutsch
  - Department of Biochemistry, Eötvös Loránd University, Pázmány Péter stny 1/c, Budapest H-1117, Hungary
- Zsuzsanna Dosztányi
  - Department of Biochemistry, Eötvös Loránd University, Pázmány Péter stny 1/c, Budapest H-1117, Hungary
9
Chakraborty A, Hussain A, Sabnam N. Uncovering the structural stability of Magnaporthe oryzae effectors: a secretome-wide in silico analysis. J Biomol Struct Dyn 2025; 43:1701-1722. [PMID: 38109060 DOI: 10.1080/07391102.2023.2292795] [Received: 05/07/2023] [Accepted: 11/23/2023] [Indexed: 12/19/2023]
Abstract
Rice blast, caused by the ascomycete fungus Magnaporthe oryzae, is a deadly disease and a major threat to global food security. The pathogen secretes small proteinaceous effectors (virulence factors) into the host to manipulate and perturb the host immune system, allowing the pathogen to colonize and establish a successful infection. While the molecular functions of several effectors have been characterized, very little is known about their structural stability. We analyzed a total of 554 small secretory proteins (SSPs) from the M. oryzae secretome to decipher key features of intrinsic disorder (ID) and the structural dynamics of selected putative effectors through thorough and systematic in silico studies. Our results suggest that 66% of the SSPs were predicted to be effector proteins, released either into the apoplast or the cytoplasm of the host cell. Of these, 68% were found to be intrinsically disordered effector proteins (IDEPs). Among the six distinct classes of disordered effectors, we observed peculiar relationships between the localization of several effectors in the apoplast or cytoplasm and the degree of disorder. We determined the degree of structural disorder and its impact on protein foldability across all the putative small secretory effector proteins from the blast pathogen, and further validated these results with molecular dynamics simulations. This study provides definite clues toward understanding the importance of structural distortions in effectors and their impact on plant-pathogen interactions. The study of these dynamic segments may also help identify new effectors. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Afzal Hussain
  - Department of Bioinformatics, Maulana Azad National Institute of Technology, Bhopal, India
- Nazmiara Sabnam
  - Department of Life Sciences, Presidency University, Kolkata, India
10
Meng D, Pollastri G. PUNCH: An Interactive Web Server for Predicting Intrinsically Disordered Regions in Protein Sequences. J Mol Biol 2025:169018. [PMID: 40133791 DOI: 10.1016/j.jmb.2025.169018] [Received: 11/22/2024] [Revised: 01/27/2025] [Accepted: 02/16/2025] [Indexed: 03/27/2025]
Abstract
PUNCH is a freely accessible web server designed for the rapid and accurate prediction of intrinsically disordered regions (IDRs) in protein sequences. Built on a high-performance computational framework and the PUNCH2-Light predictor, the PUNCH web server combines speed with predictive accuracy, offering users a streamlined interface for generating predictions from sequence input. Validated against the CAID2 benchmarking datasets, the PUNCH web server demonstrates competitive performance in detecting IDRs across diverse protein sequences. Notably, it excels on the Disorder_PDB dataset and provides reliable results for the Disorder_NOX dataset, addressing the challenges of predicting disordered regions with low sequence similarity. The server is available at https://alienlabs.ucd.ie/punch2/, with extensive documentation and downloadable example datasets to support researchers in structural biology and bioinformatics.
Affiliation(s)
- Di Meng
  - School of Computer Science, University College Dublin, Ireland
11
Vincoff S, Goel S, Kholina K, Pulugurta R, Vure P, Chatterjee P. FusOn-pLM: a fusion oncoprotein-specific language model via adjusted rate masking. Nat Commun 2025; 16:1436. [PMID: 39920196 PMCID: PMC11806025 DOI: 10.1038/s41467-025-56745-6] [Received: 06/03/2024] [Accepted: 01/24/2025] [Indexed: 02/09/2025]
Abstract
Fusion oncoproteins, a class of chimeric proteins arising from chromosomal translocations, are major drivers of various pediatric cancers. These proteins are intrinsically disordered and lack druggable pockets, making them highly challenging therapeutic targets for both small molecule-based and structure-based approaches. Protein language models (pLMs) have recently emerged as powerful tools for capturing physicochemical and functional protein features but have yet to be trained on fusion oncoprotein sequences. We introduce FusOn-pLM, a fine-tuned pLM trained on a newly curated, comprehensive set of fusion oncoprotein sequences, FusOn-DB. Employing a unique cosine-scheduled masked language modeling strategy, FusOn-pLM dynamically adjusts masking rates (15%-40%) to optimize feature extraction and representation quality, surpassing baseline embeddings in fusion-specific tasks, including localization, puncta formation, and disorder prediction. FusOn-pLM uniquely predicts drug-resistant mutations, providing insights for therapeutic design that anticipates resistance mechanisms. In total, FusOn-pLM provides biologically relevant representations for advancing therapeutic discovery in fusion-driven cancers.
Affiliation(s)
- Sophia Vincoff
  - Department of Biomedical Engineering, Duke University, Durham, NC, USA
- Shrey Goel
  - Department of Computer Science, Duke University, Durham, NC, USA
- Kseniia Kholina
  - Department of Biomedical Engineering, Duke University, Durham, NC, USA
- Rishab Pulugurta
  - Department of Biomedical Engineering, Duke University, Durham, NC, USA
- Pranay Vure
  - Department of Biomedical Engineering, Duke University, Durham, NC, USA
- Pranam Chatterjee
  - Department of Biomedical Engineering, Duke University, Durham, NC, USA
  - Department of Computer Science, Duke University, Durham, NC, USA
  - Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, USA
12
Chaurasiya D, Mondal R, Lahiri T, Tripathi A, Ghinmine T. IDPpred: a new sequence-based predictor for identification of intrinsically disordered protein with enhanced accuracy. J Biomol Struct Dyn 2025; 43:957-965. [PMID: 38079339 DOI: 10.1080/07391102.2023.2290615] [Received: 07/27/2023] [Accepted: 11/15/2023] [Indexed: 01/01/2025]
Abstract
The discovery of intrinsically disordered proteins (IDPs) and of hybrid proteins that contain both intrinsically disordered protein regions (IDPRs) and ordered regions has changed the sequence-structure-function paradigm of proteins. These proteins, which lack a persistently fixed structure, are found in all organisms and play vital roles in various biological processes. Some of them are considered potential drug targets due to their overrepresentation in pathophysiological processes. The major bottlenecks in characterizing such proteins are their occasional overexpression, the difficulty of obtaining them in purified homogeneous form, and the challenge of investigating them experimentally. Sequence-based prediction of intrinsic disorder remains a useful strategy, especially for many large-scale proteomic investigations. However, prediction accuracy is still worst for short disordered regions of fewer than ten residues, for residues close to order-disorder boundaries, for regions that undergo coupled folding and binding in the presence of a partner, and for fully disordered proteins. Annotation of fully disordered proteins mostly relies on far-UV circular dichroism experiments, which give the overall secondary structure composition without residue-level resolution. Current methods, including those using secondary structure information, failed to predict half of the target IDPs correctly in the recent Critical Assessment of protein Intrinsic Disorder prediction (CAID) experiment. This study used profiles of the random sequential appearance of physicochemical properties of amino acids and of order- and disorder-promoting amino acids in proteins, together with the existing CIDER feature, to predict IDPs from sequence input. Our method was found to significantly outperform the existing predictors across different datasets. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Deepak Chaurasiya
- Department of Applied Sciences, Indian Institute of Information Technology, Prayagraj, UP, India
- Rajkrishna Mondal
- Department of Biotechnology, Nagaland University, Dimapur, Nagaland, India
- Tapobrata Lahiri
- Department of Applied Sciences, Indian Institute of Information Technology, Prayagraj, UP, India
- Asmita Tripathi
- Department of Applied Sciences, Indian Institute of Information Technology, Prayagraj, UP, India
- Tejas Ghinmine
- Department of Applied Sciences, Indian Institute of Information Technology, Prayagraj, UP, India
13
Djulbegovic MB, Antonietti M, Taylor Gonzalez DJ, Mattes R, Kim C, Uversky VN, Martinez JD, Karp CL. Comparative Analysis of the Intrinsic Disorder Within the Layers of the Human Cornea. Cornea 2025; 44:234-249. [PMID: 39383473 DOI: 10.1097/ico.0000000000003706] [Received: 05/12/2024] [Accepted: 08/22/2024] [Indexed: 10/11/2024]
Abstract
PURPOSE The human cornea is essential for vision, providing structural integrity and refractive power to the eye. Recent advances have deepened our understanding of corneal molecular composition, yet the role of intrinsically disordered proteins within the cornea remains unexplored. METHODS We analyzed 3,250 corneal proteins identified by Dyrlund et al, focusing on the epithelium, stroma, and endothelium layers. We performed a bioinformatics analysis to characterize the amino acid composition, the propensity for intrinsic protein disorder, and the distribution of protein types in the proteomes of the 3 corneal layers. RESULTS Our study demonstrates that each corneal layer exhibited unique patterns of amino acid composition related to protein disorder. Order-promoting amino acids were generally depleted except for leucine, whereas disorder-promoting amino acids such as arginine and glutamic acid were enriched across all layers. Significant variations were observed in the levels of intrinsic disorder among the different corneal layers, with substantial proportions of highly disordered proteins present in each. Analysis of protein class types in each layer revealed no significant differences in the distribution of protein classifications across the layers, suggesting a consistent population of protein types across all corneal layers. CONCLUSIONS Our findings reveal a sophisticated landscape of protein structures in which intrinsic disorder varies across layers, suggesting an adaptation of the corneal proteome to the unique physiological demands of each layer. These structural variations may reflect the intricate requirements for corneal transparency, biomechanical stability, and environmental responsiveness.
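The kind of compositional comparison described in this abstract can be illustrated with a minimal Python sketch. It assumes one commonly cited grouping of order-promoting (W, F, Y, I, M, L, V, N, C) and disorder-promoting (A, R, G, Q, S, P, E, K) residues, and it uses the fractional difference (C_query - C_reference) / C_reference found in Composition Profiler-style tools. The function names are hypothetical, the sequences are toy inputs, and this is not the authors' pipeline.

```python
from collections import Counter

# Assumption: one commonly cited order/disorder-promoting classification.
ORDER_PROMOTING = set("WFYIMLVNC")
DISORDER_PROMOTING = set("ARGQSPEK")
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequences):
    """Fraction of each standard amino acid, pooled over all sequences."""
    counts = Counter()
    for seq in sequences:
        counts.update(aa for aa in seq.upper() if aa in AMINO_ACIDS)
    total = sum(counts.values())
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def fractional_difference(query_seqs, reference_seqs):
    """(C_query - C_reference) / C_reference per residue; > 0 means enriched."""
    query = composition(query_seqs)
    reference = composition(reference_seqs)
    return {aa: (query[aa] - reference[aa]) / reference[aa]
            for aa in AMINO_ACIDS if reference[aa] > 0}


def promoter_fractions(seq):
    """Return (order-promoting fraction, disorder-promoting fraction) of one sequence."""
    residues = [aa for aa in seq.upper() if aa in AMINO_ACIDS]
    order = sum(aa in ORDER_PROMOTING for aa in residues) / len(residues)
    disorder = sum(aa in DISORDER_PROMOTING for aa in residues) / len(residues)
    return order, disorder


if __name__ == "__main__":
    # Toy sequences for illustration only; not corneal data.
    diff = fractional_difference(["EEKKRRSS"], ["ACDEFGHIKLMNPQRSTVWY"])
    print(round(diff["E"], 2), diff["W"])  # E is enriched, W is absent from the query
    print(promoter_fractions("EEKKRRSS"))
```

A real analysis would use a curated background set (for example, a set of ordered proteins) as the reference rather than a single toy sequence.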
Affiliation(s)
- Robby Mattes
- Bascom Palmer Eye Institute, University of Miami, Miami, FL
- Colin Kim
- Bascom Palmer Eye Institute, University of Miami, Miami, FL
- Vladimir N Uversky
- Department of Chemistry, University of South Florida, Tampa, FL
- Department of Molecular Medicine and USF Health Byrd Alzheimer's Research Institute, Morsani College of Medicine, University of South Florida, Tampa, FL
- Carol L Karp
- Bascom Palmer Eye Institute, University of Miami, Miami, FL
14
Szczepski K, Jaremko Ł. AlphaFold and what is next: bridging functional, systems and structural biology. Expert Rev Proteomics 2025; 22:45-58. [PMID: 39824781 DOI: 10.1080/14789450.2025.2456046] [Received: 11/22/2024] [Revised: 01/13/2025] [Accepted: 01/16/2025] [Indexed: 01/20/2025]
Abstract
INTRODUCTION DeepMind's AlphaFold (AF) has revolutionized biomedical and bioscience research by providing both experts and non-experts with an invaluable tool for predicting protein structures. However, while AF is highly effective at predicting the structures of rigid and globular proteins, it cannot fully capture the dynamics, conformational variability, and interactions of proteins with ligands and other biomacromolecules. AREAS COVERED In this review, we present a comprehensive overview of the latest advances in 3D model prediction for biomacromolecules using AF. We also provide a detailed analysis of its strengths and limitations, and explore more recent iterations, modifications, and practical applications of this strategy. Moreover, we map the path forward for expanding AF toward predicting the structures of every protein and peptide, and their interactions in the proteome, in the most physiologically relevant form. This discussion is based on an extensive literature search performed using PubMed and Google Scholar. EXPERT OPINION While significant progress has been made in enhancing AF's modeling capabilities, we argue that a combined approach integrating various in silico and in vitro methods will be most beneficial for the future of structural biology, bridging the gaps between static and dynamic features of proteins and their functions.
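One practical consequence of the limitations described above is that AF models of partially disordered proteins are best read together with their per-residue confidence. AlphaFold writes its pLDDT score into the B-factor column of its PDB-format output, and very low values (pLDDT < 50) are commonly treated as a hint of disorder rather than as a reliable structure. The minimal Python sketch below is not taken from the review; its function names (including the toy `ca_line` record builder) are hypothetical. It reads pLDDT from Cα records and reports low-confidence stretches.

```python
def plddt_per_residue(pdb_text):
    """Map residue number -> pLDDT, read from the B-factor field of CA atoms."""
    scores = {}
    for line in pdb_text.splitlines():
        # Fixed-column PDB format (1-based): atom name 13-16,
        # residue number 23-26, B-factor 61-66.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores[int(line[22:26])] = float(line[60:66])
    return scores


def low_confidence_segments(scores, threshold=50.0):
    """Return (start, end) runs of consecutive residues with pLDDT below threshold."""
    segments = []
    start = prev = None
    for resnum in sorted(scores):
        if scores[resnum] < threshold:
            if start is not None and resnum == prev + 1:
                prev = resnum  # extend the current run
            else:
                if start is not None:
                    segments.append((start, prev))
                start = prev = resnum  # begin a new run
        elif start is not None:
            segments.append((start, prev))
            start = prev = None
    if start is not None:
        segments.append((start, prev))
    return segments


def ca_line(serial, resnum, plddt):
    """Hypothetical helper that builds a toy CA record for demonstration."""
    return (f"ATOM  {serial:5d}  CA  ALA A{resnum:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{plddt:6.2f}")


if __name__ == "__main__":
    toy = "\n".join(ca_line(i, i, b) for i, b in enumerate([90, 40, 30, 85, 20, 95], start=1))
    print(low_confidence_segments(plddt_per_residue(toy)))  # [(2, 3), (5, 5)]
```

The threshold is a parameter because the cutoff used to flag putative disorder varies between studies.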
Affiliation(s)
- Kacper Szczepski
- Biological and Environmental Science & Engineering (BESE), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- Łukasz Jaremko
- Biological and Environmental Science & Engineering (BESE), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
15
Kotowski K, Roterman I, Stapor K. DisorderUnetLM: Validating ProteinUnet for efficient protein intrinsic disorder prediction. Comput Biol Med 2025; 185:109586. [PMID: 39708500 DOI: 10.1016/j.compbiomed.2024.109586] [Received: 08/22/2024] [Revised: 12/03/2024] [Accepted: 12/14/2024] [Indexed: 12/23/2024]
Abstract
The prediction of intrinsic disorder regions has significant implications for understanding protein functions and dynamics. It can help to discover novel protein-protein interactions essential for designing new drugs and enzymes. Recently, a new generation of predictors based on protein language models (pLMs) is emerging. These algorithms reach state-of-the-art accuracy without calculating time-consuming multiple sequence alignments (MSAs). This article introduces the new DisorderUnetLM disorder predictor, which builds upon the idea of ProteinUnet. It uses the Attention U-Net convolutional network and incorporates features from the ProtTrans pLM. DisorderUnetLM achieves top results in the direct comparison with recent predictors exploiting MSAs and pLMs. Moreover, among 43 predictors on the latest CAID-2 benchmark, it ranks 1st for the NOX subset in terms of the ROC-AUC metric (0.844) and 2nd for the AP metric (0.596). For the CAID-2 PDB subset, it ranks in the top 10 (ROC-AUC of 0.924 and AP of 0.862). The code and model are publicly available and fully reproducible at doi.org/10.24433/CO.7350682.v1.
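The two metrics quoted for the CAID-2 ranking, ROC-AUC and average precision (AP), can be computed from per-residue binary disorder labels and predicted scores. The minimal pure-Python sketch below assumes labels coded 1 (disordered) and 0 (ordered), with at least one residue of each class. It computes ROC-AUC through the Mann-Whitney rank statistic (average ranks for ties) and AP as the step-wise sum of precision times recall increments over descending score thresholds, the definition used by scikit-learn's `average_precision_score`. It is not the CAID assessment code.

```python
from itertools import groupby


def roc_auc(labels, scores):
    """ROC-AUC via the Mann-Whitney U statistic, using average ranks for ties."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def average_precision(labels, scores):
    """AP = sum over descending thresholds of (R_n - R_{n-1}) * P_n."""
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    n_pos = sum(labels)
    tp = fp = 0
    prev_recall = ap = 0.0
    for _, group in groupby(pairs, key=lambda p: p[0]):
        for _, y in group:  # all residues sharing this score enter together
            tp += y
            fp += 1 - y
        recall = tp / n_pos
        ap += (recall - prev_recall) * tp / (tp + fp)
        prev_recall = recall
    return ap


if __name__ == "__main__":
    y = [1, 0, 1, 0]
    s = [0.9, 0.8, 0.7, 0.1]
    print(roc_auc(y, s), round(average_precision(y, s), 3))
```

AP is usually more sensitive than ROC-AUC to the minority class, which is one reason benchmarks such as CAID report both.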
Affiliation(s)
- Krzysztof Kotowski
- Department of Applied Informatics, Silesian University of Technology, Akademicka 16, 44-100, Gliwice, Poland
- Irena Roterman
- Department of Bioinformatics and Telemedicine, Jagiellonian University Medical College, Medyczna 7, 30-688, Kraków, Poland
- Katarzyna Stapor
- Department of Applied Informatics, Silesian University of Technology, Akademicka 16, 44-100, Gliwice, Poland
16
Pir MS, Timucin E. AFFIPred: AlphaFold2 structure-based Functional Impact Prediction of missense variations. Protein Sci 2025; 34:e70030. [PMID: 39840793 PMCID: PMC11751861 DOI: 10.1002/pro.70030] [Received: 08/05/2024] [Revised: 12/23/2024] [Accepted: 12/24/2024] [Indexed: 01/23/2025]
Abstract
Protein structure holds immense potential for pathogenicity prediction, although structure-based predictors remain limited compared with their sequence-based counterparts because of the "structure knowledge gap" between the large number of available protein sequences and the relatively limited number of structures. Leveraging the highly accurate protein structures predicted by AlphaFold2 (AF2), we introduce AFFIPred, an ensemble machine learning classifier that combines sequence and AF2-based structural characteristics to predict missense variant pathogenicity. In assessments on unseen datasets, AFFIPred reached a level of performance comparable to that of state-of-the-art predictors such as AlphaMissense. We also showed that using AF2 structures, which are full-length and represent the unbound state, yields more precise solvent-accessible surface area (SASA) calculations than using experimental structures. Because AF2 structures are complete, their use provides a more comprehensive view of the structural characteristics of the missense variation datasets by capturing all variants. AFFIPred maintains high accuracy without the limitations of PDB-based classifiers. AFFIPred has predicted over 210 million variations of the human proteome, which are accessible at https://affipred.timucinlab.com/.
Affiliation(s)
- Mustafa S Pir
- Department of Biostatistics and Bioinformatics, Institute of Health Sciences, Acibadem University, Atasehir, Istanbul, Turkey
- Emel Timucin
- Department of Biostatistics and Bioinformatics, Institute of Health Sciences, Acibadem University, Atasehir, Istanbul, Turkey
- Department of Biostatistics and Medical Informatics, School of Medicine, Acibadem University, Atasehir, Istanbul, Turkey
17
Yang W, Du Q, Zhou X, Wu C, Bao J. PDFll: Predictors of Disorder and Function of Proteins from the Language of Life. J Comput Biol 2025; 32:143-155. [PMID: 39246251 DOI: 10.1089/cmb.2024.0506] [Indexed: 09/10/2024]
Abstract
The identification of intrinsically disordered proteins and their functional roles is largely dependent on the performance of computational predictors, necessitating a high standard of accuracy in these tools. In this context, we introduce a novel series of computational predictors, termed PDFll (Predictors of Disorder and Function of proteins from the Language of Life), which are designed to offer precise predictions of protein disorder and associated functional roles based on protein sequences. PDFll is developed through a two-step process. Initially, it leverages large-scale protein language models (pLMs), trained on an extensive dataset comprising billions of protein sequences. Subsequently, the embeddings derived from pLMs are integrated into streamlined, yet sophisticated, deep-learning models to generate predictions. These predictions notably surpass the performance of existing state-of-the-art predictors, particularly those that forecast disorder and function without utilizing evolutionary information.
Affiliation(s)
- Wanyi Yang
- College of Life Sciences, Sichuan University, Chengdu, China
- Qingsong Du
- College of Life Sciences, Sichuan University, Chengdu, China
- Xunyu Zhou
- College of Life Sciences, Sichuan University, Chengdu, China
- Chuanfang Wu
- College of Life Sciences, Sichuan University, Chengdu, China
- Jinku Bao
- College of Life Sciences, Sichuan University, Chengdu, China
18
Helble M, Zhu X, Bhojnagarwala PS, Liaw K, Gao Y, Kim A, Bayruns K, McCanna ME, Park J, Konrath KM, Garfinkle S, Brysgel T, Weiner DB, Kulp DW. Structural engineering of stabilized, expanded epitope nanoparticle vaccines for HPV. Front Immunol 2025; 16:1535261. [PMID: 39958352 PMCID: PMC11826081 DOI: 10.3389/fimmu.2025.1535261] [Received: 11/27/2024] [Accepted: 01/07/2025] [Indexed: 02/18/2025]
Abstract
Oncogenic forms of HPV account for 4.5% of the global cancer burden. This includes cervical, vaginal, vulvar, penile, and anal cancers, as well as head and neck cancers. As such, there is an urgent need to develop effective therapeutic vaccines to drive the immune system's cellular response against cancer cells. One of the primary goals of cancer vaccination is to increase the potency and diversity of anti-tumor T-cell responses; one strategy to do so involves the delivery of full-length cancer antigens scaffolded onto DNA-launched nanoparticles to improve T-cell priming. We developed a platform, making use of structural prediction algorithms such as AlphaFold2, to design stabilized, more full-length antigens of relevant HPV proteins and then display them on nanoparticles. We demonstrated that many such designs for both the HPV16 E6 and E7 antigens assembled and drove strong CD8+ T-cell responses in mice. We further tested nanoparticles in a genetically diverse, more translationally relevant CD-1 mouse model and demonstrated that both E6 and E7 nanoparticle designs drove a CD8+-biased T-cell response. These findings serve as a proof of concept for nanoparticle antigen design and identify new vaccine candidates for HPV-associated cancers.
Affiliation(s)
- Michaela Helble
- The Vaccine and Immunotherapy Center, The Wistar Institute, Philadelphia, PA, United States
- Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Xizhou Zhu
- The Vaccine and Immunotherapy Center, The Wistar Institute, Philadelphia, PA, United States
- Kevin Liaw
- The Vaccine and Immunotherapy Center, The Wistar Institute, Philadelphia, PA, United States
- Yangcheng Gao
- The Vaccine and Immunotherapy Center, The Wistar Institute, Philadelphia, PA, United States
- Amber Kim
- The Vaccine and Immunotherapy Center, The Wistar Institute, Philadelphia, PA, United States
- Kelly Bayruns
- The Vaccine and Immunotherapy Center, The Wistar Institute, Philadelphia, PA, United States
- Madison E. McCanna
- The Vaccine and Immunotherapy Center, The Wistar Institute, Philadelphia, PA, United States
- Joyce Park
- The Vaccine and Immunotherapy Center, The Wistar Institute, Philadelphia, PA, United States
- Kylie M. Konrath
- The Vaccine and Immunotherapy Center, The Wistar Institute, Philadelphia, PA, United States
- Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Sam Garfinkle
- The Vaccine and Immunotherapy Center, The Wistar Institute, Philadelphia, PA, United States
- Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Taylor Brysgel
- Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- David B. Weiner
- The Vaccine and Immunotherapy Center, The Wistar Institute, Philadelphia, PA, United States
- Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Daniel W. Kulp
- The Vaccine and Immunotherapy Center, The Wistar Institute, Philadelphia, PA, United States
- Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
19
Long W, Zhao L, Yang H, Yang X, Bai Y, Xue X, Wang D, Han S. Genome-Wide Characterization of Wholly Disordered Proteins in Arabidopsis. Int J Mol Sci 2025; 26:1117. [PMID: 39940886 PMCID: PMC11817481 DOI: 10.3390/ijms26031117] [Received: 12/17/2024] [Revised: 01/25/2025] [Accepted: 01/26/2025] [Indexed: 02/16/2025]
Abstract
Intrinsically disordered proteins (IDPs) include two types: proteins with partially disordered regions (IDRs) and wholly disordered proteins (WDPs). Extensive studies have focused on proteins with IDRs, but less is known about WDPs because they do not form a folded tertiary structure. In this study, we developed a bioinformatics method for genome-level screening of proteins of more than 50 amino acids and identified a total of 56 WDPs, in 27 categories, in Arabidopsis. Compared with 56 randomly selected structured proteins, WDPs had a wider range of theoretical isoelectric point (pI), a more negative Grand Average of Hydropathicity (GRAVY), a higher Instability Index (II), and a lower Aliphatic Index (AI). In addition, by calculating the FCR (fraction of charged residues) and NCPR (net charge per residue) of each WDP, we found 20 WDPs in the R1 group (FCR < 0.25 and NCPR < 0.25), 15 in R2 (0.25 ≤ FCR ≤ 0.35 and NCPR ≤ 0.35), 19 in R3 (FCR > 0.35 and NCPR ≤ 0.35), and two in R4 (FCR > 0.35 and NCPR > 0.35). Moreover, gene expression and protein-protein interaction (PPI) network analyses showed that WDPs perform different biological functions. We also showed that two WDPs, SIS (Salt Induced Serine rich) and RAB18 (a dehydrin family protein), undergo liquid-liquid phase separation (LLPS) in vitro. Therefore, our results provide insight into the biochemical characteristics and biological functions of WDPs in plants.
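The FCR/NCPR grouping above can be reproduced for any sequence with a short Python sketch. It assumes that K and R count as positive and D and E as negative (histidine and chain termini ignored), and it applies the quoted thresholds to the absolute value of NCPR; the abstract writes NCPR without an absolute value, so this is an assumption. The function names are hypothetical and the sequences are toy examples.

```python
POSITIVE = set("KR")  # assumption: only Lys/Arg counted as positive
NEGATIVE = set("DE")  # assumption: only Asp/Glu counted as negative


def fcr_ncpr(seq):
    """Return (FCR, NCPR): fraction of charged residues and net charge per residue."""
    seq = seq.upper()
    f_plus = sum(aa in POSITIVE for aa in seq) / len(seq)
    f_minus = sum(aa in NEGATIVE for aa in seq) / len(seq)
    return f_plus + f_minus, f_plus - f_minus


def charge_region(fcr, ncpr):
    """Assign R1-R4 with the thresholds quoted in the abstract, applied to |NCPR|."""
    net = abs(ncpr)
    if fcr < 0.25 and net < 0.25:
        return "R1"
    if 0.25 <= fcr <= 0.35 and net <= 0.35:
        return "R2"
    if fcr > 0.35 and net <= 0.35:
        return "R3"
    if fcr > 0.35 and net > 0.35:
        return "R4"
    return "unassigned"  # not reachable in practice, since |NCPR| <= FCR


if __name__ == "__main__":
    for s in ["GGGGGGGGGK", "KKKEGGGGGGGG", "KKKKEEEEGG", "KKKKKKKKGG"]:
        print(s, charge_region(*fcr_ncpr(s)))
```

Because |NCPR| can never exceed FCR, every sequence falls into exactly one of the four quoted groups.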
Affiliation(s)
- Wenfen Long
- Beijing Key Laboratory of Gene Resources and Molecular Development, College of Life Sciences, Beijing Normal University, Beijing 100875, China
- Liang Zhao
- Beijing Key Laboratory of Gene Resources and Molecular Development, College of Life Sciences, Beijing Normal University, Beijing 100875, China
- Huimin Yang
- Beijing Key Laboratory of Gene Resources and Molecular Development, College of Life Sciences, Beijing Normal University, Beijing 100875, China
- Xinyi Yang
- Beijing Key Laboratory of Gene Resources and Molecular Development, College of Life Sciences, Beijing Normal University, Beijing 100875, China
- Yulong Bai
- Beijing Key Laboratory of Gene Resources and Molecular Development, College of Life Sciences, Beijing Normal University, Beijing 100875, China
- Xiuhua Xue
- Beijing Key Laboratory of Gene Resources and Molecular Development, College of Life Sciences, Beijing Normal University, Beijing 100875, China
- Doudou Wang
- Beijing Key Laboratory of Gene Resources and Molecular Development, College of Life Sciences, Beijing Normal University, Beijing 100875, China
- Shengcheng Han
- Beijing Key Laboratory of Gene Resources and Molecular Development, College of Life Sciences, Beijing Normal University, Beijing 100875, China
- Academy of Plateau Science and Sustainability of the People’s Government of Qinghai Province & Beijing Normal University, Qinghai Normal University, Xining 810008, China
20
Han KS, Song SR, Pak MH, Kim CS, Ri CP, Del Conte A, Piovesan D. PredIDR: Accurate prediction of protein intrinsic disorder regions using deep convolutional neural network. Int J Biol Macromol 2025; 284:137665. [PMID: 39571839 DOI: 10.1016/j.ijbiomac.2024.137665] [Received: 09/10/2024] [Revised: 10/29/2024] [Accepted: 11/13/2024] [Indexed: 12/02/2024]
Abstract
The involvement of protein intrinsic disorder in essential biological processes is well known in structural biology. However, experimental methods for detecting intrinsic structural disorder and directly measuring the highly dynamic behavior of protein structure are limited. To address this issue, several computational methods that predict intrinsic disorder from protein sequences have been developed, and their performance is evaluated by the Critical Assessment of protein Intrinsic Disorder (CAID). In this paper, we describe a new computational method, PredIDR, which accurately predicts intrinsically disordered regions in proteins by mimicking experimental X-ray missing residues. Specifically, missing residues in the Protein Data Bank (PDB) were used as positive examples to train a deep convolutional neural network that produces two types of output, for short and long regions. PredIDR took part in the second round of CAID and was as accurate as the top state-of-the-art IDR prediction methods. PredIDR can be freely used through the CAID Prediction Portal available at https://caid.idpcentral.org/portal or downloaded as a Singularity container from https://biocomputingup.it/shared/caid-predictors/.
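The labeling idea described above, treating residues absent from a deposited X-ray model as disordered and separating short from long regions, can be sketched in Python. The 30-residue cutoff between short and long regions is a hypothetical choice for illustration, residue numbering is assumed to be 1-based and aligned with the full deposited sequence, and the functions are not taken from PredIDR.

```python
def missing_segments(seq_length, observed):
    """Return 1-based (start, end) runs of residues absent from the observed set."""
    segments, start = [], None
    for i in range(1, seq_length + 1):
        if i not in observed:
            if start is None:
                start = i  # open a new missing run
        elif start is not None:
            segments.append((start, i - 1))  # close the run at the previous residue
            start = None
    if start is not None:
        segments.append((start, seq_length))
    return segments


def label_segments(segments, long_cutoff=30):
    """Tag each missing segment 'long' (>= long_cutoff residues) or 'short'.

    long_cutoff is a hypothetical parameter, not a value from the paper.
    """
    return [(s, e, "long" if e - s + 1 >= long_cutoff else "short") for s, e in segments]


if __name__ == "__main__":
    # Toy example: a 50-residue chain in which only residues 5-9 and 45-48 were modeled.
    observed = set(range(5, 10)) | set(range(45, 49))
    print(label_segments(missing_segments(50, observed)))
```

In practice, missing residues also arise from crystallographic artifacts unrelated to disorder, which is one reason such labels are noisy.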
Affiliation(s)
- Kun-Sop Han
- University of Sciences, Pyongyang, Democratic People's Republic of Korea
- Se-Ryong Song
- Branch of Biotechnology, State Academy of Sciences, Pyongyang, Democratic People's Republic of Korea
- Myong-Hyon Pak
- University of Sciences, Pyongyang, Democratic People's Republic of Korea
- Chol-Song Kim
- University of Sciences, Pyongyang, Democratic People's Republic of Korea
- Chol-Pyok Ri
- University of Sciences, Pyongyang, Democratic People's Republic of Korea
- Alessio Del Conte
- Department of Biomedical Sciences, University of Padova, Padova, Italy
- Damiano Piovesan
- Department of Biomedical Sciences, University of Padova, Padova, Italy
21
Rackovsky S. Techniques for Bioinformatic Applications in Protein Dynamics. Methods Mol Biol 2025; 2870:221-226. [PMID: 39543037 DOI: 10.1007/978-1-0716-4213-9_11] [Indexed: 11/17/2024]
Abstract
A method is described by which bioinformatic concepts and tools can be applied to the study of protein dynamic properties. Sequences are transformed into numerical strings by representing each amino acid by a residue specific average value of the crystallographic alpha carbon B factor. These dynamic sequences are then Fourier transformed. The Fourier coefficients, each of which contains information about the entire sequence, viewed on a specific length scale, can then be used to study a wide variety of dynamic characteristics in a manner which is completely inaccessible using conventional tools.
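The transformation described above (each residue replaced by an average Cα B-factor, then Fourier transformed) can be sketched in Python. The B-factor scale is an input here: the values in the example are hypothetical placeholders, not the residue-specific averages used in the chapter. A plain discrete Fourier transform with 1/N normalization is assumed, which may differ in detail from the author's formulation. Coefficient k describes variation with a period of N/k residues, which is the sense in which each coefficient views the whole sequence at one length scale.

```python
import cmath


def dynamic_sequence(seq, bfactor_scale):
    """Replace each residue by its (assumed) average C-alpha B-factor."""
    return [bfactor_scale[aa] for aa in seq]


def fourier_coefficients(values):
    """Discrete Fourier transform with 1/N normalization.

    Coefficient k summarizes the whole numerical sequence at a period of N/k residues.
    """
    n = len(values)
    return [
        sum(v * cmath.exp(-2j * cmath.pi * k * t / n) for t, v in enumerate(values)) / n
        for k in range(n)
    ]


if __name__ == "__main__":
    # Hypothetical, mean-centred placeholder scale; not published B-factor averages.
    scale = {"A": 1.0, "G": -1.0}
    coeffs = fourier_coefficients(dynamic_sequence("AGAGAGAG", scale))
    for k, c in enumerate(coeffs):
        print(k, round(abs(c), 3))
```

For the strictly alternating toy sequence, all of the signal falls into the coefficient for a two-residue period (k = 4 when N = 8).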
Affiliation(s)
- Shalom Rackovsky
- Department of Biochemistry and Biophysics, University of Rochester School of Medicine and Dentistry, Rochester, NY, USA.
22
Song J, Kurgan L. Two decades of advances in sequence-based prediction of MoRFs, disorder-to-order transitioning binding regions. Expert Rev Proteomics 2025; 22:1-9. [PMID: 39789785 DOI: 10.1080/14789450.2025.2451715] [Received: 10/31/2024] [Revised: 12/20/2024] [Accepted: 12/26/2024] [Indexed: 01/12/2025]
Abstract
INTRODUCTION Molecular recognition features (MoRFs) are regions in protein sequences that undergo induced folding upon binding partner molecules. MoRFs are common in nature and can be predicted from sequences based on their distinctive sequence signatures. AREAS COVERED We review 20 years of progress in the sequence-based prediction of MoRFs, which resulted in the development of 25 predictors of MoRFs that interact with proteins, peptides, and lipids. These methods range from simple discriminant analysis to sophisticated deep transformer networks that use protein language models. They generate relatively accurate predictions, as evidenced by the results of a recently published community-driven assessment. EXPERT OPINION MoRF prediction is a mature field of research that is poised to continue at a steady pace in the foreseeable future. We anticipate further expansion of the scope of MoRF predictions to additional partner molecules, such as nucleic acids, and continued use of recent machine learning advances. Other future efforts should concentrate on improving the availability of MoRF predictions by releasing, maintaining, and popularizing web servers and by depositing MoRF predictions in large databases of protein structure and function predictions. Furthermore, accurate MoRF predictions should be coupled with equally accurate prediction and modeling of the structures of the resulting complexes.
Affiliation(s)
- Jiangning Song
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC, Australia
- Monash Data Futures Institute, Monash University, Melbourne, VIC, Australia
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
23
Zhao B, Basu S, Kurgan L. DescribePROT Database of Residue-Level Protein Structure and Function Annotations. Methods Mol Biol 2025; 2867:169-184. [PMID: 39576581 DOI: 10.1007/978-1-0716-4196-5_10] [Indexed: 11/24/2024]
Abstract
DescribePROT is a freely available online database of structural and functional descriptors of proteins at the amino acid level. It provides access to 13 diverse descriptors that include sequence conservation, putative secondary structure, solvent accessibility, intrinsic disorder, and signal peptides, as well as putative annotations of residues that interact with proteins, peptides, and nucleic acids. These data can be used to elucidate protein functions, to support efforts to develop therapeutics, and to develop and evaluate future predictors of protein structure and function. DescribePROT includes 7.8 billion predictions for 1.4 million proteins from 83 complete proteomes of popular model organisms. This information can be downloaded at multiple levels of scope (entire database, specific organisms, and individual proteins) and explored through a graphical interface that simultaneously displays data on multiple descriptors. We describe the contents of this resource, provide directions on how to use its interface, and offer instructions on how to obtain and interact with the underlying data. Moreover, we briefly discuss plans for a future expansion of this database. DescribePROT is available at http://biomine.cs.vcu.edu/servers/DESCRIBEPROT/ .
Affiliation(s)
- Bi Zhao
- Genomics program, College of Public Health, University of South Florida, Tampa, FL, USA
- Sushmita Basu
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
24
Wang K, Hu G, Wu Z, Kurgan L. Accurate and Fast Prediction of Intrinsic Disorder Using flDPnn. Methods Mol Biol 2025; 2867:201-218. [PMID: 39576583 DOI: 10.1007/978-1-0716-4196-5_12] [Indexed: 11/24/2024]
Abstract
Intrinsically disordered proteins (IDPs) that include one or more intrinsically disordered regions (IDRs) are abundant across all domains of life and in viruses, and they play numerous functional roles in various cellular processes. Because of the relatively low throughput and high cost of experimental techniques for identifying IDRs, there is a growing need for fast computational algorithms that accurately predict IDRs/IDPs from protein sequences. We describe one of the leading disorder predictors, flDPnn. Results from a recent community-organized Critical Assessment of Intrinsic Disorder (CAID) experiment show that flDPnn provides fast and state-of-the-art predictions of disorder, which are supplemented with predictions of several major disorder functions. This chapter provides a practical guide to flDPnn, which includes a brief explanation of its predictive model, descriptions of its web server and standalone versions, and a case study that showcases how to read and understand flDPnn's predictions.
Affiliation(s)
- Kui Wang
- NITFID, School of Statistics and Data Science, LPMC and KLMDASR, Nankai University, Tianjin, China
- Gang Hu
- NITFID, School of Statistics and Data Science, LPMC and KLMDASR, Nankai University, Tianjin, China
- Zhonghua Wu
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin, China
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
25
Malhis N, Gsponer J. Computational Prediction of Linear Interacting Peptides. Methods Mol Biol 2025; 2867:233-245. [PMID: 39576585 DOI: 10.1007/978-1-0716-4196-5_14] [Indexed: 11/24/2024]
Abstract
Intrinsically disordered protein regions, IDRs, are observed in many eukaryotic proteins. They play critical roles in essentially all cellular processes because segments of these regions, known as linear interacting peptides (LIPs), are heavily involved in regulatory protein interactions across proteomes. This chapter presents an integrated summary of the results from the last two Critical Assessments of protein Intrinsic Disorder predictions, known as CAID events, on the computational prediction of LIP segments. Because the CAID community questioned the quality of the test dataset used by the first CAID event, we reannotated this dataset using more accurate annotations from the latest DisProt database release. Then, we compared the results of the first CAID with the updated data and the results of the second CAID event. Our comparison highlights the importance of data annotation on the evaluation outcome and provides recommendations for users of LIP predictors.
Affiliation(s)
- Nawar Malhis
- Michael Smith Laboratories, the University of British Columbia, Vancouver, BC, Canada
- Jörg Gsponer
- Michael Smith Laboratories, the University of British Columbia, Vancouver, BC, Canada
26
Zhang F, Kurgan L. Evaluation of predictions of disordered binding regions in the CAID2 experiment. Comput Struct Biotechnol J 2024; 27:78-88. [PMID: 39811792 PMCID: PMC11732247 DOI: 10.1016/j.csbj.2024.12.009] [Received: 10/21/2024] [Revised: 12/12/2024] [Accepted: 12/13/2024] [Indexed: 01/16/2025]
Abstract
A large portion of the Intrinsically Disordered Regions (IDRs) in protein sequences interacts with proteins, nucleic acids, and other types of ligands. Correspondingly, dozens of sequence-based predictors of binding IDRs have been developed. The recently completed second community-based Critical Assessment of protein Intrinsic Disorder prediction (CAID2) evaluated 32 predictors of binding IDRs. However, CAID2 considered a rather narrow scenario, testing on 78 proteins with binding IDRs and not differentiating between ligand types, even though virtually all predictors target IDRs that interact with specific types of ligands. In that scenario, several intrinsic disorder predictors predict binding IDRs with accuracy equivalent to that of the best predictors of binding IDRs, since the large majority of IDRs in the 78 test proteins are binding. We substantially extended CAID2's evaluation by using the entire CAID2 dataset of 348 proteins and considering several arguably more practical scenarios. We assessed whether predictors accurately differentiate binding IDRs from other types of IDRs and how they perform when predicting IDRs that interact with different ligand types. We found that intrinsic disorder predictors cannot accurately identify binding IDRs among other disordered regions, that the majority of the predictors of binding IDRs are ligand-type agnostic (i.e., they cross-predict binding in IDRs that interact with ligands they do not cover), and that only a handful of predictors of binding IDRs perform relatively well and generate reasonably low amounts of cross-predictions. We also suggest a number of future research directions that would move this active field of research forward.
Affiliation(s)
- Fuhao Zhang
- College of Information Engineering, Northwest A & F University, China
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
27
Antonietti M, Kim CK, Djulbegovic MB, Gonzalez DJT, Greenfield JA, Uversky VN, Gibbons AG, Karp CL. Effects of Aging on Intrinsic Protein Disorder in Human Lenses and Zonules. Cell Biochem Biophys 2024; 82:3667-3679. [PMID: 39117985 PMCID: PMC11576620 DOI: 10.1007/s12013-024-01455-x] [Accepted: 07/23/2024] [Indexed: 08/10/2024]
Abstract
This study aims to compare the levels of intrinsic protein disorder within the human lens and zonule proteomes and investigate the role of aging as a potential influencing factor on disorder levels. A cross-sectional proteomic analysis was employed, utilizing a dataset of 1466 proteins derived from the lens and zonule proteomes previously published by Wang et al. and De Maria et al. Bioinformatics tools, including a composition profiler and a rapid intrinsic disorder analysis online tool, were used to conduct a comparative analysis of protein disorder. Statistical tests such as ANOVA, Tukey's HSD, and chi-squared tests were applied to evaluate differences between groups. The study revealed distinct amino acid compositions for each proteome, showing a direct correlation between aging and increased protein disorder in the zonular proteomes, whereas the lens proteomes exhibited the opposite trend. Findings suggest that age-related changes in intrinsic protein disorder within the lens and zonule proteomes may be linked to structural transformations in these tissues. Understanding how protein disorder evolves with age could enhance knowledge of the molecular basis for age-related conditions such as cataracts and pseudoexfoliation, potentially leading to better therapeutic strategies.
Affiliation(s)
- Colin K Kim
- Bascom Palmer Eye Institute, University of Miami, Miami, FL, USA
- Mak B Djulbegovic
- Wills Eye Hospital, Thomas Jefferson University Hospital, Philadelphia, PA, USA
- Vladimir N Uversky
- Department of Molecular Medicine and USF Health Byrd Alzheimer's Research Institute, Morsani College of Medicine, University of South Florida, Tampa, FL, USA
- Carol L Karp
- Bascom Palmer Eye Institute, University of Miami, Miami, FL, USA.
28
Basu S, Kurgan L. Taxonomy-specific assessment of intrinsic disorder predictions at residue and region levels in higher eukaryotes, protists, archaea, bacteria and viruses. Comput Struct Biotechnol J 2024; 23:1968-1977. [PMID: 38765610 PMCID: PMC11098722 DOI: 10.1016/j.csbj.2024.04.059] [Received: 02/05/2024] [Revised: 04/23/2024] [Accepted: 04/24/2024] [Indexed: 05/22/2024]
Abstract
Intrinsic disorder predictors were evaluated in several studies, including the two large CAID experiments. However, these studies are biased towards eukaryotic proteins and focus primarily on residue-level predictions. We provide a first-of-its-kind assessment that comprehensively covers the taxonomy and evaluates predictions at both the residue and disordered-region levels. We curate a benchmark dataset that uniformly covers eukaryotic, archaeal, bacterial, and viral proteins. We find that predictive performance differs substantially across taxonomy: viruses are predicted most accurately, followed by protists and higher eukaryotes, while bacterial and archaeal proteins suffer lower levels of accuracy. These trends are consistent across predictors. We also find that current tools, except for flDPnn, struggle to reproduce native distributions of the numbers and sizes of disordered regions. Moreover, analysis of two variants of disorder predictions derived from AlphaFold2-predicted structures reveals that they produce accurate residue-level propensities for archaea, bacteria and protists. However, they underperform for higher eukaryotes and generally struggle to accurately identify disordered regions. Our results motivate the development of new predictors that target bacteria and archaea and that produce accurate results at both residue and region levels. We also stress the need to include region-level evaluations in future assessments.
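The region-level evaluation described above requires turning per-residue binary disorder calls into contiguous regions, so that the number and sizes of predicted regions can be compared with native annotations. A minimal sketch, assuming a hypothetical encoding where `'1'` marks a disordered residue and `'0'` an ordered one; the function names and the optional `min_len` filter are illustrative, not taken from the study.

```python
def disordered_regions(labels: str, min_len: int = 1) -> list[tuple[int, int]]:
    """Return (start, end) 0-based inclusive spans of consecutive '1' residues.

    Assumes a hypothetical encoding: '1' = disordered, '0' = ordered.
    Regions shorter than min_len are discarded.
    """
    regions = []
    start = None
    for i, label in enumerate(labels):
        if label == "1" and start is None:
            start = i  # a new disordered run begins
        elif label != "1" and start is not None:
            if i - start >= min_len:
                regions.append((start, i - 1))
            start = None
    # Close a run that extends to the C-terminus.
    if start is not None and len(labels) - start >= min_len:
        regions.append((start, len(labels) - 1))
    return regions


def region_summary(labels: str, min_len: int = 1) -> tuple[int, list[int]]:
    """Number of disordered regions and their sizes (in residues)."""
    regions = disordered_regions(labels, min_len)
    return len(regions), [end - start + 1 for start, end in regions]
```

For instance, `region_summary("0011110000111111100")` returns `(2, [4, 7])`, while a prediction such as `"0111100001100111100"` yields `(3, [4, 2, 4])`: the prediction breaks the native seven-residue region into two shorter fragments, a discrepancy that a residue-level score alone can understate.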
Affiliation(s)
- Sushmita Basu
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
29
Erdős G, Dosztányi Z. Deep learning for intrinsically disordered proteins: From improved predictions to deciphering conformational ensembles. Curr Opin Struct Biol 2024; 89:102950. [PMID: 39522439 DOI: 10.1016/j.sbi.2024.102950] [Received: 08/01/2024] [Revised: 09/19/2024] [Accepted: 10/16/2024] [Indexed: 11/16/2024]
Abstract
Intrinsically disordered proteins (IDPs) lack a stable three-dimensional structure under physiological conditions, challenging traditional structure-based prediction methods. This review explores how modern deep learning approaches, which have revolutionized structure prediction for globular proteins, have impacted protein disorder prediction. We highlight the role of community-driven efforts in curating data and assessing the state of the art, which have been crucial in advancing the field. We then review current methods that use deep learning techniques, highlighting innovative approaches. Finally, we address advances in characterizing protein conformational ensembles directly from sequence data using novel machine learning methods.
Affiliation(s)
- Gábor Erdős
- Department of Biochemistry, Eötvös Loránd University, Pázmány Péter stny 1/c, Budapest H-1117, Hungary
- Zsuzsanna Dosztányi
- Department of Biochemistry, Eötvös Loránd University, Pázmány Péter stny 1/c, Budapest H-1117, Hungary.
30
Soh TK, Ognibene S, Sanders S, Schäper R, Kaufer BB, Bosse JB. A proteome-wide structural systems approach reveals insights into protein families of all human herpesviruses. Nat Commun 2024; 15:10230. [PMID: 39592652 PMCID: PMC11599850 DOI: 10.1038/s41467-024-54668-2] [Received: 03/08/2024] [Accepted: 11/19/2024] [Indexed: 11/28/2024]
Abstract
Structure predictions have become invaluable tools, but viral proteins are absent from the EMBL/DeepMind AlphaFold database. Here, we provide proteome-wide structure predictions for all nine human herpesviruses and analyze them in depth with explicit scoring thresholds. By clustering these predictions into structural similarity groups, we identified new families, such as the HCMV UL112-113 cluster, which is conserved in alpha- and betaherpesviruses. A domain-level search found protein families consisting of subgroups with varying numbers of duplicated folds. Using large-scale structural similarity searches, we identified viral proteins with cellular folds, such as the HSV-1 US2 cluster possessing dihydrofolate reductase folds and the EBV BMRF2 cluster that might have emerged from cellular equilibrative nucleoside transporters. Our HerpesFolds database is available at https://www.herpesfolds.org/herpesfolds and displays all models and clusters through an interactive web interface. Here, we show that system-wide structure predictions can reveal homology between viral species and identify potential protein functions.
Affiliation(s)
- Timothy K Soh
- Hannover Medical School, Institute of Virology, Hanover, Germany
- Centre for Structural Systems Biology, Hamburg, Germany
- Cluster of Excellence RESIST (EXC 2155), Hanover Medical School, Hanover, Germany
- Leibniz Institute of Virology (LIV), Hamburg, Germany
- Sofia Ognibene
- Hannover Medical School, Institute of Virology, Hanover, Germany
- Centre for Structural Systems Biology, Hamburg, Germany
- Cluster of Excellence RESIST (EXC 2155), Hanover Medical School, Hanover, Germany
- Leibniz Institute of Virology (LIV), Hamburg, Germany
- Saskia Sanders
- Hannover Medical School, Institute of Virology, Hanover, Germany
- Centre for Structural Systems Biology, Hamburg, Germany
- Cluster of Excellence RESIST (EXC 2155), Hanover Medical School, Hanover, Germany
- Leibniz Institute of Virology (LIV), Hamburg, Germany
- Robin Schäper
- Hannover Medical School, Institute of Virology, Hanover, Germany
- Centre for Structural Systems Biology, Hamburg, Germany
- Cluster of Excellence RESIST (EXC 2155), Hanover Medical School, Hanover, Germany
- Leibniz Institute of Virology (LIV), Hamburg, Germany
- Benedikt B Kaufer
- Institute of Virology, Freie Universität Berlin, Berlin, Germany
- Veterinary Centre for Resistance Research (TZR), Freie Universität Berlin, Berlin, Germany
- Jens B Bosse
- Hannover Medical School, Institute of Virology, Hanover, Germany.
- Centre for Structural Systems Biology, Hamburg, Germany.
- Cluster of Excellence RESIST (EXC 2155), Hanover Medical School, Hanover, Germany.
- Leibniz Institute of Virology (LIV), Hamburg, Germany.
31
Basu S, Yu J, Kihara D, Kurgan L. Twenty years of advances in prediction of nucleic acid-binding residues in protein sequences. Brief Bioinform 2024; 26:bbaf016. [PMID: 39833102 PMCID: PMC11745544 DOI: 10.1093/bib/bbaf016] [Received: 10/03/2024] [Revised: 12/24/2024] [Accepted: 01/06/2025] [Indexed: 01/22/2025]
Abstract
Computational prediction of nucleic acid-binding residues in protein sequences is an active field of research, with over 80 methods that were released in the past 2 decades. We identify and discuss 87 sequence-based predictors that include dozens of recently published methods that are surveyed for the first time. We overview historical progress and examine multiple practical issues that include availability and impact of predictors, key features of their predictive models, and important aspects related to their training and assessment. We observe that the past decade has brought increased use of deep neural networks and protein language models, which contributed to substantial gains in the predictive performance. We also highlight advancements in vital and challenging issues that include cross-predictions between deoxyribonucleic acid (DNA)-binding and ribonucleic acid (RNA)-binding residues and targeting the two distinct sources of binding annotations, structure-based versus intrinsic disorder-based. The methods trained on the structure-annotated interactions tend to perform poorly on the disorder-annotated binding and vice versa, with only a few methods that target and perform well across both annotation types. The cross-predictions are a significant problem, with some predictors of DNA-binding or RNA-binding residues indiscriminately predicting interactions with both nucleic acid types. Moreover, we show that methods with web servers are cited substantially more than tools without implementation or with no longer working implementations, motivating the development and long-term maintenance of the web servers. We close by discussing future research directions that aim to drive further progress in this area.
Affiliation(s)
- Sushmita Basu
- Department of Computer Science, Virginia Commonwealth University, 401 West Main Street, Richmond, VA 23284, United States
- Jing Yu
- Department of Computer Science, Virginia Commonwealth University, 401 West Main Street, Richmond, VA 23284, United States
- Daisuke Kihara
- Department of Biological Sciences, Purdue University, 915 Mitch Daniels Boulevard, West Lafayette, IN 47907, United States
- Department of Computer Science, Purdue University, 305 N. University Street, West Lafayette, IN 47907, United States
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, 401 West Main Street, Richmond, VA 23284, United States
32
Zhang J, Basu S, Zhang F, Kurgan L. MERIT: Accurate Prediction of Multi Ligand-binding Residues with Hybrid Deep Transformer Network, Evolutionary Couplings and Transfer Learning. J Mol Biol 2024:168872. [PMID: 40133785 DOI: 10.1016/j.jmb.2024.168872] [Received: 09/13/2024] [Revised: 10/30/2024] [Accepted: 11/15/2024] [Indexed: 03/27/2025]
Abstract
Multi-ligand binding residues (MLBRs) are amino acids in protein sequences that interact with multiple different ligands that include proteins, peptides, nucleic acids, and a variety of small molecules. MLBRs are implicated in a number of cellular functions and are targeted in the context of multiple human diseases. There are many sequence-based predictors of residues that interact with specific ligand types, and they can be collectively used to identify MLBRs. However, there are no methods that directly predict MLBRs. To this end, we conceptualize, design, evaluate and release MERIT (Multi-binding rEsidues pRedIcTor). This tool relies on a custom-crafted deep neural network that implements a number of innovative features, such as a multi-layered/step architecture with transformer modules that we train using a custom-designed loss function, computation of evolutionary couplings, and application of transfer learning. These innovations boost predictive performance, which we demonstrate using an ablation analysis. In particular, they reduce the number of cross-predictions, defined as residues that interact with a single ligand type that are incorrectly predicted as MLBRs. We compare MERIT against a representative selection of current and popular ligand-specific predictors, meta-predictors that combine their results to identify MLBRs, and a baseline regression-based predictor. These tests reveal that MERIT provides accurate predictions and statistically outperforms these alternatives. Moreover, using two test datasets, one with MLBRs and another with only single-ligand binding residues, we show that MERIT consistently produces relatively low false positive rates, including low rates of cross-predictions. The web server and datasets from this study are freely available at http://biomine.cs.vcu.edu/servers/MERIT/.
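Using the definition given above (single-ligand binding residues incorrectly predicted as MLBRs), a cross-prediction rate can be tallied per residue. A minimal sketch, assuming each residue is annotated with the number of distinct ligand types it binds (0 for non-binding) and that the predictor emits a binary MLBR call; the function name and input layout are hypothetical and are not MERIT's own evaluation code.

```python
def cross_prediction_rate(ligand_counts: list[int], predicted_mlbr: list[bool]) -> float:
    """Fraction of single-ligand binding residues that are predicted as MLBRs.

    ligand_counts[i]: number of distinct ligand types residue i binds (0 = non-binding).
    predicted_mlbr[i]: True if the predictor labels residue i as multi-ligand binding.
    """
    if len(ligand_counts) != len(predicted_mlbr):
        raise ValueError("annotations and predictions must have the same length")
    # Keep only the predictions made for residues that bind exactly one ligand type.
    single = [pred for count, pred in zip(ligand_counts, predicted_mlbr) if count == 1]
    if not single:
        return 0.0  # no single-ligand residues, so no cross-predictions are possible
    return sum(single) / len(single)
```

For example, with `ligand_counts = [0, 1, 1, 2, 3, 1, 0]` and predictions `[False, True, False, True, True, False, True]`, one of the three single-ligand residues is called an MLBR, so the rate is 1/3; the call on the last, non-binding residue is an ordinary false positive rather than a cross-prediction.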
Affiliation(s)
- Jian Zhang
- School of Computer and Information Technology, Xinyang Normal University, Xinyang 464000, China; Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324003, China.
- Sushmita Basu
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
- Fuhao Zhang
- College of Information Engineering, Northwest A&F University, Yangling, Shaanxi 712100, China
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA.
33
Malebary SJ, Alromema N. iDLB-Pred: identification of disordered lipid binding residues in protein sequences using convolutional neural network. Sci Rep 2024; 14:24724. [PMID: 39433833 PMCID: PMC11494137 DOI: 10.1038/s41598-024-75700-x] [Received: 04/25/2024] [Accepted: 10/08/2024] [Indexed: 10/23/2024]
Abstract
Intrinsically disordered protein regions interact with proteins, nucleic acids, and lipids. Lipid-binding regions are involved in a variety of biological processes as well as a number of human diseases. This work is motivated by the expanding body of experimental evidence for these interactions and the dearth of techniques to predict them from the protein sequence. Although large-scale laboratory techniques are considered essential for studying binding residues, they are time-consuming and costly, which makes it challenging for researchers to identify lipid-binding residues. As a result, computational techniques are being explored as an alternative strategy. To predict disordered lipid-binding residues (DLBRs), we propose the iDLB-Pred predictor, which applies feature-extraction techniques to a benchmark dataset to identify relevant patterns and information. Various classification techniques, including deep learning methods such as Convolutional Neural Networks (CNNs), Deep Neural Networks (DNNs), Multilayer Perceptrons (MLPs), Recurrent Neural Networks (RNNs), Long Short-Term Memory (LSTM) networks, and Gated Recurrent Units (GRUs), were employed for model training. The proposed model, iDLB-Pred, was rigorously validated using metrics such as accuracy, sensitivity, specificity, and the Matthews correlation coefficient. The results show that the predictor achieves accuracy rates of 81% on an independent dataset and 86% in 10-fold cross-validation.
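The four validation metrics listed above are standard functions of a binary confusion matrix. A minimal sketch using the textbook formulas; the counts in the usage note are made up for illustration and do not correspond to the reported iDLB-Pred results.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Accuracy, sensitivity, specificity and Matthews correlation coefficient (MCC)."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    # Sensitivity (recall): fraction of true binding residues that are detected.
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    # Specificity: fraction of non-binding residues correctly rejected.
    specificity = tn / (tn + fp) if tn + fp else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when any marginal is zero; 0.0 is a common convention.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }
```

For example, `binary_metrics(tp=40, fp=10, tn=45, fn=5)` gives an accuracy of 0.85, a sensitivity of 40/45 and a specificity of 45/55. Reporting MCC alongside accuracy matters when binding residues are a minority class, because accuracy alone can look high for a predictor that rarely predicts binding.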
Affiliation(s)
- Sharaf J Malebary
- Department of Information Technology, Faculty of Computing and Information Technology-Rabigh, King Abdulaziz University, P.O. Box 344, 21911, Rabigh, Saudi Arabia.
- Nashwan Alromema
- Department of Computer Science, Faculty of Computing and Information Technology-Rabigh, King Abdulaziz University, P.O. Box 344, 21911, Rabigh, Saudi Arabia
34
Chow CFW, Ghosh S, Hadarovich A, Toth-Petroczy A. SHARK enables sensitive detection of evolutionary homologs and functional analogs in unalignable and disordered sequences. Proc Natl Acad Sci U S A 2024; 121:e2401622121. [PMID: 39383002 PMCID: PMC11494347 DOI: 10.1073/pnas.2401622121] [Received: 01/24/2024] [Accepted: 08/30/2024] [Indexed: 10/11/2024]
Abstract
Intrinsically disordered regions (IDRs) are structurally flexible protein segments with regulatory functions in multiple contexts, such as in the assembly of biomolecular condensates. Since IDRs undergo more rapid evolution than ordered regions, identifying homology of such poorly conserved regions remains challenging for state-of-the-art alignment-based methods that rely on position-specific conservation of residues. Thus, systematic functional annotation and evolutionary analysis of IDRs have been limited, despite them comprising ~21% of proteins. To accurately assess homology between unalignable sequences, we developed an alignment-free sequence comparison algorithm, SHARK (Similarity/Homology Assessment by Relating K-mers). We trained SHARK-dive, a machine learning homology classifier, which achieved superior performance to standard alignment-based approaches in assessing evolutionary homology in unalignable sequences. Furthermore, it correctly identified dissimilar but functionally analogous IDRs in IDR-replacement experiments reported in the literature, whereas alignment-based tools were incapable of detecting such functional relationships. SHARK-dive not only predicts functionally similar IDRs at a proteome-wide scale but also identifies cryptic sequence properties and motifs that drive remote homology and analogy, thereby providing interpretable and experimentally verifiable hypotheses of the sequence determinants that underlie such relationships. SHARK-dive acts as an alternative to alignment to facilitate systematic analysis and functional annotation of the unalignable protein universe.
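SHARK is described as comparing sequences by relating their k-mers rather than aligning them. As a generic illustration of why alignment-free comparison helps with reordered or poorly conserved IDRs, the sketch below scores two sequences by the cosine similarity of their overlapping k-mer count vectors. This is a common baseline substituted for illustration only; it is not SHARK's scoring scheme or the SHARK-dive classifier, and the function names and example sequences are arbitrary.

```python
import math
from collections import Counter


def kmer_counts(seq: str, k: int = 3) -> Counter:
    """Count overlapping k-mers in a protein sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_cosine(a: str, b: str, k: int = 3) -> float:
    """Cosine similarity of k-mer count vectors; no alignment is computed."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    # Only k-mers present in both sequences contribute to the dot product.
    dot = sum(ca[kmer] * cb[kmer] for kmer in ca.keys() & cb.keys())
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0
```

With k = 3, `kmer_cosine("ACDEFG", "GACDEF")` returns 0.75, since three of the four 3-mers are shared, even though a position-by-position comparison of the two sequences matches no residues at all.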
Affiliation(s)
- Chi Fung Willis Chow
- Max Planck Institute of Molecular Cell Biology and Genetics, Dresden 01307, Germany
- Center for Systems Biology Dresden, Dresden 01307, Germany
- Cluster of Excellence Physics of Life, Technische Universität Dresden, Dresden 01062, Germany
- Soumyadeep Ghosh
- Max Planck Institute of Molecular Cell Biology and Genetics, Dresden 01307, Germany
- Center for Systems Biology Dresden, Dresden 01307, Germany
- Anna Hadarovich
- Max Planck Institute of Molecular Cell Biology and Genetics, Dresden 01307, Germany
- Center for Systems Biology Dresden, Dresden 01307, Germany
- Agnes Toth-Petroczy
- Max Planck Institute of Molecular Cell Biology and Genetics, Dresden 01307, Germany
- Center for Systems Biology Dresden, Dresden 01307, Germany
- Cluster of Excellence Physics of Life, Technische Universität Dresden, Dresden 01062, Germany
35
Kombo DC, LaMarche MJ, Konkankit CC, Rackovsky S. Application of artificial intelligence and machine learning techniques to the analysis of dynamic protein sequences. Proteins 2024; 92:1234-1241. [PMID: 38808365 PMCID: PMC11511649 DOI: 10.1002/prot.26704] [Received: 02/23/2024] [Revised: 05/07/2024] [Accepted: 05/13/2024] [Indexed: 05/30/2024]
Abstract
We apply methods of Artificial Intelligence and Machine Learning to protein dynamic bioinformatics. We rewrite the sequences of a large protein data set, containing both folded and intrinsically disordered molecules, using a representation developed previously, which encodes the intrinsic dynamic properties of the naturally occurring amino acids. We Fourier analyze the resulting sequences. It is demonstrated that classification models built using several different supervised learning methods are able to successfully distinguish folded from intrinsically disordered proteins from sequence alone. It is further shown that the most important sequence property for this discrimination is the sequence mobility, which is the sequence averaged value of the residue-specific average alpha carbon B factor. This is in agreement with previous work, in which we have demonstrated the central role played by the sequence mobility in protein dynamic bioinformatics and biophysics. This finding opens a path to the application of dynamic bioinformatics, in combination with machine learning algorithms, to a range of significant biomedical problems.
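The abstract defines sequence mobility as the sequence-averaged value of the residue-specific average Cα B factor, and describes Fourier analysis of the re-encoded sequences. A minimal sketch of both steps, assuming a hypothetical single-property encoding (the numbers in `HYPOTHETICAL_B` are placeholders, not the published residue-level B factors or the authors' representation) and a plain O(N^2) discrete Fourier transform instead of an FFT library.

```python
import cmath

# Hypothetical per-residue average C-alpha B factors. Placeholder values for
# illustration only; a real analysis would use a published residue-level table.
HYPOTHETICAL_B = {"A": 1.0, "E": 1.4, "G": 1.4, "K": 1.5, "L": 0.9, "V": 0.8}


def encode(seq: str, table: dict[str, float]) -> list[float]:
    """Rewrite a sequence as a numeric profile, one value per residue."""
    return [table[res] for res in seq]


def sequence_mobility(seq: str, table: dict[str, float]) -> float:
    """Sequence-averaged value of the residue-specific B factor."""
    values = encode(seq, table)
    return sum(values) / len(values)


def dft_magnitudes(values: list[float]) -> list[float]:
    """Magnitudes |X_k| of the discrete Fourier transform of a numeric profile."""
    n = len(values)
    return [
        abs(sum(v * cmath.exp(-2j * cmath.pi * k * t / n) for t, v in enumerate(values)))
        for k in range(n)
    ]
```

One consequence worth noting: the zero-frequency DFT term equals the sum of the encoded values, so dividing it by the sequence length recovers the mobility. In this sketch the mobility is therefore the k = 0 component of the Fourier description, and the remaining components capture how the encoded property varies along the chain.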
Affiliation(s)
- David C. Kombo
- Dept. of Medicinal Chemistry, Integrated Drug Discovery, Sanofi, 350 Water St., Cambridge, MA 02141, USA
- Matthew J. LaMarche
- Dept. of Medicinal Chemistry, Integrated Drug Discovery, Sanofi, 350 Water St., Cambridge, MA 02141, USA
- Chilaluck C. Konkankit
- Dept. of Chemistry and Chemical Biology, Baker Laboratory, Cornell University, Ithaca, NY 14853, USA
- S. Rackovsky
- Dept. of Chemistry and Chemical Biology, Baker Laboratory, Cornell University, Ithaca, NY 14853, USA
36
Ren Y, Liao H, Yan J, Lu H, Mao X, Wang C, Li YF, Liu Y, Chen C, Chen L, Wang X, Zhou KY, Liu HM, Liu Y, Hua YM, Yu L, Xue Z. Capture of RNA-binding proteins across mouse tissues using HARD-AP. Nat Commun 2024; 15:8421. [PMID: 39341811 PMCID: PMC11438895 DOI: 10.1038/s41467-024-52765-w] [Received: 11/26/2023] [Accepted: 09/20/2024] [Indexed: 10/01/2024]
Abstract
RNA-binding proteins (RBPs) modulate all aspects of RNA metabolism, but a comprehensive picture of RBP expression across tissues is lacking. Here, we describe HARD-AP, a method we developed that robustly retrieves RBPs and tightly associated RNA regulatory complexes from cultured cells and fresh tissues. We successfully use HARD-AP to establish a comprehensive atlas of RBPs across primary mouse organs. We then systematically map RNA-binding sites of these RBPs using machine learning-based modeling. Notably, the modeling reveals the LIM domain as an RNA-binding domain in many RBPs. We validate the LIM-domain-only protein Csrp1 as a tissue-dependent RNA-binding protein. Taken together, HARD-AP is a powerful approach that can be used to identify RBPomes from any type of sample, enabling the mapping of comprehensive and physiologically relevant networks of RNA-protein interactions.
Affiliation(s)
- Yijia Ren
- Key Laboratory of Birth Defects and Related Disease of Women and Children of MOE, Department of Pediatrics, West China Second University Hospital, State Key Laboratory of Biotherapy, Sichuan University, Chengdu, Sichuan, 610041, China
- Hongyu Liao
- Key Laboratory of Birth Defects and Related Disease of Women and Children of MOE, Department of Pediatrics, West China Second University Hospital, State Key Laboratory of Biotherapy, Sichuan University, Chengdu, Sichuan, 610041, China
- Jun Yan
- National Maize Improvement Center, Frontiers Science Center for Molecular Design Breeding, College of Agronomy and Biotechnology, China Agricultural University, Beijing, 100094, China
- Hongyu Lu
- Key Laboratory of Birth Defects and Related Disease of Women and Children of MOE, Department of Pediatrics, West China Second University Hospital, State Key Laboratory of Biotherapy, Sichuan University, Chengdu, Sichuan, 610041, China
- Xiaowei Mao
- Sichuan Provincial Key Laboratory for Human Disease Gene Study and the Center for Medical Genetics, Department of Laboratory Medicine, Sichuan Academy of Medical Sciences and Sichuan Provincial People's Hospital, University of Electronic Science and Technology of China, Chengdu, Sichuan, 610072, China
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, Sichuan, 611731, China
- Shimmer Center, Tianfu Jiangxi Laboratory, Chengdu, Sichuan, 641419, China
- Chuan Wang
- Key Laboratory of Birth Defects and Related Disease of Women and Children of MOE, Department of Pediatrics, West China Second University Hospital, State Key Laboratory of Biotherapy, Sichuan University, Chengdu, Sichuan, 610041, China
- Yi-Fei Li
- Key Laboratory of Birth Defects and Related Disease of Women and Children of MOE, Department of Pediatrics, West China Second University Hospital, State Key Laboratory of Biotherapy, Sichuan University, Chengdu, Sichuan, 610041, China
- Yu Liu
- State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, Sichuan University, Chengdu, Sichuan, 610041, China
- Chong Chen
- Department of Urology, Institute of Urology, State Key Laboratory of Biotherapy and Cancer Center, Sichuan University, Chengdu, Sichuan, 610041, China
- Lu Chen
- Key Laboratory of Birth Defects and Related Disease of Women and Children of MOE, Department of Pediatrics, West China Second University Hospital, State Key Laboratory of Biotherapy, Sichuan University, Chengdu, Sichuan, 610041, China
- Xiangfeng Wang
- National Maize Improvement Center, Frontiers Science Center for Molecular Design Breeding, College of Agronomy and Biotechnology, China Agricultural University, Beijing, 100094, China
- Kai-Yu Zhou
- Key Laboratory of Birth Defects and Related Disease of Women and Children of MOE, Department of Pediatrics, West China Second University Hospital, State Key Laboratory of Biotherapy, Sichuan University, Chengdu, Sichuan, 610041, China
- Han-Min Liu
- Key Laboratory of Birth Defects and Related Disease of Women and Children of MOE, Department of Pediatrics, West China Second University Hospital, State Key Laboratory of Biotherapy, Sichuan University, Chengdu, Sichuan, 610041, China
- Yi Liu
- Department of Physiology, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
- Yi-Min Hua
- Key Laboratory of Birth Defects and Related Disease of Women and Children of MOE, Department of Pediatrics, West China Second University Hospital, State Key Laboratory of Biotherapy, Sichuan University, Chengdu, Sichuan, 610041, China.
- Lin Yu
- Key Laboratory of Birth Defects and Related Disease of Women and Children of MOE, Department of Pediatrics, West China Second University Hospital, State Key Laboratory of Biotherapy, Sichuan University, Chengdu, Sichuan, 610041, China.
- Zhihong Xue
- Key Laboratory of Birth Defects and Related Disease of Women and Children of MOE, Department of Pediatrics, West China Second University Hospital, State Key Laboratory of Biotherapy, Sichuan University, Chengdu, Sichuan, 610041, China.
- Development and Related Diseases of Women and Children Key Laboratory of Sichuan Province, Chengdu, Sichuan, 610041, China.
37
Yu Z, Ran G, Chai J, Zhang EE. A nature-inspired HIF stabilizer derived from a highland-adaptation insertion of plateau pika Epas1 protein. Cell Rep 2024; 43:114727. [PMID: 39269902 DOI: 10.1016/j.celrep.2024.114727] [Received: 04/29/2024] [Revised: 08/06/2024] [Accepted: 08/22/2024] [Indexed: 09/15/2024]
Abstract
Hypoxia-inducible factors (HIFs) play pivotal roles in numerous diseases and high-altitude adaptation, and HIF stabilizers have emerged as valuable therapeutic tools. In our prior investigation, we identified a highland-adaptation 24-amino-acid insertion within the Epas1 protein. This insertion enhances the protein stability of Epas1, and mice engineered with this insertion display enhanced resilience to hypoxic conditions. In the current study, we delved into the biochemical mechanisms underlying the protein-stabilizing effects of this insertion. Our findings unveiled that the last 11 amino acids within this insertion adopt a helical conformation and interact with the α-domain of the von Hippel-Lindau tumor suppressor protein (pVHL), thereby disrupting the Eloc-pVHL interaction and impeding the ubiquitination of Epas1. Utilizing a synthesized peptide, E14-24, we demonstrated its favorable membrane permeability and ability to stabilize endogenous HIF-α proteins, inducing the expression of hypoxia-responsive element (HRE) genes. Furthermore, the administration of E14-24 to mice subjected to hypoxic conditions mitigated body weight loss, suggesting its potential to enhance hypoxia adaptation.
Affiliation(s)
- Ziqing Yu
- Graduate School of Peking Union Medical College & Chinese Academy of Medical Sciences, Beijing 100006, China; National Institute of Biological Sciences, Beijing 102206, China.
- Guangdi Ran
- National Institute of Biological Sciences, Beijing 102206, China; Tsinghua Institute of Multidisciplinary Biomedical Research, Tsinghua University, Beijing 102206, China
- Juan Chai
- National Institute of Biological Sciences, Beijing 102206, China
- Eric Erquan Zhang
- National Institute of Biological Sciences, Beijing 102206, China; Tsinghua Institute of Multidisciplinary Biomedical Research, Tsinghua University, Beijing 102206, China.
38
Heredia-Torrejón M, Montañez R, González-Meneses A, Carcavilla A, Medina MA, Lechuga-Sancho AM. VUS next in rare diseases? Deciphering genetic determinants of biomolecular condensation. Orphanet J Rare Dis 2024; 19:327. [PMID: 39243101 PMCID: PMC11380411 DOI: 10.1186/s13023-024-03307-6] [Received: 08/21/2023] [Accepted: 08/06/2024] [Indexed: 09/09/2024]
Abstract
The diagnostic odysseys for rare disease patients are getting shorter as next-generation sequencing becomes more widespread. However, complex genetic diversity and the factors influencing expressivity continue to challenge accurate diagnosis, leaving more than 50% of genetic variants categorized as variants of uncertain significance. Gene expression intricately hinges on localized interactions among its products. Conventional variant prioritization, biased towards known disease genes and the structure-function paradigm, overlooks the potential impact of variants that shape the composition, location, size, and properties of biomolecular condensates: genuine membraneless organelles that swiftly sense and respond to environmental changes and modulate expressivity. To address this complexity, we propose focusing on genetic variants within the determinants of biomolecular condensates. Scrutinizing variant effects in these membraneless organelles could refine prioritization, enhance diagnostics, and unveil the molecular underpinnings of rare diseases. Integrating comprehensive genome sequencing, transcriptomics, and computational models can unravel variant pathogenicity and disease mechanisms, enabling precision medicine. This paper presents the rationale driving our proposal and describes a protocol to implement this approach. By bringing state-of-the-art knowledge and methodologies into clinical practice, we aim to redefine rare disease diagnosis, leveraging scientific advances for more informed medical decisions.
Affiliation(s)
- María Heredia-Torrejón
- Inflammation, Nutrition, Metabolism and Oxidative Stress Research Laboratory, Biomedical Research and Innovation Institute of Cadiz (INiBICA), Cadiz, Spain
- Mother and Child Health and Radiology Department. Area of Clinical Genetics, University of Cadiz. Faculty of Medicine, Cadiz, Spain
- Raúl Montañez
- Inflammation, Nutrition, Metabolism and Oxidative Stress Research Laboratory, Biomedical Research and Innovation Institute of Cadiz (INiBICA), Cadiz, Spain.
- Department of Molecular Biology and Biochemistry, University of Malaga, Andalucía Tech, E-29071, Málaga, Spain.
- Antonio González-Meneses
- Division of Dysmorphology, Department of Paediatrics, Virgen del Rocio University Hospital, Sevilla, Spain
- Department of Paediatrics, Medical School, University of Sevilla, Sevilla, Spain
- Atilano Carcavilla
- Pediatric Endocrinology Department, Hospital Universitario La Paz, 28046, Madrid, Spain
- Multidisciplinary Unit for RASopathies, Hospital Universitario La Paz, 28046, Madrid, Spain
- Miguel A Medina
- Department of Molecular Biology and Biochemistry, University of Malaga, Andalucía Tech, E-29071, Málaga, Spain.
- Biomedical Research Institute and nanomedicine platform of Málaga IBIMA-BIONAND, E-29071, Málaga, Spain.
- CIBER de Enfermedades Raras (CIBERER), Instituto de Salud Carlos III, E-28029, Madrid, Spain.
- Alfonso M Lechuga-Sancho
- Inflammation, Nutrition, Metabolism and Oxidative Stress Research Laboratory, Biomedical Research and Innovation Institute of Cadiz (INiBICA), Cadiz, Spain
- Division of Endocrinology, Department of Paediatrics, Puerta del Mar University Hospital, Cádiz, Spain
- Area of Paediatrics, Department of Child and Mother Health and Radiology, Medical School, University of Cadiz, Cadiz, Spain

39
Lin L, Huang Y, McIntyre J, Chang CH, Colmenares S, Lee YCG. Prevalent Fast Evolution of Genes Involved in Heterochromatin Functions. Mol Biol Evol 2024; 41:msae181. [PMID: 39189646 PMCID: PMC11408610 DOI: 10.1093/molbev/msae181] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2024] [Revised: 08/14/2024] [Accepted: 08/20/2024] [Indexed: 08/28/2024]
Abstract
Heterochromatin is a gene-poor and repeat-rich genomic compartment universally found in eukaryotes. Despite its low transcriptional activity, heterochromatin plays important roles in maintaining genome stability, organizing chromosomes, and suppressing transposable elements. Given the importance of these functions, it is expected that genes involved in heterochromatin regulation would be highly conserved. Yet, a handful of these genes were found to evolve rapidly. To investigate whether these previous findings are anecdotal or general to genes modulating heterochromatin, we compile an exhaustive list of 106 candidate genes involved in heterochromatin functions and investigate their evolution over short and long evolutionary time scales in Drosophila. Our analyses find that these genes exhibit significantly more frequent evolutionary changes, both in the form of amino acid substitutions and gene copy number changes, when compared to genes involved in Polycomb-based repressive chromatin. While positive selection drives amino acid changes within both structured domains with diverse functions and intrinsically disordered regions, purifying selection may have maintained the proportions of intrinsically disordered regions of these proteins. Together with the observed negative associations between the evolutionary rate of these genes and the genomic abundance of transposable elements, we propose an evolutionary model where the fast evolution of genes involved in heterochromatin functions is an inevitable outcome of the unique functional roles of heterochromatin, while the rapid evolution of transposable elements may be an effect rather than a cause. Our study provides an important global view of the evolution of genes involved in this critical cellular domain and provides insights into the factors driving the distinctive evolution of heterochromatin.
Affiliation(s)
- Leila Lin
- Department of Ecology and Evolutionary Biology, University of California, Irvine, CA, USA
- Yuheng Huang
- Department of Ecology and Evolutionary Biology, University of California, Irvine, CA, USA
- Jennifer McIntyre
- Department of Ecology and Evolutionary Biology, University of California, Irvine, CA, USA
- Ching-Ho Chang
- Division of Basic Sciences, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Serafin Colmenares
- Department of Cell and Molecular Biology, University of California, Berkeley, CA, USA
- Yuh Chwen G Lee
- Department of Ecology and Evolutionary Biology, University of California, Irvine, CA, USA

40
Wang K, Hu G, Basu S, Kurgan L. flDPnn2: Accurate and Fast Predictor of Intrinsic Disorder in Proteins. J Mol Biol 2024; 436:168605. [PMID: 39237195 DOI: 10.1016/j.jmb.2024.168605] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/03/2024] [Revised: 04/16/2024] [Accepted: 05/04/2024] [Indexed: 09/07/2024]
Abstract
Prediction of intrinsic disorder in protein sequences is an active research area, with well over 100 predictors released to date. These efforts are motivated by the functional importance and high abundance of intrinsic disorder, combined with the relatively small amount of experimental annotation. Disorder predictors are periodically evaluated by independent assessors in the Critical Assessment of protein Intrinsic Disorder prediction (CAID) experiments. The recently completed CAID2 experiment assessed close to 40 state-of-the-art methods, demonstrating that some of them produce accurate results. In particular, the flDPnn2 method, successor to flDPnn (which performed well in the CAID1 experiment), achieved the most accurate overall results on the Disorder-NOX dataset in CAID2. flDPnn2 implements a number of improvements over its predecessor, including changes to the inputs, a larger deep network model that we retrained on a larger training set, and the addition of an alignment module. Using results from CAID2, we show that flDPnn2 produces accurate predictions very quickly, modestly improving on the accuracy of flDPnn and halving the runtime to about 27 s per protein. flDPnn2 is freely available as a convenient web server at http://biomine.cs.vcu.edu/servers/flDPnn2/.
Affiliation(s)
- Kui Wang
- NITFID, School of Statistics and Data Science, LPMC and KLMDASR, Nankai University, Tianjin, China
- Gang Hu
- NITFID, School of Statistics and Data Science, LPMC and KLMDASR, Nankai University, Tianjin, China
- Sushmita Basu
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA.

41
Young VL, McSweeney AM, Edwards MJ, Ward VK. The Disorderly Nature of Caliciviruses. Viruses 2024; 16:1324. [PMID: 39205298 PMCID: PMC11360831 DOI: 10.3390/v16081324] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2024] [Revised: 08/07/2024] [Accepted: 08/17/2024] [Indexed: 09/04/2024]
Abstract
An intrinsically disordered protein (IDP) or region (IDR) has little or no protein structure but still maintains function. This lack of structure creates flexibility and fluidity, allowing multiple protein conformations and potentially transient interactions with more than one partner. Caliciviruses are positive-sense ssRNA viruses with a relatively small genome of 7.6-8.6 kb and a broad host range. Many viral proteins are known to contain IDRs, which benefit smaller viral genomes by expanding the functional proteome through the multifunctional nature of IDRs. The percentage of intrinsically disordered residues within the total proteome of each calicivirus type species ranges from 8 to 23%, and IDRs have been experimentally identified in the NS1-2, VPg and RdRP proteins. The IDRs within a given protein are not well conserved across the genera, and whether this correlates with different activities or with increased tolerance to mutations, driving virus adaptation to new selection pressures, is unknown. The function of norovirus NS1-2 has not yet been fully elucidated but includes involvement in host cell tropism, promotion of viral spread and suppression of host interferon-λ responses. These functions, as well as the host cell-like linear motifs that interact with host cell caspases and VAPA/B, are all located in or affected by the disordered region of norovirus NS1-2. The IDRs of calicivirus VPg are involved in viral transcription and translation, RNA binding, nucleotidylylation and cell cycle arrest, and the N-terminal IDR within the human norovirus RdRP could potentially drive liquid-liquid phase separation. This review identifies and summarises the IDRs of proteins within the Caliciviridae family and their importance during viral replication and subsequent host interactions.
Affiliation(s)
- Vernon K. Ward
- Department of Microbiology & Immunology, School of Biomedical Sciences, University of Otago, P.O. Box 56, Dunedin 9054, New Zealand

42
Hong L, Hu Z, Sun S, Tang X, Wang J, Tan Q, Zheng L, Wang S, Xu S, King I, Gerstein M, Li Y. Fast, sensitive detection of protein homologs using deep dense retrieval. Nat Biotechnol 2024:10.1038/s41587-024-02353-6. [PMID: 39123049 DOI: 10.1038/s41587-024-02353-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2023] [Accepted: 07/12/2024] [Indexed: 08/12/2024]
Abstract
The identification of protein homologs in large databases using conventional methods, such as protein sequence comparison, often misses remote homologs. Here, we offer an ultrafast, highly sensitive method, dense homolog retriever (DHR), for detecting homologs on the basis of a protein language model and dense retrieval techniques. Its dual-encoder architecture generates different embeddings for the same protein sequence and easily locates homologs by comparing these representations. Its alignment-free nature improves speed, and the protein language model incorporates rich evolutionary and structural information into DHR embeddings. DHR achieves a >10% increase in sensitivity compared to previous methods and a >56% increase in sensitivity at the superfamily level for samples that are challenging to identify using alignment-based approaches. It is up to 22 times faster than traditional methods such as PSI-BLAST and DIAMOND and up to 28,700 times faster than HMMER. The new remote homologs found exclusively by DHR are useful for revealing connections between well-characterized proteins and improving our knowledge of protein evolution, structure and function.
Affiliation(s)
- Liang Hong
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong SAR, China
- Zhihang Hu
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong SAR, China
- Siqi Sun
- Research Institute of Intelligent Complex Systems, Fudan University, Shanghai, China.
- Shanghai AI Laboratory, Shanghai, China.
- Xiangru Tang
- Department of Computer Science, Yale University, New Haven, CT, USA
- Jiuming Wang
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong SAR, China
- OneAIM Ltd., Hong Kong SAR, China
- Qingxiong Tan
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong SAR, China
- Liangzhen Zheng
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Shanghai Zelixir Biotech Company Ltd., Shanghai, China
- Sheng Wang
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Shanghai Zelixir Biotech Company Ltd., Shanghai, China
- Sheng Xu
- Research Institute of Intelligent Complex Systems, Fudan University, Shanghai, China
- Shanghai AI Laboratory, Shanghai, China
- Irwin King
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong SAR, China
- Mark Gerstein
- Department of Computer Science, Yale University, New Haven, CT, USA.
- Computational Biology and Bioinformatics Program, Yale University, New Haven, CT, USA.
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA.
- Department of Statistics and Data Science, Yale University, New Haven, CT, USA.
- Yu Li
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong SAR, China.
- Shanghai AI Laboratory, Shanghai, China.
- Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA, USA.
- Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, MA, USA.
- Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- The Chinese University of Hong Kong Shenzhen Research Institute, Shenzhen, China.

43
Nambiar A, Forsyth JM, Liu S, Maslov S. DR-BERT: A protein language model to annotate disordered regions. Structure 2024; 32:1260-1268.e3. [PMID: 38701796 DOI: 10.1016/j.str.2024.04.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/07/2023] [Revised: 06/16/2023] [Accepted: 04/08/2024] [Indexed: 05/05/2024]
Abstract
Despite their lack of a rigid structure, intrinsically disordered regions (IDRs) in proteins play important roles in cellular functions, including mediating protein-protein interactions. Therefore, it is important to computationally annotate IDRs with high accuracy. In this study, we present Disordered Region prediction using Bidirectional Encoder Representations from Transformers (DR-BERT), a compact protein language model. Unlike most popular tools, DR-BERT is pretrained on unannotated proteins and trained to predict IDRs without relying on explicit evolutionary or biophysical data. Despite this, DR-BERT demonstrates significant improvement over existing methods on the Critical Assessment of protein Intrinsic Disorder (CAID) evaluation dataset and outperforms competitors on two out of four test cases in the CAID 2 dataset, while remaining competitive on the others. This performance is due to the information learned during pretraining and DR-BERT's ability to use contextual information.
Affiliation(s)
- Ananthan Nambiar
- Department of Bioengineering, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA; Carl R. Woese Institute for Genomic Biology, Urbana, IL 61801, USA.
- John Malcolm Forsyth
- Carl R. Woese Institute for Genomic Biology, Urbana, IL 61801, USA; Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
- Simon Liu
- Carl R. Woese Institute for Genomic Biology, Urbana, IL 61801, USA; Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
- Sergei Maslov
- Department of Bioengineering, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA; Carl R. Woese Institute for Genomic Biology, Urbana, IL 61801, USA; Department of Physics, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA; Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL 60439, USA.

44
Masuda A, Okamoto T, Kawachi T, Takeda JI, Hamaguchi T, Ohno K. Blending and separating dynamics of RNA-binding proteins develop architectural splicing networks spreading throughout the nucleus. Mol Cell 2024; 84:2949-2965.e10. [PMID: 39053456 DOI: 10.1016/j.molcel.2024.07.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2023] [Revised: 01/28/2024] [Accepted: 07/02/2024] [Indexed: 07/27/2024]
Abstract
The eukaryotic nucleus has a highly organized structure. Although the spatiotemporal arrangement of spliceosomes on nascent RNA drives splicing, the nuclear architecture that directly supports this process remains unclear. Here, we show that RNA-binding proteins (RBPs) assembled on RNA form meshworks in human and mouse cells. Core and accessory RBPs in RNA splicing form two distinct meshworks that lie adjacent to each other yet are separately distributed throughout the nucleus. This is achieved by mutual exclusion dynamics between the charged and uncharged intrinsically disordered regions (IDRs) of RBPs. These two types of meshworks compete for spatial occupancy on pre-mRNA to regulate splicing. Furthermore, optogenetic enhancement of the RBP meshwork causes aberrant splicing, particularly of genes involved in neurodegeneration. Genetic mutations associated with neurodegenerative diseases are often found in the IDRs of RBPs, and cells harboring these mutations exhibit impaired meshwork formation. Our results uncover the spatial organization of RBP networks that drives RNA splicing.
Affiliation(s)
- Akio Masuda
- Division of Neurogenetics, Center for Neurological Diseases and Cancer, Nagoya University Graduate School of Medicine, Nagoya, Japan.
- Takaaki Okamoto
- Division of Neurogenetics, Center for Neurological Diseases and Cancer, Nagoya University Graduate School of Medicine, Nagoya, Japan
- Toshihiko Kawachi
- Division of Neurogenetics, Center for Neurological Diseases and Cancer, Nagoya University Graduate School of Medicine, Nagoya, Japan
- Jun-Ichi Takeda
- Division of Neurogenetics, Center for Neurological Diseases and Cancer, Nagoya University Graduate School of Medicine, Nagoya, Japan
- Tomonari Hamaguchi
- Division of Neurogenetics, Center for Neurological Diseases and Cancer, Nagoya University Graduate School of Medicine, Nagoya, Japan
- Kinji Ohno
- Division of Neurogenetics, Center for Neurological Diseases and Cancer, Nagoya University Graduate School of Medicine, Nagoya, Japan; Graduate School of Nutritional Sciences, Nagoya University of Arts and Sciences, Nisshin, Japan

45
Jirapongwattana N, Bunting SF, Ronning DR, Ghosal G, Karpf AR. RHNO1: at the crossroads of DNA replication stress, DNA repair, and cancer. Oncogene 2024; 43:2613-2620. [PMID: 39107463 DOI: 10.1038/s41388-024-03117-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/03/2024] [Revised: 07/24/2024] [Accepted: 07/26/2024] [Indexed: 08/28/2024]
Abstract
The DNA replication stress (DRS) response is a crucial homeostatic mechanism for maintaining genome integrity in the face of intrinsic and extrinsic barriers to DNA replication. Importantly, DRS is often significantly increased in tumor cells, making tumors dependent on the cellular DRS response for growth and survival. Rad9-Hus1-Rad1 Interacting Nuclear Orphan 1 (RHNO1), a protein involved in the DRS response, has recently emerged as a potential therapeutic target in cancer. RHNO1 interacts with the 9-1-1 checkpoint clamp and TopBP1 to activate the ATR/Chk1 signaling pathway, the crucial mediator of the DRS response. Moreover, RHNO1 was also recently identified as a key facilitator of theta-mediated end joining (TMEJ), a DNA repair mechanism implicated in cancer progression and chemoresistance. In this literature review, we provide an overview of our current understanding of RHNO1, including its structure, function in the DRS response, and role in DNA repair, and discuss its potential as a cancer therapeutic target. Therapeutic targeting of RHNO1 holds promise for tumors with elevated DRS as well as tumors with DNA repair deficiencies, including homologous recombination DNA repair deficient (HRD) tumors. Further investigation into RHNO1 function in cancer, and development of approaches to target RHNO1, are expected to yield novel strategies for cancer treatment.
Affiliation(s)
- Niphat Jirapongwattana
- Eppley Institute for Research in Cancer, University of Nebraska Medical Center, Omaha, NE, 68198-6805, USA
- Fred & Pamela Buffett Cancer Center, University of Nebraska Medical Center, Omaha, NE, 68198-6805, USA
- Samuel F Bunting
- Department of Molecular Biology and Biochemistry, Rutgers, The State University of New Jersey, Piscataway, NJ, 08854-8021, USA
- Donald R Ronning
- Department of Pharmaceutical Sciences, University of Nebraska Medical Center, Omaha, NE, 68198-6805, USA
- Gargi Ghosal
- Fred & Pamela Buffett Cancer Center, University of Nebraska Medical Center, Omaha, NE, 68198-6805, USA
- Department of Genetics, Cell Biology, and Anatomy, University of Nebraska Medical Center, Omaha, NE, 68198-6805, USA
- Adam R Karpf
- Eppley Institute for Research in Cancer, University of Nebraska Medical Center, Omaha, NE, 68198-6805, USA.
- Fred & Pamela Buffett Cancer Center, University of Nebraska Medical Center, Omaha, NE, 68198-6805, USA.

46
Pastic A, Nosella ML, Kochhar A, Liu ZH, Forman-Kay JD, D'Amours D. Chromosome compaction is triggered by an autonomous DNA-binding module within condensin. Cell Rep 2024; 43:114419. [PMID: 38985672 DOI: 10.1016/j.celrep.2024.114419] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2023] [Revised: 04/16/2024] [Accepted: 06/14/2024] [Indexed: 07/12/2024]
Abstract
The compaction of chromatin into mitotic chromosomes is essential for faithful transmission of the genome during cell division. In eukaryotes, chromosome morphogenesis is regulated by the condensin complex, though the exact mechanism used to target condensin to chromatin and initiate condensation is not understood. Here, we reveal that condensin contains an intrinsically disordered region (IDR) that modulates its association with chromatin in early mitosis and exhibits phase separation. We describe DNA-binding motifs within the IDR that, upon deletion, inflict striking defects in chromosome condensation and segregation, ill-timed condensin turnover on chromatin, and cell death. Importantly, we demonstrate that the condensin IDR can impart cell cycle regulatory functions when transferred to other subunits within the complex, indicating its autonomous nature. Collectively, our study unveils the molecular basis for the initiation of chromosome condensation in early mitosis and how this process ultimately promotes genomic stability and faultless cell division.
Affiliation(s)
- Alyssa Pastic
- Ottawa Institute of Systems Biology, Department of Cellular and Molecular Medicine, University of Ottawa, Ottawa, ON K1H 8M5, Canada
- Michael L Nosella
- Molecular Medicine Program, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada; Department of Biochemistry, University of Toronto, Toronto, ON M5S 1A8, Canada
- Annahat Kochhar
- Ottawa Institute of Systems Biology, Department of Cellular and Molecular Medicine, University of Ottawa, Ottawa, ON K1H 8M5, Canada
- Zi Hao Liu
- Molecular Medicine Program, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada; Department of Biochemistry, University of Toronto, Toronto, ON M5S 1A8, Canada
- Julie D Forman-Kay
- Molecular Medicine Program, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada; Department of Biochemistry, University of Toronto, Toronto, ON M5S 1A8, Canada
- Damien D'Amours
- Ottawa Institute of Systems Biology, Department of Cellular and Molecular Medicine, University of Ottawa, Ottawa, ON K1H 8M5, Canada.

47
Lin L, Huang Y, McIntyre J, Chang CH, Colmenares S, Lee YCG. Prevalent fast evolution of genes involved in heterochromatin functions. bioRxiv 2024:2024.03.03.583199. [PMID: 38496614 PMCID: PMC10942301 DOI: 10.1101/2024.03.03.583199] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/19/2024]
Abstract
Heterochromatin is a gene-poor and repeat-rich genomic compartment universally found in eukaryotes. Despite its low transcriptional activity, heterochromatin plays important roles in maintaining genome stability, organizing chromosomes, and suppressing transposable elements (TEs). Given the importance of these functions, it is expected that the genes involved in heterochromatin regulation would be highly conserved. Yet, a handful of these genes were found to evolve rapidly. To investigate whether these previous findings are anecdotal or general to genes modulating heterochromatin, we compile an exhaustive list of 106 candidate genes involved in heterochromatin functions and investigate their evolution over short and long evolutionary time scales in Drosophila. Our analyses find that these genes exhibit significantly more frequent evolutionary changes, both in the form of amino acid substitutions and gene copy number changes, when compared to genes involved in Polycomb-based repressive chromatin. While positive selection drives amino acid changes within both structured domains with diverse functions and intrinsically disordered regions (IDRs), purifying selection may have maintained the proportions of IDRs of these proteins. Together with the observed negative associations between evolutionary rates of these genes and genomic TE abundance, we propose an evolutionary model where the fast evolution of genes involved in heterochromatin functions is an inevitable outcome of the unique functional roles of heterochromatin, while the rapid evolution of TEs may be an effect rather than a cause. Our study provides an important global view of the evolution of genes involved in this critical cellular domain and provides insights into the factors driving the distinctive evolution of heterochromatin.

48
Wassmer E, Koppány G, Hermes M, Diederichs S, Caudron-Herger M. Refining the pool of RNA-binding domains advances the classification and prediction of RNA-binding proteins. Nucleic Acids Res 2024; 52:7504-7522. [PMID: 38917322 PMCID: PMC11260472 DOI: 10.1093/nar/gkae536] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2023] [Revised: 05/31/2024] [Accepted: 06/13/2024] [Indexed: 06/27/2024]
Abstract
From transcription to decay, RNA-binding proteins (RBPs) influence RNA metabolism. Using the RBP2GO database that combines proteome-wide RBP screens from 13 species, we investigated the RNA-binding features of 176 896 proteins. By compiling published lists of RNA-binding domains (RBDs) and RNA-related protein family (Rfam) IDs with lists from the InterPro database, we analyzed the distribution of the RBDs and Rfam IDs in RBPs and non-RBPs to select RBDs and Rfam IDs that were enriched in RBPs. We also examined the proteins' content of intrinsically disordered regions (IDRs) and low-complexity regions (LCRs). We found a strong positive correlation between IDRs and RBDs and a co-occurrence of specific LCRs. Our bioinformatic analysis indicated that RBDs/Rfam IDs were strong indicators of the RNA-binding potential of proteins and helped predict new RBP candidates, especially in less-investigated species. By further analyzing RBPs without an RBD, we predicted new RBDs that were validated by RNA-bound peptides. Finally, we created the RBP2GO composite score by combining the RBP2GO score with new quality factors linked to RBDs and Rfam IDs. Based on the RBP2GO composite score, we compiled a list of 2018 high-confidence human RBPs. The knowledge collected here was integrated into the RBP2GO database at https://RBP2GO-2-Beta.dkfz.de.
Affiliation(s)
- Elsa Wassmer
- Research Group “RNA-Protein Complexes & Cell Proliferation”, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
- Gergely Koppány
- Research Group “RNA-Protein Complexes & Cell Proliferation”, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
- Malte Hermes
- Research Group “RNA-Protein Complexes & Cell Proliferation”, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
- Sven Diederichs
- Division of Cancer Research, Department of Thoracic Surgery, Medical Center - University of Freiburg, Faculty of Medicine, University of Freiburg, and German Cancer Consortium (DKTK), partner site Freiburg, a partnership between DKFZ and University Medical Center Freiburg, 79106 Freiburg, Germany
- Maïwen Caudron-Herger
- Research Group “RNA-Protein Complexes & Cell Proliferation”, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany

49
Bonin JP, Aramini JM, Dong Y, Wu H, Kay LE. AlphaFold2 as a replacement for solution NMR structure determination of small proteins: Not so fast! J Magn Reson 2024; 364:107725. [PMID: 38917639 DOI: 10.1016/j.jmr.2024.107725] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2024] [Revised: 06/18/2024] [Accepted: 06/19/2024] [Indexed: 06/27/2024]
Abstract
The determination of a protein's structure is often a first step towards the development of a mechanistic understanding of its function. Considerable advances in computational protein structure prediction have been made in recent years, with AlphaFold2 (AF2) emerging as the primary tool used by researchers for this purpose. While AF2 generally predicts accurate structures of folded proteins, we present here a case where AF2 incorrectly predicts, with high confidence, the structure of a small, folded and compact protein. This protein, pro-interleukin-18 (pro-IL-18), is the precursor of the cytokine IL-18. Interestingly, the structure of pro-IL-18 predicted by AF2 matches that of the mature cytokine, and not the corresponding experimentally determined structure of the pro-form of the protein. Thus, while computational structure prediction holds immense promise for addressing problems in protein biophysics, there is still a need for experimental structure determination, even in the context of small, well-folded, globular proteins.
Affiliation(s)
- Jeffrey P Bonin
- Departments of Molecular Genetics and Biochemistry, University of Toronto, Toronto, Ontario M5S 1A8, Canada; Department of Chemistry, University of Toronto, Toronto, Ontario M5S 3H6, Canada; Program in Molecular Medicine, The Hospital for Sick Children Research Institute, Toronto, Ontario M5G 0A4, Canada
- James M Aramini
- Departments of Molecular Genetics and Biochemistry, University of Toronto, Toronto, Ontario M5S 1A8, Canada; Department of Chemistry, University of Toronto, Toronto, Ontario M5S 3H6, Canada; Program in Molecular Medicine, The Hospital for Sick Children Research Institute, Toronto, Ontario M5G 0A4, Canada
- Ying Dong
- Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA, USA; Program in Cellular and Molecular Medicine, Boston Children's Hospital, Boston, MA, USA
- Hao Wu
- Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA, USA; Program in Cellular and Molecular Medicine, Boston Children's Hospital, Boston, MA, USA
- Lewis E Kay
- Departments of Molecular Genetics and Biochemistry, University of Toronto, Toronto, Ontario M5S 1A8, Canada; Department of Chemistry, University of Toronto, Toronto, Ontario M5S 3H6, Canada; Program in Molecular Medicine, The Hospital for Sick Children Research Institute, Toronto, Ontario M5G 0A4, Canada.
50
Sanejouand YH. Are Most Human-Specific Proteins Encoded by Long Noncoding RNAs? J Mol Evol 2024:10.1007/s00239-024-10174-z. [PMID: 38916610 DOI: 10.1007/s00239-024-10174-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2023] [Accepted: 05/03/2024] [Indexed: 06/26/2024]
Abstract
By looking for a lack of homologs in a reference database of 27 well-annotated proteomes of primates and 52 well-annotated proteomes of other mammals, 170 putative human-specific proteins were identified. While most of them are deemed uncertain, 2 are known at the protein level and 23 at the transcript level, according to UniProt. Interestingly, 23 of these 25 proteins are found to be encoded or to have close homologs in an open reading frame of a long noncoding human RNA. However, half of them are predicted to be at least 80% globular, with a single structural domain, according to IUPred, and with at least 80% of ordered residues, according to flDPnn. Strikingly, there is a near-complete lack of structural knowledge about these proteins, with no tertiary structure presently available in the Protein Data Bank and a fair prediction for one of them in the AlphaFold Protein Structure Database. Moreover, knowledge about the function of these possibly key proteins remains scarce.
Affiliation(s)
- Yves-Henri Sanejouand
- US2B, UMR 6286 of CNRS, Nantes University, 2 rue de la Houssinière, Nantes, 44322, Pays de la Loire, France.