1
Ramon A, Ni M, Predeina O, Gaffey R, Kunz P, Onuoha S, Sormanni P. Prediction of protein biophysical traits from limited data: a case study on nanobody thermostability through NanoMelt. MAbs 2025; 17:2442750. [PMID: 39772905 PMCID: PMC11730357 DOI: 10.1080/19420862.2024.2442750] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2024] [Revised: 12/10/2024] [Accepted: 12/11/2024] [Indexed: 01/11/2025] Open
Abstract
In-silico prediction of protein biophysical traits is often hindered by the limited availability of experimental data and their heterogeneity. Training on limited data can lead to overfitting and poor generalizability to sequences distant from those in the training set. Additionally, inadequate use of scarce and disparate data can introduce biases during evaluation, leading to unreliable model performances being reported. Here, we present a comprehensive study exploring various approaches for protein fitness prediction from limited data, leveraging pre-trained embeddings, repeated stratified nested cross-validation, and ensemble learning to ensure an unbiased assessment of the performances. We applied our framework to introduce NanoMelt, a predictor of nanobody thermostability trained with a dataset of 640 measurements of apparent melting temperature, obtained by integrating data from the literature with 129 new measurements from this study. We find that an ensemble model stacking multiple regression using diverse sequence embeddings achieves state-of-the-art accuracy in predicting nanobody thermostability. We further demonstrate NanoMelt's potential to streamline nanobody development by guiding the selection of highly stable nanobodies. We make the curated dataset of nanobody thermostability freely available and NanoMelt accessible as a downloadable software and webserver.
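The nested cross-validation idea mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the NanoMelt pipeline: the one-feature ridge model, the synthetic data, the candidate `lambdas`, and all function names are assumptions made for the example, and stratification of the folds is omitted for brevity. The point it shows is that the hyperparameter is chosen using only the outer-training data, so each outer-fold score comes from samples that played no part in model selection.

```python
import random

def k_fold_indices(n_samples, k, seed):
    """Shuffle sample indices and deal them into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def fit_ridge_1d(xs, ys, lam):
    """Closed-form ridge slope for a no-intercept model y ~ w * x (toy stand-in)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

def mse(w, xs, ys):
    """Mean squared error of the slope-only model."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def take(values, idx):
    """Select the entries of values at the given indices."""
    return [values[i] for i in idx]

def nested_cv(xs, ys, lambdas, outer_k=5, inner_k=3, repeats=2, seed=0):
    """Return one held-out MSE per outer fold per repeat."""
    outer_scores = []
    for rep in range(repeats):  # repeated CV: reshuffle with a new seed
        for test_idx in k_fold_indices(len(xs), outer_k, seed + rep):
            held_out = set(test_idx)
            train_idx = [i for i in range(len(xs)) if i not in held_out]
            inner_folds = k_fold_indices(len(train_idx), inner_k, seed + rep)

            def inner_error(lam):
                # Average validation error over inner folds (outer-training data only).
                errors = []
                for fold in inner_folds:
                    val_idx = [train_idx[j] for j in fold]
                    val_set = set(val_idx)
                    fit_idx = [i for i in train_idx if i not in val_set]
                    w = fit_ridge_1d(take(xs, fit_idx), take(ys, fit_idx), lam)
                    errors.append(mse(w, take(xs, val_idx), take(ys, val_idx)))
                return sum(errors) / len(errors)

            best_lam = min(lambdas, key=inner_error)
            w = fit_ridge_1d(take(xs, train_idx), take(ys, train_idx), best_lam)
            outer_scores.append(mse(w, take(xs, test_idx), take(ys, test_idx)))
    return outer_scores

# Synthetic data, purely illustrative: y = 2x plus small Gaussian noise.
rng = random.Random(42)
xs = [i / 10 for i in range(1, 41)]
ys = [2.0 * x + rng.gauss(0.0, 0.1) for x in xs]
scores = nested_cv(xs, ys, lambdas=[0.01, 0.1, 1.0])
print(sum(scores) / len(scores))
```

Reporting the mean and spread of `scores`, rather than the best inner-loop error, is what keeps the estimate from being optimistically biased on small datasets.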
Affiliation(s)
- Aubin Ramon
  - Centre for Misfolding Diseases, Yusuf Hamied Department of Chemistry, University of Cambridge, Cambridge, UK
- Mingyang Ni
  - Centre for Misfolding Diseases, Yusuf Hamied Department of Chemistry, University of Cambridge, Cambridge, UK
- Olga Predeina
  - Centre for Misfolding Diseases, Yusuf Hamied Department of Chemistry, University of Cambridge, Cambridge, UK
- Rebecca Gaffey
  - Centre for Misfolding Diseases, Yusuf Hamied Department of Chemistry, University of Cambridge, Cambridge, UK
- Patrick Kunz
  - Division of Functional Genome Analysis, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Pietro Sormanni
  - Centre for Misfolding Diseases, Yusuf Hamied Department of Chemistry, University of Cambridge, Cambridge, UK
2
Maier A, Cha M, Burgess S, Wang A, Cuellar C, Kim S, Rajan NS, Neyyan J, Sengupta R, O’Connor K, Ott N, Williams A. Predicting purification process fit of monoclonal antibodies using machine learning. MAbs 2025; 17:2439988. [PMID: 39782766 PMCID: PMC11730362 DOI: 10.1080/19420862.2024.2439988] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2024] [Revised: 12/03/2024] [Accepted: 12/04/2024] [Indexed: 01/12/2025] Open
Abstract
In early-stage development of therapeutic monoclonal antibodies, assessment of the viability and ease of their purification typically requires extensive experimentation. However, the work required for upstream protein expression and downstream purification development often conflicts with timeline pressures and material constraints, limiting the number of molecules and process conditions that can reasonably be assessed. Recently, high-throughput batch-binding screen data along with improved molecular descriptors have enabled development of robust quantitative structure-property relationship (QSPR) models that predict monoclonal antibody chromatographic binding behavior from the amino acid sequence. Here, we describe a QSPR strategy for in silico monoclonal antibody purification process fit assessment. Principal Component Analysis is applied to extract a one-dimensional basis for comparison of molecular chromatographic binding behavior from multi-dimensional high-throughput batch-binding screen data. Kernel Ridge Regression is used to predict the first principal component for new molecular sequences. This workflow is demonstrated with a set of 97 monoclonal antibodies for five chromatography resins in two salt types across a range of pH and salt concentrations. Model development benchmarks four descriptor sets from biophysical structural models and protein language models. The investigation illustrates the value QSPR models can provide to purification process fit assessment, and selection of resins and operating conditions from sequence alone.
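The dimensionality-reduction step described in this abstract (collapsing multi-condition binding data onto a first principal component) can be illustrated with a small sketch. It is not the authors' implementation: the power-iteration approach, the function name `first_principal_component`, and the toy matrix are assumptions for the example, and the subsequent Kernel Ridge Regression step is not shown.

```python
import math

def first_principal_component(rows, n_iter=200):
    """Return (direction, scores) of PC1 via power iteration on the covariance matrix.

    rows: one list of measurements per molecule (hypothetical layout).
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    v = [1.0 / math.sqrt(d)] * d  # arbitrary unit-length starting vector
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # no variance along the current direction
        v = [x / norm for x in w]
    # Project each centered row onto the direction: one number per molecule.
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores

# Toy data lying exactly on a line, so PC1 should point along (1, 2).
toy = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
direction, pc1_scores = first_principal_component(toy)
print(direction, pc1_scores)
```

Each molecule ends up with a single PC1 score, which is the kind of one-dimensional target a regression model could then be trained to predict from sequence descriptors.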
Affiliation(s)
- Andrew Maier
  - Department of Purification, Microbiology and Virology, Genentech Inc, South San Francisco, CA, USA
- Minjeong Cha
  - Department of Purification, Microbiology and Virology, Genentech Inc, South San Francisco, CA, USA
- Sean Burgess
  - Department of Purification, Microbiology and Virology, Genentech Inc, South San Francisco, CA, USA
- Amy Wang
  - Department of Purification, Microbiology and Virology, Genentech Inc, South San Francisco, CA, USA
- Carlos Cuellar
  - Department of Purification, Microbiology and Virology, Genentech Inc, South San Francisco, CA, USA
- Soo Kim
  - Department of Purification, Microbiology and Virology, Genentech Inc, South San Francisco, CA, USA
- Neeraja Sundar Rajan
  - Department of Purification, Microbiology and Virology, Genentech Inc, South San Francisco, CA, USA
- Josephine Neyyan
  - Department of Purification, Microbiology and Virology, Genentech Inc, South San Francisco, CA, USA
- Rituparna Sengupta
  - Department of Purification, Microbiology and Virology, Genentech Inc, South San Francisco, CA, USA
- Kelly O’Connor
  - Department of Purification, Microbiology and Virology, Genentech Inc, South San Francisco, CA, USA
- Nicole Ott
  - Department of Purification, Microbiology and Virology, Genentech Inc, South San Francisco, CA, USA
- Ambrose Williams
  - Department of Purification, Microbiology and Virology, Genentech Inc, South San Francisco, CA, USA
3
Fatima Ali N, Khan S, Zahid S. A critical address to advancements and challenges in computational strategies for structural prediction of protein in recent past. Comput Biol Chem 2025; 117:108430. [PMID: 40121710 DOI: 10.1016/j.compbiolchem.2025.108430] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/22/2025] [Revised: 03/11/2025] [Accepted: 03/12/2025] [Indexed: 03/25/2025]
Abstract
Protein structure prediction has undergone significant advancements, driven by the limitations of experimental techniques like X-ray crystallography, NMR, and cryo-EM, which are costly and time-consuming. To bridge the gap between protein sequences and their structures, computational methods have emerged as essential tools. Traditional approaches such as homology modeling, threading, and ab initio folding made progress but often lacked atomic-level precision. The field has been revolutionized by deep learning-based models such as AlphaFold2, RoseTTAFold, and OpenFold, which have demonstrated unprecedented accuracy in predicting protein structures. These AI-driven models leverage vast datasets and neural networks to generate highly reliable structural predictions, sometimes rivaling experimental methods. This review explores the historical evolution of computational protein structure prediction, analyzing the strengths and weaknesses of state-of-the-art models. These models have broad applications in fields such as drug discovery, enzyme engineering, and disease-related protein modeling. However, challenges remain, including the need for extensive training data, computational resource requirements, and difficulties in modeling protein dynamics, intrinsically disordered regions, and protein-protein interactions. Future directions in the field include improving AI models to address current limitations, better integration with experimental techniques, and extending predictions to protein complexes and post-translational modifications. By continuing to refine these methods, computational protein structure prediction will further enhance biomedical research and therapeutic design, reshaping the landscape of structural biology and computational biophysics.
Affiliation(s)
- Nida Fatima Ali
  - Atta-ur-Rahman School of Applied Biosciences, National University of Sciences and Technology (NUST), Islamabad, Pakistan
- Shumaila Khan
  - Atta-ur-Rahman School of Applied Biosciences, National University of Sciences and Technology (NUST), Islamabad, Pakistan
- Saadia Zahid
  - Neurobiology Research Laboratory, Department of Biomedicine, Atta-ur-Rahman School of Applied Biosciences, National University of Sciences and Technology, Islamabad, Pakistan.
4
Liu Z, Qiu WR, Liu Y, Yan H, Pei W, Zhu YH, Qiu J. A comprehensive review of computational methods for Protein-DNA binding site prediction. Anal Biochem 2025; 703:115862. [PMID: 40209920 DOI: 10.1016/j.ab.2025.115862] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/11/2024] [Revised: 03/20/2025] [Accepted: 04/06/2025] [Indexed: 04/12/2025]
Abstract
Accurately identifying protein-DNA binding sites is essential for understanding the molecular mechanisms underlying biological processes, which in turn facilitates advancements in drug discovery and design. While biochemical experiments provide the most accurate way to locate DNA-binding sites, they are generally time-consuming, resource-intensive, and expensive. There is a pressing need to develop computational methods that are both efficient and accurate for DNA-binding site prediction. This study thoroughly reviews and categorizes major computational approaches for predicting DNA-binding sites, including template detection, statistical machine learning, and deep learning-based methods. Fourteen state-of-the-art DNA-binding site prediction models have been benchmarked on 136 non-redundant proteins, where the deep learning-based methods, especially those based on pre-trained large language models, achieve superior performance over the other two categories. Applications of these DNA-binding site prediction methods are also discussed.
Affiliation(s)
- Zi Liu
  - School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen, 333403, China
- Wang-Ren Qiu
  - School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen, 333403, China
- Yan Liu
  - Department of Computer Science, Yangzhou University, 196 Huayang West Road, Yangzhou, 225100, China
- He Yan
  - College of Information Science and Technology & Artificial Intelligence, Nanjing Forestry University, 159 Longpanlu Road, Nanjing, 210037, China
- Wenyi Pei
  - Geriatric Department, Shanghai Baoshan District Wusong Central Hospital, 101 Tongtai North Road, Shanghai, 200940, China.
- Yi-Heng Zhu
  - College of Artificial Intelligence, Nanjing Agricultural University, 1 Weigang Road, Nanjing, 210095, China.
- Jing Qiu
  - Information Department, The First Affiliated Hospital of Naval Medical University, 168 Changhai Road, Shanghai, 200433, China.
5
Crauwels C, Díaz A, Vranken W. GPCRchimeraDB: A Database of Chimeric G Protein-Coupled Receptors (GPCRs) to Assist Their Design. J Mol Biol 2025; 437:169164. [PMID: 40268234 DOI: 10.1016/j.jmb.2025.169164] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2024] [Revised: 04/11/2025] [Accepted: 04/16/2025] [Indexed: 04/25/2025]
Abstract
G protein-coupled receptors (GPCRs) are membrane proteins crucial to numerous diseases, yet many remain poorly characterized and untargeted by drugs. Chimeric GPCRs have emerged as valuable tools for elucidating GPCR function by facilitating the identification of signaling pathways, resolving structures, and discovering novel ligands of poorly understood GPCRs. Such chimeric GPCRs are obtained by merging a well- and less-well-characterized GPCR at the intracellular limits of their transmembrane regions or intracellular loops, leveraging knowledge transfer from the well-characterized GPCR. However, despite the engineering of over 200 chimeric GPCRs to date, the design process remains largely trial-and-error and lacks a standardized approach. To address this gap, we introduce GPCRchimeraDB (https://www.bio2byte.be/gpcrchimeradb/), the first comprehensive database dedicated to chimeric GPCRs. It catalogs 212 chimeric receptors, identified through literature review, and includes 1,755 class A natural GPCRs, enabling connections between chimeras and their parent receptors while facilitating the exploration of novel parent combinations. Both chimeric and natural GPCR entries are extensively described at the sequence, structural, and biophysical level through a range of visualization tools, with annotations from resources like UniProt and GPCRdb and predictions from AlphaFold2 and b2btools. Additionally, GPCRchimeraDB offers a GPCR sequence aligner and a feature comparator to investigate differences between natural and chimeric receptors. It also provides design guidelines to support rational chimera engineering. GPCRchimeraDB is therefore a resource to facilitate and optimize the design of new chimeras, so helping to gain insights into poorly characterized receptors and contributing to advances in GPCR therapeutic development.
Affiliation(s)
- Charlotte Crauwels
  - Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, Brussels, Belgium; Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, Belgium; AI Lab, Vrije Universiteit Brussel, Brussels, Belgium
- Adrián Díaz
  - Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, Brussels, Belgium; Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, Belgium; AI Lab, Vrije Universiteit Brussel, Brussels, Belgium
- Wim Vranken
  - Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, Brussels, Belgium; Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, Belgium; AI Lab, Vrije Universiteit Brussel, Brussels, Belgium; Chemistry Department, Vrije Universiteit Brussel, Brussels, Belgium; Biomedical Sciences, Vrije Universiteit Brussel, Brussels, Belgium.
6
Zheng Y, Young ND, Wang T, Chang BCH, Song J, Gasser RB. Systems biology of Haemonchus contortus - Advancing biotechnology for parasitic nematode control. Biotechnol Adv 2025; 81:108567. [PMID: 40127743 DOI: 10.1016/j.biotechadv.2025.108567] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2025] [Revised: 03/19/2025] [Accepted: 03/21/2025] [Indexed: 03/26/2025]
Abstract
Parasitic nematodes represent a substantial global burden, impacting animal health, agriculture and economies worldwide. Of these worms, Haemonchus contortus - a blood-feeding nematode of ruminants - is a major pathogen and a model for molecular and applied parasitology research. This review synthesises some key advances in understanding the molecular biology, genetic diversity and host-parasite interactions of H. contortus, highlighting its value for comparative studies with the free-living nematode Caenorhabditis elegans. Key themes include recent developments in genomic, transcriptomic and proteomic technologies and resources, which are illuminating critical molecular pathways, including the ubiquitination pathway, protease/protease inhibitor systems and the secretome of H. contortus. Some of these insights are providing a foundation for identifying essential genes and exploring their potential as targets for novel anthelmintics or vaccines, particularly in the face of widespread anthelmintic resistance. Advanced bioinformatic tools, such as machine learning (ML) algorithms and artificial intelligence (AI)-driven protein structure prediction, are enhancing annotation capabilities, facilitating and accelerating analyses of gene functions, and biological pathways and processes. This review also discusses the integration of these tools with cutting-edge single-cell sequencing and spatial transcriptomics to dissect host-parasite interactions at the cellular level. The discussion emphasises the importance of curated databases, improved culture systems and functional genomics platforms to translate molecular discoveries into practical outcomes, such as novel interventions. New research findings and resources not only advance research on H. contortus and related nematodes but may also pave the way for innovative solutions to the global challenges with anthelmintic resistance.
Affiliation(s)
- Yuanting Zheng
  - Department of Veterinary Biosciences, Melbourne Veterinary School, Faculty of Science, The University of Melbourne, Parkville, Victoria 3010, Australia
- Neil D Young
  - Department of Veterinary Biosciences, Melbourne Veterinary School, Faculty of Science, The University of Melbourne, Parkville, Victoria 3010, Australia
- Tao Wang
  - Department of Veterinary Biosciences, Melbourne Veterinary School, Faculty of Science, The University of Melbourne, Parkville, Victoria 3010, Australia
- Bill C H Chang
  - Department of Veterinary Biosciences, Melbourne Veterinary School, Faculty of Science, The University of Melbourne, Parkville, Victoria 3010, Australia
- Jiangning Song
  - Faculty of IT, Department of Data Science and AI, Monash University, Victoria, Australia; Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Victoria, Australia; Monash Data Futures Institute, Monash University, Victoria, Australia
- Robin B Gasser
  - Department of Veterinary Biosciences, Melbourne Veterinary School, Faculty of Science, The University of Melbourne, Parkville, Victoria 3010, Australia.
7
Gao S, Jia Y, Cui F, Xu J, Meng Y, Wei L, Zhang Q, Zou Q, Zhang Z. PLPTP: A Motif-based Interpretable Deep Learning Framework Based on Protein Language Models for Peptide Toxicity Prediction. J Mol Biol 2025; 437:169115. [PMID: 40158838 DOI: 10.1016/j.jmb.2025.169115] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2025] [Revised: 03/01/2025] [Accepted: 03/25/2025] [Indexed: 04/02/2025]
Abstract
Peptide toxicity prediction holds significant importance in drug development and biotechnology, as accurately identifying toxic peptide sequences is crucial for designing safer peptide-based drugs. This study proposes a deep learning-based model for peptide toxicity prediction, integrating Evolutionary Scale Modeling (ESM2), Bidirectional Long Short-Term Memory (BiLSTM), and Deep Neural Network (DNN). The ESM2 model captures evolutionary information from peptide sequences, providing a rich context for the sequences; the BiLSTM network focuses on extracting contextual dependencies, thereby capturing long-range dependencies within the sequence; and the DNN further classifies the extracted features to achieve the final toxicity prediction. To enhance the reliability and transparency of the model, we also conducted motif analysis to identify key patterns in the data, which helps to explain the model's attention mechanism and its classification performance. To address the class imbalance in the dataset, we employed Focal Loss as the loss function, which enhances the model's ability to identify minority class samples by reducing the contribution of easily classified samples. Experimental results demonstrate that the proposed model performs exceptionally well across multiple evaluation metrics, particularly in handling imbalanced data, achieving significant improvements over traditional methods. This result highlights the model's potential to improve the accuracy of peptide toxicity prediction and its valuable role in drug development and biotechnology research. The PLPTP web server is available at https://www.bioai-lab.com/PLPTP.
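The effect of Focal Loss described in this abstract, down-weighting easily classified samples, can be seen in a short sketch of the standard binary focal loss. The `gamma` and `alpha` defaults below are commonly used illustrative values, not settings reported for PLPTP, and the function names are hypothetical.

```python
import math

EPS = 1e-12  # guard against log(0)

def cross_entropy(prob_positive, label):
    """Plain binary cross-entropy for one sample."""
    p_t = prob_positive if label == 1 else 1.0 - prob_positive
    return -math.log(min(max(p_t, EPS), 1.0))

def binary_focal_loss(prob_positive, label, gamma=2.0, alpha=0.25):
    """Binary focal loss for one sample: -alpha_t * (1 - p_t)**gamma * log(p_t).

    prob_positive: predicted probability of the positive (e.g. toxic) class.
    label: 1 for positive, 0 for negative.
    gamma, alpha: illustrative defaults (assumed, not from the paper).
    """
    p_t = prob_positive if label == 1 else 1.0 - prob_positive
    alpha_t = alpha if label == 1 else 1.0 - alpha
    p_t = min(max(p_t, EPS), 1.0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# A confidently correct sample versus a poorly classified one (both positive).
easy = binary_focal_loss(0.95, 1)
hard = binary_focal_loss(0.30, 1)
print(easy, hard)
```

Relative to cross-entropy, the `(1 - p_t) ** gamma` factor shrinks the loss of the easy sample far more than that of the hard one, so gradient signal concentrates on the samples the model still gets wrong. Setting `gamma=0` recovers alpha-weighted cross-entropy.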
Affiliation(s)
- Shun Gao
  - School of Computer Science and Technology, Hainan University, Haikou 570228, China
- Yanna Jia
  - School of Computer Science and Technology, Hainan University, Haikou 570228, China
- Feifei Cui
  - School of Computer Science and Technology, Hainan University, Haikou 570228, China
- Junlin Xu
  - School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan 430081 Hubei, China
- Yajie Meng
  - School of Computer Science and Artificial Intelligence, Wuhan Textile University, Wuhan 430200 Hubei, China
- Leyi Wei
  - Centre for Artificial Intelligence Driven Drug Discovery, Faculty of Applied Science, Macao Polytechnic University, Macao Special Administrative Region of China; School of Informatics, Xiamen University, Xiamen, China
- Qingchen Zhang
  - School of Computer Science and Technology, Hainan University, Haikou 570228, China
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China; Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, China
- Zilong Zhang
  - School of Computer Science and Technology, Hainan University, Haikou 570228, China.
8
Tariq A, Shoaib M, Qu L, Shoukat S, Nan X, Song J. Exploring 4th generation EGFR inhibitors: A review of clinical outcomes and structural binding insights. Eur J Pharmacol 2025; 997:177608. [PMID: 40216184 DOI: 10.1016/j.ejphar.2025.177608] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2025] [Revised: 03/24/2025] [Accepted: 04/07/2025] [Indexed: 04/18/2025]
Abstract
Epidermal growth factor receptor (EGFR) is a potential target for anticancer therapies and plays a crucial role in cell growth, survival, and metastasis. EGFR gene mutations trigger aberrant signaling, leading to non-small cell lung cancer (NSCLC). Tyrosine kinase inhibitors (TKIs) effectively target these mutations to treat NSCLC. While the first three generations of EGFR TKIs have been proven effective, the emergence of the EGFR-C797S resistance mutation poses a new challenge. To address this, various synthetic EGFR TKIs have been developed. In this review, we have summarized the EGFR TKIs reported in the past five years, focusing on their clinical outcomes and structure-activity relationship analysis. We have also explored binding modes and interactions between the binding pocket and ligands to provide insights into the mechanisms of these inhibitors, which contribute to advancements in targeted cancer therapy. Additionally, artificial intelligence-driven methods, including recursive neural networks and reinforcement learning, have revolutionized EGFR inhibitor design by facilitating rapid screening, prediction of EGFR mutations, and generation of novel compounds.
Affiliation(s)
- Amina Tariq
  - College of Chemistry, Pingyuan Laboratory, and State Key Laboratory of Cotton Bio-breeding and Integrated Utilization, Zhengzhou University, Zhengzhou, Henan, 450001, China
- Muhammad Shoaib
  - College of Chemistry, Pingyuan Laboratory, and State Key Laboratory of Cotton Bio-breeding and Integrated Utilization, Zhengzhou University, Zhengzhou, Henan, 450001, China
- Lingbo Qu
  - College of Chemistry, Pingyuan Laboratory, and State Key Laboratory of Cotton Bio-breeding and Integrated Utilization, Zhengzhou University, Zhengzhou, Henan, 450001, China; Institute of Chemistry, Henan Academy of Science, Zhengzhou, Henan, 450046, China
- Sana Shoukat
  - Key Laboratory for Liquid-Solid Structural Evolution and Processing of Materials (Ministry of Education), Shandong University, Jinan, 250061, China
- Xiaofei Nan
  - School of Computer Science and Artificial Intelligence, Zhengzhou University, Zhengzhou, Henan, 450001, China.
- Jinshuai Song
  - College of Chemistry, Pingyuan Laboratory, and State Key Laboratory of Cotton Bio-breeding and Integrated Utilization, Zhengzhou University, Zhengzhou, Henan, 450001, China.
9
Zhang L, Xiong S, Xu L, Liang J, Zhao X, Zhang H, Tan X. Leveraging protein language models for robust antimicrobial peptide detection. Methods 2025; 238:19-26. [PMID: 40049432 DOI: 10.1016/j.ymeth.2025.03.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/15/2025] [Revised: 02/09/2025] [Accepted: 03/03/2025] [Indexed: 03/15/2025] Open
Abstract
Antimicrobial peptides (AMPs) are promising candidates for addressing the global challenge of antibiotic resistance due to their broad-spectrum antimicrobial properties. Traditional AMP identification methods, while effective, are labor-intensive and time-consuming. Recent advancements in deep learning and large language models (LLMs), especially protein language models (PLMs), present a transformative approach for AMP prediction. In this study, we propose PLAPD, a novel framework leveraging a pre-trained ESM2 protein language model for AMP classification. In addition, PLAPD combines local feature extraction via convolutional layers with global feature extraction via a residual Transformer module. We benchmarked PLAPD against state-of-the-art AMP prediction models using a dataset comprising 8,268 peptide sequences, achieving superior performance in Accuracy (0.87), Precision (0.9359), Specificity (0.9456), MCC (0.7486), and AUC (0.9225). The results highlight the potential of PLAPD as a high-throughput and accurate tool for AMP discovery.
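For readers unfamiliar with the threshold-based metrics reported in this abstract, the sketch below computes Accuracy, Precision, Specificity, and the Matthews correlation coefficient (MCC) from a binary confusion matrix using their standard definitions. The counts are hypothetical and are not taken from the PLAPD benchmark; the function name is an assumption for the example. AUC is omitted because it requires ranked scores rather than counts.

```python
import math

def binary_metrics(tp, fp, tn, fn):
    """Standard confusion-matrix metrics; returns 0.0 where a ratio is undefined."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "specificity": specificity,
        "mcc": mcc,
    }

# Hypothetical counts for illustration only.
metrics = binary_metrics(tp=80, fp=10, tn=90, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

MCC uses all four cells of the confusion matrix, which is why it is often reported alongside accuracy when the positive and negative classes are imbalanced.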
Affiliation(s)
- Lichao Zhang
  - School of Intelligent Manufacturing and Equipment, Shenzhen Institute of Information Technology, Shenzhen 518172, China.
- Shuwen Xiong
  - Faculty of Applied Sciences, Macao Polytechnic University, R. de Luís Gonzaga Gomes, Macao 999078, China
- Lei Xu
  - School of Electronic and Communication Engineering, Shenzhen Polytechnic University, Shenzhen 518055, China
- Junwei Liang
  - School of Computer and Software, Shenzhen Institute of Information Technology, Shenzhen 518172, China
- Xuehua Zhao
  - School of Digital Media, Shenzhen Institute of Information Technology, Shenzhen 518172, China
- Honglai Zhang
  - Thyroid Surgery Department, The Affiliated Hospital of Qingdao University, Qingdao 266035, China
- Xu Tan
  - School of Artificial Intelligence, Shenzhen Institute of Information Technology, Shenzhen 518172, China.
10
Han Y, Zhang SW, Shi MH, Zhang QQ, Li Y, Cui X. Predicting protein-protein interaction with interpretable bilinear attention network. Comput Methods Programs Biomed 2025; 265:108756. [PMID: 40174317 DOI: 10.1016/j.cmpb.2025.108756] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2024] [Revised: 03/27/2025] [Accepted: 03/27/2025] [Indexed: 04/04/2025]
Abstract
BACKGROUND AND OBJECTIVE: Protein-protein interactions (PPIs) play key roles in myriad biological processes, and identifying them helps in understanding protein function and disease pathology. Identifying PPIs and their interaction types through wet-lab experiments is costly and time-consuming. Therefore, some computational methods (e.g., sequence-based deep learning methods) have been proposed to predict PPIs. However, these methods predominantly focus on protein sequence information and neglect protein structure information, even though protein structure is closely related to function. In addition, current PPI prediction methods that do introduce protein structure information use independent encoders to learn sequence and structure representations from protein sequences and structures, respectively, without explicitly learning the important local interaction representation of the two proteins, which makes the prediction results hard to interpret.
METHODS: Considering that current protein structure prediction methods (e.g., AlphaFold2) can accurately predict protein 3D structures and have made a large number of protein 3D structures available, here we present a novel end-to-end framework (called PPI-BAN) to predict PPIs and their interaction types by integrating protein sequence information and 3D structure information. PPI-BAN uses a one-dimensional convolution operation (Conv1D) to extract protein sequence features, employs a GeomEtry-Aware Relational Graph Neural Network (GearNet) to learn protein 3D structure features, and adopts a deep bilinear attention network (BAN) to learn the joint features between one protein sequence and its 3D structure. The sequence features, structure features and joint features are concatenated and fed into a fully connected network for predicting PPIs and their interaction types.
RESULTS: Experimental results show that PPI-BAN achieves the best overall performance against other state-of-the-art methods.
CONCLUSIONS: PPI-BAN can effectively predict PPIs and their interaction types, and identify the significant interaction sites by computing attention weight maps and mapping them to specific amino acid residues.
Affiliation(s)
- Yong Han
  - MOE Key Laboratory of Information Fusion Technology, School of Automation, Northwestern Polytechnical University, Xi'an, 710072, China; Henan Judicial Police Vocational College, Zhengzhou, 450046, China
- Shao-Wu Zhang
  - MOE Key Laboratory of Information Fusion Technology, School of Automation, Northwestern Polytechnical University, Xi'an, 710072, China.
- Ming-Hui Shi
  - MOE Key Laboratory of Information Fusion Technology, School of Automation, Northwestern Polytechnical University, Xi'an, 710072, China
- Qing-Qing Zhang
  - MOE Key Laboratory of Information Fusion Technology, School of Automation, Northwestern Polytechnical University, Xi'an, 710072, China
- Yi Li
  - Henan Judicial Police Vocational College, Zhengzhou, 450046, China
- Xiaodong Cui
  - School of Marine Science and Technology, Northwestern Polytechnical University, Xi'an 710072, China.
11
Berndsen CE, Storm AR, Sardelli AM, Hossain SR, Clermont KR, McFather LM, Connor MA, Monroe JD. The Pseudoenzyme β-Amylase9 From Arabidopsis Activates α-Amylase3: A Possible Mechanism to Promote Stress-Induced Starch Degradation. Proteins 2025; 93:1189-1201. [PMID: 39846389 PMCID: PMC12046210 DOI: 10.1002/prot.26803] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2024] [Revised: 01/09/2025] [Accepted: 01/12/2025] [Indexed: 01/24/2025]
Abstract
Starch accumulation in plants provides carbon for nighttime use, for regrowth after periods of dormancy, and for times of stress. Both α- and β-amylases (AMYs and BAMs, respectively) catalyze starch hydrolysis, but their functional roles are unclear. Moreover, the presence of catalytically inactive amylases that show starch excess phenotypes when deleted presents questions on how starch degradation is regulated. Plants lacking one of these catalytically inactive β-amylases, BAM9, have enhanced starch accumulation when combined with mutations in BAM1 and BAM3, the primary starch degrading BAMs in response to stress and at night, respectively. BAM9 has been reported to be transcriptionally induced by stress although the mechanism for BAM9 function is unclear. From yeast two-hybrid experiments, we identified the plastid-localized AMY3 as a potential interaction partner for BAM9. We found that BAM9 interacted with AMY3 in vitro and that BAM9 enhances AMY3 activity about three-fold. Modeling of the AMY3-BAM9 complex predicted a previously undescribed alpha-alpha hairpin in AMY3 that could serve as a potential interaction site. Additionally, AMY3 lacking the alpha-alpha hairpin is unaffected by BAM9. Structural analysis of AMY3 showed that it can form a homodimer in solution and that BAM9 appears to replace one of the AMY3 monomers to form a heterodimer. The presence of both BAM9 and AMY3 in many vascular plant lineages, along with model-based evidence that they heterodimerize, suggests that the interaction is conserved. Collectively these data suggest that BAM9 is a pseudoamylase that activates AMY3 in response to cellular stress, possibly facilitating stress recovery.
Affiliation(s)
- Amanda R. Storm
- Department of Biology, Western Carolina University, Cullowhee, North Carolina, USA
- Department of Biology, James Madison University, Harrisonburg, Virginia, USA
- Angelina M. Sardelli
- Department of Chemistry and Biochemistry, James Madison University, Harrisonburg, Virginia, USA
- Sheikh R. Hossain
- Department of Biology, James Madison University, Harrisonburg, Virginia, USA
- Luke M. McFather
- Department of Chemistry and Biochemistry, James Madison University, Harrisonburg, Virginia, USA
- Mafe A. Connor
- Department of Chemistry and Biochemistry, James Madison University, Harrisonburg, Virginia, USA
- Jonathan D. Monroe
- Department of Chemistry and Biochemistry, James Madison University, Harrisonburg, Virginia, USA
- Department of Biology, James Madison University, Harrisonburg, Virginia, USA
12
Zhang R, Li Y, Ali U, Li Y, Zhang H. Unveiling novel antioxidant peptides from silk fibroin proteins: An integrated in silico and in vitro study. Food Chem 2025; 476:143292. [PMID: 39977993 DOI: 10.1016/j.foodchem.2025.143292] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/29/2024] [Revised: 02/03/2025] [Accepted: 02/07/2025] [Indexed: 02/22/2025]
Abstract
Antioxidant peptides exhibit significant potential in combating degenerative diseases by effectively mitigating oxidative stress. In this study, we developed a machine-learning model for screening antioxidant peptides, achieving a Matthews correlation coefficient of 0.892 ± 0.033 and surpassing the state-of-the-art (SOTA) models. Through in silico screening, seven novel antioxidant peptides derived from silk fibroin proteins (SFP) were identified (i.e., DEDY, NEEY, GAGRGY, ITRNHDQCR, VDHNL, QGDY, and DDY) and subsequently synthesized. Among them, all except GAGRGY and QGDY demonstrated notable antioxidant activity in ABTS free radical assays, with activities 1.26-3.25 times higher than that of glutathione. All seven antioxidant peptides effectively protected erythrocytes from oxidative damage. This protective capacity is likely attributed to their ability to bind free radicals and regulate the Keap1-Nrf2 pathway. Overall, this study presents an effective strategy for discovering antioxidant peptides from SFP and provides strong experimental validation of the effectiveness of the machine learning model.
Affiliation(s)
- Ruihao Zhang
- College of Biosystems Engineering and Food Science, Zhejiang University, Hangzhou 310058, PR China; Innovation Center of Yangtze River Delta, Zhejiang University, Jiaxing 314100, PR China
- Yonghui Li
- Department of Grain Science and Industry, Kansas State University, Manhattan, KS 66506, USA
- Usman Ali
- College of Biosystems Engineering and Food Science, Zhejiang University, Hangzhou 310058, PR China
- Yang Li
- College of Biosystems Engineering and Food Science, Zhejiang University, Hangzhou 310058, PR China
- Hui Zhang
- College of Biosystems Engineering and Food Science, Zhejiang University, Hangzhou 310058, PR China
13
|
Djulbegovic MB, Gonzalez DJT, Laratelli L, Antonietti M, Uversky VN, Shields CL, Karp CL. A Computational Approach to Characterize the Protein S-Mer Tyrosine Kinase (PROS1-MERTK) Protein-Protein Interaction Dynamics. Cell Biochem Biophys 2025; 83:1743-1755. [PMID: 39535659 PMCID: PMC12089150 DOI: 10.1007/s12013-024-01582-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 09/23/2024] [Indexed: 11/16/2024]
Abstract
Protein S (PROS1) has recently been identified as a ligand for the TAM receptor MERTK, influencing immune response and cell survival. The PROS1-MERTK interaction plays a role in cancer progression, promoting immune evasion and metastasis in multiple cancers by fostering a tumor-supportive microenvironment. Despite its importance, limited structural insights into this interaction underscore the need for computational studies to explore their binding dynamics, potentially guiding targeted therapies. In this study, we investigated the PROS1-MERTK interaction using advanced computational analyses to support immunotherapy research. High-resolution structural models from ColabFold, an AlphaFold2 adaptation, provided a baseline structure, allowing us to examine the PROS1-MERTK interface with ChimeraX and map residue interactions through Van der Waals criteria. Molecular dynamics (MD) simulations were conducted in GROMACS over 100 ns to assess stability and conformational changes using RMSD, RMSF, and radius of gyration (Rg). The PROS1-MERTK interface was predicted to contain a heterogeneous mix of amino acid contacts, with lysine and leucine as frequent participants. MD simulations demonstrated prominent early structural shifts, stabilizing after approximately 50 ns with small conformational shifts occurring as the simulation completed. In addition, there are various regions in each protein that are predicted to have greater conformational fluctuations as compared to others, which may represent attractive areas to target to halt the progression of the interaction. These insights deepen our understanding of the PROS1-MERTK interaction role in immune modulation and tumor progression, unveiling potential targets for cancer immunotherapy.
Affiliation(s)
- Mak B Djulbegovic
- Wills Eye Hospital, Thomas Jefferson University, Philadelphia, PA, USA
- Vladimir N Uversky
- Department of Molecular Medicine and USF Health Byrd Alzheimer's Research Institute, Morsani College of Medicine, University of South Florida, Tampa, FL, 33612, USA
- Carol L Shields
- Wills Eye Hospital, Thomas Jefferson University, Philadelphia, PA, USA
- Ocular Oncology Service, Wills Eye Hospital, Thomas Jefferson University, Philadelphia, PA, USA
- Carol L Karp
- Bascom Palmer Eye Institute, University of Miami, Miami, FL, USA
14
|
Dennler O, Ryan CJ. Evaluating sequence and structural similarity metrics for predicting shared paralog functions. NAR Genom Bioinform 2025; 7:lqaf051. [PMID: 40290317 PMCID: PMC12034104 DOI: 10.1093/nargab/lqaf051] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2024] [Revised: 03/07/2025] [Accepted: 04/15/2025] [Indexed: 04/30/2025] Open
Abstract
Gene duplication is the primary source of new genes, resulting in most genes having identifiable paralogs. Over time, paralog pairs may diverge in some respects but many retain the ability to perform the same functional role. Protein sequence identity is often used as a proxy for functional similarity and can predict shared functions between paralogs as revealed by synthetic lethal experiments. However, the advent of alternative protein representations, including embeddings from protein language models (PLMs) and predicted structures from AlphaFold, raises the possibility that alternative similarity metrics could better capture functional similarity between paralogs. Here, using two species (budding yeast and human) and two different definitions of shared functionality (shared protein-protein interactions and synthetic lethality), we evaluated a variety of alternative similarity metrics. For some tasks, predicted structural similarity or PLM similarity outperform sequence identity, but more importantly these similarity metrics are not redundant with sequence identity, i.e. combining them with sequence identity leads to improved predictions of shared functionality. By adding contextual features, representing similarity to homologous proteins within and across species, we can significantly enhance our predictions of shared paralog functionality. Overall, our results suggest that alternative similarity metrics capture complementary aspects of functional similarity beyond sequence identity alone.
Affiliation(s)
- Olivier Dennler
- School of Medicine, University College Dublin, Dublin 4, D04 V1W8, Ireland
- School of Computer Science, University College Dublin, Dublin 4, D04 V1W8, Ireland
- Conway Institute, University College Dublin, Dublin 4, D04 V1W8, Ireland
- Colm J Ryan
- School of Medicine, University College Dublin, Dublin 4, D04 V1W8, Ireland
- School of Computer Science, University College Dublin, Dublin 4, D04 V1W8, Ireland
- Conway Institute, University College Dublin, Dublin 4, D04 V1W8, Ireland
15
|
Osipov SD, Zinovev EV, Anuchina AA, Kuzmin AS, Minaeva AV, Ryzhykau YL, Vlasov AV, Gushchin IY. High-Throughput Evaluation of Natural Diversity of F-Type ATP Synthase Rotor Ring Stoichiometries. Proteins 2025; 93:1128-1140. [PMID: 39810702 DOI: 10.1002/prot.26790] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2024] [Revised: 12/10/2024] [Accepted: 12/16/2024] [Indexed: 01/16/2025]
Abstract
Adenosine triphosphate (ATP) synthases are large enzymes present in every living cell. They consist of a transmembrane and a soluble domain, each comprising multiple subunits. The transmembrane part contains an oligomeric rotor ring (c-ring), whose stoichiometry defines the ratio between the number of synthesized ATP molecules and the number of ions transported through the membrane. Currently, c-rings of F-Type ATP synthases consisting of 8-17 (except 16) subunits have been experimentally demonstrated, but it is not known whether other stoichiometries are present in natural organisms. Here, we present an easy-to-use high-throughput computational approach based on AlphaFold that allows us to estimate the stoichiometry of all homo-oligomeric c-rings whose sequences are present in genomic databases. We validate the approach on the available experimental data, obtaining a correlation as high as 0.94 for the reference dataset, and use it to predict the existence of c-rings with stoichiometry varying at least from 8 to 27. We then conduct molecular dynamics simulations of two c-rings with stoichiometry above 17 to corroborate the machine learning-based predictions. Our work strongly suggests the existence of rotor rings with previously undescribed high stoichiometry in natural organisms and highlights the utility of AlphaFold-based approaches for studying homo-oligomeric proteins.
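To make the link between c-ring stoichiometry and the ion-to-ATP ratio concrete, here is a minimal sketch. It assumes the standard textbook picture, which the abstract does not spell out: one full rotor turn translocates one ion per c-subunit and yields three ATP molecules. The function name and the constant are illustrative only.

```python
from fractions import Fraction

# Assumption of this sketch: three ATP molecules are made per full rotor turn.
ATP_PER_TURN = 3


def ions_per_atp(c_ring_subunits: int) -> Fraction:
    """Return the ion-to-ATP ratio for a c-ring with the given subunit count."""
    if c_ring_subunits <= 0:
        raise ValueError("c-ring stoichiometry must be a positive integer")
    # One ion is assumed to pass per c-subunit per turn.
    return Fraction(c_ring_subunits, ATP_PER_TURN)


if __name__ == "__main__":
    # 8 and 17 bracket the experimentally shown range; 27 is the upper
    # bound of the predicted range quoted in the abstract.
    for n in (8, 17, 27):
        print(f"c{n}: {float(ions_per_atp(n)):.2f} ions per ATP")
```

Under these assumptions, a larger ring spends more ions on each ATP, which is why the predicted range of 8 to 27 subunits corresponds to a wide spread in bioenergetic cost.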
Affiliation(s)
- Stepan D Osipov
- Research Center for Molecular Mechanisms of Aging and Age-Related Diseases, Moscow Institute of Physics and Technology, Dolgoprudny, Russia
- Egor V Zinovev
- Research Center for Molecular Mechanisms of Aging and Age-Related Diseases, Moscow Institute of Physics and Technology, Dolgoprudny, Russia
- Arina A Anuchina
- Research Center for Molecular Mechanisms of Aging and Age-Related Diseases, Moscow Institute of Physics and Technology, Dolgoprudny, Russia
- Alexander S Kuzmin
- Research Center for Molecular Mechanisms of Aging and Age-Related Diseases, Moscow Institute of Physics and Technology, Dolgoprudny, Russia
- Andronika V Minaeva
- Research Center for Molecular Mechanisms of Aging and Age-Related Diseases, Moscow Institute of Physics and Technology, Dolgoprudny, Russia
- Yury L Ryzhykau
- Research Center for Molecular Mechanisms of Aging and Age-Related Diseases, Moscow Institute of Physics and Technology, Dolgoprudny, Russia
- Frank Laboratory for Neutron Physics, Joint Institute for Nuclear Research, Dubna, Russia
- Alexey V Vlasov
- Research Center for Molecular Mechanisms of Aging and Age-Related Diseases, Moscow Institute of Physics and Technology, Dolgoprudny, Russia
- Frank Laboratory for Neutron Physics, Joint Institute for Nuclear Research, Dubna, Russia
- Ivan Yu Gushchin
- Research Center for Molecular Mechanisms of Aging and Age-Related Diseases, Moscow Institute of Physics and Technology, Dolgoprudny, Russia
16
|
Khatri M, Shanmugam NRS, Zhang X, Patel RSKR, Yin Y. AcrDB update: Predicted 3D structures of anti-CRISPRs in human gut viromes. Protein Sci 2025; 34:e70177. [PMID: 40400348 PMCID: PMC12095918 DOI: 10.1002/pro.70177] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/30/2024] [Revised: 05/07/2025] [Accepted: 05/09/2025] [Indexed: 05/23/2025]
Abstract
Anti-CRISPR (Acr) proteins play a key role in phage-host interactions and hold great promise for advancing genome-editing technologies. However, finding new Acrs has been challenging due to their low sequence similarity. Recent advances in protein structure prediction have opened new pathways for Acr discovery by using 3D structure similarity. This study presents an updated AcrDB, with the following new features not available in other databases: (1) predicted Acrs from human gut virome databases, (2) Acr structures predicted by AlphaFold2, (3) a structural similarity search function to allow users to submit new sequences and structures to search against 3D structures of experimentally known Acrs. The updated AcrDB contains predicted 3D structures of 795 candidate Acrs with structural similarity (TM-score ≥0.7) to known Acrs supported by at least two of the three non-sequence similarity-based tools (TM-Vec, Foldseek, AcrPred). Among these candidate Acrs, 121 are supported by all three tools. AcrDB also includes 3D structures of 122 experimentally characterized Acr proteins. The 121 most confident candidate Acrs were combined with the 122 known Acrs and clustered into 163 sequence similarity-based Acr families. The 163 families were further subject to a structure similarity-based hierarchical clustering, revealing structural similarity between 44 candidate Acr (cAcr) families and 119 known Acr families. The bacterial hosts of these 163 Acr families are mainly from Bacillota, Pseudomonadota, and Bacteroidota, which are all dominant gut bacterial phyla. Many of these 163 Acr families are also co-localized in Acr operons. All the data and visualization are provided on our website: https://pro.unl.edu/AcrDB.
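The selection rule described above (TM-score of at least 0.7 to a known Acr, plus support from at least two of the three tools) can be sketched as a simple filter. The record layout, the candidate names, and the scores below are hypothetical, and how each tool defines "support" is an assumption; this is not the AcrDB pipeline itself.

```python
from dataclasses import dataclass

TOOLS = frozenset({"TM-Vec", "Foldseek", "AcrPred"})


@dataclass(frozen=True)
class Candidate:
    name: str                # hypothetical identifier
    tm_score: float          # best TM-score against any known Acr
    supported_by: frozenset  # tools (subset of TOOLS) that flagged the protein


def select_candidates(candidates, tm_cutoff=0.7, min_tools=2):
    """Keep candidates that pass the TM-score cutoff and the tool consensus."""
    return [
        c for c in candidates
        if c.tm_score >= tm_cutoff and len(c.supported_by & TOOLS) >= min_tools
    ]


if __name__ == "__main__":
    # Made-up records, for illustration only.
    demo = [
        Candidate("cand_A", 0.82, frozenset({"TM-Vec", "Foldseek", "AcrPred"})),
        Candidate("cand_B", 0.74, frozenset({"Foldseek"})),
        Candidate("cand_C", 0.65, frozenset({"TM-Vec", "AcrPred"})),
    ]
    print([c.name for c in select_candidates(demo)])
    # Requiring all three tools mirrors the stricter, most confident subset.
    print([c.name for c in select_candidates(demo, min_tools=3)])
```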
Affiliation(s)
- Minal Khatri
- Nebraska Food for Health Center, Department of Food Science and Technology, University of Nebraska–Lincoln, Lincoln, Nebraska, USA
- N. R. Siva Shanmugam
- Nebraska Food for Health Center, Department of Food Science and Technology, University of Nebraska–Lincoln, Lincoln, Nebraska, USA
- Xinpeng Zhang
- Nebraska Food for Health Center, Department of Food Science and Technology, University of Nebraska–Lincoln, Lincoln, Nebraska, USA
- Revanth Sai Kumar Reddy Patel
- Nebraska Food for Health Center, Department of Food Science and Technology, University of Nebraska–Lincoln, Lincoln, Nebraska, USA
- Yanbin Yin
- Nebraska Food for Health Center, Department of Food Science and Technology, University of Nebraska–Lincoln, Lincoln, Nebraska, USA
17
|
Ninot-Pedrosa M, Pálfy G, Razmazma H, Crowley J, Fogeron ML, Bersch B, Barnes A, Brutscher B, Monticelli L, Böckmann A, Meier BH, Lecoq L. NMR Structural Characterization of SARS-CoV-2 ORF6 Reveals an N-Terminal Membrane Anchor. J Am Chem Soc 2025; 147:17668-17681. [PMID: 40372136 DOI: 10.1021/jacs.4c17030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/16/2025]
Abstract
SARS-CoV-2, the virus responsible for the COVID-19 pandemic, encodes several accessory proteins, among which ORF6, a potent interferon inhibitor, is recognized as one of the most cytotoxic. Here, we investigated the structure, oligomeric state, and membrane interactions of ORF6 using NMR spectroscopy and molecular dynamics simulations. Using chemical-shift-ROSETTA, we show that ORF6 in proteoliposomes adopts a straight α-helical structure with an extended, rigid N-terminal part and flexible C-terminal residues. Cross-linking experiments indicate that ORF6 forms oligomers within lipid bilayers, and paramagnetic spin labeling suggests an antiparallel arrangement in its multimers. The amphipathic ORF6 helix establishes multiple contacts with the membrane surface with its N-terminal residues acting as membrane anchors. Our work demonstrates that ORF6 is an integral monotopic membrane protein and provides key insights into its conformation and the importance of the N-terminal region for the interaction with the membrane.
Affiliation(s)
- Martí Ninot-Pedrosa
- Molecular Microbiology and Structural Biochemistry (MMSB), UMR 5086 CNRS, Lyon 69367, France
- Gyula Pálfy
- Department of Chemistry and Applied Biosciences, ETH Zurich, Zurich 8093, Switzerland
- Hafez Razmazma
- Molecular Microbiology and Structural Biochemistry (MMSB), UMR 5086 CNRS, Lyon 69367, France
- Jackson Crowley
- Molecular Microbiology and Structural Biochemistry (MMSB), UMR 5086 CNRS, Lyon 69367, France
- Marie-Laure Fogeron
- Molecular Microbiology and Structural Biochemistry (MMSB), UMR 5086 CNRS, Lyon 69367, France
- Beate Bersch
- Université Grenoble Alpes, CEA, CNRS, Institut de Biologie Structurale (IBS), Grenoble, Cedex 9 38044, France
- Alexander Barnes
- Department of Chemistry and Applied Biosciences, ETH Zurich, Zurich 8093, Switzerland
- Bernhard Brutscher
- Université Grenoble Alpes, CEA, CNRS, Institut de Biologie Structurale (IBS), Grenoble, Cedex 9 38044, France
- Luca Monticelli
- Molecular Microbiology and Structural Biochemistry (MMSB), UMR 5086 CNRS, Lyon 69367, France
- Anja Böckmann
- Molecular Microbiology and Structural Biochemistry (MMSB), UMR 5086 CNRS, Lyon 69367, France
- Beat H Meier
- Department of Chemistry and Applied Biosciences, ETH Zurich, Zurich 8093, Switzerland
- Lauriane Lecoq
- Molecular Microbiology and Structural Biochemistry (MMSB), UMR 5086 CNRS, Lyon 69367, France
18
|
Sun Q, Wang H, Xie J, Wang L, Mu J, Li J, Ren Y, Lai L. Computer-Aided Drug Discovery for Undruggable Targets. Chem Rev 2025. [PMID: 40423592 DOI: 10.1021/acs.chemrev.4c00969] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/28/2025]
Abstract
Undruggable targets are those of therapeutical significance but challenging for conventional drug design approaches. Such targets often exhibit unique features, including highly dynamic structures, a lack of well-defined ligand-binding pockets, the presence of highly conserved active sites, and functional modulation by protein-protein interactions. Recent advances in computational simulations and artificial intelligence have revolutionized the drug design landscape, giving rise to innovative strategies for overcoming these obstacles. In this review, we highlight the latest progress in computational approaches for drug design against undruggable targets, present several successful case studies, and discuss remaining challenges and future directions. Special emphasis is placed on four primary target categories: intrinsically disordered proteins, protein allosteric regulation, protein-protein interactions, and protein degradation, along with discussion of emerging target types. We also examine how AI-driven methodologies have transformed the field, from applications in protein-ligand complex structure prediction and virtual screening to de novo ligand generation for undruggable targets. Integration of computational methods with experimental techniques is expected to bring further breakthroughs to overcome the hurdles of undruggable targets. As the field continues to evolve, these advancements hold great promise to expand the druggable space, offering new therapeutic opportunities for previously untreatable diseases.
Affiliation(s)
- Qi Sun
- BNLMS, College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, China
- Peking University Chengdu Academy for Advanced Interdisciplinary Biotechnologies, Chengdu, Sichuan 610213, China
- Hanping Wang
- BNLMS, College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, China
- Juan Xie
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
- Liying Wang
- BNLMS, College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, China
- Junxi Mu
- Peking-Tsinghua Center for Life Science, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
- Junren Li
- BNLMS, College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, China
- Yuhao Ren
- BNLMS, College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, China
- Luhua Lai
- BNLMS, College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, China
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
- Peking-Tsinghua Center for Life Science, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
- Peking University Chengdu Academy for Advanced Interdisciplinary Biotechnologies, Chengdu, Sichuan 610213, China
- Research Unit of Drug Design Method, Chinese Academy of Medical Sciences, Peking University, Beijing 100871, China
19
|
Zhang C, He Y, Wang J, Chen T, Baltar F, Hu M, Liao J, Xiao X, Li ZR, Dong X. LucaPCycle: Illuminating microbial phosphorus cycling in deep-sea cold seep sediments using protein language models. Nat Commun 2025; 16:4862. [PMID: 40419512 DOI: 10.1038/s41467-025-60142-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2024] [Accepted: 05/16/2025] [Indexed: 05/28/2025] Open
Abstract
Phosphorus is essential for life and critically influences marine productivity. Despite geochemical evidence of active phosphorus cycling in deep-sea cold seeps, the microbial processes involved remain poorly understood. Traditional sequence-based searches often fail to detect proteins with remote homology. To address this, we developed a deep learning model, LucaPCycle, integrating raw sequences and contextual embeddings based on the protein language model ESM2-3B. LucaPCycle identified 5241 phosphorus-cycling protein families from global cold seep gene and genome catalogs, substantially enhancing our understanding of their diversity, ecology, and function. Among previously unannotated sequences, we discovered three alkaline phosphatase families that feature unique domain organizations and preserved enzymatic capabilities. These results highlight the previously overlooked ecological importance of phosphorus cycling within cold seeps, corroborated by data from porewater geochemistry, metatranscriptomics, and metabolomics. We revealed a previously unrecognized diversity of archaea, including Asgardarchaeota, anaerobic methanotrophic archaea and Thermoproteota, which contribute to organic phosphorus mineralization and inorganic phosphorus solubilization through various mechanisms. Additionally, auxiliary metabolic genes of cold seep viruses primarily encode the PhoR-PhoB regulatory system and PhnCDE transporter, potentially enhancing their hosts' phosphorus utilization. Overall, LucaPCycle is capable of accessing previously 'hidden' sequence spaces for microbial phosphorus cycling and can be applied to various ecosystems.
Affiliation(s)
- Chuwen Zhang
- Key Laboratory of Marine Genetic Resources, Third Institute of Oceanography, Ministry of Natural Resources, Xiamen, China
- Yong He
- Apsara Lab, Alibaba Cloud Intelligence, Alibaba Group, Hangzhou, China
- Jieni Wang
- Key Laboratory of Marine Genetic Resources, Third Institute of Oceanography, Ministry of Natural Resources, Xiamen, China
- Tengkai Chen
- Key Laboratory of Marine Genetic Resources, Third Institute of Oceanography, Ministry of Natural Resources, Xiamen, China
- Federico Baltar
- Fungal and Biogeochemical Oceanography Group, College of Oceanography and Ecological Science, Shanghai Ocean University, Shanghai, China
- Fungal and Biogeochemical Oceanography Group, Department of Functional and Evolutionary Ecology, University of Vienna, Vienna, Austria
- Minjie Hu
- Key Laboratory of Humid Sub-tropical Eco-geographical Process of Ministry of Education, Fujian Normal University, Fuzhou, China
- School of Geographical Sciences, Fujian Normal University, Fuzhou, China
- Jing Liao
- Key Laboratory of Marine Genetic Resources, Third Institute of Oceanography, Ministry of Natural Resources, Xiamen, China
- Xi Xiao
- Key Laboratory of Marine Mineral Resources, Ministry of Natural Resources, Guangzhou Marine Geological Survey, China Geological Survey, Guangzhou, China
- Zhao-Rong Li
- Apsara Lab, Alibaba Cloud Intelligence, Alibaba Group, Hangzhou, China
- Xiyang Dong
- Key Laboratory of Marine Genetic Resources, Third Institute of Oceanography, Ministry of Natural Resources, Xiamen, China
- Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai), Zhuhai, China
20
|
Li CF, Yan Z, Ge F, Yu X, Zhang J, Zhang M, Yu DJ. TransABseq: A Two-Stage Approach for Predicting Antigen-Antibody Binding Affinity Changes upon Mutation Based on Protein Sequences. J Chem Inf Model 2025; 65:5188-5204. [PMID: 40354482 DOI: 10.1021/acs.jcim.5c00478] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/14/2025]
Abstract
The antigen-antibody interaction represents a critical mechanism in host defense, contributing to pathogen neutralization, tumor surveillance, immunotherapy, and in vitro disease detection. Owing to their exceptional specificity, affinity, and selectivity, antibodies have been extensively utilized in the development of clinical diagnostic, therapeutic, and prophylactic strategies. In this study, we propose TransABseq, a novel computational framework specifically designed to predict the effects of missense mutations on antigen-antibody interactions. The model's innovative two-stage architecture enables comprehensive feature analysis: in the first stage, multiple embeddings of protein language models are processed through a Transformer encoder module and a multiscale convolutional module; in the second stage, the XGBoost model is used to perform quantitative output based on the deeply fused features. A critical advancement contributing to the effectiveness of TransABseq is the deep feature fusion strategy, which reveals the biochemical properties of proteins. By leveraging the multilayer self-attention mechanism of the Transformer to capture complex global dependencies within sequences and mining features at different hierarchical levels through multiscale convolution, the feature abstraction capability of TransABseq is significantly enhanced. We evaluated TransABseq through three distinct cross-validation strategies on two established benchmarks and a newly reconstructed data set. As a result, TransABseq achieved average PCC values of 0.607, 0.843, and 0.794 and average RMSE values of 1.166, 1.314, and 1.337 kcal/mol in 10-fold cross-validation. Furthermore, its robustness and predictive accuracy were validated on blind test data sets, where TransABseq outperformed existing methods, attaining a PCC of 0.721 and an RMSE of 0.925 kcal/mol.
The relevant data and code have been made publicly available for academic research at: https://github.com/cuifengLI/TransABseq.
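For readers unfamiliar with the two metrics quoted in this abstract, the sketch below computes a Pearson correlation coefficient (PCC) and a root-mean-square error (RMSE) between measured and predicted values. The function names and the example numbers are illustrative assumptions and are not taken from the TransABseq code base.

```python
import math


def pearson(y_true, y_pred):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_t = sum(y_true) / len(y_true)
    mean_p = sum(y_pred) / len(y_pred)
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    if var_t == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_t * var_p)


def rmse(y_true, y_pred):
    """Root-mean-square error, in the same units as the inputs."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


if __name__ == "__main__":
    # Made-up binding affinity changes in kcal/mol, for illustration only.
    measured = [0.5, -1.2, 2.0, 0.0]
    predicted = [0.7, -0.9, 1.5, 0.3]
    print(f"PCC = {pearson(measured, predicted):.3f}")
    print(f"RMSE = {rmse(measured, predicted):.3f} kcal/mol")
```

PCC measures how well predictions track the ranking and trend of the measurements, while RMSE reports the typical error magnitude, which is why both are reported together.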
Affiliation(s)
- Cui-Feng Li
- School of Computer, Jiangsu University of Science and Technology, 666 Changhui Road, Zhenjiang 212100, China
- Zihao Yan
- School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing 210094, China
- Fang Ge
- State Key Laboratory of Flexible Electronics (LoFE) & Institute of Advanced Materials (IAM), Nanjing University of Posts & Telecommunications, 9 Wenyuan Road, Nanjing 210023, China
- Xuan Yu
- Department of Computer Science, City University of Hong Kong, 83 Tat Chee Avenue, Kowloon Tong, Hong Kong 999077, China
- Jing Zhang
- School of Computer, Jiangsu University of Science and Technology, 666 Changhui Road, Zhenjiang 212100, China
- Ming Zhang
- School of Computer, Jiangsu University of Science and Technology, 666 Changhui Road, Zhenjiang 212100, China
- Dong-Jun Yu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing 210094, China
21
|
Liu Y, Moretti R, Wang Y, Dong H, Yan B, Bodenheimer B, Derr T, Meiler J. Advancements in Ligand-Based Virtual Screening through the Synergistic Integration of Graph Neural Networks and Expert-Crafted Descriptors. J Chem Inf Model 2025; 65:4898-4905. [PMID: 40365985 DOI: 10.1021/acs.jcim.5c00822] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/15/2025]
Abstract
The fusion of traditional chemical descriptors with graph neural networks (GNNs) offers a compelling strategy for enhancing ligand-based virtual screening methodologies. A comprehensive evaluation revealed that the benefits derived from this integrative strategy vary significantly among different GNNs. Specifically, while GCN and SchNet demonstrate pronounced improvements by incorporating descriptors, SphereNet exhibits only marginal enhancement. Intriguingly, despite SphereNet's modest gain, all three models-GCN, SchNet, and SphereNet-achieve comparable performance levels when leveraging this combination strategy. This observation underscores a pivotal insight: sophisticated GNN architectures may be substituted with simpler counterparts without sacrificing efficacy, provided that they are augmented with descriptors. Furthermore, our analysis reveals a set of expert-crafted descriptors' robustness in scaffold-split scenarios, frequently outperforming the combined GNN-descriptor models. Given the critical importance of scaffold splitting in accurately mimicking real-world drug discovery contexts, this finding accentuates an imperative for GNN researchers to innovate models that can adeptly navigate and predict within such frameworks. Our work not only validates the potential of integrating descriptors with GNNs in advancing ligand-based virtual screening but also illuminates pathways for future enhancements in model development and application. Our implementation can be found at https://github.com/meilerlab/gnn-descriptor.
Collapse
Affiliation(s)
- Yunchao Liu
  - Department of Computer Science, Vanderbilt University, 2201 West End Ave, Nashville, Tennessee 37235, United States
- Rocco Moretti
  - Department of Chemistry, Center for Structural Biology, Vanderbilt University, 2201 West End Ave, Nashville, Tennessee 37235, United States
- Yu Wang
  - School of Computer and Data Sciences, University of Oregon, 1585 East 13th Avenue, Eugene, Oregon 97403, United States
- Ha Dong
  - Department of Neural Science, Amherst College, 220 South Pleasant Street, Amherst, Massachusetts 01002, United States
- Bailu Yan
  - Department of Biostatistics, Vanderbilt University, 2201 West End Ave, Nashville, Tennessee 37235, United States
- Bobby Bodenheimer
  - Department of Computer Science, Electrical Engineering and Computer Engineering, Vanderbilt University, 2201 West End Ave, Nashville, Tennessee 37235, United States
- Tyler Derr
  - Department of Computer Science, Data Science Institute, Vanderbilt University, 2201 West End Ave, Nashville, Tennessee 37235, United States
- Jens Meiler
  - Department of Chemistry, Center for Structural Biology, Vanderbilt University, 2201 West End Ave, Nashville, Tennessee 37235, United States
  - Institute of Drug Discovery, Leipzig University Medical School, Härtelstraße 16-18, Leipzig 04103, Germany
  - Center for Scalable Data Analytics and Artificial Intelligence (ScaDS.AI), Humboldtstraße 25, Leipzig 04105, Germany
22
Zhang Z, Wei Z, Qin Z, Wang L, Gong J, Shi J, Wu J, Deng Z. Advancing Enzyme Optimal pH Prediction via Retrieved Embedding Data Augmentation. J Chem Inf Model 2025. [PMID: 40418030 DOI: 10.1021/acs.jcim.5c00526] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/27/2025]
Abstract
The optimal enzyme pH is a critical factor that directly influences the catalytic efficiency of the enzymes. Accurate computational prediction of the optimal pH can greatly advance our understanding and design of enzymes for diverse scientific and industrial applications. However, current prediction tools often fall short in terms of accuracy and robustness. In this study, we propose OpHReda, a novel method that significantly improves enzyme optimal pH prediction by leveraging a retrieved embedding data augmentation mechanism. Given an enzyme sequence, OpHReda first retrieves similar sequence embeddings from a preconstructed augmentation database. It then jointly analyzes the original and retrieved embeddings through the Multiple Embedding Alignment transformer to narrow the prediction range. Finally, the calibrator integrates residue-level information with the refined prediction range to make the final prediction. By moving beyond the limitations of single-sequence-based models, OpHReda achieves a 55% improvement in F1-score compared to that of state-of-the-art methods. Extensive ablation studies demonstrate that this enhancement arises from the synergy between our tailored architecture and the augmentation mechanism. Overall, OpHReda offers a promising advancement in enzyme optimal pH prediction and holds potential for downstream applications such as enzyme engineering and rational design.
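The retrieval step described in this abstract (looking up similar sequence embeddings in a preconstructed database) can be illustrated with a minimal sketch. This is not OpHReda's code: the use of cosine similarity, the function names, and the toy two-dimensional embeddings are all assumptions made for illustration, since the abstract does not specify the similarity measure.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)


def retrieve_top_k(query, database, k=2):
    """Return the names of the k database entries most similar to the query.

    database: dict mapping an entry name to its embedding (hypothetical layout).
    """
    ranked = sorted(
        database.items(),
        key=lambda item: cosine(query, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


# Toy database with made-up names and 2-D embeddings.
database = {
    "enzA": [1.0, 0.0],
    "enzB": [0.9, 0.1],
    "enzC": [0.0, 1.0],
}
neighbors = retrieve_top_k([1.0, 0.0], database, k=2)
```

In a setup like the one the abstract outlines, the retrieved embeddings would then be passed to the downstream model together with the query embedding; that part is not sketched here.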
Affiliation(s)
- Ziqi Zhang
  - School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, Jiangsu 214122, China
- Zhisheng Wei
  - School of Biotechnology, Jiangnan University, Wuxi 214122, China
- Zhengqiang Qin
  - School of Life Sciences and Health Engineering, Jiangnan University, Wuxi 214122, China
- Lei Wang
  - School of Biotechnology, Jiangnan University, Wuxi 214122, China
- Jinsong Gong
  - School of Life Sciences and Health Engineering, Jiangnan University, Wuxi 214122, China
- Jinsong Shi
  - School of Life Sciences and Health Engineering, Jiangnan University, Wuxi 214122, China
- Jing Wu
  - School of Biotechnology, Jiangnan University, Wuxi 214122, China
- Zhaohong Deng
  - School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, Jiangsu 214122, China
23
Kroll A, Rousset Y. Recent advances and future trends for protein-small molecule interaction predictions with protein language models. Curr Opin Struct Biol 2025; 93:103070. [PMID: 40414181 DOI: 10.1016/j.sbi.2025.103070] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2024] [Revised: 04/23/2025] [Accepted: 05/04/2025] [Indexed: 05/27/2025]
Abstract
In recent years, the application of natural language models to protein amino acid sequences, referred to as protein language models (PLMs), has demonstrated a significant potential for uncovering hidden patterns related to protein structure, function, and stability. The critical functions of proteins in biological processes often arise through interactions with small molecules; central examples are enzymes, receptors, and transporters. Understanding these interactions is particularly important for drug design, for bioengineering, and for understanding cellular metabolism. In this review, we present state-of-the-art PLMs and explore how they can be integrated with small molecule information to predict protein-small molecule interactions. We present several such prediction tasks and discuss current limitations and potential areas for improvement.
Affiliation(s)
- Alexander Kroll
  - Heinrich-Heine-University, Universitätsstraße 1, Düsseldorf, 40225, NRW, Germany.
- Yvan Rousset
  - Heinrich-Heine-University, Universitätsstraße 1, Düsseldorf, 40225, NRW, Germany
24
Jiang Y, Huang S, Chen HF. ActiMut-XGB: Predicting thermodynamic stability of point mutations for CALB with protein language model. Int J Biol Macromol 2025:144609. [PMID: 40414395 DOI: 10.1016/j.ijbiomac.2025.144609] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2025] [Revised: 05/13/2025] [Accepted: 05/22/2025] [Indexed: 05/27/2025]
Abstract
Predicting the functional impact of single-point mutations on protein residual activity, especially after high-temperature incubation, is critical in protein engineering. We present an innovative machine learning model based on eXtreme Gradient Boosting that leverages protein sequence data to predict thermostability, circumventing the need for three-dimensional structural information. Our model integrates features from the ESM2 language model, physicochemical properties, evolutionary features, and positional features. A key advancement is the use of transfer learning with thermal stability data from various proteins, which enhances prediction accuracy and generalizability. To fine-tune and validate the model, we used experimental data from Candida antarctica lipase B single-point mutants, a widely studied enzyme in biocatalysis and industrial applications. Despite potential limitations of Gibbs free energy values in capturing all factors influencing thermostability, our model represents a significant improvement over traditional approaches, providing valuable insights for protein engineering, enzyme optimization, and therapeutic protein development.
Affiliation(s)
- Yuxin Jiang
  - State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, Department of Bioinformatics and Biostatistics, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Shuai Huang
  - State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, Department of Bioinformatics and Biostatistics, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Hai-Feng Chen
  - State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, Department of Bioinformatics and Biostatistics, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China.
25
Bushuiev R, Bushuiev A, Samusevich R, Brungs C, Sivic J, Pluskal T. Self-supervised learning of molecular representations from millions of tandem mass spectra using DreaMS. Nat Biotechnol 2025:10.1038/s41587-025-02663-3. [PMID: 40410407 DOI: 10.1038/s41587-025-02663-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/22/2024] [Accepted: 03/31/2025] [Indexed: 05/25/2025]
Abstract
Characterizing biological and environmental samples at a molecular level primarily uses tandem mass spectroscopy (MS/MS), yet the interpretation of tandem mass spectra from untargeted metabolomics experiments remains a challenge. Existing computational methods for predictions from mass spectra rely on limited spectral libraries and on hard-coded human expertise. Here we introduce a transformer-based neural network pre-trained in a self-supervised way on millions of unannotated tandem mass spectra from our GNPS Experimental Mass Spectra (GeMS) dataset mined from the MassIVE GNPS repository. We show that pre-training our model to predict masked spectral peaks and chromatographic retention orders leads to the emergence of rich representations of molecular structures, which we named Deep Representations Empowering the Annotation of Mass Spectra (DreaMS). Further fine-tuning the neural network yields state-of-the-art performance across a variety of tasks. We make our new dataset and model available to the community and release the DreaMS Atlas, a molecular network of 201 million MS/MS spectra constructed using DreaMS annotations.
Affiliation(s)
- Roman Bushuiev
  - Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Prague, Czech Republic
  - Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University, Prague, Czech Republic
- Anton Bushuiev
  - Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University, Prague, Czech Republic
- Raman Samusevich
  - Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Prague, Czech Republic
  - Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University, Prague, Czech Republic
- Corinna Brungs
  - Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Prague, Czech Republic
- Josef Sivic
  - Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University, Prague, Czech Republic.
- Tomáš Pluskal
  - Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Prague, Czech Republic.
26
Sun X, Wang YG, Shen Y. A multimodal deep learning framework for enzyme turnover prediction with missing modality. Comput Biol Med 2025; 193:110348. [PMID: 40409036 DOI: 10.1016/j.compbiomed.2025.110348] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2024] [Revised: 04/25/2025] [Accepted: 05/04/2025] [Indexed: 05/25/2025]
Abstract
Accurate prediction of the turnover number (kcat), which quantifies the maximum rate of substrate conversion at an enzyme's active site, is essential for assessing catalytic efficiency and understanding biochemical reaction mechanisms. Traditional wet-lab measurements of kcat are time-consuming and resource-intensive, making deep learning (DL) methods an appealing alternative. However, existing DL models often overlook the impact of reaction products on kcat due to feedback inhibition, resulting in suboptimal performance. The multimodal nature of this kcat prediction task, involving enzymes, substrates, and products as inputs, presents additional challenges when certain modalities are unavailable during inference due to incomplete data or experimental constraints, leading to the inapplicability of existing DL models. To address these limitations, we introduce MMKcat, a novel framework employing a prior-knowledge-guided missing modality training mechanism, which treats substrates and enzyme sequences as essential inputs while considering other modalities as maskable terms. Moreover, an innovative auxiliary regularizer is incorporated to encourage the learning of informative features from various modal combinations, enabling robust predictions even with incomplete multimodal inputs. We demonstrate the superior performance of MMKcat compared to state-of-the-art methods, including DLKcat, TurNup, UniKP, EITLEM-Kinetic, DLTKcat and GELKcat, using BRENDA and SABIO-RK. Our results show significant improvements under both complete and missing modality scenarios in RMSE, R², and SRCC metrics, with average improvements of 6.41%, 22.18%, and 8.15%, respectively. Codes are available at https://github.com/ProEcho1/MMKcat.
Affiliation(s)
- Xin Sun
  - Zhejiang University, Hangzhou, 310058, Zhejiang, China
- Yu Guang Wang
  - Shanghai Jiao Tong University, Shanghai, 200240, Shanghai, China
- Yiqing Shen
  - Johns Hopkins University, Baltimore, 21218, MD, USA.
27
Deng P, Zhang Y, Xu L, Lyu J, Li L, Sun F, Zhang WB, Gao H. Computational discovery and systematic analysis of protein entangling motifs in nature: from algorithm to database. Chem Sci 2025; 16:8998-9009. [PMID: 40271025 PMCID: PMC12013726 DOI: 10.1039/d4sc08649j] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2024] [Accepted: 03/29/2025] [Indexed: 04/25/2025] Open
Abstract
Nontrivial protein topology has the potential to revolutionize protein engineering by enabling the manipulation of proteins' stability and dynamics. However, the rarity of topological proteins in nature poses a challenge for their design, synthesis and application, primarily due to the limited number of available entangling motifs as synthetic templates. Discovering these motifs is particularly difficult, as entanglement is a subtle structural feature that is not readily discernible from protein sequences. In this study, we developed a streamlined workflow enabling efficient and accurate identification of structurally reliable and applicable entangling motifs from protein sequences. Through this workflow, we automatically curated a database of 1115 entangling protein motifs from over 100 thousand sequences in the UniProt Knowledgebase. In our database, 73.3% of C2 entangling motifs and 80.1% of C3 entangling motifs exhibited low structural similarity to known protein structures. The entangled structures in the database were categorized into different groups, and their functional and biological significance was analyzed. The results were summarized in an online database accessible through a user-friendly web platform, providing researchers with an expanded toolbox of entangling motifs. This resource is poised to significantly advance the field of protein topology engineering and inspire new research directions in protein design and application.
Affiliation(s)
- Puqing Deng
  - Department of Chemical and Biological Engineering, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
- Yuxuan Zhang
  - Department of Chemical and Biological Engineering, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
- Lianjie Xu
  - Beijing National Laboratory for Molecular Sciences, Key Laboratory of Polymer Chemistry & Physics of Ministry of Education, Center for Soft Matter Science and Engineering, College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, P. R. China
- Jinyu Lyu
  - Department of Chemical and Biological Engineering, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
- Linyan Li
  - Department of Data Science, City University of Hong Kong, Kowloon, Hong Kong
- Fei Sun
  - Department of Chemical and Biological Engineering, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
- Wen-Bin Zhang
  - Beijing National Laboratory for Molecular Sciences, Key Laboratory of Polymer Chemistry & Physics of Ministry of Education, Center for Soft Matter Science and Engineering, College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, P. R. China
  - AI for Science (AI4S)-Preferred Program, Shenzhen Graduate School, Peking University, Shenzhen 518055, P. R. China
- Hanyu Gao
  - Department of Chemical and Biological Engineering, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
28
Zhang T, Han Y, Peng Y, Deng Z, Shi W, Xu X, Wu Y, Dong X. The risk of pathogenicity and antibiotic resistance in deep-sea cold seep microorganisms. mSystems 2025:e0157124. [PMID: 40396743 DOI: 10.1128/msystems.01571-24] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/24/2024] [Accepted: 04/23/2025] [Indexed: 05/22/2025] Open
Abstract
Deep-sea cold seeps host high microbial biomass and biodiversity that thrive on hydrocarbon and inorganic compound seepage, exhibiting diverse ecological functions and unique genetic resources. However, potential health risks from pathogenic or antibiotic-resistant microorganisms in these environments remain largely overlooked, especially during resource exploitation and laboratory research. Here, we analyzed 165 metagenomes and 33 metatranscriptomes from 16 global cold seep sites to investigate the diversity and distribution of virulence factors (VFs), antibiotic resistance genes (ARGs), and mobile genetic elements (MGEs). A total of 2,353 VFs are retrieved in 689 metagenome-assembled genomes (MAGs), primarily associated with indirect pathogenesis like adherence. In addition, as important reservoirs, cold seeps harbor nearly 100,000 ARGs, with high-risk ARGs (11.22%) present at low abundance. Compared to other environments, microorganisms in cold seeps exhibit substantial differences in VF and ARG counts, with potential horizontal gene transfer facilitating their spread. These virulome and resistome profiles provide valuable insights into the evolutionary and ecological implications of pathogenicity and antibiotic resistance in extreme deep-sea ecosystems. Collectively, these results indicate that cold seep sediments pose minimal public health risks, shedding light on environmental safety in deep-sea resource exploitation and research.
IMPORTANCE In the "One Health" era, understanding pathogenicity and antibiotic resistance in vast and largely unexplored regions like deep-sea cold seeps is critical for assessing public health risks. These environments serve as critical reservoirs where resistant and virulent bacteria can persist, adapt, and undergo genetic evolution. The increasing scope of human activities, such as deep-sea mining, is disrupting these previously isolated ecosystems, heightening the potential for microbial exchange between deep-sea communities and human or animal populations. This interaction poses a significant risk for the dissemination of resistance and virulence genes, with potential consequences for global public health and ecosystem stability. This study offers the first comprehensive analysis of virulome, resistome, and mobilome profiles in cold seep microbial communities. While cold seeps act as reservoirs for diverse ARGs, high-risk ARGs are rare, and most VFs are low risk and contribute to ecological functions. These results provide a reference for monitoring the spread of pathogenicity and resistance in extreme ecosystems, informing environmental safety assessments during deep-sea resource exploitation.
Affiliation(s)
- Tianxueyu Zhang
  - School of Oceanography, Shanghai Jiao Tong University, Shanghai, Shanghai, China
  - State Key Laboratory of Submarine Geoscience, Second Institute of Oceanography, Ministry of Natural Resources, Hangzhou, Zhejiang, China
- Yingchun Han
  - Key Laboratory of Marine Genetic Resources, Third Institute of Oceanography, Ministry of Natural Resources, Xiamen, Fujian, China
- Yongyi Peng
  - Key Laboratory of Marine Genetic Resources, Third Institute of Oceanography, Ministry of Natural Resources, Xiamen, Fujian, China
  - Department of Microbiology, Biomedicine Discovery Institute, Monash University, Clayton, Victoria, Australia
- Zhaochao Deng
  - Institute of Marine Biology and Pharmacology, Ocean College, Zhejiang University, Zhoushan, Zhejiang, China
  - Ocean Research Center of Zhoushan, Zhejiang University, Zhoushan, Zhejiang, China
- Wenqing Shi
  - State Key Laboratory for Marine Environmental Science, Institute of Marine Microbes and Ecospheres, College of Ocean and Earth Sciences, Xiamen University College of Ocean and Earth Science, Xiamen, Fujian, China
  - RU Marine Symbioses, RD3 Marine Ecology, GEOMAR Helmholtz Centre for Ocean Research Kiel, Kiel, Schleswig-Holstein, Germany
- Xuewei Xu
  - State Key Laboratory of Submarine Geoscience, Second Institute of Oceanography, Ministry of Natural Resources, Hangzhou, Zhejiang, China
- Yuehong Wu
  - School of Oceanography, Shanghai Jiao Tong University, Shanghai, Shanghai, China
  - State Key Laboratory of Submarine Geoscience, Second Institute of Oceanography, Ministry of Natural Resources, Hangzhou, Zhejiang, China
- Xiyang Dong
  - Key Laboratory of Marine Genetic Resources, Third Institute of Oceanography, Ministry of Natural Resources, Xiamen, Fujian, China
29
Li G, Zhou J, Luo J, Liang C. Accurate prediction of virulence factors using pre-train protein language model and ensemble learning. BMC Genomics 2025; 26:517. [PMID: 40399812 PMCID: PMC12093764 DOI: 10.1186/s12864-025-11694-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2025] [Accepted: 05/09/2025] [Indexed: 05/23/2025] Open
Abstract
BACKGROUND As bacterial pathogens develop increasing resistance to antibiotics, strategies targeting virulence factors (VFs) have emerged as a promising and effective approach for treating bacterial infections. Existing methods mainly relied on sequence similarity, and remote homology relationships cannot be discovered by sequence analysis alone.
RESULTS To address this limitation, we developed a protein language model and ensemble learning approach for VF identification (PLMVF). Specifically, we extracted features from protein sequences using ESM-2 and their three-dimensional (3D) structures using ESMFold. We calculated the true TM-score of the proteins based on their 3D structures and trained a TM-predictor model to predict structural similarity, thereby capturing hidden remote homology information within the sequences. Subsequently, we concatenated the sequence-level features extracted by ESM-2 with the predicted TM-score features to form a comprehensive feature set for prediction. Extensive experimental validation demonstrated that PLMVF achieved an accuracy (ACC) of 86.1%, significantly outperforming existing models across multiple evaluation metrics. This study provided an ideal tool for identifying novel targets in the development of anti-virulence therapies, offering promise for the effective prevention and control of pathogenic bacterial infections.
CONCLUSIONS The proposed PLMVF model offers an efficient computational approach for VF identification.
Affiliation(s)
- Guanghui Li
  - School of Information and Software Engineering, East China Jiaotong University, Nanchang, 330013, China.
- Jian Zhou
  - School of Information and Software Engineering, East China Jiaotong University, Nanchang, 330013, China
- Jiawei Luo
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, China
- Cheng Liang
  - School of Information Science and Engineering, Shandong Normal University, Jinan, 250358, China.
30
Zhao Y, Lan T, Zhong G, Hagen J, Pan H, Chung WK, Shen Y. A probabilistic graphical model for estimating selection coefficients of nonsynonymous variants from human population sequence data. Nat Commun 2025; 16:4670. [PMID: 40393980 PMCID: PMC12092651 DOI: 10.1038/s41467-025-59937-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2024] [Accepted: 05/06/2025] [Indexed: 05/22/2025] Open
Abstract
Accurately predicting the effect of missense variants is important in discovering disease risk genes and clinical genetic diagnostics. Commonly used computational methods predict pathogenicity, which does not capture the quantitative impact on fitness in humans. We develop a method, MisFit, to estimate missense fitness effect using a graphical model. MisFit jointly models the effect at a molecular level (d) and a population level (selection coefficient, s), assuming that in the same gene, missense variants with similar d have similar s. We train it by maximizing probability of observed allele counts in 236,017 individuals of European ancestry. We show that s is informative in predicting allele frequency across ancestries and consistent with the fraction of de novo mutations in sites under strong selection. Further, s outperforms previous methods in prioritizing de novo missense variants in individuals with neurodevelopmental disorders. In conclusion, MisFit accurately predicts s and yields new insights from genomic data.
Affiliation(s)
- Yige Zhao
  - Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA
  - The Integrated Program in Cellular, Molecular, and Biomedical Studies, Columbia University, New York, NY, USA
- Tian Lan
  - Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA
- Guojie Zhong
  - Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA
  - The Integrated Program in Cellular, Molecular, and Biomedical Studies, Columbia University, New York, NY, USA
- Jake Hagen
  - Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA
  - Department of Pediatrics, Boston Children's Hospital and Harvard Medical School, Boston, MA, USA
- Hongbing Pan
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, USA
- Wendy K Chung
  - Department of Pediatrics, Boston Children's Hospital and Harvard Medical School, Boston, MA, USA
- Yufeng Shen
  - Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA.
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, USA.
  - JP Sulzberger Columbia Genome Center, Columbia University, New York, NY, USA.
31
Yang Y, Yan J, Olson R, Jiang X. Comprehensive genomic and evolutionary analysis of biofilm matrix clusters and proteins in the Vibrio genus. mSystems 2025; 10:e0006025. [PMID: 40207939 DOI: 10.1128/msystems.00060-25] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2025] [Accepted: 03/12/2025] [Indexed: 04/11/2025] Open
Abstract
Vibrio cholerae pathogens cause cholera, an acute diarrheal disease resulting in significant morbidity and mortality worldwide. Biofilms in vibrios enhance their survival in natural ecosystems and facilitate transmission during cholera outbreaks. Critical components of the biofilm matrix include the Vibrio polysaccharides produced by the vps-1 and vps-2 gene clusters and the biofilm matrix proteins encoded in the rbm gene cluster, together comprising the biofilm matrix cluster. However, the biofilm matrix clusters and their evolutionary patterns in other Vibrio species remain underexplored. In this study, we systematically investigated the distribution, diversity, and evolution of biofilm matrix clusters and proteins across the Vibrio genus. Our findings reveal that these gene clusters are sporadically distributed throughout the genus, even appearing in species phylogenetically distant from Vibrio cholerae. Evolutionary analysis of the major biofilm matrix proteins RbmC and Bap1 shows that they are structurally and sequentially related, having undergone structural domain and modular alterations. Additionally, a novel loop-less Bap1 variant was identified, predominantly represented in two phylogenetically distant V. cholerae subspecies clades that share specific gene groups associated with the presence or absence of the protein. Furthermore, our analysis revealed that rbmB, a gene involved in biofilm dispersal, shares a recent common ancestor with Vibriophage tail proteins, suggesting that phages may mimic host functions to evade biofilm-associated defenses. Our study offers a foundational understanding of the diversity and evolution of biofilm matrix clusters in vibrios, laying the groundwork for future biofilm engineering through genetic modification.
IMPORTANCE Biofilms help vibrios survive in nature and spread cholera. However, the genes that control biofilm formation in vibrios other than Vibrio cholerae are not well understood. In this study, we analyzed the biofilm matrix gene clusters and proteins across diverse Vibrio species to explore their patterns and evolution. We discovered that these genes are spread across different Vibrio species, including those not closely related to V. cholerae. We also found various forms of key biofilm proteins with different structures. Additionally, we identified genes involved in biofilm dispersal that are related to vibriophage genes, highlighting the role of phages in biofilm development. This study not only provides a foundational understanding of biofilm diversity and evolution in vibrios but also leads to new strategies for engineering biofilms through genetic modification, which is crucial for managing cholera outbreaks and improving the environmental resilience of these bacteria.
Affiliation(s)
- Yiyan Yang
  - Intramural Research Program, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA
- Jing Yan
  - Department of Molecular, Cellular and Developmental Biology, Yale University, New Haven, Connecticut, USA
  - Quantitative Biology Institute, Yale University, New Haven, Connecticut, USA
- Rich Olson
  - Department of Molecular Biology and Biochemistry, Molecular Biophysics Program, Wesleyan University, Middletown, Connecticut, USA
- Xiaofang Jiang
  - Intramural Research Program, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA
32
Pan X, Gu Y, Zhou W, Zhang Y. Enhancing Transthyretin Binding Affinity Prediction with a Consensus Model: Insights from the Tox24 Challenge. Chem Res Toxicol 2025; 38:900-908. [PMID: 40285676 DOI: 10.1021/acs.chemrestox.4c00560] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/29/2025]
Abstract
Transthyretin (TTR) plays a vital role in thyroid hormone transport and homeostasis in both the blood and target tissues. Interactions between exogenous compounds and TTR can disrupt the function of the endocrine system, potentially causing toxicity. In the Tox24 challenge, we leveraged the data set provided by the organizers to develop a deep learning-based consensus model, integrating sPhysNet, KANO, and GGAP-CPI for predicting TTR binding affinity. Each model utilized distinct levels of molecular information, including 2D topology, 3D geometry, and protein-ligand interactions. Our consensus model achieved favorable performance on the blind test set, yielding an RMSE of 20.8 and ranking fifth among all submissions. Following the release of the blind test set, we incorporated the leaderboard test set into our training data, further reducing the RMSE to 20.6 in an offline retrospective study. These results demonstrate that combining three regression models across different modalities significantly enhances the predictive accuracy. Furthermore, we employ the standard deviation of the consensus model's ensemble outputs as an uncertainty estimate. Our analysis reveals that both the RMSE and interval error of predictions increase with rising uncertainty, indicating that the uncertainty can serve as a useful measure of prediction confidence. We believe that this consensus model can be a valuable resource for identifying potential TTR binders and predicting their binding affinity in silico. The source code for data preparation, model training, and prediction can be accessed at https://github.com/xiaolinpan/tox24_challenge_submission_yingkai_lab.
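The idea of using the spread of ensemble outputs as an uncertainty estimate can be sketched in a few lines. This is a generic illustration, not the code from the linked repository: the unweighted mean as the consensus rule, the use of the population standard deviation, the function name `consensus_predict`, and the toy numbers are all assumptions.

```python
from statistics import mean, pstdev


def consensus_predict(per_model_predictions):
    """Combine several models' predictions for the same compounds.

    per_model_predictions: list of equal-length lists, one list per model.
    Returns (consensus, uncertainty): the per-compound mean across models
    and the per-compound population standard deviation across models.
    """
    if not per_model_predictions:
        raise ValueError("need at least one model")
    n = len(per_model_predictions[0])
    if any(len(p) != n for p in per_model_predictions):
        raise ValueError("all models must predict the same compounds")

    # Transpose so that each tuple holds every model's value for one compound.
    per_compound = list(zip(*per_model_predictions))
    consensus = [mean(values) for values in per_compound]
    uncertainty = [pstdev(values) for values in per_compound]
    return consensus, uncertainty


# Made-up outputs from three hypothetical models for three compounds.
predictions = [
    [10.0, 50.0, 90.0],
    [12.0, 50.0, 60.0],
    [14.0, 50.0, 30.0],
]
consensus, uncertainty = consensus_predict(predictions)
```

In the toy data the models agree exactly on the second compound (zero spread) and disagree most on the third, so the third prediction would be flagged as the least trustworthy.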
Affiliation(s)
- Xiaolin Pan
- Department of Chemistry, New York University, New York, New York 10003, United States
| | - Yaowen Gu
- Department of Chemistry, New York University, New York, New York 10003, United States
| | - Weijun Zhou
- Department of Chemistry, New York University, New York, New York 10003, United States
| | - Yingkai Zhang
- Department of Chemistry, New York University, New York, New York 10003, United States
- Simons Center for Computational Physical Chemistry at New York University, New York, New York 10003, United States
- NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China
| |
|
33
|
Wang W, Li D, Luo K, Chen B, Hao T, Li X, Guo D, Dong Y, Ning Y. IL-1 Superfamily Across 400+ Species: Therapeutic Targets and Disease Implications. BIOLOGY 2025; 14:561. [PMID: 40427750 DOI: 10.3390/biology14050561] [Received: 04/19/2025] [Revised: 05/07/2025] [Accepted: 05/15/2025] [Indexed: 05/29/2025]
Abstract
An important area of interest for therapeutic development is the IL-1 superfamily, a critical group of immune regulators with profound implications in a variety of disorders. This study clarifies the evolutionary patterns of IL-1 family members by thoroughly analyzing more than 400 animal species, demonstrating their ancient roots that extend back to the earliest vertebrates. Important results show that, although IL-1 ligands expanded significantly over the evolution of mammals, their corresponding receptors remained remarkably structurally conserved. Identifying both lineage-specific adaptations and evolutionarily conserved residues provides vital information for treatment design. These findings point to the possibility of two different therapeutic strategies: addressing species-specific variants may allow for more targeted interventions, whereas focusing on conserved motifs may result in broad-acting treatments. The study also identified less well-known species as useful models for comprehending early immune systems. In addition to advancing our knowledge of the function of the IL-1 family in autoimmune, inflammatory, and carcinogenic illnesses, this research lays the groundwork for the development of more potent targeted therapeutics by creating an evolutionary framework for the IL-1 family.
Affiliation(s)
- Weibin Wang
- College of Science, Yunnan Agricultural University, Kunming 650201, China
- Yunnan Provincial Key Laboratory of Biological Big Data, Yunnan Agricultural University, Kunming 650201, China
| | - Dawei Li
- Yunnan Provincial Key Laboratory of Biological Big Data, Yunnan Agricultural University, Kunming 650201, China
| | - Kaiyong Luo
- Yunnan Provincial Key Laboratory of Biological Big Data, Yunnan Agricultural University, Kunming 650201, China
| | - Baozheng Chen
- Yunnan Provincial Key Laboratory of Biological Big Data, Yunnan Agricultural University, Kunming 650201, China
| | - Tingting Hao
- College of Science, Yunnan Agricultural University, Kunming 650201, China
- Yunnan Provincial Key Laboratory of Biological Big Data, Yunnan Agricultural University, Kunming 650201, China
| | - Xuzhen Li
- Yunnan Provincial Key Laboratory of Biological Big Data, Yunnan Agricultural University, Kunming 650201, China
| | - Dazhong Guo
- Yunnan Provincial Key Laboratory of Biological Big Data, Yunnan Agricultural University, Kunming 650201, China
| | - Yang Dong
- Yunnan Provincial Key Laboratory of Biological Big Data, Yunnan Agricultural University, Kunming 650201, China
| | - Ya Ning
- College of Science, Yunnan Agricultural University, Kunming 650201, China
- Yunnan Provincial Key Laboratory of Biological Big Data, Yunnan Agricultural University, Kunming 650201, China
| |
|
34
|
Percudani R, De Rito C. Predicting Protein Function in the AI and Big Data Era. Biochemistry 2025. [PMID: 40380914 DOI: 10.1021/acs.biochem.5c00186] [Indexed: 05/19/2025]
Abstract
It is an exciting time for researchers working to link proteins to their functions. Most techniques for extracting functional information from genomic sequences were developed several years ago, with major progress driven by the availability of big data. Now, groundbreaking advances in deep-learning and AI-based methods have enriched protein databases with three-dimensional information and offer the potential to predict biochemical properties and biomolecular interactions, providing key functional insights. This progress is expected to increase the proportion of functionally bright proteins in databases and deepen our understanding of life at the molecular level.
Affiliation(s)
- Riccardo Percudani
- Department of Chemistry, Life Sciences and Environmental Sustainability, University of Parma, 43124 Parma, Italy
| | - Carlo De Rito
- Department of Chemistry, Life Sciences and Environmental Sustainability, University of Parma, 43124 Parma, Italy
| |
|
35
|
Vongsouthi V, Georgelin R, Matthews DS, Saunders J, Lee BM, Ton J, Damry AM, Frkic RL, Spence MA, Jackson CJ. Ancestral reconstruction of polyethylene terephthalate degrading cutinases reveals a rugged and unexplored sequence-fitness landscape. SCIENCE ADVANCES 2025; 11:eads8318. [PMID: 40367179 PMCID: PMC12077509 DOI: 10.1126/sciadv.ads8318] [Received: 09/01/2024] [Accepted: 04/09/2025] [Indexed: 05/16/2025]
Abstract
The use of protein engineering to generate enzymes for the degradation of polyethylene terephthalate (PET) is a promising route for plastic recycling, yet traditional engineering approaches often fail to explore protein sequence space for optimal enzymes. In this work, we use multiplexed ancestral sequence reconstruction (mASR) to address this, exploring the evolutionary sequence space of PET-degrading cutinases. Using 20 statistically equivalent phylogenies of the bacterial cutinase family, we generated 48 ancestral sequences revealing a wide range of PETase activities, highlighting the value of mASR in uncovering functional variants. Our findings show PETase activity can evolve through multiple pathways involving mutations remote from the active site. Moreover, analyzing the PETase fitness landscape with local ancestral sequence embedding (LASE) revealed that LASE can capture sequence features linked to PETase activity. This work highlights mASR's potential in exploration of sequence space and underscores the use of LASE in readily mapping the protein fitness landscapes.
Affiliation(s)
- Vanessa Vongsouthi
- Research School of Chemistry, Australian National University, Canberra, ACT 2601, Australia
- Samsara Eco, Sydney, NSW 2065, Australia
| | - Rosemary Georgelin
- Research School of Chemistry, Australian National University, Canberra, ACT 2601, Australia
- Samsara Eco, Sydney, NSW 2065, Australia
- ARC Centre of Excellence for Innovations in Peptide & Protein Science, Research School of Chemistry, Australian National University, Canberra, ACT 2601, Australia
| | - Dana S. Matthews
- Research School of Chemistry, Australian National University, Canberra, ACT 2601, Australia
- Samsara Eco, Sydney, NSW 2065, Australia
- ARC Centre of Excellence for Innovations in Peptide & Protein Science, Research School of Chemistry, Australian National University, Canberra, ACT 2601, Australia
| | - Jake Saunders
- Research School of Chemistry, Australian National University, Canberra, ACT 2601, Australia
| | - Brendon M. Lee
- Research School of Chemistry, Australian National University, Canberra, ACT 2601, Australia
| | | | - Adam M. Damry
- Research School of Chemistry, Australian National University, Canberra, ACT 2601, Australia
| | - Rebecca L. Frkic
- Research School of Chemistry, Australian National University, Canberra, ACT 2601, Australia
- ARC Centre of Excellence for Innovations in Peptide & Protein Science, Research School of Chemistry, Australian National University, Canberra, ACT 2601, Australia
| | - Matthew A. Spence
- Research School of Chemistry, Australian National University, Canberra, ACT 2601, Australia
- Samsara Eco, Sydney, NSW 2065, Australia
- ARC Centre of Excellence for Innovations in Peptide & Protein Science, Research School of Chemistry, Australian National University, Canberra, ACT 2601, Australia
| | - Colin J. Jackson
- Research School of Chemistry, Australian National University, Canberra, ACT 2601, Australia
- ARC Centre of Excellence for Innovations in Peptide & Protein Science, Research School of Chemistry, Australian National University, Canberra, ACT 2601, Australia
- ARC Centre of Excellence for Innovations in Synthetic Biology, Research School of Chemistry, Australian National University, Canberra, ACT 2601, Australia
| |
|
36
|
Chen N, Jiang Z, Xie Z, Zhou S, Zeng T, Jiang S, Zheng Y, Yuan Y, Wu R. An Effective Computational Strategy for UGTs Catalytic Function Prediction. ACS Synth Biol 2025. [PMID: 40377913 DOI: 10.1021/acssynbio.4c00886] [Indexed: 05/18/2025]
Abstract
The GT-B type glycosyltransferases play a crucial post-modification role in synthesizing natural products, such as triterpenoid and steroidal saponins, renowned for their diverse pharmacological activities. Despite phylogenetic analysis aiding in enzyme family classification, distinguishing substrate specificity between triterpenoid and steroidal saponins, with their highly similar cyclic scaffolds, remains a formidable challenge. Our studies unveil the potential transport tunnels for the glycosyl donor and acceptor in PpUGT73CR1, by molecular dynamics simulations. This revelation leads to a plausible substrate transport mechanism, highlighting the regulatory role of the N-terminal domain (NTD) in glycosyl acceptor binding and transport. Inspired by these structural and mechanistic insights, we further analyze the binding pockets of 44 plant-derived UGTs known to glycosylate triterpenes and sterols. Notably, sterol UGTs are found to harbor aromatic and hydrophobic residues with polar residues typically present at the bottom of the active pocket. Drawing inspiration from the substrate binding and product release mechanism revealed through structure-based molecular modeling, we devised a fast sequence-based method for classifying UGTs using the pre-trained ESM2 protein model. This method involved extracting the NTD features of UGTs and performing PCA clustering analysis, enabling accurate identification of enzyme function, and even differentiation of substrate specificity/promiscuity between structurally similar triterpenoid and steroidal substrates, which is further validated by experiments. This work not only deepens our understanding of substrate binding mechanisms but also provides an effective computational protocol for predicting the catalytic function of unknown UGTs.
Affiliation(s)
- Nianhang Chen
- School of Pharmaceutical Sciences, Sun Yat-sen University, Guangzhou 510006, China
| | - Zhennan Jiang
- State Key Laboratory for Quality Ensurance and Sustainable Use of Dao-di Herbs, Experimental Research Center, China Academy of Chinese Medical Sciences, Beijing 100700, China
| | - Zhekai Xie
- School of Pharmaceutical Sciences, Sun Yat-sen University, Guangzhou 510006, China
| | - Su Zhou
- School of Pharmaceutical Sciences, Sun Yat-sen University, Guangzhou 510006, China
| | - Tao Zeng
- School of Pharmaceutical Sciences, Hainan University, Haikou 570100, China
| | - Siqi Jiang
- School of Pharmaceutical Sciences, Sun Yat-sen University, Guangzhou 510006, China
| | - Ying Zheng
- Research Centre of Basic Integrative Medicine, School of Basic Medical Sciences, Guangzhou University of Chinese Medicine, Guangzhou, Guangdong Province 510006, China
| | - Yuan Yuan
- State Key Laboratory for Quality Ensurance and Sustainable Use of Dao-di Herbs, Experimental Research Center, China Academy of Chinese Medical Sciences, Beijing 100700, China
| | - Ruibo Wu
- School of Pharmaceutical Sciences, Sun Yat-sen University, Guangzhou 510006, China
- School of Pharmaceutical Sciences, Hainan University, Haikou 570100, China
| |
|
37
|
Solomon BD, Cheatham M, de Guimarães TAC, Duong D, Haendel MA, Hsieh TC, Javanmardi B, Johnson B, Krawitz P, Kruszka P, Laurent T, Lee NC, McWalter K, Michaelides M, Mohnike K, Pontikos N, Guillen Sacoto MJ, Shwetar YJ, Ustach VD, Waikel RL, Woof W. Perspectives on the Current and Future State of Artificial Intelligence in Medical Genetics. Am J Med Genet A 2025:e64118. [PMID: 40375359 DOI: 10.1002/ajmg.a.64118] [Received: 01/31/2025] [Revised: 04/14/2025] [Accepted: 05/02/2025] [Indexed: 05/18/2025]
Abstract
Artificial intelligence (AI) is rapidly transforming numerous aspects of daily life, including clinical practice and biomedical research. In light of this rapid transformation, and in the context of medical genetics, we assembled a group of leaders in the field to respond to the question about how AI is affecting, and especially how AI will affect, medical genetics. The authors who contributed to this collection of essays intentionally represent different areas of expertise, career stages, and geographies, and include diverse types of clinicians, computer scientists, and researchers. The individual pieces cover a wide range of areas related to medical genetics; we expect that these pieces may provide helpful windows into the ways in which AI is being actively studied, used, and considered in medical genetics.
Affiliation(s)
- Benjamin D Solomon
- Medical Genomics Unit, National Human Genome Research Institute, Bethesda, Maryland, USA
| | - Morgan Cheatham
- Warren Alpert Medical School of Brown University, Providence, Rhode Island, USA
| | - Thales A C de Guimarães
- Moorfields Eye Hospital National Health Service Foundation Trust, London, UK
- University College London Institute of Ophthalmology, London, UK
- National Institute for Health and Care Research Moorfields Biomedical Research Centre, London, UK
| | - Dat Duong
- Medical Genomics Unit, National Human Genome Research Institute, Bethesda, Maryland, USA
| | - Melissa A Haendel
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - Tzung-Chien Hsieh
- Institute for Genomic Statistics and Bioinformatics, University Hospital Bonn, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, Germany
| | - Behnam Javanmardi
- Institute for Genomic Statistics and Bioinformatics, University Hospital Bonn, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, Germany
| | | | - Peter Krawitz
- Institute for Genomic Statistics and Bioinformatics, University Hospital Bonn, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, Germany
| | | | | | - Ni-Chung Lee
- Department of Pediatrics and Medical Genetics, National Taiwan University Hospital, Taipei, Taiwan
| | | | - Michel Michaelides
- Moorfields Eye Hospital National Health Service Foundation Trust, London, UK
- University College London Institute of Ophthalmology, London, UK
- National Institute for Health and Care Research Moorfields Biomedical Research Centre, London, UK
| | - Klaus Mohnike
- Children's Hospital, Otto-von-Guericke-University, Magdeburg, Germany
| | - Nikolas Pontikos
- Moorfields Eye Hospital National Health Service Foundation Trust, London, UK
- University College London Institute of Ophthalmology, London, UK
- National Institute for Health and Care Research Moorfields Biomedical Research Centre, London, UK
| | | | - Yousif J Shwetar
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | | | - Rebekah L Waikel
- Medical Genomics Unit, National Human Genome Research Institute, Bethesda, Maryland, USA
| | - William Woof
- University College London Institute of Ophthalmology, London, UK
- National Institute for Health and Care Research Moorfields Biomedical Research Centre, London, UK
| |
|
38
|
Lin X, Chen Z, Li Y, Ma Z, Fan C, Cao Z, Feng S, Zhang J, Gao YQ. Unifying sequence-structure coding for advanced protein engineering via a multimodal diffusion transformer. Chem Sci 2025:d5sc02055g. [PMID: 40417294 PMCID: PMC12096517 DOI: 10.1039/d5sc02055g] [Received: 03/16/2025] [Accepted: 05/14/2025] [Indexed: 05/27/2025] Open
Abstract
Modern protein engineering demands integrated sequence-structure representations to tackle key challenges in designing, modifying, and evolving proteins for specific functions. While sequence-based methods are promising for generating novel proteins, incorporating structure-oriented information improves the success rate and helps target corresponding functions. Therefore, rather than relying solely on sequence or structure-based approaches, a consensus strategy is essential. Here, we introduce ProTokens, machine-learned "amino acids" derived from structural databases via self-supervised learning, providing a compact yet information-rich representation that bridges sequence and structure modalities. Instead of treating sequences and structures separately, we build PT-DiT, a multimodal diffusion transformer-based model that integrates both into a unified representation, enabling protein engineering in a joint sequence-structure space, streamlining the design process and facilitating the efficient encoding of 3D folds, contextual protein design, sampling of metastable states, and directed evolution for diverse objectives. Therefore, as a unified solution for in silico protein engineering, PT-DiT leverages sequence and structure insights to realize functional protein design.
Affiliation(s)
- Xiaohan Lin
- Beijing National Laboratory for Molecular Sciences, College of Chemistry and Molecular Engineering, Peking University Beijing 100871 China
| | - Zhenyu Chen
- Beijing National Laboratory for Molecular Sciences, College of Chemistry and Molecular Engineering, Peking University Beijing 100871 China
| | - Yanheng Li
- Beijing National Laboratory for Molecular Sciences, College of Chemistry and Molecular Engineering, Peking University Beijing 100871 China
| | - Zicheng Ma
- Changping Laboratory Beijing 102200 China
- Academy for Advanced Interdisciplinary Studies, Peking University Beijing 100871 China
| | - Chuanliu Fan
- Institute of Artificial Intelligence, Soochow University Suzhou 215006 China
| | - Ziqiang Cao
- Institute of Artificial Intelligence, Soochow University Suzhou 215006 China
| | | | - Jun Zhang
- Changping Laboratory Beijing 102200 China
| | - Yi Qin Gao
- Beijing National Laboratory for Molecular Sciences, College of Chemistry and Molecular Engineering, Peking University Beijing 100871 China
- Changping Laboratory Beijing 102200 China
| |
|
39
|
Liang YP, Zhao YL, Yin ZW, Gong XW, Han XL, Wen ML. Conserved Local Structural Motifs in Glycoside Hydrolase Families Facilitate the Discovery of Functional Enzymes. JOURNAL OF AGRICULTURAL AND FOOD CHEMISTRY 2025; 73:11983-11997. [PMID: 40324897 DOI: 10.1021/acs.jafc.4c10554] [Indexed: 05/07/2025]
Abstract
Glycoside hydrolases (GHs) are vital for natural glycoside biotransformation, especially in enhancing the pharmacological effects of natural products like ginsenosides. In this study, we collected 67 microbial-derived ginsenoside-hydrolyzing enzymes from nine GH families. Despite differences in global structures, the key residues surrounding substrate binding in GH1 and GH3 exhibit conserved structural motifs. Leveraging these motifs, five GH genes from Cellulosimicrobium were cloned, and three enzymes (Cbgl496, Cbgl516, Cbgl766) were characterized. Experimental results demonstrated that Cbgl516, Cbgl766, and Cbgl841 specifically catalyzed the hydrolysis of the β(1-6) glycosidic bond in the C-20 sugar chain of ginsenoside Rb1 to yield Rd. Cbgl496 selectively catalyzed the hydrolysis of β(1-2) glycosidic bonds in the oligosaccharide chains at the C-3 position of ginsenosides Rb1, Rb2, Rb3, and Rc, thereby directionally producing the minor ginsenosides Gy XVII, Compound O, Compound Mx1, and Compound Mc1. Structural analysis of 109,994 GH1/GH3 models from AlphaFold database revealed conserved residues across various organisms, emphasizing evolutionary conservation in the 3D structure of the catalytic core region despite sequence diversity. This study underscores the importance of conserved local structural motifs in GHs, offering insights for functional enzyme screening and understanding enzyme diversity and industrial applications.
Affiliation(s)
- Yu-Peng Liang
- National Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Key Laboratory of Microbial Diversity in Southwest China, Ministry of Education, Yunnan Institute of Microbiology, School of Life Sciences, Yunnan University, Kunming 650500, Yunnan, China
| | - Ya-Lan Zhao
- National Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Key Laboratory of Microbial Diversity in Southwest China, Ministry of Education, Yunnan Institute of Microbiology, School of Life Sciences, Yunnan University, Kunming 650500, Yunnan, China
| | - Zhong-Wei Yin
- National Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Key Laboratory of Microbial Diversity in Southwest China, Ministry of Education, Yunnan Institute of Microbiology, School of Life Sciences, Yunnan University, Kunming 650500, Yunnan, China
| | - Xiao-Wei Gong
- R&D Center, China Tobacco Yunnan Industrial Co., Ltd., Kunming 650224, China
| | - Xiu-Lin Han
- National Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Key Laboratory of Microbial Diversity in Southwest China, Ministry of Education, Yunnan Institute of Microbiology, School of Life Sciences, Yunnan University, Kunming 650500, Yunnan, China
| | - Meng-Liang Wen
- National Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Key Laboratory of Microbial Diversity in Southwest China, Ministry of Education, Yunnan Institute of Microbiology, School of Life Sciences, Yunnan University, Kunming 650500, Yunnan, China
| |
|
40
|
Zhu H, Ding Y. Nanobodies: From Discovery to AI-Driven Design. BIOLOGY 2025; 14:547. [PMID: 40427736 DOI: 10.3390/biology14050547] [Received: 04/05/2025] [Revised: 04/25/2025] [Accepted: 05/06/2025] [Indexed: 05/29/2025]
Abstract
Nanobodies, derived from naturally occurring heavy-chain antibodies in camelids (VHHs) and sharks (VNARs), are unique single-domain antibodies that have garnered significant attention in therapeutic, diagnostic, and biotechnological applications due to their small size, stability, and high specificity. This review first traces the historical discovery of nanobodies, highlighting key milestones in their isolation, characterization, and therapeutic development. We then explore their structure-function relationship, emphasizing features like their single-domain architecture and long CDR3 loop that contribute to their binding versatility. Additionally, we examine the growing interest in multiepitope nanobodies, in which binding to different epitopes on the same antigen not only enhances neutralization and specificity but also allows these nanobodies to be used as controllable modules for precise antigen manipulation. This review also discusses the integration of AI in nanobody design and optimization, showcasing how machine learning and deep learning approaches are revolutionizing rational design, humanization, and affinity maturation processes. With continued advancements in structural biology and computational design, nanobodies are poised to play an increasingly vital role in addressing both existing and emerging biomedical challenges.
Affiliation(s)
- Haoran Zhu
- State Key Laboratory of Genetics and Development of Complex Phenotypes, School of Life Sciences, Fudan University, Shanghai 200433, China
- Quzhou Fudan Institute, Quzhou 324002, China
| | - Yu Ding
- State Key Laboratory of Genetics and Development of Complex Phenotypes, School of Life Sciences, Fudan University, Shanghai 200433, China
- Quzhou Fudan Institute, Quzhou 324002, China
| |
|
41
|
Maurino VG. Next generation technologies for protein structure determination: challenges and breakthroughs in plant biology applications. JOURNAL OF PLANT PHYSIOLOGY 2025; 310:154522. [PMID: 40382917 DOI: 10.1016/j.jplph.2025.154522] [Received: 04/22/2025] [Revised: 05/13/2025] [Accepted: 05/14/2025] [Indexed: 05/20/2025]
Abstract
Advancements in structural biology have significantly deepened our understanding of plant proteins, which are central to critical biological functions such as photosynthesis, metabolism, signal transduction, and structural architecture. Gaining insights into their structures is crucial for unraveling their functions and mechanisms, which in turn has profound implications for agriculture, biotechnology, and environmental sustainability. Traditional methods in protein structural biology often fall short in addressing large protein assemblies and membrane proteins and, in particular, the dynamics and structural features of proteins in the native cellular context. This paper explores how next-generation technologies are transforming the field of plant protein structural biology, offering powerful tools to overcome longstanding obstacles and enabling remarkable scientific breakthroughs. Key technologies discussed include advanced X-ray crystallography, Cryo-Electron microscopy, Nuclear Magnetic Resonance spectroscopy, Cross-linking mass spectrometry, and Artificial Intelligence-driven approaches. These technologies are examined in terms of their challenges, innovations, and application with particular emphasis on their relevance to plant systems. Future directions in plant protein structural biology are also discussed. Although technical details are not covered in depth, readers are referred to the primary literature for more comprehensive information.
Affiliation(s)
- Veronica G Maurino
- Molecular Plant Physiology, Institute for Cellular and Molecular Botany (IZMB), University of Bonn, Kirschallee 1, 53115, Bonn, Germany.
| |
|
42
|
Huang R, Qiu W, Xiao X, Lin W. iProtDNA-SMOTE: Enhancing protein-DNA binding sites prediction through imbalanced graph neural networks. PLoS One 2025; 20:e0320817. [PMID: 40359455 PMCID: PMC12074593 DOI: 10.1371/journal.pone.0320817] [Received: 12/11/2024] [Accepted: 02/24/2025] [Indexed: 05/15/2025] Open
Abstract
Protein-DNA interactions play a crucial role in cellular biology, essential for maintaining life processes and regulating cellular functions. We propose a method called iProtDNA-SMOTE, which utilizes non-equilibrium graph neural networks along with pre-trained protein language models to predict DNA binding residues. This approach effectively addresses the class imbalance issue in predicting protein-DNA binding sites by leveraging unbalanced graph data, thus enhancing the model's generalization and specificity. We trained the model on two datasets, TR646 and TR573, and conducted a series of experiments to evaluate its performance. The model achieved AUC values of 0.850, 0.896, and 0.858 on the independent test datasets TE46, TE129, and TE181, respectively. These results indicate that iProtDNA-SMOTE outperforms existing methods in terms of accuracy and generalization for predicting DNA binding sites, offering reliable and effective predictions to minimize errors. The model has been thoroughly validated for its ability to predict protein-DNA binding sites with high reliability and precision. For the convenience of the scientific community, the benchmark datasets and codes are publicly available at https://github.com/primrosehry/iProtDNA-SMOTE.
Affiliation(s)
- Ruiyan Huang
- School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen Jiangxi, China
| | - Wangren Qiu
- School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen Jiangxi, China
| | - Xuan Xiao
- School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen Jiangxi, China
- School of Information Engineering, Jiangxi Art & Ceramics Technology Institute, Jingdezhen Jiangxi, China
| | - Weizhong Lin
- School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen Jiangxi, China
| |
|
43
|
Zhang X, Tseo Y, Bai Y, Chen F, Uhler C. Prediction of protein subcellular localization in single cells. Nat Methods 2025:10.1038/s41592-025-02696-1. [PMID: 40360932 DOI: 10.1038/s41592-025-02696-1] [Received: 07/07/2024] [Accepted: 04/09/2025] [Indexed: 05/15/2025]
Abstract
The subcellular localization of a protein is important for its function, and its mislocalization is linked to numerous diseases. Existing datasets capture limited pairs of proteins and cell lines, and existing protein localization prediction models either miss cell-type specificity or cannot generalize to unseen proteins. Here we present a method for Prediction of Unseen Proteins' Subcellular localization (PUPS). PUPS combines a protein language model and an image inpainting model to utilize both protein sequence and cellular images. We demonstrate that the protein sequence input enables generalization to unseen proteins, and the cellular image input captures single-cell variability, enabling cell-type-specific predictions. Experimental validation shows that PUPS can predict protein localization in newly performed experiments outside the Human Protein Atlas used for training. Collectively, PUPS provides a framework for predicting differential protein localization across cell lines and single cells within a cell line, including changes in protein localization driven by mutations.
Affiliation(s)
- Xinyi Zhang
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, USA
- Eric and Wendy Schmidt Center, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Yitong Tseo
- Computational and Systems Biology Program, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Yunhao Bai
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Fei Chen
- Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA, USA.
| | - Caroline Uhler
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, USA.
- Eric and Wendy Schmidt Center, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
| |
|
44
|
Ito J, Strange A, Liu W, Joas G, Lytras S, Sato K. A protein language model for exploring viral fitness landscapes. Nat Commun 2025; 16:4236. [PMID: 40360496 PMCID: PMC12075601 DOI: 10.1038/s41467-025-59422-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2024] [Accepted: 04/22/2025] [Indexed: 05/15/2025] Open
Abstract
Successively emerging SARS-CoV-2 variants lead to repeated epidemic surges through escalated fitness (i.e., relative effective reproduction number between variants). Modeling the genotype-fitness relationship enables us to pinpoint the mutations boosting viral fitness and flag high-risk variants immediately after their detection. Here, we present CoVFit, a protein language model adapted from ESM-2, designed to predict variant fitness based solely on spike protein sequences. CoVFit was trained on genotype-fitness data derived from viral genome surveillance and functional mutation assays related to immune evasion. CoVFit successfully ranked the fitness of unknown future variants harboring nearly 15 mutations with informative accuracy. CoVFit identified 959 fitness elevation events throughout SARS-CoV-2 evolution until late 2023. Furthermore, we show that CoVFit is applicable for predicting viral evolution through single amino acid mutations. Our study gives insight into the SARS-CoV-2 fitness landscape and provides a tool for efficiently identifying SARS-CoV-2 variants with higher epidemic risk.
Affiliation(s)
- Jumpei Ito
- Division of Systems Virology, Department of Microbiology and Immunology, The Institute of Medical Science, The University of Tokyo, Tokyo, Japan.
- International Research Center for Infectious Diseases, The Institute of Medical Science, The University of Tokyo, Tokyo, Japan.
| | - Adam Strange
- Division of Systems Virology, Department of Microbiology and Immunology, The Institute of Medical Science, The University of Tokyo, Tokyo, Japan
| | - Wei Liu
- Division of Systems Virology, Department of Microbiology and Immunology, The Institute of Medical Science, The University of Tokyo, Tokyo, Japan
- Department of Genetic Medicine and Development, University of Geneva Medical School, Geneva, Switzerland
- Swiss Institute of Bioinformatics, Geneva, Switzerland
| | - Gustav Joas
- Division of Systems Virology, Department of Microbiology and Immunology, The Institute of Medical Science, The University of Tokyo, Tokyo, Japan
- Division of Immunology and Respiratory Medicine, Department of Medicine, Karolinska Institutet, Stockholm, Sweden
| | - Spyros Lytras
- Division of Systems Virology, Department of Microbiology and Immunology, The Institute of Medical Science, The University of Tokyo, Tokyo, Japan
- MRC-University of Glasgow Centre for Virus Research, Glasgow, UK
| | - Kei Sato
- Division of Systems Virology, Department of Microbiology and Immunology, The Institute of Medical Science, The University of Tokyo, Tokyo, Japan.
- International Research Center for Infectious Diseases, The Institute of Medical Science, The University of Tokyo, Tokyo, Japan.
- MRC-University of Glasgow Centre for Virus Research, Glasgow, UK.
- Graduate School of Medicine, The University of Tokyo, Tokyo, Japan.
- International Vaccine Design Center, The Institute of Medical Science, The University of Tokyo, Tokyo, Japan.
- Collaboration Unit for Infection, Joint Research Center for Human Retrovirus Infection, Kumamoto University, Kumamoto, Japan.
| |
|
45
|
Wozniak S, Janson G, Feig M. Accurate Predictions of Molecular Properties of Proteins via Graph Neural Networks and Transfer Learning. J Chem Theory Comput 2025; 21:4830-4845. [PMID: 40270304 PMCID: PMC12080100 DOI: 10.1021/acs.jctc.4c01682] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/10/2024] [Revised: 03/14/2025] [Accepted: 04/15/2025] [Indexed: 04/25/2025]
Abstract
Machine learning has emerged as a promising approach for predicting molecular properties of proteins, as it addresses limitations of experimental and traditional computational methods. Here, we introduce GSnet, a graph neural network (GNN) trained to predict physicochemical and geometric properties including solvation-free energies, diffusion constants, and hydrodynamic radii, based on three-dimensional protein structures. By leveraging transfer learning, pretrained GSnet embeddings were adapted to predict solvent-accessible surface area (SASA) and residue-specific pKa values, achieving high accuracy and generalizability. Notably, GSnet outperformed existing protein embeddings for SASA prediction and a locally charge-aware variant, aLCnet, approached the accuracy of simulation-based and empirical methods for pKa prediction. Our GNN framework demonstrated robustness across diverse data sets, including intrinsically disordered peptides, and scalability for high-throughput applications. These results highlight the potential of GNN-based embeddings and transfer learning to advance protein structure analysis, providing a foundation for integrating predictive models into proteome-wide studies and structural biology pipelines.
Affiliation(s)
- Spencer Wozniak
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824, United States
| | - Giacomo Janson
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824, United States
| | - Michael Feig
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824, United States
| |
|
46
|
Pandey A, Chen W, Keten S. COLOR: A Compositional Linear Operation-Based Representation of Protein Sequences for Identification of Monomer Contributions to Properties. J Chem Inf Model 2025; 65:4320-4333. [PMID: 40272990 DOI: 10.1021/acs.jcim.5c00205] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/26/2025]
Abstract
The properties of biological materials like proteins and nucleic acids are largely determined by their primary sequence. Certain segments in the sequence strongly influence specific functions, but identifying these segments, or so-called motifs, is challenging due to the complexity of sequential data. While deep learning (DL) models can accurately capture sequence-property relationships, the degree of nonlinearity in these models limits the assessment of monomer contributions to a property, a critical step in identifying key motifs. Recent advances in explainable AI (XAI) offer attention and gradient-based methods for estimating monomeric contributions. However, these methods are primarily applied to classification tasks, such as binding site identification, where they achieve limited accuracy (40-45%) and rely on qualitative evaluations. To address these limitations, we introduce a DL model with interpretable steps, enabling direct tracing of monomeric contributions. Inspired by the masking technique commonly used in vision and natural language processing domains, we propose a new metric (I) for quantitative analysis on datasets mainly containing distinct properties of anticancer peptides (ACP), antimicrobial peptides (AMP), and collagen. Our model exhibits 22% higher explainability than the gradient and attention-based state-of-the-art models, recognizes critical motifs (RRR, RRI, and RSS) that significantly destabilize ACPs, and identifies motifs in AMPs that are 50% more effective in converting non-AMPs to AMPs. These findings highlight the potential of our model in guiding mutation strategies for designing protein-based biomaterials.
Affiliation(s)
- Akash Pandey
- Department of Mechanical Engineering, Northwestern University, Evanston, Illinois 60208, United States
| | - Wei Chen
- Department of Mechanical Engineering, Northwestern University, Evanston, Illinois 60208, United States
| | - Sinan Keten
- Department of Mechanical Engineering, Northwestern University, Evanston, Illinois 60208, United States
- Department of Civil and Environmental Engineering, Northwestern University, Evanston, Illinois 60208, United States
| |
|
47
|
Gao L, Zhang Y, Ge F, Li S, Guo Y, Song J, Yu DJ. Structure-Directed Pan-Specific T-Cell Receptor-Peptide-Major Histocompatibility Complex Interaction Prediction. J Chem Inf Model 2025; 65:4674-4686. [PMID: 40297927 DOI: 10.1021/acs.jcim.5c00055] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/30/2025]
Abstract
T-cell receptors (TCRs) play a pivotal role in the adaptive immune system, and understanding their antigen recognition mechanism remains a critical area of research. With the increasing availability of binding and interaction data between TCRs and peptide-major histocompatibility complexes (pMHCs), data-driven computational methods are emerging as powerful tools with significant potential for advancement. In this study, we collected and curated comprehensive sequence and structure data sets of TCRs from human CD8+ T-cells and cognate epitopes presented by MHC class I molecules. We developed two innovative computational frameworks: SG-TPMI, a lightweight, extensible, and structure-guided model for predicting TCR-pMHC binding specificity, and Seq/Struct-TCS, a pair of models (sequence-based and structure-based) for predicting contact sites within TCR-pMHC complexes. Notably, we directly integrated MHC-I alpha helices (or pseudosequences) and structural information on the protein complex into the prediction models. Our comprehensive modeling approach enabled quantitative investigations of TCR-pMHC interaction mechanisms, empowering SG-TPMI and Struct-TCS to achieve performances comparable to those of state-of-the-art methods. Furthermore, our results highlight the necessity of CDR1 and CDR2 loops as well as MHC restriction in pan-specific TCR-pMHC interaction prediction, providing new insights into TCR recognition. In summary, we not only propose SG-TPMI as an effective computational method for predicting TCR-pMHC binary interactions but also introduce the Seq/Struct-TCS design for predicting TCR interacting sites with peptide or MHC alpha helices.
Affiliation(s)
- Letao Gao
- School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing 210094, China
| | - Yumeng Zhang
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
| | - Fang Ge
- State Key Laboratory of Organic Electronics and Information Displays & Institute of Advanced Materials (IAM), Nanjing University of Posts & Telecommunications, Nanjing 210003, China
| | - Shanshan Li
- School of Public Health and Preventive Medicine, Monash University, Melbourne, VIC 3004, Australia
| | - Yuming Guo
- School of Public Health and Preventive Medicine, Monash University, Melbourne, VIC 3004, Australia
| | - Jiangning Song
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- Monash Data Futures Institute, Monash University, Melbourne, VIC 3800, Australia
| | - Dong-Jun Yu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing 210094, China
| |
|
48
|
Cui XC, Zheng Y, Liu Y, Yuchi Z, Yuan YJ. AI-driven de novo enzyme design: Strategies, applications, and future prospects. Biotechnol Adv 2025; 82:108603. [PMID: 40368118 DOI: 10.1016/j.biotechadv.2025.108603] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2025] [Revised: 04/22/2025] [Accepted: 05/10/2025] [Indexed: 05/16/2025]
Abstract
Enzymes are indispensable for biological processes and diverse applications across industries. While top-down modification strategies, such as directed evolution, have achieved remarkable success in optimizing existing enzymes, bottom-up de novo enzyme design has emerged as a transformative approach for engineering novel enzymes with customized catalytic functions, independent of natural templates. Recent advancements in artificial intelligence (AI) and computational power have significantly accelerated this field, enabling breakthroughs in enzyme engineering. These technologies facilitate the rapid generation of enzyme structures and amino acid sequences optimized for specific functions, thereby enhancing design efficiency. They also support functional validation and activity optimization, improving the catalytic performance, stability, and robustness of de novo designed enzymes. This review highlights recent advancements in AI-driven de novo enzyme design, discusses strategies for validation and optimization, and examines the challenges and future prospects of integrating these technologies into enzyme development.
Affiliation(s)
- Xi-Chen Cui
- State Key Laboratory of Synthetic Biology, Tianjin University, Tianjin 30072, PR China; Frontiers Science Center for Synthetic Biology(Ministry of Education), School of Synthetic Biology and Biomanufacturing, Tianjin University, Tianjin 300072, PR China
| | - Yan Zheng
- State Key Laboratory of Synthetic Biology, Tianjin University, Tianjin 30072, PR China; Frontiers Science Center for Synthetic Biology(Ministry of Education), School of Synthetic Biology and Biomanufacturing, Tianjin University, Tianjin 300072, PR China
| | - Ye Liu
- State Key Laboratory of Synthetic Biology, Tianjin University, Tianjin 30072, PR China; Frontiers Science Center for Synthetic Biology(Ministry of Education), School of Synthetic Biology and Biomanufacturing, Tianjin University, Tianjin 300072, PR China; School of Pharmaceutical Science and Technology, Tianjin University, Tianjin 300072, PR China
| | - Zhiguang Yuchi
- State Key Laboratory of Synthetic Biology, Tianjin University, Tianjin 30072, PR China; Frontiers Science Center for Synthetic Biology(Ministry of Education), School of Synthetic Biology and Biomanufacturing, Tianjin University, Tianjin 300072, PR China; School of Pharmaceutical Science and Technology, Tianjin University, Tianjin 300072, PR China.
| | - Ying-Jin Yuan
- State Key Laboratory of Synthetic Biology, Tianjin University, Tianjin 30072, PR China; Frontiers Science Center for Synthetic Biology(Ministry of Education), School of Synthetic Biology and Biomanufacturing, Tianjin University, Tianjin 300072, PR China.
| |
|
49
|
Brown LM, Tax G, Acera Mateos P, de Weck A, Foresto S, Robertson T, Jalud F, Ajuyah P, Barahona P, Mao J, Dolman MEM, Wong M, Mayoh C, Cowley MJ, Lau LMS, Sadras T, Ekert PG. A novel TRKB-activating internal tandem duplication characterizes a new mechanism of receptor tyrosine kinase activation. NPJ Precis Oncol 2025; 9:137. [PMID: 40348911 PMCID: PMC12065843 DOI: 10.1038/s41698-025-00928-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/18/2024] [Accepted: 04/28/2025] [Indexed: 05/14/2025] Open
Abstract
Precision medicine programs like the Zero Childhood Cancer Program perform comprehensive molecular analysis of patient tumors, enabling detection of novel structural variants that may be cryptic to standard techniques. Identification of these variants can impact individual patient treatment, and beyond this establish new mechanisms of oncogenic activation. We have identified a novel internal tandem duplication (ITD) in the receptor tyrosine kinase (RTK), NTRK2, in a patient with FOXR2-activated CNS neuroblastoma. The ITD spans exons 10-13 of NTRK2 encoding the transmembrane domain. NTRK2 ITD is transforming and sensitive to TRK inhibition. In silico structural predictions suggested the duplication of an alpha-helix region and juxtaposed tyrosine residues that play a role in facilitating autophosphorylation. Consistent with this, mutation of these residues inhibited cellular transformation. This is the first report of an ITD spanning the transmembrane domain of an RTK, characterizing an additional mechanism by which RTKs are activated in cancer.
Affiliation(s)
- Lauren M Brown
- Children's Cancer Institute, Lowy Cancer Research Centre, UNSW Sydney, Sydney, NSW, Australia
- School of Clinical Medicine, UNSW Sydney, Sydney, NSW, Australia
| | - Gabor Tax
- Children's Cancer Institute, Lowy Cancer Research Centre, UNSW Sydney, Sydney, NSW, Australia
- School of Clinical Medicine, UNSW Sydney, Sydney, NSW, Australia
| | - Pablo Acera Mateos
- Children's Cancer Institute, Lowy Cancer Research Centre, UNSW Sydney, Sydney, NSW, Australia
- School of Clinical Medicine, UNSW Sydney, Sydney, NSW, Australia
| | - Antoine de Weck
- Children's Cancer Institute, Lowy Cancer Research Centre, UNSW Sydney, Sydney, NSW, Australia
- School of Clinical Medicine, UNSW Sydney, Sydney, NSW, Australia
| | - Steve Foresto
- Queensland Children's Hospital, Brisbane, QLD, Australia
| | | | - Fatimah Jalud
- Peter MacCallum Cancer Centre, Parkville, VIC, Australia
- The Sir Peter MacCallum Department of Oncology, University of Melbourne, Parkville, VIC, Australia
| | - Pamela Ajuyah
- Children's Cancer Institute, Lowy Cancer Research Centre, UNSW Sydney, Sydney, NSW, Australia
| | - Paulette Barahona
- Children's Cancer Institute, Lowy Cancer Research Centre, UNSW Sydney, Sydney, NSW, Australia
| | - Jie Mao
- Children's Cancer Institute, Lowy Cancer Research Centre, UNSW Sydney, Sydney, NSW, Australia
| | - M Emmy M Dolman
- Children's Cancer Institute, Lowy Cancer Research Centre, UNSW Sydney, Sydney, NSW, Australia
- School of Clinical Medicine, UNSW Sydney, Sydney, NSW, Australia
| | - Marie Wong
- Children's Cancer Institute, Lowy Cancer Research Centre, UNSW Sydney, Sydney, NSW, Australia
- School of Clinical Medicine, UNSW Sydney, Sydney, NSW, Australia
| | - Chelsea Mayoh
- Children's Cancer Institute, Lowy Cancer Research Centre, UNSW Sydney, Sydney, NSW, Australia
- School of Clinical Medicine, UNSW Sydney, Sydney, NSW, Australia
| | - Mark J Cowley
- Children's Cancer Institute, Lowy Cancer Research Centre, UNSW Sydney, Sydney, NSW, Australia
- School of Clinical Medicine, UNSW Sydney, Sydney, NSW, Australia
| | - Loretta M S Lau
- Children's Cancer Institute, Lowy Cancer Research Centre, UNSW Sydney, Sydney, NSW, Australia
- School of Clinical Medicine, UNSW Sydney, Sydney, NSW, Australia
- Kids Cancer Centre, Sydney Children's Hospital, Randwick, NSW, Australia
| | - Teresa Sadras
- Peter MacCallum Cancer Centre, Parkville, VIC, Australia
- The Sir Peter MacCallum Department of Oncology, University of Melbourne, Parkville, VIC, Australia
| | - Paul G Ekert
- Children's Cancer Institute, Lowy Cancer Research Centre, UNSW Sydney, Sydney, NSW, Australia.
- School of Clinical Medicine, UNSW Sydney, Sydney, NSW, Australia.
- Peter MacCallum Cancer Centre, Parkville, VIC, Australia.
- The Sir Peter MacCallum Department of Oncology, University of Melbourne, Parkville, VIC, Australia.
- University of New South Wales Centre for Childhood Cancer Research, UNSW Sydney, Sydney, NSW, Australia.
| |
|
50
|
Ding N, Jiang Y, Lee S, Cheng Z, Ran X, Ding Y, Ge R, Zhang Y, Yang ZJ. Enzyme miniaturization: Revolutionizing future biocatalysts. Biotechnol Adv 2025; 82:108598. [PMID: 40354901 DOI: 10.1016/j.biotechadv.2025.108598] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2025] [Revised: 04/05/2025] [Accepted: 05/09/2025] [Indexed: 05/14/2025]
Abstract
Enzyme miniaturization offers a transformative approach to overcome limitations posed by the large size of conventional enzymes in industrial, therapeutic, and diagnostic applications. However, the evolutionary optimization of enzymes for activity has not inherently favored compact structures, creating challenges for modern applications requiring smaller catalysts. In this review, we surveyed the advantages of miniature enzymes, including enhanced expressivity, folding efficiency, thermostability, and resistance to proteolysis. We described the applications of miniature enzymes as industrial catalysts, therapeutic agents, and diagnostic elements. We highlighted strategies such as genome mining, rational design, random deletion, and de novo design for achieving enzyme miniaturization, integrating both computational and experimental techniques. By investigating these approaches, we aim to provide a framework for advancing enzyme engineering, emphasizing the unique potential of miniature enzymes to revolutionize biocatalysis, gene therapy, and biosensing technologies.
Affiliation(s)
- Ning Ding
- Department of Chemistry, Vanderbilt University, Nashville, TN 37235, United States; Center for Structural Biology, Vanderbilt University, Nashville, TN 37235, United States.
| | - Yaoyukun Jiang
- Department of Chemistry, Vanderbilt University, Nashville, TN 37235, United States; Department of Chemistry and California Institute for Quantitative Biosciences, University of California-Berkeley, Berkeley, CA 94720, United States
| | - Sangsin Lee
- Department of Genetics, Stanford University, Stanford, CA 94305, United States
| | - Zihao Cheng
- Department of Chemistry, Vanderbilt University, Nashville, TN 37235, United States
| | - Xinchun Ran
- Department of Chemistry, Vanderbilt University, Nashville, TN 37235, United States
| | - Yujing Ding
- State Key Laboratory of Chemical Resource Engineering, Beijing University of Chemical Technology, Beijing 100029, China; Beijing Advanced Innovation Center for Soft Matter Science and Engineering, Beijing University of Chemical Technology, Beijing 100029, China
| | - Robbie Ge
- Department of Chemistry, Vanderbilt University, Nashville, TN 37235, United States
| | - Yifei Zhang
- State Key Laboratory of Chemical Resource Engineering, Beijing University of Chemical Technology, Beijing 100029, China; Beijing Advanced Innovation Center for Soft Matter Science and Engineering, Beijing University of Chemical Technology, Beijing 100029, China.
| | - Zhongyue J Yang
- Department of Chemistry, Vanderbilt University, Nashville, TN 37235, United States; Center for Structural Biology, Vanderbilt University, Nashville, TN 37235, United States.
| |
|