1
Chi X, Chen R, Chen R, Xu Y, Deng Y, Yang X, Pan Z, Xu X, Pan Y, Li Q, Zhou P, Huang W. Discovery and characterization of novel FAK inhibitors for breast cancer therapy via hybrid virtual screening, biological evaluation and molecular dynamics simulations. Bioorg Chem 2025; 159:108400. [PMID: 40163988 DOI: 10.1016/j.bioorg.2025.108400] [Received: 01/10/2025] [Revised: 03/19/2025] [Accepted: 03/19/2025] [Indexed: 04/02/2025]
Abstract
Focal adhesion kinase (FAK) is a critical drug target implicated in various disease pathways, including hematological malignancies and breast cancer. Therefore, identifying FAK inhibitors with novel scaffolds could offer new opportunities for developing effective therapeutic compounds. Herein, we report the discovery of a FAK inhibitor with a new backbone from an "internal" database, using structure-based high-throughput virtual screening (HTVS) and a DeepDock algorithm based on geometric deep learning. Subsequently, molecular docking was conducted at different precisions to identify 10 compounds for further evaluation of biological activity. Ultimately, compound 4, a pyrimidin-4-amine derivative, demonstrated inhibitory activity against FAK and breast cancer cells, supporting its potential as a FAK inhibitor. Moreover, molecular dynamics simulations were carried out to gain more detailed insight into the binding mechanism between compound 4 and FAK and to guide subsequent structural optimization.
Affiliation(s)
- Xinglong Chi
- Affiliated Yongkang First People's Hospital and School of Pharmaceutical Sciences, Hangzhou Medical College, Hangzhou 310058, PR China; Center of Safety Evaluation and Research, Hangzhou Medical College, Hangzhou 310053, PR China
- Runmei Chen
- Affiliated Yongkang First People's Hospital and School of Pharmaceutical Sciences, Hangzhou Medical College, Hangzhou 310058, PR China; School of Pharmacy, Hangzhou Medical College, Hangzhou 310058, PR China
- Roufen Chen
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, PR China
- Yingxuan Xu
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, PR China
- Yaru Deng
- Affiliated Yongkang First People's Hospital and School of Pharmaceutical Sciences, Hangzhou Medical College, Hangzhou 310058, PR China; Center of Safety Evaluation and Research, Hangzhou Medical College, Hangzhou 310053, PR China
- Xinle Yang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, PR China; College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou 310014, PR China
- Zhichao Pan
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, PR China
- Xiangwei Xu
- Affiliated Yongkang First People's Hospital and School of Pharmaceutical Sciences, Hangzhou Medical College, Hangzhou 310058, PR China; School of Pharmaceutical Sciences, Wenzhou Medical University, Wenzhou 325035, PR China
- Youlu Pan
- Affiliated Yongkang First People's Hospital and School of Pharmaceutical Sciences, Hangzhou Medical College, Hangzhou 310058, PR China; Center of Safety Evaluation and Research, Hangzhou Medical College, Hangzhou 310053, PR China
- Qin Li
- School of Pharmacy, Hangzhou Medical College, Hangzhou 310058, PR China.
- Peng Zhou
- Affiliated Yongkang First People's Hospital and School of Pharmaceutical Sciences, Hangzhou Medical College, Hangzhou 310058, PR China.
- Wenhai Huang
- Affiliated Yongkang First People's Hospital and School of Pharmaceutical Sciences, Hangzhou Medical College, Hangzhou 310058, PR China; Center of Safety Evaluation and Research, Hangzhou Medical College, Hangzhou 310053, PR China.
2
Sun Q, Wang H, Xie J, Wang L, Mu J, Li J, Ren Y, Lai L. Computer-Aided Drug Discovery for Undruggable Targets. Chem Rev 2025. [PMID: 40423592 DOI: 10.1021/acs.chemrev.4c00969] [Indexed: 05/28/2025]
Abstract
Undruggable targets are those of therapeutical significance but challenging for conventional drug design approaches. Such targets often exhibit unique features, including highly dynamic structures, a lack of well-defined ligand-binding pockets, the presence of highly conserved active sites, and functional modulation by protein-protein interactions. Recent advances in computational simulations and artificial intelligence have revolutionized the drug design landscape, giving rise to innovative strategies for overcoming these obstacles. In this review, we highlight the latest progress in computational approaches for drug design against undruggable targets, present several successful case studies, and discuss remaining challenges and future directions. Special emphasis is placed on four primary target categories: intrinsically disordered proteins, protein allosteric regulation, protein-protein interactions, and protein degradation, along with discussion of emerging target types. We also examine how AI-driven methodologies have transformed the field, from applications in protein-ligand complex structure prediction and virtual screening to de novo ligand generation for undruggable targets. Integration of computational methods with experimental techniques is expected to bring further breakthroughs to overcome the hurdles of undruggable targets. As the field continues to evolve, these advancements hold great promise to expand the druggable space, offering new therapeutic opportunities for previously untreatable diseases.
Affiliation(s)
- Qi Sun
- BNLMS, College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, China
- Peking University Chengdu Academy for Advanced Interdisciplinary Biotechnologies, Chengdu, Sichuan 610213, China
- Hanping Wang
- BNLMS, College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, China
- Juan Xie
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
- Liying Wang
- BNLMS, College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, China
- Junxi Mu
- Peking-Tsinghua Center for Life Science, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
- Junren Li
- BNLMS, College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, China
- Yuhao Ren
- BNLMS, College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, China
- Luhua Lai
- BNLMS, College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, China
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
- Peking-Tsinghua Center for Life Science, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
- Peking University Chengdu Academy for Advanced Interdisciplinary Biotechnologies, Chengdu, Sichuan 610213, China
- Research Unit of Drug Design Method, Chinese Academy of Medical Sciences, Peking University, Beijing 100871, China
3
Xiao J, Hu G, Zhou X, Zheng Y, Li J. TIDGN: A Transfer Learning Framework for Predicting Interactions of Intrinsically Disordered Proteins with High Conformational Dynamics. J Chem Inf Model 2025; 65:4866-4877. [PMID: 40360271 DOI: 10.1021/acs.jcim.5c00422] [Indexed: 05/15/2025]
Abstract
Interactions between intrinsically disordered proteins (IDPs) are crucial for biological processes, such as intracellular liquid-liquid phase separation (LLPS). Experiments (e.g., NMR) and simulations used to study IDP interactions encounter a variety of difficulties, highlighting the necessity to develop relevant machine learning methods. However, reliable machine learning methods face the challenge resulting from the scarcity of available training data. In this work, we propose a transfer learning-based invariant geometric dynamic graph model, named TIDGN, for predicting IDP interactions. The model consists of a pretraining task module and a downstream task module. The pretraining task module learns the dynamic structural encoding of IDP monomers, which is then used by the downstream task module for interaction site prediction. The IDP monomer structure data set and the IDP interaction event data set are constructed using all-atom molecular dynamics (MD) simulations. The transfer learning strategy effectively enhances the model's performance. Both homotypic interactions and heterotypic interactions between two IDPs are considered in this work. Interestingly, TIDGN performs well for the heterotypic interaction prediction. Additionally, the feature ablation analysis emphasizes the importance of invariant geometric graph features. Taken together, our work demonstrates that the integration of transfer learning and the invariant geometric graph network offers a promising approach for addressing data scarcity challenges of IDP interaction prediction.
Affiliation(s)
- Jing Xiao
- School of Physics, Zhejiang University, Hangzhou 310058, P. R. China
- Guorong Hu
- School of Physics, Zhejiang University, Hangzhou 310058, P. R. China
- Xiaozhou Zhou
- School of Physics, Zhejiang University, Hangzhou 310058, P. R. China
- Yuchuan Zheng
- School of Physics, Zhejiang University, Hangzhou 310058, P. R. China
- Jingyuan Li
- School of Physics, Zhejiang University, Hangzhou 310058, P. R. China
4
Gainza P, Bunker RD, Townson SA, Castle JC. Machine learning to predict de novo protein-protein interactions. Trends Biotechnol 2025:S0167-7799(25)00158-1. [PMID: 40425414 DOI: 10.1016/j.tibtech.2025.04.013] [Received: 11/27/2024] [Revised: 04/23/2025] [Accepted: 04/23/2025] [Indexed: 05/29/2025]
Abstract
Advances in machine learning for structural biology have dramatically enhanced our capacity to predict protein-protein interactions (PPIs). Here, we review recent developments in the computational prediction of PPIs, particularly focusing on innovations that enable predictions of interactions that have no precedent in nature, termed de novo. We discuss novel machine learning algorithms for PPI prediction, including approaches based on co-folding and atomic graphs. We further highlight methods that learn from molecular surfaces, which can predict PPIs not found in nature, including interactions induced by small molecules. Finally, we explore the emerging biotechnological applications enabled by these predictive capabilities, including the prediction of antibody-antigen complexes and molecular glue-induced PPIs, and discuss their potential to empower drug discovery and protein engineering.
Affiliation(s)
- Pablo Gainza
- Monte Rosa Therapeutics, Klybeckstrasse 191, 4057 Basel, Switzerland.
- Richard D Bunker
- Monte Rosa Therapeutics, Klybeckstrasse 191, 4057 Basel, Switzerland
- Sharon A Townson
- Monte Rosa Therapeutics, Klybeckstrasse 191, 4057 Basel, Switzerland
- John C Castle
- Monte Rosa Therapeutics, Klybeckstrasse 191, 4057 Basel, Switzerland.
5
Diaz-Rovira AM, Lotze J, Hoffmann G, Pallara C, Molina A, Coburger I, Gloser-Bräunig M, Meysing M, Zwarg M, Díaz L, Guallar V, Bosse-Doenecke E, Roda S. Efficient Design of Affilin® Protein Binders for HER3. Int J Mol Sci 2025; 26:4683. [PMID: 40429825 PMCID: PMC12112719 DOI: 10.3390/ijms26104683] [Received: 04/23/2025] [Revised: 05/08/2025] [Accepted: 05/12/2025] [Indexed: 05/29/2025] Open
Abstract
Engineered scaffold-based proteins that bind to specific targets with high affinity offer significant advantages over traditional antibodies in theranostic applications. Their development often relies on display methods, where large libraries of variants are physically contacted with the desired target protein and pools of binding variants can be selected. Herein, we use a novel combined artificial intelligence/physics-based computational framework and phage display approach to obtain ubiquitin-based Affilin® proteins targeting the human epidermal growth factor receptor 3 (HER3) extracellular domain, a relevant tumor target. As traditional antibodies against the receptor have failed so far, we sought to provide molecules in a smaller, more versatile format to cover the medical need in HER3-related diseases. We demonstrate that the developed in silico pipeline can generate de novo Affilin® proteins binding the biochemical HER3 target using a small training set of <1000 sequences. The classical phage display yielded primary candidates with low nanomolar affinities to the biochemical target and HER3-expressing cells. The latter could be further optimized by phage display and computational maturation alike. These combined efforts resulted in four HER3 ligands with high affinity, cell binding, serum stability, and theranostic potential.
Affiliation(s)
- Anna M. Diaz-Rovira
- Life Sciences Department, Barcelona Supercomputing Center, 08034 Barcelona, Spain; (A.M.D.-R.); (V.G.)
- Doctoral Program in Theoretical Chemistry and Computational Modelling, Universitat de Barcelona, 08028 Barcelona, Spain
- Jonathan Lotze
- Navigo Proteins GmbH, 06120 Halle, Germany; (J.L.); (G.H.); (I.C.); (M.G.-B.); (M.M.); (M.Z.)
- Gregor Hoffmann
- Navigo Proteins GmbH, 06120 Halle, Germany; (J.L.); (G.H.); (I.C.); (M.G.-B.); (M.M.); (M.Z.)
- Chiara Pallara
- Nostrum Biodiscovery S.L., 08029 Barcelona, Spain; (C.P.); (A.M.); (L.D.)
- Alexis Molina
- Nostrum Biodiscovery S.L., 08029 Barcelona, Spain; (C.P.); (A.M.); (L.D.)
- Ina Coburger
- Navigo Proteins GmbH, 06120 Halle, Germany; (J.L.); (G.H.); (I.C.); (M.G.-B.); (M.M.); (M.Z.)
- Manja Gloser-Bräunig
- Navigo Proteins GmbH, 06120 Halle, Germany; (J.L.); (G.H.); (I.C.); (M.G.-B.); (M.M.); (M.Z.)
- Maren Meysing
- Navigo Proteins GmbH, 06120 Halle, Germany; (J.L.); (G.H.); (I.C.); (M.G.-B.); (M.M.); (M.Z.)
- Madlen Zwarg
- Navigo Proteins GmbH, 06120 Halle, Germany; (J.L.); (G.H.); (I.C.); (M.G.-B.); (M.M.); (M.Z.)
- Lucía Díaz
- Nostrum Biodiscovery S.L., 08029 Barcelona, Spain; (C.P.); (A.M.); (L.D.)
- Victor Guallar
- Life Sciences Department, Barcelona Supercomputing Center, 08034 Barcelona, Spain; (A.M.D.-R.); (V.G.)
- Nostrum Biodiscovery S.L., 08029 Barcelona, Spain; (C.P.); (A.M.); (L.D.)
- Institució Catalana de Recerca i Estudis Avançats (ICREA), 08010 Barcelona, Spain
- Eva Bosse-Doenecke
- Navigo Proteins GmbH, 06120 Halle, Germany; (J.L.); (G.H.); (I.C.); (M.G.-B.); (M.M.); (M.Z.)
- Sergi Roda
- Nostrum Biodiscovery S.L., 08029 Barcelona, Spain; (C.P.); (A.M.); (L.D.)
6
Xia R, Li W, Cheng Y, Xie L, Xu X. Molecular surfaces modeling: Advancements in deep learning for molecular interactions and predictions. Biochem Biophys Res Commun 2025; 763:151799. [PMID: 40239539 DOI: 10.1016/j.bbrc.2025.151799] [Received: 01/29/2025] [Revised: 03/20/2025] [Accepted: 04/10/2025] [Indexed: 04/18/2025]
Abstract
Molecular surface analysis can provide a high-dimensional, rich representation of molecular properties and interactions, which is crucial for enabling powerful predictive modeling and rational molecular design across diverse scientific and technological domains. With remarkable successes achieved by artificial intelligence (AI) in different fields such as computer vision and natural language processing, there is a growing imperative to harness AI's potential in accelerating molecular discovery and innovation. The integration of AI techniques with molecular surface analysis has opened up new frontiers, allowing researchers to uncover hidden patterns, relationships, and design principles that were previously elusive. By leveraging the complementary strengths of molecular surface representations and advanced AI algorithms, scientists can now explore chemical space more efficiently, optimize molecular properties with greater precision, and drive transformative advancements in areas like drug development, materials engineering, and catalysis. In this review, we aim to provide an overview of recent advancements in the field of molecular surface analysis and its integration with AI techniques. These AI-driven approaches have led to significant advancements in various downstream tasks, including interface site prediction, protein-protein interaction prediction, surface-centric molecular generation and design.
Affiliation(s)
- Renjie Xia
- Institute of Bioinformatics and Medical Engineering, Jiangsu University of Technology, Changzhou, 213001, China
- Wei Li
- Institute of Bioinformatics and Medical Engineering, Jiangsu University of Technology, Changzhou, 213001, China
- Yi Cheng
- College of Engineering, Lishui University, Lishui, 323000, China
- Liangxu Xie
- Institute of Bioinformatics and Medical Engineering, Jiangsu University of Technology, Changzhou, 213001, China.
- Xiaojun Xu
- Institute of Bioinformatics and Medical Engineering, Jiangsu University of Technology, Changzhou, 213001, China.
7
Sutherland CA, Stevens DM, Seong K, Wei W, Krasileva KV. The resistance awakens: Diversity at the DNA, RNA, and protein levels informs engineering of plant immune receptors from Arabidopsis to crops. Plant Cell 2025; 37:koaf109. [PMID: 40344182 PMCID: PMC12118082 DOI: 10.1093/plcell/koaf109] [Received: 03/19/2025] [Revised: 04/17/2025] [Accepted: 04/21/2025] [Indexed: 05/11/2025]
Abstract
Plants rely on germline-encoded, innate immune receptors to sense pathogens and initiate the defense response. The exponential increase in quality and quantity of genomes, RNA-seq datasets, and protein structures has underscored the incredible biodiversity of plant immunity. Arabidopsis continues to serve as a valuable model and theoretical foundation of our understanding of wild plant diversity of immune receptors, while expansion of study into agricultural crops has also revealed distinct evolutionary trajectories and challenges. Here, we provide the classical context for study of both intracellular nucleotide-binding, leucine-rich repeat receptors and surface-localized pattern recognition receptors at the levels of DNA sequences, transcriptional regulation, and protein structures. We then examine how recent technology has shaped our understanding of immune receptor evolution and informed our ability to efficiently engineer resistance. We summarize current literature and provide an outlook on how researchers take inspiration from natural diversity in bioengineering efforts for disease resistance from Arabidopsis and other model systems to crops.
Affiliation(s)
- Chandler A Sutherland
- Department of Plant and Microbial Biology, University of California Berkeley, Berkeley, CA 94720, USA
- Danielle M Stevens
- Department of Plant and Microbial Biology, University of California Berkeley, Berkeley, CA 94720, USA
- Kyungyong Seong
- Department of Plant and Microbial Biology, University of California Berkeley, Berkeley, CA 94720, USA
- Wei Wei
- Department of Plant and Microbial Biology, University of California Berkeley, Berkeley, CA 94720, USA
- Ksenia V Krasileva
- Department of Plant and Microbial Biology, University of California Berkeley, Berkeley, CA 94720, USA
8
Tao Y, Lu Y, Yu B, Wang Y. Molecular glue meets antibody: next-generation antibody-drug conjugates. Trends Pharmacol Sci 2025:S0165-6147(25)00068-9. [PMID: 40345868 DOI: 10.1016/j.tips.2025.04.002] [Received: 02/26/2025] [Revised: 04/03/2025] [Accepted: 04/15/2025] [Indexed: 05/11/2025]
Abstract
Antibody-drug conjugates (ADCs) have revolutionized oncology by enabling the delivery of cytotoxic agents. However, persistent limitations in payload diversity and emerging drug-resistance mechanisms have spurred investigations into innovative payload modalities. Molecular glue-antibody conjugates (MACs), which utilize molecular glues as payloads, represent a groundbreaking advance in this field. By leveraging the catalytic, event-driven nature of molecular glues, MACs offer enhanced efficacy, reduced off-target effects, and an improved therapeutic index. Two MACs are now in clinical trials. This review explores MAC mechanisms, advances, and potential to surpass traditional ADCs and molecular glues, while addressing development challenges and future directions.
Affiliation(s)
- Yiran Tao
- Department of Pulmonary and Critical Care Medicine, Targeted Tracer Research and Development Laboratory, Institute of Respiratory Health, Frontiers Science Center for Disease-Related Molecular Network, National Clinical Research Center for Geriatrics, West China Hospital, Sichuan University, Chengdu 610041, China
- Ying Lu
- Department of Pulmonary and Critical Care Medicine, Targeted Tracer Research and Development Laboratory, Institute of Respiratory Health, Frontiers Science Center for Disease-Related Molecular Network, National Clinical Research Center for Geriatrics, West China Hospital, Sichuan University, Chengdu 610041, China
- Bin Yu
- Tianjian Laboratory of Advanced Biomedical Sciences, Institute of Advanced Biomedical Sciences, College of Chemistry, Zhengzhou University, Zhengzhou 450001, China.
- Yuxi Wang
- Department of Pulmonary and Critical Care Medicine, Targeted Tracer Research and Development Laboratory, Institute of Respiratory Health, Frontiers Science Center for Disease-Related Molecular Network, National Clinical Research Center for Geriatrics, West China Hospital, Sichuan University, Chengdu 610041, China; Frontiers Medical Center, Tianfu Jincheng Laboratory, Chengdu, 610093, Sichuan, China.
9
Pistos M, Li G, Lin W, Shen D, Rekik I. Predicting infant brain connectivity with federated multi-trajectory GNNs using scarce data. Med Image Anal 2025; 102:103541. [PMID: 40107118 DOI: 10.1016/j.media.2025.103541] [Received: 01/02/2024] [Revised: 01/02/2025] [Accepted: 03/03/2025] [Indexed: 03/22/2025]
Abstract
Understanding the convoluted evolution of infant brain networks during the first postnatal year is pivotal for identifying the dynamics of early brain connectivity development. Motivated by the valuable insights this offers into the brain's anatomy, existing deep learning frameworks have focused on forecasting the brain evolution trajectory from a single baseline observation. While yielding remarkable results, they suffer from three major limitations. First, they lack the ability to generalize to multi-trajectory prediction tasks, where each graph trajectory corresponds to a particular imaging modality or connectivity type (e.g., T1-w MRI). Second, existing models require extensive training datasets, which are often challenging to obtain, to achieve satisfactory performance. Third, they do not efficiently utilize incomplete time series data. To address these limitations, we introduce FedGmTE-Net++, a federated graph-based multi-trajectory evolution network. Using the power of federation, we aggregate local learnings among diverse hospitals with limited datasets. As a result, we enhance the performance of each hospital's local generative model while preserving data privacy. The three key innovations of FedGmTE-Net++ are: (i) presenting the first federated learning framework specifically designed for brain multi-trajectory evolution prediction in a data-scarce environment, (ii) incorporating an auxiliary regularizer in the local objective function to exploit all the longitudinal brain connectivity within the evolution trajectory and maximize data utilization, and (iii) introducing a two-step imputation process, comprising a preliminary K-Nearest Neighbours based precompletion followed by an imputation refinement step that employs regressors to improve similarity scores and refine imputations. Our comprehensive experimental results show that FedGmTE-Net++ outperforms benchmark methods in brain multi-trajectory prediction from a single baseline graph. Our source code is available at https://github.com/basiralab/FedGmTE-Net-plus.
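The K-Nearest Neighbours precompletion step mentioned in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `knn_precomplete`, the use of Euclidean distance between baseline feature vectors, and the toy data are all assumptions made for illustration. A missing follow-up vector is filled with the mean of the follow-ups of the k subjects whose baselines are closest.

```python
import math

def knn_precomplete(baselines, followups, k=2):
    """Fill missing follow-up vectors (None) with the mean follow-up of the
    k subjects whose baseline vectors are nearest (Euclidean distance).
    Hypothetical helper, not taken from the cited work."""
    complete = [i for i, f in enumerate(followups) if f is not None]
    if not complete:
        raise ValueError("at least one subject needs an observed follow-up")
    filled = list(followups)
    for i, f in enumerate(followups):
        if f is not None:
            continue  # observed data is left untouched
        nearest = sorted(
            complete, key=lambda j: math.dist(baselines[i], baselines[j])
        )[:k]
        dim = len(followups[nearest[0]])
        filled[i] = [
            sum(followups[j][d] for j in nearest) / len(nearest)
            for d in range(dim)
        ]
    return filled

# Toy data (made up): four subjects, the last one lacks a follow-up scan.
baselines = [[0.0, 0.0], [1.0, 0.0], [10.0, 10.0], [0.5, 0.0]]
followups = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0], None]
filled = knn_precomplete(baselines, followups, k=2)
```

In the abstract's pipeline, such a precompleted value would only be a starting point that the regressor-based refinement step then adjusts.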
Affiliation(s)
- Michalis Pistos
- BASIRA Lab, Imperial-X and Department of Computing, Imperial College London, London, UK
- Gang Li
- Department of Radiology and Biomedical Research Imaging Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Weili Lin
- Department of Radiology and Biomedical Research Imaging Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Dinggang Shen
- School of Biomedical Engineering, ShanghaiTech University, Shanghai 201210, China; Shanghai United Imaging Intelligence Co., Ltd., Shanghai 200230, China; Shanghai Clinical Research and Trial Center, Shanghai, 201210, China
- Islem Rekik
- BASIRA Lab, Imperial-X and Department of Computing, Imperial College London, London, UK.
10
Lai L, Geng J, Duan H, Chen S, Huang L, Yu J. A New Structure Feature Introduced to Predict Protein-Protein Interaction Sites. J Comput Biol 2025; 32:520-536. [PMID: 40000026 DOI: 10.1089/cmb.2024.0804] [Indexed: 02/27/2025] Open
Abstract
Interactions between proteins often depend on both the sequence features and the structure features of the proteins, and both kinds of features help machine learning methods predict protein-protein interaction (PPI) sites. In this study, we introduced a new structure feature, the concave-convex feature of the protein surface, which was computed from the structural data of proteins in the Protein Data Bank database. A prediction model combining protein sequence features and structure features was then constructed, named SSPPI_Ensemble (Sequence and Structure geometric feature-based PPI site prediction). Three sequence features, i.e., PSSMs (Position-Specific Scoring Matrices), HMM (Hidden Markov Models), and the raw protein sequence, were used. The Dictionary of Secondary Structure in Proteins and the concave-convex feature were used as the structure features. Compared with other prediction methods on the same test datasets, our method achieved better performance or showed clear advantages, confirming that the proposed concave-convex feature is useful in predicting PPI sites.
Affiliation(s)
- Lingwei Lai
- College of Information Engineering, Northwest A&F University, Yangling, China
- Jing Geng
- College of Information Engineering, Northwest A&F University, Yangling, China
- Haochen Duan
- College of Information Engineering, Northwest A&F University, Yangling, China
- Siyuan Chen
- College of Information Engineering, Northwest A&F University, Yangling, China
- Lvwen Huang
- College of Information Engineering, Northwest A&F University, Yangling, China
- Jiantao Yu
- College of Information Engineering, Northwest A&F University, Yangling, China
11
Shao D, Zou Y, Ma L, Yi S. Multiscale and global-local U-Net for protein-protein interaction site prediction. Comput Biol Chem 2025; 118:108485. [PMID: 40306099 DOI: 10.1016/j.compbiolchem.2025.108485] [Received: 12/13/2024] [Revised: 03/18/2025] [Accepted: 04/21/2025] [Indexed: 05/02/2025]
Abstract
Precise prediction of protein-protein interaction sites (PPIS) is fundamental to deciphering cellular mechanisms and accelerating therapeutic discovery. Despite significant advancements in computational approaches, current methods frequently fail to integrate multiscale features that simultaneously capture global context and local interactions. We present Multiscale and Global-Local U-Net for Protein-Protein Interaction Site Prediction (MGU-PPIS), a novel architecture designed to address this critical limitation. Our model leverages a U-Net framework with implemented multi-level pooling to extract comprehensive multiscale features. Within each scale, we synergistically combine Transformer networks, Graph Convolutional Networks (GCNs), and Graph Attention Networks (GATs) to simultaneously capture global patterns and local structural motifs. We implement Laplacian positional encoding to effectively represent global protein structural characteristics. In our framework, proteins are conceptualized as graph structures where individual residues function as nodes and their spatial relationships define edges. The model processes information through an innovative two-stage U-Net architecture, where output features from the initial stage serve as refined inputs for the subsequent stage. This dual-stage design, coupled with our graph-based representation, enables MGU-PPIS to extract a rich spectrum of multiscale features encompassing both global context and local interactions at each scale. Comprehensive experimental validation demonstrates that MGU-PPIS significantly outperforms state-of-the-art methods in predictive accuracy. Beyond introducing a novel computational strategy for PPIS prediction, our work establishes a foundation for advances in protein functional analysis and structure-based drug design.
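The graph representation described in this abstract (residues as nodes, spatial relationships as edges) can be sketched as follows. This is an illustrative sketch only, not the MGU-PPIS code: the 8 Å distance cutoff, the function names, and the toy coordinates are assumptions. The second helper builds the unnormalized graph Laplacian L = D - A, the matrix whose eigenvectors are commonly used for Laplacian positional encodings; the eigendecomposition itself would need a linear algebra library and is not shown.

```python
import math

def residue_graph(coords, cutoff=8.0):
    """Adjacency lists for a protein graph: one node per residue, an edge
    between residues whose coordinates lie within `cutoff` of each other.
    The cutoff value is an assumption, not taken from the cited work."""
    n = len(coords)
    adj = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                adj[i].append(j)
                adj[j].append(i)
    return adj

def graph_laplacian(adj):
    """Unnormalized Laplacian L = D - A as nested lists."""
    n = len(adj)
    lap = [[0] * n for _ in range(n)]
    for i, nbrs in adj.items():
        lap[i][i] = len(nbrs)  # degree on the diagonal
        for j in nbrs:
            lap[i][j] = -1     # -1 for each edge
    return lap

# Toy coordinates (made up): three nearby residues and one distant residue.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (30.0, 0.0, 0.0)]
adj = residue_graph(coords)
lap = graph_laplacian(adj)
```

Every row of the Laplacian sums to zero, which is a quick sanity check on the graph construction.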
Affiliation(s)
- Dangguo Shao
- Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, Yunnan, China
- Yuyang Zou
- Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, Yunnan, China
- Lei Ma
- Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, Yunnan, China.
- Sanli Yi
- Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, Yunnan, China.
12
Yuan M, Zou Z, Luo Y, Jiang J, Hu W. QMe14S: A Comprehensive and Efficient Spectral Data Set for Small Organic Molecules. J Phys Chem Lett 2025; 16:3972-3979. [PMID: 40223330 DOI: 10.1021/acs.jpclett.5c00839] [Indexed: 04/15/2025]
Abstract
Developing machine learning protocols for molecular simulations requires comprehensive and efficient data sets. Here we introduce the QMe14S data set, comprising 186,102 small organic molecules featuring 14 elements (H, B, C, N, O, F, Al, Si, P, S, Cl, As, Se, and Br) and 47 functional groups. Using density functional theory at the B3LYP/TZVP level, we optimized the geometries and calculated properties, including energy, atomic charge, atomic force, dipole moment, quadrupole moment, polarizability, octupole moment, first hyperpolarizability, and Hessian. At the same level, we obtained the harmonic IR, Raman, and NMR spectra. Furthermore, we conducted ab initio molecular dynamics simulations to generate dynamic configurations and extract nonequilibrium properties, including energy, forces, and Hessians. By leveraging our E(3)-equivariant message-passing neural network (DetaNet), we demonstrated that models trained on QMe14S outperform those trained on the previously developed QM9S data set in simulating molecular spectra. The QMe14S data set thus serves as a comprehensive benchmark for molecular simulations, offering valuable insights into structure-property relationships.
Affiliation(s)
- Mingzhi Yuan
  - School of Chemistry and Chemical Engineering, Qilu University of Technology (Shandong Academy of Science), Jinan 250353, China
- Zihan Zou
  - School of Chemistry and Chemical Engineering, Qilu University of Technology (Shandong Academy of Science), Jinan 250353, China
- Yi Luo
  - Hefei National Research Center for Physical Sciences at the Microscale, University of Science and Technology of China, 230026 Hefei, China
  - Key Laboratory of Precision and Intelligent Chemistry, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei, Anhui 230026, China
- Jun Jiang
  - Key Laboratory of Precision and Intelligent Chemistry, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei, Anhui 230026, China
  - Hefei National Laboratory, University of Science and Technology of China, Hefei 230088, China
- Wei Hu
  - School of Chemistry and Chemical Engineering, Qilu University of Technology (Shandong Academy of Science), Jinan 250353, China
  - Key Laboratory of Precision and Intelligent Chemistry, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei, Anhui 230026, China
13
Zhou P, Wang J, Li C, Wang Z, Liu Y, Sun S, Lin J, Wei L, Cai X, Lai H, Liu W, Wang L, Liu Y, Zeng X. Instruction multi-constraint molecular generation using a teacher-student large language model. BMC Biol 2025; 23:105. [PMID: 40269927 PMCID: PMC12020078 DOI: 10.1186/s12915-025-02200-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2024] [Accepted: 03/27/2025] [Indexed: 04/25/2025] Open
Abstract
BACKGROUND: While various models and computational tools have been proposed for structure and property analysis of molecules, generating molecules that conform to all desired structures and properties remains a challenge. RESULTS: We introduce a multi-constraint molecular generation large language model, TSMMG, which, akin to a student, incorporates knowledge from various small models and tools, namely, the "teachers." To train TSMMG, we construct a large set of text-molecule pairs by extracting molecular knowledge from these "teachers," enabling it to generate novel molecules that conform to the descriptions through various text prompts. We experimentally show that TSMMG performs remarkably well in generating molecules that meet complex property requirements described in natural language across two-, three-, and four-constraint tasks, with an average molecular validity of over 99% and success ratios of 82.58%, 68.03%, and 67.48%, respectively. The model also exhibits adaptability through zero-shot testing, creating molecules that satisfy combinations of properties that have not been encountered. It can comprehend text inputs with various language styles, extending beyond the confines of outlined prompts. CONCLUSIONS: TSMMG presents an effective model for multi-constraint molecular generation using natural language. This framework is not only applicable to drug discovery but also serves as a reference for other related fields.
Affiliation(s)
- Peng Zhou
  - College of Information Science and Engineering, Hunan University, Changsha, 410082, Hunan, China
  - AI for Life Sciences Lab, Tencent, Shenzhen, China
- Jianmin Wang
  - The Interdisciplinary Graduate Program in Integrative Biotechnology, Yonsei University, Incheon, 21983, Seoul, Korea
- Chunyan Li
  - School of Informatics, Yunnan Normal University, Kunming, 650500, Yunnan, China
- Zixu Wang
  - Department of Computer Science, University of Tsukuba, Tsukuba, 3058577, Japan
- Yiping Liu
  - College of Information Science and Engineering, Hunan University, Changsha, 410082, Hunan, China
- Siqi Sun
  - Research Institute of Intelligent Complex Systems, Fudan University, Shanghai, 200433, China
  - Shanghai AI Laboratory, Shanghai, 200232, China
- Jianxin Lin
  - College of Information Science and Engineering, Hunan University, Changsha, 410082, Hunan, China
- Leyi Wei
  - Centre for Artificial Intelligence Driven Drug Discovery, Faculty of Applied Science, Macao Polytechnic University, Macao SAR, China
  - School of Informatics, Xiamen University, Xiamen, China
- Xibao Cai
  - College of Information Science and Engineering, Hunan University, Changsha, 410082, Hunan, China
- Houtim Lai
  - AI for Life Sciences Lab, Tencent, Shenzhen, China
- Wei Liu
  - AI for Life Sciences Lab, Tencent, Shenzhen, China
- Longyue Wang
  - Alibaba International Digital Commerce, Hangzhou, China
- Yuansheng Liu
  - College of Information Science and Engineering, Hunan University, Changsha, 410082, Hunan, China
- Xiangxiang Zeng
  - College of Information Science and Engineering, Hunan University, Changsha, 410082, Hunan, China
14
Sang C, Shu J, Wang K, Xia W, Wang Y, Sun T, Xu X. The prediction of RNA-small molecule binding sites in RNA structures based on geometric deep learning. Int J Biol Macromol 2025; 310:143308. [PMID: 40268011 DOI: 10.1016/j.ijbiomac.2025.143308] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2024] [Revised: 04/15/2025] [Accepted: 04/16/2025] [Indexed: 04/25/2025]
Abstract
Biological interactions between RNA and small-molecule ligands play a crucial role in determining the specific functions of RNA, such as catalysis and folding, and are essential for guiding drug design in the medical field. Accurately predicting the binding sites of ligands within RNA structures is therefore of significant importance. To address this challenge, we introduced a computational approach named RLBSIF (RNA-Ligand Binding Surface Interaction Fingerprints) based on geometric deep learning. This model utilizes surface geometric features, including shape index and distance-dependent curvature, combined with chemical features represented by atomic charge, to comprehensively characterize RNA-ligand interactions through MaSIF-based surface interaction fingerprints. Additionally, we employ the ResNet18 network to analyze these fingerprints for identifying ligand binding pockets. Trained on 440 binding pockets, RLBSIF achieves an overall pocket-level classification accuracy of 90 %. Through a full-space enumeration method, it can predict binding sites at nucleotide resolution. In two independent tests, RLBSIF outperformed competing models, demonstrating its efficacy in accurately identifying binding sites within complex molecular structures. This method shows promise for drug design and biological product development, providing valuable insights into RNA-ligand interactions and facilitating the design of novel therapeutic interventions. For access to the related source code, please visit RLBSIF on GitHub (https://github.com/ZUSTSTTLAB/RLBSIF).
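The shape index named in this abstract is a surface geometric feature that can be computed from the two principal curvatures at a surface point. The sketch below assumes the commonly used definition s = (2/π)·arctan((κ1 + κ2)/(κ1 − κ2)) with κ1 ≥ κ2. The function name `shape_index` is hypothetical, sign conventions differ between tools, and nothing here is taken from the RLBSIF source code.

```python
import math


def shape_index(k_a: float, k_b: float) -> float:
    """Return the shape index in [-1, 1] for two principal curvatures.

    Assumed convention: s = (2/pi) * atan((k1 + k2) / (k1 - k2)), k1 >= k2.
    """
    k1, k2 = max(k_a, k_b), min(k_a, k_b)
    # atan2 avoids a division by zero when k1 == k2 (umbilic points).
    return (2.0 / math.pi) * math.atan2(k1 + k2, k1 - k2)


# Hypothetical curvature values, for illustration only.
for label, (ka, kb) in {"sphere-like": (1.0, 1.0),
                        "saddle": (1.0, -1.0),
                        "cylinder-like": (1.0, 0.0)}.items():
    print(label, round(shape_index(ka, kb), 3))
```

Under this convention the extremes ±1 correspond to cap-like and cup-like patches and 0 to a symmetric saddle; which sign means "convex" depends on how the surface normals are oriented.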
Affiliation(s)
- Chunjiang Sang
  - Department of Physics, Zhejiang University of Science and Technology, Hangzhou 310008, China
- Jiasai Shu
  - Department of Physics, Zhejiang University of Science and Technology, Hangzhou 310008, China
- Kang Wang
  - School of Physics, Nanjing University, Nanjing 210093, China
- Wentao Xia
  - Department of Physics, Zhejiang University of Science and Technology, Hangzhou 310008, China
- Yan Wang
  - Department of Physics, Zhejiang University of Science and Technology, Hangzhou 310008, China
- Tingting Sun
  - Department of Physics, Zhejiang University of Science and Technology, Hangzhou 310008, China
- Xiaojun Xu
  - Institute of Bioinformatics and Medical Engineering, Jiangsu University of Technology, Changzhou 213001, China
15
Xiong S, Cai J, Shi H, Cui F, Zhang Z, Wei L. UMPPI: Unveiling Multilevel Protein-Peptide Interaction Prediction via Language Models. J Chem Inf Model 2025; 65:3789-3799. [PMID: 40077987 DOI: 10.1021/acs.jcim.4c02365] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/14/2025]
Abstract
Protein-peptide interactions are essential to cellular processes and disease mechanisms. Identifying protein-peptide binding residues is critical for understanding peptide function and advancing drug discovery. However, experimental methods are costly and time-intensive, while existing computational approaches often predict interactions or binding residues separately, lack effective feature integration, or rely heavily on limited high-quality structural data. To address these challenges, we propose UMPPI (Unveiling Multilevel Protein-Peptide Interaction), a multiobjective framework based on the pretrained protein language model ESM2. UMPPI simultaneously predicts binary protein-peptide interactions and binding residues on both peptides and proteins through a multiobjective optimization strategy. By integrating ESM2 to encode sequences and extract latent structural information, UMPPI bridges the gap between sequence-based and structure-based methods. Extensive experiments demonstrated that UMPPI successfully captured binary interactions between peptides and proteins and identified the binding residues on peptides and proteins. UMPPI can serve as a useful tool for protein-peptide interaction prediction and identification of critical binding residues, thereby facilitating the peptide drug discovery process.
Affiliation(s)
- Shuwen Xiong
  - Faculty of Applied Sciences, Macao Polytechnic University, R. de Luís Gonzaga Gomes, Macao 999078, China
- Jiajie Cai
  - School of Software, Shandong University, Jinan 250101, China
- Hua Shi
  - School of Optoelectronic and Communication Engineering, Xiamen University of Technology, Xiamen 361005, China
- Feifei Cui
  - School of Computer Science and Technology, Hainan University, Haikou 570228, China
- Zilong Zhang
  - School of Computer Science and Technology, Hainan University, Haikou 570228, China
- Leyi Wei
  - Faculty of Applied Sciences, Macao Polytechnic University, R. de Luís Gonzaga Gomes, Macao 999078, China
  - School of Software, Shandong University, Jinan 250101, China
16
Yang D, Kuang L, Hu A. Edge-enhanced interaction graph network for protein-ligand binding affinity prediction. PLoS One 2025; 20:e0320465. [PMID: 40198678 PMCID: PMC11977954 DOI: 10.1371/journal.pone.0320465] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2024] [Accepted: 02/18/2025] [Indexed: 04/10/2025] Open
Abstract
Protein-ligand interactions are crucial in drug discovery. Accurately predicting protein-ligand binding affinity is essential for screening potential drugs. Graph neural networks have proven highly effective in modeling the spatial relationships and three-dimensional structures of intermolecular interactions. In this paper, we introduce a graph neural network-based model named EIGN to predict protein-ligand binding affinity. The model consists of three main components: the normalized adaptive encoder, the molecular information propagation module, and the output module. Experimental results indicate that EIGN achieves a root mean squared error of 1.126 and a Pearson correlation coefficient of 0.861 on CASF-2016. Additionally, our model outperforms state-of-the-art methods on CASF-2013, CASF-2016, and the CSAR-NRC set, showing exceptional accuracy and robust generalization ability. To further validate the effectiveness of EIGN, we conducted several experiments, including ablation studies, feature importance analysis, data similarity analysis, and others, to evaluate its performance and applicability.
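The two metrics reported in this abstract, root mean squared error and the Pearson correlation coefficient, compare predicted and measured binding affinities. The sketch below shows how they are typically computed; the function names and the affinity values are hypothetical and are not taken from the EIGN paper or from CASF-2016.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error between measured and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


# Made-up affinities (e.g. pKd units), for illustration only.
measured = [4.2, 6.1, 7.5, 5.0, 8.3]
predicted = [4.8, 5.9, 7.0, 5.6, 7.9]
print(f"RMSE = {rmse(measured, predicted):.3f}")
print(f"Pearson r = {pearson_r(measured, predicted):.3f}")
```

A lower RMSE means smaller absolute errors, while a Pearson coefficient closer to 1 means the predictions rank and scale with the measurements; the two can disagree, which is why affinity benchmarks usually report both.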
Affiliation(s)
- An Hu
  - Xiangtan University, Xiangtan, Hunan, China
17
Fang A, Zhang Z, Zhou A, Zitnik M. ATOMICA: Learning Universal Representations of Intermolecular Interactions. bioRxiv 2025:2025.04.02.646906. [PMID: 40291688 PMCID: PMC12026499 DOI: 10.1101/2025.04.02.646906] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/30/2025]
Abstract
Molecular interactions underlie nearly all biological processes, but most machine learning models treat molecules in isolation or specialize in a single type of interaction, such as protein-ligand or protein-protein binding. This siloed approach prevents generalization across biomolecular classes and limits the ability to model interaction interfaces systematically. We introduce ATOMICA, a geometric deep learning model that learns atomic-scale representations of intermolecular interfaces across diverse biomolecular modalities, including small molecules, metal ions, amino acids, and nucleic acids. ATOMICA uses a self-supervised denoising and masking objective to train on 2,037,972 interaction complexes and generate hierarchical embeddings at the levels of atoms, chemical blocks, and molecular interfaces. The model generalizes across molecular classes and recovers shared physicochemical features without supervision. Its latent space captures compositional and chemical similarities across interaction types and follows scaling laws that improve representation quality with increasing biomolecular data modalities. We apply ATOMICA to construct five modality-specific interfaceome networks, termed ATOMICANets, which connect proteins based on interaction similarity with ions, small molecules, nucleic acids, lipids, and proteins. These networks identify disease pathways across 27 conditions and predict disease-associated proteins in autoimmune neuropathies and lymphoma. Finally, we use ATOMICA to annotate the dark proteome (proteins lacking known structure or function) by predicting 2,646 previously uncharacterized ligand-binding sites. These include putative zinc finger motifs and transmembrane cytochrome subunits, demonstrating that ATOMICA enables systematic annotation of molecular interactions across the proteome.
18
Wang J, Zhang P, Yu Y, Yi Y, Jiang Y, Hu S. Discovery of novel STAT3 inhibitors with anti-breast cancer activity: structure-based virtual screening, molecular dynamics and biological evaluation. RSC Med Chem 2025:d5md00053j. [PMID: 40270994 PMCID: PMC12013508 DOI: 10.1039/d5md00053j] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/20/2025] [Accepted: 04/05/2025] [Indexed: 04/25/2025] Open
Abstract
Triple-negative breast cancer (TNBC) is a fatal type of breast cancer due to its high recurrence and metastatic potential. Persistent activation of signal transducer and activator of transcription 3 (STAT3) is crucial for TNBC progression, making it an attractive drug target. In this study, two new STAT3 inhibitors with significant anti-TNBC activity, d2 and d10, were identified from 1.67 million candidates through a rapid and cost-effective strategy integrating high-throughput virtual screening (HTVS), molecular mechanics/generalized Born surface area (MM/GBSA), and binding pose metadynamics (BPMD) methods. In-depth mechanistic studies revealed that d2 and d10 significantly inhibited cell proliferation and colony formation, induced G1 phase arrest, and reduced migration and invasion of TNBC cells. Moreover, both d2 and d10 were found to inhibit the nuclear translocation and phosphorylation of STAT3. Molecular dynamics simulations further indicated that both compounds can stably bind to STAT3 in the SH2 domain. Additionally, protein-ligand interaction fingerprints (IFPs) of the screened compounds from HTVS were generated to better guide the design and structural optimization of STAT3 inhibitors.
Affiliation(s)
- Jinhui Wang
  - Donghai Laboratory Zhoushan Zhejiang 316021 China
- Peijie Zhang
  - National Engineering Research Center for Marine Aquaculture, Zhejiang Ocean University Zhoushan Zhejiang 316022 China
- Yalin Yu
  - Institute of Marine Biology and Pharmacology, Ocean College, Zhejiang University Zhoushan Zhejiang 316021 China
- Yan Yi
  - Institute of Marine Biology and Pharmacology, Ocean College, Zhejiang University Zhoushan Zhejiang 316021 China
- Yongjun Jiang
  - School of Food and Pharmacy, Zhejiang Ocean University Zhoushan Zhejiang 316022 China
- Shiwei Hu
  - National Engineering Research Center for Marine Aquaculture, Zhejiang Ocean University Zhoushan Zhejiang 316022 China
19
Tahmid MT, Hasan AKMM, Bayzid MS. TransBind allows precise detection of DNA-binding proteins and residues using language models and deep learning. Commun Biol 2025; 8:568. [PMID: 40185915 PMCID: PMC11971327 DOI: 10.1038/s42003-025-07534-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2023] [Accepted: 01/13/2025] [Indexed: 04/07/2025] Open
Abstract
Identifying DNA-binding proteins and their binding residues is critical for understanding diverse biological processes, but conventional experimental approaches are slow and costly. Existing machine learning methods, while faster, often lack accuracy and struggle with data imbalance, relying heavily on evolutionary profiles like PSSMs and HMMs derived from multiple sequence alignments (MSAs). These dependencies make them unsuitable for orphan proteins or those that evolve rapidly. To address these challenges, we introduce TransBind, an alignment-free deep learning framework that predicts DNA-binding proteins and residues directly from a single primary sequence, eliminating the need for MSAs. By leveraging features from pre-trained protein language models, TransBind effectively handles the issue of data imbalance and achieves superior performance. Extensive evaluations using diverse experimental datasets and case studies demonstrate that TransBind significantly outperforms state-of-the-art methods in terms of both accuracy and computational efficiency. TransBind is available as a web server at https://trans-bind-web-server-frontend.vercel.app/.
Affiliation(s)
- Md Toki Tahmid
  - Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, 1205, Bangladesh
- A K M Mehedi Hasan
  - Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, 1205, Bangladesh
- Md Shamsuzzoha Bayzid
  - Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, 1205, Bangladesh
20
Wang C, Alamdari S, Domingo-Enrich C, Amini AP, Yang KK. Toward deep learning sequence-structure co-generation for protein design. Curr Opin Struct Biol 2025; 91:103018. [PMID: 39983410 DOI: 10.1016/j.sbi.2025.103018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/01/2024] [Revised: 01/28/2025] [Accepted: 01/29/2025] [Indexed: 02/23/2025]
Abstract
Deep generative models that learn from the distribution of natural protein sequences and structures may enable the design of new proteins with valuable functions. While the majority of today's models focus on generating either sequences or structures, emerging co-generation methods promise more accurate and controllable protein design, ideally achieved by modeling both modalities simultaneously. Here we review recent advances in deep generative models for protein design, with a particular focus on sequence-structure co-generation methods. We describe the key methodological and evaluation principles underlying these methods, highlight recent advances from the literature, and discuss opportunities for continued development of sequence-structure co-generation approaches.
Affiliation(s)
- Chentong Wang
  - School of Life Sciences, Westlake University, Hangzhou, Zhejiang, 310024, China
- Ava P Amini
  - Microsoft Research, Cambridge, MA, 02142, USA
21
Teoh YC, Noor MS, Aghakhani S, Girton J, Hu G, Chowdhury R. Viral escape-inspired framework for structure-guided dual bait protein biosensor design. PLoS Comput Biol 2025; 21:e1012964. [PMID: 40233103 PMCID: PMC12021294 DOI: 10.1371/journal.pcbi.1012964] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/03/2024] [Revised: 04/24/2025] [Accepted: 03/14/2025] [Indexed: 04/17/2025] Open
Abstract
A generalizable computational platform, CTRL-V (Computational TRacking of Likely Variants), is introduced to design selective binding (dual bait) biosensor proteins. The iteratively evolving receptor binding domain (RBD) of SARS-CoV-2 spike protein has been construed as a model dual bait biosensor which has iteratively evolved to distinguish and selectively bind to human entry receptors and avoid binding neutralizing antibodies. Spike RBD prioritizes mutations that reduce antibody binding while enhancing or retaining binding with the ACE2 receptor. Through iterative design cycles, CTRL-V was shown to pinpoint 20% of the 39 reported SARS-CoV-2 point mutations across 30 circulating, infective strains as responsible for immune escape from commercial antibody LY-CoV1404. CTRL-V successfully identifies ~70% (five out of seven) single point mutations (371F, 373P, 440K, 445H, 456L) in the latest circulating KP.2 variant and offers detailed structural insights into the escape mechanism. While other data-driven viral escape variant predictor tools have shown promise in predicting potential future viral variants, they require massive amounts of data to bypass the need for physics of explicit biochemical interactions. Consequently, they cannot be generalized for other protein design applications. The publicly available viral escape data was leveraged as in vivo anchors to streamline a computational workflow that can be generalized for dual bait biosensor design tasks, as exemplified by identifying key mutational loci in Raf kinase that enable it to selectively bind Ras and Rap1a GTP. We demonstrate three versions of CTRL-V which use a combination of integer optimization, stochastic sampling by PyRosetta, and deep learning-based ProteinMPNN for structure-guided biosensor design.
Affiliation(s)
- Yee Chuen Teoh
  - Department of Computer Science, Iowa State University, Ames, Iowa, United States of America
- Mohammed Sakib Noor
  - Department of Chemical and Biological Engineering, Iowa State University, Ames, Iowa, United States of America
- Sina Aghakhani
  - School of Industrial Engineering and Management, Oklahoma State University, Stillwater, Oklahoma, United States of America
- Jack Girton
  - Department of Chemical and Biological Engineering, Iowa State University, Ames, Iowa, United States of America
- Guiping Hu
  - School of Industrial Engineering and Management, Oklahoma State University, Stillwater, Oklahoma, United States of America
- Ratul Chowdhury
  - Department of Chemical and Biological Engineering, Iowa State University, Ames, Iowa, United States of America
  - Nanovaccine Institute, Iowa State University, Ames, Iowa, United States of America
22
Xia W, Shu J, Sang C, Wang K, Wang Y, Sun T, Xu X. The prediction of RNA-small-molecule ligand binding affinity based on geometric deep learning. Comput Biol Chem 2025; 115:108367. [PMID: 39904171 DOI: 10.1016/j.compbiolchem.2025.108367] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/22/2024] [Revised: 01/11/2025] [Accepted: 01/26/2025] [Indexed: 02/06/2025]
Abstract
Small molecule-targeted RNA is an emerging technology that plays a pivotal role in drug discovery and inhibitor design, with widespread applications in disease treatment. Consequently, predicting RNA-small-molecule ligand interactions is crucial. With advancements in computer science and the availability of extensive biological data, deep learning methods have shown great promise in this area, particularly in efficiently predicting RNA-small molecule binding sites. However, few computational methods have been developed to predict RNA-small molecule binding affinities. Meanwhile, most of these approaches rely primarily on sequence or structural representations. Molecular surface information, vital for RNA and small molecule interactions, has been largely overlooked. To address these gaps, we propose a geometric deep learning method for predicting RNA-small molecule binding affinity, named RNA-ligand Surface Interaction Fingerprinting (RLASIF). In this study, we create RNA-ligand interaction fingerprints from the geometrical and chemical features present on molecular surface to characterize binding affinity. RLASIF outperformed other computational methods across ten different test sets from PDBbind NL2020. Compared to the second-best method, our approach improves performance by 10.01 %, 6.67 %, 2.01 % and 1.70 % on four evaluation metrics, indicating its effectiveness in capturing key features influencing RNA-ligand binding strength. Additionally, RLASIF holds potential for virtual screening of potential ligands for RNA and predicting small molecule binding nucleotides within RNA structures.
Affiliation(s)
- Wentao Xia
  - Department of Physics, Zhejiang University of Science and Technology, Hangzhou 310008, China
- Jiasai Shu
  - Department of Physics, Zhejiang University of Science and Technology, Hangzhou 310008, China
- Chunjiang Sang
  - Department of Physics, Zhejiang University of Science and Technology, Hangzhou 310008, China
- Kang Wang
  - Department of Physics, Zhejiang University of Science and Technology, Hangzhou 310008, China
- Yan Wang
  - Department of Physics, Zhejiang University of Science and Technology, Hangzhou 310008, China
- Tingting Sun
  - Department of Physics, Zhejiang University of Science and Technology, Hangzhou 310008, China
- Xiaojun Xu
  - Institute of Bioinformatics and Medical Engineering, Jiangsu University of Technology, Changzhou 213001, China
23
Meng L, Wei L, Wu R. MVGNN-PPIS: A novel multi-view graph neural network for protein-protein interaction sites prediction based on Alphafold3-predicted structures and transfer learning. Int J Biol Macromol 2025; 300:140096. [PMID: 39848362 DOI: 10.1016/j.ijbiomac.2025.140096] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2024] [Revised: 01/04/2025] [Accepted: 01/17/2025] [Indexed: 01/25/2025]
Abstract
Protein-protein interactions (PPI) are crucial for understanding numerous biological processes and pathogenic mechanisms. Identifying interaction sites is essential for biomedical research and targeted drug development. Compared to experimental methods, accurate computational approaches for protein-protein interaction sites (PPIS) prediction can save significant time and costs. In this study, we propose a novel model named MVGNN-PPIS. To the best of our knowledge, it is the first to use predicted structures generated by AlphaFold3, combined with transfer learning techniques, to predict PPIS. This approach addresses the limitations of traditional methods that depend on native protein structures and multiple sequence alignments (MSA). Additionally, we introduced a multi-view graph framework based on two types of graph structures: the k-nearest neighbor graph and the adjacency matrix. By alternately employing a Graph Transformer and Graph Convolutional Networks (GCN) to aggregate node information, this framework effectively captures both local and global dependencies of each residue in the predicted structures, thereby significantly enhancing the model's sensitivity to binding sites. This framework further integrates direction, distance, and angle information between the 3D coordinates of side-chain atom centroids to construct a relative coordinate system, generating enhanced edge features that ensure the model's equivariance to molecular translations and rotations in space. During training, the Focal Loss function is employed to effectively address the class imbalance in the dataset. Experimental results demonstrate that MVGNN outperforms the current state-of-the-art methods across multiple PPIS benchmark datasets. To further validate the model's generalization capability, we extended MVGNN to the domain of predicting protein-nucleic acid interaction sites, where it also achieved superior performance.
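The Focal Loss mentioned in this abstract down-weights examples the model already classifies well, so that the rare positive class (interface residues) contributes more to training. The sketch below implements the standard binary form, FL(p_t) = -α_t (1 - p_t)^γ log(p_t). The abstract does not give hyperparameters, so γ = 2 and α = 0.25 are assumed defaults, and the function name is hypothetical rather than taken from the MVGNN-PPIS code.

```python
import math


def binary_focal_loss(p: float, y: int, gamma: float = 2.0,
                      alpha: float = 0.25, eps: float = 1e-12) -> float:
    """Focal loss for one prediction.

    p: predicted probability of the positive class; y: true label (0 or 1).
    gamma and alpha are assumed values, not ones reported in the abstract.
    """
    p_t = p if y == 1 else 1.0 - p              # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    p_t = min(max(p_t, eps), 1.0)               # guard against log(0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# A confidently correct positive contributes far less than a missed one.
easy = binary_focal_loss(0.9, 1)
hard = binary_focal_loss(0.1, 1)
print(f"easy positive: {easy:.6f}, hard positive: {hard:.6f}")
```

With γ = 0 the modulating factor disappears and the expression reduces to α-weighted binary cross-entropy, which is a convenient sanity check when tuning γ.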
Affiliation(s)
- Lu Meng
  - College of Information Science and Engineering, Northeastern University, China
- Lishuai Wei
  - College of Information Science and Engineering, Northeastern University, China
- Rina Wu
  - College of Information Science and Engineering, Northeastern University, China
24
Khan S, Noor S, Awan HH, Iqbal S, AlQahtani SA, Dilshad N, Ahmad N. Deep-ProBind: binding protein prediction with transformer-based deep learning model. BMC Bioinformatics 2025; 26:88. [PMID: 40121399 PMCID: PMC11929993 DOI: 10.1186/s12859-025-06101-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2024] [Accepted: 03/04/2025] [Indexed: 03/25/2025] Open
Abstract
Binding proteins play a crucial role in biological systems by selectively interacting with specific molecules, such as DNA, RNA, or peptides, to regulate various cellular processes. Their ability to recognize and bind target molecules with high specificity makes them essential for signal transduction, transport, and enzymatic activity. Traditional experimental methods for identifying protein-binding peptides are costly and time-consuming. Current sequence-based approaches often struggle with accuracy, focusing too narrowly on proximal sequence features and ignoring structural data. This study presents Deep-ProBind, a powerful prediction model designed to classify protein binding sites by integrating sequence and structural information. The proposed model employs a transformer and evolutionary-based attention mechanism, i.e., Bidirectional Encoder Representations from Transformers (BERT) and a Pseudo Position-Specific Scoring Matrix-Discrete Wavelet Transform (PsePSSM-DWT) approach, to encode peptides. The SHapley Additive exPlanations (SHAP) algorithm selects the optimal hybrid features, and a Deep Neural Network (DNN) is then used as the classification algorithm to predict protein-binding peptides. The performance of the proposed model was evaluated in comparison with traditional Machine Learning (ML) algorithms and existing models. Experimental results demonstrate that Deep-ProBind achieved 92.67% accuracy with tenfold cross-validation on benchmark datasets and 93.62% accuracy on independent samples. Deep-ProBind outperforms existing models by 3.57% on training data and 1.52% on independent tests. These results demonstrate Deep-ProBind's reliability and effectiveness, making it a valuable tool for researchers and a potential resource in pharmacological studies, where peptide binding plays a critical role in therapeutic development.
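The tenfold cross-validation reported in this abstract partitions the training set into ten folds and holds out each fold once for evaluation. The sketch below generates such index splits with the standard library only; the function name, the seed, and the sample count are hypothetical, and the abstract does not say whether the authors shuffled or stratified their folds.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int = 10,
                   seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for each of k shuffled folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    # Deal indices round-robin so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for held_out in range(k):
        test_idx = folds[held_out]
        train_idx = [j for f, fold in enumerate(folds)
                     if f != held_out for j in fold]
        yield train_idx, test_idx


# Example with a made-up dataset size.
for fold_id, (train_idx, test_idx) in enumerate(k_fold_indices(25, k=10)):
    print(fold_id, len(train_idx), len(test_idx))
```

The reported cross-validation accuracy would then be the mean of the per-fold accuracies; for imbalanced labels, a stratified split that preserves class proportions in each fold is usually preferable to the plain shuffle assumed here.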
Affiliation(s)
- Salman Khan
- Department of Computer Science, Abdul Wali Khan University Mardan, Mardan, KPK, Pakistan
| | - Sumaiya Noor
- Business and Management Sciences Department, Purdue University, West Lafayette, IN, USA
| | - Hamid Hussain Awan
- Department of Computer Science, Rawalpindi Women University, Rawalpindi, 46300, Punjab, Pakistan
| | - Shehryar Iqbal
- School of Physics, Engineering and Computer Science, University of Hertfordshire, Hatfield, UK
| | - Salman A AlQahtani
- New Emerging Technologies and 5g Network and Beyond Research Chair, Department of Computer Engineering, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia
| | - Naqqash Dilshad
- Department of Computer Science & Engineering, Sejong University, Seoul, 05006, South Korea
| | - Nijad Ahmad
- Department of Computer Science, Khurasan University, Jalalabad, Afghanistan.
| |
|
25
|
Mahajan SP, Dávila-Hernández FA, Ruffolo JA, Gray JJ. How well do contextual protein encodings learn structure, function, and evolutionary context? Cell Syst 2025; 16:101201. [PMID: 40043698 PMCID: PMC12026297 DOI: 10.1016/j.cels.2025.101201] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/03/2023] [Revised: 09/23/2024] [Accepted: 01/30/2025] [Indexed: 03/22/2025]
Abstract
In proteins, the optimal residue at any position is determined by its structural, evolutionary, and functional contexts, much like how a word may be inferred from its context in language. We trained masked label prediction models to learn representations of amino acid residues in different contexts. We focus questions on evolution and structural flexibility and whether and how contextual encodings derived through pretraining and fine-tuning may improve representations for specialized contexts. Sequences sampled from our learned representations fold into template structure and reflect sequence variations seen in related proteins. For flexible proteins, sampled sequences traverse the full conformational space of the native sequence, suggesting that plasticity is encoded in the template structure. For protein-protein interfaces, generated sequences replicate wild-type binding energies across diverse interfaces and binding strengths in silico. For the antibody-antigen interface, fine-tuning recapitulates conserved sequence patterns, while pretraining on general contexts improves sequence recovery for the hypervariable H3 loop. A record of this paper's transparent peer review process is included in the supplemental information.
Affiliation(s)
- Sai Pooja Mahajan
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD 21218, USA.
| | - Fátima A Dávila-Hernández
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
| | - Jeffrey A Ruffolo
- Program in Molecular Biophysics, Johns Hopkins University, Baltimore, MD 21218, USA
| | - Jeffrey J Gray
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD 21218, USA; Program in Molecular Biophysics, Johns Hopkins University, Baltimore, MD 21218, USA; Johns Hopkins Data Science and AI Institute, Baltimore, MD, USA.
| |
|
26
|
Nie J, Zhang X, Hu Z, Wang W, Schroer MA, Ren J, Svergun D, Chen A, Yang P, Zeng AP. A globular protein exhibits rare phase behavior and forms chemically regulated orthogonal condensates in cells. Nat Commun 2025; 16:2449. [PMID: 40069234 PMCID: PMC11897184 DOI: 10.1038/s41467-025-57886-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/01/2024] [Accepted: 03/05/2025] [Indexed: 03/15/2025] Open
Abstract
Proteins with chemically regulatable phase separation are of great interest in the fields of biomolecular condensates and synthetic biology. Intrinsically disordered proteins (IDPs) are the dominating building blocks of biomolecular condensates which often lack orthogonality and small-molecule regulation desired to create synthetic biomolecular condensates or membraneless organelles (MLOs). Here, we discover a well-folded globular protein, lipoate-protein ligase A (LplA) from E. coli involved in lipoylation of enzymes essential for one-carbon and energy metabolisms, that exhibits structural homomeric oligomerization and a rare LCST-type reversible phase separation in vitro. In both E. coli and human U2OS cells, LplA can form orthogonal condensates, which can be specifically dissolved by its natural substrate, the small molecule lipoic acid and its analogue lipoamide. The study of LplA phase behavior and its regulatability expands our understanding and toolkit of small-molecule regulatable protein phase behavior with impacts on biomedicine and synthetic biology.
Affiliation(s)
- Jinglei Nie
- Center of Synthetic Biology and Integrated Bioengineering, Westlake University, Hangzhou, Zhejiang, China
- School of Life Sciences, Westlake University, Hangzhou, Zhejiang, China
| | - Xinyi Zhang
- Center of Synthetic Biology and Integrated Bioengineering, Westlake University, Hangzhou, Zhejiang, China
- Institute of Bioprocess and Biosystems Engineering, Hamburg University of Technology, Hamburg, Germany
| | - Zhijuan Hu
- Center of Synthetic Biology and Integrated Bioengineering, Westlake University, Hangzhou, Zhejiang, China
- Research Center for Industries of the Future, Westlake University, Hangzhou, Zhejiang, China
- Zhejiang Key Laboratory of Intelligent Low-Carbon Synthetic Biology, School of Engineering, Westlake University, Hangzhou, Zhejiang, China
| | - Wei Wang
- Institute of Bioprocess and Biosystems Engineering, Hamburg University of Technology, Hamburg, Germany
| | - Martin A Schroer
- Nanoparticle Process Technology (NPPT), University of Duisburg-Essen, Duisburg, Germany
- European Molecular Biology Laboratory (EMBL), Hamburg Outstation c/o DESY, Hamburg, Germany
| | - Jie Ren
- State Key Laboratory for Biology of Plant Diseases and Insect Pests/Key Laboratory of Control of Biological Hazard Factors (Plant Origin) for Agri-product Quality and Safety, Ministry of Agriculture, Institute of Plant Protection, Chinese Academy of Agricultural Sciences, Beijing, China
| | - Dmitri Svergun
- European Molecular Biology Laboratory (EMBL), Hamburg Outstation c/o DESY, Hamburg, Germany
- BIOSAXS GmbH, Hamburg, Germany
| | - Anyang Chen
- Center of Synthetic Biology and Integrated Bioengineering, Westlake University, Hangzhou, Zhejiang, China
| | - Peiguo Yang
- Center of Synthetic Biology and Integrated Bioengineering, Westlake University, Hangzhou, Zhejiang, China
- School of Life Sciences, Westlake University, Hangzhou, Zhejiang, China
| | - An-Ping Zeng
- Center of Synthetic Biology and Integrated Bioengineering, Westlake University, Hangzhou, Zhejiang, China.
- School of Life Sciences, Westlake University, Hangzhou, Zhejiang, China.
- Institute of Bioprocess and Biosystems Engineering, Hamburg University of Technology, Hamburg, Germany.
- Research Center for Industries of the Future, Westlake University, Hangzhou, Zhejiang, China.
- Zhejiang Key Laboratory of Intelligent Low-Carbon Synthetic Biology, School of Engineering, Westlake University, Hangzhou, Zhejiang, China.
| |
|
27
|
Grassmann G, Di Rienzo L, Ruocco G, Miotto M, Milanetti E. Compact Assessment of Molecular Surface Complementarities Enhances Neural Network-Aided Prediction of Key Binding Residues. J Chem Inf Model 2025; 65:2695-2709. [PMID: 39982412 PMCID: PMC11898074 DOI: 10.1021/acs.jcim.4c02286] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2024] [Revised: 02/09/2025] [Accepted: 02/13/2025] [Indexed: 02/22/2025]
Abstract
Predicting interactions between proteins is fundamental for understanding the mechanisms underlying cellular processes, since protein-protein complexes are crucial in physiological conditions but also in many diseases, for example by seeding aggregate formation. Despite the many advancements made so far, the performance of docking protocols is deeply dependent on their capability to identify binding regions. This makes it important to develop low-cost and computationally efficient methods in this field. We present an integrated novel protocol mainly based on compact modeling of protein surface patches via sets of orthogonal polynomials to identify regions of high shape/electrostatic complementarity. By incorporating both hydrophilic and hydrophobic contributions, we define new binding matrices, which serve as effective inputs for training a neural network. In this work, we propose a new Neural Network (NN)-based architecture, Core Interacting Residues Network (CIRNet), which achieves a performance in terms of Area Under the Receiver Operating Characteristic Curve (ROC AUC) of approximately 0.87 in identifying pairs of core interacting residues on a balanced data set. In a blind search for core interacting residues, CIRNet distinguishes them from random decoys with an ROC AUC of 0.72. We test this protocol to enhance docking algorithms by filtering the proposed poses, addressing one of the still open problems in computational biology. Notably, when applied to the top ten models from three widely used docking servers, CIRNet improves docking outcomes, significantly reducing the average RMSD between the selected poses and the native state. Compared to another state-of-the-art tool for rescaling docking poses, CIRNet more efficiently identified the worst poses generated by the three docking servers under consideration and achieved superior rescaling performance in two cases.
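As a reading aid for the ROC AUC figures quoted in this abstract, the following minimal sketch computes ROC AUC as the probability that a randomly chosen positive example is scored above a randomly chosen negative one (ties count one half). It is a generic illustration, not CIRNet code; the function name `roc_auc` and the toy labels and scores are assumptions.

```python
def roc_auc(labels, scores):
    """ROC AUC via pairwise comparison of positive and negative scores."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical toy scores: 1 = core interacting pair, 0 = decoy.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
auc = roc_auc(labels, scores)  # 8 of 9 positive/negative pairs are ordered correctly
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the reported 0.87 and 0.72 should be read.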
Affiliation(s)
- Greta Grassmann
- Department of Biochemical Sciences “Alessandro Rossi Fanelli”, Sapienza University of Rome, P.Le A. Moro 5, Rome 00185, Italy
- Center for Life Nano & Neuro Science, Istituto Italiano di Tecnologia, Viale Regina Elena 291, Rome 00161, Italy
| | - Lorenzo Di Rienzo
- Center for Life Nano & Neuro Science, Istituto Italiano di Tecnologia, Viale Regina Elena 291, Rome 00161, Italy
| | - Giancarlo Ruocco
- Center for Life Nano & Neuro Science, Istituto Italiano di Tecnologia, Viale Regina Elena 291, Rome 00161, Italy
- Department of Physics, Sapienza University, Piazzale Aldo Moro 5, Rome 00185, Italy
| | - Mattia Miotto
- Center for Life Nano & Neuro Science, Istituto Italiano di Tecnologia, Viale Regina Elena 291, Rome 00161, Italy
| | - Edoardo Milanetti
- Center for Life Nano & Neuro Science, Istituto Italiano di Tecnologia, Viale Regina Elena 291, Rome 00161, Italy
- Department of Physics, Sapienza University, Piazzale Aldo Moro 5, Rome 00185, Italy
| |
|
28
|
Li Y, Tian Z, Nan X, Zhang S, Zhou Q, Lu S. HSSPPI: hierarchical and spatial-sequential modeling for PPIs prediction. Brief Bioinform 2025; 26:bbaf079. [PMID: 40037640 PMCID: PMC11879409 DOI: 10.1093/bib/bbaf079] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/23/2024] [Revised: 02/10/2025] [Accepted: 02/13/2025] [Indexed: 03/06/2025] Open
Abstract
MOTIVATION: Protein-protein interactions play a fundamental role in biological systems. Accurate detection of protein-protein interaction sites (PPIs) remains a challenge. In addition, methods of PPIs prediction based on biological experiments are expensive. Recently, many computation-based methods have been developed and have made great progress. However, current computational methods focus on only one form of protein, using either protein spatial conformation or primary sequence, and they ignore the protein's natural hierarchical structure. RESULTS: In this study, we propose a novel network architecture, HSSPPI, through hierarchical and spatial-sequential modeling of protein for PPIs prediction. In this network, we represent protein as a hierarchical graph, in which a node in the protein is a residue (residue-level graph) and a node in the residue is an atom (atom-level graph). Moreover, we design a spatial-sequential block for capturing complex interaction relationships from spatial and sequential forms of protein. We evaluate HSSPPI on public benchmark datasets and the predicting results outperform the comparative models. This indicates the effectiveness of hierarchical protein modeling and also illustrates that HSSPPI has a strong feature extraction ability by considering spatial and sequential information simultaneously. AVAILABILITY AND IMPLEMENTATION: The code of HSSPPI is available at https://github.com/biolushuai/Hierarchical-Spatial-Sequential-Modeling-of-Protein.
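The hierarchical graph described in this abstract (residues as nodes of the protein graph, atoms as nodes inside each residue) can be pictured with a small data-structure sketch. It is not taken from the HSSPPI repository: the class names, the centroid-based residue distance, the distance cutoffs, and the toy coordinates are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from itertools import combinations
from math import dist


@dataclass
class Atom:
    name: str
    xyz: tuple  # (x, y, z) coordinates in angstroms


@dataclass
class Residue:
    name: str
    atoms: list  # list of Atom: the nodes of the atom-level graph

    def atom_edges(self, cutoff=2.0):
        """Atom-level graph: connect atoms closer than `cutoff` (assumed value)."""
        return [(i, j)
                for (i, a), (j, b) in combinations(enumerate(self.atoms), 2)
                if dist(a.xyz, b.xyz) <= cutoff]

    def centroid(self):
        n = len(self.atoms)
        return tuple(sum(a.xyz[k] for a in self.atoms) / n for k in range(3))


def residue_edges(residues, cutoff=8.0):
    """Residue-level graph: connect residues whose centroids lie within `cutoff`."""
    return [(i, j)
            for (i, r), (j, s) in combinations(enumerate(residues), 2)
            if dist(r.centroid(), s.centroid()) <= cutoff]


# Hypothetical toy coordinates, not from a real structure.
gly = Residue("GLY", [Atom("N", (0.0, 0.0, 0.0)),
                      Atom("CA", (1.5, 0.0, 0.0)),
                      Atom("C", (2.5, 1.0, 0.0))])
ala = Residue("ALA", [Atom("N", (3.8, 1.0, 0.0)),
                      Atom("CA", (5.0, 1.5, 0.0))])
ser = Residue("SER", [Atom("CA", (30.0, 0.0, 0.0))])

protein = [gly, ala, ser]
inner = gly.atom_edges()        # edges inside one residue
outer = residue_edges(protein)  # edges between residues
```

The two levels stay separate: a model can first summarize each atom-level graph into a residue feature and then pass messages over the residue-level graph.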
Affiliation(s)
- Yuguang Li
- School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, Henan, China
| | - Zhen Tian
- School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, Henan, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324003, Zhejiang, China
| | - Xiaofei Nan
- School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, Henan, China
| | - Shoutao Zhang
- School of Life Sciences, Zhengzhou University, Zhengzhou 450001, Henan, China
- Zhongyuan Intelligent Medical Laboratory, Zhengzhou 450001, Henan, China
| | - Qinglei Zhou
- School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, Henan, China
| | - Shuai Lu
- School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, Henan, China
- National Supercomputing Center in Zhengzhou, Zhengzhou University, Zhengzhou 450001, Henan, China
| |
|
29
|
Li J, Chen X, Huang H, Zeng M, Yu J, Gong X, Ye Q. $\mathcal{S}$able: bridging the gap in protein structure understanding with an empowering and versatile pre-training paradigm. Brief Bioinform 2025; 26:bbaf120. [PMID: 40163822 PMCID: PMC11957296 DOI: 10.1093/bib/bbaf120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/29/2024] [Revised: 01/23/2025] [Accepted: 02/23/2025] [Indexed: 04/02/2025] Open
Abstract
Protein pre-training has emerged as a transformative approach for solving diverse biological tasks. While many contemporary methods focus on sequence-based language models, recent findings highlight that protein sequences alone are insufficient to capture the extensive information inherent in protein structures. Recognizing the crucial role of protein structure in defining function and interactions, we introduce $\mathcal{S}$able, a versatile pre-training model designed to comprehensively understand protein structures. $\mathcal{S}$able incorporates a novel structural encoding mechanism that enhances inter-atomic information exchange and spatial awareness, combined with robust pre-training strategies and lightweight decoders optimized for specific downstream tasks. This approach enables $\mathcal{S}$able to consistently outperform existing methods in tasks such as generation, classification, and regression, demonstrating its superior capability in protein structure representation. The code and models can be accessed via GitHub repository at https://github.com/baaihealth/Sable.
Affiliation(s)
- Jiashan Li
- Institute for Mathematical Sciences, Renmin University of China, 59 Zhongguancun Street, Beijing 100872, China
| | - Xi Chen
- Bio Computing Center, Beijing Academy of Artificial Intelligence, 150 Chengfu Road, Beijing 100084, China
| | - He Huang
- Bio Computing Center, Beijing Academy of Artificial Intelligence, 150 Chengfu Road, Beijing 100084, China
| | - Mingliang Zeng
- Bio Computing Center, Beijing Academy of Artificial Intelligence, 150 Chengfu Road, Beijing 100084, China
| | - Jingcheng Yu
- Bio Computing Center, Beijing Academy of Artificial Intelligence, 150 Chengfu Road, Beijing 100084, China
| | - Xinqi Gong
- Institute for Mathematical Sciences, Renmin University of China, 59 Zhongguancun Street, Beijing 100872, China
| | - Qiwei Ye
- Bio Computing Center, Beijing Academy of Artificial Intelligence, 150 Chengfu Road, Beijing 100084, China
| |
|
30
|
Pompon D, Garcia-Alles LF, Urban P. Geometry-encoded molecular dynamics enables deep learning insights into P450 regiospecificity control. Sci Rep 2025; 15:7512. [PMID: 40032954 PMCID: PMC11876329 DOI: 10.1038/s41598-025-91155-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/03/2024] [Accepted: 02/18/2025] [Indexed: 03/05/2025] Open
Abstract
Cytochrome P450 1A2, like many isoenzymes, can generate multiple metabolites from a single substrate. A loose coupling between substrate binding and oxygen activation makes substrate reorientations possible at the active site prior to catalysis. In the present work, caffeine oxidation to alternative bioactive compounds was used to decipher this pluripotency. A model involving two interacting subsites capable of sequentially accommodating one or two caffeine molecules was considered. Molecular dynamics was used to characterize subsite interactions and feed a dedicated geometric encoding of trajectories that was coupled to dimensional reductions and differential machine learning. The two subsites differentially control caffeine orientations and can exchange substrate through a phenylalanine-gated mechanism. This exchange can be locked by the presence of a second bound molecule. Complementary roles of subsites in progressively determining the caffeine orientation during its approach to active oxygen were examined. Interestingly, substrate face flipping becomes impaired upon entry into the rather flat active site. This makes the mechanisms that define the orientation of caffeine relative to active oxygen dependent on the substrate face oriented toward heme. Overall, this shows that P450 1A2 regioselectivity results from local determinants combined with subsite interactions and caffeine face preselection at a longer distance.
Affiliation(s)
- Denis Pompon
- Toulouse Biotechnology Institute, Université de Toulouse, CNRS, INRAE, INSA, 135 Avenue de Rangueil, Toulouse, France.
| | - Luis F Garcia-Alles
- Toulouse Biotechnology Institute, Université de Toulouse, CNRS, INRAE, INSA, 135 Avenue de Rangueil, Toulouse, France
| | - Philippe Urban
- Toulouse Biotechnology Institute, Université de Toulouse, CNRS, INRAE, INSA, 135 Avenue de Rangueil, Toulouse, France
| |
|
31
|
Medina-Ortiz D, Khalifeh A, Anvari-Kazemabad H, Davari MD. Interpretable and explainable predictive machine learning models for data-driven protein engineering. Biotechnol Adv 2025; 79:108495. [PMID: 39645211 DOI: 10.1016/j.biotechadv.2024.108495] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/08/2024] [Revised: 10/21/2024] [Accepted: 11/30/2024] [Indexed: 12/09/2024]
Abstract
Protein engineering through directed evolution and (semi)rational design has become a powerful approach for optimizing and enhancing proteins with desired properties. The integration of artificial intelligence methods has further accelerated the protein engineering process by enabling the development of predictive models based on data-driven strategies. However, the lack of interpretability and transparency in these models limits their trustworthiness and applicability in real-world scenarios. Explainable Artificial Intelligence addresses these challenges by providing insights into the decision-making processes of machine learning models, enhancing their reliability and interpretability. Explainable strategies have been successfully applied in various biotechnology fields, including drug discovery, genomics, and medicine, yet their application in protein engineering remains underexplored. The incorporation of explainable strategies in protein engineering holds significant potential, as it can guide protein design by revealing how predictive models function, benefiting approaches such as machine learning-assisted directed evolution. This perspective work explores the principles and methodologies of explainable artificial intelligence, highlighting its relevance in biotechnology and its potential to enhance protein design. Additionally, three theoretical pipelines integrating predictive models with explainable strategies are proposed, focusing on their advantages, disadvantages, and technical requirements. Finally, the remaining challenges of explainable artificial intelligence in protein engineering and future directions for its development as a support tool for traditional protein engineering methodologies are discussed.
Affiliation(s)
- David Medina-Ortiz
- Department of Bioorganic Chemistry, Leibniz Institute of Plant Biochemistry, Weinberg 3, 06120 Halle, Germany; Departamento de Ingeniería En Computación, Universidad de Magallanes, Avenida Bulnes, 01855, Punta Arenas, Chile; Centre for Biotechnology and Bioengineering, CeBiB, Universidad de Chile, Beauchef 851, Santiago, Chile
| | - Ashkan Khalifeh
- Department of Mathematical and Physical Sciences, College of Arts and Sciences, University of Nizwa, Nizwa 616, Sultanate of Oman
| | - Hoda Anvari-Kazemabad
- Departamento de Ingeniería En Computación, Universidad de Magallanes, Avenida Bulnes, 01855, Punta Arenas, Chile
| | - Mehdi D Davari
- Department of Bioorganic Chemistry, Leibniz Institute of Plant Biochemistry, Weinberg 3, 06120 Halle, Germany.
| |
|
32
|
Marchand A, Buckley S, Schneuing A, Pacesa M, Elia M, Gainza P, Elizarova E, Neeser RM, Lee PW, Reymond L, Miao Y, Scheller L, Georgeon S, Schmidt J, Schwaller P, Maerkl SJ, Bronstein M, Correia BE. Targeting protein-ligand neosurfaces with a generalizable deep learning tool. Nature 2025; 639:522-531. [PMID: 39814890 PMCID: PMC11903328 DOI: 10.1038/s41586-024-08435-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2024] [Accepted: 11/20/2024] [Indexed: 01/18/2025]
Abstract
Molecular recognition events between proteins drive biological processes in living systems [1]. However, higher levels of mechanistic regulation have emerged, in which protein-protein interactions are conditioned to small molecules [2-5]. Despite recent advances, computational tools for the design of new chemically induced protein interactions have remained a challenging task for the field [6,7]. Here we present a computational strategy for the design of proteins that target neosurfaces, that is, surfaces arising from protein-ligand complexes. To develop this strategy, we leveraged a geometric deep learning approach based on learned molecular surface representations [8,9] and experimentally validated binders against three drug-bound protein complexes: Bcl2-venetoclax, DB3-progesterone and PDF1-actinonin. All binders demonstrated high affinities and accurate specificities, as assessed by mutational and structural characterization. Remarkably, surface fingerprints previously trained only on proteins could be applied to neosurfaces induced by interactions with small molecules, providing a powerful demonstration of generalizability that is uncommon in other deep learning approaches. We anticipate that such designed chemically induced protein interactions will have the potential to expand the sensing repertoire and the assembly of new synthetic pathways in engineered cells for innovative drug-controlled cell-based therapies [10].
Affiliation(s)
- Anthony Marchand
- Laboratory of Protein Design and Immunoengineering, Institute of Bioengineering, Ecole polytechnique fédérale de Lausanne, Lausanne, Switzerland
| | - Stephen Buckley
- Laboratory of Protein Design and Immunoengineering, Institute of Bioengineering, Ecole polytechnique fédérale de Lausanne, Lausanne, Switzerland
| | - Arne Schneuing
- Laboratory of Protein Design and Immunoengineering, Institute of Bioengineering, Ecole polytechnique fédérale de Lausanne, Lausanne, Switzerland
| | - Martin Pacesa
- Laboratory of Protein Design and Immunoengineering, Institute of Bioengineering, Ecole polytechnique fédérale de Lausanne, Lausanne, Switzerland
| | - Maddalena Elia
- Laboratory of Protein Design and Immunoengineering, Institute of Bioengineering, Ecole polytechnique fédérale de Lausanne, Lausanne, Switzerland
| | - Pablo Gainza
- Laboratory of Protein Design and Immunoengineering, Institute of Bioengineering, Ecole polytechnique fédérale de Lausanne, Lausanne, Switzerland
- Monte Rosa Therapeutics, Boston, MA, USA
| | - Evgenia Elizarova
- Laboratory of Protein Design and Immunoengineering, Institute of Bioengineering, Ecole polytechnique fédérale de Lausanne, Lausanne, Switzerland
| | - Rebecca M Neeser
- Laboratory of Protein Design and Immunoengineering, Institute of Bioengineering, Ecole polytechnique fédérale de Lausanne, Lausanne, Switzerland
- Laboratory of Chemical Artificial Intelligence, Institute of Chemical Sciences and Engineering, Ecole polytechnique fédérale de Lausanne, Lausanne, Switzerland
| | - Pao-Wan Lee
- Laboratory of Biological Network Characterization, Institute of Bioengineering, Ecole polytechnique fédérale de Lausanne, Lausanne, Switzerland
| | - Luc Reymond
- Biomolecular Screening Core Facility, School of Life Sciences, Ecole polytechnique fédérale de Lausanne, Lausanne, Switzerland
| | - Yangyang Miao
- Laboratory of Protein Design and Immunoengineering, Institute of Bioengineering, Ecole polytechnique fédérale de Lausanne, Lausanne, Switzerland
| | - Leo Scheller
- Laboratory of Protein Design and Immunoengineering, Institute of Bioengineering, Ecole polytechnique fédérale de Lausanne, Lausanne, Switzerland
| | - Sandrine Georgeon
- Laboratory of Protein Design and Immunoengineering, Institute of Bioengineering, Ecole polytechnique fédérale de Lausanne, Lausanne, Switzerland
| | - Joseph Schmidt
- Laboratory of Protein Design and Immunoengineering, Institute of Bioengineering, Ecole polytechnique fédérale de Lausanne, Lausanne, Switzerland
| | - Philippe Schwaller
- Laboratory of Chemical Artificial Intelligence, Institute of Chemical Sciences and Engineering, Ecole polytechnique fédérale de Lausanne, Lausanne, Switzerland
| | - Sebastian J Maerkl
- Laboratory of Biological Network Characterization, Institute of Bioengineering, Ecole polytechnique fédérale de Lausanne, Lausanne, Switzerland
| | - Michael Bronstein
- Department of Computer Science, University of Oxford, Oxford, UK
- Aithyra Research Institute for Biomedical Artificial Intelligence, Austrian Academy of Sciences, Vienna, Austria
| | - Bruno E Correia
- Laboratory of Protein Design and Immunoengineering, Institute of Bioengineering, Ecole polytechnique fédérale de Lausanne, Lausanne, Switzerland.
| |
|
33
|
Mou M, Zhang Z, Pan Z, Zhu F. Deep Learning for Predicting Biomolecular Binding Sites of Proteins. RESEARCH (WASHINGTON, D.C.) 2025; 8:0615. [PMID: 39995900 PMCID: PMC11848751 DOI: 10.34133/research.0615] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/16/2024] [Revised: 01/21/2025] [Accepted: 01/24/2025] [Indexed: 02/26/2025]
Abstract
The rapid evolution of deep learning has markedly enhanced protein-biomolecule binding site prediction, offering insights essential for drug discovery, mutation analysis, and molecular biology. Advancements in both sequence-based and structure-based methods demonstrate their distinct strengths and limitations. Sequence-based approaches offer efficiency and adaptability, while structure-based techniques provide spatial precision but require high-quality structural data. Emerging trends in hybrid models that combine multimodal data, such as integrating sequence and structural information, along with innovations in geometric deep learning, present promising directions for improving prediction accuracy. This perspective summarizes challenges such as computational demands and dynamic modeling and proposes strategies for future research. The ultimate goal is the development of computationally efficient and flexible models capable of capturing the complexity of real-world biomolecular interactions, thereby broadening the scope and applicability of binding site predictions across a wide range of biomedical contexts.
Affiliation(s)
| | | | | | - Feng Zhu
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, National Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
| |
|
34
|
Mastrolorito F, Gambacorta N, Ciriaco F, Cutropia F, Togo MV, Belgiovine V, Tondo AR, Trisciuzzi D, Monaco A, Bellotti R, Altomare CD, Nicolotti O, Amoroso N. Chemical Space Networks Enhance Toxicity Recognition via Graph Embedding. J Chem Inf Model 2025; 65:1850-1861. [PMID: 39914823 DOI: 10.1021/acs.jcim.4c02140] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/25/2025]
Abstract
Chemical space networks (CSNs) are a new effective strategy for detecting latent chemical patterns irrespective of defined coordinate systems based on molecular descriptors and fingerprints. CSNs can be a powerful option as a new approach method and increase the capacity of assessing potential adverse impacts of chemicals on human health. Here, CSNs are shown to effectively characterize the toxicity of chemicals toward several human health end points, namely chromosomal aberrations, mutagenicity, carcinogenicity, developmental toxicity, skin irritation, estrogenicity, androgenicity, and hepatotoxicity. In this work, we report how the content from CSNs structure can be embedded through graph neural networks into a metric space, which, for eight different toxicological human health end points, allows better discrimination of toxic and nontoxic chemicals. Using embeddings returns, on average, an increase in predictive performance: embedding employment enhances the learning, leading to an increment of the classification performance of +12% in terms of the area under the ROC curve. Moreover, through a dedicated eXplainable Artificial Intelligence framework, a straight interpretation of results is provided through the detection of putative structural alerts related to a given toxicity. Hence, the proposed approach represents a step forward in the area of alternative methods and could lead to breakthrough innovations in the design of safer chemicals and drugs.
Affiliation(s)
- F Mastrolorito
- Dipartimento di Farmacia-Scienze del Farmaco, Università degli studi di Bari Aldo Moro, Bari 70125, Italy
| | - N Gambacorta
- Divisione di Genetica Medica, IRCCS Casa Sollievo della Sofferenza, San Giovanni Rotondo 71013, Italy
| | - F Ciriaco
- Dipartimento di Chimica, Università degli studi di Bari Aldo Moro, Bari 70121, Italy
| | - F Cutropia
- Dipartimento di Farmacia-Scienze del Farmaco, Università degli studi di Bari Aldo Moro, Bari 70125, Italy
| | - Maria Vittoria Togo
- Dipartimento di Farmacia-Scienze del Farmaco, Università degli studi di Bari Aldo Moro, Bari 70125, Italy
| | - V Belgiovine
- Dipartimento di Farmacia-Scienze del Farmaco, Università degli studi di Bari Aldo Moro, Bari 70125, Italy
| | - A R Tondo
- Dipartimento di Farmacia-Scienze del Farmaco, Università degli studi di Bari Aldo Moro, Bari 70125, Italy
| | - D Trisciuzzi
- Dipartimento di Farmacia-Scienze del Farmaco, Università degli studi di Bari Aldo Moro, Bari 70125, Italy
| | - A Monaco
- Istituto Nazionale di Fisica Nucleare, Sezione di Bari, via E. Orabona, 4, 70125 Bari, Italy
- Dipartimento Interateneo di Fisica, Università degli studi di Bari Aldo Moro, Bari 70121, Italy
| | - R Bellotti
- Istituto Nazionale di Fisica Nucleare, Sezione di Bari, via E. Orabona, 4, 70125 Bari, Italy
- Dipartimento Interateneo di Fisica, Università degli studi di Bari Aldo Moro, Bari 70121, Italy
| | - C D Altomare
- Dipartimento di Farmacia-Scienze del Farmaco, Università degli studi di Bari Aldo Moro, Bari 70125, Italy
| | - O Nicolotti
- Dipartimento di Farmacia-Scienze del Farmaco, Università degli studi di Bari Aldo Moro, Bari 70125, Italy
| | - N Amoroso
- Dipartimento di Farmacia-Scienze del Farmaco, Università degli studi di Bari Aldo Moro, Bari 70125, Italy
| |
|
35
|
Kang C, Xu W. Leveraging Structural and Computational Biology for Molecular Glue Discovery. J Med Chem 2025; 68:2048-2051. [PMID: 39854250 DOI: 10.1021/acs.jmedchem.5c00076] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/26/2025]
Abstract
The discovery of molecular glues has made significant strides, unlocking new avenues for targeted protein degradation as a therapeutic strategy, thereby expanding the scope of drug discovery into territories previously considered undruggable. Pioneering molecules like thalidomide and its derivatives have paved the way for the development of small molecules that can induce specific protein degradation by hijacking the cellular ubiquitin-proteasome system. Recent advancements have focused on expanding the range of E3 ligases and target proteins that can be modulated by molecular glues. Structural elucidation of an E3 ligase in complex with a molecular glue and the target of interest, combined with computational modeling, facilitates the understanding of the mechanisms by which molecular glues induce targeted degradation. By leveraging these tools, the next generation of molecular glues is expected to offer unprecedented opportunities for combating a wide range of diseases, including cancer, autoimmune disorders, and neurodegenerative conditions.
Affiliation(s)
- Congbao Kang
- Experimental Drug Development Centre, Chromos, Agency for Science, Technology and Research, 10 Biopolis Road, #05-01, Singapore 138670
| | - Weijun Xu
- Experimental Drug Development Centre, Chromos, Agency for Science, Technology and Research, 10 Biopolis Road, #05-01, Singapore 138670
| |
|
36
|
Ma X, Li F, Chen Q, Gao S, Bai F. NesT-NABind: a Nested Transformer for Nucleic Acid-Binding Site Prediction on Protein Surface. J Chem Inf Model 2025; 65:1166-1177. [PMID: 39818834 DOI: 10.1021/acs.jcim.4c01765] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/19/2025]
Abstract
Protein-nucleic acid interactions play a crucial role in many physiological processes. Identifying the binding sites of nucleotides on the protein surface is the prerequisite for understanding the molecular recognition mechanisms between the two types of macromolecules and also provides the information needed to design or generate molecular modulators against these sites to manipulate biological function according to specific requirements. Existing studies mainly focus on characterizing local surfaces around sites, often neglecting the interrelationships among these sites and the global protein information. To address this gap, we propose NesT-NABind, a Nested Transformer for Nucleic Acid-Binding site prediction. This model leverages the Transformer's advanced capabilities in contextual understanding and long-range dependency capturing. Specifically, we introduce a local patch-scale Transformer to process surface information around each site and a global protein-scale Transformer to integrate surface and sequence information on the entire protein. These two Transformers operate at different scales of the protein, hence the term "nested". Experiments demonstrate that NesT-NABind achieves a 5.57% improvement in the F1 score and a 3.64% improvement in AUPRC compared to state-of-the-art methods. With the incorporation of global features, NesT-NABind shows an enhanced predictive capability for challenging large proteins and therefore can be used in a much wider range of applications.
Affiliation(s)
- Xinyue Ma
- School of Information Science and Technology, ShanghaiTech University, 393 Middle Huaxia Road, Pudong New Area, Shanghai 201210, China
- Shanghai Institute for Advanced Immunochemical Studies, ShanghaiTech University, 393 Middle Huaxia Road, Pudong New Area, Shanghai 201210, China
| | - Fenglei Li
- School of Information Science and Technology, ShanghaiTech University, 393 Middle Huaxia Road, Pudong New Area, Shanghai 201210, China
- Shanghai Institute for Advanced Immunochemical Studies, ShanghaiTech University, 393 Middle Huaxia Road, Pudong New Area, Shanghai 201210, China
- Department of Computer Science, Aalto University, Konemiehentie 2, Espoo 02150, Finland
| | - Qianyu Chen
- School of Information Science and Technology, ShanghaiTech University, 393 Middle Huaxia Road, Pudong New Area, Shanghai 201210, China
- Shanghai Institute for Advanced Immunochemical Studies, ShanghaiTech University, 393 Middle Huaxia Road, Pudong New Area, Shanghai 201210, China
| | - Shenghua Gao
- Department of Computer Science, The University of Hong Kong, Pokfulam Road, HKSAR, 999077, China
- HKU Shanghai Intelligent Computing Research Center, Shanghai, 201210, China
| | - Fang Bai
- School of Information Science and Technology, ShanghaiTech University, 393 Middle Huaxia Road, Pudong New Area, Shanghai 201210, China
- Shanghai Institute for Advanced Immunochemical Studies, ShanghaiTech University, 393 Middle Huaxia Road, Pudong New Area, Shanghai 201210, China
- School of Life Science and Technology, ShanghaiTech University, Pudong New Area, 393 Middle Huaxia Road, Shanghai 201210, China
- Shanghai Clinical Research and Trial Center, No.1599 Keyuan Road, Pudong New Area, Shanghai 201210, China
| |
|
37
|
Li X, Loscalzo J, Mahmud AKMF, Aly DM, Rzhetsky A, Zitnik M, Benson M. Digital twins as global learning health and disease models for preventive and personalized medicine. Genome Med 2025; 17:11. [PMID: 39920778 PMCID: PMC11806862 DOI: 10.1186/s13073-025-01435-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/13/2024] [Accepted: 01/29/2025] [Indexed: 02/09/2025] Open
Abstract
Ineffective medication is a major healthcare problem causing significant patient suffering and economic costs. This issue stems from the complex nature of diseases, which involve altered interactions among thousands of genes across multiple cell types and organs. Disease progression can vary between patients and over time, influenced by genetic and environmental factors. To address this challenge, digital twins (DTs) have emerged as a promising approach, which has led to international initiatives aiming at clinical implementations. DTs are virtual representations of health and disease processes that can integrate real-time data and simulations to predict, prevent, and personalize treatments. Early clinical applications of DTs have shown potential in areas like artificial organs, cancer, cardiology, and hospital workflow optimization. However, widespread implementation faces several challenges: (1) characterizing dynamic molecular changes across multiple biological scales; (2) developing computational methods to integrate data into DTs; (3) prioritizing disease mechanisms and therapeutic targets; (4) creating interoperable DT systems that can learn from each other; (5) designing user-friendly interfaces for patients and clinicians; (6) scaling DT technology globally for equitable healthcare access; (7) addressing ethical, regulatory, and financial considerations. Overcoming these hurdles could pave the way for more predictive, preventive, and personalized medicine, potentially transforming healthcare delivery and improving patient outcomes.
Affiliation(s)
- Xinxiu Li
- Medical Digital Twin Research Group, Department of Clinical Sciences Intervention and Technology, Karolinska Institute, Stockholm, Sweden
| | - Joseph Loscalzo
- Brigham and Women's Hospital, Harvard Medical School, Boston, USA
| | - A K M Firoj Mahmud
- Department of Medical Biochemistry and Microbiology, Uppsala University, 75105, Uppsala, Sweden
| | - Dina Mansour Aly
- Medical Digital Twin Research Group, Department of Clinical Sciences Intervention and Technology, Karolinska Institute, Stockholm, Sweden
| | - Andrey Rzhetsky
- Departments of Medicine and Human Genetics, Institute for Genomics and Systems Biology, University of Chicago, Chicago, USA
| | - Marinka Zitnik
- Department of Biomedical Informatics, Harvard Medical School, Cambridge, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Kempner Institute for the Study of Natural and Artificial Intelligence, Harvard Data Science Initiative, Harvard University, Cambridge, MA, USA
| | - Mikael Benson
- Medical Digital Twin Research Group, Department of Clinical Sciences Intervention and Technology, Karolinska Institute, Stockholm, Sweden
| |
|
38
|
Papadopoulos AM, Axenopoulos A, Iatrou A, Stamatopoulos K, Alvarez F, Daras P. ParaSurf: a surface-based deep learning approach for paratope-antigen interaction prediction. Bioinformatics 2025; 41:btaf062. [PMID: 39921885 PMCID: PMC11855283 DOI: 10.1093/bioinformatics/btaf062] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/04/2024] [Revised: 01/14/2025] [Accepted: 02/03/2025] [Indexed: 02/10/2025] Open
Abstract
MOTIVATION: Identifying antibody binding sites is crucial for developing vaccines and therapeutic antibodies, processes that are time-consuming and costly. Accurate prediction of the paratope's binding site can speed up development by improving our understanding of antibody-antigen interactions. RESULTS: We present ParaSurf, a deep learning model that significantly enhances paratope prediction by incorporating both surface geometric and non-geometric factors. Trained and tested on three prominent antibody-antigen benchmarks, ParaSurf achieves state-of-the-art results across nearly all metrics. Unlike models restricted to the variable region, ParaSurf demonstrates the ability to accurately predict binding scores across the entire Fab region of the antibody. Additionally, we conducted an extensive analysis using the largest of the three datasets employed, focusing on three key components: (i) a detailed evaluation of paratope prediction for each complementarity-determining region loop, (ii) the performance of models trained exclusively on the heavy chain, and (iii) the results of training models solely on the light chain without incorporating data from the heavy chain. AVAILABILITY AND IMPLEMENTATION: Source code for ParaSurf, along with the datasets used, preprocessing pipeline, and trained model weights, is freely available at https://github.com/aggelos-michael-papadopoulos/ParaSurf.
Affiliation(s)
- Angelos-Michael Papadopoulos
- Information Technologies Institute, Centre for Research and Technology Hellas, Thessaloniki 57001, Greece
- Universidad Politécnica de Madrid, Madrid 28040, Spain
| | - Apostolos Axenopoulos
- Information Technologies Institute, Centre for Research and Technology Hellas, Thessaloniki 57001, Greece
- Aristotle University of Thessaloniki, Thessaloniki 54124, Greece
| | - Anastasia Iatrou
- Institute of Applied Biosciences, Centre for Research and Technology Hellas, Thessaloniki 57001, Greece
| | - Kostas Stamatopoulos
- Institute of Applied Biosciences, Centre for Research and Technology Hellas, Thessaloniki 57001, Greece
| | | | - Petros Daras
- Information Technologies Institute, Centre for Research and Technology Hellas, Thessaloniki 57001, Greece
| |
|
39
|
Cao D, Chen M, Zhang R, Wang Z, Huang M, Yu J, Jiang X, Fan Z, Zhang W, Zhou H, Li X, Fu Z, Zhang S, Zheng M. SurfDock is a surface-informed diffusion generative model for reliable and accurate protein-ligand complex prediction. Nat Methods 2025; 22:310-322. [PMID: 39604569 DOI: 10.1038/s41592-024-02516-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2024] [Accepted: 10/16/2024] [Indexed: 11/29/2024]
Abstract
Accurately predicting protein-ligand interactions is crucial for understanding cellular processes. We introduce SurfDock, a deep-learning method that addresses this challenge by integrating protein sequence, three-dimensional structural graphs and surface-level features into an equivariant architecture. SurfDock employs a generative diffusion model on a non-Euclidean manifold, optimizing molecular translations, rotations and torsions to generate reliable binding poses. Our extensive evaluations across various benchmarks demonstrate SurfDock's superiority over existing methods in docking success rates and adherence to physical constraints. It also exhibits remarkable generalizability to unseen proteins and predicted apo structures, while achieving state-of-the-art performance in virtual screening tasks. In a real-world application, SurfDock identified seven novel hit molecules in a virtual screening project targeting aldehyde dehydrogenase 1B1, a key enzyme in cellular metabolism. This showcases SurfDock's ability to elucidate molecular mechanisms underlying cellular processes. These results highlight SurfDock's potential as a transformative tool in structural biology, offering enhanced accuracy, physical plausibility and practical applicability in understanding protein-ligand interactions.
Affiliation(s)
- Duanhua Cao
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
| | - Mingan Chen
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- School of Physical Science and Technology, ShanghaiTech University, Shanghai, China
- Lingang Laboratory, Shanghai, China
| | - Runze Zhang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Zhaokun Wang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Manlin Huang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- Nanchang University, Nanchang, China
| | - Jie Yu
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- Lingang Laboratory, Shanghai, China
- School of Information Science and Technology, ShanghaiTech University, Shanghai, China
| | - Xinyu Jiang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Zhehuan Fan
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Wei Zhang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Hao Zhou
- Institute for AI Industry Research (AIR), Tsinghua University, Beijing, China
| | - Xutong Li
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
| | - Zunyun Fu
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
| | - Sulin Zhang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Mingyue Zheng
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
| |
|
40
|
Zhai S, Liu T, Lin S, Li D, Liu H, Yao X, Hou T. Artificial intelligence in peptide-based drug design. Drug Discov Today 2025; 30:104300. [PMID: 39842504 DOI: 10.1016/j.drudis.2025.104300] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2024] [Revised: 01/14/2025] [Accepted: 01/15/2025] [Indexed: 01/24/2025]
Abstract
Protein-protein interactions (PPIs) are fundamental to a variety of biological processes, but targeting them with small molecules is challenging because of their large and complex interaction interfaces. However, peptides have emerged as highly promising modulators of PPIs, because they can bind to protein surfaces with high affinity and specificity. Nonetheless, computational peptide design remains difficult, hindered by the intrinsic flexibility of peptides and the substantial computational resources required. Recent advances in artificial intelligence (AI) are paving new paths for peptide-based drug design. In this review, we explore the advanced deep generative models for designing target-specific peptide binders, highlight key challenges, and offer insights into the future direction of this rapidly evolving field.
Affiliation(s)
- Silong Zhai
- Faculty of Applied Science, Macao Polytechnic University, 999078, Macao; College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
| | - Tiantao Liu
- Faculty of Applied Science, Macao Polytechnic University, 999078, Macao
| | - Shaolong Lin
- Faculty of Applied Science, Macao Polytechnic University, 999078, Macao
| | - Dan Li
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
| | - Huanxiang Liu
- Faculty of Applied Science, Macao Polytechnic University, 999078, Macao
| | - Xiaojun Yao
- Faculty of Applied Science, Macao Polytechnic University, 999078, Macao
| | - Tingjun Hou
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
| |
|
41
|
Bian Q, Shen Z, Gao J, Shen L, Lu Y, Zhang Q, Chen R, Xu D, Liu T, Che J, Lu Y, Dong X. PPI-CoAttNet: A Web Server for Protein-Protein Interaction Tasks Using a Coattention Model. J Chem Inf Model 2025; 65:461-471. [PMID: 39761551 DOI: 10.1021/acs.jcim.4c01365] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/28/2025]
Abstract
Predicting protein-protein interactions (PPIs) is crucial for advancing drug discovery. Despite the proposal of numerous advanced computational methods, these approaches often suffer from poor usability for biologists and lack generalization. In this study, we designed a deep learning model based on a coattention mechanism that was capable of both PPI and site prediction and used this model as the foundation for PPI-CoAttNet, a user-friendly, multifunctional web server for PPI prediction. This platform provides comprehensive services for online PPI model training, PPI and site prediction, and prediction of interactions with proteins associated with highly prevalent cancers. In our Homo sapiens test set for PPI prediction, PPI-CoAttNet achieved an AUC of 0.9841 and an F1 score of 0.9440, outperforming most state-of-the-art models. Additionally, these results are generated in real time, delivering outcomes within minutes. We also evaluated PPI-CoAttNet for downstream tasks, including novel E3 ligase scoring, demonstrating outstanding accuracy. We believe that this tool will empower researchers, especially those without computational expertise, to leverage AI for accelerating drug development.
Affiliation(s)
- Qingyu Bian
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Innovation Institute for Artificial Intelligence in Medicine, Zhejiang University, Hangzhou 310058, China
| | - Zheyuan Shen
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Innovation Institute for Artificial Intelligence in Medicine, Zhejiang University, Hangzhou 310058, China
| | - Jian Gao
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Innovation Institute for Artificial Intelligence in Medicine, Zhejiang University, Hangzhou 310058, China
| | - Liteng Shen
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Innovation Institute for Artificial Intelligence in Medicine, Zhejiang University, Hangzhou 310058, China
| | - Yang Lu
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Innovation Institute for Artificial Intelligence in Medicine, Zhejiang University, Hangzhou 310058, China
| | - Qingnan Zhang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
| | - Roufen Chen
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Innovation Institute for Artificial Intelligence in Medicine, Zhejiang University, Hangzhou 310058, China
| | - Donghang Xu
- Department of Pharmacy, Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou 310058, China
| | - Tao Liu
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Innovation Institute for Artificial Intelligence in Medicine, Zhejiang University, Hangzhou 310058, China
| | - Jinxin Che
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Innovation Institute for Artificial Intelligence in Medicine, Zhejiang University, Hangzhou 310058, China
| | - Yan Lu
- Department of Pharmacy, Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou 310058, China
| | - Xiaowu Dong
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Department of Pharmacy, Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou 310058, China
- Innovation Institute for Artificial Intelligence in Medicine, Zhejiang University, Hangzhou 310058, China
| |
|
42
|
Shirali A, Stebliankin V, Karki U, Shi J, Chapagain P, Narasimhan G. A comprehensive survey of scoring functions for protein docking models. BMC Bioinformatics 2025; 26:25. [PMID: 39844036 PMCID: PMC11755896 DOI: 10.1186/s12859-024-05991-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2024] [Accepted: 11/18/2024] [Indexed: 01/24/2025] Open
Abstract
BACKGROUND: While protein-protein docking is fundamental to our understanding of how proteins interact, scoring protein-protein complex conformations is a critical component of successful docking programs. Without accurate and efficient scoring functions to differentiate between native and non-native binding complexes, the accuracy of current docking tools cannot be guaranteed. Although many innovative scoring functions have been proposed, a good scoring function for docking remains elusive. Deep learning models offer alternatives to using explicit empirical or mathematical functions for scoring protein-protein complexes. RESULTS: In this study, we perform a comprehensive survey of the state-of-the-art scoring functions by considering the most popular and highly performant approaches, both classical and deep learning-based, for scoring protein-protein complexes. The methods were also compared based on their runtime as it directly impacts their use in large-scale docking applications. CONCLUSIONS: We evaluate the strengths and weaknesses of classical and deep learning-based approaches across seven public and popular datasets to aid researchers in understanding the progress made in this field.
Affiliation(s)
- Azam Shirali
- Bioinformatics Research Group (BioRG), Knight Foundation School of Computing and Information Sciences, Florida International University, 11200 SW 8th St, Miami, 33199, USA
| | - Vitalii Stebliankin
- Bioinformatics Research Group (BioRG), Knight Foundation School of Computing and Information Sciences, Florida International University, 11200 SW 8th St, Miami, 33199, USA
| | - Ukesh Karki
- Department of Physics, Florida International University, 11200 SW 8th St, Miami, 33199, USA
| | - Jimeng Shi
- Bioinformatics Research Group (BioRG), Knight Foundation School of Computing and Information Sciences, Florida International University, 11200 SW 8th St, Miami, 33199, USA
| | - Prem Chapagain
- Department of Physics, Florida International University, 11200 SW 8th St, Miami, 33199, USA
- Biomolecular Sciences Institute, Florida International University, 11200 SW 8th St, Miami, 33199, USA
| | - Giri Narasimhan
- Bioinformatics Research Group (BioRG), Knight Foundation School of Computing and Information Sciences, Florida International University, 11200 SW 8th St, Miami, 33199, USA
- Biomolecular Sciences Institute, Florida International University, 11200 SW 8th St, Miami, 33199, USA
| |
|
43
|
Majila K, Ullanat V, Viswanath S. A deep learning method for predicting interactions for intrinsically disordered regions of proteins. bioRxiv 2025:2024.12.19.629373. [PMID: 39763873 PMCID: PMC11702703 DOI: 10.1101/2024.12.19.629373] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/14/2025]
Abstract
Intrinsically disordered proteins or regions (IDPs/IDRs) adopt diverse binding modes with different partners, ranging from ordered to multivalent to fuzzy conformations in the bound state. Characterizing IDR interfaces is challenging experimentally and computationally. AlphaFold-multimer and AlphaFold3, the state-of-the-art structure prediction methods, are less accurate at predicting IDR binding sites at their benchmarked confidence cutoffs. Their performance improves upon lowering the confidence cutoffs. Here, we developed Disobind, a deep-learning method that predicts inter-protein contact maps and interface residues for an IDR and a partner protein, given their sequences. It outperforms AlphaFold-multimer and AlphaFold3 at multiple confidence cutoffs. Combining the Disobind and AlphaFold-multimer predictions further improves the performance. In contrast to most current methods, Disobind considers the context of the binding partner and does not depend on structures and multiple sequence alignments. Its predictions can be used to localize IDRs in integrative structures of large assemblies and to characterize and modulate IDR-mediated interactions.
Affiliation(s)
- Kartik Majila
- National Center for Biological Sciences, Tata Institute of Fundamental Research, Bangalore, India 560065
| | - Varun Ullanat
- National Center for Biological Sciences, Tata Institute of Fundamental Research, Bangalore, India 560065
| | - Shruthi Viswanath
- National Center for Biological Sciences, Tata Institute of Fundamental Research, Bangalore, India 560065
| |
|
44
|
Zhang Y, Vitalis A. Benchmarking the robustness of the correct identification of flexible 3D objects using common machine learning models. Patterns (N Y) 2025; 6:101147. [PMID: 39896260 PMCID: PMC11783895 DOI: 10.1016/j.patter.2024.101147] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/23/2024] [Revised: 09/09/2024] [Accepted: 12/10/2024] [Indexed: 02/04/2025]
Abstract
True three-dimensional (3D) data are prevalent in domains such as molecular science or computer vision. In these data, machine learning models are often asked to identify objects subject to intrinsic flexibility. Our study introduces two datasets from molecular science to assess the classification robustness of common model/feature combinations. Molecules are flexible, and shapes alone offer intra-class heterogeneities that yield a high risk for confusions. By blocking training and test sets to reduce overlap, we establish a baseline requiring the trained models to abstract from shape. As training data coverage grows, all tested architectures perform better on unseen data with reduced overfitting. Empirically, 2D embeddings of voxelized data produced the best-performing models. Evidently, both featurization and task-appropriate model design are of continued importance, the latter point reinforced by comparisons to recent, more specialized models. Finally, we show that the shape abstraction learned from database samples extends to samples that are evolving explicitly in time.
Affiliation(s)
- Yang Zhang
- Department of Biochemistry, University of Zurich, 8057 Zurich, Switzerland
| | - Andreas Vitalis
- Department of Biochemistry, University of Zurich, 8057 Zurich, Switzerland
| |
|
45
|
Rodrigues CHM, Ascher DB. CSM-Potential2: A comprehensive deep learning platform for the analysis of protein interacting interfaces. Proteins 2025; 93:209-216. [PMID: 37870486 PMCID: PMC11623435 DOI: 10.1002/prot.26615] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/16/2023] [Revised: 10/04/2023] [Accepted: 10/05/2023] [Indexed: 10/24/2023]
Abstract
Proteins are molecular machinery that participate in virtually all essential biological functions within the cell, which are tightly related to their 3D structure. The importance of understanding the protein structure-function relationship is highlighted by the exponential growth of experimental structures, which has been greatly expanded by recent breakthroughs in protein structure prediction, most notably RoseTTAFold and AlphaFold2. These advances have prompted the development of several computational approaches that leverage these data sources to explore potential biological interactions. However, most methods are limited to the analysis of a single type of interaction, such as protein-protein or protein-ligand interactions, and their complexity limits their usability to expert users. Here we report CSM-Potential2, a deep learning platform for the analysis of binding interfaces on protein structures. In addition to prediction of protein-protein interaction binding sites and classification of biological ligands, our new platform incorporates prediction of interactions with nucleic acids at the residue level and allows for ligand transplantation based on sequence and structure similarity to experimentally determined structures. We anticipate our platform to be a valuable resource that provides easy access to a range of state-of-the-art methods to expert and non-expert users for the study of biological interactions. Our tool is freely available as an easy-to-use web server and API at https://biosig.lab.uq.edu.au/csm_potential.
Affiliation(s)
- Carlos H. M. Rodrigues
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
  - School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, Queensland, Australia
- David B. Ascher
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
  - School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, Queensland, Australia
|
46
|
Li P, Liu ZP. Structure-Based Prediction of lncRNA-Protein Interactions by Deep Learning. Methods Mol Biol 2025; 2883:363-376. [PMID: 39702717 DOI: 10.1007/978-1-0716-4290-0_16] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/21/2024]
Abstract
The interactions between long noncoding RNA (lncRNA) and protein play crucial roles in various biological processes. Computational methods are essential for predicting lncRNA-protein interactions and deciphering their mechanisms. In this chapter, we aim to introduce the fundamental framework for predicting lncRNA-protein interactions based on three-dimensional structure information. With the increasing availability of lncRNA and protein molecular tertiary structures, the feasibility of using deep learning methods for automatic representation and learning has become evident. This chapter outlines the key steps in predicting lncRNA-protein interactions using deep learning, including three common non-Euclidean data representations for lncRNA and proteins, as well as neural networks tailored to these specific data characteristics. We also highlight the advantages and challenges of structure-based prediction of lncRNA-protein interactions with geometric deep learning methods.
Affiliation(s)
- Pengpai Li
  - Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, Shandong, China
- Zhi-Ping Liu
  - Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, Shandong, China
|
47
|
Zhang K, Yang X, Wang Y, Yu Y, Huang N, Li G, Li X, Wu JC, Yang S. Artificial intelligence in drug development. Nat Med 2025; 31:45-59. [PMID: 39833407 DOI: 10.1038/s41591-024-03434-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/01/2024] [Accepted: 11/25/2024] [Indexed: 01/22/2025]
Abstract
Drug development is a complex and time-consuming endeavor that traditionally relies on the experience of drug developers and trial-and-error experimentation. The advent of artificial intelligence (AI) technologies, particularly emerging large language models and generative AI, is poised to redefine this paradigm. The integration of AI-driven methodologies into the drug development pipeline has already heralded subtle yet meaningful enhancements in both the efficiency and effectiveness of this process. Here we present an overview of recent advancements in AI applications across the entire drug development workflow, encompassing the identification of disease targets, drug discovery, preclinical and clinical studies, and post-market surveillance. Lastly, we critically examine the prevailing challenges to highlight promising future research directions in AI-augmented drug development.
Affiliation(s)
- Kang Zhang
  - Eye Hospital and Institute for Advanced Study on Eye Health and Diseases, Institute for Clinical Data Science, Wenzhou Medical University, Wenzhou, China
  - State Key Laboratory of Macromolecular Drugs and Large-Scale Preparation, Wenzhou Medical University, Wenzhou, China
- Xin Yang
  - Department of Biotherapy, Cancer Center and State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu, China
- Yifei Wang
  - Department of Biotherapy, Cancer Center and State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu, China
- Yunfang Yu
  - Sun Yat-Sen Memorial Hospital, Sun Yat-Sen University, Guangzhou, China
  - Institute for AI in Medicine and Faculty of Medicine, Macau University of Science and Technology, Macau, China
  - Guangzhou National Laboratory, Guangzhou, China
- Niu Huang
  - National Institute of Biological Sciences, Beijing, China
- Gen Li
  - Eye Hospital and Institute for Advanced Study on Eye Health and Diseases, Institute for Clinical Data Science, Wenzhou Medical University, Wenzhou, China
  - Guangzhou National Laboratory, Guangzhou, China
  - Eye and Vision Innovation Center, Eye Valley, Wenzhou, China
- Xiaokun Li
  - State Key Laboratory of Macromolecular Drugs and Large-Scale Preparation, Wenzhou Medical University, Wenzhou, China
- Joseph C Wu
  - Cardiovascular Research Institute, Stanford University, Stanford, CA, USA
- Shengyong Yang
  - Department of Biotherapy, Cancer Center and State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu, China
|
48
|
Charles S, Edgar MP. Geometric Deep learning Prioritization and Validation of Cannabis Phytochemicals as Anti-HCV Non-nucleoside Direct-acting Inhibitors. Biomed Eng Comput Biol 2024; 15:11795972241306881. [PMID: 39678171 PMCID: PMC11638990 DOI: 10.1177/11795972241306881] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2024] [Accepted: 11/27/2024] [Indexed: 12/17/2024] Open
Abstract
Introduction: The rate of acute hepatitis C increased by 7% between 2020 and 2021, after the number of cases doubled between 2014 and 2020. With the current adoption of pan-genotypic HCV therapy, there is a need for improved availability and accessibility of this therapy. However, double and triple DAA-resistant variants have been identified in genotypes 1 and 5 with resistance-associated amino acid substitutions (RAASs) in NS3/4A, NS5A, and NS5B. The aim of this research was to screen for novel potential NS5B inhibitors from the cannabis compound database (CBD) using deep learning. Methods: Virtual screening of the CBD compounds was performed using a trained Graph Neural Network (GNN) deep learning model. Re-docking and conventional docking were used to validate the results for these ligands, since some had more than 10 rotatable bonds. Thirty-one of the top 67 hits from virtual screening and docking were selected after ADMET screening. To confirm their candidacy, 6 randomly chosen hits were subjected to FEP/MD and molecular dynamics simulations. Results: The top 200 compounds from the deep learning virtual screening were selected, and the virtual screening results were validated by re-docking and conventional docking. The ADMET profiles were optimal for 31 hits. Simulated complexes indicate that these hits are likely inhibitors with suitable binding affinities and FEP energies. Phytyl diphosphate and glucaric acid were suggested as possible ligands against NS5B.
Affiliation(s)
- Ssemuyiga Charles
  - PharmaQsar Bioinformatics Firm, Kampala, Uganda
  - Department of Microbiology, Kampala International University, School of Natural and Applied Sciences (SONAS), Kansanga, Kampala, Uganda
- Mulumba Pius Edgar
  - PharmaQsar Bioinformatics Firm, Kampala, Uganda
  - Department of Microbiology, Kampala International University, School of Natural and Applied Sciences (SONAS), Kansanga, Kampala, Uganda
|
49
|
Pacesa M, Nickel L, Schellhaas C, Schmidt J, Pyatova E, Kissling L, Barendse P, Choudhury J, Kapoor S, Alcaraz-Serna A, Cho Y, Ghamary KH, Vinué L, Yachnin BJ, Wollacott AM, Buckley S, Westphal AH, Lindhoud S, Georgeon S, Goverde CA, Hatzopoulos GN, Gönczy P, Muller YD, Schwank G, Swarts DC, Vecchio AJ, Schneider BL, Ovchinnikov S, Correia BE. BindCraft: one-shot design of functional protein binders. bioRxiv: the preprint server for biology 2024:2024.09.30.615802. [PMID: 39677777 PMCID: PMC11642741 DOI: 10.1101/2024.09.30.615802] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 12/17/2024]
Abstract
Protein-protein interactions (PPIs) are at the core of all key biological processes. However, the complexity of the structural features that determine PPIs makes their design challenging. We present BindCraft, an open-source and automated pipeline for de novo protein binder design with experimental success rates of 10-100%. BindCraft leverages the weights of AlphaFold2 to generate binders with nanomolar affinity without the need for high-throughput screening or experimental optimization, even in the absence of known binding sites. We successfully designed binders against a diverse set of challenging targets, including cell-surface receptors, common allergens, de novo designed proteins, and multi-domain nucleases, such as CRISPR-Cas9. We showcase the functional and therapeutic potential of designed binders by reducing IgE binding to birch allergen in patient-derived samples, modulating Cas9 gene editing activity, and reducing the cytotoxicity of a foodborne bacterial enterotoxin. Lastly, we utilize cell surface receptor-specific binders to redirect AAV capsids for targeted gene delivery. This work represents a significant advancement towards a "one design-one binder" approach in computational design, with immense potential in therapeutics, diagnostics, and biotechnology.
|
50
|
Han J, Zhang S, Guan M, Li Q, Gao X, Liu J. GeoNet enables the accurate prediction of protein-ligand binding sites through interpretable geometric deep learning. Structure 2024; 32:2435-2448.e5. [PMID: 39488202 DOI: 10.1016/j.str.2024.10.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2024] [Revised: 09/13/2024] [Accepted: 10/08/2024] [Indexed: 11/04/2024]
Abstract
The identification of protein binding residues is essential for understanding their functions in vivo. However, it remains a computational challenge to accurately identify binding sites due to the lack of known residue binding patterns. Local residue spatial distribution and its interactive biophysical environment both determine binding patterns. Previous methods could not capture both kinds of information simultaneously, resulting in unsatisfactory performance. Here, we present GeoNet, an interpretable geometric deep learning model for predicting DNA, RNA, and protein binding sites by learning the latent residue binding patterns. GeoNet achieves this by introducing a coordinate-free geometric representation to characterize local residue distributions and generating an eigenspace to depict local interactive biophysical environments. Evaluation shows that GeoNet outperforms other leading predictors and that its learned representations are strongly interpretable. We present three test cases in which interaction interfaces were successfully identified with GeoNet.
Affiliation(s)
- Jiyun Han
  - School of Mathematics and Statistics, Shandong University, Weihai 264209, China
- Shizhuo Zhang
  - School of Mathematics and Statistics, Shandong University, Weihai 264209, China
- Mingming Guan
  - School of Mathematics and Statistics, Shandong University, Weihai 264209, China
- Qiuyu Li
  - School of Mathematics and Statistics, Shandong University, Weihai 264209, China
- Xin Gao
  - Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955, Saudi Arabia
  - Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955, Saudi Arabia
- Juntao Liu
  - School of Mathematics and Statistics, Shandong University, Weihai 264209, China
|