1
Xiao J, Hu G, Zhou X, Zheng Y, Li J. TIDGN: A Transfer Learning Framework for Predicting Interactions of Intrinsically Disordered Proteins with High Conformational Dynamics. J Chem Inf Model 2025; 65:4866-4877. [PMID: 40360271 DOI: 10.1021/acs.jcim.5c00422] [Indexed: 05/15/2025]
Abstract
Interactions between intrinsically disordered proteins (IDPs) are crucial for biological processes, such as intracellular liquid-liquid phase separation (LLPS). Experiments (e.g., NMR) and simulations used to study IDP interactions encounter a variety of difficulties, highlighting the necessity to develop relevant machine learning methods. However, reliable machine learning methods face the challenge resulting from the scarcity of available training data. In this work, we propose a transfer learning-based invariant geometric dynamic graph model, named TIDGN, for predicting IDP interactions. The model consists of a pretraining task module and a downstream task module. The pretraining task module learns the dynamic structural encoding of IDP monomers, which is then used by the downstream task module for interaction site prediction. The IDP monomer structure data set and the IDP interaction event data set are constructed using all-atom molecular dynamics (MD) simulations. The transfer learning strategy effectively enhances the model's performance. Both homotypic interactions and heterotypic interactions between two IDPs are considered in this work. Interestingly, TIDGN performs well for the heterotypic interaction prediction. Additionally, the feature ablation analysis emphasizes the importance of invariant geometric graph features. Taken together, our work demonstrates that the integration of transfer learning and the invariant geometric graph network offers a promising approach for addressing data scarcity challenges of IDP interaction prediction.
Affiliation(s)
- Jing Xiao
  - School of Physics, Zhejiang University, Hangzhou 310058, P. R. China
- Guorong Hu
  - School of Physics, Zhejiang University, Hangzhou 310058, P. R. China
- Xiaozhou Zhou
  - School of Physics, Zhejiang University, Hangzhou 310058, P. R. China
- Yuchuan Zheng
  - School of Physics, Zhejiang University, Hangzhou 310058, P. R. China
- Jingyuan Li
  - School of Physics, Zhejiang University, Hangzhou 310058, P. R. China
2
Kurz NS, Kornrumpf K, Tucholski T, Drofenik K, König A, Beißbarth T, Dönitz J. Onkopus: precise interpretation and prioritization of sequence variants for biomedical research and precision medicine. Nucleic Acids Res 2025:gkaf376. [PMID: 40377094 DOI: 10.1093/nar/gkaf376] [Received: 03/08/2025] [Revised: 04/14/2025] [Accepted: 04/25/2025] [Indexed: 05/18/2025] [Open access]
Abstract
One of the major challenges in precision oncology is the identification of pathogenic, actionable variants and the selection of personalized treatments. We present Onkopus, a variant interpretation framework based on a modular architecture, for interpreting and prioritizing genetic alterations in cancer patients. A multitude of tools and databases are integrated into Onkopus to provide a comprehensive overview about the consequences of a variant, each with its own semantic, including pathogenicity predictions, allele frequency, biochemical and protein features, and therapeutic options. We present the characteristics of variants and personalized therapies in a clear and concise form, supported by interactive plots. To support the interpretation of variants of unknown significance (VUS), we present a protein analysis based on protein structures, which allows variants to be analyzed within the context of the entire protein, thereby serving as a starting point for understanding the underlying causes of variant pathogenicity. Onkopus has the potential to significantly enhance variant interpretation and the selection of actionable variants for identifying new targets, drug screens, drug testing using organoids, or personalized treatments in molecular tumor boards. We provide a free public instance of Onkopus at https://mtb.bioinf.med.uni-goettingen.de/onkopus.
Affiliation(s)
- Nadine S Kurz
  - Department of Medical Bioinformatics, University Medical Center Göttingen, 37077 Göttingen, Germany
  - Göttingen Comprehensive Cancer Center (G-CCC), 37075 Göttingen, Germany
- Kevin Kornrumpf
  - Department of Medical Bioinformatics, University Medical Center Göttingen, 37077 Göttingen, Germany
- Tim Tucholski
  - Department of Medical Bioinformatics, University Medical Center Göttingen, 37077 Göttingen, Germany
  - Institute of Pathology, University Medical Center Göttingen, 37075 Göttingen, Germany
- Klara Drofenik
  - Department of Medical Bioinformatics, University Medical Center Göttingen, 37077 Göttingen, Germany
  - Göttingen Comprehensive Cancer Center (G-CCC), 37075 Göttingen, Germany
- Alexander König
  - Department of Gastroenterology, Gastrointestinal Oncology and Endocrinology, University Medical Center Göttingen, 37075 Göttingen, Germany
- Tim Beißbarth
  - Department of Medical Bioinformatics, University Medical Center Göttingen, 37077 Göttingen, Germany
  - Göttingen Comprehensive Cancer Center (G-CCC), 37075 Göttingen, Germany
  - Campus Institute Data Science (CIDAS), Section Medical Data Science (MeDaS), 37077 Göttingen, Germany
- Jürgen Dönitz
  - Department of Medical Bioinformatics, University Medical Center Göttingen, 37077 Göttingen, Germany
  - Göttingen Comprehensive Cancer Center (G-CCC), 37075 Göttingen, Germany
  - Campus Institute Data Science (CIDAS), Section Medical Data Science (MeDaS), 37077 Göttingen, Germany
3
Qiao J, Jin J, Wang D, Teng S, Zhang J, Yang X, Liu Y, Wang Y, Cui L, Zou Q, Su R, Wei L. A self-conformation-aware pre-training framework for molecular property prediction with substructure interpretability. Nat Commun 2025; 16:4382. [PMID: 40355450 PMCID: PMC12069555 DOI: 10.1038/s41467-025-59634-0] [Received: 04/11/2023] [Accepted: 04/25/2025] [Indexed: 05/14/2025] [Open access]
Abstract
The major challenges in drug development stem from frequent structure-activity cliffs and unknown drug properties, which are expensive and time-consuming to estimate, contributing to a high rate of failures and substantial unavoidable costs in the clinical phases. Herein, we propose the self-conformation-aware graph transformer (SCAGE), an innovative deep learning architecture pretrained with approximately 5 million drug-like compounds for molecular property prediction. Notably, we develop a multitask pretraining framework, which incorporates four supervised and unsupervised tasks: molecular fingerprint prediction, functional group prediction using chemical prior information, 2D atomic distance prediction, and 3D bond angle prediction, covering aspects from molecular structures to functions. It enables learning comprehensive conformation-aware prior knowledge, thereby enhancing its generalization across various molecular property tasks. Moreover, we design a data-driven multiscale conformational learning strategy that effectively guides the model in understanding and representing atomic relationships at the molecular conformational scale. SCAGE achieves significant performance improvements across 9 molecular properties and 30 structure-activity cliff benchmarks. Case studies demonstrate that SCAGE accurately captures crucial functional groups at the atomic level, which are closely associated with molecular activity, providing valuable insights into quantitative structure-activity relationships.
Affiliation(s)
- Jianbo Qiao
  - School of Software, Shandong University, Jinan, China
- Junru Jin
  - School of Software, Shandong University, Jinan, China
- Ding Wang
  - School of Software, Shandong University, Jinan, China
- Saisai Teng
  - School of Software, Shandong University, Jinan, China
- Junyu Zhang
  - School of Software, Shandong University, Jinan, China
- Xuetong Yang
  - School of Software, Shandong University, Jinan, China
- Yuhang Liu
  - Faculty of Applied Sciences, Macao Polytechnic University, Macao (SAR), 999078, China
- Yu Wang
  - School of Software, Shandong University, Jinan, China
- Lizhen Cui
  - School of Software, Shandong University, Jinan, China
  - Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, China
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, 610054, China
- Ran Su
  - College of Intelligence and Computing, Tianjin University, Tianjin, 300350, China
- Leyi Wei
  - Faculty of Applied Sciences, Macao Polytechnic University, Macao (SAR), 999078, China
  - Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, China
4
Xia R, Li W, Cheng Y, Xie L, Xu X. Molecular surfaces modeling: Advancements in deep learning for molecular interactions and predictions. Biochem Biophys Res Commun 2025; 763:151799. [PMID: 40239539 DOI: 10.1016/j.bbrc.2025.151799] [Received: 01/29/2025] [Revised: 03/20/2025] [Accepted: 04/10/2025] [Indexed: 04/18/2025]
Abstract
Molecular surface analysis can provide a high-dimensional, rich representation of molecular properties and interactions, which is crucial for enabling powerful predictive modeling and rational molecular design across diverse scientific and technological domains. With remarkable successes achieved by artificial intelligence (AI) in different fields such as computer vision and natural language processing, there is a growing imperative to harness AI's potential in accelerating molecular discovery and innovation. The integration of AI techniques with molecular surface analysis has opened up new frontiers, allowing researchers to uncover hidden patterns, relationships, and design principles that were previously elusive. By leveraging the complementary strengths of molecular surface representations and advanced AI algorithms, scientists can now explore chemical space more efficiently, optimize molecular properties with greater precision, and drive transformative advancements in areas like drug development, materials engineering, and catalysis. In this review, we aim to provide an overview of recent advancements in the field of molecular surface analysis and its integration with AI techniques. These AI-driven approaches have led to significant advancements in various downstream tasks, including interface site prediction, protein-protein interaction prediction, surface-centric molecular generation and design.
Affiliation(s)
- Renjie Xia
  - Institute of Bioinformatics and Medical Engineering, Jiangsu University of Technology, Changzhou, 213001, China
- Wei Li
  - Institute of Bioinformatics and Medical Engineering, Jiangsu University of Technology, Changzhou, 213001, China
- Yi Cheng
  - College of Engineering, Lishui University, Lishui, 323000, China
- Liangxu Xie
  - Institute of Bioinformatics and Medical Engineering, Jiangsu University of Technology, Changzhou, 213001, China
- Xiaojun Xu
  - Institute of Bioinformatics and Medical Engineering, Jiangsu University of Technology, Changzhou, 213001, China
5
Banerjee A, Bogetti AT, Bahar I. Accurate identification and mechanistic evaluation of pathogenic missense variants with Rhapsody-2. Proc Natl Acad Sci U S A 2025; 122:e2418100122. [PMID: 40314982 PMCID: PMC12067267 DOI: 10.1073/pnas.2418100122] [Received: 09/05/2024] [Accepted: 04/06/2025] [Indexed: 05/03/2025] [Open access]
Abstract
Understanding the effects of missense mutations or single amino acid variants (SAVs) on protein function is crucial for elucidating the molecular basis of diseases/disorders and designing rational therapies. We introduce here Rhapsody-2, a machine learning tool for discriminating pathogenic and neutral SAVs, significantly expanding on a precursor limited by the availability of structural data. With the advent of AlphaFold2 as a powerful tool for structure prediction, Rhapsody-2 is trained on a significantly expanded dataset of 117,525 SAVs corresponding to 12,094 human proteins reported in the ClinVar database. Adopting a broad set of descriptors composed of sequence evolutionary, structural, dynamic, and energetics features in the training algorithm, Rhapsody-2 achieved an AUROC of 0.94 in 10-fold cross-validation when all SAVs of a particular test protein (mutant) were excluded from the training set. Benchmarking against a variety of testing datasets demonstrated the high performance of Rhapsody-2. While sequence evolutionary descriptors play a dominant role in pathogenicity prediction, those based on structural dynamics provide a mechanistic interpretation. Notably, residues involved in allosteric communication and those distinguished by pronounced fluctuations in the high-frequency modes of motion or subject to spatial constraints in soft modes usually give rise to pathogenicity when mutated. Overall, Rhapsody-2 provides an efficient and transparent tool for accurately predicting the pathogenicity of SAVs and unraveling the mechanistic basis of the observed behavior, thus advancing our understanding of genotype-to-phenotype relations.
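The protein-level hold-out described in this abstract (every SAV of a test protein is kept out of the training set) can be illustrated with a grouped fold assignment. The following is a minimal sketch under stated assumptions, not the Rhapsody-2 implementation: the function names, the greedy balancing rule, and the toy protein identifiers are all hypothetical.

```python
from collections import defaultdict


def protein_grouped_folds(protein_ids, n_folds=10):
    """Assign variant indices to folds so that all variants of one
    protein land in the same fold (hypothetical helper)."""
    groups = defaultdict(list)
    for idx, pid in enumerate(protein_ids):
        groups[pid].append(idx)

    folds = [[] for _ in range(n_folds)]
    # Greedy balancing (an assumption): place the largest proteins first,
    # each into the fold that currently holds the fewest variants.
    for pid in sorted(groups, key=lambda p: (-len(groups[p]), p)):
        smallest = min(range(n_folds), key=lambda k: len(folds[k]))
        folds[smallest].extend(groups[pid])
    return folds


def train_test_splits(protein_ids, n_folds=10):
    """Yield (train_indices, test_indices) pairs, one per fold."""
    folds = protein_grouped_folds(protein_ids, n_folds)
    all_idx = set(range(len(protein_ids)))
    for test in folds:
        test_set = set(test)
        yield sorted(all_idx - test_set), sorted(test_set)


# Toy data: one protein identifier per variant (made-up labels).
protein_ids = ["P1", "P1", "P2", "P3", "P3", "P3", "P4"]
splits = list(train_test_splits(protein_ids, n_folds=3))

for train, test in splits:
    train_proteins = {protein_ids[i] for i in train}
    test_proteins = {protein_ids[i] for i in test}
    # No protein appears on both sides of a split.
    print(sorted(test_proteins), train_proteins.isdisjoint(test_proteins))
```

Splitting by protein rather than by variant prevents a model from scoring well simply because it has already seen other variants of the same protein during training.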
Affiliation(s)
- Anupam Banerjee
  - Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, NY 11794
  - Department of Biochemistry and Cell Biology, Renaissance School of Medicine, Stony Brook University, Stony Brook, NY 11794
- Anthony T. Bogetti
  - Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, NY 11794
  - Department of Biochemistry and Cell Biology, Renaissance School of Medicine, Stony Brook University, Stony Brook, NY 11794
- Ivet Bahar
  - Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, NY 11794
  - Department of Biochemistry and Cell Biology, Renaissance School of Medicine, Stony Brook University, Stony Brook, NY 11794
6
Maity D, Qiao B. AlloBench: A Data Set Pipeline for the Development and Benchmarking of Allosteric Site Prediction Tools. ACS Omega 2025; 10:17973-17982. [PMID: 40352555 PMCID: PMC12059942 DOI: 10.1021/acsomega.5c01263] [Received: 02/10/2025] [Revised: 04/14/2025] [Accepted: 04/17/2025] [Indexed: 05/14/2025]
Abstract
Allostery refers to the activity regulation of biological macromolecules originating from the binding of an effector molecule at the allosteric site that is distant from the active site. The few existing allosteric data sets have not been updated with recent discoveries of allosteric proteins and are challenging to use for data-intensive tasks. Instead of providing another data set bound to become outdated, we present the AlloBench pipeline to create high-quality data sets of biomolecules with allosteric and active site information suitable for computational and data-driven studies of protein allostery. The pipeline produces a data set of 2141 allosteric sites from 2034 protein structures with 418 unique protein chains by integrating information from AlloSteric Database, UniProt, Mechanism and Catalytic Site Atlas, and Protein Data Bank. Furthermore, we use a subset of 100 proteins from the AlloBench data set to quantitatively compare the performance of currently available allosteric site prediction tools: APOP, PASSer, Ohm, ALLO, Allosite, STRESS, and AlloPred. Such a large-scale benchmarking of these programs has not been undertaken on a common test set. The results show a significant need for improvement, as the accuracy for all programs is well below 60%, with PASSer (Ensemble) outperforming the rest. The AlloBench pipeline will not only promote the development of improved allosteric site prediction tools but also serve as a reference for studying allostery in general.
Affiliation(s)
- Dibyajyoti Maity
  - Department of Natural Sciences, Baruch College, City University of New York, New York, New York 10010, United States
- Baofu Qiao
  - Department of Natural Sciences, Baruch College, City University of New York, New York, New York 10010, United States
7
Anteghini M, Gualdi F, Oliva B. How did we get there? AI applications to biological networks and sequences. Comput Biol Med 2025; 190:110064. [PMID: 40184941 DOI: 10.1016/j.compbiomed.2025.110064] [Received: 10/28/2024] [Revised: 03/18/2025] [Accepted: 03/20/2025] [Indexed: 04/07/2025]
Abstract
The rapidly advancing field of artificial intelligence (AI) has transformed numerous scientific domains, including biology, where a vast and complex volume of data is available for analysis. This paper provides a comprehensive overview of the current state of AI-driven methodologies in genomics, proteomics, and systems biology. We discuss how machine learning algorithms, particularly deep learning models, have enhanced the accuracy and efficiency of embedding sequences, motif discovery, and the prediction of gene expression and protein structure. Additionally, we explore the integration of AI in the embedding and analysis of biological networks, including protein-protein interaction networks and multi-layered networks. By leveraging large-scale biological data, AI techniques have enabled unprecedented insights into complex biological processes and disease mechanisms. This work underlines the potential of applying AI to complex biological data, highlighting current applications and suggesting directions for future research to further explore AI in this rapidly evolving field.
Affiliation(s)
- Marco Anteghini
  - BioFolD Unit, Department of Pharmacy and Biotechnology (FaBiT), University of Bologna, Bologna, Italy
  - Visual and Data-Centric Computing, Zuse Institut Berlin, Berlin, Germany
- Francesco Gualdi
  - Structural Bioinformatics Lab, Universitat Pompeu Fabra, Barcelona, Spain
  - Istituto dalle Molle di Studi sull'Intelligenza Artificiale, USI/SUPSI (Università Svizzera Italiana/Scuola Universitaria Professionale Svizzera Italiana) Lugano, Switzerland
- Baldo Oliva
  - Structural Bioinformatics Lab, Universitat Pompeu Fabra, Barcelona, Spain
8
Chen L, Li Y, Ma Y, Gao L, Yu L. Multiscale graph equivariant diffusion model for 3D molecule design. Science Advances 2025; 11:eadv0778. [PMID: 40238892 DOI: 10.1126/sciadv.adv0778] [Received: 12/04/2024] [Accepted: 03/07/2025] [Indexed: 04/18/2025]
Abstract
Three-dimensional molecular generation is critical in drug design. However, current methods often rely on point clouds or oversimplified interaction models, limiting their ability to accurately represent molecular structures. To address these challenges, this paper proposes the multiscale graph equivariant diffusion model for 3D molecule design (MD3MD). MD3MD partitions molecular conformations into multiscale graphs, assigning different weights to capture atomic interactions across scales. This framework guides the diffusion process, enabling high-quality 3D molecular generation. Experimental results demonstrate that MD3MD excels in both unconditional and conditional generation tasks, producing diverse, stable, and innovative molecules that meet specified conditions. Visualization highlights MD3MD's ability to learn domain-specific patterns and generate molecules distinct from existing datasets while maintaining distributional consistency. By effectively exploring chemical space, MD3MD surpasses previous methods in generating innovative and chemically diverse molecules, offering a notable advancement in the field of molecular design.
Affiliation(s)
- Lu Chen
  - School of Computer Science and Technology, Xidian University, Xi'an 710071, Shaanxi, China
- Yan Li
  - School of Management, Xi'an Polytechnic University, Xi'an 710000, Shaanxi, China
- Yanjie Ma
  - School of Computer Science and Technology, Xidian University, Xi'an 710071, Shaanxi, China
- Lin Gao
  - School of Computer Science and Technology, Xidian University, Xi'an 710071, Shaanxi, China
- Liang Yu
  - School of Computer Science and Technology, Xidian University, Xi'an 710071, Shaanxi, China
9
Tahmid MT, Hasan AKMM, Bayzid MS. TransBind allows precise detection of DNA-binding proteins and residues using language models and deep learning. Commun Biol 2025; 8:568. [PMID: 40185915 PMCID: PMC11971327 DOI: 10.1038/s42003-025-07534-w] [Received: 11/08/2023] [Accepted: 01/13/2025] [Indexed: 04/07/2025] [Open access]
Abstract
Identifying DNA-binding proteins and their binding residues is critical for understanding diverse biological processes, but conventional experimental approaches are slow and costly. Existing machine learning methods, while faster, often lack accuracy and struggle with data imbalance, relying heavily on evolutionary profiles like PSSMs and HMMs derived from multiple sequence alignments (MSAs). These dependencies make them unsuitable for orphan proteins or those that evolve rapidly. To address these challenges, we introduce TransBind, an alignment-free deep learning framework that predicts DNA-binding proteins and residues directly from a single primary sequence, eliminating the need for MSAs. By leveraging features from pre-trained protein language models, TransBind effectively handles the issue of data imbalance and achieves superior performance. Extensive evaluations using diverse experimental datasets and case studies demonstrate that TransBind significantly outperforms state-of-the-art methods in terms of both accuracy and computational efficiency. TransBind is available as a web server at https://trans-bind-web-server-frontend.vercel.app/.
Affiliation(s)
- Md Toki Tahmid
  - Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, 1205, Bangladesh
- A K M Mehedi Hasan
  - Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, 1205, Bangladesh
- Md Shamsuzzoha Bayzid
  - Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, 1205, Bangladesh
10
Shen Y, Jiang Z, Liu R. Dynamic integration of feature- and template-based methods improves the prediction of conformational B cell epitopes. Structure 2025; 33:798-807.e4. [PMID: 39938510 DOI: 10.1016/j.str.2025.01.018] [Received: 10/22/2024] [Revised: 12/10/2024] [Accepted: 01/16/2025] [Indexed: 02/14/2025]
Abstract
The accurate prediction of conformational epitopes promotes our understanding of antigen-antibody interactions. All existing algorithms depend on a feature-based strategy, which limits their performance. A template-based strategy can provide complementary information, and the interplay between these two strategies could improve the prediction of epitopes. Here, we present DynaBCE, a dynamic ensemble algorithm to effectively identify conformational B cell epitopes (BCEs). Using novel handcrafted structural descriptors and embeddings from protein language models, we developed machine learning and deep learning modules based on boosting algorithms and geometric graph neural networks, respectively. Furthermore, we built a template module by leveraging known structural template information and transformer-based algorithms to capture binding signatures. Finally, we integrated the three modules using a dynamic weighting approach to maximize the strength of each module for different samples. DynaBCE achieved promising results for both native and predicted structures and outperformed previous methods as demonstrated in various evaluation scenarios.
Affiliation(s)
- Yueyue Shen
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, P.R. China
- Zheng Jiang
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, P.R. China
- Rong Liu
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, P.R. China
11
Moth CW, Sheehan JH, Mamun AA, Sivley RM, Gulsevin A, Rinker D, Undiagnosed Diseases Network, Capra JA, Meiler J. VUStruct: a compute pipeline for high throughput and personalized structural biology. bioRxiv 2025:2024.08.06.606224. [PMID: 39149406 PMCID: PMC11326201 DOI: 10.1101/2024.08.06.606224] [Indexed: 08/17/2024]
Abstract
Effective diagnosis and treatment of rare genetic disorders requires the interpretation of a patient's genetic variants of unknown significance (VUSs). Today, clinical decision-making is primarily guided by gene-phenotype association databases and DNA-based scoring methods. Our web-accessible variant analysis pipeline, VUStruct, supplements these established approaches by deeply analyzing the downstream molecular impact of variation in context of 3D protein structure. VUStruct's growing impact is fueled by the co-proliferation of protein 3D structural models, gene sequencing, compute power, and artificial intelligence. Contextualizing VUSs in protein 3D structural models also illuminates longitudinal genomics studies and biochemical bench research focused on VUS, and we created VUStruct for clinicians and researchers alike. We now introduce VUStruct to the broad scientific community as a mature, web-facing, extensible, High-Performance Computing (HPC) software pipeline. VUStruct maps missense variants onto automatically selected protein structures and launches a broad range of analyses. These include energy-based assessments of protein folding and stability, pathogenicity prediction through spatial clustering analysis, and machine learning (ML) predictors of binding surface disruptions and nearby post-translational modification sites. The pipeline also considers the entire input set of VUS and identifies genes potentially involved in digenic disease. VUStruct's utility in clinical rare disease genome interpretation has been demonstrated through its analysis of over 175 Undiagnosed Disease Network (UDN) Patient cases. VUStruct-leveraged hypotheses have often informed clinicians in their consideration of additional patient testing, and we report here details from two cases where VUStruct was key to their solution. 
We also note successes with academic research collaborators, for whom VUStruct has informed research directions in both computational genomics and wet lab studies.
Affiliation(s)
- Christopher W. Moth
  - Departments of Chemistry, Pharmacology, and Biomedical Informatics; Center for Structural Biology and Institute of Chemical Biology; Vanderbilt Univ., Nashville, TN 37232, USA
- Jonathan H. Sheehan
  - Division of Infectious Diseases, Milliken Dept. of Internal Medicine, Washington Univ. of Medicine in St. Louis, MO 63110, USA
- Abdullah Al Mamun
  - Departments of Chemistry, Pharmacology, and Biomedical Informatics; Center for Structural Biology and Institute of Chemical Biology; Vanderbilt Univ., Nashville, TN 37232, USA
- Alican Gulsevin
  - Department of Pharmaceutical Sciences, College of Pharmacy and Health Sciences, Butler University, Indianapolis, IN 46208, USA
- David Rinker
  - Department of Biological Sciences, Evolutionary Studies Initiative; Vanderbilt Univ., Nashville, TN 37232, USA
- John A. Capra
  - Bakar Computational Health Science Institute and Department of Epidemiology and Biostatistics, Univ. of California San Francisco, CA 94143, USA
- Jens Meiler
  - Departments of Chemistry, Pharmacology, and Biomedical Informatics; Center for Structural Biology and Institute of Chemical Biology; Vanderbilt Univ., Nashville, TN 37232, USA
  - Leipzig University Medical School, Institute for Drug Discovery, Brüderstraße 34, 04103 Leipzig, Germany
12
Zhang C, Sun Y, Hu P. An interpretable deep geometric learning model to predict the effects of mutations on protein-protein interactions using large-scale protein language model. J Cheminform 2025; 17:35. [PMID: 40119464 PMCID: PMC11927297 DOI: 10.1186/s13321-025-00979-5] [Received: 03/31/2024] [Accepted: 02/27/2025] [Indexed: 03/24/2025] [Open access]
Abstract
Protein-protein interactions (PPIs) are central to the mechanisms of signaling pathways and immune responses, which can help us understand disease etiology. Therefore, there is a significant need for efficient and rapid automated approaches to predict changes in PPIs. In recent years, there has been a significant increase in applying deep learning techniques to predict changes in binding affinity between the original protein complex and its mutant variants. Particularly, the adoption of graph neural networks (GNNs) has gained prominence for their ability to learn representations of protein-protein complexes. However, conventional GNNs have mainly concentrated on capturing local features, often disregarding the interactions among distant elements that hold potentially important information. In this study, we have developed a transformer-based graph neural network to extract features of the mutant segment from the three-dimensional structure of protein-protein complexes. By embracing both local and global features, the approach ensures a more comprehensive understanding of the intricate relationships, thus promising more accurate predictions of binding affinity changes. To enhance the representation capability of protein features, we incorporate a large-scale pre-trained protein language model into our approach and employ the global protein feature it provides. The proposed model is shown to be able to predict the mutation changes in binding affinity with a root mean square error of 1.10 and a Pearson correlation coefficient of nearly 0.71, as demonstrated by performance on test and validation cases. Our experiments on all five datasets, including both single mutant and multiple mutant cases, demonstrate that our model outperforms four state-of-the-art baseline methods, and the efficacy was subjected to comprehensive experimental evaluation.
Our study introduces a transformer-based graph neural network approach to accurately predict changes in protein-protein interactions (PPIs). By integrating local and global features and leveraging pretrained protein language models, our model outperforms state-of-the-art methods across diverse datasets. The results of this study can provide new views for studying immune responses and disease etiology related to protein mutations. Furthermore, this approach may contribute to other biological or biochemical studies related to PPIs.
Scientific contribution
Our scientific contribution lies in the development of a novel transformer-based graph neural network tailored to predict changes in protein-protein interactions (PPIs) with excellent accuracy. By seamlessly integrating both local and global features extracted from the three-dimensional structure of protein-protein complexes, and leveraging the rich representations provided by pretrained protein language models, our approach surpasses existing methods across diverse datasets. Our findings may offer novel insights for the understanding of complex disease etiology associated with protein mutations. The novel tool can be applicable to various biological and biochemical investigations involving protein mutations.
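The two metrics quoted in this abstract, root mean square error and the Pearson correlation coefficient, can be illustrated with a minimal sketch. The function names and the sample binding affinity changes below are hypothetical and are not taken from the study; only the metric definitions are standard.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between measured and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(y_true)
    if n != len(y_pred) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    if var_t == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_t * var_p)


# Hypothetical measured and predicted binding affinity changes (kcal/mol).
measured = [0.5, 1.2, -0.3, 2.1, 0.0]
predicted = [0.7, 0.9, 0.1, 1.6, 0.4]
print(f"RMSE = {rmse(measured, predicted):.2f}")
print(f"Pearson r = {pearson_r(measured, predicted):.2f}")
```

The two numbers answer different questions: RMSE measures the typical size of the prediction error in the units of the target, while Pearson r measures only how well the predictions track the ranking and linear trend of the measurements.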
Affiliation(s)
- Caiya Zhang
  - Department of Computer Science, Western University, London, ON, Canada
- Yan Sun
  - Department of Computer Science, Western University, London, ON, Canada
  - Department of Computer Science, University of Manitoba, Winnipeg, MB, Canada
  - Department of Biochemistry, Western University, London, ON, Canada
- Pingzhao Hu
  - Department of Computer Science, Western University, London, ON, Canada
  - Department of Computer Science, University of Manitoba, Winnipeg, MB, Canada
  - Department of Biochemistry, Western University, London, ON, Canada
  - Department of Oncology, Western University, London, ON, Canada
  - Department of Epidemiology and Biostatistics, Western University, London, ON, Canada
  - The Children's Health Research Institute, Lawson Health Research Institute, London, ON, Canada
13
Meng Z, Li J, Wang H, Cao Z, Lu W, Niu X, Yang Y, Li Z, Wang Y, Lu S. NLRP4 unlocks an NK/macrophages-centered ecosystem to suppress non-small cell lung cancer. Biomark Res 2025; 13:44. [PMID: 40087771 PMCID: PMC11909883 DOI: 10.1186/s40364-025-00756-4] [Received: 10/04/2024] [Accepted: 03/03/2025] [Indexed: 03/17/2025] [Open access]
Abstract
BACKGROUND Tumor immune evasion extends beyond T cells, affecting innate immune elements like natural killer (NK) cells and macrophages within the tumor-immune microenvironment (TIME). Nevertheless, translational strategies to trigger collaboration of NK cells and macrophages to initiate sufficient anti-tumor cytotoxicity remain scarce and are urgently needed.
METHODS In this study, TCGA datasets were used to confirm the prognostic value of the expression level of NLR family pyrin domain containing 4 (NLRP4) in NSCLC, and a tumor tissue microarray was used to further check its clinical relevance at the protein level. Subsequently, a tumor cell line with stable NLRP4 overexpression was established and subcutaneous tumor models in C57BL/6J mice were used to validate the anti-tumor characteristics of NLRP4. After analyzing the tumor microenvironment using flow cytometry and multiplex immunofluorescence, we further validated our findings through co-culture transwell assays and TCGA analysis. Utilizing bulk-RNA sequencing, proteomics, and mass spectrometry of mouse tumor tissues, we innovatively identified the downstream pathways of NLRP4 and verified them through co-immunoprecipitation (co-IP) and Western blot (WB) experiments.
RESULTS NLRP4 could trigger a distinct anti-tumor ecosystem organized by TIGIT+TNFA+ NK and iNOS+ M1 in lung cancer, discovered in TCGA analysis and verified in a murine model. NLRP4-eco exerted tumor-suppression capacity through chemokine reprogramming including CCL5 and CXCL2. Meanwhile, the cytotoxicity of NK could be facilitated by iNOS+ M1. Mechanistically, NLRP4 stimulated the PI3K/Akt-NF-κB axis through suppression of the activity of PP2A. Besides, knockdown of CCL5 and blockade of the CXCL2-CXCR2 axis abolished chemotaxis of TIGIT+TNFA+ NK and iNOS+ M1, respectively; the same was observed for LB-100, a PP2A inhibitor.
CONCLUSION Altogether, we delineated NLRP4's unexplored facets and discovered an NLRP4-driven anti-tumor ecosystem composed of TIGIT+TNFA+ NK and iNOS+ M1. Finally, targeting PP2A by its inhibitor successfully mimicked the anti-tumor capacity of the overexpression of NLRP4.
Affiliation(s)
- Zhouwenli Meng
  - Shanghai Lung Cancer Center, Shanghai Chest Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, 200030, P. R. China
- Jian Li
  - Shanghai Lung Cancer Center, Shanghai Chest Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, 200030, P. R. China
- Hui Wang
  - Shanghai Lung Cancer Center, Shanghai Chest Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, 200030, P. R. China
- Zhengqi Cao
  - Shanghai Lung Cancer Center, Shanghai Chest Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, 200030, P. R. China
- Wenqing Lu
  - Shanghai Lung Cancer Center, Shanghai Chest Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, 200030, P. R. China
- Xiaomin Niu
  - Shanghai Lung Cancer Center, Shanghai Chest Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, 200030, P. R. China
- Yi Yang
  - Shanghai Lung Cancer Center, Shanghai Chest Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, 200030, P. R. China
- Ziming Li
  - Shanghai Lung Cancer Center, Shanghai Chest Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, 200030, P. R. China
- Ying Wang
  - Shanghai Institute of Immunology, Department of Immunology and Microbiology, Key Laboratory of Cell Differentiation and Apoptosis of Chinese Ministry of Education, Shanghai Jiao Tong University School of Medicine, Shanghai, 200025, P. R. China
- Shun Lu
  - Shanghai Lung Cancer Center, Shanghai Chest Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, 200030, P. R. China
14
Iqbal Z, Asim M, Khan UA, Sultan N, Ali I. Computational electrostatic engineering of nanobodies for enhanced SARS-CoV-2 receptor binding domain recognition. Front Mol Biosci 2025; 12:1512788. [PMID: 40129869 PMCID: PMC11931142 DOI: 10.3389/fmolb.2025.1512788] [Received: 10/17/2024] [Accepted: 02/11/2025] [Indexed: 03/26/2025] [Open access]
Abstract
This study presents a novel computational approach for engineering nanobodies (Nbs) for improved interaction with the receptor-binding domain (RBD) of the SARS-CoV-2 spike protein. Using Protein Structure Reliability reports, RBD (7VYR_R) was selected and refined for subsequent Nb-RBD interactions. By leveraging electrostatic complementarity (EC) analysis, we engineered and characterized five Electrostatically Complementary Nbs (ECSb1-ECSb5) based on the CeVICA library's SR6c3 Nb. Through targeted modifications in the complementarity-determining regions (CDR) and framework regions (FR), we optimized electrostatic interactions to improve binding affinity and specificity. The engineered Nbs (ECSb3, ECSb4, and ECSb5) demonstrated high binding specificity for AS3, CA1, and CA2 epitopes. Interestingly, ECSb1 and ECSb2 selectively engaged with AS3 and CA1 instead of AS1 and AS2, respectively, due to a preference for residues that conferred superior binding complementarities. Furthermore, ECSbs significantly outperformed SR6c3 Nb in MM/GBSA results; notably, ECSb4 and ECSb3 exhibited superior binding free energies of -182.58 kcal·mol⁻¹ and -119.07 kcal·mol⁻¹, respectively, compared to SR6c3 (-105.50 kcal·mol⁻¹). ECSbs exhibited significantly higher thermostability (100.4-148.3 kcal·mol⁻¹) compared to SR6c3 (62.6 kcal·mol⁻¹). Similarly, enhanced electrostatic complementarity was also observed for ECSb4-RBD and ECSb3-RBD (0.305 and 0.390, respectively) relative to SR6c3-RBD (0.233). Surface analyses confirmed optimized electrostatic patches and reduced aggregation propensity in the engineered Nbs. This integrated EC and structural engineering approach successfully developed engineered Nbs with enhanced binding specificity, increased thermostability, and reduced aggregation, laying the groundwork for novel therapeutic applications targeting the SARS-CoV-2 spike protein.
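The size of the reported improvements is easier to see as differences from the parent nanobody SR6c3. The short sketch below only restates the numbers quoted in this abstract; the helper name `difference_from_parent` is hypothetical, and a more negative MM/GBSA value is read here as a more favorable predicted binding free energy.

```python
# MM/GBSA binding free energies (kcal/mol) as quoted in the abstract.
binding_free_energy = {"SR6c3": -105.50, "ECSb3": -119.07, "ECSb4": -182.58}

# Electrostatic complementarity scores of the Nb-RBD complexes, as quoted.
ec_score = {"SR6c3": 0.233, "ECSb3": 0.390, "ECSb4": 0.305}


def difference_from_parent(values, parent="SR6c3"):
    """Return each engineered value minus the parent value, rounded to 3 decimals."""
    return {
        name: round(value - values[parent], 3)
        for name, value in values.items()
        if name != parent
    }


for label, table in (("binding free energy", binding_free_energy),
                     ("EC score", ec_score)):
    for name, delta in difference_from_parent(table).items():
        print(f"{name} vs SR6c3, {label}: {delta:+.3f}")
```

On these figures ECSb4 shows the largest change in binding free energy, whereas ECSb3 shows the largest change in electrostatic complementarity, so the two measures do not rank the variants identically.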
Affiliation(s)
- Zafar Iqbal
  - Central Laboratories, King Faisal University, Al Hofuf, Saudi Arabia
- Muhammad Asim
  - Centre of Agricultural Biochemistry and Biotechnology (CABB), University of Agriculture, Faisalabad, Pakistan
- Umair Ahmad Khan
  - Medical and Allied Department, Faisalabad Medical University, Faisalabad, Pakistan
- Neelam Sultan
  - Department of Biochemistry, Government College University Faisalabad, Faisalabad, Pakistan
- Irfan Ali
  - Centre of Agricultural Biochemistry and Biotechnology (CABB), University of Agriculture, Faisalabad, Pakistan
15
Banerjee A, Bogetti A, Bahar I. Accurate Identification and Mechanistic Evaluation of Pathogenic Missense Variants with Rhapsody-2. bioRxiv 2025:2025.02.17.638727. [PMID: 40027614 PMCID: PMC11870481 DOI: 10.1101/2025.02.17.638727] [Indexed: 03/05/2025]
Abstract
Understanding the effects of missense mutations or single amino acid variants (SAVs) on protein function is crucial for elucidating the molecular basis of diseases/disorders and designing rational therapies. We introduce here Rhapsody-2, a machine learning tool for discriminating pathogenic and neutral SAVs, significantly expanding on a precursor limited by the availability of structural data. With the advent of AlphaFold2 as a powerful tool for structure prediction, Rhapsody-2 is trained on a significantly expanded dataset of 117,525 SAVs corresponding to 12,094 human proteins reported in the ClinVar database. Adopting a broad set of descriptors composed of sequence evolutionary, structural, dynamic, and energetics features in the training algorithm, Rhapsody-2 achieved an AUROC of 0.94 in 10-fold cross-validation when all SAVs of a particular test protein (mutant) were excluded from the training set. Benchmarking against a variety of testing datasets demonstrated the high performance of Rhapsody-2. While sequence evolutionary descriptors play a dominant role in pathogenicity prediction, those based on structural dynamics provide a mechanistic interpretation. Notably, residues involved in allosteric communication, and those distinguished by pronounced fluctuations in the high frequency modes of motion or subject to spatial constraints in soft modes usually give rise to pathogenicity when mutated. Overall, Rhapsody-2 provides an efficient and transparent tool for accurately predicting the pathogenicity of SAVs and unraveling the mechanistic basis of the observed behavior, thus advancing our understanding of genotype-to-phenotype relations.
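The evaluation protocol mentioned in this abstract, cross-validation in which all SAVs of a test protein are excluded from training, amounts to splitting by protein rather than by variant. The sketch below shows one way such a grouped split could be built; the function name, the greedy balancing rule, and the example identifiers are assumptions for illustration and do not describe the Rhapsody-2 code.

```python
from collections import defaultdict


def protein_grouped_folds(protein_ids, n_folds=10):
    """Yield (train_idx, test_idx) pairs in which no protein spans both sets."""
    by_protein = defaultdict(list)
    for idx, pid in enumerate(protein_ids):
        by_protein[pid].append(idx)
    if n_folds < 2 or n_folds > len(by_protein):
        raise ValueError("n_folds must be between 2 and the number of proteins")

    # Greedy balancing: place the largest proteins first, each into the
    # fold that currently holds the fewest variants.
    folds = [[] for _ in range(n_folds)]
    for pid in sorted(by_protein, key=lambda p: (-len(by_protein[p]), p)):
        smallest = min(range(n_folds), key=lambda k: len(folds[k]))
        folds[smallest].extend(by_protein[pid])

    for k in range(n_folds):
        test_idx = sorted(folds[k])
        train_idx = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        yield train_idx, test_idx


# Hypothetical variant table: one protein identifier per variant.
variants = ["P1", "P1", "P2", "P3", "P3", "P3", "P4"]
for train_idx, test_idx in protein_grouped_folds(variants, n_folds=2):
    print("train:", train_idx, "test:", test_idx)
```

Splitting by protein guards against an optimistic estimate that can arise when variants of the same protein, which share most of their features, appear on both sides of a split.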
Affiliation(s)
- Anupam Banerjee
  - Laufer Center for Physical and Quantitative Biology, Stony Brook University, New York 11794, USA
- Anthony Bogetti
  - Laufer Center for Physical and Quantitative Biology, Stony Brook University, New York 11794, USA
- Ivet Bahar
  - Laufer Center for Physical and Quantitative Biology, Stony Brook University, New York 11794, USA
  - Department of Biochemistry and Cell Biology, Renaissance School of Medicine, Stony Brook University, New York 11794, USA
16
Cao D, Chen M, Zhang R, Wang Z, Huang M, Yu J, Jiang X, Fan Z, Zhang W, Zhou H, Li X, Fu Z, Zhang S, Zheng M. SurfDock is a surface-informed diffusion generative model for reliable and accurate protein-ligand complex prediction. Nat Methods 2025; 22:310-322. [PMID: 39604569 DOI: 10.1038/s41592-024-02516-y] [Received: 01/19/2024] [Accepted: 10/16/2024] [Indexed: 11/29/2024]
Abstract
Accurately predicting protein-ligand interactions is crucial for understanding cellular processes. We introduce SurfDock, a deep-learning method that addresses this challenge by integrating protein sequence, three-dimensional structural graphs and surface-level features into an equivariant architecture. SurfDock employs a generative diffusion model on a non-Euclidean manifold, optimizing molecular translations, rotations and torsions to generate reliable binding poses. Our extensive evaluations across various benchmarks demonstrate SurfDock's superiority over existing methods in docking success rates and adherence to physical constraints. It also exhibits remarkable generalizability to unseen proteins and predicted apo structures, while achieving state-of-the-art performance in virtual screening tasks. In a real-world application, SurfDock identified seven novel hit molecules in a virtual screening project targeting aldehyde dehydrogenase 1B1, a key enzyme in cellular metabolism. This showcases SurfDock's ability to elucidate molecular mechanisms underlying cellular processes. These results highlight SurfDock's potential as a transformative tool in structural biology, offering enhanced accuracy, physical plausibility and practical applicability in understanding protein-ligand interactions.
Affiliation(s)
- Duanhua Cao
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- Mingan Chen
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
  - School of Physical Science and Technology, ShanghaiTech University, Shanghai, China
  - Lingang Laboratory, Shanghai, China
- Runze Zhang
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
  - University of Chinese Academy of Sciences, Beijing, China
- Zhaokun Wang
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
  - University of Chinese Academy of Sciences, Beijing, China
- Manlin Huang
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
  - Nanchang University, Nanchang, China
- Jie Yu
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
  - Lingang Laboratory, Shanghai, China
  - School of Information Science and Technology, ShanghaiTech University, Shanghai, China
- Xinyu Jiang
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
  - University of Chinese Academy of Sciences, Beijing, China
- Zhehuan Fan
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
  - University of Chinese Academy of Sciences, Beijing, China
- Wei Zhang
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
  - University of Chinese Academy of Sciences, Beijing, China
- Hao Zhou
  - Institute for AI Industry Research (AIR), Tsinghua University, Beijing, China
- Xutong Li
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- Zunyun Fu
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- Sulin Zhang
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
  - University of Chinese Academy of Sciences, Beijing, China
- Mingyue Zheng
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
  - University of Chinese Academy of Sciences, Beijing, China
17
Yang Z, Helmann T, Baudin M, Schreiber KJ, Bao Z, Stodghill P, Deutschbauer A, Lewis JD, Swingle B. Genome-wide identification of novel flagellar motility genes in Pseudomonas syringae pv. tomato DC3000. Front Microbiol 2025; 16:1535114. [PMID: 39935648 PMCID: PMC11813219 DOI: 10.3389/fmicb.2025.1535114] [Received: 11/27/2024] [Accepted: 01/06/2025] [Indexed: 02/13/2025] [Open access]
Abstract
Pseudomonas syringae pv. tomato DC3000 (Pst DC3000) is a plant pathogenic bacterium that possesses complicated motility regulation pathways including a typical chemotaxis system. A significant portion of our understanding about the genes functioning in Pst DC3000 motility is based on comparison to other bacteria. This leaves uncertainty about whether gene functions are conserved, especially since specific regulatory modules can have opposite functions in sets of Pseudomonas. In this study, we used a competitive selection to enrich for mutants with altered swimming motility and used random barcode transposon-site sequencing (RB-TnSeq) to identify genes with significant roles in swimming motility. Besides many of the known or predicted chemotaxis and motility genes, our method identified PSPTO_0406 (dipA), PSPTO_1042 (chrR) and PSPTO_4229 (hypothetical protein) as novel motility regulators. PSPTO_0406 is a homolog of dipA, a known cyclic di-GMP degrading enzyme in P. aeruginosa. PSPTO_1042 is part of an extracytoplasmic sensing system that controls gene expression in response to reactive oxygen species, suggesting that PSPTO_1042 may function as part of a mechanism that enables Pst DC3000 to alter motility when encountering oxidative stressors. PSPTO_4229 encodes a protein containing an HD-related output domain (HDOD), but with no previously identified functions. We found that deletion and overexpression of PSPTO_4229 both reduce swimming motility, suggesting that its function is sensitive to expression level. We used the overexpression phenotype to screen for nonsense and missense mutants of PSPTO_4229 that no longer reduce swimming motility and found a pair of conserved arginine residues that are necessary for motility suppression. Together these results provide a global perspective on regulatory and structural genes controlling flagellar motility in Pst DC3000.
Affiliation(s)
- Zichu Yang
  - Plant Pathology and Plant-Microbe Biology Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, United States
- Tyler Helmann
  - Emerging Pests and Pathogens Research Unit, Robert W. Holley Center, United States Department of Agriculture-Agricultural Research Service, Ithaca, NY, United States
- Maël Baudin
  - Department of Plant and Microbial Biology, University of California, Berkeley, Berkeley, CA, United States
  - Plant Gene Expression Center, United States Department of Agriculture-Agricultural Research Service, Berkeley, CA, United States
  - Institut Agro, INRAE, IRHS, SFR QUASAV, Université Angers, Angers, France
- Karl J. Schreiber
  - Department of Plant and Microbial Biology, University of California, Berkeley, Berkeley, CA, United States
  - Plant Gene Expression Center, United States Department of Agriculture-Agricultural Research Service, Berkeley, CA, United States
- Zhongmeng Bao
  - Emerging Pests and Pathogens Research Unit, Robert W. Holley Center, United States Department of Agriculture-Agricultural Research Service, Ithaca, NY, United States
- Paul Stodghill
  - Plant Pathology and Plant-Microbe Biology Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, United States
  - Emerging Pests and Pathogens Research Unit, Robert W. Holley Center, United States Department of Agriculture-Agricultural Research Service, Ithaca, NY, United States
- Adam Deutschbauer
  - Department of Plant and Microbial Biology, University of California, Berkeley, Berkeley, CA, United States
  - Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, United States
- Jennifer D. Lewis
  - Department of Plant and Microbial Biology, University of California, Berkeley, Berkeley, CA, United States
  - Plant Gene Expression Center, United States Department of Agriculture-Agricultural Research Service, Berkeley, CA, United States
- Bryan Swingle
  - Plant Pathology and Plant-Microbe Biology Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, United States
  - Emerging Pests and Pathogens Research Unit, Robert W. Holley Center, United States Department of Agriculture-Agricultural Research Service, Ithaca, NY, United States
18
Pitarch B, Pazos F. Deep Learning Approaches for the Prediction of Protein Functional Sites. Molecules 2025; 30:214. [PMID: 39860084 PMCID: PMC11767512 DOI: 10.3390/molecules30020214] [Received: 11/03/2024] [Revised: 12/20/2024] [Accepted: 01/01/2025] [Indexed: 01/27/2025] [Open access]
Abstract
Knowing which residues of a protein are important for its function is of paramount importance for understanding the molecular basis of this function and devising ways of modifying it for medical or biotechnological applications. Due to the difficulty in detecting these residues experimentally, prediction methods are essential to cope with the sequence deluge that is filling databases with uncharacterized protein sequences. Deep learning approaches are especially well suited for this task due to the large amounts of protein sequences for training them, the trivial codification of this sequence data to feed into these systems, and the intrinsic sequential nature of the data that makes them suitable for language models. As a consequence, deep learning-based approaches are being applied to the prediction of different types of functional sites and regions in proteins. This review aims to give an overview of the current landscape of methodologies so that interested users can have an idea of which kind of approaches are available for their proteins of interest. We also try to give an idea of how these systems work, as well as explain their limitations and high dependence on the training set so that users are aware of the quality of expected results.
Affiliation(s)
- Florencio Pazos
  - Computational Systems Biology Group, National Center for Biotechnology (CNB-CSIC), 28049 Madrid, Spain
19
Li P, Liu ZP. Structure-Based Prediction of lncRNA-Protein Interactions by Deep Learning. Methods Mol Biol 2025; 2883:363-376. [PMID: 39702717 DOI: 10.1007/978-1-0716-4290-0_16] [Indexed: 12/21/2024]
Abstract
The interactions between long noncoding RNA (lncRNA) and protein play crucial roles in various biological processes. Computational methods are essential for predicting lncRNA-protein interactions and deciphering their mechanisms. In this chapter, we aim to introduce the fundamental framework for predicting lncRNA-protein interactions based on three-dimensional structure information. With the increasing availability of lncRNA and protein molecular tertiary structures, the feasibility of using deep learning methods for automatic representation and learning has become evident. This chapter outlines the key steps in predicting lncRNA-protein interactions using deep learning, including three common non-Euclidean data representations for lncRNA and proteins, as well as neural networks tailored to these specific data characteristics. We also highlight the advantages and challenges of structure-based prediction of lncRNA-protein interactions with geometric deep learning methods.
Affiliation(s)
- Pengpai Li
  - Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, Shandong, China
- Zhi-Ping Liu
  - Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, Shandong, China
20
Nada H, Choi Y, Kim S, Jeong KS, Meanwell NA, Lee K. New insights into protein-protein interaction modulators in drug discovery and therapeutic advance. Signal Transduct Target Ther 2024; 9:341. [PMID: 39638817 PMCID: PMC11621763 DOI: 10.1038/s41392-024-02036-3] [Received: 04/11/2024] [Revised: 09/09/2024] [Accepted: 10/23/2024] [Indexed: 12/07/2024] [Open access]
Abstract
Protein-protein interactions (PPIs) are fundamental to cellular signaling and transduction, which marks them as attractive therapeutic drug development targets. Targets once considered undruggable have become increasingly feasible due to the progress made over the last two decades and rapid technological advances. This work explores the influence of technological innovations on PPI research and development. Additionally, the diverse strategies for discovering, modulating, and characterizing PPIs and their corresponding modulators are examined with the aim of presenting a streamlined pipeline for advancing PPI-targeted therapeutics. By showcasing carefully selected case studies in PPI modulator discovery and development, we aim to illustrate the efficacy of various strategies for identifying, optimizing, and overcoming challenges associated with PPI modulator design. The valuable lessons and insights gained from the identification, optimization, and approval of PPI modulators are discussed with the aim of demonstrating that PPI modulators have transitioned beyond early-stage drug discovery and now represent a prime opportunity with significant potential. The selected examples of PPI modulators encompass those developed for cancer, inflammation and immunomodulation, as well as antiviral applications. This perspective aims to establish a foundation for the effective targeting and modulation of PPIs using PPI modulators and pave the way for future drug development.
Affiliation(s)
- Hossam Nada
  - BK21 FOUR Team and Integrated Research Institute for Drug Development, College of Pharmacy, Dongguk University-Seoul, Goyang, Republic of Korea
  - Department of Radiology, Molecular Imaging Innovations Institute (MI3), Weill Cornell Medicine, New York, USA
- Yongseok Choi
  - College of Life Sciences and Biotechnology, Korea University, Seoul, Republic of Korea
- Sungdo Kim
  - BK21 FOUR Team and Integrated Research Institute for Drug Development, College of Pharmacy, Dongguk University-Seoul, Goyang, Republic of Korea
- Kwon Su Jeong
  - BK21 FOUR Team and Integrated Research Institute for Drug Development, College of Pharmacy, Dongguk University-Seoul, Goyang, Republic of Korea
- Nicholas A Meanwell
  - Baruch S. Blumberg Institute, Doylestown, PA, USA
  - School of Pharmacy, University of Michigan, Ann Arbor, MI, USA
  - Ernest Mario School of Pharmacy, Rutgers University New Brunswick, New Brunswick, NJ, USA
- Kyeong Lee
  - BK21 FOUR Team and Integrated Research Institute for Drug Development, College of Pharmacy, Dongguk University-Seoul, Goyang, Republic of Korea
21
Michalewicz K, Barahona M, Bravi B. ANTIPASTI: Interpretable prediction of antibody binding affinity exploiting normal modes and deep learning. Structure 2024; 32:2422-2434.e5. [PMID: 39461331 DOI: 10.1016/j.str.2024.10.001] [Received: 12/22/2023] [Revised: 05/30/2024] [Accepted: 10/01/2024] [Indexed: 10/29/2024]
Abstract
The high binding affinity of antibodies toward their cognate targets is key to eliciting effective immune responses, as well as to the use of antibodies as research and therapeutic tools. Here, we propose ANTIPASTI, a convolutional neural network model that achieves state-of-the-art performance in the prediction of antibody binding affinity using as input a representation of antibody-antigen structures in terms of normal mode correlation maps derived from elastic network models. This representation captures not only structural features but also energetic patterns of local and global residue fluctuations. The learnt representations are interpretable: they reveal similarities of binding patterns among antibodies targeting the same antigen type, and can be used to quantify the importance of antibody regions contributing to binding affinity. Our results show the importance of the antigen imprint in the normal mode landscape, and the dominance of cooperative effects and long-range correlations between antibody regions in determining binding affinity.
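To make the idea of a normal mode correlation map concrete, the sketch below builds one from a Gaussian network model (GNM), one of the simplest elastic network models. This is an illustrative simplification chosen for brevity and is not claimed to be the model or preprocessing used in ANTIPASTI; the function name, the 7.0 Å cutoff, and the toy coordinates are assumptions, and the contact network is assumed to be connected.

```python
import numpy as np


def gnm_correlation_map(coords, cutoff=7.0):
    """Residue-residue cross-correlations from a Gaussian network model.

    coords: (N, 3) array of residue positions (for example C-alpha atoms).
    cutoff: contact distance in the same units as coords (assumed value).
    """
    coords = np.asarray(coords, dtype=float)
    n = len(coords)

    # Pairwise distances and the contact (adjacency) matrix without self-contacts.
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    contacts = (dists <= cutoff) & ~np.eye(n, dtype=bool)

    # Kirchhoff (graph Laplacian) matrix: degree on the diagonal, -1 per contact.
    kirchhoff = np.diag(contacts.sum(axis=1)) - contacts.astype(float)

    # Fluctuation covariances are proportional to the pseudo-inverse, which
    # sums the contributions of all non-zero normal modes.
    covariance = np.linalg.pinv(kirchhoff)

    # Normalise to correlations in [-1, 1].
    scale = np.sqrt(np.diag(covariance))
    return covariance / np.outer(scale, scale)


# Toy example: six residues on a straight line, 3.8 units apart.
toy_coords = [[3.8 * i, 0.0, 0.0] for i in range(6)]
correlation = gnm_correlation_map(toy_coords)
print(np.round(correlation, 2))
```

The resulting N x N matrix can be treated like a single-channel image, which is the kind of input a convolutional network can consume; in this toy chain, neighbouring residues come out positively correlated and the two ends negatively correlated.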
Affiliation(s)
- Kevin Michalewicz
  - Department of Mathematics, Imperial College London, London SW7 2AZ, UK
- Mauricio Barahona
  - Department of Mathematics, Imperial College London, London SW7 2AZ, UK
- Barbara Bravi
  - Department of Mathematics, Imperial College London, London SW7 2AZ, UK
22
Han J, Zhang S, Guan M, Li Q, Gao X, Liu J. GeoNet enables the accurate prediction of protein-ligand binding sites through interpretable geometric deep learning. Structure 2024; 32:2435-2448.e5. [PMID: 39488202 DOI: 10.1016/j.str.2024.10.011] [Received: 05/17/2024] [Revised: 09/13/2024] [Accepted: 10/08/2024] [Indexed: 11/04/2024]
Abstract
The identification of protein binding residues is essential for understanding their functions in vivo. However, it remains a computational challenge to accurately identify binding sites due to the lack of known residue binding patterns. Local residue spatial distribution and its interactive biophysical environment both determine binding patterns. Previous methods could not capture both kinds of information simultaneously, resulting in unsatisfactory performance. Here, we present GeoNet, an interpretable geometric deep learning model for predicting DNA, RNA, and protein binding sites by learning the latent residue binding patterns. GeoNet achieves this by introducing a coordinate-free geometric representation to characterize local residue distributions and generating an eigenspace to depict local interactive biophysical environments. Evaluation shows that GeoNet outperforms other leading predictors and that its learned representations are highly interpretable. We present three test cases in which interaction interfaces were successfully identified with GeoNet.
Affiliation(s)
- Jiyun Han
  - School of Mathematics and Statistics, Shandong University, Weihai 264209, China
- Shizhuo Zhang
  - School of Mathematics and Statistics, Shandong University, Weihai 264209, China
- Mingming Guan
  - School of Mathematics and Statistics, Shandong University, Weihai 264209, China
- Qiuyu Li
  - School of Mathematics and Statistics, Shandong University, Weihai 264209, China
- Xin Gao
  - Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955, Saudi Arabia
  - Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955, Saudi Arabia
- Juntao Liu
  - School of Mathematics and Statistics, Shandong University, Weihai 264209, China
|
23
|
Cecil AJ, Sogues A, Gurumurthi M, Lane KS, Remaut H, Pak AJ. Molecular dynamics and machine learning stratify motion-dependent activity profiles of S-layer destabilizing nanobodies. PNAS Nexus 2024; 3:pgae538. [PMID: 39660065 PMCID: PMC11631148 DOI: 10.1093/pnasnexus/pgae538] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2024] [Accepted: 11/04/2024] [Indexed: 12/12/2024]
Abstract
Nanobody (Nb)-induced disassembly of surface array protein (Sap) S-layers, a two-dimensional paracrystalline protein lattice from Bacillus anthracis, has been presented as a therapeutic intervention for lethal anthrax infections. However, only a subset of existing Nbs with affinity to Sap exhibit depolymerization activity, suggesting that affinity and epitope recognition are not enough to explain inhibitory activity. In this study, we performed all-atom molecular dynamics simulations of each Nb bound to the Sap binding site and trained a collection of machine learning classifiers to predict whether each Nb induces depolymerization. We used feature importance analysis to filter out unnecessary features and engineered remaining features to regularize the feature landscape and encourage learning of the depolymerization mechanism. We find that, while not enforced in training, a gradient-boosting decision tree is able to reproduce the experimental activities of inhibitory Nbs while maintaining high classification accuracy, whereas neural networks were only able to discriminate between classes. Further feature analysis revealed that inhibitory Nbs restrain Sap motions toward an inhibitory conformational state described by domain-domain clamping and induced twisting of domains normal to the lattice plane. We believe these motions drive Sap lattice depolymerization and can be used as design targets for improved Sap-inhibitory Nbs. Finally, we expect our method of study to apply to S-layers that serve as virulence factors in other pathogens, paving the way forward for Nb therapeutics that target depolymerization mechanisms.
Affiliation(s)
- Adam J Cecil
  - Department of Chemical and Biological Engineering, Colorado School of Mines, Golden, CO 80401, USA
- Adrià Sogues
  - Structural and Molecular Microbiology, VIB-VUB Center for Structural Biology, Pleinlaan 2, 1050 Brussels, Belgium
  - Structural Biology, Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussels, Belgium
- Mukund Gurumurthi
  - Quantitative Biosciences and Engineering Program, Colorado School of Mines, Golden, CO 80401, USA
- Kaylee S Lane
  - Computer Science and Software Engineering, Rose-Hulman Institute of Technology, Terre Haute, IN 47803, USA
- Han Remaut
  - Structural and Molecular Microbiology, VIB-VUB Center for Structural Biology, Pleinlaan 2, 1050 Brussels, Belgium
  - Structural Biology, Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussels, Belgium
- Alexander J Pak
  - Department of Chemical and Biological Engineering, Colorado School of Mines, Golden, CO 80401, USA
  - Quantitative Biosciences and Engineering Program, Colorado School of Mines, Golden, CO 80401, USA
  - Materials Science Program, Colorado School of Mines, Golden, CO 80401, USA
|
24
|
Lange SM, McFarland MR, Lamoliatte F, Carroll T, Krshnan L, Pérez-Ràfols A, Kwasna D, Shen L, Wallace I, Cole I, Armstrong LA, Knebel A, Johnson C, De Cesare V, Kulathu Y. VCP/p97-associated proteins are binders and debranching enzymes of K48-K63-branched ubiquitin chains. Nat Struct Mol Biol 2024; 31:1872-1887. [PMID: 38977901 PMCID: PMC11638074 DOI: 10.1038/s41594-024-01354-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/25/2023] [Accepted: 06/13/2024] [Indexed: 07/10/2024]
Abstract
Branched ubiquitin (Ub) chains constitute a sizable fraction of Ub polymers in human cells. Despite their abundance, our understanding of branched Ub function in cell signaling has been stunted by the absence of accessible methods and tools. Here we identify cellular branched-chain-specific binding proteins and devise approaches to probe K48-K63-branched Ub function. We establish a method to monitor cleavage of linkages within complex Ub chains and unveil ATXN3 and MINDY as debranching enzymes. We engineer a K48-K63 branch-specific nanobody and reveal the molecular basis of its specificity in crystal structures of nanobody-branched Ub chain complexes. Using this nanobody, we detect increased K48-K63-Ub branching following valosin-containing protein (VCP)/p97 inhibition and after DNA damage. Together with our discovery that multiple VCP/p97-associated proteins bind to or debranch K48-K63-linked Ub, these results suggest a function for K48-K63-branched chains in VCP/p97-related processes.
Affiliation(s)
- Sven M Lange
  - MRC Protein Phosphorylation and Ubiquitylation Unit, University of Dundee, Dundee, UK
  - Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA, USA
- Matthew R McFarland
  - MRC Protein Phosphorylation and Ubiquitylation Unit, University of Dundee, Dundee, UK
- Frederic Lamoliatte
  - MRC Protein Phosphorylation and Ubiquitylation Unit, University of Dundee, Dundee, UK
- Thomas Carroll
  - MRC Protein Phosphorylation and Ubiquitylation Unit, University of Dundee, Dundee, UK
- Logesvaran Krshnan
  - MRC Protein Phosphorylation and Ubiquitylation Unit, University of Dundee, Dundee, UK
- Anna Pérez-Ràfols
  - MRC Protein Phosphorylation and Ubiquitylation Unit, University of Dundee, Dundee, UK
- Dominika Kwasna
  - MRC Protein Phosphorylation and Ubiquitylation Unit, University of Dundee, Dundee, UK
  - Malopolska Centre of Biotechnology (MCB), Jagiellonian University, Krakow, Poland
- Linnan Shen
  - MRC Protein Phosphorylation and Ubiquitylation Unit, University of Dundee, Dundee, UK
- Iona Wallace
  - MRC Protein Phosphorylation and Ubiquitylation Unit, University of Dundee, Dundee, UK
- Isobel Cole
  - MRC Protein Phosphorylation and Ubiquitylation Unit, University of Dundee, Dundee, UK
- Lee A Armstrong
  - MRC Protein Phosphorylation and Ubiquitylation Unit, University of Dundee, Dundee, UK
- Axel Knebel
  - MRC Protein Phosphorylation and Ubiquitylation Unit, University of Dundee, Dundee, UK
- Clare Johnson
  - MRC Protein Phosphorylation and Ubiquitylation Unit, University of Dundee, Dundee, UK
- Virginia De Cesare
  - MRC Protein Phosphorylation and Ubiquitylation Unit, University of Dundee, Dundee, UK
- Yogesh Kulathu
  - MRC Protein Phosphorylation and Ubiquitylation Unit, University of Dundee, Dundee, UK
|
25
|
Wang X, Gao X, Fan X, Huai Z, Zhang G, Yao M, Wang T, Huang X, Lai L. WUREN: Whole-modal union representation for epitope prediction. Comput Struct Biotechnol J 2024; 23:2122-2131. [PMID: 38817963 PMCID: PMC11137340 DOI: 10.1016/j.csbj.2024.05.023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2024] [Revised: 05/14/2024] [Accepted: 05/14/2024] [Indexed: 06/01/2024] Open
Abstract
B-cell epitope identification plays a vital role in the development of vaccines, therapies, and diagnostic tools. Currently, molecular docking tools in B-cell epitope prediction are heavily influenced by empirical parameters and require significant computational resources, making it a great challenge to meet large-scale prediction demands. When predicting epitopes from antigen-antibody complexes, current artificial intelligence algorithms cannot make accurate predictions due to insufficient protein feature representations, indicating that a novel algorithm is urgently needed for efficient protein information extraction. In this paper, we introduce a multimodal model called WUREN (Whole-modal Union Representation for Epitope predictioN), which effectively combines sequence, graph, and structural features. It achieved AUC-PR scores of 0.213 and 0.193 on the solved structures and AlphaFold-generated structures, respectively, for the independent test proteins selected from the DiscoTope3 benchmark. Our findings indicate that WUREN is an efficient feature extraction model for protein complexes, with generalizable application potential in the development of protein-based drugs. Moreover, the streamlined framework of WUREN could be readily extended to model similar biomolecules, such as nucleic acids, carbohydrates, and lipids.
Affiliation(s)
- Xuezhe Fan
  - XtalPi Innovation Center, Beijing, China
- Zhe Huai
  - XtalPi Innovation Center, Beijing, China
- Lipeng Lai
  - XtalPi Innovation Center, Beijing, China
|
26
|
Madsen AV, Mejias-Gomez O, Pedersen LE, Preben Morth J, Kristensen P, Jenkins TP, Goletz S. Structural trends in antibody-antigen binding interfaces: a computational analysis of 1833 experimentally determined 3D structures. Comput Struct Biotechnol J 2024; 23:199-211. [PMID: 38161735 PMCID: PMC10755492 DOI: 10.1016/j.csbj.2023.11.056] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/08/2023] [Revised: 11/27/2023] [Accepted: 11/28/2023] [Indexed: 01/03/2024] Open
Abstract
Antibodies are attractive therapeutic candidates due to their ability to bind cognate antigens with high affinity and specificity. Still, the underlying molecular rules governing the antibody-antigen interface remain poorly understood, making in silico antibody design inherently difficult and keeping the discovery and design of novel antibodies a costly and laborious process. This study investigates the characteristics of antibody-antigen binding interfaces through a computational analysis of more than 850,000 atom-atom contacts from the largest reported set of antibody-antigen complexes with 1833 nonredundant, experimentally determined structures. The analysis compares binding characteristics of conventional antibodies and single-domain antibodies (sdAbs) targeting both protein and peptide antigens. We find clear patterns in the number of antibody-antigen contacts and amino acid frequencies in the paratope. The direct comparison of sdAbs and conventional antibodies helps elucidate the mechanisms employed by sdAbs to compensate for their smaller size and the fact that they harbor only half the number of complementarity-determining regions compared to conventional antibodies. Furthermore, we pinpoint antibody interface hotspot residues that are often found at the binding interface and the amino acid frequencies at these positions. These findings have direct potential applications in antibody engineering and the design of improved antibody libraries.
Affiliation(s)
- Andreas V. Madsen
  - Department of Biotechnology and Biomedicine, Technical University of Denmark, Kgs. Lyngby, Denmark
- Oscar Mejias-Gomez
  - Department of Biotechnology and Biomedicine, Technical University of Denmark, Kgs. Lyngby, Denmark
- Lasse E. Pedersen
  - Department of Biotechnology and Biomedicine, Technical University of Denmark, Kgs. Lyngby, Denmark
- J. Preben Morth
  - Department of Biotechnology and Biomedicine, Technical University of Denmark, Kgs. Lyngby, Denmark
- Peter Kristensen
  - Department of Chemistry and Bioscience, Aalborg University, Aalborg, Denmark
- Timothy P. Jenkins
  - Department of Biotechnology and Biomedicine, Technical University of Denmark, Kgs. Lyngby, Denmark
- Steffen Goletz
  - Department of Biotechnology and Biomedicine, Technical University of Denmark, Kgs. Lyngby, Denmark
|
27
|
Blaabjerg LM, Jonsson N, Boomsma W, Stein A, Lindorff-Larsen K. SSEmb: A joint embedding of protein sequence and structure enables robust variant effect predictions. Nat Commun 2024; 15:9646. [PMID: 39511177 PMCID: PMC11544099 DOI: 10.1038/s41467-024-53982-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/21/2024] [Accepted: 10/28/2024] [Indexed: 11/15/2024] Open
Abstract
The ability to predict how amino acid changes affect proteins has a wide range of applications including in disease variant classification and protein engineering. Many existing methods focus on learning from patterns found in either protein sequences or protein structures. Here, we present a method for integrating information from sequence and structure in a single model that we term SSEmb (Sequence Structure Embedding). SSEmb combines a graph representation for the protein structure with a transformer model for processing multiple sequence alignments. We show that by integrating both types of information we obtain a variant effect prediction model that is robust when sequence information is scarce. We also show that SSEmb learns embeddings of the sequence and structure that are useful for other downstream tasks such as to predict protein-protein binding sites. We envisage that SSEmb may be useful both for variant effect predictions and as a representation for learning to predict protein properties that depend on sequence and structure.
Affiliation(s)
- Lasse M Blaabjerg
  - Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen N, Denmark
- Nicolas Jonsson
  - Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen N, Denmark
- Wouter Boomsma
  - Center for Basic Machine Learning Research in Life Science, Department of Computer Science, University of Copenhagen, Copenhagen N, Denmark
- Amelie Stein
  - Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen N, Denmark
- Kresten Lindorff-Larsen
  - Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen N, Denmark
|
28
|
Wang J, Liu Y, Tian B. Protein-small molecule binding site prediction based on a pre-trained protein language model with contrastive learning. J Cheminform 2024; 16:125. [PMID: 39506806 PMCID: PMC11542454 DOI: 10.1186/s13321-024-00920-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/04/2024] [Accepted: 10/20/2024] [Indexed: 11/08/2024] Open
Abstract
Predicting protein-small molecule binding sites, the initial step in structure-guided drug design, remains challenging for proteins lacking experimentally derived ligand-bound structures. Here, we propose CLAPE-SMB, which integrates a pre-trained protein language model with contrastive learning to provide high accuracy predictions of small molecule binding sites that can accommodate proteins without a published crystal structure. We trained and tested CLAPE-SMB on the SJC dataset, a non-redundant dataset based on sc-PDB, JOINED, and COACH420, and achieved an MCC of 0.529. We also compiled the UniProtSMB dataset, which merges sites from similar proteins based on raw data from UniProtKB database, and achieved an MCC of 0.699 on the test set. In addition, CLAPE-SMB achieved an MCC of 0.815 on our intrinsically disordered protein (IDP) dataset that contains 336 non-redundant sequences. Case studies of DAPK1, RebH, and Nep1 support the potential of this binding site prediction tool to aid in drug design. The code and datasets are freely available at https://github.com/JueWangTHU/CLAPE-SMB . SCIENTIFIC CONTRIBUTION: CLAPE-SMB combines a pre-trained protein language model with contrastive learning to accurately predict protein-small molecule binding sites, especially for proteins without experimental structures, such as IDPs. Trained across various datasets, this model shows strong adaptability, making it a valuable tool for advancing drug design and understanding protein-small molecule interactions.
Affiliation(s)
- Jue Wang
  - MOE Key Laboratory of Bioinformatics, State Key Laboratory of Molecular Oncology, Beijing Frontier Research Center for Biological Structure, School of Pharmaceutical Sciences, Tsinghua University, Beijing, 100084, China
- Yufan Liu
  - Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Kingdom of Saudi Arabia
- Boxue Tian
  - MOE Key Laboratory of Bioinformatics, State Key Laboratory of Molecular Oncology, Beijing Frontier Research Center for Biological Structure, School of Pharmaceutical Sciences, Tsinghua University, Beijing, 100084, China
|
29
|
Pihlajamäki A, Matus MF, Malola S, Häkkinen H. GraphBNC: Machine Learning-Aided Prediction of Interactions Between Metal Nanoclusters and Blood Proteins. Advanced Materials (Deerfield Beach, Fla.) 2024; 36:e2407046. [PMID: 39318073 PMCID: PMC11586822 DOI: 10.1002/adma.202407046] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2024] [Revised: 09/13/2024] [Indexed: 09/26/2024]
Abstract
Hybrid nanostructures between biomolecules and inorganic nanomaterials constitute a largely unexplored field of research, with the potential for novel applications in bioimaging, biosensing, and nanomedicine. Developing such applications relies critically on understanding the dynamical properties of the nano-bio interface. This work introduces and validates a strategy to predict atom-scale interactions between water-soluble gold nanoclusters (AuNCs) and a set of blood proteins (albumin, apolipoprotein, immunoglobulin, and fibrinogen). Graph theory and neural networks are utilized to predict the strengths of interactions in AuNC-protein complexes on a coarse-grained level, which are then optimized in Monte Carlo-based structure search and refined to atomic-scale structures. The training data is based on extensive molecular dynamics (MD) simulations of AuNC-protein complexes, and the validating MD simulations show the robustness of the predictions. This strategy can be generalized to any complexes of inorganic nanostructures and biomolecules provided that one generates enough data about the interactions, and the bioactive parts of the nanostructure can be coarse-grained rationally.
Affiliation(s)
- Antti Pihlajamäki
  - Department of Physics, Nanoscience Center, University of Jyväskylä, Jyväskylä FI-40014, Finland
- María Francisca Matus
  - Department of Physics, Nanoscience Center, University of Jyväskylä, Jyväskylä FI-40014, Finland
- Sami Malola
  - Department of Physics, Nanoscience Center, University of Jyväskylä, Jyväskylä FI-40014, Finland
- Hannu Häkkinen
  - Department of Physics, Nanoscience Center, University of Jyväskylä, Jyväskylä FI-40014, Finland
  - Department of Chemistry, Nanoscience Center, University of Jyväskylä, Jyväskylä FI-40014, Finland
|
30
|
Zhong J, Zhao H, Zhao Q, Zhou R, Zhang L, Guo F, Wang J. RGCNPPIS: A Residual Graph Convolutional Network for Protein-Protein Interaction Site Prediction. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2024; 21:1676-1684. [PMID: 38843057 DOI: 10.1109/tcbb.2024.3410350] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2024]
Abstract
Accurate identification of protein-protein interaction (PPI) sites is crucial for understanding the mechanisms of biological processes, developing PPI networks, and detecting protein functions. Currently, most computational methods primarily concentrate on sequence context features and rarely consider the spatial neighborhood features. To address this limitation, we propose a novel residual graph convolutional network for structure-based PPI site prediction (RGCNPPIS). Specifically, we use a GCN module to extract the global structural features from all spatial neighborhoods, and utilize the GraphSage module to extract local structural features from local spatial neighborhoods. To the best of our knowledge, this is the first work utilizing local structural features for PPI site prediction. We also propose an enhanced residual graph connection to combine the initial node representation, local structural features, and the previous GCN layer's node representation, which enables information transfer between layers and alleviates the over-smoothing problem. Evaluation results demonstrate that RGCNPPIS outperforms state-of-the-art methods on three independent test sets. In addition, the results of ablation experiments and case studies confirm that RGCNPPIS is an effective tool for PPI site prediction.
|
31
|
Xiong D, Qiu Y, Zhao J, Zhou Y, Lee D, Gupta S, Torres M, Lu W, Liang S, Kang JJ, Eng C, Loscalzo J, Cheng F, Yu H. A structurally informed human protein-protein interactome reveals proteome-wide perturbations caused by disease mutations. Nat Biotechnol 2024:10.1038/s41587-024-02428-4. [PMID: 39448882 DOI: 10.1038/s41587-024-02428-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2024] [Accepted: 09/11/2024] [Indexed: 10/26/2024]
Abstract
To assist the translation of genetic findings to disease pathobiology and therapeutics discovery, we present an ensemble deep learning framework, termed PIONEER (Protein-protein InteractiOn iNtErfacE pRediction), that predicts protein-binding partner-specific interfaces for all known protein interactions in humans and seven other common model organisms to generate comprehensive structurally informed protein interactomes. We demonstrate that PIONEER outperforms existing state-of-the-art methods and experimentally validate its predictions. We show that disease-associated mutations are enriched in PIONEER-predicted protein-protein interfaces and explore their impact on disease prognosis and drug responses. We identify 586 significant protein-protein interactions (PPIs) enriched with PIONEER-predicted interface somatic mutations (termed oncoPPIs) from analysis of approximately 11,000 whole exomes across 33 cancer types and show significant associations of oncoPPIs with patient survival and drug responses. PIONEER, implemented as both a web server platform and a software package, identifies functional consequences of disease-associated alleles and offers a deep learning tool for precision medicine at multiscale interactome network levels.
Grants
- R01GM124559 U.S. Department of Health & Human Services | NIH | National Institute of General Medical Sciences (NIGMS)
- R01GM125639 U.S. Department of Health & Human Services | NIH | National Institute of General Medical Sciences (NIGMS)
- R01GM130885 U.S. Department of Health & Human Services | NIH | National Institute of General Medical Sciences (NIGMS)
- RM1GM139738 U.S. Department of Health & Human Services | NIH | National Institute of General Medical Sciences (NIGMS)
- R01DK115398 U.S. Department of Health & Human Services | NIH | National Institute of Diabetes and Digestive and Kidney Diseases (National Institute of Diabetes & Digestive & Kidney Diseases)
- U01HG007691 U.S. Department of Health & Human Services | NIH | National Human Genome Research Institute (NHGRI)
- R01HL155107 U.S. Department of Health & Human Services | NIH | National Heart, Lung, and Blood Institute (NHLBI)
- R01HL155096 U.S. Department of Health & Human Services | NIH | National Heart, Lung, and Blood Institute (NHLBI)
- R01HL166137 U.S. Department of Health & Human Services | NIH | National Heart, Lung, and Blood Institute (NHLBI)
- U54HL119145 U.S. Department of Health & Human Services | NIH | National Heart, Lung, and Blood Institute (NHLBI)
- AHA957729 American Heart Association (American Heart Association, Inc.)
- 24MERIT1185447 American Heart Association (American Heart Association, Inc.)
- R01AG084250 U.S. Department of Health & Human Services | NIH | National Institute on Aging (U.S. National Institute on Aging)
- R56AG074001 U.S. Department of Health & Human Services | NIH | National Institute on Aging (U.S. National Institute on Aging)
- U01AG073323 U.S. Department of Health & Human Services | NIH | National Institute on Aging (U.S. National Institute on Aging)
- R01AG066707 U.S. Department of Health & Human Services | NIH | National Institute on Aging (U.S. National Institute on Aging)
- R01AG076448 U.S. Department of Health & Human Services | NIH | National Institute on Aging (U.S. National Institute on Aging)
- R01AG082118 U.S. Department of Health & Human Services | NIH | National Institute on Aging (U.S. National Institute on Aging)
- RF1AG082211 U.S. Department of Health & Human Services | NIH | National Institute on Aging (U.S. National Institute on Aging)
- R21AG083003 U.S. Department of Health & Human Services | NIH | National Institute on Aging (U.S. National Institute on Aging)
- RF1NS133812 U.S. Department of Health & Human Services | NIH | National Institute of Neurological Disorders and Stroke (NINDS)
Affiliation(s)
- Dapeng Xiong
  - Department of Computational Biology, Cornell University, Ithaca, NY, USA
  - Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY, USA
  - Center for Innovative Proteomics, Cornell University, Ithaca, NY, USA
- Yunguang Qiu
  - Cleveland Clinic Genome Center, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, USA
  - Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, USA
- Junfei Zhao
  - Department of Systems Biology, Herbert Irving Comprehensive Center, Columbia University, New York, NY, USA
- Yadi Zhou
  - Cleveland Clinic Genome Center, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, USA
  - Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, USA
- Dongjin Lee
  - Department of Computational Biology, Cornell University, Ithaca, NY, USA
  - Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY, USA
- Shobhita Gupta
  - Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY, USA
  - Center for Innovative Proteomics, Cornell University, Ithaca, NY, USA
  - Biophysics Program, Cornell University, Ithaca, NY, USA
- Mateo Torres
  - Department of Computational Biology, Cornell University, Ithaca, NY, USA
  - Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY, USA
  - Center for Innovative Proteomics, Cornell University, Ithaca, NY, USA
- Weiqiang Lu
  - Shanghai Key Laboratory of Regulatory Biology, Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, China
- Siqi Liang
  - Department of Computational Biology, Cornell University, Ithaca, NY, USA
  - Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY, USA
- Jin Joo Kang
  - Department of Computational Biology, Cornell University, Ithaca, NY, USA
  - Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY, USA
  - Center for Innovative Proteomics, Cornell University, Ithaca, NY, USA
- Charis Eng
  - Cleveland Clinic Genome Center, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, USA
  - Department of Molecular Medicine, Cleveland Clinic Lerner College of Medicine, Case Western Reserve University, Cleveland, OH, USA
  - Case Comprehensive Cancer Center, Case Western Reserve University School of Medicine, Cleveland, OH, USA
- Joseph Loscalzo
  - Channing Division of Network Medicine, Division of Cardiovascular Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Feixiong Cheng
  - Cleveland Clinic Genome Center, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, USA
  - Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, USA
  - Department of Molecular Medicine, Cleveland Clinic Lerner College of Medicine, Case Western Reserve University, Cleveland, OH, USA
  - Case Comprehensive Cancer Center, Case Western Reserve University School of Medicine, Cleveland, OH, USA
- Haiyuan Yu
  - Department of Computational Biology, Cornell University, Ithaca, NY, USA
  - Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY, USA
  - Center for Innovative Proteomics, Cornell University, Ithaca, NY, USA
|
32
|
An W, Li T, Tian X, Fu X, Li C, Wang Z, Wang J, Wang X. Allergies to Allergens from Cats and Dogs: A Review and Update on Sources, Pathogenesis, and Strategies. Int J Mol Sci 2024; 25:10520. [PMID: 39408849 PMCID: PMC11476515 DOI: 10.3390/ijms251910520] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/20/2024] [Revised: 09/26/2024] [Accepted: 09/27/2024] [Indexed: 10/20/2024] Open
Abstract
Inhalation allergies caused by cats and dogs can lead to a range of discomforting symptoms, such as rhinitis and asthma, in humans. With the increasing popularity of and care provided to these companion animals, the allergens they produce pose a growing threat to susceptible patients' health. Allergens from cats and dogs have emerged as significant risk factors for triggering asthma and allergic rhinitis worldwide; however, there remains a lack of systematic measures aimed at assisting individuals in recognizing and preventing allergies caused by these animals. This review provides comprehensive insights into the classification of cat and dog allergens, along with their pathogenic mechanisms. This study also discusses implementation strategies for prevention and control measures, including physical methods, gene-editing technology, and immunological approaches, as well as potential strategies for enhancing allergen immunotherapy combined with immunoinformatics. Finally, it presents future prospects for the prevention and treatment of human allergies caused by cats and dogs. This review will improve knowledge regarding allergies to cats and dogs while providing insights into potential targets for the development of next-generation treatments.
Collapse
Affiliation(s)
- Wei An
- Institute of Feed Research, Chinese Academy of Agricultural Sciences, Beijing 100081, China; (W.A.); (X.T.); (X.F.); (C.L.); (Z.W.)
- Key Laboratory of Feed Biotechnology, Ministry of Agriculture and Rural Affairs, Beijing 100081, China
| | - Ting Li
- State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Biotechnology, No. 20, Dongda Street, Beijing 100071, China;
| | - Xinya Tian
- Institute of Feed Research, Chinese Academy of Agricultural Sciences, Beijing 100081, China; (W.A.); (X.T.); (X.F.); (C.L.); (Z.W.)
- Key Laboratory of Feed Biotechnology, Ministry of Agriculture and Rural Affairs, Beijing 100081, China
| | - Xiaoxin Fu
- Institute of Feed Research, Chinese Academy of Agricultural Sciences, Beijing 100081, China; (W.A.); (X.T.); (X.F.); (C.L.); (Z.W.)
- Key Laboratory of Feed Biotechnology, Ministry of Agriculture and Rural Affairs, Beijing 100081, China
| | - Chunxiao Li
- Institute of Feed Research, Chinese Academy of Agricultural Sciences, Beijing 100081, China; (W.A.); (X.T.); (X.F.); (C.L.); (Z.W.)
- Key Laboratory of Feed Biotechnology, Ministry of Agriculture and Rural Affairs, Beijing 100081, China
| | - Zhenlong Wang
- Institute of Feed Research, Chinese Academy of Agricultural Sciences, Beijing 100081, China; (W.A.); (X.T.); (X.F.); (C.L.); (Z.W.)
- Key Laboratory of Feed Biotechnology, Ministry of Agriculture and Rural Affairs, Beijing 100081, China
| | - Jinquan Wang
- Institute of Feed Research, Chinese Academy of Agricultural Sciences, Beijing 100081, China; (W.A.); (X.T.); (X.F.); (C.L.); (Z.W.)
- Key Laboratory of Feed Biotechnology, Ministry of Agriculture and Rural Affairs, Beijing 100081, China
| | - Xiumin Wang
- Institute of Feed Research, Chinese Academy of Agricultural Sciences, Beijing 100081, China; (W.A.); (X.T.); (X.F.); (C.L.); (Z.W.)
- Key Laboratory of Feed Biotechnology, Ministry of Agriculture and Rural Affairs, Beijing 100081, China
| |
|
33
|
Li Y, Nan X, Zhang S, Zhou Q, Lu S, Tian Z. PMSFF: Improved Protein Binding Residues Prediction through Multi-Scale Sequence-Based Feature Fusion Strategy. Biomolecules 2024; 14:1220. [PMID: 39456153 PMCID: PMC11506650 DOI: 10.3390/biom14101220] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2024] [Revised: 09/22/2024] [Accepted: 09/24/2024] [Indexed: 10/28/2024] Open
Abstract
Proteins perform different biological functions by binding various molecules, and these interactions are mediated by a few key residues. Accurate prediction of such protein binding residues (PBRs) is crucial for understanding cellular processes and for designing new drugs. Many computational prediction approaches have been proposed to identify PBRs with sequence-based features. However, these approaches face two main challenges: (1) they only concatenate residue feature vectors with a simple sliding window strategy, and (2) it is challenging to find a uniform sliding window size suitable for learning embeddings across different types of PBRs. In this study, we propose a novel framework for Predicting multiple types of PBRs through a Multi-scale Sequence-based Feature Fusion (PMSFF) strategy. First, PMSFF employs a pre-trained language model, ProtT5, to encode amino acid residues in protein sequences. Then, it generates multi-scale residue embeddings by applying multi-size windows to capture effective neighboring residues and multi-size kernels to learn information across different scales. Additionally, the proposed model treats protein sequences as sentences, employing a bidirectional GRU to learn global context. We also collect benchmark datasets encompassing various PBR types and evaluate PMSFF on these datasets. Compared with state-of-the-art methods, PMSFF demonstrates superior performance on most PBR prediction tasks.
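The multi-size window idea in this abstract can be illustrated with a small sketch. It is not the PMSFF architecture: toy one-dimensional vectors stand in for ProtT5 embeddings, plain mean-pooling stands in for the learned kernels, and the names `window_mean` and `multi_scale_features` are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def window_mean(embeddings: Sequence[Vector], window: int) -> List[Vector]:
    """Average each residue's vector over a centred window (zero-padded at the ends)."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    n, dim = len(embeddings), len(embeddings[0])
    pooled: List[Vector] = []
    for i in range(n):
        acc = [0.0] * dim
        for j in range(i - half, i + half + 1):
            if 0 <= j < n:  # positions outside the sequence count as zeros
                for k in range(dim):
                    acc[k] += embeddings[j][k]
        pooled.append([v / window for v in acc])
    return pooled


def multi_scale_features(
    embeddings: Sequence[Vector], windows: Sequence[int] = (1, 3, 5)
) -> List[Vector]:
    """Concatenate the pooled vectors from every window size, residue by residue."""
    per_scale = [window_mean(embeddings, w) for w in windows]
    return [
        [value for scale in per_scale for value in scale[i]]
        for i in range(len(embeddings))
    ]


# Toy input: three residues with one-dimensional embeddings (assumed values).
toy = [[1.0], [2.0], [3.0]]
features = multi_scale_features(toy, windows=(1, 3))
```

Each residue ends up with one feature slice per window size, so a downstream classifier sees several neighbourhood sizes at once rather than a single fixed window.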
Affiliation(s)
- Yuguang Li
- School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, China; (Y.L.); (X.N.); (Q.Z.)
| | - Xiaofei Nan
- School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, China; (Y.L.); (X.N.); (Q.Z.)
| | - Shoutao Zhang
- School of Life Sciences, Zhengzhou University, Zhengzhou 450001, China;
- Longhu Laboratory of Advanced Immunology, Zhengzhou 450001, China
| | - Qinglei Zhou
- School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, China; (Y.L.); (X.N.); (Q.Z.)
| | - Shuai Lu
- School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, China; (Y.L.); (X.N.); (Q.Z.)
- National Supercomputing Center in Zhengzhou, Zhengzhou University, Zhengzhou 450001, China
| | - Zhen Tian
- School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, China; (Y.L.); (X.N.); (Q.Z.)
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324003, China
| |
|
34
|
Rismani E, Mafakher L, Asgari M, Raz A. Leech, potato, and tomato carboxypeptidase inhibitors against Anopheles stephensi carboxypeptidase B1 and B2. Arch Biochem Biophys 2024; 759:110086. [PMID: 38972626 DOI: 10.1016/j.abb.2024.110086] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2024] [Revised: 06/16/2024] [Accepted: 07/05/2024] [Indexed: 07/09/2024]
Abstract
Carboxypeptidase B (CPB) in Anopheles spp. breaks down blood and releases free amino acids, which promote Plasmodium sexual development in the mosquito midgut. Our goal was to computationally assess the inhibitory effectiveness of carboxypeptidase inhibitors obtained from tomato, potato (CPiSt), and leech against the Anopheles stephensi CPBAs1 and CPBAs2 enzymes. The tertiary structures of the CPB inhibitors were predicted, and their interaction modes with CPBAs1 and CPBAs2 were examined using molecular docking. Next, these data were compared with four licensed medications that are known to reduce the Anopheles' CPB activity. Molecular dynamics simulations were used to evaluate the stability of complexes containing CPiSt and its mutant form. Both CPiSt and its mutant form showed promise as possible candidates for further evaluations in the paratransgenesis technique for malaria control, based on the similar bindings of CPiSt and CPiSt-Mut to the active sites of CPBAs1 and CPBAs2, as well as their binding affinity in comparison to the drugs.
Affiliation(s)
- Elham Rismani
- Molecular Medicine Department, Biotechnology Research Center (BRC), Pasteur Institute of Iran, Tehran, Iran
| | - Ladan Mafakher
- Thalassemia & Hemoglobinopathy Research Center, Health Research Institute, Ahvaz Jundishapur University of Medical Sciences, Ahvaz, Iran
| | - Majid Asgari
- Malaria and Vector Research Group (MVRG), Biotechnology Research Center (BRC), Pasteur Institute of Iran, Tehran, Iran
| | - Abbasali Raz
- Malaria and Vector Research Group (MVRG), Biotechnology Research Center (BRC), Pasteur Institute of Iran, Tehran, Iran.
| |
|
35
|
Li P, Liu ZP. MuToN Quantifies Binding Affinity Changes upon Protein Mutations by Geometric Deep Learning. ADVANCED SCIENCE (WEINHEIM, BADEN-WURTTEMBERG, GERMANY) 2024; 11:e2402918. [PMID: 38995072 PMCID: PMC11425207 DOI: 10.1002/advs.202402918] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/20/2024] [Revised: 06/04/2024] [Indexed: 07/13/2024]
Abstract
Assessing changes in protein-protein binding affinity due to mutations helps in understanding a wide range of crucial biological processes within cells. Despite significant efforts to create accurate computational models, predicting how mutations affect affinity remains challenging due to the complexity of the biological mechanisms involved. In the present work, a geometric deep learning framework called MuToN is introduced for quantifying protein binding affinity change upon residue mutations. The method, designed with geometric attention networks, is mechanism-aware. It captures changes in the protein binding interfaces of mutated complexes and assesses the allosteric effects of amino acids. Experimental results highlight MuToN's superiority compared to existing methods. Additionally, MuToN's flexibility and effectiveness are illustrated by its precise predictions of binding affinity changes between SARS-CoV-2 variants and the ACE2 complex.
Affiliation(s)
- Pengpai Li
- Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, Shandong, 250061, China
| | - Zhi-Ping Liu
- Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, Shandong, 250061, China
| |
|
36
|
Carroll M, Rosenbaum E, Viswanathan R. Computational Methods to Predict Conformational B-Cell Epitopes. Biomolecules 2024; 14:983. [PMID: 39199371 PMCID: PMC11352882 DOI: 10.3390/biom14080983] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/09/2024] [Revised: 08/04/2024] [Accepted: 08/08/2024] [Indexed: 09/01/2024] Open
Abstract
Accurate computational prediction of B-cell epitopes can greatly enhance biomedical research and rapidly advance efforts to develop therapeutics, monoclonal antibodies, vaccines, and immunodiagnostic reagents. Previous research efforts have primarily focused on the development of computational methods to predict linear epitopes rather than conformational epitopes; however, the latter is much more biologically predominant. Several conformational B-cell epitope prediction methods have recently been published, but their predictive performances are weak. Here, we present a review of the latest computational methods and assess their performances on a diverse test set of 29 non-redundant unbound antigen structures. Our results demonstrate that ISPIPab performs better than most methods and compares favorably with other recent antigen-specific methods. Finally, we suggest new strategies and opportunities to improve computational predictions of conformational B-cell epitopes.
Affiliation(s)
| | | | - R. Viswanathan
- Department of Chemistry and Biochemistry, Yeshiva College, Yeshiva University, New York, NY 10033, USA; (M.C.); (E.R.)
| |
|
37
|
Pandi B, Brenman S, Black A, Ng DCM, Lau E, Lam MPY. Tissue Usage Preference and Intrinsically Disordered Region Remodeling of Alternative Splicing Derived Proteoforms in the Heart. J Proteome Res 2024; 23:3161-3173. [PMID: 38456420 PMCID: PMC11296937 DOI: 10.1021/acs.jproteome.3c00789] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2023] [Revised: 02/08/2024] [Accepted: 02/27/2024] [Indexed: 03/09/2024]
Abstract
A computational analysis of mass spectrometry data was performed to uncover alternative splicing derived protein variants across chambers of the human heart. Evidence for 216 non-canonical isoforms was apparent in the atrium and the ventricle, including 52 isoforms not documented on SwissProt and recovered using an RNA sequencing derived database. Among non-canonical isoforms, 29 show signs of regulation based on statistically significant preferences in tissue usage, including a ventricular enriched protein isoform of tensin-1 (TNS1) and an atrium-enriched PDZ and LIM Domain 3 (PDLIM3) isoform 2 (PDLIM3-2/ALP-H). Examined variant regions that differ between alternative and canonical isoforms are highly enriched with intrinsically disordered regions. Moreover, over two-thirds of such regions are predicted to function in protein binding and RNA binding. The analysis here lends further credence to the notion that alternative splicing diversifies the proteome by rewiring intrinsically disordered regions, which are increasingly recognized to play important roles in the generation of biological function from protein sequences.
Affiliation(s)
- Boomathi Pandi
- Department of Medicine/Division of Cardiology, Department of Biochemistry & Molecular Genetics, and Consortium for Fibrosis Research and Translation (CFReT), University of Colorado School of Medicine, Aurora, Colorado 80045, United States
| | - Stella Brenman
- Department of Medicine/Division of Cardiology, Department of Biochemistry & Molecular Genetics, and Consortium for Fibrosis Research and Translation (CFReT), University of Colorado School of Medicine, Aurora, Colorado 80045, United States
| | - Alexander Black
- Department of Medicine/Division of Cardiology, Department of Biochemistry & Molecular Genetics, and Consortium for Fibrosis Research and Translation (CFReT), University of Colorado School of Medicine, Aurora, Colorado 80045, United States
| | - Dominic C. M. Ng
- Department of Medicine/Division of Cardiology, Department of Biochemistry & Molecular Genetics, and Consortium for Fibrosis Research and Translation (CFReT), University of Colorado School of Medicine, Aurora, Colorado 80045, United States
| | - Edward Lau
- Department of Medicine/Division of Cardiology, Department of Biochemistry & Molecular Genetics, and Consortium for Fibrosis Research and Translation (CFReT), University of Colorado School of Medicine, Aurora, Colorado 80045, United States
| | - Maggie P. Y. Lam
- Department of Medicine/Division of Cardiology, Department of Biochemistry & Molecular Genetics, and Consortium for Fibrosis Research and Translation (CFReT), University of Colorado School of Medicine, Aurora, Colorado 80045, United States
| |
|
38
|
Zhai K, Dong J, Zeng J, Cheng P, Wu X, Han W, Chen Y, Qiu Z, Zhou Y, Pu J, Jiang T, Du X. Global antigenic landscape and vaccine recommendation strategy for low pathogenic avian influenza A (H9N2) viruses. J Infect 2024; 89:106199. [PMID: 38901571 DOI: 10.1016/j.jinf.2024.106199] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2023] [Revised: 06/09/2024] [Accepted: 06/11/2024] [Indexed: 06/22/2024]
Abstract
The sustained circulation of H9N2 avian influenza viruses (AIVs) poses a significant threat for contributing to a new pandemic. Given the temporal and spatial uncertainty in the antigenicity of H9N2 AIVs, the immune protection efficiency of vaccines remains challenging. By developing an antigenicity prediction method for H9N2 AIVs, named PREDAC-H9, the global antigenic landscape of H9N2 AIVs was mapped. PREDAC-H9 utilizes the XGBoost model with 14 well-designed features. The XGBoost model was built and evaluated to predict the antigenic relationship between any two viruses with high values of 81.1 %, 81.4 %, 81.3 %, 81.1 %, and 89.4 % in accuracy, precision, recall, F1 value, and area under curve (AUC), respectively. Then the antigenic correlation network (ACnet) was constructed based on the predicted antigenic relationship for H9N2 AIVs from 1966 to 2022, and ten major antigenic clusters were identified. Of these, four novel clusters were generated in China in the past decade, demonstrating the unique complex situation there. To help tackle this situation, we applied PREDAC-H9 to calculate the cluster-transition determining sites and screen out virus strains with the high cross-protective spectrum, thus providing an in silico reference for vaccine recommendation. The proposed model will reduce the clinical monitoring workload and provide a useful tool for surveillance and control of H9N2 AIVs.
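For readers less familiar with the metrics quoted in this abstract, the following sketch shows how accuracy, precision, recall, and F1 are derived from a binary confusion matrix. The counts are hypothetical and are not taken from the PREDAC-H9 study; AUC is omitted because it needs ranked prediction scores rather than counts.

```python
from typing import Dict


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Compute standard classification metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts for "antigenically similar" vs "distinct" virus pairs.
metrics = binary_metrics(tp=80, fp=20, fn=20, tn=80)
```

Metrics averaged over cross-validation folds need not satisfy the harmonic-mean relation exactly, which is one reason a reported F1 can differ slightly from the value implied by the reported precision and recall.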
Affiliation(s)
- Ke Zhai
- School of Public Health (Shenzhen), Sun Yat-sen University, Guangzhou 510275, PR China; School of Public Health (Shenzhen), Shenzhen Campus of Sun Yat-sen University, Shenzhen 518107, PR China
| | - Jinze Dong
- National Key Laboratory of Veterinary Public Health and Safety, Key Laboratory for Prevention and Control of Avian Influenza and Other Major Poultry Diseases, Ministry of Agriculture and Rural Affairs, College of Veterinary Medicine, China Agricultural University, Beijing 100193, PR China
| | - Jinfeng Zeng
- School of Public Health (Shenzhen), Sun Yat-sen University, Guangzhou 510275, PR China; School of Public Health (Shenzhen), Shenzhen Campus of Sun Yat-sen University, Shenzhen 518107, PR China
| | - Peiwen Cheng
- School of Public Health (Shenzhen), Sun Yat-sen University, Guangzhou 510275, PR China; School of Public Health (Shenzhen), Shenzhen Campus of Sun Yat-sen University, Shenzhen 518107, PR China
| | - Xinsheng Wu
- School of Public Health (Shenzhen), Sun Yat-sen University, Guangzhou 510275, PR China; School of Public Health (Shenzhen), Shenzhen Campus of Sun Yat-sen University, Shenzhen 518107, PR China
| | - Wenjie Han
- School of Public Health (Shenzhen), Sun Yat-sen University, Guangzhou 510275, PR China; School of Public Health (Shenzhen), Shenzhen Campus of Sun Yat-sen University, Shenzhen 518107, PR China
| | - Yilin Chen
- School of Public Health (Shenzhen), Sun Yat-sen University, Guangzhou 510275, PR China; School of Public Health (Shenzhen), Shenzhen Campus of Sun Yat-sen University, Shenzhen 518107, PR China
| | - Zekai Qiu
- School of Public Health (Shenzhen), Sun Yat-sen University, Guangzhou 510275, PR China; School of Public Health (Shenzhen), Shenzhen Campus of Sun Yat-sen University, Shenzhen 518107, PR China; Department of Molecular and Radiooncology, German Cancer Research Center (DKFZ), Heidelberg 69120, Germany; Medical Faculty Heidelberg, Heidelberg University, Heidelberg 69047, Germany
| | - Yong Zhou
- National Key Laboratory of Veterinary Public Health and Safety, Key Laboratory for Prevention and Control of Avian Influenza and Other Major Poultry Diseases, Ministry of Agriculture and Rural Affairs, College of Veterinary Medicine, China Agricultural University, Beijing 100193, PR China
| | - Juan Pu
- National Key Laboratory of Veterinary Public Health and Safety, Key Laboratory for Prevention and Control of Avian Influenza and Other Major Poultry Diseases, Ministry of Agriculture and Rural Affairs, College of Veterinary Medicine, China Agricultural University, Beijing 100193, PR China.
| | - Taijiao Jiang
- Guangzhou National Laboratory, Guangzhou 510005, PR China; State Key Laboratory of Respiratory Disease, The Key Laboratory of Advanced Interdisciplinary Studies Center, The First Affiliated Hospital of Guangzhou Medical University, Guangzhou 510120, PR China; Suzhou Institute of Systems Medicine, Suzhou 215123, PR China.
| | - Xiangjun Du
- School of Public Health (Shenzhen), Sun Yat-sen University, Guangzhou 510275, PR China; School of Public Health (Shenzhen), Shenzhen Campus of Sun Yat-sen University, Shenzhen 518107, PR China; Shenzhen Key Laboratory of Pathogenic Microbes & Biosecurity, Shenzhen Campus of Sun Yat-sen University, Shenzhen 518107, PR China; Key Laboratory of Tropical Disease Control, Ministry of Education, Sun Yat-sen University, Guangzhou 510030, PR China.
| |
|
39
|
Yuan Q, Tian C, Song Y, Ou P, Zhu M, Zhao H, Yang Y. GPSFun: geometry-aware protein sequence function predictions with language models. Nucleic Acids Res 2024; 52:W248-W255. [PMID: 38738636 PMCID: PMC11223820 DOI: 10.1093/nar/gkae381] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/07/2024] [Revised: 04/22/2024] [Accepted: 04/26/2024] [Indexed: 05/14/2024] Open
Abstract
Knowledge of protein function is essential for elucidating disease mechanisms and discovering new drug targets. However, there is a widening gap between the exponential growth of protein sequences and their limited function annotations. In our prior studies, we have developed a series of methods including GraphPPIS, GraphSite, LMetalSite and SPROF-GO for protein function annotations at residue or protein level. To further enhance their applicability and performance, we now present GPSFun, a versatile web server for Geometry-aware Protein Sequence Function annotations, which equips our previous tools with language models and geometric deep learning. Specifically, GPSFun employs large language models to efficiently predict 3D conformations of the input protein sequences and extract informative sequence embeddings. Subsequently, geometric graph neural networks are utilized to capture the sequence and structure patterns in the protein graphs, facilitating various downstream predictions including protein-ligand binding sites, gene ontologies, subcellular locations and protein solubility. Notably, GPSFun achieves superior performance to state-of-the-art methods across diverse tasks without requiring multiple sequence alignments or experimental protein structures. GPSFun is freely available to all users at https://bio-web1.nscc-gz.cn/app/GPSFun with user-friendly interfaces and rich visualizations.
Affiliation(s)
- Qianmu Yuan
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, Guangdong 510000, China
| | - Chong Tian
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, Guangdong 510000, China
| | - Yidong Song
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, Guangdong 510000, China
| | - Peihua Ou
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, Guangdong 510000, China
| | - Mingming Zhu
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, Guangdong 510000, China
| | - Huiying Zhao
- Sun Yat-sen Memorial Hospital, Sun Yat-sen University, Guangzhou, Guangdong 510000, China
| | - Yuedong Yang
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, Guangdong 510000, China
| |
|
40
|
Lupo U, Sgarbossa D, Bitbol AF. Pairing interacting protein sequences using masked language modeling. Proc Natl Acad Sci U S A 2024; 121:e2311887121. [PMID: 38913900 PMCID: PMC11228504 DOI: 10.1073/pnas.2311887121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/12/2023] [Accepted: 12/18/2023] [Indexed: 06/26/2024] Open
Abstract
Predicting which proteins interact together from amino acid sequences is an important task. We develop a method to pair interacting protein sequences which leverages the power of protein language models trained on multiple sequence alignments (MSAs), such as MSA Transformer and the EvoFormer module of AlphaFold. We formulate the problem of pairing interacting partners among the paralogs of two protein families in a differentiable way. We introduce a method called Differentiable Pairing using Alignment-based Language Models (DiffPALM) that solves it by exploiting the ability of MSA Transformer to fill in masked amino acids in multiple sequence alignments using the surrounding context. MSA Transformer encodes coevolution between functionally or structurally coupled amino acids within protein chains. It also captures inter-chain coevolution, despite being trained on single-chain data. Relying on MSA Transformer without fine-tuning, DiffPALM outperforms existing coevolution-based pairing methods on difficult benchmarks of shallow multiple sequence alignments extracted from ubiquitous prokaryotic protein datasets. It also outperforms an alternative method based on a state-of-the-art protein language model trained on single sequences. Paired alignments of interacting protein sequences are a crucial ingredient of supervised deep learning methods to predict the three-dimensional structure of protein complexes. Starting from sequences paired by DiffPALM substantially improves the structure prediction of some eukaryotic protein complexes by AlphaFold-Multimer. It also achieves performance competitive with orthology-based pairing.
Affiliation(s)
- Umberto Lupo
- Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne, Lausanne CH-1015, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne CH-1015, Switzerland
| | - Damiano Sgarbossa
- Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne, Lausanne CH-1015, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne CH-1015, Switzerland
| | - Anne-Florence Bitbol
- Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne, Lausanne CH-1015, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne CH-1015, Switzerland
| |
|
41
|
Pegoraro M, Dominé C, Rodolà E, Veličković P, Deac A. Geometric epitope and paratope prediction. Bioinformatics 2024; 40:btae405. [PMID: 38984742 PMCID: PMC11245313 DOI: 10.1093/bioinformatics/btae405] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2023] [Revised: 05/14/2024] [Accepted: 07/09/2024] [Indexed: 07/11/2024] Open
Abstract
MOTIVATION: Identifying the binding sites of antibodies is essential for developing vaccines and synthetic antibodies. In this article, we investigate the optimal representation for predicting the binding sites in the two molecules and emphasize the importance of geometric information. RESULTS: Specifically, we compare different geometric deep learning methods applied to proteins' inner (I-GEP) and outer (O-GEP) structures. We incorporate 3D coordinates and spectral geometric descriptors as input features to fully leverage the geometric information. Our research suggests that different geometrical representation information is useful for different tasks. Surface-based models are more efficient in predicting the binding of the epitope, while graph models are better in paratope prediction, both achieving significant performance improvements. Moreover, we analyze the impact of structural changes in antibodies and antigens resulting from conformational rearrangements or reconstruction errors. Through this investigation, we showcase the robustness of geometric deep learning methods and spectral geometric descriptors to such perturbations. AVAILABILITY AND IMPLEMENTATION: The python code for the models, together with the data and the processing pipeline, is open-source and available at https://github.com/Marco-Peg/GEP.
Affiliation(s)
- Marco Pegoraro
- Department of Computer Science, Sapienza University of Rome, 00185, Italy
| | - Clémentine Dominé
- Gatsby Computational Neuroscience Unit, University College London, W1T 4JG, United Kingdom
| | - Emanuele Rodolà
- Department of Computer Science, Sapienza University of Rome, 00185, Italy
| | | | - Andreea Deac
- Département d’informatique et de recherche opérationnelle, Université de Montréal, QC H2S 3H1, Canada
| |
|
42
|
Karnaukhov VK, Shcherbinin DS, Chugunov AO, Chudakov DM, Efremov RG, Zvyagin IV, Shugay M. Structure-based prediction of T cell receptor recognition of unseen epitopes using TCRen. NATURE COMPUTATIONAL SCIENCE 2024; 4:510-521. [PMID: 38987378 DOI: 10.1038/s43588-024-00653-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/16/2023] [Accepted: 06/04/2024] [Indexed: 07/12/2024]
Abstract
T cell receptor (TCR) recognition of foreign peptides presented by major histocompatibility complex protein is a major event in triggering the adaptive immune response to pathogens or cancer. The prediction of TCR-peptide interactions has great importance for therapy of cancer as well as infectious and autoimmune diseases but remains a major challenge, particularly for novel (unseen) peptide epitopes. Here we present TCRen, a structure-based method for ranking candidate unseen epitopes for a given TCR. The first stage of the TCRen pipeline is modeling of the TCR-peptide-major histocompatibility complex structure. Then a TCR-peptide residue contact map is extracted from this structure and used to rank all candidate epitopes on the basis of an interaction score with the target TCR. Scoring is performed using an energy potential derived from the statistics of TCR-peptide contact preferences in existing crystal structures. We show that TCRen has high performance in discriminating cognate versus unrelated peptides and can facilitate the identification of cancer neoepitopes recognized by tumor-infiltrating lymphocytes.
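The ranking step described in this abstract (contact map plus a pairwise energy potential) can be sketched as follows. This is an illustration only, not the TCRen implementation: the contact list, the potential values, the function names, and the convention that lower scores are more favourable are all assumptions made for the example.

```python
from typing import Dict, List, Sequence, Tuple

# A contact pairs a TCR residue type with a zero-based peptide position.
Contact = Tuple[str, int]
Potential = Dict[Tuple[str, str], float]


def interaction_score(
    peptide: str,
    contacts: Sequence[Contact],
    potential: Potential,
    default: float = 0.0,
) -> float:
    """Sum the pairwise energies over every TCR-peptide contact."""
    return sum(
        potential.get((tcr_aa, peptide[pos]), default)
        for tcr_aa, pos in contacts
    )


def rank_peptides(
    peptides: Sequence[str],
    contacts: Sequence[Contact],
    potential: Potential,
) -> List[Tuple[str, float]]:
    """Order candidates from lowest (assumed most favourable) to highest score."""
    scored = [(p, interaction_score(p, contacts, potential)) for p in peptides]
    return sorted(scored, key=lambda item: item[1])


# Hypothetical contact map and potential, for illustration only.
toy_contacts = [("Y", 0), ("D", 2), ("Y", 2)]
toy_potential = {
    ("Y", "K"): -0.5,
    ("D", "K"): -1.0,
    ("D", "E"): 0.8,
    ("Y", "A"): -0.1,
}
ranking = rank_peptides(["AAK", "AAE", "KAA"], toy_contacts, toy_potential)
```

The sketch reuses one contact map for every candidate, which assumes all peptides have the same length and adopt the same binding geometry as the modeled structure.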
MESH Headings
- Receptors, Antigen, T-Cell/immunology
- Receptors, Antigen, T-Cell/chemistry
- Receptors, Antigen, T-Cell/metabolism
- Humans
- Peptides/immunology
- Peptides/chemistry
- Epitopes/immunology
- Epitopes/chemistry
- Models, Molecular
- Neoplasms/immunology
- Epitopes, T-Lymphocyte/immunology
- Epitopes, T-Lymphocyte/chemistry
- Major Histocompatibility Complex/immunology
- Protein Conformation
- Lymphocytes, Tumor-Infiltrating/immunology
- Lymphocytes, Tumor-Infiltrating/metabolism
Affiliation(s)
- Vadim K Karnaukhov
- Center of Life Sciences, Skolkovo Institute of Science and Technology, Moscow, Russia.
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, Moscow, Russia.
| | - Dmitrii S Shcherbinin
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, Moscow, Russia
- Institute of Translational Medicine, Center for Precision Genome Editing and Genetic Technologies for Biomedicine, Pirogov Russian National Research Medical University, Moscow, Russia
| | - Anton O Chugunov
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, Moscow, Russia
- Research Center for Molecular Mechanisms of Aging and Age-Related Diseases, Moscow Institute of Physics and Technology, Dolgoprudny, Russia
| | - Dmitriy M Chudakov
- Center of Life Sciences, Skolkovo Institute of Science and Technology, Moscow, Russia.
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, Moscow, Russia.
- Institute of Translational Medicine, Center for Precision Genome Editing and Genetic Technologies for Biomedicine, Pirogov Russian National Research Medical University, Moscow, Russia.
- Central European Institute of Technology, Brno, Czech Republic.
| | - Roman G Efremov
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, Moscow, Russia
- Research Center for Molecular Mechanisms of Aging and Age-Related Diseases, Moscow Institute of Physics and Technology, Dolgoprudny, Russia
- Higher School of Economics, Moscow, Russia
| | - Ivan V Zvyagin
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, Moscow, Russia
- Institute of Translational Medicine, Center for Precision Genome Editing and Genetic Technologies for Biomedicine, Pirogov Russian National Research Medical University, Moscow, Russia
| | - Mikhail Shugay
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, Moscow, Russia.
- Institute of Translational Medicine, Center for Precision Genome Editing and Genetic Technologies for Biomedicine, Pirogov Russian National Research Medical University, Moscow, Russia.
| |
|
43
|
Sela M, Church JR, Schapiro I, Schneidman-Duhovny D. RhoMax: Computational Prediction of Rhodopsin Absorption Maxima Using Geometric Deep Learning. J Chem Inf Model 2024; 64:4630-4639. [PMID: 38829021 PMCID: PMC11200256 DOI: 10.1021/acs.jcim.4c00467] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/20/2024] [Revised: 05/15/2024] [Accepted: 05/17/2024] [Indexed: 06/05/2024]
Abstract
Microbial rhodopsins (MRs) are a diverse and abundant family of photoactive membrane proteins that serve as model systems for biophysical techniques. Optogenetics utilizes genetic engineering to insert specialized proteins into specific neurons or brain regions, allowing for manipulation of their activity through light and enabling the mapping and control of specific brain areas in living organisms. The obstacle of optogenetics lies in the fact that light has a limited ability to penetrate biological tissues, particularly blue light in the visible spectrum. Despite this challenge, most optogenetic systems rely on blue light due to the scarcity of red-shifted opsins. Finding additional red-shifted rhodopsins would represent a major breakthrough in overcoming the challenge of limited light penetration in optogenetics. However, determining the wavelength absorption maxima for rhodopsins based on their protein sequence is a significant hurdle. Current experimental methods are time-consuming, while computational methods lack accuracy. The paper introduces a new computational approach called RhoMax that utilizes structure-based geometric deep learning to predict the absorption wavelength of rhodopsins solely based on their sequences. The method takes advantage of AlphaFold2 for accurate modeling of rhodopsin structures. Once trained on a balanced train set, RhoMax rapidly and precisely predicted the maximum absorption wavelength of more than half of the sequences in our test set with an accuracy of 0.03 eV. By leveraging computational methods for absorption maxima determination, we can drastically reduce the time needed for designing new red-shifted microbial rhodopsins, thereby facilitating advances in the field of optogenetics.
Affiliation(s)
- Meitar Sela
- The Rachel and Selim Benin School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem 9190401, Israel
| | - Jonathan R. Church
- Fritz Haber Center for Molecular Dynamics Research, Institute of Chemistry, The Hebrew University of Jerusalem, Jerusalem 9190401, Israel
| | - Igor Schapiro
- Fritz Haber Center for Molecular Dynamics Research, Institute of Chemistry, The Hebrew University of Jerusalem, Jerusalem 9190401, Israel
| | - Dina Schneidman-Duhovny
- The Rachel and Selim Benin School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem 9190401, Israel
| |
|
44
|
Wang M, Bo Z, Zhang C, Guo M, Wu Y, Zhang X. Deciphering the Genetic Variation: A Comparative Analysis of Parental and Attenuated Strains of the QXL87 Vaccine for Infectious Bronchitis. Animals (Basel) 2024; 14:1784. [PMID: 38929403 PMCID: PMC11200882 DOI: 10.3390/ani14121784] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2024] [Revised: 06/05/2024] [Accepted: 06/11/2024] [Indexed: 06/28/2024] Open
Abstract
The QXL87 live attenuated vaccine strain for infectious bronchitis represents the first approved QX type (GI-19 lineage) vaccine in China. This strain was derived from the parental strain CK/CH/JS/2010/12 through continuous passage in SPF chicken embryos. To elucidate the molecular mechanism behind its attenuation, whole-genome sequencing was conducted on both the parental and attenuated strains. Analysis revealed 145 nucleotide mutations in the attenuated strain, leading to 48 amino acid mutations in various proteins, including Nsp2 (26), Nsp3 (14), Nsp4 (1), S (4), 3a (1), E (1), and N (1). Additionally, a frameshift mutation caused by a single base insertion in the ORFX resulted in a six-amino-acid extension. Subsequent comparison of post-translational modification sites, protein structure, and protein-protein binding sites between the parental and attenuated strains identified three potential virulence genes: Nsp2, Nsp3, and S. The amino acid mutations in these proteins not only altered their conformation but also affected the distribution of post-translational modification sites and protein-protein interaction sites. Furthermore, three potential functional mutation sites (P106S, A352T, and L472F, all located in the Nsp2 protein) were identified through PROVEAN, PolyPhen, and I-Mutant. Overall, our findings suggest that Nsp2, Nsp3, and S proteins may play a role in modulating IBV pathogenicity, with a particular focus on the significance of the Nsp2 protein. This study contributes to our understanding of the molecular mechanisms underlying IBV attenuation and holds promise for the development of safer live attenuated IBV vaccines using reverse genetic approaches.
Affiliation(s)
- Mengmeng Wang
- Jiangsu Co-Innovation Center for the Prevention and Control of Important Animal Infectious Disease and Zoonoses, College of Veterinary Medicine, Yangzhou University, Yangzhou 225009, China; (M.W.); (Z.B.); (C.Z.); (M.G.)
| | - Zongyi Bo
- Jiangsu Co-Innovation Center for the Prevention and Control of Important Animal Infectious Disease and Zoonoses, College of Veterinary Medicine, Yangzhou University, Yangzhou 225009, China; (M.W.); (Z.B.); (C.Z.); (M.G.)
- Joint International Research Laboratory of Agriculture and Agri-Product Safety, The Ministry of Education of China, Yangzhou University, Yangzhou 225009, China
| | - Chengcheng Zhang
- Jiangsu Co-Innovation Center for the Prevention and Control of Important Animal Infectious Disease and Zoonoses, College of Veterinary Medicine, Yangzhou University, Yangzhou 225009, China; (M.W.); (Z.B.); (C.Z.); (M.G.)
| | - Mengjiao Guo
- Jiangsu Co-Innovation Center for the Prevention and Control of Important Animal Infectious Disease and Zoonoses, College of Veterinary Medicine, Yangzhou University, Yangzhou 225009, China; (M.W.); (Z.B.); (C.Z.); (M.G.)
| | - Yantao Wu
- Jiangsu Co-Innovation Center for the Prevention and Control of Important Animal Infectious Disease and Zoonoses, College of Veterinary Medicine, Yangzhou University, Yangzhou 225009, China; (M.W.); (Z.B.); (C.Z.); (M.G.)
- Joint International Research Laboratory of Agriculture and Agri-Product Safety, The Ministry of Education of China, Yangzhou University, Yangzhou 225009, China
| | - Xiaorong Zhang
- Jiangsu Co-Innovation Center for the Prevention and Control of Important Animal Infectious Disease and Zoonoses, College of Veterinary Medicine, Yangzhou University, Yangzhou 225009, China; (M.W.); (Z.B.); (C.Z.); (M.G.)
| |
|
45
|
Jeevan K, Palistha S, Tayara H, Chong KT. PUResNetV2.0: a deep learning model leveraging sparse representation for improved ligand binding site prediction. J Cheminform 2024; 16:66. [PMID: 38849917 PMCID: PMC11157904 DOI: 10.1186/s13321-024-00865-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/29/2024] [Accepted: 05/27/2024] [Indexed: 06/09/2024] Open
Abstract
Accurate ligand binding site prediction (LBSP) within proteins is essential for drug discovery. We developed ProteinUNetResNetV2.0 (PUResNetV2.0), leveraging sparse representation of protein structures to improve LBSP accuracy. Our training dataset included protein complexes from 4729 protein families. Evaluations on benchmark datasets showed that PUResNetV2.0 achieved an 85.4% Distance Center Atom (DCA) success rate and a 74.7% F1 Score on the Holo801 dataset, outperforming existing methods. However, its performance in specific cases, such as RNA, DNA, peptide-like ligand, and ion binding site prediction, was limited due to constraints in our training data. Our findings underscore the potential of sparse representation in LBSP, especially for oligomeric structures, suggesting PUResNetV2.0 as a promising tool for computational drug discovery.
Affiliation(s)
- Kandel Jeevan
- Graduate School of Integrated Energy-AI, Jeonbuk National University, Jeonju, 54896, South Korea
| | - Shrestha Palistha
- Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju, 54896, South Korea
| | - Hilal Tayara
- School of International Engineering and Science, Jeonbuk National University, Jeonju, 54896, South Korea.
| | - Kil T Chong
- Graduate School of Integrated Energy-AI, Jeonbuk National University, Jeonju, 54896, South Korea.
- Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju, 54896, South Korea.
- School of International Engineering and Science, Jeonbuk National University, Jeonju, 54896, South Korea.
- Advanced Electronics and Information Research Center, Jeonbuk National University, Jeonju, 54896, South Korea.
| |
|
46
|
Shankar SS, Banarjee R, Jathar SM, Rajesh S, Ramasamy S, Kulkarni MJ. De novo structure prediction of meteorin and meteorin-like protein for identification of domains, functional receptor binding regions, and their high-risk missense variants. J Biomol Struct Dyn 2024; 42:4522-4536. [PMID: 37288801 DOI: 10.1080/07391102.2023.2220804] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/07/2023] [Accepted: 05/29/2023] [Indexed: 06/09/2023]
Abstract
Meteorin (Metrn) and Meteorin-like (Metrnl) are homologous secreted proteins involved in neural development and metabolic regulation. In this study, we have performed de novo structure prediction and analysis of both Metrn and Metrnl using AlphaFold2 (AF2) and RoseTTAFold (RF). Based on the domain and structural homology analysis of the predicted structures, we have identified that these proteins are composed of two functional domains, a CUB domain and an NTR domain, connected by a hinge/loop region. We have identified the receptor binding regions of Metrn and Metrnl using the machine-learning tools ScanNet and MaSIF. These were further validated by docking Metrnl with its reported KIT receptor, thus establishing the role of each domain in the receptor interaction. Also, we have studied the effect of non-synonymous SNPs on the structure and function of these proteins using an array of bioinformatics tools and selected 16 missense variants in Metrn and 10 in Metrnl that can affect the protein stability. This is the first study to comprehensively characterize Metrn and Metrnl at the structural level and to identify their functional domains and protein binding regions. This study also highlights the interaction mechanism of the KIT receptor and Metrnl. The predicted deleterious SNPs will allow further understanding of the role of these variants in modulating the plasma levels of these proteins in disease conditions such as diabetes. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- S Shiva Shankar
- Proteomics Facility, Division of Biochemical Sciences, CSIR-National Chemical Laboratory, Pune, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, India
| | - Reema Banarjee
- Proteomics Facility, Division of Biochemical Sciences, CSIR-National Chemical Laboratory, Pune, India
| | - Swaraj M Jathar
- Proteomics Facility, Division of Biochemical Sciences, CSIR-National Chemical Laboratory, Pune, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, India
| | - S Rajesh
- Proteomics Facility, Division of Biochemical Sciences, CSIR-National Chemical Laboratory, Pune, India
| | - Sureshkumar Ramasamy
- Proteomics Facility, Division of Biochemical Sciences, CSIR-National Chemical Laboratory, Pune, India
| | - Mahesh J Kulkarni
- Proteomics Facility, Division of Biochemical Sciences, CSIR-National Chemical Laboratory, Pune, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, India
| |
|
47
|
Xia Y, Pan X, Shen HB. A comprehensive survey on protein-ligand binding site prediction. Curr Opin Struct Biol 2024; 86:102793. [PMID: 38447285 DOI: 10.1016/j.sbi.2024.102793] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2023] [Revised: 02/18/2024] [Accepted: 02/18/2024] [Indexed: 03/08/2024]
Abstract
Protein-ligand binding site prediction is critical for protein function annotation and drug discovery. Biological experiments are time-consuming and require significant equipment, materials, and labor resources. Developing accurate and efficient computational methods for protein-ligand interaction prediction is essential. Here, we summarize the key challenges associated with ligand binding site (LBS) prediction and introduce recently published methods from their input features, computational algorithms, and ligand types. Furthermore, we investigate the specificity of allosteric site identification as a particular LBS type. Finally, we discuss the prospective directions for machine learning-based LBS prediction in the near future.
Affiliation(s)
- Ying Xia
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
| | - Xiaoyong Pan
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China.
| | - Hong-Bin Shen
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China.
| |
|
48
|
Sagendorf JM, Mitra R, Huang J, Chen XS, Rohs R. Structure-based prediction of protein-nucleic acid binding using graph neural networks. Biophys Rev 2024; 16:297-314. [PMID: 39345796 PMCID: PMC11427629 DOI: 10.1007/s12551-024-01201-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/19/2024] [Accepted: 05/28/2024] [Indexed: 10/01/2024] Open
Abstract
Protein-nucleic acid (PNA) binding plays critical roles in the transcription, translation, regulation, and three-dimensional organization of the genome. Structural models of proteins bound to nucleic acids (NA) provide insights into the chemical, electrostatic, and geometric properties of the protein structure that give rise to NA binding but are scarce relative to models of unbound proteins. We developed a deep learning approach for predicting PNA binding given the unbound structure of a protein that we call PNAbind. Our method utilizes graph neural networks to encode the spatial distribution of physicochemical and geometric properties of protein structures that are predictive of NA binding. Using global physicochemical encodings, our models predict the overall binding function of a protein, and using local encodings, they predict the location of individual NA binding residues. Our models can discriminate between specificity for DNA or RNA binding, and we show that predictions made on computationally derived protein structures can be used to gain mechanistic understanding of chemical and structural features that determine NA recognition. Binding site predictions were validated against benchmark datasets, achieving AUROC scores in the range of 0.92-0.95. We applied our models to the HIV-1 restriction factor APOBEC3G and showed that our model predictions are consistent with and help explain experimental RNA binding data. Supplementary information: The online version contains supplementary material available at 10.1007/s12551-024-01201-w.
Affiliation(s)
- Jared M. Sagendorf
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089 USA
- Present Address: Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA 94158 USA
| | - Raktim Mitra
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089 USA
| | - Jiawei Huang
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089 USA
| | - Xiaojiang S. Chen
- Molecular and Computational Biology Section, Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089 USA
- Department of Chemistry, University of Southern California, Los Angeles, CA 90089 USA
| | - Remo Rohs
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089 USA
- Department of Chemistry, University of Southern California, Los Angeles, CA 90089 USA
- Department of Physics and Astronomy, University of Southern California, Los Angeles, CA 90089 USA
- Thomas Lord Department of Computer Science, University of Southern California, Los Angeles, CA 90089 USA
| |
|
49
|
Werren EA, Peirent ER, Jantti H, Guxholli A, Srivastava KR, Orenstein N, Narayanan V, Wiszniewski W, Dawidziuk M, Gawlinski P, Umair M, Khan A, Khan SN, Geneviève D, Lehalle D, van Gassen KLI, Giltay JC, Oegema R, van Jaarsveld RH, Rafiullah R, Rappold GA, Rabin R, Pappas JG, Wheeler MM, Bamshad MJ, Tsan YC, Johnson MB, Keegan CE, Srivastava A, Bielas SL. Biallelic variants in CSMD1 are implicated in a neurodevelopmental disorder with intellectual disability and variable cortical malformations. Cell Death Dis 2024; 15:379. [PMID: 38816421 PMCID: PMC11140003 DOI: 10.1038/s41419-024-06768-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2023] [Revised: 05/03/2024] [Accepted: 05/22/2024] [Indexed: 06/01/2024]
Abstract
CSMD1 (Cub and Sushi Multiple Domains 1) is a well-recognized regulator of the complement cascade, an important component of the innate immune response. CSMD1 is highly expressed in the central nervous system (CNS) where emergent functions of the complement pathway modulate neural development and synaptic activity. While a genetic risk factor for neuropsychiatric disorders, the role of CSMD1 in neurodevelopmental disorders is unclear. Through international variant sharing, we identified inherited biallelic CSMD1 variants in eight individuals from six families of diverse ancestry who present with global developmental delay, intellectual disability, microcephaly, and polymicrogyria. We modeled CSMD1 loss-of-function (LOF) pathogenesis in early-stage forebrain organoids differentiated from CSMD1 knockout human embryonic stem cells (hESCs). We show that CSMD1 is necessary for neuroepithelial cytoarchitecture and synchronous differentiation. In summary, we identified a critical role for CSMD1 in brain development and biallelic CSMD1 variants as the molecular basis of a previously undefined neurodevelopmental disorder.
Affiliation(s)
- Elizabeth A Werren
- Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
- Advanced Precision Medicine Laboratory, The Jackson Laboratory for Genomic Medicine, Farmington, CT, 06032, USA
| | - Emily R Peirent
- Neuroscience Graduate Program, University of Michigan, Ann Arbor, MI, 48109, USA
| | - Henna Jantti
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
| | - Alba Guxholli
- Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
- Department of Pediatrics, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
| | - Kinshuk Raj Srivastava
- Medicinal and Process Chemistry Division, CSIR-Central Drug Research Institute, Lucknow, 226031, India
| | - Naama Orenstein
- Schneider Children's Medical Center of Israel, Petah Tikva, 4920235, Israel
| | - Vinodh Narayanan
- Center for Rare Childhood Disorders, Translational Genomics Research Institute, Phoenix, AZ, 85004, USA
| | - Wojciech Wiszniewski
- Department of Molecular and Medical Genetics, Oregon Health and Science University, Portland, OR, 97239, USA
| | - Mateusz Dawidziuk
- Department of Medical Genetics, Institute of Mother and Child, Warsaw, 01-211, Poland
| | - Pawel Gawlinski
- Department of Medical Genetics, Institute of Mother and Child, Warsaw, 01-211, Poland
| | - Muhammad Umair
- Medical Genomics Research Department, King Abdullah International Medical Research Center, King Saud Bin Abdulaziz University for Health Sciences, Ministry of National Guard Health Affairs, Riyadh, 11481, Saudi Arabia
- Department of Life Sciences, School of Science, University of Management and Technology, Lahore, Punjab, 54770, Pakistan
| | - Amjad Khan
- Department of Molecular and Medical Genetics, Oregon Health and Science University, Portland, OR, 97239, USA
- Department of Zoology, University of Lakki Marwat, Lakki Marwat, Khyber Pakhtunkhwa, 28420, Pakistan
| | - Shahid Niaz Khan
- Department of Zoology, Kohat University of Science and Technology, Kohat, Pakistan
| | - David Geneviève
- Montpellier University, Inserm Unit U1183, Reference Center for Rare Diseases and Developmental Anomalies, CHU, 34000, Montpellier, France
| | - Daphné Lehalle
- Sorbonne University, Department of Medical Genetics, Hospital Armand Trousseau, 75012, Paris, France
| | - K L I van Gassen
- Department of Genetics, University Medical Centre Utrecht, Utrecht University, Utrecht, 3584 EA, The Netherlands
| | - Jacques C Giltay
- Department of Genetics, University Medical Centre Utrecht, Utrecht University, Utrecht, 3584 EA, The Netherlands
| | - Renske Oegema
- Department of Genetics, University Medical Centre Utrecht, Utrecht University, Utrecht, 3584 EA, The Netherlands
| | - Richard H van Jaarsveld
- Department of Genetics, University Medical Centre Utrecht, Utrecht University, Utrecht, 3584 EA, The Netherlands
| | - Rafiullah Rafiullah
- Department of Biotechnology, Faculty of Life Sciences, BUITEMS, Quetta, 87300, Pakistan
| | - Gudrun A Rappold
- Department of Human Molecular Genetics, Institute of Human Genetics, Ruprecht-Karls-University, Heidelberg, 69120, Germany
| | - Rachel Rabin
- Department of Pediatrics, NYU Grossman School of Medicine, New York, NY, 10016, USA
| | - John G Pappas
- Department of Pediatrics, NYU Grossman School of Medicine, New York, NY, 10016, USA
| | - Marsha M Wheeler
- Department of Genome Sciences, University of Washington, Seattle, WA, 98195, USA
| | - Michael J Bamshad
- Department of Pediatrics, University of Washington, Seattle, WA, 98195, USA
- Brotman Baty Institute, Washington, 98195, USA
| | - Yao-Chang Tsan
- Division of Cardiovascular Medicine, University of Michigan, Ann Arbor, MI, 48109, USA
| | - Matthew B Johnson
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
| | - Catherine E Keegan
- Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
- Department of Pediatrics, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
| | - Anshika Srivastava
- Department of Medical Genetics, Sanjay Gandhi Postgraduate Institute of Medical Sciences, Lucknow, Uttar Pradesh, 226014, India.
| | - Stephanie L Bielas
- Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI, 48109, USA.
- Department of Pediatrics, University of Michigan Medical School, Ann Arbor, MI, 48109, USA.
| |
|
50
|
Rao J, Xie J, Yuan Q, Liu D, Wang Z, Lu Y, Zheng S, Yang Y. A variational expectation-maximization framework for balanced multi-scale learning of protein and drug interactions. Nat Commun 2024; 15:4476. [PMID: 38796523 PMCID: PMC11530528 DOI: 10.1038/s41467-024-48801-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2023] [Accepted: 05/14/2024] [Indexed: 05/28/2024] Open
Abstract
Protein functions are characterized by interactions with proteins, drugs, and other biomolecules. Understanding these interactions is essential for deciphering the molecular mechanisms underlying biological processes and developing new therapeutic strategies. Current computational methods mostly predict interactions based on either molecular network or structural information, without integrating them within a unified multi-scale framework. While a few multi-view learning methods are devoted to fusing the multi-scale information, these methods tend to rely heavily on a single scale and underfit the others, likely because of the imbalanced nature and inherent greediness of multi-scale learning. To alleviate the optimization imbalance, we present MUSE, a multi-scale representation learning framework based on a variant expectation maximization to optimize different scales in an alternating procedure over multiple iterations. This strategy efficiently fuses multi-scale information between atomic structure and molecular network scale through mutual supervision and iterative optimization. MUSE outperforms the current state-of-the-art models not only in molecular interaction (protein-protein, drug-protein, and drug-drug) tasks but also in protein interface prediction at the atomic structure scale. More importantly, the multi-scale learning framework shows potential for extension to other scales of computational drug discovery.
Affiliation(s)
- Jiahua Rao
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
| | - Jiancong Xie
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
| | - Qianmu Yuan
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
| | - Deqin Liu
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
| | - Zhen Wang
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
| | - Yutong Lu
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China.
| | - Shuangjia Zheng
- Global Institute of Future Technology, Shanghai Jiao Tong University, Shanghai, China.
| | - Yuedong Yang
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China.
- Key Laboratory of Machine Intelligence and Advanced Computing (MOE), Sun Yat-sen University, Guangzhou, China.
- State Key Laboratory of Oncology in South China, Sun Yat-sen University, Guangzhou, China.
| |
|