1
Hatibi N, Ait Benhassou H, Abik M. Predicted and Explained: Transforming drug discovery with AI for high-precision receptor-ligand interaction modeling and binding analysis. Comput Biol Med 2025; 192:110145. [PMID: 40381479; DOI: 10.1016/j.compbiomed.2025.110145] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2024] [Revised: 04/01/2025] [Accepted: 04/03/2025] [Indexed: 05/20/2025]
Abstract
The pharmaceutical industry faces persistent challenges in developing effective treatments for complex diseases, creating an urgent need for innovative approaches to accelerate drug discovery. A pivotal factor in this process is the accurate prediction of receptor-ligand interactions, which are critical for ensuring drug efficacy and safety. Traditional prediction methods are often time-consuming and resource-intensive, necessitating more efficient computational approaches. In this study, we present a machine learning framework that accurately predicts docking scores by integrating three complementary molecular representations: Lipinski descriptors, fingerprints, and graph-based representations. To enhance predictive performance, we propose two fusion strategies: early fusion (feature-level integration) and late fusion (decision-level aggregation). Notably, the early fusion model outperformed other approaches, demonstrating that combining diverse molecular representations enhances both predictive accuracy and robustness. To further improve interpretability, we applied the Local Interpretable Model-agnostic Explanations (LIME) method, which identified critical physicochemical and structural features driving predictions. This approach clarified binding dynamics by illustrating how specific features influence docking scores. Validation against established bioinformatics tools and 3D visualizations confirmed the framework's biological plausibility and reliability. By unifying multi-scale molecular representations with advanced machine learning techniques, our framework not only enhances prediction accuracy but also provides insights into ligand-receptor binding mechanisms. This advancement accelerates therapeutic development by enabling faster, data-driven prioritization of candidate molecules for complex diseases.
Affiliation(s)
- Nissrine Hatibi
  - Ecole Nationale Supérieure d'Informatique et d'Analyse des Systèmes (ENSIAS), Mohammed V University in Rabat, Rabat, Morocco; Prevention and Therapeutics Center, Moroccan Foundation of Advanced Science Innovation and Research (MAScIR), Mohammed VI Polytechnic University (UM6P), Benguerir, Morocco
- Hassan Ait Benhassou
  - Prevention and Therapeutics Center, Moroccan Foundation of Advanced Science Innovation and Research (MAScIR), Mohammed VI Polytechnic University (UM6P), Benguerir, Morocco
- Mounia Abik
  - Ecole Nationale Supérieure d'Informatique et d'Analyse des Systèmes (ENSIAS), Mohammed V University in Rabat, Rabat, Morocco
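The difference between the early fusion (feature-level integration) and late fusion (decision-level aggregation) strategies named in the abstract of entry 1 can be sketched as follows. This is only an illustration of the two mechanics, not the authors' implementation: the toy feature values, the fixed unit weights, and the helper names `linear_predict`, `early_fusion`, and `late_fusion` are all assumptions made for the example.

```python
from statistics import mean


def linear_predict(weights, bias, features):
    """Score one molecule with a fixed linear model (toy stand-in for a trained regressor)."""
    return bias + sum(w * x for w, x in zip(weights, features))


def early_fusion(views):
    """Feature-level integration: concatenate all representations into one vector."""
    fused = []
    for view in views:
        fused.extend(view)
    return fused


def late_fusion(per_view_predictions):
    """Decision-level aggregation: average the scores of per-representation models."""
    return mean(per_view_predictions)


# Hypothetical toy inputs for a single ligand (values are made up).
descriptors = [0.5, 1.0]        # stand-in for scaled Lipinski-style descriptors
fingerprint = [1, 0, 1]         # stand-in for fingerprint bits
graph_embedding = [0.2, -0.4]   # stand-in for a graph-based embedding
views = [descriptors, fingerprint, graph_embedding]

# Early fusion: one model sees the concatenated feature vector.
fused_vector = early_fusion(views)
early_score = linear_predict([1.0] * len(fused_vector), 0.0, fused_vector)

# Late fusion: one model per representation, predictions combined afterwards.
per_view_scores = [linear_predict([1.0] * len(v), 0.0, v) for v in views]
late_score = late_fusion(per_view_scores)

print(fused_vector, early_score, late_score)
```

In a real pipeline the fixed weights would be replaced by trained regressors; the point here is only where the combination happens, before the model (early) or after it (late).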
2
Vargas-Rosales PA, Caflisch A. The physics-AI dialogue in drug design. RSC Med Chem 2025; 16:1499-1515. [PMID: 39906313; PMCID: PMC11788922; DOI: 10.1039/d4md00869c] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2024] [Accepted: 01/16/2025] [Indexed: 02/06/2025] Open
Abstract
A long path has led from the determination of the first protein structure in 1960 to the recent breakthroughs in protein science. Protein structure prediction and design methodologies based on machine learning (ML) have been recognized with the 2024 Nobel prize in Chemistry, but they would not have been possible without previous work and the input of many domain scientists. Challenges remain in the application of ML tools for the prediction of structural ensembles and their usage within the software pipelines for structure determination by crystallography or cryogenic electron microscopy. In the drug discovery workflow, ML techniques are being used in diverse areas such as scoring of docked poses, or the generation of molecular descriptors. As the ML techniques become more widespread, novel applications emerge which can profit from the large amounts of data available. Nevertheless, it is essential to balance the potential advantages against the environmental costs of ML deployment to decide if and when it is best to apply it. For hit to lead optimization ML tools can efficiently interpolate between compounds in large chemical series but free energy calculations by molecular dynamics simulations seem to be superior for designing novel derivatives. Importantly, the potential complementarity and/or synergism of physics-based methods (e.g., force field-based simulation models) and data-hungry ML techniques is growing strongly. Current ML methods have evolved from decades of research. It is now necessary for biologists, physicists, and computer scientists to fully understand advantages and limitations of ML techniques to ensure that the complementarity of physics-based methods and ML tools can be fully exploited for drug design.
Affiliation(s)
- Amedeo Caflisch
  - Department of Biochemistry, University of Zurich, Winterthurerstrasse 190, 8057 Zürich, Switzerland
3
Yang D, Kuang L, Hu A. Edge-enhanced interaction graph network for protein-ligand binding affinity prediction. PLoS One 2025; 20:e0320465. [PMID: 40198678; PMCID: PMC11977954; DOI: 10.1371/journal.pone.0320465] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2024] [Accepted: 02/18/2025] [Indexed: 04/10/2025] Open
Abstract
Protein-ligand interactions are crucial in drug discovery. Accurately predicting protein-ligand binding affinity is essential for screening potential drugs. Graph neural networks have proven highly effective in modeling intermolecular spatial relationships and three-dimensional structures. In this paper, we introduce a graph neural network-based model named EIGN to predict protein-ligand binding affinity. The model consists of three main components: the normalized adaptive encoder, the molecular information propagation module, and the output module. Experimental results indicate that EIGN achieves a root mean squared error of 1.126 and a Pearson correlation coefficient of 0.861 on CASF-2016. Additionally, our model outperforms state-of-the-art methods on CASF-2013, CASF-2016, and the CSAR-NRC set, showing exceptional accuracy and robust generalization ability. To further validate the effectiveness of EIGN, we conducted several experiments, including ablation studies, feature importance analysis, data similarity analysis, and others, to evaluate its performance and applicability.
Affiliation(s)
- An Hu
  - Xiangtan University, Xiangtan, Hunan, China
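The abstract of entry 3 reports a root mean squared error and a Pearson correlation coefficient on CASF-2016. As a reminder of what those two numbers measure, here is a minimal sketch using only the standard library. The affinity values below are invented toy data, not results from the paper.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between measured and predicted values."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Hypothetical binding affinities (e.g. pK units); values are made up.
measured = [4.0, 5.0, 6.0, 7.0]
predicted = [4.5, 4.5, 6.5, 6.5]

error = rmse(measured, predicted)
correlation = pearson(measured, predicted)
print(error, correlation)
```

RMSE is in the units of the affinity scale and penalizes absolute deviations, while the Pearson coefficient only measures linear agreement, so the two are usually reported together.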
4
Barkan E, Siddiqui I, Cheng KJ, Golts A, Shoshan Y, Weber JK, Campos Mota Y, Ozery-Flato M, Sautto GA. Leveraging large language models to predict antibody biological activity against influenza A hemagglutinin. Comput Struct Biotechnol J 2025; 27:1286-1295. [PMID: 40230408; PMCID: PMC11995015; DOI: 10.1016/j.csbj.2025.03.038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2025] [Revised: 03/19/2025] [Accepted: 03/21/2025] [Indexed: 04/16/2025] Open
Abstract
Monoclonal antibodies (mAbs) represent one of the most prevalent FDA-approved treatments for autoimmune diseases, infectious diseases, and cancer. However, their discovery and development remain a time-consuming and costly process. Recent advancements in machine learning (ML) and artificial intelligence (AI) have shown significant promise in revolutionizing the antibody discovery field. Models that predict antibody biological activity enable in silico evaluation of binding and functional properties; such models can prioritize antibodies with the highest likelihood of success in laboratory testing procedures. We explore an AI model for predicting the binding and receptor blocking activity of antibodies against influenza A hemagglutinin (HA) antigens. Our model is developed with the Molecular Aligned Multi-Modal Architecture and Language (MAMMAL) framework for biologics discovery to predict antibody-antigen interactions using only sequence information. To evaluate the model's performance, we tested it under various data split conditions to mimic real-world scenarios. Our model achieved an area under the receiver operating characteristic (AUROC) score of ≥ 0.91 for predicting the activity of existing antibodies against seen HAs and an AUROC score of 0.9 for unseen HAs. For novel antibody activity prediction, the AUROC was 0.73, which further declined to 0.63-0.66 under stringent constraints on similarity to existing antibodies. These results demonstrate the potential of AI foundation models to transform antibody design by reducing dependence on extensive laboratory testing and enabling more efficient prioritization of antibody candidates. Moreover, our findings emphasize the critical importance of diverse and comprehensive antibody datasets to improve the generalization of prediction models, particularly for novel antibody development.
Affiliation(s)
- Kevin J. Cheng
  - IBM TJ Watson Research Center, Yorktown Heights, NY, USA
- Yailin Campos Mota
  - Florida Research and Innovation Center, Cleveland Clinic, Port St. Lucie, FL, USA
- Giuseppe A. Sautto
  - Florida Research and Innovation Center, Cleveland Clinic, Port St. Lucie, FL, USA
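Entry 4 reports its results as AUROC scores. A minimal sketch of how an AUROC can be computed from labels and scores is given below, using the pairwise-comparison definition (the fraction of positive/negative pairs in which the positive example receives the higher score, with ties counted as half). The labels and scores are invented toy values, and the function name `auroc` is chosen for this example only.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = 0.0
    for pos_score in positives:
        for neg_score in negatives:
            if pos_score > neg_score:
                wins += 1.0      # positive ranked above negative
            elif pos_score == neg_score:
                wins += 0.5      # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical activity labels (1 = active) and model scores; values are made up.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

area = auroc(labels, scores)
print(area)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives context for the drop from about 0.9 to 0.63-0.66 described in the abstract. The quadratic loop is fine for a sketch; library implementations use a sort-based equivalent.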
5
Su Q, Wang J, Gou Q, Hu R, Jiang L, Zhang H, Wang T, Liu Y, Shen C, Kang Y, Hsieh CY, Hou T. Robust protein-ligand interaction modeling through integrating physical laws and geometric knowledge for absolute binding free energy calculation. Chem Sci 2025; 16:5043-5057. [PMID: 40007661; PMCID: PMC11848741; DOI: 10.1039/d4sc07405j] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2024] [Accepted: 02/15/2025] [Indexed: 02/27/2025] Open
Abstract
Accurate estimation of protein-ligand (PL) binding free energies is a crucial task in medicinal chemistry and a critical measure of PL interaction modeling effectiveness. However, traditional computational methods are often computationally expensive and prone to errors. Recently, deep learning (DL)-based approaches for predicting PL interactions have gained enormous attention, but their accuracy and generalizability are hindered by data scarcity. In this study, we propose LumiNet, a versatile PL interaction modeling framework that bridges the gap between physics-based models and black-box algorithms. LumiNet utilizes a subgraph transformer to extract multiscale information from molecular graphs and employs geometric neural networks to integrate PL information, mapping atomic pair structures into key physical parameters of non-bonded interactions in classical force fields, thereby enhancing accurate absolute binding free energy (ABFE) calculations. LumiNet is designed to be highly interpretable, offering detailed insights into atomic interactions within protein-ligand complexes, pinpointing relatively important atom pairs or groups. Our semi-supervised learning strategy enables LumiNet to adapt to new targets with fewer data points than other data-driven methods, making it more relevant for real-world drug discovery. Benchmarks show that LumiNet outperforms the current state-of-the-art model by 18.5% on the PDE10A dataset, and rivals the FEP+ method in some tests with a speed improvement of several orders of magnitude. We applied LumiNet in the scaffold hopping process, which accurately guided the discovery of the optimal ligands. Furthermore, we provide a web service for the research community to test LumiNet. The visualization of predicted inter-molecular energy contributions is expected to provide practical value in drug discovery projects.
Affiliation(s)
- Qun Su
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Jike Wang
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Qiaolin Gou
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Renling Hu
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Linlong Jiang
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Hui Zhang
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Tianyue Wang
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Yifei Liu
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Chao Shen
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Yu Kang
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Chang-Yu Hsieh
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Tingjun Hou
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
6
Ocana A, Pandiella A, Privat C, Bravo I, Luengo-Oroz M, Amir E, Gyorffy B. Integrating artificial intelligence in drug discovery and early drug development: a transformative approach. Biomark Res 2025; 13:45. [PMID: 40087789; PMCID: PMC11909971; DOI: 10.1186/s40364-025-00758-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/03/2024] [Accepted: 03/05/2025] [Indexed: 03/17/2025] Open
Abstract
Artificial intelligence (AI) can transform drug discovery and early drug development by addressing inefficiencies in traditional methods, which often face high costs, long timelines, and low success rates. In this review we provide an overview of how to integrate AI to the current drug discovery and development process, as it can enhance activities like target identification, drug discovery, and early clinical development. Through multiomics data analysis and network-based approaches, AI can help to identify novel oncogenic vulnerabilities and key therapeutic targets. AI models, such as AlphaFold, predict protein structures with high accuracy, aiding druggability assessments and structure-based drug design. AI also facilitates virtual screening and de novo drug design, creating optimized molecular structures for specific biological properties. In early clinical development, AI supports patient recruitment by analyzing electronic health records and improves trial design through predictive modeling, protocol optimization, and adaptive strategies. Innovations like synthetic control arms and digital twins can reduce logistical and ethical challenges by simulating outcomes using real-world or virtual patient data. Despite these advancements, limitations remain. AI models may be biased if trained on unrepresentative datasets, and reliance on historical or synthetic data can lead to overfitting or lack generalizability. Ethical and regulatory issues, such as data privacy, also challenge the implementation of AI. In conclusion, in this review we provide a comprehensive overview about how to integrate AI into current processes. These efforts, although they will demand collaboration between professionals, and robust data quality, have a transformative potential to accelerate drug development.
Affiliation(s)
- Alberto Ocana
  - Experimental Therapeutics in Cancer Unit, Medical Oncology Department, Instituto de Investigación Sanitaria San Carlos (IdISSC), Hospital Clínico San Carlos and CIBERONC, Madrid, Spain
  - INTHEOS-CEU-START Catedra, Facultad de Medicina, Universidad CEU San Pablo, 28668 Boadilla del Monte, Madrid, Spain
- Atanasio Pandiella
  - Instituto de Biología Molecular y Celular del Cáncer, CSIC, IBSAL and CIBERONC, Salamanca, 37007, Spain
- Cristian Privat
  - CancerAppy, Av Ribera de Axpe, 28, Erando, 48950, Vizcaya, Spain
- Iván Bravo
  - Facultad de Farmacia, Universidad de Castilla La Mancha, Albacete, Spain
- Eitan Amir
  - Princess Margaret Cancer Center, Toronto, Canada
- Balazs Gyorffy
  - Department of Bioinformatics, Semmelweis University, Tűzoltó U. 7-9, Budapest, 1094, Hungary
  - Research Centre for Natural Sciences, Hungarian Research Network, Magyar Tudosok Korutja 2, Budapest, 1117, Hungary
  - Department of Biophysics, Medical School, University of Pecs, Pecs, 7624, Hungary
7
Debnath K, Rana P, Ghosh P. GramSeq-DTA: A Grammar-Based Drug-Target Affinity Prediction Approach Fusing Gene Expression Information. Biomolecules 2025; 15:405. [PMID: 40149941; PMCID: PMC11940521; DOI: 10.3390/biom15030405] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2025] [Revised: 03/07/2025] [Accepted: 03/10/2025] [Indexed: 03/29/2025] Open
Abstract
Drug-target affinity (DTA) prediction is a critical aspect of drug discovery. The meaningful representation of drugs and targets is crucial for accurate prediction. Using 1D string-based representations for drugs and targets is a common approach that has demonstrated good results in drug-target affinity prediction. However, this approach lacks information on the relative position of the atoms and bonds. To address this limitation, graph-based representations have been used to some extent. However, solely considering the structural aspect of drugs and targets may be insufficient for accurate DTA prediction. Integrating the functional aspect of these drugs at the genetic level can enhance the prediction capability of the models. To fill this gap, we propose GramSeq-DTA, which integrates chemical perturbation information with the structural information of drugs and targets. We applied a Grammar Variational Autoencoder (GVAE) for drug feature extraction and utilized two different approaches for protein feature extraction: a Convolutional Neural Network (CNN) and a Recurrent Neural Network (RNN). The chemical perturbation data are obtained from the L1000 project, which provides information on the up-regulation and down-regulation of genes caused by selected drugs. This chemical perturbation information is processed, and a compact dataset is prepared, serving as the functional feature set of the drugs. By integrating the drug, gene, and target features in the model, our approach outperforms the current state-of-the-art DTA prediction models when validated on widely used DTA datasets (BindingDB, Davis, and KIBA). This work provides a novel and practical approach to DTA prediction by merging the structural and functional aspects of biological entities, and it encourages further research in multi-modal DTA prediction.
Affiliation(s)
- Kusal Debnath
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
- Pratip Rana
  - Department of Computer Science, Old Dominion University, Norfolk, VA 23529, USA
- Preetam Ghosh
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
8
Valsson Í, Warren MT, Deane CM, Magarkar A, Morris GM, Biggin PC. Narrowing the gap between machine learning scoring functions and free energy perturbation using augmented data. Commun Chem 2025; 8:41. [PMID: 39922899; PMCID: PMC11807228; DOI: 10.1038/s42004-025-01428-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/06/2025] [Accepted: 01/23/2025] [Indexed: 02/10/2025] Open
Abstract
Machine learning offers great promise for fast and accurate binding affinity predictions. However, current models lack robust evaluation and fail on tasks encountered in (hit-to-) lead optimisation, such as ranking the binding affinity of a congeneric series of ligands, thereby limiting their application in drug discovery. Here, we address these issues by first introducing a novel attention-based graph neural network model called AEV-PLIG (atomic environment vector-protein ligand interaction graph). Second, we introduce a new and more realistic out-of-distribution test set called the OOD Test. We benchmark our model on this set, CASF-2016, and a test set used for free energy perturbation (FEP) calculations, that not only highlights the competitive performance of AEV-PLIG, but provides a realistic assessment of machine learning models with rigorous physics-based approaches. Moreover, we demonstrate how leveraging augmented data (generated using template-based modelling or molecular docking) can significantly improve binding affinity prediction correlation and ranking on the FEP benchmark (weighted mean PCC and Kendall's τ increases from 0.41 and 0.26 to 0.59 and 0.42). These strategies together are closing the performance gap with FEP calculations (FEP+ achieves weighted mean PCC and Kendall's τ of 0.68 and 0.49 on the FEP benchmark) while being ~400,000 times faster.
Affiliation(s)
- Ísak Valsson
  - Oxford Protein Informatics Group, Department of Statistics, University of Oxford, Oxford, UK
- Matthew T Warren
  - Structural Bioinformatics and Computational Biochemistry, Department of Biochemistry, University of Oxford, Oxford, UK
- Charlotte M Deane
  - Oxford Protein Informatics Group, Department of Statistics, University of Oxford, Oxford, UK
- Aniket Magarkar
  - Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach an der Riß, Germany
- Garrett M Morris
  - Oxford Protein Informatics Group, Department of Statistics, University of Oxford, Oxford, UK
- Philip C Biggin
  - Structural Bioinformatics and Computational Biochemistry, Department of Biochemistry, University of Oxford, Oxford, UK
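Entry 8 summarizes ranking quality on the FEP benchmark with a weighted mean Kendall's τ across targets. The sketch below shows one way such a summary could be computed. It assumes that each target is weighted by its number of ligands, which the abstract does not state, and it uses the simple tau-a variant (tied pairs count as neither concordant nor discordant). All numbers are invented toy data.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(list(zip(x, y)), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


def weighted_mean(values, weights):
    """Mean of values, each weighted by the matching entry of weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical per-target series: (experimental affinities, predicted affinities).
target_a = ([1, 2, 3, 4], [1, 3, 2, 4])   # one swapped pair
target_b = ([1, 2, 3], [1, 2, 3])         # perfectly ranked

tau_a = kendall_tau(*target_a)
tau_b = kendall_tau(*target_b)

# Assumption: weight each target by its number of ligands.
overall = weighted_mean([tau_a, tau_b], [len(target_a[0]), len(target_b[0])])
print(tau_a, tau_b, overall)
```

Computing the correlation per target and then averaging, rather than pooling all ligands, keeps large differences between targets from inflating the apparent ranking ability within a congeneric series.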
9
Durant G, Boyles F, Birchall K, Marsden B, Deane CM. Robustly interrogating machine learning-based scoring functions: what are they learning? Bioinformatics 2025; 41:btaf040. [PMID: 39874452; PMCID: PMC11821266; DOI: 10.1093/bioinformatics/btaf040] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2023] [Revised: 07/08/2024] [Accepted: 01/24/2025] [Indexed: 01/30/2025] Open
Abstract
MOTIVATION: Machine learning-based scoring functions (MLBSFs) have been found to exhibit inconsistent performance on different benchmarks and be prone to learning dataset bias. For the field to develop MLBSFs that learn a generalizable understanding of physics, a more rigorous understanding of how they perform is required. RESULTS: In this work, we compared the performance of a diverse set of popular MLBSFs (RFScore, SIGN, OnionNet-2, Pafnucy, and PointVS) to our proposed baseline models that can only learn dataset biases on a range of benchmarks. We found that these baseline models were competitive in accuracy to these MLBSFs in almost all proposed benchmarks, indicating these models only learn dataset biases. Our tests and provided platform, ToolBoxSF, will enable researchers to robustly interrogate MLBSF performance and determine the effect of dataset biases on their predictions. AVAILABILITY AND IMPLEMENTATION: https://github.com/guydurant/toolboxsf.
Affiliation(s)
- Guy Durant
  - Department of Statistics, University of Oxford, St Giles', Oxford OX1 3LB, United Kingdom
- Fergus Boyles
  - Department of Statistics, University of Oxford, St Giles', Oxford OX1 3LB, United Kingdom
- Brian Marsden
  - Nuffield Department of Medicine, University of Oxford, Oxford OX3 7BN, United Kingdom
- Charlotte M Deane
  - Department of Statistics, University of Oxford, St Giles', Oxford OX1 3LB, United Kingdom
10
Crusius D, Cipcigan F, Biggin PC. Are we fitting data or noise? Analysing the predictive power of commonly used datasets in drug-, materials-, and molecular-discovery. Faraday Discuss 2025; 256:304-321. [PMID: 39308206; DOI: 10.1039/d4fd00091a] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/11/2024]
Abstract
Data-driven techniques for establishing quantitative structure property relations are a pillar of modern materials and molecular discovery. Fuelled by the recent progress in deep learning methodology and the abundance of new algorithms, it is tempting to chase benchmarks and incrementally build ever more capable machine learning (ML) models. While model evaluation has made significant progress, the intrinsic limitations arising from the underlying experimental data are often overlooked. In the chemical sciences data collection is costly, thus datasets are small and experimental errors can be significant. These limitations of such datasets affect their predictive power, a fact that is rarely considered in a quantitative way. In this study, we analyse commonly used ML datasets for regression and classification from drug discovery, molecular discovery, and materials discovery. We derived maximum and realistic performance bounds for nine such datasets by introducing noise based on estimated or actual experimental errors. We then compared the estimated performance bounds to the reported performance of leading ML models in the literature. Out of the nine datasets and corresponding ML models considered, four were identified to have reached or surpassed dataset performance limitations and thus, they may potentially be fitting noise. More generally, we systematically examine how data range, the magnitude of experimental error, and the number of data points influence dataset performance bounds. Alongside this paper, we release the Python package NoiseEstimator and provide a web-based application for computing realistic performance bounds. This study and the resulting tools will help practitioners in the field understand the limitations of datasets and set realistic expectations for ML model performance. This work stands as a reference point, offering analysis and tools to guide development of future ML models in the chemical sciences.
Affiliation(s)
- Daniel Crusius
  - Department of Biochemistry, University of Oxford, South Parks Road, Oxford OX1 3QU, UK
- Flaviu Cipcigan
  - IBM Research Europe, The Hartree Centre STFC Laboratory, Sci-Tech Daresbury, Warrington WA4 4AD, UK
- Philip C Biggin
  - Department of Biochemistry, University of Oxford, South Parks Road, Oxford OX1 3QU, UK
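The general idea in entry 10, estimating a performance bound by adding noise that mimics experimental error, can be illustrated with a short simulation. This is a simplified sketch of the concept only. It does not use or reproduce the API of the NoiseEstimator package mentioned in the abstract, and the function name `max_pearson_bound`, the Gaussian noise model, and the toy labels are assumptions made for this example.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def max_pearson_bound(labels, sigma, n_repeats=200, seed=0):
    """Average correlation between labels and noisy copies of themselves.

    Assumption: experimental error is Gaussian with standard deviation sigma.
    Even a perfect model would be scored against noisy measurements, so this
    average approximates the best correlation one can expect to observe.
    """
    rng = random.Random(seed)
    correlations = []
    for _ in range(n_repeats):
        noisy = [value + rng.gauss(0.0, sigma) for value in labels]
        correlations.append(pearson(labels, noisy))
    return sum(correlations) / len(correlations)


# Hypothetical dataset: 31 labels spread evenly over 4.0 to 10.0.
labels = [4.0 + 0.2 * i for i in range(31)]

bound_small_error = max_pearson_bound(labels, sigma=0.1)
bound_large_error = max_pearson_bound(labels, sigma=1.0)
print(bound_small_error, bound_large_error)
```

The simulation reflects the dependence described in the abstract: for a fixed data range, a larger experimental error lowers the achievable correlation, so a reported model score above such a bound would suggest the model may be fitting noise.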
11
Zhang Y, Vitalis A. Benchmarking the robustness of the correct identification of flexible 3D objects using common machine learning models. Patterns (N Y) 2025; 6:101147. [PMID: 39896260; PMCID: PMC11783895; DOI: 10.1016/j.patter.2024.101147] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/23/2024] [Revised: 09/09/2024] [Accepted: 12/10/2024] [Indexed: 02/04/2025]
Abstract
True three-dimensional (3D) data are prevalent in domains such as molecular science or computer vision. In these data, machine learning models are often asked to identify objects subject to intrinsic flexibility. Our study introduces two datasets from molecular science to assess the classification robustness of common model/feature combinations. Molecules are flexible, and shapes alone offer intra-class heterogeneities that yield a high risk for confusions. By blocking training and test sets to reduce overlap, we establish a baseline requiring the trained models to abstract from shape. As training data coverage grows, all tested architectures perform better on unseen data with reduced overfitting. Empirically, 2D embeddings of voxelized data produced the best-performing models. Evidently, both featurization and task-appropriate model design are of continued importance, the latter point reinforced by comparisons to recent, more specialized models. Finally, we show that the shape abstraction learned from database samples extends to samples that are evolving explicitly in time.
Affiliation(s)
- Yang Zhang
  - Department of Biochemistry, University of Zurich, 8057 Zurich, Switzerland
- Andreas Vitalis
  - Department of Biochemistry, University of Zurich, 8057 Zurich, Switzerland
12
Meng J, Zhang L, He Z, Hu M, Liu J, Bao W, Tian Q, Feng H, Liu H. Development of a machine learning-based target-specific scoring function for structure-based binding affinity prediction for human dihydroorotate dehydrogenase inhibitors. J Comput Chem 2025; 46:e27510. [PMID: 39325045; DOI: 10.1002/jcc.27510] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2024] [Revised: 08/21/2024] [Accepted: 09/11/2024] [Indexed: 09/27/2024]
Abstract
Human dihydroorotate dehydrogenase (hDHODH) is a flavin mononucleotide-dependent enzyme that can limit de novo pyrimidine synthesis, making it a therapeutic target for diseases such as autoimmune disorders and cancer. In this study, using the docking structures of complexes generated by AutoDock Vina, we integrate interaction features and ligand features, and employ support vector regression to develop a target-specific scoring function for hDHODH (TSSF-hDHODH). The Pearson correlation coefficient values of TSSF-hDHODH in the cross-validation and external validation are 0.86 and 0.74, respectively, both of which are far superior to those of classic scoring function AutoDock Vina and random forest (RF) based generic scoring function RF-Score. TSSF-hDHODH is further used for the virtual screening of potential inhibitors in the FDA-Approved & Pharmacopeia Drug Library. In conjunction with the results from molecular dynamics simulations, crizotinib is identified as a candidate for subsequent structural optimization. This study can be useful for the discovery of hDHODH inhibitors and the development of scoring functions for additional targets.
Collapse
Affiliation(s)
- Jinhui Meng
- School of Life Science, Liaoning University, Shenyang, Liaoning, China
| | - Li Zhang
- School of Life Science, Liaoning University, Shenyang, Liaoning, China
- Liaoning Provincial Key Laboratory of Computational Simulation and Information Processing of Biomacromolecules, Liaoning University, Shenyang, Liaoning, China
- Engineering Laboratory for Molecular Simulation and Designing of Drug Molecules of Liaoning, Liaoning University, Shenyang, Liaoning, China
| | - Zhe He
- School of Life Science, Liaoning University, Shenyang, Liaoning, China
| | - Mengfeng Hu
- School of Life Science, Liaoning University, Shenyang, Liaoning, China
| | - Jinhan Liu
- School of Life Science, Liaoning University, Shenyang, Liaoning, China
| | - Wenzhuo Bao
- School of Life Science, Liaoning University, Shenyang, Liaoning, China
| | - Qifeng Tian
- School of Life Science, Liaoning University, Shenyang, Liaoning, China
| | - Huawei Feng
- School of Pharmacy, Liaoning University, Shenyang, Liaoning, China
| | - Hongsheng Liu
- Liaoning Provincial Key Laboratory of Computational Simulation and Information Processing of Biomacromolecules, Liaoning University, Shenyang, Liaoning, China
- Engineering Laboratory for Molecular Simulation and Designing of Drug Molecules of Liaoning, Liaoning University, Shenyang, Liaoning, China
- School of Pharmacy, Liaoning University, Shenyang, Liaoning, China
| |
|
13
|
Fasoulis R, Paliouras G, Kavraki LE. RankMHC: Learning to Rank Class-I Peptide-MHC Structural Models. J Chem Inf Model 2024; 64:8729-8742. [PMID: 39555889 PMCID: PMC11633655 DOI: 10.1021/acs.jcim.4c01278] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2024] [Revised: 10/16/2024] [Accepted: 11/07/2024] [Indexed: 11/19/2024]
Abstract
The binding of peptides to class-I Major Histocompatibility Complex (MHC) receptors and their subsequent recognition downstream by T-cell receptors are crucial processes for most multicellular organisms to be able to fight various diseases. Thus, the identification of peptide antigens that can elicit an immune response is of immense importance for developing successful therapies for bacterial and viral infections, and even cancer. Recently, studies have demonstrated the importance of peptide-MHC (pMHC) structural analysis, with pMHC structural modeling methods gradually becoming more popular in peptide antigen identification workflows. Most of the pMHC structural modeling tools provide an ensemble of candidate peptide poses in the MHC-I cleft, each associated with a score stemming from a scoring function, with the top scoring pose assumed to be the most representative of the ensemble. However, identifying the binding mode, that is, the peptide pose from the ensemble that is closer to an unavailable native structure, is not trivial. Oftentimes, the peptide poses characterized as best by a protein-ligand scoring function are not the ones that are the most representative of the actual structure. In this work, we frame the peptide binding pose identification problem as a Learning-to-Rank (LTR) problem. We present RankMHC, an LTR-based pMHC binding mode identification predictor, which is specifically trained to predict the most accurate ranking of an ensemble of pMHC conformations. RankMHC outperforms classical peptide-ligand scoring functions, as well as previous Machine Learning (ML)-based binding pose predictors. We further demonstrate that RankMHC can be used with many pMHC structural modeling tools that use different structural modeling protocols.
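To illustrate what framing pose identification as a learning-to-rank problem can look like, the sketch below builds pairwise preferences from pose RMSDs and evaluates a pairwise hinge loss for a set of scores. This is a generic pairwise LTR formulation chosen for illustration; the abstract does not state which LTR objective RankMHC uses, and all names and numbers here are hypothetical.

```python
from itertools import combinations

def preference_pairs(rmsds):
    """Return (better, worse) index pairs; a lower RMSD to native is preferred."""
    pairs = []
    for i, j in combinations(range(len(rmsds)), 2):
        if rmsds[i] < rmsds[j]:
            pairs.append((i, j))
        elif rmsds[j] < rmsds[i]:
            pairs.append((j, i))
        # Ties carry no preference and are skipped.
    return pairs

def pairwise_hinge_loss(scores, pairs, margin=1.0):
    """Mean hinge loss over pairs; zero when every preferred pose
    outscores its partner by at least `margin`."""
    if not pairs:
        return 0.0
    total = sum(max(0.0, margin - (scores[b] - scores[w])) for b, w in pairs)
    return total / len(pairs)

# Hypothetical ensemble of three poses for one peptide-MHC complex.
rmsds = [0.8, 2.5, 4.1]          # RMSD of each pose to the native structure (Å)
pairs = preference_pairs(rmsds)
good_scores = [3.0, 1.5, 0.0]    # ranks poses in the same order as the RMSDs
bad_scores = [0.0, 1.5, 3.0]     # ranks poses in the opposite order
loss_good = pairwise_hinge_loss(good_scores, pairs)
loss_bad = pairwise_hinge_loss(bad_scores, pairs)
```

A ranking model trained on such a loss is rewarded for ordering poses correctly within an ensemble rather than for predicting an absolute score, which matches the goal described in the abstract.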
Affiliation(s)
- Romanos Fasoulis
- Department of Computer Science, Rice University, Houston, Texas 77005, United States
| | - Georgios Paliouras
- Institute of Informatics and Telecommunications, NCSR Demokritos, Athens 15341, Greece
| | - Lydia E. Kavraki
- Department of Computer Science, Rice University, Houston, Texas 77005, United States
- Ken Kennedy Institute, Rice University, Houston, Texas 77005, United States
| |
|
14
|
Elalouf A, Rosenfeld AY, Maoz H. Targeting serotonin receptors with phytochemicals - an in-silico study. Sci Rep 2024; 14:30307. [PMID: 39638796 PMCID: PMC11621125 DOI: 10.1038/s41598-024-76329-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2024] [Accepted: 10/14/2024] [Indexed: 12/07/2024] Open
Abstract
The potential of natural phytochemicals in mitigating depression has been supported by substantial evidence. This study evaluated a total of 88 natural phytochemicals with potential antidepressant properties by targeting serotonin (5-HT) receptors (5-HT1A, 5-HT4, and 5-HT7) using molecular docking, ADMET (Absorption, Distribution, Metabolism, Excretion, and Toxicity) analysis, internal coordinates normal mode analysis (NMA), molecular dynamics simulation (MDS), and free energy calculation. Five evaluated compounds (Genistein, Kaempferol, Daidzein, Peonidin, and glycitein) exhibited favorable pharmacokinetic properties and improved binding scores, indicating their potential as effective antidepressants. Redocking and superimposition analysis of 5-HT with cocrystal structures validated these findings. Furthermore, NMA, MDS, and free energy calculations confirmed the stability and deformability of the ligand-receptor complexes, suggesting that these phytochemicals can effectively interact with 5-HT receptors to modulate depressive symptoms. These powerful phytochemicals, abundantly found in soybeans, fruits, vegetables, and herbs, represent a promising avenue for developing natural treatments for depression. Further in vitro and in vivo studies are warranted to explore their efficacy in alleviating stress and depression through their interactions with 5-HT receptors.
Affiliation(s)
- Amir Elalouf
- Department of Management, Bar-Ilan University, Ramat Gan, 5290002, Israel.
| | | | - Hanan Maoz
- Department of Management, Bar-Ilan University, Ramat Gan, 5290002, Israel
| |
|
15
|
Vittorio S, Lunghini F, Morerio P, Gadioli D, Orlandini S, Silva P, Martinovic J, Pedretti A, Bonanni D, Del Bue A, Palermo G, Vistoli G, Beccari AR. Addressing docking pose selection with structure-based deep learning: Recent advances, challenges and opportunities. Comput Struct Biotechnol J 2024; 23:2141-2151. [PMID: 38827235 PMCID: PMC11141151 DOI: 10.1016/j.csbj.2024.05.024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2024] [Revised: 05/15/2024] [Accepted: 05/15/2024] [Indexed: 06/04/2024] Open
Abstract
Molecular docking is a widely used technique in drug discovery to predict the binding mode of a given ligand to its target. However, the identification of the near-native binding pose in docking experiments still represents a challenging task as the scoring functions currently employed by docking programs are parametrized to predict the binding affinity, and, therefore, they often fail to correctly identify the ligand native binding conformation. Selecting the correct binding mode is crucial to obtaining meaningful results and to conveniently optimizing new hit compounds. Deep learning (DL) algorithms have been an area of growing interest in this sense for their capability to extract the relevant information directly from the protein-ligand structure. Our review aims to present the recent advances regarding the development of DL-based pose selection approaches, discussing limitations and possible future directions. Moreover, a comparison between the performances of some classical scoring functions and DL-based methods concerning their ability to select the correct binding mode is reported. In this regard, two novel DL-based pose selectors developed by us are presented.
Affiliation(s)
- Serena Vittorio
- Dipartimento di Scienze Farmaceutiche, Università degli Studi di Milano, Via Luigi Mangiagalli 25, I-20133 Milano, Italy
| | - Filippo Lunghini
- EXSCALATE, Dompé Farmaceutici SpA, Via Tommaso de Amicis 95, 80123 Naples, Italy
| | - Pietro Morerio
- Pattern Analysis and Computer Vision, Fondazione Istituto Italiano di Tecnologia, Via Morego, 30, 16163 Genova, Italy
| | - Davide Gadioli
- Dipartimento di Elettronica Informazione e Bioingegneria, Politecnico di Milano, Via Ponzio 34/5, I-20133 Milano, Italy
| | - Sergio Orlandini
- SCAI, SuperComputing Applications and Innovation Department, CINECA, Via dei Tizii 6, Rome 00185, Italy
| | - Paulo Silva
- IT4Innovations, VSB – Technical University of Ostrava, 17. listopadu 2172/15, 70800 Ostrava-Poruba, Czech Republic
| | - Jan Martinovic
- IT4Innovations, VSB – Technical University of Ostrava, 17. listopadu 2172/15, 70800 Ostrava-Poruba, Czech Republic
| | - Alessandro Pedretti
- Dipartimento di Scienze Farmaceutiche, Università degli Studi di Milano, Via Luigi Mangiagalli 25, I-20133 Milano, Italy
| | - Domenico Bonanni
- Department of Physical and Chemical Sciences, University of L′Aquila, via Vetoio, L′Aquila 67010, Italy
| | - Alessio Del Bue
- Pattern Analysis and Computer Vision, Fondazione Istituto Italiano di Tecnologia, Via Morego, 30, 16163 Genova, Italy
| | - Gianluca Palermo
- Dipartimento di Elettronica Informazione e Bioingegneria, Politecnico di Milano, Via Ponzio 34/5, I-20133 Milano, Italy
| | - Giulio Vistoli
- Dipartimento di Scienze Farmaceutiche, Università degli Studi di Milano, Via Luigi Mangiagalli 25, I-20133 Milano, Italy
| | - Andrea R. Beccari
- EXSCALATE, Dompé Farmaceutici SpA, Via Tommaso de Amicis 95, 80123 Naples, Italy
| |
|
16
|
Xu Y, Wang Q, Xu G, Xu Y, Mou Y. Screening, optimization, and ADMET evaluation of HCJ007 for pancreatic cancer treatment through active learning and dynamics simulation. Front Chem 2024; 12:1482758. [PMID: 39654652 PMCID: PMC11626003 DOI: 10.3389/fchem.2024.1482758] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2024] [Accepted: 11/12/2024] [Indexed: 12/12/2024] Open
Abstract
In this study, we leveraged a sophisticated active learning model to enhance virtual screening for SQLE inhibitors. The model's improved predictive accuracy identified compounds with significant advantages in binding affinity and thermodynamic stability. Detailed analyses, including molecular dynamics simulations and ADMET profiling, were conducted, particularly focusing on compound CMNPD11566 and its derivative HCJ007. CMNPD11566 showed stable interactions with SQLE, while HCJ007 exhibited improved binding stability and more frequent interactions with key residues, indicating enhanced dynamic adaptability and overall binding effectiveness. ADMET data comparison highlighted HCJ007's superior profile in terms of lower toxicity and better drug-likeness. Our findings suggest HCJ007 as a promising candidate for SQLE inhibition, with significant improvements over CMNPD11566 in various pharmacokinetic and safety parameters. The study underscores the efficacy of computational models in drug discovery and the importance of comprehensive preclinical evaluations.
Affiliation(s)
- YunYun Xu
- General Surgery, Cancer Center, Department of Gastrointestinal and Pancreatic Surgery, Zhejiang Provincial People’s Hospital (Affiliated People’s Hospital), Hangzhou Medical College, Hangzhou, Zhejiang, China
| | - Qiang Wang
- General Surgery, Tiantai People’s Hospital, Taizhou, Zhejiang, China
| | - GaoQiang Xu
- General Surgery, Tiantai People’s Hospital, Taizhou, Zhejiang, China
| | - YouJian Xu
- General Surgery, Tiantai People’s Hospital, Taizhou, Zhejiang, China
| | - YiPing Mou
- General Surgery, Cancer Center, Department of Gastrointestinal and Pancreatic Surgery, Zhejiang Provincial People’s Hospital (Affiliated People’s Hospital), Hangzhou Medical College, Hangzhou, Zhejiang, China
| |
|
17
|
Kamuntavičius G, Prat A, Paquet T, Bastas O, Aty HA, Sun Q, Andersen CB, Harman J, Siladi ME, Rines DR, Flatters SJL, Tal R, Norvaišas P. Accelerated hit identification with target evaluation, deep learning and automated labs: prospective validation in IRAK1. J Cheminform 2024; 16:127. [PMID: 39543721 PMCID: PMC11566907 DOI: 10.1186/s13321-024-00914-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/03/2023] [Accepted: 10/10/2024] [Indexed: 11/17/2024] Open
Abstract
BACKGROUND Target identification and hit identification can be transformed through the application of biomedical knowledge analysis, AI-driven virtual screening and robotic cloud lab systems. However, there are few prospective studies that evaluate the efficacy of such integrated approaches. RESULTS We synergistically integrate our in-house-developed target evaluation (SpectraView) and deep-learning-driven virtual screening (HydraScreen) tools with an automated robotic cloud lab designed explicitly for ultra-high-throughput screening, enabling us to validate these platforms experimentally. By employing our target evaluation tool to select IRAK1 as the focal point of our investigation, we prospectively validate our structure-based deep learning model. We can identify 23.8% of all IRAK1 hits within the top 1% of ranked compounds. The model outperforms traditional virtual screening techniques and offers advanced features such as ligand pose confidence scoring. Simultaneously, we identify three potent (nanomolar) scaffolds from our compound library, two of which represent novel candidates for IRAK1 and hold promise for future development. CONCLUSION This study provides compelling evidence for SpectraView and HydraScreen to provide a significant acceleration in the processes of target identification and hit discovery. By leveraging Ro5's HydraScreen and Strateos' automated labs in hit identification for IRAK1, we show how AI-driven virtual screening with HydraScreen could offer high hit discovery rates and reduce experimental costs. SCIENTIFIC CONTRIBUTION We present an innovative platform that leverages knowledge graph-based biomedical data analytics and AI-driven virtual screening integrated with robotic cloud labs. Through an unbiased, prospective evaluation we show the reliability and robustness of HydraScreen in virtual and high-throughput screening for hit identification in IRAK1. Our platforms and innovative tools can expedite the early stages of drug discovery.
Affiliation(s)
| | - Alvaro Prat
- AI Chemistry, Ro5, 2801 Gateway Drive, Irving, 75063, TX, USA.
| | - Tanya Paquet
- AI Chemistry, Ro5, 2801 Gateway Drive, Irving, 75063, TX, USA
| | - Orestis Bastas
- AI Chemistry, Ro5, 2801 Gateway Drive, Irving, 75063, TX, USA
| | | | - Qing Sun
- Strateos, 3565 Haven Ave Suite 3, Menlo Park, 94025, CA, USA
| | | | - John Harman
- Strateos, 3565 Haven Ave Suite 3, Menlo Park, 94025, CA, USA
| | - Marc E Siladi
- Strateos, 3565 Haven Ave Suite 3, Menlo Park, 94025, CA, USA
| | - Daniel R Rines
- Strateos, 3565 Haven Ave Suite 3, Menlo Park, 94025, CA, USA
| | | | - Roy Tal
- AI Chemistry, Ro5, 2801 Gateway Drive, Irving, 75063, TX, USA.
| | | |
|
18
|
Vural O, Jololian L, Pan L. DeepLigType: Predicting Ligand Types of Protein-Ligand Binding Sites Using a Deep Learning Model. IEEE/ACM Trans Comput Biol Bioinform 2024; PP:116-123. [PMID: 39509302 DOI: 10.1109/tcbb.2024.3493820] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2024]
Abstract
The analysis of protein-ligand binding sites plays a crucial role in the initial stages of drug discovery. Accurately predicting the ligand types that are likely to bind to protein-ligand binding sites enables more informed decision making in drug design. Our study, DeepLigType, determines protein-ligand binding sites using Fpocket and then predicts the ligand type of these pockets with the deep learning model, Convolutional Block Attention Module (CBAM) with ResNet. CBAM-ResNet has been trained to accurately predict five distinct ligand types. We classified protein-ligand binding sites into five different categories according to the type of response ligands cause when they bind to their target proteins, which are antagonist, agonist, activator, inhibitor, and others. We created a novel dataset, referred to as LigType5, from the widely recognized PDBbind and scPDB datasets for training and testing our model. While the literature mostly focuses on the specificity and characteristic analysis of protein binding sites by experimental (laboratory-based) methods, we propose a computational method with the DeepLigType architecture. DeepLigType demonstrated an accuracy of 74.30% and an AUC of 0.83 in ligand type prediction on a novel test dataset using the CBAM-ResNet deep learning model.
|
19
|
Hong Y, Ha J, Sim J, Lim CJ, Oh KS, Chandrasekaran R, Kim B, Choi J, Ko J, Shin WH, Lee J. Accurate prediction of protein-ligand interactions by combining physical energy functions and graph-neural networks. J Cheminform 2024; 16:121. [PMID: 39497201 PMCID: PMC11536843 DOI: 10.1186/s13321-024-00912-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/22/2024] [Accepted: 10/07/2024] [Indexed: 11/07/2024] Open
Abstract
We introduce an advanced model for predicting protein-ligand interactions. Our approach combines the strengths of graph neural networks with physics-based scoring methods. Existing structure-based machine-learning models for protein-ligand binding prediction often fall short in practical virtual screening scenarios, hindered by the intricacies of binding poses, the chemical diversity of drug-like molecules, and the scarcity of crystallographic data for protein-ligand complexes. To overcome the limitations of existing machine learning-based prediction models, we propose a novel approach that fuses three independent neural network models. One classification model is designed to perform binary prediction of a given protein-ligand complex pose. The other two regression models are trained to predict the binding affinity and root-mean-square deviation of a ligand conformation from an input complex structure. We trained the model to account for both deviations in experimental and predicted binding affinities and pose prediction uncertainties. By effectively integrating the outputs of the triplet neural networks with a physics-based scoring function, our model showed a significantly improved performance in hit identification. The benchmark results with three independent decoy sets demonstrate that our model outperformed existing models in forward screening. Our model achieved top 1% enrichment factors of 32.7 and 23.1 with the CASF2016 and DUD-E benchmark sets, respectively. The benchmark results using the LIT-PCBA set further confirmed its higher average enrichment factors, emphasizing the model's efficiency and generalizability. 
The model's efficiency was further validated by identifying 23 active compounds from 63 candidates in experimental screening for autotaxin inhibitors, demonstrating its practical applicability in hit discovery. SCIENTIFIC CONTRIBUTION Our work introduces a novel training strategy for a protein-ligand binding affinity prediction model by integrating the outputs of three independent sub-models and utilizing expertly crafted decoy sets. The model showcases exceptional performance across multiple benchmarks. The high enrichment factors in the LIT-PCBA benchmark demonstrate its potential to accelerate hit discovery.
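The abstract above reports screening performance as top 1% enrichment factors. A minimal sketch of that metric, assuming the usual definition (hit rate in the top-ranked fraction divided by the hit rate in the whole library) and assuming that a higher score means a better-ranked compound; the function name and the synthetic library are illustrative only:

```python
def enrichment_factor(scores, is_active, fraction=0.01):
    """Hit rate among the top `fraction` of ranked compounds divided by
    the hit rate in the whole library (higher score = better rank)."""
    n = len(scores)
    total_actives = sum(is_active)
    if n == 0 or total_actives == 0:
        raise ValueError("need a non-empty library with at least one active")
    n_top = max(1, int(round(n * fraction)))
    # Indices of compounds sorted from best to worst score.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    hits_top = sum(is_active[i] for i in order[:n_top])
    return (hits_top / n_top) / (total_actives / n)

# Synthetic library: 1000 compounds, 10 actives, 5 of them ranked in the top 10.
scores = [1000 - i for i in range(1000)]
active_ids = {0, 1, 2, 3, 4, 500, 501, 502, 503, 504}
is_active = [1 if i in active_ids else 0 for i in range(1000)]
ef_1pct = enrichment_factor(scores, is_active, fraction=0.01)
```

Under this definition the maximum attainable value at 1% is 100 when the library contains at least that many actives, and a random ranking gives a value near 1.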
Affiliation(s)
- Yiyu Hong
- Arontier Co., 241, Gangnam-daero, Seocho-gu, Seoul, 06735, Republic of Korea
| | - Junsu Ha
- Arontier Co., 241, Gangnam-daero, Seocho-gu, Seoul, 06735, Republic of Korea
| | - Jaemin Sim
- Department of Molecular Medicine and Biopharmaceutical Sciences, Graduate School of Convergence Science and Technology, Seoul National University, Seoul, 08826, Republic of Korea
| | - Chae Jo Lim
- Data Convergence Drug Research Center, Korea Research Institute of Chemical Technology, Daejeon, 34114, Republic of Korea
| | - Kwang-Seok Oh
- Data Convergence Drug Research Center, Korea Research Institute of Chemical Technology, Daejeon, 34114, Republic of Korea
| | | | - Bomin Kim
- College of Pharmacy, Seoul National University, Seoul, 08826, Republic of Korea
| | - Jieun Choi
- College of Pharmacy, Seoul National University, Seoul, 08826, Republic of Korea
| | - Junsu Ko
- Arontier Co., 241, Gangnam-daero, Seocho-gu, Seoul, 06735, Republic of Korea.
| | - Woong-Hee Shin
- Arontier Co., 241, Gangnam-daero, Seocho-gu, Seoul, 06735, Republic of Korea.
- Department of Medicine, Korea University College of Medicine, Seoul, 02841, Republic of Korea.
| | - Juyong Lee
- Arontier Co., 241, Gangnam-daero, Seocho-gu, Seoul, 06735, Republic of Korea.
- Department of Molecular Medicine and Biopharmaceutical Sciences, Graduate School of Convergence Science and Technology, Seoul National University, Seoul, 08826, Republic of Korea.
- Research Institute of Pharmaceutical Science, College of Pharmacy, Seoul National University, Seoul, 08826, Republic of Korea.
- College of Pharmacy, Seoul National University, Seoul, 08826, Republic of Korea.
| |
|
20
|
Hu Q, Wang Z, Meng J, Li W, Guo J, Mu Y, Wang S, Zheng L, Wei Y. OpenDock: a pytorch-based open-source framework for protein-ligand docking and modelling. Bioinformatics 2024; 40:btae628. [PMID: 39432683 PMCID: PMC11552628 DOI: 10.1093/bioinformatics/btae628] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2024] [Revised: 09/19/2024] [Accepted: 10/19/2024] [Indexed: 10/23/2024] Open
Abstract
MOTIVATION Molecular docking is an invaluable computational tool with broad applications in computer-aided drug design and enzyme engineering. However, current molecular docking tools are typically implemented in languages such as C++ for calculation speed, which lack flexibility and user-friendliness for further development. Moreover, validating the effectiveness of external scoring functions for molecular docking and screening within these frameworks is challenging, and implementing more efficient sampling strategies is not straightforward. RESULTS To address these limitations, we have developed an open-source molecular docking framework, OpenDock, based on Python and PyTorch. This framework supports the integration of multiple scoring functions; some can be utilized during molecular docking and pose optimization, while others can be used for post-processing scoring. In terms of sampling, the current version of this framework supports simulated annealing and Monte Carlo optimization. Additionally, it can be extended to include methods such as genetic algorithms and particle swarm optimization for sampling docking poses and protein side chain orientations. Distance constraints are also implemented to enable covalent docking, restricted docking or distance map constraints guided pose sampling. Overall, this framework serves as a valuable tool in drug design and enzyme engineering, offering significant flexibility for most protein-ligand modelling tasks. AVAILABILITY AND IMPLEMENTATION OpenDock is publicly available at: https://github.com/guyuehuo/opendock.
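The abstract above names simulated annealing and Monte Carlo optimization as the supported sampling strategies. The sketch below shows the general idea on a toy one-dimensional problem: a Metropolis acceptance rule combined with a geometric cooling schedule. It is written in plain Python and does not use or imitate the OpenDock API; the energy function, parameter names and default values are all assumptions made for illustration.

```python
import math
import random

def toy_energy(x):
    """Hypothetical one-dimensional 'score' with its minimum at x = 2."""
    return (x - 2.0) ** 2

def simulated_annealing(energy, x0, n_steps=2000, step=0.5,
                        t_start=1.0, t_end=0.01, seed=0):
    """Minimize `energy` with Metropolis moves under a cooling temperature."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for k in range(n_steps):
        # Geometric cooling schedule from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (k / (n_steps - 1))
        x_new = x + rng.uniform(-step, step)
        e_new = energy(x_new)
        # Metropolis criterion: always accept downhill moves, and accept
        # uphill moves with probability exp(-(e_new - e) / t).
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / t):
            x, e = x_new, e_new
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e

best_x, best_e = simulated_annealing(toy_energy, x0=-5.0)
```

In a docking setting the single coordinate would be replaced by ligand translation, rotation and torsion variables, and the toy energy by a scoring function; the acceptance rule and cooling schedule stay the same in spirit.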
Affiliation(s)
- Qiuyue Hu
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518000, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Zechen Wang
- School of Physics, Shandong University, Jinan, 250100, China
| | - Jintao Meng
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518000, China
| | - Weifeng Li
- School of Physics, Shandong University, Jinan, 250100, China
| | - Jingjing Guo
- Centre in Artificial Intelligence Driven Drug Discovery, Faculty of Applied Sciences, Macao Polytechnic University, Macao SAR, 999078, China
| | - Yuguang Mu
- School of Biological Sciences, Nanyang Technological University, Singapore 637551, Singapore
| | - Sheng Wang
- Shanghai Zelixir Biotech Co. Ltd, Shanghai, 201203, China
| | | | - Yanjie Wei
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518000, China
| |
|
21
|
Hemant Kumar S, Venkatachalapathy M, Sistla R, Poongavanam V. Advances in molecular glues: exploring chemical space and design principles for targeted protein degradation. Drug Discov Today 2024; 29:104205. [PMID: 39393773 DOI: 10.1016/j.drudis.2024.104205] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2024] [Revised: 09/18/2024] [Accepted: 10/04/2024] [Indexed: 10/13/2024]
Abstract
The discovery of the E3 ligase cereblon (CRBN) as the target of thalidomide and its analogs revolutionized the field of targeted protein degradation (TPD). This ubiquitin-mediated degradation pathway was first harnessed by bivalent degraders. Recently, the emergence of low-molecular-weight molecular glue degraders (MGDs) has expanded the TPD landscape, because MGDs operate via the same mechanism while offering attractive physicochemical properties that are consistent with small-molecule therapeutics. This review delves into the discovery and advancement of MGDs, with case studies on cyclin K and the zinc finger protein IKZF2, highlighting the design principles, biological assays and therapeutic applications. Additionally, it examines the chemical space of molecular glues and outlines the collaborative efforts that are fueling innovation in this field.
Affiliation(s)
- S Hemant Kumar
- thinkMolecular Technologies Pvt. Ltd, Haralur, Bangalore, KA 560102, India
| | | | - Ramesh Sistla
- thinkMolecular Technologies Pvt. Ltd, Haralur, Bangalore, KA 560102, India.
| | | |
|
22
|
Li B, Tan K, Lao AR, Wang H, Zheng H, Zhang L. A comprehensive review of artificial intelligence for pharmacology research. Front Genet 2024; 15:1450529. [PMID: 39290983 PMCID: PMC11405247 DOI: 10.3389/fgene.2024.1450529] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/17/2024] [Accepted: 08/26/2024] [Indexed: 09/19/2024] Open
Abstract
With the innovation and advancement of artificial intelligence, more and more artificial intelligence techniques are employed in drug research, biomedical frontier research, and clinical medicine practice, especially in the field of pharmacology research. Thus, this review focuses on the applications of artificial intelligence in drug discovery, compound pharmacokinetic prediction, and clinical pharmacology. We briefly introduce the basic knowledge and development of artificial intelligence, present a comprehensive review, and then summarize the latest studies and discuss the strengths and limitations of artificial intelligence models. Additionally, we highlight several important studies and point out possible research directions.
Affiliation(s)
- Bing Li
- College of Computer Science, Sichuan University, Chengdu, China
| | - Kan Tan
- College of Computer Science, Sichuan University, Chengdu, China
| | - Angelyn R Lao
- Department of Mathematics and Statistics, De La Salle University, Manila, Philippines
| | - Haiying Wang
- School of Computing, Ulster University, Belfast, United Kingdom
| | - Huiru Zheng
- School of Computing, Ulster University, Belfast, United Kingdom
| | - Le Zhang
- College of Computer Science, Sichuan University, Chengdu, China
| |
|
23
|
Prat A, Abdel Aty H, Bastas O, Kamuntavičius G, Paquet T, Norvaišas P, Gasparotto P, Tal R. HydraScreen: A Generalizable Structure-Based Deep Learning Approach to Drug Discovery. J Chem Inf Model 2024; 64:5817-5831. [PMID: 39037942 DOI: 10.1021/acs.jcim.4c00481] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/24/2024]
Abstract
We propose HydraScreen, a deep-learning framework for safe and robust accelerated drug discovery. HydraScreen utilizes a state-of-the-art 3D convolutional neural network designed for the effective representation of molecular structures and interactions in protein-ligand binding. We designed an end-to-end pipeline for high-throughput screening and lead optimization, targeting applications in structure-based drug design. We assessed our approach using established public benchmarks based on the CASF-2016 core set, achieving top-tier results in affinity and pose prediction (Pearson's r = 0.86, RMSE = 1.15, Top-1 = 0.95). We introduced a novel approach for interaction profiling, aimed at detecting potential biases within both the model and data sets. This approach not only enhanced interpretability but also reinforced the impartiality of our methodology. Finally, we demonstrated HydraScreen's ability to generalize effectively across novel proteins and ligands through a temporal split. We also provide insights into potential avenues for future development aimed at enhancing the robustness of machine learning scoring functions. HydraScreen (accessible at http://hydrascreen.ro5.ai/paper) provides a user-friendly GUI and a public API, facilitating the easy-access assessment of protein-ligand complexes.
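The abstract above quotes an RMSE for affinity prediction and a Top-1 figure for pose prediction. As a rough sketch of how such numbers can be computed, the code below assumes that Top-1 means the fraction of complexes whose best-scored pose lies within 2 Å RMSD of the native pose, which is the convention commonly used in CASF-style docking power tests; the abstract itself does not spell out the definition, and the data below are invented.

```python
import math

def rmse(predicted, reference):
    """Root-mean-square error between predicted and reference values."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    sq = sum((p - r) ** 2 for p, r in zip(predicted, reference))
    return math.sqrt(sq / len(predicted))

def top1_success_rate(complexes, cutoff=2.0):
    """Fraction of complexes whose highest-scored pose has RMSD <= cutoff.

    `complexes` is a list with one entry per complex; each entry is a list
    of (score, rmsd_to_native) tuples, one per candidate pose.
    """
    successes = 0
    for poses in complexes:
        _, best_rmsd = max(poses, key=lambda pose: pose[0])
        if best_rmsd <= cutoff:
            successes += 1
    return successes / len(complexes)

# Hypothetical affinities (pK units) and pose ensembles.
affinity_error = rmse([6.0, 7.5, 5.0], [6.5, 7.0, 5.5])
pose_sets = [
    [(0.9, 1.2), (0.4, 5.0)],   # top-scored pose is near-native
    [(0.7, 3.5), (0.6, 1.0)],   # top-scored pose is a decoy
]
top1 = top1_success_rate(pose_sets)
```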
Affiliation(s)
- Alvaro Prat
- AI Chemistry, Ro5 2801 Gateway Drive, Irving, 75063 Texas, United States
| | - Hisham Abdel Aty
- AI Chemistry, Ro5 2801 Gateway Drive, Irving, 75063 Texas, United States
| | - Orestis Bastas
- AI Chemistry, Ro5 2801 Gateway Drive, Irving, 75063 Texas, United States
| | | | - Tanya Paquet
- AI Chemistry, Ro5 2801 Gateway Drive, Irving, 75063 Texas, United States
| | - Povilas Norvaišas
- AI Chemistry, Ro5 2801 Gateway Drive, Irving, 75063 Texas, United States
| | - Piero Gasparotto
- AI Chemistry, Ro5 2801 Gateway Drive, Irving, 75063 Texas, United States
| | - Roy Tal
- AI Chemistry, Ro5 2801 Gateway Drive, Irving, 75063 Texas, United States
| |
|
24
|
Wang Z, Zhou F, Wang Z, Hu Q, Li YQ, Wang S, Wei Y, Zheng L, Li W, Peng X. Fully Flexible Molecular Alignment Enables Accurate Ligand Structure Modeling. J Chem Inf Model 2024; 64:6205-6215. [PMID: 39074901 DOI: 10.1021/acs.jcim.4c00669] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/31/2024]
Abstract
Accurate protein-ligand binding poses are the prerequisites of structure-based binding affinity prediction and provide the structural basis for in-depth lead optimization in small molecule drug design. However, it is challenging to provide reasonable predictions of binding poses for different molecules due to the complexity and diversity of the chemical space of small molecules. Similarity-based molecular alignment techniques can effectively narrow the search range, as structurally similar molecules are likely to have similar binding modes, with higher similarity usually correlated to higher success rates. However, molecular similarity is not consistently high because molecules often require changes to achieve specific purposes, leading to reduced alignment precision. To address this issue, we propose a new alignment method, Z-align. This method uses topological structural information as a criterion for evaluating similarity, reducing the reliance on molecular fingerprint similarity. Our method has achieved success rates significantly higher than those of other methods at moderate levels of similarity. Additionally, our approach can comprehensively and flexibly optimize bond lengths and angles of molecules, maintaining a high accuracy even when dealing with larger molecules. Consequently, our proposed solution helps in achieving more accurate binding poses in protein-ligand docking problems, facilitating the development of small molecule drugs. Z-align is freely available as a web server at https://cloud.zelixir.com/zalign/home.
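For context on the molecular fingerprint similarity that the abstract above contrasts Z-align with, the sketch below computes the Tanimoto coefficient, a widely used fingerprint similarity measure. The abstract does not say which measure the authors compared against, and the bit sets here are made up; this is not part of Z-align itself.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto (Jaccard) similarity between two fingerprints given as
    collections of 'on' bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)

# Hypothetical 'on' bits for a reference ligand and a query ligand.
reference_bits = {1, 5, 9, 12}
query_bits = {1, 5, 7, 12, 20}
similarity = tanimoto(reference_bits, query_bits)
```

Two molecules that share a scaffold but differ in substituents can score only moderately on such a measure, which is the regime the abstract says the method targets.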
Affiliation(s)
- Zhihao Wang
- School of Physics, Shandong University, Jinan, 250100, China
| | - Fan Zhou
- Shanghai Zelixir Biotech, Shanghai, 200030, China
| | - Zechen Wang
- School of Physics, Shandong University, Jinan, 250100, China
| | - Qiuyue Hu
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
| | - Yong-Qiang Li
- School of Physics, Shandong University, Jinan, 250100, China
| | - Sheng Wang
- Shanghai Zelixir Biotech, Shanghai, 200030, China
| | - Yanjie Wei
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
| | - Liangzhen Zheng
- Shanghai Zelixir Biotech, Shanghai, 200030, China
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
| | - Weifeng Li
- School of Physics, Shandong University, Jinan, 250100, China
| | - Xiangda Peng
- Shanghai Zelixir Biotech, Shanghai, 200030, China
| |
|
25
|
Spassov DS. Binding Affinity Determination in Drug Design: Insights from Lock and Key, Induced Fit, Conformational Selection, and Inhibitor Trapping Models. Int J Mol Sci 2024; 25:7124. [PMID: 39000229 PMCID: PMC11240957 DOI: 10.3390/ijms25137124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2024] [Revised: 06/25/2024] [Accepted: 06/26/2024] [Indexed: 07/16/2024]
Abstract
Binding affinity is a fundamental parameter in drug design, describing the strength of the interaction between a molecule and its target protein. Accurately predicting binding affinity is crucial for the rapid development of novel therapeutics, the prioritization of promising candidates, and the optimization of their properties through rational design strategies. Binding affinity is determined by the mechanism of recognition between proteins and ligands. Various models, including the lock and key, induced fit, and conformational selection, have been proposed to explain this recognition process. However, current computational strategies to predict binding affinity, which are based on these models, have yet to produce satisfactory results. This article explores the connection between binding affinity and these protein-ligand interaction models, highlighting that they offer an incomplete picture of the mechanism governing binding affinity. Specifically, current models primarily center on the binding of the ligand and do not address its dissociation. In this context, the concept of ligand trapping is introduced, which models the mechanisms of dissociation. When combined with the current models, this concept can provide a unified theoretical framework that may allow for the accurate determination of the ligands' binding affinity.
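The abstract's point that dissociation matters can be illustrated with standard two-state binding thermodynamics, which the abstract itself does not spell out: the dissociation constant is Kd = k_off / k_on, and the standard binding free energy is ΔG = RT ln(Kd / 1 M). The sketch below is only an illustration under that simple model; the rate constants and function names are hypothetical and are not taken from the article.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def dissociation_constant(k_on, k_off):
    """Kd in mol/L from k_on (1/(M*s)) and k_off (1/s), two-state model."""
    return k_off / k_on


def binding_free_energy(kd, temperature=298.15):
    """Standard binding free energy in kcal/mol, dG = RT ln(Kd / 1 M)."""
    return R_KCAL * temperature * math.log(kd)


# Hypothetical ligands with the same association rate but different
# dissociation rates: slowing dissociation alone tightens binding.
kd_fast = dissociation_constant(k_on=1e6, k_off=1e-1)     # 1e-7 M
kd_trapped = dissociation_constant(k_on=1e6, k_off=1e-4)  # 1e-10 M

dg_fast = binding_free_energy(kd_fast)
dg_trapped = binding_free_energy(kd_trapped)
```

With identical k_on, the thousandfold slower dissociation of the second hypothetical ligand lowers Kd by the same factor, so models that describe only the association step cannot distinguish the two.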
Affiliation(s)
- Danislav S Spassov
- Drug Design and Bioinformatics Lab, Department of Chemistry, Faculty of Pharmacy, Medical University of Sofia, 1000 Sofia, Bulgaria
| |
|
26
|
Schmidt B, Hildebrandt A. From GPUs to AI and quantum: three waves of acceleration in bioinformatics. Drug Discov Today 2024; 29:103990. [PMID: 38663581 DOI: 10.1016/j.drudis.2024.103990] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2023] [Revised: 04/05/2024] [Accepted: 04/17/2024] [Indexed: 05/01/2024]
Abstract
The enormous growth in the amount of data generated by the life sciences is continuously shifting the field from model-driven science towards data-driven science. The need for efficient processing has led to the adoption of massively parallel accelerators such as graphics processing units (GPUs). Consequently, the development of bioinformatics methods nowadays often heavily depends on the effective use of these powerful technologies. Furthermore, progress in computational techniques and architectures continues to be highly dynamic, involving novel deep neural network models and artificial intelligence (AI) accelerators, and potentially quantum processing units in the future. These are expected to be disruptive for the life sciences as a whole and for drug discovery in particular. Here, we identify three waves of acceleration and their applications in a bioinformatics context: (i) GPU computing, (ii) AI and (iii) next-generation quantum computers.
Affiliation(s)
- Bertil Schmidt
- Institut für Informatik, Johannes Gutenberg University, Mainz, Germany.
| | | |
|
27
|
Nasser Binjawhar D, Abu Ali OA, Alqahtani AS, Fayad E, Abo-Bakr AM, Mekhael AM, Sadek FM. Powerful Approach for New Drugs as Antibacterial Agents via Molecular Docking and In Vitro Studies of Some New Cyclic Imides and Quinazoline-2,5-diones. ACS Omega 2024; 9:18566-18575. [PMID: 38680340 PMCID: PMC11044208 DOI: 10.1021/acsomega.4c01176] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/05/2024] [Revised: 03/19/2024] [Accepted: 03/27/2024] [Indexed: 05/01/2024]
Abstract
We generated eleven novel 1,2,3,6-tetrahydrophthalimide and tetrahydroquinazoline derivatives from 1,2,3,6-tetrahydrophthalic anhydride (1) in response to our interest in using the anhydrides to produce heterocyclic nitrogen compounds. The elemental and spectral analyses of the produced compounds validated the recommended configurations, and MOE 2014.09 (Molecular Operating Environment) computations were used to perform their in silico analysis. The synthesized compounds were evaluated with in vitro and in silico methods to assess their biological activity against Escherichia coli Penicillin-Binding Protein 3 (PBP3) and Staphylococcus aureus Penicillin-Binding Protein 2 (PBP2), with some of these compounds showing promising results as antibacterial drugs.
Affiliation(s)
- Dalal Nasser Binjawhar
- Department of Chemistry, College of Science, Princess Nourah Bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia
| | - Ola A. Abu Ali
- Department of Chemistry, College of Science, Taif University, P.O. Box 11099, Taif 21944, Saudi Arabia
| | - Arwa Sultan Alqahtani
- Department of Chemistry, College of Science, Imam Mohammad Ibn Saud Islamic University (IMSIU), P.O. Box 90950, Riyadh 11623, Saudi Arabia
| | - Eman Fayad
- Department of Biotechnology, College of Sciences, Taif University, P.O. Box 11099, Taif 21944, Saudi Arabia
| | - Ahmed M. Abo-Bakr
- Chemistry Department, Faculty of Science, South Valley University, P.O. Box 83523, Qena 83523, Egypt
| | - Antonous M. Mekhael
- Cotton Leaf Worm Department, Plant Protection Research Institute, Agriculture Research Center, P.O. Box 12619, Giza 12611, Egypt
| | - Fayza M. Sadek
- Radiation Sciences Department, Medical Research Institution, Alexandria University, P.O. Box 21500, Alexandria 5424041, Egypt
| |
|
28
|
Mqawass G, Popov P. graphLambda: Fusion Graph Neural Networks for Binding Affinity Prediction. J Chem Inf Model 2024; 64:2323-2330. [PMID: 38366974 DOI: 10.1021/acs.jcim.3c00771] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Indexed: 02/19/2024]
Abstract
Predicting the binding affinity of protein-ligand complexes is crucial for computer-aided drug discovery (CADD) and the identification of potential drug candidates. The deep learning-based scoring functions have emerged as promising predictors of binding constants. Building on recent advancements in graph neural networks, we present graphLambda for protein-ligand binding affinity prediction, which utilizes graph convolutional, attention, and isomorphism blocks to enhance the predictive capabilities. The graphLambda model exhibits superior performance across CASF16 and CSAR HiQ NRC benchmarks and demonstrates robustness with respect to different types of train-validation set partitions. The development of graphLambda underscores the potential of graph neural networks in advancing binding affinity prediction models, contributing to more effective CADD methodologies.
Affiliation(s)
- Ghaith Mqawass
- Faculty of Computer Science, University of Vienna, Vienna A-1090, Austria
- UniVie Doctoral School Computer Science, University of Vienna, Vienna A-1090, Austria
| | - Petr Popov
- Tetra-d, Rheinweg 9, Schaffhausen 8200, Switzerland
- School of Science, Constructor University Bremen gGmbH, Bremen 28759, Germany
| |
|
29
|
Nandi S, Bhaduri S, Das D, Ghosh P, Mandal M, Mitra P. Deciphering the Lexicon of Protein Targets: A Review on Multifaceted Drug Discovery in the Era of Artificial Intelligence. Mol Pharm 2024; 21:1563-1590. [PMID: 38466810 DOI: 10.1021/acs.molpharmaceut.3c01161] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/13/2024]
Abstract
Understanding protein sequence and structure is essential for studying protein-protein interactions (PPIs), which are central to many biological processes and diseases. Targeting protein binding hot spots, which regulate signaling and growth, with rational drug design is promising. Rational drug design uses structural data and computational tools to study protein binding sites and protein interfaces to design inhibitors that can change these interactions, thereby potentially leading to therapeutic approaches. Artificial intelligence (AI), such as machine learning (ML) and deep learning (DL), has advanced drug discovery and design by providing computational resources and methods. Quantum chemistry is essential for drug reactivity, toxicology, drug screening, and quantitative structure-activity relationship (QSAR) properties. This review discusses the methodologies and challenges of identifying and characterizing hot spots and binding sites. It also explores the strategies and applications of artificial-intelligence-based rational drug design technologies that target proteins and protein-protein interaction (PPI) binding hot spots. It provides valuable insights for drug design with therapeutic implications. We have also demonstrated the pathological conditions of heat shock protein 27 (HSP27) and matrix metalloproteinases (MMP2 and MMP9) and designed inhibitors of these proteins using the drug discovery paradigm in a case study on the discovery of drug molecules for cancer treatment. Additionally, the implications of benzothiazole derivatives for anticancer drug design and discovery are discussed.
Affiliation(s)
- Suvendu Nandi
- School of Medical Science and Technology, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal 721302, India
| | - Soumyadeep Bhaduri
- Centre for Computational and Data Sciences, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal 721302, India
| | - Debraj Das
- Centre for Computational and Data Sciences, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal 721302, India
| | - Priya Ghosh
- School of Medical Science and Technology, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal 721302, India
| | - Mahitosh Mandal
- School of Medical Science and Technology, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal 721302, India
| | - Pralay Mitra
- Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal 721302, India
| |
|
30
|
Wang Z, Wang S, Li Y, Guo J, Wei Y, Mu Y, Zheng L, Li W. A new paradigm for applying deep learning to protein-ligand interaction prediction. Brief Bioinform 2024; 25:bbae145. [PMID: 38581420 PMCID: PMC10998640 DOI: 10.1093/bib/bbae145] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2023] [Revised: 02/21/2024] [Accepted: 03/18/2024] [Indexed: 04/08/2024]
Abstract
Protein-ligand interaction prediction presents a significant challenge in drug design. Numerous machine learning and deep learning (DL) models have been developed to accurately identify docking poses of ligands and active compounds against specific targets. However, current models often suffer from inadequate accuracy or lack practical physical significance in their scoring systems. In this research paper, we introduce IGModel, a novel approach that utilizes the geometric information of protein-ligand complexes as input for predicting the root mean square deviation of docking poses and the binding strength (pKd, the negative value of the logarithm of binding affinity) within the same prediction framework. This ensures that the output scores carry intuitive meaning. We extensively evaluate the performance of IGModel on various docking power test sets, including the CASF-2016 benchmark, PDBbind-CrossDocked-Core and DISCO set, consistently achieving state-of-the-art accuracies. Furthermore, we assess IGModel's generalizability and robustness by evaluating it on unbiased test sets and sets containing target structures generated by AlphaFold2. The exceptional performance of IGModel on these sets demonstrates its efficacy. Additionally, we visualize the latent space of protein-ligand interactions encoded by IGModel and conduct interpretability analysis, providing valuable insights. This study presents a novel framework for DL-based prediction of protein-ligand interactions, contributing to the advancement of this field. The IGModel is available at GitHub repository https://github.com/zchwang/IGModel.
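As a small illustration of the pKd quantity mentioned in this abstract, the sketch below converts a dissociation constant into pKd using the usual convention pKd = -log10(Kd), with Kd in mol/L. It is not part of IGModel; the function name and example values are hypothetical.

```python
import math


def pkd_from_kd(kd_molar):
    """Return pKd = -log10(Kd) for a dissociation constant given in mol/L."""
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return -math.log10(kd_molar)


# Hypothetical binders: a nanomolar and a micromolar ligand.
pkd_nanomolar = pkd_from_kd(1e-9)
pkd_micromolar = pkd_from_kd(1e-6)
```

A larger pKd means tighter binding, so each unit of pKd corresponds to a tenfold change in Kd.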
Affiliation(s)
- Zechen Wang
- School of Physics, Shandong University, South Shanda Road, 250100 Shandong, China
| | - Sheng Wang
- Shanghai Zelixir Biotech, Xiangke Road, 200030, Shanghai, China
| | - Yangyang Li
- School of Physics, Shandong University, South Shanda Road, 250100 Shandong, China
| | - Jingjing Guo
- Centre in Artificial Intelligence Driven Drug Discovery, Faculty of Applied Sciences, Macao Polytechnic University, Rua de Luís Gonzaga Gomes, Macao, China
| | - Yanjie Wei
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Xueyuan Road 1068, Shenzhen, 518055 Guang Dong, China
| | - Yuguang Mu
- School of Biological Sciences, Nanyang Technological University, Singapore
| | - Liangzhen Zheng
- Shanghai Zelixir Biotech, Xiangke Road, 200030, Shanghai, China
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Xueyuan Road 1068, Shenzhen, 518055 Guang Dong, China
| | - Weifeng Li
- School of Physics, Shandong University, South Shanda Road, 250100 Shandong, China
| |
|
31
|
Kravchenko A, de Vries SJ, Smaïl-Tabbone M, Chauvot de Beauchene I. HIPPO: HIstogram-based Pseudo-POtential for scoring protein-ssRNA fragment-based docking poses. BMC Bioinformatics 2024; 25:129. [PMID: 38532339 DOI: 10.1186/s12859-024-05733-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2023] [Accepted: 03/06/2024] [Indexed: 03/28/2024]
Abstract
BACKGROUND The RNA-Recognition motif (RRM) is a protein domain that binds single-stranded RNA (ssRNA) and is present in as much as 2% of the human genome. Despite this important role in biology, RRM-ssRNA interactions are very challenging to study on the structural level because of the remarkable flexibility of ssRNA. In the absence of atomic-level experimental data, the only method able to predict the 3D structure of protein-ssRNA complexes with any degree of accuracy is ssRNA'TTRACT, an ssRNA fragment-based docking approach using ATTRACT. However, since ATTRACT parameters are not ssRNA-specific and were determined in 2010, there is substantial opportunity for enhancement. RESULTS Here we present HIPPO, a composite RRM-ssRNA scoring potential derived analytically from contact frequencies in near-native versus non-native docking models. HIPPO consists of a consensus of four distinct potentials, each extracted from a distinct reference pool of protein-trinucleotide docking decoys. To score a docking pose with one potential, for each pair of RNA-protein coarse-grained bead types, each contact is awarded or penalised according to the relative frequencies of this contact distance range among the correct and incorrect poses of the reference pool. Validated on a fragment-based docking benchmark of 57 experimentally solved RRM-ssRNA complexes, HIPPO achieved a threefold or higher enrichment for half of the fragments, versus only a quarter with the ATTRACT scoring function. In particular, HIPPO drastically improved the chance of very high enrichment (12-fold or higher), a scenario where the incremental modelling of entire ssRNA chains from fragments becomes viable. However, for the latter result, more research is needed to make it directly practically applicable. Regardless, our approach already improves upon the state of the art in RRM-ssRNA modelling and is in principle extendable to other types of protein-nucleic acid interactions.
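The scoring idea described in this abstract (rewarding or penalising each contact according to how often its distance range occurs in correct versus incorrect poses) can be sketched as a log-ratio of binned contact frequencies. The code below is a minimal illustration under stated assumptions, not the HIPPO implementation: the distance bins, the pseudocount smoothing, the log-ratio form, the bead-type labels, and all function names are hypothetical, and the consensus over four potentials is not reproduced.

```python
import math
from collections import Counter

BIN_EDGES = [0.0, 4.0, 6.0, 8.0]  # hypothetical distance bins, in angstroms


def bin_index(distance):
    """Return the index of the distance bin, or None beyond the last edge."""
    for i in range(len(BIN_EDGES) - 1):
        if BIN_EDGES[i] <= distance < BIN_EDGES[i + 1]:
            return i
    return None


def contact_histogram(poses):
    """Count contacts per (bead-type pair, distance bin) over a pool of poses.

    Each pose is a list of (bead_pair, distance) tuples.
    """
    counts = Counter()
    for pose in poses:
        for pair, distance in pose:
            b = bin_index(distance)
            if b is not None:
                counts[(pair, b)] += 1
    return counts


def build_potential(correct_poses, incorrect_poses, pseudocount=1.0):
    """Log-ratio of smoothed contact frequencies in correct vs incorrect poses."""
    good = contact_histogram(correct_poses)
    bad = contact_histogram(incorrect_poses)
    keys = set(good) | set(bad)
    n_good = sum(good.values()) + pseudocount * len(keys)
    n_bad = sum(bad.values()) + pseudocount * len(keys)
    return {
        key: math.log(((good[key] + pseudocount) / n_good)
                      / ((bad[key] + pseudocount) / n_bad))
        for key in keys
    }


def score_pose(pose, potential):
    """Sum the per-contact rewards and penalties; unseen contacts score 0."""
    total = 0.0
    for pair, distance in pose:
        b = bin_index(distance)
        if b is not None:
            total += potential.get((pair, b), 0.0)
    return total


# Toy reference pool with made-up bead types P1, P2 (protein) and R1 (RNA).
correct = [[(("P1", "R1"), 3.5), (("P2", "R1"), 5.0)], [(("P1", "R1"), 3.8)]]
incorrect = [[(("P1", "R1"), 7.5), (("P2", "R1"), 5.2)], [(("P1", "R1"), 7.0)]]
potential = build_potential(correct, incorrect)
```

In this toy pool, close P1-R1 contacts occur only in correct poses and so receive a positive score, distant ones receive a negative score, and the P2-R1 contact, equally frequent in both pools, contributes nothing.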
Affiliation(s)
- Anna Kravchenko
- Université de Lorraine, CNRS, Inria, LORIA, 54000, Nancy, France
| | | | | | | |
|
32
|
Metcalf D, Glick ZL, Bortolato A, Jiang A, Cheney DL, Sherrill CD. Directional ΔG Neural Network (DrΔG-Net): A Modular Neural Network Approach to Binding Free Energy Prediction. J Chem Inf Model 2024; 64:1907-1918. [PMID: 38470995 PMCID: PMC10966643 DOI: 10.1021/acs.jcim.3c02054] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2023] [Revised: 02/23/2024] [Accepted: 02/26/2024] [Indexed: 03/14/2024]
Abstract
The protein-ligand binding free energy is a central quantity in structure-based computational drug discovery efforts. Although popular alchemical methods provide sound statistical means of computing the binding free energy of a large breadth of systems, they are generally too costly to be applied at the same frequency as end point or ligand-based methods. By contrast, these data-driven approaches are typically fast enough to address thousands of systems but with reduced transferability to unseen systems. We introduce DrΔG-Net (or simply Dragnet), an equivariant graph neural network that can blend ligand-based and protein-ligand data-driven approaches. It is based on a 3D fingerprint representation of the ligand alone and in complex with the protein target. Dragnet is a global scoring function to predict the binding affinity of arbitrary protein-ligand complexes, but can be easily tuned via transfer learning to specific systems or end points, performing similarly to common 2D ligand-based approaches in these tasks. Dragnet is evaluated on a total of 28 validation proteins with a set of congeneric ligands derived from BindingDB and one custom set extracted from the ChEMBL database. In general, a handful of experimental binding affinities are sufficient to optimize the scoring function for a particular protein and ligand scaffold. When not available, predictions from physics-based methods such as absolute free energy perturbation can be used for the transfer learning tuning of Dragnet. Furthermore, we use our data to illustrate the present limitations of data-driven modeling of binding free energy predictions.
Affiliation(s)
- Derek P. Metcalf
- Center for Computational Molecular Science and Technology, School of Chemistry and Biochemistry and School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332-0400, United States
| | - Zachary L. Glick
- Center for Computational Molecular Science and Technology, School of Chemistry and Biochemistry and School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332-0400, United States
| | - Andrea Bortolato
- Molecular Structure and Design, Bristol-Myers Squibb Company, P.O. Box 5400, Princeton, New Jersey 08543, United States
| | - Andy Jiang
- Center for Computational Molecular Science and Technology, School of Chemistry and Biochemistry and School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332-0400, United States
| | - Daniel L. Cheney
- Molecular Structure and Design, Bristol-Myers Squibb Company, P.O. Box 5400, Princeton, New Jersey 08543, United States
| | - C. David Sherrill
- Center for Computational Molecular Science and Technology, School of Chemistry and Biochemistry and School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332-0400, United States
| |
|
33
|
Zhang Y, Li S, Meng K, Sun S. Machine Learning for Sequence and Structure-Based Protein-Ligand Interaction Prediction. J Chem Inf Model 2024; 64:1456-1472. [PMID: 38385768 DOI: 10.1021/acs.jcim.3c01841] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/23/2024]
Abstract
Developing new drugs is too expensive and time-consuming. Accurately predicting the interaction between drugs and targets will likely change how drugs are discovered. Machine learning-based protein-ligand interaction prediction has demonstrated significant potential. In this paper, computational methods that use sequence and structure to study protein-ligand interactions are examined. The paper starts by presenting an overview of the data sets applied in this area, as well as the various approaches applied for representing proteins and ligands. Then, sequence-based and structure-based classification criteria are utilized to categorize and summarize both the classical machine learning models and deep learning models employed in protein-ligand interaction studies. Moreover, the evaluation methods and interpretability of these models are discussed. Furthermore, the diverse applications of protein-ligand interaction models in drug research are presented. Lastly, the current challenges and future directions in this field are addressed.
Affiliation(s)
- Yunjiang Zhang
- Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, P. R. China
| | - Shuyuan Li
- Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, P. R. China
| | - Kong Meng
- Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, P. R. China
| | - Shaorui Sun
- Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, P. R. China
| |
|
34
|
Akhtyamov P, Nabi A, Gafurov V, Sizykh A, Favorov A, Medvedeva Y, Stupnikov A. GPU-accelerated Kendall distance computation for large or sparse data. Gigascience 2024; 13:giae088. [PMID: 39658191 PMCID: PMC11631066 DOI: 10.1093/gigascience/giae088] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2024] [Revised: 09/05/2024] [Indexed: 12/12/2024]
Abstract
BACKGROUND Current experimental practices typically produce large multidimensional datasets. Distance matrix calculation between elements (e.g., samples) for such data, although often necessary in preprocessing for statistical inference or visualization, can be computationally demanding. Data sparsity, which is often observed in various experimental data modalities, such as single-cell sequencing in bioinformatics or collaborative filtering in recommendation systems, may pose additional algorithmic challenges. RESULTS We present GPU-Assisted Distance Estimation Software (GADES), a graphical processing unit (GPU)-enhanced package that allows for massively parallel computation of Kendall-$\tau$ distance matrices. The package's architecture involves specific memory management, which lifts the limits for the data size imposed by GPU memory capacity. Additional algorithmic solutions provide a means to address the data sparsity problem and reinforce the acceleration effect for sparse datasets. Benchmarking against available central processing unit-based packages on simulated and real experimental single-cell RNA sequencing or single-cell ATAC sequencing datasets demonstrated significantly higher speed for GADES compared to other methods for both sparse and dense data processing, with additional performance boost for the sparse data. CONCLUSIONS This work significantly contributes to the development of computational strategies for high-performance Kendall distance matrix computation and allows for the efficient processing of Big Data with the power of GPU. GADES is freely available at https://github.com/lab-medvedeva/GADES-main.
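For orientation, the quantity being accelerated can be written as a naive CPU reference. The sketch below computes a normalized Kendall-τ distance (the fraction of discordant pairs) and a pairwise distance matrix in pure Python. It is not the GADES API or its GPU kernel; the function names are hypothetical, and counting tied pairs as non-discordant is an assumption that may differ from the package's convention.

```python
from itertools import combinations


def kendall_distance(x, y):
    """Fraction of index pairs ranked in opposite order by x and y.

    Assumption: pairs tied in either vector are not counted as discordant.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    n = len(x)
    if n < 2:
        return 0.0
    discordant = 0
    for i, j in combinations(range(n), 2):
        if (x[i] - x[j]) * (y[i] - y[j]) < 0:
            discordant += 1
    return discordant / (n * (n - 1) / 2)


def distance_matrix(rows):
    """Symmetric matrix of pairwise Kendall distances between row vectors."""
    m = len(rows)
    matrix = [[0.0] * m for _ in range(m)]
    for a, b in combinations(range(m), 2):
        matrix[a][b] = matrix[b][a] = kendall_distance(rows[a], rows[b])
    return matrix


samples = [[1, 2, 3], [1, 3, 2], [3, 2, 1]]
matrix = distance_matrix(samples)
```

Each pairwise distance costs O(n²) comparisons in this form, and the matrix needs one such computation per pair of samples, which is why the calculation becomes demanding for large datasets.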
Affiliation(s)
- Pavel Akhtyamov
- Department of Biomedical Physics, Moscow Institute of Physics and Technology, 141701, Dolgoprudny, Russia
- Moscow Center for Advanced Studies, Department of Biomedical Physics, Moscow, 123592, Russia
| | - Ausaaf Nabi
- Department of Biomedical Physics, Moscow Institute of Physics and Technology, 141701, Dolgoprudny, Russia
- Moscow Center for Advanced Studies, Department of Biomedical Physics, Moscow, 123592, Russia
| | - Vladislav Gafurov
- Department of Biomedical Physics, Moscow Institute of Physics and Technology, 141701, Dolgoprudny, Russia
- Moscow Center for Advanced Studies, Department of Biomedical Physics, Moscow, 123592, Russia
| | - Alexey Sizykh
- Department of Biomedical Physics, Moscow Institute of Physics and Technology, 141701, Dolgoprudny, Russia
- Moscow Center for Advanced Studies, Department of Biomedical Physics, Moscow, 123592, Russia
- University of Manitoba, Department of Biochemistry and Medical Genetics, Winnipeg, MB R3E 3P5, Canada
| | - Alexander Favorov
- Johns Hopkins University School of Medicine, Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Baltimore, MD 21205, USA
- Vavilov Institute of General Genetics, Laboratory of Systems Biology and Computational Genetics, Moscow, 119333, Russia
| | - Yulia Medvedeva
- Department of Biomedical Physics, Moscow Institute of Physics and Technology, 141701, Dolgoprudny, Russia
- Moscow Center for Advanced Studies, Department of Biomedical Physics, Moscow, 123592, Russia
- Research Center of Biotechnology, Institute of Bioengineering, 117312, Moscow, Russia
| | - Alexey Stupnikov
- Department of Biomedical Physics, Moscow Institute of Physics and Technology, 141701, Dolgoprudny, Russia
- Moscow Center for Advanced Studies, Department of Biomedical Physics, Moscow, 123592, Russia
| |
|
35
|
Chakrabarti M, Tan YS, Balius TE. Considerations Around Structure-Based Drug Discovery for KRAS Using DOCK. Methods Mol Biol 2024; 2797:67-90. [PMID: 38570453 DOI: 10.1007/978-1-0716-3822-4_6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/05/2024]
Abstract
Molecular docking is a popular computational tool in drug discovery. Leveraging structural information, docking software predicts binding poses of small molecules to cavities on the surfaces of proteins. Virtual screening for ligand discovery is a useful application of docking software. In this chapter, using the enigmatic KRAS protein as an example system, we endeavor to teach the reader about best practices for performing molecular docking with UCSF DOCK. We discuss methods for virtual screening and docking molecules on KRAS. We present the following six points to optimize our docking setup for prosecuting a virtual screen: protein structure choice, pocket selection, optimization of the scoring function, modification of sampling spheres and sampling procedures, choosing an appropriate portion of chemical space to dock, and the choice of which top scoring molecules to pick for purchase.
Affiliation(s)
- Mayukh Chakrabarti
- NCI RAS Initiative, Cancer Research Technology Program, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
| | - Y Stanley Tan
- NCI RAS Initiative, Cancer Research Technology Program, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
| | - Trent E Balius
- NCI RAS Initiative, Cancer Research Technology Program, Frederick National Laboratory for Cancer Research, Frederick, MD, USA.
| |
|
36
|
Abdelkader GA, Kim JD. Advances in Protein-Ligand Binding Affinity Prediction via Deep Learning: A Comprehensive Study of Datasets, Data Preprocessing Techniques, and Model Architectures. Curr Drug Targets 2024; 25:1041-1065. [PMID: 39318214 PMCID: PMC11774311 DOI: 10.2174/0113894501330963240905083020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2024] [Revised: 08/11/2024] [Accepted: 08/19/2024] [Indexed: 09/26/2024]
Abstract
BACKGROUND Drug discovery is a complex and expensive procedure involving several time-consuming and costly phases through which new potential pharmaceutical compounds must pass to get approved. One of these critical steps is the identification and optimization of lead compounds, which has been made more accessible by the introduction of computational methods, including deep learning (DL) techniques. Diverse DL model architectures have been put forward to learn the vast landscape of interaction between proteins and ligands and predict their affinity, helping in the identification of lead compounds. OBJECTIVE This survey fills a gap in previous research by comprehensively analyzing the most commonly used datasets and discussing their quality and limitations. It also offers a comprehensive classification of the most recent DL methods in the context of protein-ligand binding affinity prediction (BAP), providing a fresh perspective on this evolving field. METHODS We thoroughly examine commonly used datasets for BAP and their inherent characteristics. Our exploration extends to various preprocessing steps and DL techniques, including graph neural networks, convolutional neural networks, and transformers, which are found in the literature. We conducted extensive literature research to ensure that the most recent deep learning approaches for BAP were included by the time of writing this manuscript. RESULTS The systematic approach used for the present study highlighted inherent challenges to BAP via DL, such as data quality, model interpretability, and explainability, and proposed considerations for future research directions. We present valuable insights to accelerate the development of more effective and reliable DL models for BAP within the research community. CONCLUSION The present study can considerably enhance future research on predicting affinity between protein and ligand molecules, hence further improving the overall drug development process.
Affiliation(s)
- Gelany Aly Abdelkader
  - Department of Computer Science and Electronic Engineering, Sun Moon University, Asan 31460, Republic of Korea
- Jeong-Dong Kim
  - Department of Computer Science and Electronic Engineering, Sun Moon University, Asan 31460, Republic of Korea
  - Division of Computer Science and Engineering, Sun Moon University, Asan 31460, Republic of Korea
  - Genome-based BioIT Convergence Institute, Sun Moon University, Asan 31460, Korea
|
37
|
Baillif B, Cole J, Giangreco I, McCabe P, Bender A. Applying atomistic neural networks to bias conformer ensembles towards bioactive-like conformations. J Cheminform 2023; 15:124. [PMID: 38129933 PMCID: PMC10740246 DOI: 10.1186/s13321-023-00794-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2023] [Accepted: 12/10/2023] [Indexed: 12/23/2023]
Abstract
Identifying bioactive conformations of small molecules is an essential process for virtual screening applications relying on three-dimensional structure such as molecular docking. For most small molecules, conformer generators retrieve at least one bioactive-like conformation, with an atomic root-mean-square deviation (ARMSD) lower than 1 Å, among the set of low-energy conformers generated. However, there is currently no general method to prioritise these likely target-bound conformations in the ensemble. In this work, we trained atomistic neural networks (AtNNs) on 3D information of generated conformers of a curated subset of PDBbind ligands to predict the ARMSD to their closest bioactive conformation, and evaluated the early enrichment of bioactive-like conformations when ranking conformers by AtNN prediction. AtNN ranking was compared with bioactivity-unaware baselines such as ascending Sage force field energy ranking, and a slower bioactivity-based baseline ranking by ascending Torsion Fingerprint Deviation to the Maximum Common Substructure to the most similar molecule in the training set (TFD2SimRefMCS). On test sets from random ligand splits of PDBbind, ranking conformers using ComENet, the AtNN encoding the most 3D information, leads to early enrichment of bioactive-like conformations with a median BEDROC of 0.29 ± 0.02, outperforming the best bioactivity-unaware Sage energy ranking baseline (median BEDROC of 0.18 ± 0.02), and performing on a par with the bioactivity-based TFD2SimRefMCS baseline (median BEDROC of 0.31 ± 0.02). The improved performance of the AtNN and TFD2SimRefMCS baseline is mostly observed on test set ligands that bind proteins similar to proteins observed in the training set. On a more challenging subset of flexible molecules, the bioactivity-unaware baselines showed median BEDROCs up to 0.02, while AtNNs and TFD2SimRefMCS showed median BEDROCs between 0.09 and 0.13. 
When performing rigid ligand re-docking of PDBbind ligands with GOLD using the 1% top-ranked conformers, ComENet-ranked conformers showed a higher successful docking rate than bioactivity-unaware baselines, with a rate of 0.48 ± 0.02 compared with 0.39 ± 0.02 for the CSD probability baseline. Similarly, in a pharmacophore searching experiment, selecting the 20% top-ranked conformers by ComENet gave a higher hit rate than the baselines. Hence, the approach presented here successfully uses AtNNs to focus conformer ensembles towards bioactive-like conformations, representing an opportunity to reduce computational expense in virtual screening applications on known targets that require input conformations.
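The early-enrichment evaluation described in this abstract can be illustrated with a minimal sketch. It ranks conformers by a predicted ARMSD (ascending), marks a conformer as bioactive-like when its true ARMSD is below 1 Å, and computes BEDROC from the ranked labels using the standard rescaled robust-initial-enhancement form. The function names, the toy values, and the choice of alpha = 20 are assumptions made for illustration; the abstract does not state which alpha the authors used, and this is not their code.

```python
import math


def bedroc(ranked_labels, alpha=20.0):
    """BEDROC for a best-first ranked list of booleans (True = bioactive-like)."""
    n_total = len(ranked_labels)
    n_active = sum(1 for label in ranked_labels if label)
    if n_active == 0 or n_active == n_total:
        raise ValueError("BEDROC needs at least one active and one inactive item")
    ratio = n_active / n_total
    # Exponentially weighted sum over the 1-based ranks of the actives.
    weighted = sum(
        math.exp(-alpha * rank / n_total)
        for rank, label in enumerate(ranked_labels, start=1)
        if label
    )
    # Expected weight of a single active under a uniformly random ranking.
    random_weight = (1.0 - math.exp(-alpha)) / (
        n_total * (math.exp(alpha / n_total) - 1.0)
    )
    rie = weighted / (n_active * random_weight)
    # RIE values for the best (actives first) and worst (actives last) rankings.
    rie_max = (1.0 - math.exp(-alpha * ratio)) / (ratio * (1.0 - math.exp(-alpha)))
    rie_min = (1.0 - math.exp(alpha * ratio)) / (ratio * (1.0 - math.exp(alpha)))
    return (rie - rie_min) / (rie_max - rie_min)


def bedroc_from_predictions(predicted_armsd, true_armsd, threshold=1.0, alpha=20.0):
    """Rank conformers by predicted ARMSD (ascending) and score the ranking."""
    order = sorted(range(len(predicted_armsd)), key=lambda i: predicted_armsd[i])
    # A conformer counts as bioactive-like if its true ARMSD is below the threshold.
    ranked_labels = [true_armsd[i] < threshold for i in order]
    return bedroc(ranked_labels, alpha)


if __name__ == "__main__":
    # Hypothetical toy ensemble of five conformers (values in angstroms).
    predicted = [0.5, 2.0, 3.0, 0.8, 2.5]
    true = [0.4, 1.9, 2.2, 0.7, 3.0]
    print(bedroc_from_predictions(predicted, true))
```

A score near 1 means the bioactive-like conformers sit at the top of the ranking, and a score near 0 means they sit at the bottom. Larger alpha values put more weight on the very first ranks.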
Affiliation(s)
- Benoit Baillif
  - Yusuf Hamied Department of Chemistry, University of Cambridge, Lensfield Rd, Cambridge, CB2 1EW, UK
- Jason Cole
  - Cambridge Crystallographic Data Centre, 12 Union Road, Cambridge, CB2 1EZ, UK
- Ilenia Giangreco
  - Cambridge Crystallographic Data Centre, 12 Union Road, Cambridge, CB2 1EZ, UK
  - Exscientia plc, The Schrödinger Building, Oxford Science Park, Oxford, OX4 4GE, UK
- Patrick McCabe
  - Cambridge Crystallographic Data Centre, 12 Union Road, Cambridge, CB2 1EZ, UK
- Andreas Bender
  - Yusuf Hamied Department of Chemistry, University of Cambridge, Lensfield Rd, Cambridge, CB2 1EW, UK
|
38
|
Li Y, Fan Z, Rao J, Chen Z, Chu Q, Zheng M, Li X. An overview of recent advances and challenges in predicting compound-protein interaction (CPI). Medical Review (2021) 2023; 3:465-486. [PMID: 38282802 PMCID: PMC10808869 DOI: 10.1515/mr-2023-0030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2023] [Accepted: 08/30/2023] [Indexed: 01/30/2024]
Abstract
Compound-protein interactions (CPIs) are critical in drug discovery for identifying therapeutic targets, drug side effects, and repurposing existing drugs. Machine learning (ML) algorithms have emerged as powerful tools for CPI prediction, offering notable advantages in cost-effectiveness and efficiency. This review provides an overview of recent advances in both structure-based and non-structure-based CPI prediction ML models, highlighting their performance and achievements. It also offers insights into CPI prediction-related datasets and evaluation benchmarks. Lastly, the article presents a comprehensive assessment of the current landscape of CPI prediction, elucidating the challenges faced and outlining emerging trends to advance the field.
Affiliation(s)
- Yanbei Li
  - School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou, Zhejiang Province, China
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
  - University of Chinese Academy of Sciences, Beijing, China
- Zhehuan Fan
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
  - University of Chinese Academy of Sciences, Beijing, China
- Jingxin Rao
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
  - University of Chinese Academy of Sciences, Beijing, China
- Zhiyi Chen
  - School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou, Zhejiang Province, China
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
  - University of Chinese Academy of Sciences, Beijing, China
- Qinyu Chu
  - School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou, Zhejiang Province, China
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
  - University of Chinese Academy of Sciences, Beijing, China
- Mingyue Zheng
  - School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou, Zhejiang Province, China
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
  - University of Chinese Academy of Sciences, Beijing, China
- Xutong Li
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
  - University of Chinese Academy of Sciences, Beijing, China
|
39
|
Yan J, Ye Z, Yang Z, Lu C, Zhang S, Liu Q, Qiu J. Multi-task bioassay pre-training for protein-ligand binding affinity prediction. Brief Bioinform 2023; 25:bbad451. [PMID: 38084920 PMCID: PMC10783875 DOI: 10.1093/bib/bbad451] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2023] [Revised: 10/27/2023] [Accepted: 11/15/2023] [Indexed: 12/18/2023]
Abstract
Protein-ligand binding affinity (PLBA) prediction is a fundamental task in drug discovery. Recently, various deep learning-based models have predicted binding affinity by taking the three-dimensional (3D) structure of protein-ligand complexes as input, achieving remarkable progress. However, due to the scarcity of high-quality training data, the generalization ability of current models is still limited. Although there is a vast amount of affinity data available in large-scale databases such as ChEMBL, issues such as inconsistent affinity measurement labels (i.e. IC50, Ki, Kd), different experimental conditions, and the lack of available 3D binding structures complicate the development of high-precision affinity prediction models using these data. To address these issues, we (i) propose Multi-task Bioassay Pre-training (MBP), a pre-training framework for structure-based PLBA prediction, and (ii) construct a pre-training dataset called ChEMBL-Dock with more than 300k experimentally measured affinity labels and about 2.8M docked 3D structures. By introducing multi-task pre-training to treat the prediction of different affinity labels as different tasks and classifying relative rankings between samples from the same bioassay, MBP learns robust and transferable structural knowledge from our new ChEMBL-Dock dataset with varied and noisy labels. Experiments substantiate the capability of MBP on the structure-based PLBA prediction task. To the best of our knowledge, MBP is the first affinity pre-training model and shows great potential for future development. The MBP web server is available for free at: https://huggingface.co/spaces/jiaxianustc/mbp.
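The idea of classifying relative rankings between samples from the same bioassay can be sketched as a pair-construction step. The snippet below is only an illustration under stated assumptions: the record fields (`assay_id`, `label_type`, `ligand`, `affinity`), the function name, and the decision to group by both assay and label type are hypothetical and are not taken from the MBP code. Each pair only compares measurements made under the same conditions, which is why labels from different assays never need to be put on a common scale.

```python
from collections import defaultdict
from itertools import combinations


def build_ranking_pairs(samples):
    """Build (ligand_a, ligand_b, label_type, target) tuples.

    target is 1 if ligand_a has the larger affinity value, else 0.
    Pairs are only formed inside one (assay_id, label_type) group.
    """
    groups = defaultdict(list)
    for sample in samples:
        groups[(sample["assay_id"], sample["label_type"])].append(sample)

    pairs = []
    for (_assay_id, label_type), group in groups.items():
        for a, b in combinations(group, 2):
            if a["affinity"] == b["affinity"]:
                continue  # ties carry no ranking signal
            target = int(a["affinity"] > b["affinity"])
            pairs.append((a["ligand"], b["ligand"], label_type, target))
    return pairs


if __name__ == "__main__":
    # Hypothetical records; affinity values are on a negative-log scale.
    toy = [
        {"assay_id": "A1", "label_type": "IC50", "ligand": "L1", "affinity": 6.0},
        {"assay_id": "A1", "label_type": "IC50", "ligand": "L2", "affinity": 7.5},
        {"assay_id": "A2", "label_type": "Ki", "ligand": "L3", "affinity": 8.0},
        {"assay_id": "A2", "label_type": "Ki", "ligand": "L4", "affinity": 5.0},
    ]
    for pair in build_ranking_pairs(toy):
        print(pair)
```

In a multi-task setup, the `label_type` field of each pair could be used to route it to a task-specific output head, so that IC50, Ki, and Kd comparisons are learned as separate tasks.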
Affiliation(s)
- Jiaxian Yan
  - Anhui Province Key Lab of Big Data Analysis and Application, University of Science and Technology of China, JinZhai Road, 230026, Anhui, China
- Zhaofeng Ye
  - Tencent Quantum Laboratory, Tencent, Shennan Road, 518057, Guangdong, China
- Ziyi Yang
  - Tencent Quantum Laboratory, Tencent, Shennan Road, 518057, Guangdong, China
- Chengqiang Lu
  - Anhui Province Key Lab of Big Data Analysis and Application, University of Science and Technology of China, JinZhai Road, 230026, Anhui, China
- Shengyu Zhang
  - Tencent Quantum Laboratory, Tencent, Shennan Road, 518057, Guangdong, China
- Qi Liu
  - Anhui Province Key Lab of Big Data Analysis and Application, University of Science and Technology of China, JinZhai Road, 230026, Anhui, China
- Jiezhong Qiu
  - Tencent Quantum Laboratory, Tencent, Shennan Road, 518057, Guangdong, China
|
40
|
Libouban PY, Aci-Sèche S, Gómez-Tamayo JC, Tresadern G, Bonnet P. The Impact of Data on Structure-Based Binding Affinity Predictions Using Deep Neural Networks. Int J Mol Sci 2023; 24:16120. [PMID: 38003312 PMCID: PMC10671244 DOI: 10.3390/ijms242216120] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/14/2023] [Revised: 10/30/2023] [Accepted: 11/01/2023] [Indexed: 11/26/2023]
Abstract
Artificial intelligence (AI) has gained significant traction in the field of drug discovery, with deep learning (DL) algorithms playing a crucial role in predicting protein-ligand binding affinities. Despite advancements in neural network architectures, system representation, and training techniques, the performance of DL affinity prediction has reached a plateau, prompting the question of whether it is truly solved or if the current performance is overly optimistic and reliant on biased, easily predictable data. Like other DL-related problems, this issue seems to stem from the training and test sets used when building the models. In this work, we investigate the impact of several parameters related to the input data on the performance of neural network affinity prediction models. Notably, we identify the size of the binding pocket as a critical factor influencing the performance of our statistical models; furthermore, it is more important to train a model with as much data as possible than to restrict the training to only high-quality datasets. Finally, we also confirm the bias in the typically used current test sets. Therefore, several types of evaluation and benchmarking are required to understand models' decision-making processes and accurately compare the performance of models.
Affiliation(s)
- Pierre-Yves Libouban
  - Institute of Organic and Analytical Chemistry (ICOA), UMR7311, Université d’Orléans, CNRS, Pôle de Chimie rue de Chartres, 45067 Orléans, CEDEX 2, France; (P.-Y.L.); (S.A.-S.)
- Samia Aci-Sèche
  - Institute of Organic and Analytical Chemistry (ICOA), UMR7311, Université d’Orléans, CNRS, Pôle de Chimie rue de Chartres, 45067 Orléans, CEDEX 2, France; (P.-Y.L.); (S.A.-S.)
- Jose Carlos Gómez-Tamayo
  - Computational Chemistry, Janssen Research & Development, Janssen Pharmaceutica N. V., B-2340 Beerse, Belgium; (J.C.G.-T.); (G.T.)
- Gary Tresadern
  - Computational Chemistry, Janssen Research & Development, Janssen Pharmaceutica N. V., B-2340 Beerse, Belgium; (J.C.G.-T.); (G.T.)
- Pascal Bonnet
  - Institute of Organic and Analytical Chemistry (ICOA), UMR7311, Université d’Orléans, CNRS, Pôle de Chimie rue de Chartres, 45067 Orléans, CEDEX 2, France; (P.-Y.L.); (S.A.-S.)
|
41
|
Francoeur PG, Koes DR. Expanding Training Data for Structure-Based Receptor-Ligand Binding Affinity Regression through Imputation of Missing Labels. ACS Omega 2023; 8:41680-41688. [PMID: 37970017 PMCID: PMC10634251 DOI: 10.1021/acsomega.3c05931] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2023] [Revised: 10/10/2023] [Accepted: 10/17/2023] [Indexed: 11/17/2023]
Abstract
The success of machine learning is, in part, due to a large volume of data available to train models. However, the amount of training data for structure-based molecular property prediction remains limited. The previously described CrossDocked2020 data set expanded the available training data for binding pose classification in a molecular docking setting but did not address expanding the amount of receptor-ligand binding affinity data. We present experiments demonstrating that imputing binding affinity labels for complexes without experimentally determined binding affinities is a viable approach to expanding training data for structure-based models of receptor-ligand binding affinity. In particular, we demonstrate that utilizing imputed labels from a convolutional neural network trained only on the affinity data present in CrossDocked2020 results in a small improvement in the binding affinity regression performance, despite the additional sources of noise that such imputed labels add to the training data. The code, data splits, and imputation labels utilized in this paper are freely available at https://github.com/francoep/ImputationPaper.
Affiliation(s)
- Paul G. Francoeur
  - Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, Pennsylvania 15260, United States
- David R. Koes
  - Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, Pennsylvania 15260, United States
|
42
|
Barsbey M, Özçelik R, Bağ A, Atil B, Özgür A, Ozkirimli E. A Computational Software for Training Robust Drug-Target Affinity Prediction Models: pydebiaseddta. J Comput Biol 2023; 30:1240-1245. [PMID: 37988394 DOI: 10.1089/cmb.2023.0194] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2023]
Abstract
Robust generalization of drug-target affinity (DTA) prediction models is a notoriously difficult problem in computational drug discovery. In this article, we present pydebiaseddta, a software package for improving the generalizability of DTA prediction models to novel ligands and/or proteins. pydebiaseddta serves as the practical implementation of the DebiasedDTA training framework, which advocates modifying the training distribution to mitigate the effect of spurious correlations in the training data set that lead to substantially degraded performance for novel ligands and proteins. Written in the Python programming language, pydebiaseddta combines a user-friendly, streamlined interface with a feature-rich and highly modifiable architecture. Here we introduce our software, showcase its main functionalities, and describe practical ways for new users to engage with it.
Affiliation(s)
- Melih Barsbey
  - Department of Computer Engineering, Boğaziçi University, İstanbul, Turkey
- Riza Özçelik
  - Department of Computer Engineering, Boğaziçi University, İstanbul, Turkey
- Alperen Bağ
  - Technical University of Munich, Munich, Germany
- Berk Atil
  - Department of Computer Engineering, Boğaziçi University, İstanbul, Turkey
- Arzucan Özgür
  - Department of Computer Engineering, Boğaziçi University, İstanbul, Turkey
- Elif Ozkirimli
  - Roche Informatics, F. Hoffmann-La Roche AG, Basel, Switzerland
|
43
|
Kemp BA, Howell NL, Gildea JJ, Hinkle JD, Shabanowitz J, Hunt DF, Conaway MR, Keller SR, Carey RM. Evidence That Binding of Cyclic GMP to the Extracellular Domain of NKA (Sodium-Potassium ATPase) Mediates Natriuresis. Circ Res 2023; 132:1127-1140. [PMID: 36919600 PMCID: PMC10171454 DOI: 10.1161/circresaha.122.321693] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2022] [Accepted: 03/07/2023] [Indexed: 03/16/2023]
Abstract
BACKGROUND Extracellular renal interstitial guanosine cyclic 3',5'-monophosphate (cGMP) inhibits renal proximal tubule (RPT) sodium (Na+) reabsorption via Src (Src family kinase) activation. The target through which extracellular cGMP acts to induce natriuresis is unknown. We hypothesized that cGMP binds to the extracellular α1-subunit of NKA (sodium-potassium ATPase) on RPT basolateral membranes to inhibit Na+ transport, similar to ouabain, a cardiotonic steroid. METHODS Urine Na+ excretion was measured in uninephrectomized 12-week-old female Sprague-Dawley rats that received renal interstitial infusions of vehicle (5% dextrose in water), cGMP (18, 36, and 72 μg/kg per minute; 30 minutes each), or cGMP+rostafuroxin (12 ng/kg per minute) or were subjected to pressure-natriuresis±rostafuroxin infusion. Rostafuroxin is a digitoxigenin derivative that displaces ouabain from NKA. RESULTS Renal interstitial cGMP and raised renal perfusion pressure induced natriuresis and increased phosphorylated SrcTyr416 and Erk 1/2 (extracellular signal-regulated protein kinase 1/2)Thr202/Tyr204; these responses were abolished with rostafuroxin coinfusion. To assess cGMP binding to NKA, we performed competitive binding studies with isolated rat RPTs using bodipy-ouabain (2 μM)+cGMP (10 μM) or rostafuroxin (10 μM) and 8-biotin-11-cGMP (2 μM)+ouabain (10 μM) or rostafuroxin (10 μM). cGMP or rostafuroxin reduced bodipy-ouabain fluorescence intensity, and ouabain or rostafuroxin reduced 8-biotin-11-cGMP staining. We cross-linked isolated rat RPTs with 4-N3-PET-8-biotin-11-cGMP (2 μM); 8-N3-6-biotin-10-cAMP served as negative control. Precipitation with streptavidin beads followed by immunoblot analysis showed that RPTs after cross-linking with 4-N3-PET-8-biotin-11-cGMP exhibited a significantly stronger signal for NKA than non-cross-linked samples and cross-linked or non-cross-linked 8-N3-6-biotin-10-cAMP RPTs. 
Ouabain (10 μM) reduced NKA in cross-linked 4-N3-PET-8-biotin-11-cGMP RPTs confirming fluorescence staining. 4-N3-PET-8-biotin-11-cGMP cross-linked samples were separated by SDS gel electrophoresis and slices corresponding to NKA molecular weight excised and processed for mass spectrometry. NKA was the second most abundant protein with 50 unique NKA peptides covering 47% of amino acids in NKA. Molecular modeling demonstrated a potential cGMP docking site in the ouabain-binding pocket of NKA. CONCLUSIONS cGMP can bind to NKA and thereby mediate natriuresis.
Affiliation(s)
- Brandon A Kemp
  - Department of Medicine, Division of Endocrinology and Metabolism (B.A.K., N.L.H., S.R.K., R.M.C.), University of Virginia, Charlottesville
- Nancy L Howell
  - Department of Medicine, Division of Endocrinology and Metabolism (B.A.K., N.L.H., S.R.K., R.M.C.), University of Virginia, Charlottesville
- John J Gildea
  - Department of Pathology (J.J.G.), University of Virginia, Charlottesville
- Josh D Hinkle
  - Department of Chemistry (J.D.H., J.S., D.F.H.), University of Virginia, Charlottesville
- Jeffrey Shabanowitz
  - Department of Chemistry (J.D.H., J.S., D.F.H.), University of Virginia, Charlottesville
- Donald F Hunt
  - Department of Chemistry (J.D.H., J.S., D.F.H.), University of Virginia, Charlottesville
- Mark R Conaway
  - Division of Translational Research and Applied Statistics, Department of Public Health Sciences (M.R.C.), University of Virginia, Charlottesville
- Susanna R Keller
  - Department of Medicine, Division of Endocrinology and Metabolism (B.A.K., N.L.H., S.R.K., R.M.C.), University of Virginia, Charlottesville
- Robert M Carey
  - Department of Medicine, Division of Endocrinology and Metabolism (B.A.K., N.L.H., S.R.K., R.M.C.), University of Virginia, Charlottesville
|