1
Hall BW, Tummino TA, Tang K, Mailhot O, Castanon M, Irwin JJ, Shoichet BK. A Database for Large-Scale Docking and Experimental Results. J Chem Inf Model 2025; 65:4458-4467. [PMID: 40273444 DOI: 10.1021/acs.jcim.5c00394] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/26/2025]
Abstract
The rapid expansion of readily accessible compounds over the past six years has transformed molecular docking, improving hit rates and affinities. While many millions of molecules may score well in a docking campaign, the results are rarely fully shared, hindering the benchmarking of machine learning and chemical space exploration methods that seek to explore the expanding chemical spaces. To address this gap, we develop a website providing access to recent large library campaigns, including poses, scores, and in vitro results for campaigns against 11 targets, with 6.3 billion molecules docked and 3729 compounds experimentally tested. In a simple proof-of-concept study that speaks to the new library's utility, we use the new database to train machine learning models to predict docking scores and to find the top 0.01% scoring molecules while evaluating only 1% of the library. Even in these proof-of-concept studies, some interesting trends emerge: unsurprisingly, as models train on larger sets, they perform better; less expectedly, models could achieve high correlations with docking scores and yet still fail to enrich the new docking-discovered ligands, or even the top 0.01% of docking-ranked molecules. It will be interesting to see how these trends develop for methods more sophisticated than the simple proof-of-concept studies undertaken here; the database is openly available at lsd.docking.org.
Affiliation(s)
- Brendan W Hall
- Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, California 94158, United States
- Tia A Tummino
- Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, California 94158, United States
- Khanh Tang
- Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, California 94158, United States
- Olivier Mailhot
- Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, California 94158, United States
- Mar Castanon
- Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, California 94158, United States
- John J Irwin
- Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, California 94158, United States
- Brian K Shoichet
- Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, California 94158, United States
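The benchmark described in the abstract of this entry (finding the top 0.01% of docking-ranked molecules while evaluating only 1% of the library) can be read as a recall calculation. The Python sketch below is illustrative only: the function name `top_fraction_recall`, its parameters, and the convention that lower scores are better are assumptions, not the authors' code.

```python
def top_fraction_recall(docking_scores, predicted_scores,
                        top_fraction=0.0001, budget_fraction=0.01):
    """Fraction of the true top-scoring molecules found in the model-selected subset.

    Assumption: lower scores are better for both lists.
    """
    n = len(docking_scores)
    if n != len(predicted_scores):
        raise ValueError("score lists must have equal length")
    # Sizes of the "true top" set and of the evaluation budget (at least one molecule each).
    n_top = max(1, round(n * top_fraction))
    n_budget = max(1, round(n * budget_fraction))
    # Rank molecule indices by real docking score and by model prediction.
    by_docking = sorted(range(n), key=lambda i: docking_scores[i])
    by_model = sorted(range(n), key=lambda i: predicted_scores[i])
    true_top = set(by_docking[:n_top])
    selected = set(by_model[:n_budget])
    return len(true_top & selected) / n_top
```

This quantity is computed separately from any correlation between predicted and docked scores, which matches the abstract's observation that a model can correlate well with docking scores and still recover few of the top-ranked molecules.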
2
Junaid M, Zeeshan M, Khan A, Alshabrmi FM, Li W. SPLIF-Enhanced Attention-Driven 3D CNNs for Precise and Reliable Protein-Ligand Interaction Modeling for METTL3. ACS Omega 2025; 10:16748-16761. [PMID: 40321522 PMCID: PMC12044449 DOI: 10.1021/acsomega.5c00538] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2025] [Revised: 03/26/2025] [Accepted: 03/27/2025] [Indexed: 05/08/2025]
Abstract
Structure-based virtual screening (SBVS) is a cornerstone of modern drug discovery pipelines. However, conventional scoring functions often fail to capture the complexities of protein-ligand binding interactions. To address this limitation, we developed DeepMETTL3, a novel scoring function that integrates 3D convolutional neural networks (CNNs) with multihead attention mechanisms and high-dimensional Structural Protein-Ligand Interaction Fingerprints (SPLIF). This approach enables the model to capture intricate 3D interaction patterns while refining and prioritizing features for precise classification of active and inactive compounds. We validated DeepMETTL3 using METTL3 as a therapeutic target, employing a scaffold-based data-splitting strategy and multiple test sets, including challenging sets with minimal chemical similarity to the training data. Our results demonstrate that DeepMETTL3 outperforms traditional scoring functions, achieving superior accuracy, robustness, and scalability. Key findings include the importance of an active-to-decoy ratio (1:50) in the training set for enhanced performance and the optimal placement of the attention mechanism after CNN1 for improved generalization. DeepMETTL3 represents a significant advancement in target-specific machine learning for SBVS, offering a framework that can be adapted to other biological targets. This work underscores the potential of deep learning in artificial intelligence-based drug design, balancing computational efficiency and predictive power in molecular docking and virtual screening. The scoring function is freely available at https://github.com/juniML/DeepMETTL3.
Affiliation(s)
- Muhammad Junaid
- Institute for Advanced Study, Shenzhen University, Shenzhen 518060, China
- College of Physics and Optoelectronics Engineering, Shenzhen University, Shenzhen 518060, China
- Muhammad Zeeshan
- Department of Bioinformatics and Biotechnology, Islamic International University Islamabad, Islamabad 44000, Pakistan
- Abbas Khan
- Department of Biomedical Sciences, Sir Jeffrey Cheah Sunway Medical School, Faculty of Medical and Life Sciences, Sunway University, Sunway City 47500, Malaysia
- Fahad M. Alshabrmi
- Department of Medical Laboratories, College of Applied Medical Sciences, Qassim University, Buraydah 51452, Saudi Arabia
- Wenjin Li
- Institute for Advanced Study, Shenzhen University, Shenzhen 518060, China
3
Md Ashiq SJ, Snekha AC, Muthu Kumar T, Dhivya Dharshini U, Ashiru Aliyu Z. Phytochemical Screening and Computational Insights into VP26 Inhibitors for Mitigating White Spot Syndrome Virus in Shrimp Aquaculture. Mol Biotechnol 2025:10.1007/s12033-025-01442-4. [PMID: 40301280 DOI: 10.1007/s12033-025-01442-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2025] [Accepted: 04/10/2025] [Indexed: 05/01/2025]
Abstract
White spot syndrome disease (WSSD) is a contagious disease caused by white spot syndrome virus (WSSV) in shrimp aquaculture. Given its high degree of contagiousness, controlling its rapid spread has proven to be challenging, causing significant economic loss to farmers. To prevent these losses, farmers often resort to the use of large doses of general aquaculture antibiotics. However, prolonged exposure to such antibiotics can lead to adverse effects for both the shrimp and the humans consuming them. Additionally, this practice contributes to the global issue of antimicrobial resistance. Currently, there are no vaccines or antibiotics that specifically target this virus. Therefore, exploring potential compounds through virtual screening offers a promising avenue for finding effective solutions. A total of 1683 phytochemical metabolites from 40 medicinal plants were screened against the target VP26, which plays a pivotal role in virus maturation. Initial screening was performed via ADMET and molecular docking analysis. Furthermore, we evaluated the binding affinity via machine learning-based scoring schemes. Importantly, the compounds displayed applicable toxicity properties during testing with ECOSAR. The binding ability of the compounds was validated with 150 ns of MD simulation. Overall, isocolumbin and urolithin A 3-O-glucuronide had significant effects on the outcomes of all the analyses. Therefore, we believe that this compound could be an alternative therapeutic option to the WSS virus in shrimp aquaculture.
Affiliation(s)
- S J Md Ashiq
- Department of Biotechnology, Bannari Amman Institute of Technology, Sathyamangalam, Tamil Nadu, 638401, India
- A C Snekha
- Department of Biotechnology, Bannari Amman Institute of Technology, Sathyamangalam, Tamil Nadu, 638401, India
- T Muthu Kumar
- School of Sciences and Humanities, SR University, Warangal, Telangana, 506371, India.
- U Dhivya Dharshini
- Department of Biotechnology, Bannari Amman Institute of Technology, Sathyamangalam, Tamil Nadu, 638401, India
4
Okubo S, Hirose S, Aoki S. Discovery of Novel Antimicrobial-Active Compounds and Their Analogues by In Silico Small Chemical Screening Targeting Staphylococcus aureus MurB. Molecules 2025; 30:1477. [PMID: 40286128 PMCID: PMC11990925 DOI: 10.3390/molecules30071477] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/27/2025] [Revised: 03/15/2025] [Accepted: 03/18/2025] [Indexed: 04/29/2025] Open
Abstract
Methicillin-resistant Staphylococcus aureus is a serious problem in healthcare due to its lethal severe infections and resistance to most antimicrobial agents. The number of new approved antimicrobial agents is declining, and combined with the spread of drug-resistant bacteria, it is predicted that effective antimicrobial agents against multidrug-resistant bacteria will be exhausted. We conducted in silico and in vitro discovery of novel antimicrobial small molecules targeting the SaMurB enzyme involved in cell wall synthesis in Staphylococcus aureus (S. aureus). We performed hierarchical structure-based drug screenings to identify compounds and their analogues using a library of approximately 1.3 million compound structures. In vitro experiments with Staphylococcus epidermidis (S. epidermidis) identified three compounds (SH5, SHa6, and SHa13) that exhibit antibacterial activity. These three compounds do not have toxicity against human-derived cells. SHa13 exhibited remarkable activity (IC50 value = 1.64 ± 0.01 µM). The active compound was predicted to bind to the active site of SaMurB by forming a hydrogen bond with Arg188 in both R and S bodies. These data provide a starting point for the development of novel cell wall synthesis inhibitors as antimicrobial agents targeting SaMurB.
Affiliation(s)
- Shunsuke Aoki
- Department of Bioscience and Bioinformatics, Graduate School of Computer Science and Systems Engineering, Kyushu Institute of Technology, Iizuka 820-8502, Japan (S.H.)
5
Paranthaman P, Karuppasamy R, Veerappapillai S. Drug repurposing through Biophysical Insights: Focus on Indoleamine 2,3-Dioxygenase and Tryptophan 2,3-Dioxygenase Dual Inhibitors. Cell Biochem Biophys 2025:10.1007/s12013-025-01725-2. [PMID: 40133710 DOI: 10.1007/s12013-025-01725-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 03/10/2025] [Indexed: 03/27/2025]
Abstract
The kynurenine pathway (KP) plays a pivotal role in dampening the immune response in many types of cancer, including TNBC. The intricate involvement of tryptophan degradation via KP serves as a critical regulator in mediating immunosuppression in the tumor microenvironment. The key enzymes that facilitate this mechanism and contribute to tumor progression are indoleamine 2,3-dioxygenase (IDO1) and tryptophan 2,3-dioxygenase (TDO). Despite attempts to use navoximod as a dual-specific inhibitor, its poor bioavailability and lack of clinical efficacy have hampered its utility. To date, no FDA-approved drugs have advanced for dual targeting of these enzymes. Therefore, this study aimed to repurpose the approved drugs from the DrugBank database as novel IDO1/TDO inhibitors. Initially, 2588 FDA-approved compounds were screened by employing molecular docking and pharmacokinetic profiling. Subsequently, methods such as MM-GBSA calculations and machine learning based analysis precisely identified 20 potential lead compounds. The resultant compounds were then assessed for various toxicity endpoints and anticancer activity. The PaccMann server revealed potent anticancer activity, with sensitivities ranging from 0.203 to 24.119 μM against MDA-MB-231 TNBC cell lines. Alongside, the interaction profile with critical residues, strongly reinforced DB06292 (Dapagliflozin) as a compelling hit candidate. Finally, the reliability of the result was corroborated through a rigorous 200 ns molecular dynamics simulation, ensuring the stable binding of the hit against the target proteins. Considering the promising outcomes, we speculate that the proposed hit compound holds strong potential for the management of TNBC.
Affiliation(s)
- Priyanga Paranthaman
- Department of Biotechnology, School of Bio Sciences and Technology, Vellore Institute of Technology, Vellore, Tamil Nadu, India
- Ramanathan Karuppasamy
- Department of Biotechnology, School of Bio Sciences and Technology, Vellore Institute of Technology, Vellore, Tamil Nadu, India
- Shanthi Veerappapillai
- Department of Biotechnology, School of Bio Sciences and Technology, Vellore Institute of Technology, Vellore, Tamil Nadu, India.
6
Liang H, Xie A, Hou N, Wei F, Gao T, Li J, Gao X, Shi C, Xiao G, Xu X. Increase Docking Score Screening Power by Simple Fusion With CNNscore. J Comput Chem 2025; 46:e70060. [PMID: 39981784 DOI: 10.1002/jcc.70060] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2024] [Revised: 12/31/2024] [Accepted: 01/26/2025] [Indexed: 02/22/2025]
Abstract
Scoring functions (SFs) of molecular docking is a vital component of structure-based virtual screening (SBVS). Traditional SFs yield their inherent shortage for idealized approximations and simplifications predicting the binding affinity. Complementarily, SFs based on deep learning (DL) have emerged as powerful tools for capturing intricate features within protein-ligand (PL) interactions. We here present a docking-score fusion strategy that integrates pose scores derived from GNINA's convolutional neural network (CNN) with traditional docking scores. Extensive validation on diverse datasets has shown that by means of multiplying Watvina docking score by CNNscore demonstrates state-of-the-art screening power. Furthermore, in a reverse practice, our docking-score fusion technique was incorporated into the virtual screening (VS) workflow aimed at identifying inhibitors of the challenging target TYK2. Two promising hits with IC50 9.99 μM and 13.76 μM in vitro were identified from nearly 12 billion molecules.
Affiliation(s)
- Huicong Liang
- Marine Biomedical Research Institute of Qingdao, School of Medicine and Pharmacy, Key Laboratory of Marine Drugs, Chinese Ministry of Education, Ocean University of China, Qingdao, P. R. China
- Aowei Xie
- College of Food Science and Engineering, Ocean University of China, Qingdao, P. R. China
- Ning Hou
- Marine Biomedical Research Institute of Qingdao, School of Medicine and Pharmacy, Key Laboratory of Marine Drugs, Chinese Ministry of Education, Ocean University of China, Qingdao, P. R. China
- Fengjiao Wei
- Marine Biomedical Research Institute of Qingdao, School of Medicine and Pharmacy, Key Laboratory of Marine Drugs, Chinese Ministry of Education, Ocean University of China, Qingdao, P. R. China
- Ting Gao
- College of Food Science and Engineering, Ocean University of China, Qingdao, P. R. China
- Jiajie Li
- Marine Biomedical Research Institute of Qingdao, School of Medicine and Pharmacy, Key Laboratory of Marine Drugs, Chinese Ministry of Education, Ocean University of China, Qingdao, P. R. China
- Xinru Gao
- Marine Biomedical Research Institute of Qingdao, School of Medicine and Pharmacy, Key Laboratory of Marine Drugs, Chinese Ministry of Education, Ocean University of China, Qingdao, P. R. China
- Chuanqin Shi
- Center of Translational Medicine, Zibo Central Hospital Affiliated to Binzhou Medical University, Zibo, China
- Gaokeng Xiao
- Guangzhou Molcalx Information & Technology ltd. Room 3406, F4, Build 3, Xiaozitiantang, Guangzhou, China
- Ximing Xu
- Marine Biomedical Research Institute of Qingdao, School of Medicine and Pharmacy, Key Laboratory of Marine Drugs, Chinese Ministry of Education, Ocean University of China, Qingdao, P. R. China
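The fusion described in the abstract of this entry (multiplying the Watvina docking score by CNNscore) amounts to a one-line formula followed by a re-ranking. The Python sketch below is a minimal illustration under stated assumptions: docking scores are taken to be negative energies where lower is better, CNNscore is taken to lie between 0 and 1, and the function names and record layout are hypothetical rather than taken from the paper or from any tool's API.

```python
def fuse_scores(docking_score, cnn_score):
    """Fused score = docking score x CNN pose score.

    Assumptions: docking_score is negative (lower is better) and
    cnn_score lies in [0, 1], so a confident pose keeps most of its
    docking score while a doubtful pose is pulled toward zero.
    """
    return docking_score * cnn_score


def rank_by_fused_score(records):
    """Sort (name, docking_score, cnn_score) tuples, best fused score first."""
    return sorted(records, key=lambda r: fuse_scores(r[1], r[2]))


# Hypothetical example values, not data from the paper.
ligands = [("a", -8.0, 0.2), ("b", -7.0, 0.9), ("c", -9.0, 0.5)]
ranked_names = [name for name, _, _ in rank_by_fused_score(ligands)]
```

In this made-up example ligand "c" has the best raw docking score, but "b" ranks first after fusion because its pose receives a much higher CNN score.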
7
Hall BW, Tummino TA, Tang K, Irwin JJ, Shoichet BK. A database for large-scale docking and experimental results. bioRxiv 2025:2025.02.25.639879. [PMID: 40060496 PMCID: PMC11888352 DOI: 10.1101/2025.02.25.639879] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/30/2025]
Abstract
The rapid expansion of readily accessible compounds over the past six years has transformed molecular docking, improving hit rates and affinities. While many millions of molecules may score well in a docking campaign, the results are rarely fully shared, hindering the benchmarking of machine learning and chemical space exploration methods that seek to explore the expanding chemical spaces. To address this gap, we develop a website providing access to recent large library campaigns, including poses, scores, and in vitro results for campaigns against 11 targets, with 6.3 billion molecules docked and 3729 compounds experimentally tested. In a simple proof-of-concept study that speaks to the new library's utility, we use the new database to train machine learning models to predict docking scores and to find the top 0.01% scoring molecules while evaluating only 1% of the library. Even in these proof-of-concept studies, some interesting trends emerge: unsurprisingly, as models train on larger sets, they perform better; less expected, models could achieve high correlations with docking scores and yet still fail to enrich the new docking-discovered ligands, or even the top 0.01% of docking-ranked molecules. It will be interesting to see how these trends develop for methods more sophisticated than the simple proof-of-concept studies undertaken here; the database is openly available at lsd.docking.org.
Affiliation(s)
- Brendan W. Hall
- Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, CA 94158, USA
- Tia A. Tummino
- Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, CA 94158, USA
- Khanh Tang
- Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, CA 94158, USA
- John J. Irwin
- Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, CA 94158, USA
- Brian K. Shoichet
- Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, CA 94158, USA
8
Rizzi A, Mandelli D. High performance-oriented computer aided drug design approaches in the exascale era. Expert Opin Drug Discov 2025:1-10. [PMID: 39953911 DOI: 10.1080/17460441.2025.2468289] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2024] [Revised: 01/25/2025] [Accepted: 02/13/2025] [Indexed: 02/17/2025]
Abstract
INTRODUCTION: In 2023, the first exascale supercomputer was opened to the public in the US. With a demonstrated 1.1 exaflops of performance, Frontier represents an unprecedented breakthrough in high-performance computing (HPC). Currently, more (and more powerful) machines are being installed worldwide. Computer-aided drug design (CADD) is one of the fields of computational science that can greatly benefit from exascale computing for the benefit of the whole society. However, scaling CADD approaches to exploit exascale machines requires new algorithmic and software solutions. AREAS COVERED: Here, the authors consider physics-based and machine learning (ML)-aided techniques for the design of small molecule binders capable of leveraging modern parallel computer architectures. Specifically, the authors focus on HPC-oriented large-scale applications from the past 3 years that were enabled by (pre)exascale supercomputers by running on up to thousands of accelerated nodes. EXPERT OPINION: In the area of ML, exascale computers can enable the training of generative models with unprecedented predictive power to design novel ligands, provided large amounts of high-quality data are available. Exascale computers could also unlock the potential of accurate ML-aided physics-based methods to boost the success rate of structure-based drug design campaigns. Currently, however, methodological developments are still required to allow routine large-scale applications of such rigorous approaches.
Affiliation(s)
- Andrea Rizzi
- Computational Biomedicine (INM-9), Forschungszentrum Jülich GmbH, Wilhelm-Johnen Straße, Jülich, Germany
- Atomistic Simulations, Italian Institute of Technology, via Morego, Genova, Italy
- Davide Mandelli
- Computational Biomedicine (INM-9), Forschungszentrum Jülich GmbH, Wilhelm-Johnen Straße, Jülich, Germany
9
Shyamal SS. Computational exploration in search for novel natural product-derived EZH2 inhibitors for advancing anti-cancer therapy. Mol Divers 2025:10.1007/s11030-025-11128-3. [PMID: 39969739 DOI: 10.1007/s11030-025-11128-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/06/2025] [Accepted: 02/04/2025] [Indexed: 02/20/2025]
Abstract
Epigenetic regulation intricately governs cellular mechanisms, including proliferation, death, differentiation, and cell cycle orchestration. One such target, Enhancer of zeste homolog 2 (EZH2), is essential for epigenetic regulation. EZH2 trimethylates histone H3 lys27 (H3K27me3), inhibiting target gene transcription and promoting chromatin condensation, thereby initiating tumorigenesis, thus a potentially plausible target to disrupt cancer progression. In this virtual screening study, we utilized two large, open-source natural product libraries, NPASS and LOTUS, to search for potential natural product scaffolds capable of EZH2 inhibition. The merged library was filtered through increasingly rigorous criteria at each stage, including Medchem-based rule filters, 2D Tanimoto similarity, sequential rounds of docking, rescoring via ML-based functions, and binding pose visualization, funneling down to the most promising candidates for further pharmacokinetics and toxicological profiles. The best hits were analyzed for their binding stability through molecular dynamics simulation and their binding free energy estimations. Exploratory chemical analysis was conducted to understand the similarity of hits with known EZH2 chemical space. This comprehensive workflow identified one potential inhibitor, LTS0131784, which exhibited favorable pharmacokinetic toxicity profiling with binding stability and free energy better than the FDA-approved EZH2 inhibitor, Tazemetostat. Furthermore, the plausible binding mechanism was also elucidated by analyzing the per residue-free decomposition of the simulated trajectories, which indicated the involvement of the LTS0131784 with the key residues TYR:111, TRP:521, CYS:560, ASN:585, and SER:561.
Affiliation(s)
- Sagar Singh Shyamal
- Department of Pharmaceutical Engineering and Technology, Indian Institute of Technology (Banaras Hindu University), Varanasi, Uttar Pradesh, India.
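One stage of the screening funnel in the abstract of this entry is a 2D Tanimoto similarity filter. As a hedged illustration of that single step, the Python sketch below computes the Tanimoto coefficient on sets of fingerprint on-bits and keeps library entries at or above a threshold. The fingerprint type, the threshold, and whether the study kept similar or dissimilar compounds are not stated in the abstract, so all of those choices here are assumptions, and the function names are hypothetical.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient |A & B| / |A | B| for two collections of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        # Convention chosen for this sketch: two empty fingerprints score 0.0.
        return 0.0
    return len(a & b) / len(union)


def filter_by_similarity(library, reference_bits, threshold):
    """Return names of (name, bits) entries with similarity >= threshold to the reference."""
    return [name for name, bits in library
            if tanimoto(bits, reference_bits) >= threshold]
```

In practice the on-bit sets would come from a cheminformatics toolkit's fingerprint routine; plain Python sets are used here so that the example stays self-contained.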
10
Estevam GO, Linossi E, Rao J, Macdonald CB, Ravikumar A, Chrispens KM, Capra JA, Coyote-Maestas W, Pimentel H, Collisson EA, Jura N, Fraser JS. Mapping kinase domain resistance mechanisms for the MET receptor tyrosine kinase via deep mutational scanning. eLife 2025; 13:RP101882. [PMID: 39960754 PMCID: PMC11832172 DOI: 10.7554/elife.101882] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/20/2025] Open
Abstract
Mutations in the kinase and juxtamembrane domains of the MET Receptor Tyrosine Kinase are responsible for oncogenesis in various cancers and can drive resistance to MET-directed treatments. Determining the most effective inhibitor for each mutational profile is a major challenge for MET-driven cancer treatment in precision medicine. Here, we used a deep mutational scan (DMS) of ~5764 MET kinase domain variants to profile the growth of each mutation against a panel of 11 inhibitors that are reported to target the MET kinase domain. We validate previously identified resistance mutations, pinpoint common resistance sites across type I, type II, and type I ½ inhibitors, unveil unique resistance and sensitizing mutations for each inhibitor, and verify non-cross-resistant sensitivities for type I and type II inhibitor pairs. We augment a protein language model with biophysical and chemical features to improve the predictive performance for inhibitor-treated datasets. Together, our study demonstrates a pooled experimental pipeline for identifying resistance mutations, provides a reference dictionary for mutations that are sensitized to specific therapies, and offers insights for future drug development.
Affiliation(s)
- Gabriella O Estevam
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, United States
- Tetrad Graduate Program, University of California, San Francisco, San Francisco, United States
- Edmond Linossi
- Cardiovascular Research Institute, University of California, San Francisco, San Francisco, United States
- Department of Cellular and Molecular Pharmacology, University of California, San Francisco, San Francisco, United States
- Jingyou Rao
- Department of Computer Science, University of California, Los Angeles, Los Angeles, United States
- Christian B Macdonald
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, United States
- Ashraya Ravikumar
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, United States
- Karson M Chrispens
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, United States
- Biophysics Graduate Program, San Francisco, United States
- John A Capra
- Bakar Computational Health Sciences Institute and Department of Epidemiology and Biostatistics, University of California, San Francisco, San Francisco, United States
- Willow Coyote-Maestas
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, United States
- Quantitative Biosciences Institute, University of California, San Francisco, San Francisco, United States
- Harold Pimentel
- Department of Computer Science, University of California, Los Angeles, Los Angeles, United States
- Department of Computational Medicine and Human Genetics, University of California, Los Angeles, Los Angeles, United States
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, United States
- Eric A Collisson
- Human Biology, Fred Hutchinson Cancer Center, Seattle, United States
- Department of Medicine, University of Washington, Seattle, United States
- Natalia Jura
- Cardiovascular Research Institute, University of California, San Francisco, San Francisco, United States
- Department of Cellular and Molecular Pharmacology, University of California, San Francisco, San Francisco, United States
- Quantitative Biosciences Institute, University of California, San Francisco, San Francisco, United States
- James S Fraser
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, United States
- Quantitative Biosciences Institute, University of California, San Francisco, San Francisco, United States
11
Ambreen S, Umar M, Noor A, Jain H, Ali R. Advanced AI and ML frameworks for transforming drug discovery and optimization: With innovative insights in polypharmacology, drug repurposing, combination therapy and nanomedicine. Eur J Med Chem 2025; 284:117164. [PMID: 39721292 DOI: 10.1016/j.ejmech.2024.117164] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2024] [Revised: 11/24/2024] [Accepted: 11/27/2024] [Indexed: 12/28/2024]
Abstract
Artificial Intelligence (AI) and Machine Learning (ML) are transforming drug discovery by overcoming traditional challenges like high costs, time-consuming, and frequent failures. AI-driven approaches streamline key phases, including target identification, lead optimization, de novo drug design, and drug repurposing. Frameworks such as deep neural networks (DNNs), convolutional neural networks (CNNs), and deep reinforcement learning (DRL) models have shown promise in identifying drug targets, optimizing delivery systems, and accelerating drug repurposing. Generative adversarial networks (GANs) and variational autoencoders (VAEs) aid de novo drug design by creating novel drug-like compounds with desired properties. Case studies, such as DDR1 kinase inhibitors designed using generative models and CDK20 inhibitors developed via structure-based methods, highlight AI's ability to produce highly specific therapeutics. Models like SNF-CVAE and DeepDR further advance drug repurposing by uncovering new therapeutic applications for existing drugs. Advanced ML algorithms enhance precision in predicting drug efficacy, toxicity, and ADME-Tox properties, reducing development costs and improving drug-target interactions. AI also supports polypharmacology by optimizing multi-target drug interactions and enhances combination therapy through predictions of drug synergies and antagonisms. In nanomedicine, AI models like CURATE.AI and the Hartung algorithm optimize personalized treatments by predicting toxicological risks and real-time dosing adjustments with high accuracy. Despite its potential, challenges like data quality, model interpretability, and ethical concerns must be addressed. High-quality datasets, transparent models, and unbiased algorithms are essential for reliable AI applications. As AI continues to evolve, it is poised to revolutionize drug discovery and personalized medicine, advancing therapeutic development and patient care.
Affiliation(s)
- Subiya Ambreen
- Department of Pharmaceutical Chemistry, Delhi Institute of Pharmaceutical Sciences and Research (DIPSAR), DPSRU, Pushp Vihar, New Delhi, 110017, India
- Mohammad Umar
- Department of Pharmaceutical Chemistry, Delhi Institute of Pharmaceutical Sciences and Research (DIPSAR), DPSRU, Pushp Vihar, New Delhi, 110017, India
- Aaisha Noor
- Department of Pharmaceutical Chemistry, Delhi Institute of Pharmaceutical Sciences and Research (DIPSAR), DPSRU, Pushp Vihar, New Delhi, 110017, India
- Himangini Jain
- Department of Pharmaceutical Chemistry, Delhi Institute of Pharmaceutical Sciences and Research (DIPSAR), DPSRU, Pushp Vihar, New Delhi, 110017, India
- Ruhi Ali
- Department of Pharmaceutical Chemistry, Delhi Institute of Pharmaceutical Sciences and Research (DIPSAR), DPSRU, Pushp Vihar, New Delhi, 110017, India.
| |
Collapse
|
12
|
Nada H, Meanwell NA, Gabr MT. Virtual screening: hope, hype, and the fine line in between. Expert Opin Drug Discov 2025; 20:145-162. [PMID: 39862145 PMCID: PMC11844436 DOI: 10.1080/17460441.2025.2458666] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/11/2024] [Revised: 01/17/2025] [Accepted: 01/22/2025] [Indexed: 01/27/2025]
Abstract
INTRODUCTION Technological advancements in virtual screening (VS) have rapidly accelerated its application in drug discovery, as reflected by the exponential growth in VS-related publications. However, a significant gap remains between the volume of computational predictions and their experimental validation. This discrepancy has led to a rise in the number of unverified 'claimed' hits, which impedes drug discovery efforts. AREAS COVERED This perspective examines the current VS landscape, highlighting essential practices and identifying critical challenges, limitations, and common pitfalls. Using case studies and practices, this perspective aims to highlight strategies that can effectively mitigate or overcome these challenges. Furthermore, the perspective explores common approaches for addressing pharmacodynamic and pharmacokinetic issues in optimizing VS hits. EXPERT OPINION VS has become a tried-and-true technique of drug discovery due to the rapid advances in computational methods and machine learning (ML) over the past two decades. Although each VS workflow varies depending on the chosen approach and methodology, integrated strategies that combine biological and in silico data have consistently yielded higher success rates. Moreover, the widespread adoption of ML has enhanced the integration of VS into the drug discovery pipeline. However, the absence of standardized evaluation criteria hinders the objective assessment of VS studies' success and the identification of optimal adoption methods.
Affiliation(s)
- Hossam Nada
- Department of Radiology, Molecular Imaging Innovations Institute (MI3), Weill Cornell Medicine, New York, NY 10065, USA
- Nicholas A. Meanwell
- Baruch S. Blumberg Institute, Doylestown, PA, USA; School of Pharmacy, University of Michigan, Ann Arbor, MI, USA
- Ernest Mario School of Pharmacy, Rutgers University, Piscataway, NJ, USA
- Moustafa T. Gabr
- Department of Radiology, Molecular Imaging Innovations Institute (MI3), Weill Cornell Medicine, New York, NY 10065, USA
|
13
|
Paranthaman P, Veerappapillai S. Identification of putative Indoleamine 2,3-dioxygenase 1 (IDO1) and tryptophan 2,3-dioxygenase (TDO) dual inhibitors for triple-negative breast cancer therapy. J Biomol Struct Dyn 2025:1-19. [PMID: 39861977 DOI: 10.1080/07391102.2024.2332509] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2023] [Accepted: 03/13/2024] [Indexed: 01/27/2025]
Abstract
Tryptophan catabolism is a central pathway in many cancers, serving to sustain an immunosuppressive microenvironment. The key enzymes involved in tryptophan metabolism, indoleamine 2,3-dioxygenase 1 (IDO1) and tryptophan 2,3-dioxygenase (TDO), are reported as promising novel targets in cancer immunotherapy. IDO1 and TDO overexpression in triple-negative breast cancer (TNBC) cells promotes resistance to cell death, proliferation, invasion, and metastasis. To date, there are no clinically available small-molecule inhibitors that target these enzymes. Navoximod, a dual-specific inhibitor, showed poor bioavailability and modest efficacy in clinical trials, which restricts its utility. This situation calls for the development of a potent drug-like candidate against these key enzymes. A total of 1574 natural compounds were retrieved and subjected to ADME screening. Subsequently, the resultant compounds were subjected to hierarchical molecular docking and MM-GBSA validation. Ultimately, re-scoring with combined machine learning algorithms yielded six lead compounds. Notably, NPACT00380 exhibited the greatest interaction among the lead compounds. In addition, scaffold analysis highlighted that the chromanone moiety of the hit compound has anti-cancer activity against breast cancer cell lines. The reliability of the results was corroborated through a 100 ns molecular dynamics simulation, analysed using RMSD, PCA and FEL. In light of these findings, it is presumed that the proposed compound exhibits significant inhibitory activity. As a result, we speculate that further optimisation of NPACT00380 could be beneficial for the treatment and management of TNBC.
Affiliation(s)
- Priyanga Paranthaman
- Department of Biotechnology, School of Bio Sciences and Technology, Vellore Institute of Technology, Vellore, Tamil Nadu, India
- Shanthi Veerappapillai
- Department of Biotechnology, School of Bio Sciences and Technology, Vellore Institute of Technology, Vellore, Tamil Nadu, India
|
14
|
Vega A, Planas A, Biarnés X. A Practical Guide to Computational Tools for Engineering Biocatalytic Properties. Int J Mol Sci 2025; 26:980. [PMID: 39940748 PMCID: PMC11817184 DOI: 10.3390/ijms26030980] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/30/2024] [Revised: 01/20/2025] [Accepted: 01/21/2025] [Indexed: 02/16/2025] Open
Abstract
The growing demand for efficient, selective, and stable enzymes has fueled advancements in computational enzyme engineering, a field that complements experimental methods to accelerate enzyme discovery. With a plethora of software and tools available, researchers from different disciplines often face challenges in selecting the most suitable method that meets their requirements and available starting data. This review categorizes the computational tools available for enzyme engineering based on their capacity to enhance the following specific biocatalytic properties of biotechnological interest: (i) protein-ligand affinity/selectivity, (ii) catalytic efficiency, (iii) thermostability, and (iv) solubility for recombinant enzyme production. By aligning tools with their respective scoring functions, we aim to guide researchers, particularly those new to computational methods, in selecting the appropriate software for the design of protein engineering campaigns. De novo enzyme design, involving the creation of novel proteins, is beyond this review's scope. Instead, we focus on practical strategies for fine-tuning enzymatic performance within an established reference framework of natural proteins.
Affiliation(s)
- Aitor Vega
- Laboratory of Biochemistry, Institut Químic de Sarrià, Universitat Ramon Llull, Via Augusta 390, 08017 Barcelona, Spain
- Antoni Planas
- Laboratory of Biochemistry, Institut Químic de Sarrià, Universitat Ramon Llull, Via Augusta 390, 08017 Barcelona, Spain
- Royal Academy of Sciences and Arts of Barcelona, 08002 Barcelona, Spain
- Xevi Biarnés
- Laboratory of Biochemistry, Institut Químic de Sarrià, Universitat Ramon Llull, Via Augusta 390, 08017 Barcelona, Spain
|
15
|
Zhang X, Zhang M, Li Y, Deng P. Identification of Potential Selective PAK4 Inhibitors Through Shape and Protein Conformation Ensemble Screening and Electrostatic-Surface-Matching Optimization. Curr Issues Mol Biol 2025; 47:29. [PMID: 39852144 PMCID: PMC11764389 DOI: 10.3390/cimb47010029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/24/2024] [Revised: 12/31/2024] [Accepted: 01/03/2025] [Indexed: 01/26/2025] Open
Abstract
P21-activated kinase 4 (PAK4) plays a crucial role in the proliferation and metastasis of various cancers. However, developing selective PAK4 inhibitors remains challenging due to the high homology within the PAK family. Therefore, developing highly selective PAK4 inhibitors is critical to overcoming the limitations of existing inhibitors. We analyzed the structural differences in the binding pockets of PAK1 and PAK4 by combining cross-docking and molecular dynamics simulations to identify key binding regions and unique structural features of PAK4. We then performed screening using shape and protein conformation ensembles, followed by a re-evaluation of the docking results with deep-learning-driven GNINA to identify the candidate molecule, STOCK7S-56165. Based on this, we applied a fragment-replacement strategy under electrostatic-surface-matching conditions to obtain Compd 26. This optimization significantly improved electrostatic interactions and reduced binding energy, highlighting its potential for selectivity. Our findings provide a novel approach for developing selective PAK4 inhibitors and lay the theoretical foundation for future anticancer drug design.
Affiliation(s)
- Xiaoxuan Zhang
- College of Pharmacy, Chongqing Medical University, Chongqing 400016, China
- Chongqing Research Center for Pharmaceutical Engineering, Chongqing 400016, China
- Meile Zhang
- College of Pharmacy, Chongqing Medical University, Chongqing 400016, China
- Chongqing Research Center for Pharmaceutical Engineering, Chongqing 400016, China
- Yihao Li
- College of Pharmacy, Chongqing Medical University, Chongqing 400016, China
- Chongqing Research Center for Pharmaceutical Engineering, Chongqing 400016, China
- Ping Deng
- College of Pharmacy, Chongqing Medical University, Chongqing 400016, China
- Chongqing Research Center for Pharmaceutical Engineering, Chongqing 400016, China
- Chongqing Key Research Laboratory for Quality Evaluation and Safety Research of APIs, Chongqing 400016, China
|
16
|
Gómez-Sacristán P, Simeon S, Tran-Nguyen VK, Patil S, Ballester PJ. Inactive-enriched machine-learning models exploiting patent data improve structure-based virtual screening for PDL1 dimerizers. J Adv Res 2025; 67:185-196. [PMID: 38280715 PMCID: PMC11725107 DOI: 10.1016/j.jare.2024.01.024] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 06/30/2023] [Revised: 12/01/2023] [Accepted: 01/21/2024] [Indexed: 01/29/2024] Open
Abstract
INTRODUCTION Small-molecule Programmed Cell Death Protein 1/Programmed Death-Ligand 1 (PD1/PDL1) inhibition via PDL1 dimerization has the potential to lead to inexpensive drugs with better cancer patient outcomes and milder side effects. However, this therapeutic approach has proven challenging, with only one PDL1 dimerizer reaching early clinical trials so far. There is hence a need for fast and accurate methods to develop alternative PDL1 dimerizers. OBJECTIVES We aim to show that structure-based virtual screening (SBVS) based on PDL1-specific machine-learning (ML) scoring functions (SFs) is a powerful drug design tool for detecting PD1/PDL1 inhibitors via PDL1 dimerization. METHODS By incorporating the latest MLSF advances, we generated and evaluated PDL1-specific MLSFs (classifiers and inactive-enriched regressors) on two demanding test sets. RESULTS 60 PDL1-specific MLSFs (30 classifiers and 30 regressors) were generated. Our large-scale analysis provides highly predictive PDL1-specific MLSFs that benefitted from training with large volumes of docked inactives and from enabling inactive-enriched regression. CONCLUSION PDL1-specific MLSFs strongly outperformed generic SFs of various types on this target and are released here without restrictions.
Affiliation(s)
- Saw Simeon
- Centre de Recherche en Cancérologie de Marseille, Marseille 13009, France
- Sachin Patil
- NanoBio Laboratory, Widener University, Chester, PA 19013, USA
- Pedro J Ballester
- Department of Bioengineering, Imperial College London, London SW7 2AZ, UK.
|
17
|
Schifferstein J, Bernatavicius A, Janssen APA. Docking-Informed Machine Learning for Kinome-wide Affinity Prediction. J Chem Inf Model 2024; 64:9196-9204. [PMID: 39657274 DOI: 10.1021/acs.jcim.4c01260] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/17/2024]
Abstract
Kinase inhibitors are an important class of anticancer drugs, with 80 inhibitors clinically approved and >100 in active clinical testing. Most bind competitively in the ATP-binding site, leading to challenges with selectivity for a specific kinase, resulting in risks for toxicity and general off-target effects. Assessing the binding of an inhibitor for the entire kinome is experimentally possible but expensive. A reliable and interpretable computational prediction of kinase selectivity would greatly benefit the inhibitor discovery and optimization process. Here, we use machine learning on docked poses to address this need. To this end, we aggregated all known inhibitor-kinase affinities and generated the complete accompanying 3D interactome by docking all inhibitors to the respective high-quality X-ray structures. We then used this resource to train a neural network as a kinase-specific scoring function, which achieved an overall performance (R²) of 0.63-0.74 on unseen inhibitors across the kinome. The entire pipeline from molecule to 3D-based affinity prediction has been fully automated and wrapped in a freely available package. This has a graphical user interface that is tightly integrated with PyMOL to allow immediate adoption in the medicinal chemistry practice.
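The R² range quoted in this abstract can be read as a coefficient of determination between predicted and measured affinities. The sketch below assumes that definition; the abstract does not state whether R² is the coefficient of determination or a squared Pearson correlation. The function name and the sample values are illustrative only and are not taken from the paper.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are identical")
    return 1.0 - ss_res / ss_tot


# Hypothetical affinity values (assumed for illustration, not from the paper).
measured = [6.0, 7.0, 8.0, 9.0]
predicted = [6.5, 6.5, 8.5, 8.5]
print(r_squared(measured, predicted))
```

Under this definition, a value of 1 means perfect prediction and 0 means the model does no better than predicting the mean affinity for every inhibitor.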
Affiliation(s)
- Jordy Schifferstein
- Department of Molecular Physiology, Leiden Institute of Chemistry, Leiden University, Leiden 2333CC, The Netherlands
- Oncode Institute, Utrecht 3521AL, The Netherlands
- Andrius Bernatavicius
- Leiden Institute of Advanced Computer Science, Leiden University, Leiden 2333CC, The Netherlands
- Antonius P A Janssen
- Department of Molecular Physiology, Leiden Institute of Chemistry, Leiden University, Leiden 2333CC, The Netherlands
- Oncode Institute, Utrecht 3521AL, The Netherlands
|
18
|
Lee HJ, Emani PS, Gerstein MB. Improved Prediction of Ligand-Protein Binding Affinities by Meta-modeling. J Chem Inf Model 2024; 64:8684-8704. [PMID: 39576762 PMCID: PMC11632770 DOI: 10.1021/acs.jcim.4c01116] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/25/2024] [Revised: 10/21/2024] [Accepted: 10/28/2024] [Indexed: 11/24/2024]
Abstract
The accurate screening of candidate drug ligands against target proteins through computational approaches is of prime interest to drug development efforts. Such virtual screening depends in part on methods to predict the binding affinity between ligands and proteins. Many computational models for binding affinity prediction have been developed, but with varying results across targets. Given that ensembling or meta-modeling approaches have shown great promise in reducing model-specific biases, we develop a framework to integrate published force-field-based empirical docking and sequence-based deep learning models. In building this framework, we evaluate many combinations of individual base models, training databases, and several meta-modeling approaches. We show that many of our meta-models significantly improve affinity predictions over base models. Our best meta-models achieve comparable performance to state-of-the-art deep learning tools exclusively based on 3D structures while allowing for improved database scalability and flexibility through the explicit inclusion of features such as physicochemical properties or molecular descriptors. We further demonstrate improved generalization capability by our models using a large-scale benchmark of affinity prediction as well as a virtual screening application benchmark. Overall, we demonstrate that diverse modeling approaches can be ensembled together to gain meaningful improvement in binding affinity prediction.
Affiliation(s)
- Ho-Joon Lee
- Department of Genetics and Yale Center for Genome Analysis, Yale University, New Haven, Connecticut 06510, United States
- Prashant S. Emani
- Department of Molecular Biophysics & Biochemistry, Yale University, New Haven, Connecticut 06520, United States
- Mark B. Gerstein
- Department of Molecular Biophysics & Biochemistry, Yale University, New Haven, Connecticut 06520, United States
- Program in Computational Biology & Bioinformatics, Department of Computer Science, Department of Statistics & Data Science, and Department of Biomedical Informatics & Data Science, Yale University, New Haven, Connecticut 06520, United States
|
19
|
Fasoulis R, Paliouras G, Kavraki LE. RankMHC: Learning to Rank Class-I Peptide-MHC Structural Models. J Chem Inf Model 2024; 64:8729-8742. [PMID: 39555889 PMCID: PMC11633655 DOI: 10.1021/acs.jcim.4c01278] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2024] [Revised: 10/16/2024] [Accepted: 11/07/2024] [Indexed: 11/19/2024]
Abstract
The binding of peptides to class-I Major Histocompatibility Complex (MHC) receptors and their subsequent recognition downstream by T-cell receptors are crucial processes for most multicellular organisms to be able to fight various diseases. Thus, the identification of peptide antigens that can elicit an immune response is of immense importance for developing successful therapies for bacterial and viral infections, even cancer. Recently, studies have demonstrated the importance of peptide-MHC (pMHC) structural analysis, with pMHC structural modeling methods gradually becoming more popular in peptide antigen identification workflows. Most of the pMHC structural modeling tools provide an ensemble of candidate peptide poses in the MHC-I cleft, each associated with a score stemming from a scoring function, with the top scoring pose assumed to be the most representative of the ensemble. However, identifying the binding mode, that is, the peptide pose from the ensemble that is closest to an unavailable native structure, is not trivial. Oftentimes, the peptide poses characterized as best by a protein-ligand scoring function are not the ones that are the most representative of the actual structure. In this work, we frame the peptide binding pose identification problem as a Learning-to-Rank (LTR) problem. We present RankMHC, an LTR-based pMHC binding mode identification predictor, which is specifically trained to predict the most accurate ranking of an ensemble of pMHC conformations. RankMHC outperforms classical peptide-ligand scoring functions, as well as previous Machine Learning (ML)-based binding pose predictors. We further demonstrate that RankMHC can be used with many pMHC structural modeling tools that use different structural modeling protocols.
Affiliation(s)
- Romanos Fasoulis
- Department of Computer Science, Rice University, Houston, Texas 77005, United States
- Georgios Paliouras
- Institute of Informatics and Telecommunications, NCSR Demokritos, Athens 15341, Greece
- Lydia E. Kavraki
- Department of Computer Science, Rice University, Houston, Texas 77005, United States
- Ken Kennedy Institute, Rice University, Houston, Texas 77005, United States
|
20
|
Estevam GO, Linossi EM, Rao J, Macdonald CB, Ravikumar A, Chrispens KM, Capra JA, Coyote-Maestas W, Pimentel H, Collisson EA, Jura N, Fraser JS. Mapping kinase domain resistance mechanisms for the MET receptor tyrosine kinase via deep mutational scanning. bioRxiv 2024:2024.07.16.603579. [PMID: 39071407 PMCID: PMC11275805 DOI: 10.1101/2024.07.16.603579] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/30/2024]
Abstract
Mutations in the kinase and juxtamembrane domains of the MET Receptor Tyrosine Kinase are responsible for oncogenesis in various cancers and can drive resistance to MET-directed treatments. Determining the most effective inhibitor for each mutational profile is a major challenge for MET-driven cancer treatment in precision medicine. Here, we used a deep mutational scan (DMS) of ~5,764 MET kinase domain variants to profile the growth of each mutation against a panel of 11 inhibitors that are reported to target the MET kinase domain. We validate previously identified resistance mutations, pinpoint common resistance sites across type I, type II, and type I ½ inhibitors, unveil unique resistance and sensitizing mutations for each inhibitor, and verify non-cross-resistant sensitivities for type I and type II inhibitor pairs. We augment a protein language model with biophysical and chemical features to improve the predictive performance for inhibitor-treated datasets. Together, our study demonstrates a pooled experimental pipeline for identifying resistance mutations, provides a reference dictionary for mutations that are sensitized to specific therapies, and offers insights for future drug development.
Affiliation(s)
- Gabriella O. Estevam
- Department of Bioengineering and Therapeutic Sciences, UCSF, San Francisco, CA, United States
- Tetrad Graduate Program, UCSF, San Francisco, CA, United States
- Edmond M. Linossi
- Cardiovascular Research Institute, UCSF, San Francisco, CA, United States
- Department of Cellular and Molecular Pharmacology, UCSF, San Francisco, CA, United States
- Jingyou Rao
- Department of Computer Science, UCLA, Los Angeles, CA, United States
- Christian B. Macdonald
- Department of Bioengineering and Therapeutic Sciences, UCSF, San Francisco, CA, United States
- Ashraya Ravikumar
- Department of Bioengineering and Therapeutic Sciences, UCSF, San Francisco, CA, United States
- Karson M. Chrispens
- Department of Bioengineering and Therapeutic Sciences, UCSF, San Francisco, CA, United States
- Biophysics Graduate Program, UCSF, San Francisco, CA, United States
- John A. Capra
- Bakar Computational Health Sciences Institute and Department of Epidemiology and Biostatistics, UCSF, San Francisco, CA, United States
- Willow Coyote-Maestas
- Department of Bioengineering and Therapeutic Sciences, UCSF, San Francisco, CA, United States
- Quantitative Biosciences Institute, UCSF, San Francisco, CA, United States
- Harold Pimentel
- Department of Computer Science, UCLA, Los Angeles, CA, United States
- Department of Computational Medicine and Human Genetics, UCLA, Los Angeles, CA, United States
- Department of Human Genetics, David Geffen School of Medicine, UCLA, Los Angeles, CA, United States
- Eric A. Collisson
- Human Biology, Fred Hutchinson Cancer Center, Seattle, Washington, United States
- Department of Medicine, University of Washington, Seattle, Washington, United States
- Natalia Jura
- Cardiovascular Research Institute, UCSF, San Francisco, CA, United States
- Department of Cellular and Molecular Pharmacology, UCSF, San Francisco, CA, United States
- Quantitative Biosciences Institute, UCSF, San Francisco, CA, United States
- James S. Fraser
- Department of Bioengineering and Therapeutic Sciences, UCSF, San Francisco, CA, United States
- Quantitative Biosciences Institute, UCSF, San Francisco, CA, United States
|
21
|
Nabi T, Riyed TH, Ornob A. Deep learning based predictive modeling to screen natural compounds against TNF-alpha for the potential management of rheumatoid arthritis: Virtual screening to comprehensive in silico investigation. PLoS One 2024; 19:e0303954. [PMID: 39636801 PMCID: PMC11620472 DOI: 10.1371/journal.pone.0303954] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/05/2024] [Accepted: 10/02/2024] [Indexed: 12/07/2024] Open
Abstract
Rheumatoid arthritis (RA) affects an estimated 0.1% to 2.0% of the world's population, leading to a substantial impact on global health. The adverse effects and toxicity associated with conventional RA treatment pathways underscore the critical need to seek potential new therapeutic candidates, particularly those from natural sources that can treat the condition with minimal side effects. To address this challenge, this study employed a deep-learning (DL) based approach to conduct a virtual assessment of natural compounds against the Tumor Necrosis Factor-alpha (TNF-α) protein. TNF-α stands out as the primary pro-inflammatory cytokine, crucial in the development of RA. Our predictive model demonstrated appreciable performance, achieving an MSE of 0.6, a MAPE of 10%, and an MAE of 0.5. The model was then deployed to screen a comprehensive set of 2563 natural compounds obtained from the Selleckchem database. Utilizing their predicted bioactivity (pIC50), the top 128 compounds were identified. Among them, 68 compounds were taken forward based on drug-likeness analysis. Subsequently, selected compounds underwent additional evaluation using molecular docking (< -8.7 kcal/mol) and ADMET, resulting in four compounds posing nominal toxicity, which were finally subjected to MD simulation for 200 ns. The stability of the complexes was then assessed via analyses encompassing RMSD, RMSF, Rg, H-bonds, SASA, and Essential Dynamics. Ultimately, based on the total binding free energy estimated using the MM/GBSA method, Imperialine, Veratramine, and Gelsemine are proven to be potential natural inhibitors of TNF-α.
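The three error metrics reported in this abstract (MSE, MAE, MAPE) have standard definitions, sketched below in plain Python. The pIC50 values are hypothetical and chosen only to make the arithmetic easy to follow; they are not data from the study, and the function name is an assumption.

```python
def regression_errors(y_true, y_pred):
    """Return MSE, MAE and MAPE (in percent) for paired true/predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(t == 0 for t in y_true):
        raise ValueError("MAPE is undefined when a true value is zero")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e ** 2 for e in errors) / n            # mean squared error
    mae = sum(abs(e) for e in errors) / n            # mean absolute error
    # Mean absolute percentage error: absolute error relative to the true value.
    mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    return {"mse": mse, "mae": mae, "mape": mape}


# Hypothetical pIC50 values (assumed for illustration only).
true_pic50 = [4.0, 5.0, 8.0]
pred_pic50 = [5.0, 4.5, 8.0]
print(regression_errors(true_pic50, pred_pic50))
```

MSE penalises large errors more heavily than MAE, while MAPE expresses the error relative to the size of the true value, which is why the three are usually reported together.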
Affiliation(s)
- Tasnia Nabi
- Department of Biomedical Engineering, Military Institute of Science and Technology (MIST), Dhaka, Bangladesh
- Tanver Hasan Riyed
- Department of Biomedical Engineering, Military Institute of Science and Technology (MIST), Dhaka, Bangladesh
- Akid Ornob
- Department of Biomedical Engineering, Military Institute of Science and Technology (MIST), Dhaka, Bangladesh
|
22
|
Fusti-Molnar L. Integrating Quantum Mechanics into Protein-Ligand Docking: Toward Higher Accuracy and Reliability. Research Square 2024:rs.3.rs-5433993. [PMID: 39678339 PMCID: PMC11643324 DOI: 10.21203/rs.3.rs-5433993/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/17/2024]
Abstract
I introduce two new methods, QFVina and QFVinardo, for protein-ligand docking that leverage precomputed high-quality conformational libraries with QM-optimized geometries and ab initio DFT-D4-based conformational rankings and strain energies. These methods provide greater accuracy in docking-based virtual screening by addressing the inaccuracies in intramolecular relative energies of conformations, a critical component often misrepresented in flexible ligand docking calculations. I demonstrate that numerous force field-based methods widely used today exhibit substantial errors in conformational relative energies, and that it is unrealistic to expect better accuracy from the faster scoring functions typically employed in docking. Consistent with these findings, I show that traditional flexible ligand docking often produces geometries with significant strain energies and large deviations, with magnitudes comparable to the protein-ligand binding energies themselves and much larger than the differences we aim to estimate in docking hitlists. By using physically realistic ligand conformations with accurate strain energies in the scoring function, QFVina and QFVinardo produce markedly different docking results, even with the same docking parameters and scoring functions for protein-ligand interaction energies. I analyzed these differences in docking hitlists and selected protein-ligand interactions using three protein targets from COVID-19 research.
|
23
|
Bagdad Y, Miteva MA. Recent Applications of Artificial Intelligence in Discovery of New Antibacterial Agents. Adv Appl Bioinform Chem 2024; 17:139-157. [PMID: 39650228 PMCID: PMC11624680 DOI: 10.2147/aabc.s484321] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2024] [Accepted: 10/25/2024] [Indexed: 12/11/2024] Open
Abstract
Antimicrobial resistance (AMR) is today a major challenge for global public health, compromising the effectiveness of treatments against a multitude of bacterial infections. In recent decades, artificial intelligence (AI) has emerged as a promising technology for the identification and development of new antibacterial agents. This review focuses on AI methodologies applied to discover new antibacterial candidates. Case studies that employed AI to identify small molecules and peptides showing antimicrobial activity and efficacy against resistant pathogenic bacteria are summarized. We also discuss the challenges and opportunities offered by AI, highlighting the importance of AI progress for the identification of promising new antibacterial drug candidates to combat AMR.
Affiliation(s)
- Youcef Bagdad
- Université Paris Cité, CNRS UMR 8038 CiTCoM, Inserm U1268 MCTR, Paris, France
- Maria A Miteva
- Université Paris Cité, CNRS UMR 8038 CiTCoM, Inserm U1268 MCTR, Paris, France
|
24
|
Yang Z, Zhong W, Lv Q, Dong T, Chen G, Chen CYC. Interaction-Based Inductive Bias in Graph Neural Networks: Enhancing Protein-Ligand Binding Affinity Predictions From 3D Structures. IEEE Trans Pattern Anal Mach Intell 2024; 46:8191-8208. [PMID: 38739515 DOI: 10.1109/tpami.2024.3400515] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/16/2024]
Abstract
Inductive bias in machine learning (ML) is the set of assumptions describing how a model makes predictions. Different ML-based methods for protein-ligand binding affinity (PLA) prediction have different inductive biases, leading to different levels of generalization capability and interpretability. Intuitively, the inductive bias of an ML-based model for PLA prediction should fit in with biological mechanisms relevant for binding to achieve good predictions with meaningful reasons. To this end, we propose an interaction-based inductive bias to restrict neural networks to functions relevant for binding with two assumptions: 1) A protein-ligand complex can be naturally expressed as a heterogeneous graph with covalent and non-covalent interactions; 2) The predicted PLA is the sum of pairwise atom-atom affinities determined by non-covalent interactions. The interaction-based inductive bias is embodied by an explainable heterogeneous interaction graph neural network (EHIGN) for explicitly modeling pairwise atom-atom interactions to predict PLA from 3D structures. Extensive experiments demonstrate that EHIGN achieves better generalization capability than other state-of-the-art ML-based baselines in PLA prediction and structure-based virtual screening. More importantly, comprehensive analyses of distance-affinity, pose-affinity, and substructure-affinity relations suggest that the interaction-based inductive bias can guide the model to learn atomic interactions that are consistent with physical reality. As a case study to demonstrate practical usefulness, our method is tested for predicting the efficacy of Nirmatrelvir against SARS-CoV-2 variants. EHIGN successfully recognizes the changes in the efficacy of Nirmatrelvir for different SARS-CoV-2 variants with meaningful reasons.
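The second assumption in this abstract, that the predicted affinity is a sum of pairwise atom-atom terms over non-covalent contacts, can be illustrated with a toy calculation. This is only a sketch of the additive structure: the linear distance-decay term, the 5 Å cutoff, and all names below are hypothetical stand-ins, not EHIGN's learned interaction function.

```python
import math


def pair_affinity(distance, cutoff=5.0):
    """Hypothetical atom-atom term: decays linearly to zero at the cutoff.

    In a trained model this term would be learned; it is hard-coded here
    purely to show the additive decomposition.
    """
    if distance >= cutoff:
        return 0.0
    return 1.0 - distance / cutoff


def predicted_affinity(ligand_xyz, protein_xyz, cutoff=5.0):
    """Sum pairwise terms over all ligand-protein atom pairs within the cutoff."""
    contributions = {}
    for i, lig_atom in enumerate(ligand_xyz):
        for j, prot_atom in enumerate(protein_xyz):
            term = pair_affinity(math.dist(lig_atom, prot_atom), cutoff)
            if term > 0.0:
                contributions[(i, j)] = term
    return sum(contributions.values()), contributions


# Toy coordinates in angstroms (assumed for illustration only).
ligand = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
protein = [(2.5, 0.0, 0.0), (0.0, 0.0, 5.0)]
total, per_pair = predicted_affinity(ligand, protein)
print(total, per_pair)
```

Because the prediction is a plain sum, each entry of the per-pair dictionary is directly attributable to one ligand-protein atom pair, which is the property that makes this kind of model explainable.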

25
Junaid M, Wang B, Li W. Data-augmented machine learning scoring functions for virtual screening of YTHDF1 m6A reader protein. Comput Biol Med 2024; 183:109268. [PMID: 39405731 DOI: 10.1016/j.compbiomed.2024.109268] [Received: 07/21/2024] [Revised: 09/13/2024] [Accepted: 10/08/2024] [Indexed: 11/20/2024]
Abstract
Machine learning is rapidly advancing the drug discovery process, significantly enhancing speed and efficiency. Innovation in computer-aided drug design is primarily driven by structure- and ligand-based approaches. When the number of known inhibitors for a target is limited, data augmentation strategies are often preferred to enhance model performance. In this study, we developed predictive machine learning models for structure-based drug discovery leveraging multiple traditional machine learning algorithms trained with target and ligand dynamics-aware datasets. To illustrate our approach, we present a composite model that combines classification and regression to predict YTHDF1 inhibitors, utilizing PLEC features. YTHDF1, a key m6A reader protein involved in mRNA translation, is implicated in various cancers, making it a promising therapeutic target. Traditional structure-based virtual screening (SBVS) using generic scoring functions has struggled to identify potent YTHDF1 inhibitors due to the protein's unique binding characteristics. To overcome this, we developed YTHDF1-specific machine learning scoring functions (MLSFs) to enhance SBVS efficacy. We employed various data augmentation techniques to generate a comprehensive dataset, incorporating multiple conformations of ligands and the YTHDF1 protein. We have trained 64 YTHDF1-specific MLSFs using four machine learning algorithms and evaluated them on ten test sets, focusing on their predictive and ranking power. Our results demonstrate that the artificial neural network with protein-ligand extended connectivity fingerprints (ANN-PLEC) outperforms other MLSFs, consistently achieving high area under the precision-recall curve (PR-AUC) of 0.87. This method shows promise for targets with limited quantities of active molecules, providing a viable path forward for drug discovery research. 
The ANN-PLEC scoring function is freely available on GitHub at https://github.com/JuniML/SBVS-YTHDF1/.
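The abstract reports ranking quality as area under the precision-recall curve (PR-AUC). It does not say which estimator was used, so the sketch below assumes average precision, one common step-wise approximation of PR-AUC. The labels and scores are made-up toy values, and tied scores are not given any special treatment.

```python
def average_precision(labels, scores):
    """Average precision: mean of the precision values at the rank of each active."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one active (label 1) is required")
    # Rank compounds from highest to lowest predicted score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / n_pos


# Toy data (assumed): 1 = active, 0 = inactive.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.1]
ap = average_precision(labels, scores)
```

Unlike ROC-AUC, this quantity depends on the fraction of actives in the set, which is why it is often preferred for targets with few known actives.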
Affiliation(s)
- Muhammad Junaid
  - Institute for Advanced Study, Shenzhen University, Shenzhen, 518060, China; College of Physics and Optoelectronics Engineering, Shenzhen University, Shenzhen, 518060, China
- Bo Wang
  - Institute for Advanced Study, Shenzhen University, Shenzhen, 518060, China
- Wenjin Li
  - Institute for Advanced Study, Shenzhen University, Shenzhen, 518060, China

26
Vittorio S, Lunghini F, Morerio P, Gadioli D, Orlandini S, Silva P, Martinovic J, Pedretti A, Bonanni D, Del Bue A, Palermo G, Vistoli G, Beccari AR. Addressing docking pose selection with structure-based deep learning: Recent advances, challenges and opportunities. Comput Struct Biotechnol J 2024; 23:2141-2151. [PMID: 38827235 PMCID: PMC11141151 DOI: 10.1016/j.csbj.2024.05.024] [Received: 01/23/2024] [Revised: 05/15/2024] [Accepted: 05/15/2024] [Indexed: 06/04/2024]
Abstract
Molecular docking is a widely used technique in drug discovery to predict the binding mode of a given ligand to its target. However, the identification of the near-native binding pose in docking experiments still represents a challenging task as the scoring functions currently employed by docking programs are parametrized to predict the binding affinity, and, therefore, they often fail to correctly identify the ligand native binding conformation. Selecting the correct binding mode is crucial to obtaining meaningful results and to conveniently optimizing new hit compounds. Deep learning (DL) algorithms have been an area of growing interest in this sense for their capability to extract the relevant information directly from the protein-ligand structure. Our review aims to present the recent advances regarding the development of DL-based pose selection approaches, discussing limitations and possible future directions. Moreover, a comparison between the performances of some classical scoring functions and DL-based methods concerning their ability to select the correct binding mode is reported. In this regard, two novel DL-based pose selectors developed by us are presented.
Affiliation(s)
- Serena Vittorio
  - Dipartimento di Scienze Farmaceutiche, Università degli Studi di Milano, Via Luigi Mangiagalli 25, I-20133 Milano, Italy
- Filippo Lunghini
  - EXSCALATE, Dompé Farmaceutici SpA, Via Tommaso de Amicis 95, 80123 Naples, Italy
- Pietro Morerio
  - Pattern Analysis and Computer Vision, Fondazione Istituto Italiano di Tecnologia, Via Morego, 30, 16163 Genova, Italy
- Davide Gadioli
  - Dipartimento di Elettronica Informazione e Bioingegneria, Politecnico di Milano, Via Ponzio 34/5, I-20133 Milano, Italy
- Sergio Orlandini
  - SCAI, SuperComputing Applications and Innovation Department, CINECA, Via dei Tizii 6, Rome 00185, Italy
- Paulo Silva
  - IT4Innovations, VSB – Technical University of Ostrava, 17. listopadu 2172/15, 70800 Ostrava-Poruba, Czech Republic
- Jan Martinovic
  - IT4Innovations, VSB – Technical University of Ostrava, 17. listopadu 2172/15, 70800 Ostrava-Poruba, Czech Republic
- Alessandro Pedretti
  - Dipartimento di Scienze Farmaceutiche, Università degli Studi di Milano, Via Luigi Mangiagalli 25, I-20133 Milano, Italy
- Domenico Bonanni
  - Department of Physical and Chemical Sciences, University of L'Aquila, via Vetoio, L'Aquila 67010, Italy
- Alessio Del Bue
  - Pattern Analysis and Computer Vision, Fondazione Istituto Italiano di Tecnologia, Via Morego, 30, 16163 Genova, Italy
- Gianluca Palermo
  - Dipartimento di Elettronica Informazione e Bioingegneria, Politecnico di Milano, Via Ponzio 34/5, I-20133 Milano, Italy
- Giulio Vistoli
  - Dipartimento di Scienze Farmaceutiche, Università degli Studi di Milano, Via Luigi Mangiagalli 25, I-20133 Milano, Italy
- Andrea R. Beccari
  - EXSCALATE, Dompé Farmaceutici SpA, Via Tommaso de Amicis 95, 80123 Naples, Italy

27
Hu Q, Wang Z, Meng J, Li W, Guo J, Mu Y, Wang S, Zheng L, Wei Y. OpenDock: a pytorch-based open-source framework for protein-ligand docking and modelling. Bioinformatics 2024; 40:btae628. [PMID: 39432683 PMCID: PMC11552628 DOI: 10.1093/bioinformatics/btae628] [Received: 07/04/2024] [Revised: 09/19/2024] [Accepted: 10/19/2024] [Indexed: 10/23/2024]
Abstract
MOTIVATION Molecular docking is an invaluable computational tool with broad applications in computer-aided drug design and enzyme engineering. However, current molecular docking tools are typically implemented in languages such as C++ for calculation speed, which lack flexibility and user-friendliness for further development. Moreover, validating the effectiveness of external scoring functions for molecular docking and screening within these frameworks is challenging, and implementing more efficient sampling strategies is not straightforward. RESULTS To address these limitations, we have developed an open-source molecular docking framework, OpenDock, based on Python and PyTorch. This framework supports the integration of multiple scoring functions; some can be utilized during molecular docking and pose optimization, while others can be used for post-processing scoring. In terms of sampling, the current version of this framework supports simulated annealing and Monte Carlo optimization. Additionally, it can be extended to include methods such as genetic algorithms and particle swarm optimization for sampling docking poses and protein side chain orientations. Distance constraints are also implemented to enable covalent docking, restricted docking or distance map constraints guided pose sampling. Overall, this framework serves as a valuable tool in drug design and enzyme engineering, offering significant flexibility for most protein-ligand modelling tasks. AVAILABILITY AND IMPLEMENTATION OpenDock is publicly available at: https://github.com/guyuehuo/opendock.
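The abstract names simulated annealing and Monte Carlo optimization as the supported sampling strategies. The following is a generic Python sketch of Metropolis sampling with a cooling schedule. It does not use the OpenDock API: `toy_score`, the three-number "pose", the geometric cooling schedule, and all parameter values are assumptions chosen only to show the accept/reject logic.

```python
import math
import random


def toy_score(pose):
    """Hypothetical scoring function: squared distance from an ideal pose at (1, 2, 3)."""
    target = (1.0, 2.0, 3.0)
    return sum((a - b) ** 2 for a, b in zip(pose, target))


def simulated_annealing(score, start, steps=5000, t_start=2.0, t_end=0.01,
                        step_size=0.5, seed=0):
    """Minimize `score` by random perturbation with Metropolis acceptance."""
    rng = random.Random(seed)
    current, current_score = start, score(start)
    best, best_score = current, current_score
    for k in range(steps):
        # Geometric cooling from t_start down to t_end.
        temperature = t_start * (t_end / t_start) ** (k / (steps - 1))
        candidate = tuple(x + rng.uniform(-step_size, step_size) for x in current)
        candidate_score = score(candidate)
        delta = candidate_score - current_score
        # Metropolis criterion: always accept improvements, sometimes accept worse poses.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_score = candidate, candidate_score
            if current_score < best_score:
                best, best_score = current, current_score
    return best, best_score


best_pose, best_value = simulated_annealing(toy_score, start=(0.0, 0.0, 0.0))
```

In a real docking run the pose would hold translation, rotation, and torsion variables and the score would come from a scoring function; only the acceptance rule and cooling loop carry over from this sketch.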
Affiliation(s)
- Qiuyue Hu
  - Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518000, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Zechen Wang
  - School of Physics, Shandong University, Jinan, 250100, China
- Jintao Meng
  - Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518000, China
- Weifeng Li
  - School of Physics, Shandong University, Jinan, 250100, China
- Jingjing Guo
  - Centre in Artificial Intelligence Driven Drug Discovery, Faculty of Applied Sciences, Macao Polytechnic University, Macao SAR, 999078, China
- Yuguang Mu
  - School of Biological Sciences, Nanyang Technological University, Singapore 637551, Singapore
- Sheng Wang
  - Shanghai Zelixir Biotech Co. Ltd, Shanghai, 201203, China
- Yanjie Wei
  - Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518000, China

28
Ghislat G, Hernandez-Hernandez S, Piyawajanusorn C, Ballester PJ. Data-centric challenges with the application and adoption of artificial intelligence for drug discovery. Expert Opin Drug Discov 2024; 19:1297-1307. [PMID: 39316009 DOI: 10.1080/17460441.2024.2403639] [Received: 07/09/2024] [Accepted: 09/09/2024] [Indexed: 09/25/2024]
Abstract
INTRODUCTION Artificial intelligence (AI) is exhibiting tremendous potential to reduce the massive costs and long timescales of drug discovery. There are, however, important challenges currently limiting the impact and scope of AI models. AREAS COVERED In this perspective, the authors discuss a range of data issues (bias, inconsistency, skewness, irrelevance, small size, high dimensionality), how they challenge AI models, and which issue-specific mitigations have been effective. Next, they point out the challenges faced by uncertainty quantification techniques aimed at enhancing and trusting the predictions from these AI models. They also discuss how conceptual errors, unrealistic benchmarks and performance misestimation can confound the evaluation of models and thus their development. Lastly, the authors explain how human bias, whether from AI experts or drug discovery experts, constitutes another challenge that can be alleviated by gaining more prospective experience. EXPERT OPINION AI models are often developed to excel on retrospective benchmarks unlikely to anticipate their prospective performance. As a result, only a few of these models are ever reported to have prospective value (e.g. by discovering potent and innovative drug leads for a therapeutic target). The authors have discussed what can go wrong in practice with AI for drug discovery. The authors hope that this will help inform the decisions of editors, funders, investors, and researchers working in this area.
Affiliation(s)
- Ghita Ghislat
  - Department of Life Sciences, Imperial College London, London, UK

29
Ma Z, Ajibade A, Zou X. Docking strategies for predicting protein-ligand interactions and their application to structure-based drug design. COMMUNICATIONS IN INFORMATION AND SYSTEMS 2024; 24:199-230. [PMID: 39584017 PMCID: PMC11583305 DOI: 10.4310/cis.241021221101] [Indexed: 11/26/2024]
Abstract
Molecular docking stands as a pivotal element in the realm of computer-aided drug design (CADD), consistently contributing to advancements in pharmaceutical research. In essence, it employs computer algorithms to identify the "best" match between two molecules, akin to solving intricate three-dimensional jigsaw puzzles. At a more stringent level, the molecular docking challenge entails predicting the accurate bound association state based on the atomic coordinates of two molecules. This process assumes particular significance in unraveling the mechanistic intricacies of physicochemical interactions at the atomic scale. Notably, the application of docking, especially in the context of protein-small molecule interactions, holds wide-ranging implications for structure-based drug design, given the prevalent use of small compounds as drug candidates. This study provides an overview of docking methodologies, delves into recent key developments, elucidates the physicochemical underpinnings of molecular recognition in protein-ligand interactions, and concludes by addressing the applications of docking in virtual screening, alongside current challenges within existing docking methods.
Affiliation(s)
- Zhiwei Ma
  - Dalton Cardiovascular Research Center, University of Missouri-Columbia, USA
- Abeeb Ajibade
  - Dalton Cardiovascular Research Center, University of Missouri-Columbia
  - Department of Physics and Astronomy, University of Missouri-Columbia, USA
- Xiaoqin Zou
  - Dalton Cardiovascular Research Center, University of Missouri-Columbia
  - Department of Physics and Astronomy, University of Missouri-Columbia
  - Department of Biochemistry, University of Missouri-Columbia
  - Institute for Data Science and Informatics, University of Missouri-Columbia, USA

30
Hall B, Keiser MJ. Retrieval Augmented Docking Using Hierarchical Navigable Small Worlds. J Chem Inf Model 2024; 64:7398-7408. [PMID: 39360680 PMCID: PMC11480973 DOI: 10.1021/acs.jcim.4c00683] [Received: 04/22/2024] [Revised: 09/17/2024] [Accepted: 09/18/2024] [Indexed: 10/04/2024]
Abstract
Make-on-demand chemical libraries have drastically increased the reach of molecular docking, with the enumerated ready-to-dock ZINC-22 library approaching 6.4 billion molecules (July 2024). While ever-growing libraries result in better-scoring molecules, the computational resources required to dock all of ZINC-22 make this endeavor infeasible for most. Here, we organize and traverse chemical space with hierarchical navigable small-world graphs, a method we term retrieval augmented docking (RAD). RAD recovers most virtual actives, despite docking only a fraction of the library. Furthermore, RAD is protein-agnostic, supporting additional docking campaigns without additional computational overhead. In depth, we assess RAD on published large-scale docking campaigns against D4 and AmpC spanning 99.5 million and 138 million molecules, respectively. RAD recovers 95% of DOCK virtual actives for both targets after evaluating only 10% of the libraries. In breadth, RAD shows widespread applicability against 43 DUDE-Z proteins, evaluating 50.3 million associations. On average, RAD recovers 87% of virtual actives while docking 10% of the library without sacrificing chemical diversity.
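The core idea in the abstract, docking only the molecules reached while traversing a similarity graph, can be sketched in Python as a best-first traversal under a fixed docking budget. This is a heavily simplified stand-in, not the RAD implementation: a real HNSW index is multi-layered and built from molecular fingerprints, while the toy graph below (integers linked to i±1 and i±10), the score `abs(i - 70)`, and the budget of 30 are all assumptions for illustration.

```python
import heapq


def best_first_traversal(neighbors, score, entry, budget):
    """Score at most `budget` nodes, always expanding the best-scoring scored node next."""
    scored = {entry: score(entry)}
    frontier = [(scored[entry], entry)]  # min-heap: lower score is better
    while frontier and len(scored) < budget:
        _, node = heapq.heappop(frontier)
        for nxt in neighbors[node]:
            if nxt not in scored and len(scored) < budget:
                scored[nxt] = score(nxt)  # one "docking" evaluation
                heapq.heappush(frontier, (scored[nxt], nxt))
    return scored


# Toy library of 100 "molecules" whose neighbors are assumed to be chemically similar.
library = range(100)
neighbors = {i: [j for j in (i - 10, i - 1, i + 1, i + 10) if 0 <= j < 100]
             for i in library}


def toy_score(i):
    """Hypothetical docking score that varies smoothly over the graph."""
    return abs(i - 70)


# "Virtual actives": the five best-scoring molecules from an exhaustive screen.
virtual_actives = set(sorted(library, key=toy_score)[:5])

scored = best_first_traversal(neighbors, toy_score, entry=0, budget=30)
recall = len(virtual_actives & set(scored)) / len(virtual_actives)
```

The sketch only works because neighboring nodes have similar scores; the recovery rates quoted in the abstract depend on how well that assumption holds for real docking scores and real chemical similarity.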
Affiliation(s)
- Brendan W. Hall
  - Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, California 94158, United States
  - Program in Biophysics, University of California, San Francisco, San Francisco, California 94158, United States
- Michael J. Keiser
  - Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, California 94158, United States
  - Institute for Neurodegenerative Diseases, University of California, San Francisco, San Francisco, California 94158, United States
  - Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, California 94158, United States
  - Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, California 94158, United States

31
Muegge I, Bentzien J, Ge Y. Perspectives on current approaches to virtual screening in drug discovery. Expert Opin Drug Discov 2024; 19:1173-1183. [PMID: 39132881 DOI: 10.1080/17460441.2024.2390511] [Received: 06/29/2024] [Accepted: 08/06/2024] [Indexed: 08/13/2024]
Abstract
INTRODUCTION For the past two decades, virtual screening (VS) has been an efficient hit finding approach for drug discovery. Today, billions of commercially accessible compounds are routinely screened, and many successful examples of VS have been reported. VS methods continue to evolve, including machine learning and physics-based methods. AREAS COVERED The authors examine recent examples of VS in drug discovery and discuss prospective hit finding results from the critical assessment of computational hit-finding experiments (CACHE) challenge. The authors also highlight the cost considerations and open-source options for conducting VS and examine chemical space coverage and library selections for VS. EXPERT OPINION The advancement of sophisticated VS approaches, including the use of machine learning techniques and increased computer resources as well as the ease of access to synthetically available chemical spaces, and commercial and open-source VS platforms allow for interrogating ultra-large libraries (ULL) of billions of molecules. An impressive number of prospective ULL VS campaigns have generated potent and structurally novel hits across many target classes. Nonetheless, many successful contemporary VS approaches still use considerably smaller focused libraries. This apparent dichotomy illustrates that VS is best conducted in a fit-for-purpose way choosing an appropriate chemical space. Better methods need to be developed to tackle more challenging targets.
Affiliation(s)
- Ingo Muegge
  - Research department, Alkermes, Inc, Waltham, MA, USA
- Jörg Bentzien
  - Research department, Alkermes, Inc, Waltham, MA, USA
- Yunhui Ge
  - Research department, Alkermes, Inc, Waltham, MA, USA

32
Lam HYI, Guan JS, Ong XE, Pincket R, Mu Y. Protein language models are performant in structure-free virtual screening. Brief Bioinform 2024; 25:bbae480. [PMID: 39327890 PMCID: PMC11427677 DOI: 10.1093/bib/bbae480] [Received: 05/24/2024] [Revised: 08/17/2024] [Accepted: 09/12/2024] [Indexed: 09/28/2024]
Abstract
Hitherto, virtual screening (VS) has typically been performed using a structure-based drug design paradigm. Such methods typically require the use of molecular docking on high-resolution three-dimensional structures of a target protein, a computationally intensive and time-consuming exercise. This work demonstrates that by employing protein language models and molecular graphs as inputs to a novel graph-to-transformer cross-attention mechanism, a screening power comparable to state-of-the-art structure-based models can be achieved. The implications thereof include highly expedited VS due to the greatly reduced compute required to run this model, and the ability to perform early stages of computer-aided drug design in the complete absence of 3D protein structures.
Affiliation(s)
- Hilbert Yuen In Lam
  - School of Biological Sciences, Nanyang Technological University, 60 Nanyang Dr, Singapore 637551, Singapore, Republic of Singapore
  - MagMol Pte. Ltd., 68 Circular Road, #02-01, Singapore 049422, Singapore, Republic of Singapore
- Jia Sheng Guan
  - School of Biological Sciences, Nanyang Technological University, 60 Nanyang Dr, Singapore 637551, Singapore, Republic of Singapore
- Xing Er Ong
  - MagMol Pte. Ltd., 68 Circular Road, #02-01, Singapore 049422, Singapore, Republic of Singapore
- Robbe Pincket
  - Heliovision, Asstraat 5, 3000 Leuven, Leuven, Kingdom of Belgium
- Yuguang Mu
  - School of Biological Sciences, Nanyang Technological University, 60 Nanyang Dr, Singapore 637551, Singapore, Republic of Singapore
  - MagMol Pte. Ltd., 68 Circular Road, #02-01, Singapore 049422, Singapore, Republic of Singapore

33
Prat A, Abdel Aty H, Bastas O, Kamuntavičius G, Paquet T, Norvaišas P, Gasparotto P, Tal R. HydraScreen: A Generalizable Structure-Based Deep Learning Approach to Drug Discovery. J Chem Inf Model 2024; 64:5817-5831. [PMID: 39037942 DOI: 10.1021/acs.jcim.4c00481] [Indexed: 07/24/2024]
Abstract
We propose HydraScreen, a deep-learning framework for safe and robust accelerated drug discovery. HydraScreen utilizes a state-of-the-art 3D convolutional neural network designed for the effective representation of molecular structures and interactions in protein-ligand binding. We designed an end-to-end pipeline for high-throughput screening and lead optimization, targeting applications in structure-based drug design. We assessed our approach using established public benchmarks based on the CASF-2016 core set, achieving top-tier results in affinity and pose prediction (Pearson's r = 0.86, RMSE = 1.15, Top-1 = 0.95). We introduced a novel approach for interaction profiling, aimed at detecting potential biases within both the model and data sets. This approach not only enhanced interpretability but also reinforced the impartiality of our methodology. Finally, we demonstrated HydraScreen's ability to generalize effectively across novel proteins and ligands through a temporal split. We also provide insights into potential avenues for future development aimed at enhancing the robustness of machine learning scoring functions. HydraScreen (accessible at http://hydrascreen.ro5.ai/paper) provides a user-friendly GUI and a public API, facilitating the easy-access assessment of protein-ligand complexes.
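The affinity results in the abstract are reported as Pearson's r and RMSE. A short Python sketch shows how the two metrics are computed and why they are reported together. The affinity values are hypothetical and are not taken from CASF-2016 or from the paper.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def rmse(predicted, observed):
    """Root-mean-square error between predictions and observations."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical affinities: every prediction is offset by a constant 0.5.
observed = [4.0, 5.0, 6.0, 7.0]
predicted = [4.5, 5.5, 6.5, 7.5]
r = pearson_r(predicted, observed)
error = rmse(predicted, observed)
```

Here the correlation is perfect while the RMSE is 0.5, because a constant offset preserves ranking but not absolute accuracy; the two numbers answer different questions about a scoring function.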
Affiliation(s)
- Alvaro Prat
  - AI Chemistry, Ro5, 2801 Gateway Drive, Irving, 75063 Texas, United States
- Hisham Abdel Aty
  - AI Chemistry, Ro5, 2801 Gateway Drive, Irving, 75063 Texas, United States
- Orestis Bastas
  - AI Chemistry, Ro5, 2801 Gateway Drive, Irving, 75063 Texas, United States
- Tanya Paquet
  - AI Chemistry, Ro5, 2801 Gateway Drive, Irving, 75063 Texas, United States
- Povilas Norvaišas
  - AI Chemistry, Ro5, 2801 Gateway Drive, Irving, 75063 Texas, United States
- Piero Gasparotto
  - AI Chemistry, Ro5, 2801 Gateway Drive, Irving, 75063 Texas, United States
- Roy Tal
  - AI Chemistry, Ro5, 2801 Gateway Drive, Irving, 75063 Texas, United States

34
Shafiq M, Sherwani ZA, Mushtaq M, Nur-E-Alam M, Ahmad A, Ul-Haq Z. A deep learning-based theoretical protocol to identify potentially isoform-selective PI3Kα inhibitors. Mol Divers 2024; 28:1907-1924. [PMID: 38305819 DOI: 10.1007/s11030-023-10799-0] [Received: 10/25/2023] [Accepted: 12/22/2023] [Indexed: 02/03/2024]
Abstract
Phosphoinositide 3-kinase alpha (PI3Kα) is one of the most frequently dysregulated kinases known for their pivotal role in many oncogenic diseases. While the side effects linked to existing drugs against PI3Kα-induced cancers provide an avenue for further research, the significant structural conservation among PI3Ks makes it extremely difficult to develop new isoform-selective PI3Kα inhibitors. Embracing this challenge, we herein designed a hybrid protocol by integrating machine learning (ML) with in silico drug-designing strategies. A deep learning classification model was developed and trained on the physicochemical descriptors data of known PI3Kα inhibitors and used as a screening filter for a database of small molecules. This approach led us to the prediction of 662 compounds showcasing appropriate features to be considered as PI3Kα inhibitors. Subsequently, a multiphase molecular docking was applied to further characterize the predicted hits in terms of their binding affinities and binding modes in the targeted cavity of the PI3Kα. As a result, a total of 12 compounds were identified whereas the best poses highlighted the efficiency of these ligands in maintaining interactions with the crucial residues of the protein to be targeted for the inhibition of associated activity. Notably, potential activity of compound 12 in counteracting PI3Kα function was found in a previous in vitro study. Following the drug-likeness and pharmacokinetic characterizations, six compounds (compounds 1, 2, 3, 6, 7, and 11) with suitable ADME-T profiles and promising bioavailability were selected. The mechanistic studies in dynamic mode further endorsed the potential of identified hits in blocking the ATP-binding site of the receptor with higher binding affinities than the native inhibitor, alpelisib (BYL-719), particularly the compounds 1, 2, and 11. 
These outcomes support the reliability of the developed classification model and the devised computational strategy for identifying new isoform-selective drug candidates for PI3Kα inhibition.
Affiliation(s)
- Muhammad Shafiq
  - H.E.J. Research Institute of Chemistry, International Center for Chemical and Biological Sciences, University of Karachi, Karachi, 75270, Pakistan
- Zaid Anis Sherwani
  - Dr. Panjwani Center for Molecular Medicine and Drug Research, International Center for Chemical and Biological Sciences, University of Karachi, Karachi, 75270, Pakistan
- Mamona Mushtaq
  - Dr. Panjwani Center for Molecular Medicine and Drug Research, International Center for Chemical and Biological Sciences, University of Karachi, Karachi, 75270, Pakistan
- Mohammad Nur-E-Alam
  - Department of Pharmacognosy, College of Pharmacy, King Saud University, P.O. Box. 2457, Riyadh, 11451, Kingdom of Saudi Arabia
- Aftab Ahmad
  - Department of Biomedical and Pharmaceutical Sciences, Chapman University School of Pharmacy, Irvine, CA, 92618, USA
- Zaheer Ul-Haq
  - Dr. Panjwani Center for Molecular Medicine and Drug Research, International Center for Chemical and Biological Sciences, University of Karachi, Karachi, 75270, Pakistan

35
Duo L, Liu Y, Ren J, Tang B, Hirst JD. Artificial intelligence for small molecule anticancer drug discovery. Expert Opin Drug Discov 2024; 19:933-948. [PMID: 39074493 DOI: 10.1080/17460441.2024.2367014] [Received: 04/22/2024] [Accepted: 06/07/2024] [Indexed: 07/31/2024]
Abstract
INTRODUCTION The transition from conventional cytotoxic chemotherapy to targeted cancer therapy with small-molecule anticancer drugs has enhanced treatment outcomes. This approach, which now dominates cancer treatment, has its advantages. Despite the regulatory approval of several targeted molecules for clinical use, challenges such as low response rates and drug resistance still persist. Conventional drug discovery methods are costly and time-consuming, necessitating more efficient approaches. The rise of artificial intelligence (AI) and access to large-scale datasets have revolutionized the field of small-molecule cancer drug discovery. Machine learning (ML), particularly deep learning (DL) techniques, enables the rapid identification and development of novel anticancer agents by analyzing vast amounts of genomic, proteomic, and imaging data to uncover hidden patterns and relationships. AREA COVERED In this review, the authors explore the important landmarks in the history of AI-driven drug discovery. They also highlight various applications in small-molecule cancer drug discovery, outline the challenges faced, and provide insights for future research. EXPERT OPINION The advent of big data has allowed AI to penetrate and enable innovations in almost every stage of medicine discovery, transforming the landscape of oncology research through the development of state-of-the-art algorithms and models. Despite challenges in data quality, model interpretability, and technical limitations, advancements promise breakthroughs in personalized and precision oncology, revolutionizing future cancer management.
Affiliation(s)
- Lihui Duo
  - Faculty of Science and Engineering, University of Nottingham Ningbo China, Ningbo, China
- Yu Liu
  - Faculty of Science and Engineering, University of Nottingham Ningbo China, Ningbo, China
- Jianfeng Ren
  - Faculty of Science and Engineering, University of Nottingham Ningbo China, Ningbo, China
- Bencan Tang
  - Faculty of Science and Engineering, University of Nottingham Ningbo China, Ningbo, China
- Jonathan D Hirst
  - School of Chemistry, University of Nottingham, University Park, Nottingham, UK

36
Zhou H, Skolnick J. Utility of the Morgan Fingerprint in Structure-Based Virtual Ligand Screening. J Phys Chem B 2024; 128:5363-5370. [PMID: 38783525 PMCID: PMC11163432 DOI: 10.1021/acs.jpcb.4c01875] [Received: 03/21/2024] [Revised: 05/10/2024] [Accepted: 05/14/2024] [Indexed: 05/25/2024]
Abstract
In modern drug discovery, virtual ligand screening (VLS) is frequently applied to identify possible hits before experimental testing and refinement due to its cost-effective nature for large compound libraries. For decades, efforts have been devoted to developing VLS methods with high accuracy. These include the state-of-the-art FINDSITE suite of approaches FINDSITEcomb2.0, FRAGSITE, and FRAGSITE2 and the meta version FRAGSITEcomb that were developed in our lab. These methods combine ligand homology modeling (LHM), traditional ligand similarity methods, and more recently machine learning approaches to rank ligands and have proven to be superior to most recent deep learning and large language model-based approaches. Here, we describe further improvements to our previous best methods by combining the Morgan fingerprint (MF) with the originally used PubChem fingerprint and FP2 fingerprint. We then benchmarked FINDSITEcomb2.0M, FRAGSITEM, FRAGSITE2M, and the composite meta-approach FRAGSITEcombM. On the 102 target DUD-E set, the 1% enrichment factor (EF1%) and area under the precision-recall curve (AUPR) of FRAGSITEcomb increased from 42.0/0.59 to 47.6/0.72. This 0.72 AUPR is significantly better than that of the state-of-the-art deep learning-based method DenseFS's AUPR of 0.443. An independent test on the 81 targets DEKOIS2.0 set shows that EF1%/AUPR increases from 18.3/0.520 to 23.1/0.683. An ablation investigation shows that the MF contributes to most of the improvement of all four approaches. Thus, the MF is a useful addition to structure-based VLS.
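The benchmark numbers in the abstract are 1% enrichment factors (EF1%). A minimal Python sketch of the metric follows, using the common definition: the active rate in the top-ranked 1% divided by the active rate in the whole library. The library below is synthetic, and the convention that higher scores rank first is an assumption of this sketch, not a statement about how the FRAGSITE methods score ligands.

```python
def enrichment_factor(labels, scores, fraction=0.01):
    """Active rate in the top-ranked `fraction` divided by the overall active rate."""
    n = len(labels)
    n_actives = sum(labels)
    if n == 0 or n_actives == 0:
        raise ValueError("need a non-empty library with at least one active")
    n_top = max(1, int(round(n * fraction)))
    # Assumed convention: higher score means ranked earlier.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top_hits = sum(label for _, label in ranked[:n_top])
    return (top_hits / n_top) / (n_actives / n)


# Synthetic library: 1000 compounds, 20 actives, 5 of them ranked in the top 10.
n_compounds = 1000
scores = [float(n_compounds - i) for i in range(n_compounds)]
active_positions = {0, 2, 4, 6, 8} | set(range(100, 115))
labels = [1 if i in active_positions else 0 for i in range(n_compounds)]
ef1 = enrichment_factor(labels, scores, fraction=0.01)
```

With 5 of the top 10 compounds active against a 2% base rate, the top 1% is 25 times richer in actives than a random pick, which is the scale on which the reported values of 42.0 and 47.6 should be read.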
Affiliation(s)
- Hongyi Zhou
- Center for the Study of Systems Biology, School of Biological Sciences, Georgia Institute of Technology, Atlanta, Georgia 30332, United States
- Jeffrey Skolnick
- Center for the Study of Systems Biology, School of Biological Sciences, Georgia Institute of Technology, Atlanta, Georgia 30332, United States
37
Kairys V, Baranauskiene L, Kazlauskiene M, Zubrienė A, Petrauskas V, Matulis D, Kazlauskas E. Recent advances in computational and experimental protein-ligand affinity determination techniques. Expert Opin Drug Discov 2024; 19:649-670. [PMID: 38715415 DOI: 10.1080/17460441.2024.2349169] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2024] [Accepted: 04/25/2024] [Indexed: 05/22/2024]
Abstract
INTRODUCTION: Modern drug discovery revolves around designing ligands that target the chosen biomolecule, typically a protein. For this, the evaluation of affinities of putative ligands is crucial. This has given rise to a multitude of dedicated computational and experimental methods that are constantly being developed and improved. AREAS COVERED: In this review, the authors reassess both the industry mainstays and the newest trends among the methods for protein-small-molecule affinity determination. They discuss both computational affinity predictions and experimental techniques, describing their basic principles, main limitations, and advantages. Together, this serves as an initial guide to the currently most popular and cutting-edge ligand-binding assays employed in rational drug design. EXPERT OPINION: The affinity determination methods continue to develop toward miniaturization, high throughput, and in-cell application. Moreover, the availability of data analysis tools has been constantly increasing. Nevertheless, cross-verification of data using at least two different techniques and careful result interpretation remain of utmost importance.
Affiliation(s)
- Visvaldas Kairys
- Department of Bioinformatics, Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
- Lina Baranauskiene
- Department of Biothermodynamics and Drug Design, Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
- Asta Zubrienė
- Department of Biothermodynamics and Drug Design, Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
- Vytautas Petrauskas
- Department of Biothermodynamics and Drug Design, Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
- Daumantas Matulis
- Department of Biothermodynamics and Drug Design, Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
- Egidijus Kazlauskas
- Department of Biothermodynamics and Drug Design, Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
38
Zhou Y, Chen SJ. Advances in machine-learning approaches to RNA-targeted drug design. Artificial Intelligence Chemistry 2024; 2:100053. [PMID: 38434217 PMCID: PMC10904028 DOI: 10.1016/j.aichem.2024.100053] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/05/2024]
Abstract
RNA molecules play multifaceted functional and regulatory roles within cells and have garnered significant attention in recent years as promising therapeutic targets. With remarkable successes achieved by artificial intelligence (AI) in different fields such as computer vision and natural language processing, there is a growing imperative to harness AI's potential in computer-aided drug design (CADD) to discover novel drug compounds that target RNA. Although machine-learning (ML) approaches have been widely adopted in the discovery of small molecules targeting proteins, the application of ML approaches to model interactions between RNA and small molecules is still in its infancy. Compared to protein-targeted drug discovery, the major challenges in ML-based RNA-targeted drug discovery stem from the scarcity of available data resources. With the growing interest and the development of curated databases focusing on interactions between RNA and small molecules, the field anticipates rapid growth and the opening of a new avenue for disease treatment. In this review, we aim to provide an overview of recent advancements in computationally modeling RNA-small molecule interactions within the context of RNA-targeted drug discovery, with a particular emphasis on methodologies employing ML techniques.
Affiliation(s)
- Yuanzhe Zhou
- Department of Physics and Astronomy, University of Missouri, Columbia, MO 65211-7010, USA
- Shi-Jie Chen
- Department of Physics and Astronomy, Department of Biochemistry, Institute of Data Sciences and Informatics, University of Missouri, Columbia, MO 65211-7010, USA
39
King HR, Bycroft M, Nguyen TB, Kelly G, Vinogradov AA, Rowling PJE, Stott K, Ascher DB, Suga H, Itzhaki LS, Artavanis-Tsakonas K. Targeting the Plasmodium falciparum UCHL3 ubiquitin hydrolase using chemically constrained peptides. Proc Natl Acad Sci U S A 2024; 121:e2322923121. [PMID: 38739798 PMCID: PMC11126973 DOI: 10.1073/pnas.2322923121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2024] [Accepted: 03/18/2024] [Indexed: 05/16/2024]
Abstract
The ubiquitin-proteasome system is essential to all eukaryotes and has been shown to be critical to parasite survival as well, including Plasmodium falciparum, the causative agent of the deadliest form of malarial disease. Despite the central role of the ubiquitin-proteasome pathway to parasite viability across its entire life-cycle, specific inhibitors targeting the individual enzymes mediating ubiquitin attachment and removal do not currently exist. The ability to disrupt P. falciparum growth at multiple developmental stages is particularly attractive as this could potentially prevent both disease pathology, caused by asexually dividing parasites, as well as transmission which is mediated by sexually differentiated parasites. The deubiquitinating enzyme PfUCHL3 is an essential protein, transcribed across both human and mosquito developmental stages. PfUCHL3 is considered hard to drug by conventional methods given the high level of homology of its active site to human UCHL3 as well as to other UCH domain enzymes. Here, we apply the RaPID mRNA display technology and identify constrained peptides capable of binding to PfUCHL3 with nanomolar affinities. The two lead peptides were found to selectively inhibit the deubiquitinase activity of PfUCHL3 versus HsUCHL3. NMR spectroscopy revealed that the peptides do not act by binding to the active site but instead block binding of the ubiquitin substrate. We demonstrate that this approach can be used to target essential protein-protein interactions within the Plasmodium ubiquitin pathway, enabling the application of chemically constrained peptides as a novel class of antimalarial therapeutics.
Affiliation(s)
- Harry R. King
- Department of Pathology, University of Cambridge, Cambridge CB2 1QP, United Kingdom
- Department of Pharmacology, University of Cambridge, Cambridge CB2 1PD, United Kingdom
- Mark Bycroft
- Department of Pharmacology, University of Cambridge, Cambridge CB2 1PD, United Kingdom
- Thanh-Binh Nguyen
- School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane QLD 4067, Australia
- Geoff Kelly
- NMR Centre, Francis Crick Institute, London NW1 1AT, United Kingdom
- Alexander A. Vinogradov
- Department of Chemistry, Graduate School of Science, University of Tokyo, Tokyo 113-0033, Japan
- Pamela J. E. Rowling
- Department of Pharmacology, University of Cambridge, Cambridge CB2 1PD, United Kingdom
- Katherine Stott
- Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, United Kingdom
- David B. Ascher
- School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane QLD 4067, Australia
- Hiroaki Suga
- Department of Chemistry, Graduate School of Science, University of Tokyo, Tokyo 113-0033, Japan
- Laura S. Itzhaki
- Department of Pharmacology, University of Cambridge, Cambridge CB2 1PD, United Kingdom
40
Kumar N, Acharya V. Advances in machine intelligence-driven virtual screening approaches for big-data. Med Res Rev 2024; 44:939-974. [PMID: 38129992 DOI: 10.1002/med.21995] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2022] [Revised: 07/15/2023] [Accepted: 10/29/2023] [Indexed: 12/23/2023]
Abstract
Virtual screening (VS) is an integral and ever-evolving domain of the drug discovery framework. VS is traditionally classified into ligand-based (LB) and structure-based (SB) approaches. Machine intelligence, or artificial intelligence, has wide applications in the drug discovery domain to reduce time and resource consumption. In combination with machine intelligence algorithms, VS has emerged as a revolutionarily progressive technology that learns within robust decision orders for data curation and hit molecule screening from large VS libraries in minutes or hours. The exponential growth of chemical and biological data, which has evolved into "big-data" in the public domain, demands modern and advanced machine intelligence-driven VS approaches to screen hit molecules from ultra-large VS libraries. VS has evolved from individual approaches (LB and SB) to integrated LB and SB techniques that explore various ligand and target protein aspects for an enhanced rate of appropriate hit molecule prediction. Current trends demand advanced and intelligent solutions to handle enormous data in the drug discovery domain for screening and optimizing hits or leads with few or no false-positive hits. Following the big-data drift and the tremendous growth in computational architecture, we present this review. The article categorizes and emphasizes individual VS techniques, details the literature on machine learning implementation and modern machine intelligence approaches, and discusses the limitations and future prospects.
Affiliation(s)
- Neeraj Kumar
- Artificial Intelligence for Computational Biology Lab (AICoB), Biotechnology Division, CSIR-Institute of Himalayan Bioresource Technology, Palampur, Himachal Pradesh, India
- Academy of Scientific and Innovative Research, Ghaziabad, India
- Vishal Acharya
- Artificial Intelligence for Computational Biology Lab (AICoB), Biotechnology Division, CSIR-Institute of Himalayan Bioresource Technology, Palampur, Himachal Pradesh, India
- Academy of Scientific and Innovative Research, Ghaziabad, India
41
Caba K, Tran-Nguyen VK, Rahman T, Ballester PJ. Comprehensive machine learning boosts structure-based virtual screening for PARP1 inhibitors. J Cheminform 2024; 16:40. [PMID: 38582911 PMCID: PMC10999096 DOI: 10.1186/s13321-024-00832-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/16/2023] [Accepted: 03/23/2024] [Indexed: 04/08/2024]
Abstract
Poly ADP-ribose polymerase 1 (PARP1) is an attractive therapeutic target for cancer treatment. Machine-learning scoring functions constitute a promising approach to discovering novel PARP1 inhibitors. Cutting-edge PARP1-specific machine-learning scoring functions were investigated using semi-synthetic training data from docking activity-labelled molecules: known PARP1 inhibitors, hard-to-discriminate decoys property-matched to them with generative graph neural networks and confirmed inactives. We further made test sets harder by including only molecules dissimilar to those in the training set. Comprehensive analysis of these datasets using five supervised learning algorithms, and protein-ligand fingerprints extracted from docking poses and ligand only features revealed one highly predictive scoring function. This is the PARP1-specific support vector machine-based regressor, when employing PLEC fingerprints, which achieved a high Normalized Enrichment Factor at the top 1% on the hardest test set (NEF1% = 0.588, median of 10 repetitions), and was more predictive than any other investigated scoring function, especially the classical scoring function employed as baseline.
Affiliation(s)
- Klaudia Caba
- Department of Bioengineering, Imperial College London, London, SW7 2AZ, UK
- Viet-Khoa Tran-Nguyen
- Unité de Biologie Fonctionnelle et Adaptative (BFA), UFR Sciences du Vivant, Université Paris Cité, 75013, Paris, France
- Taufiq Rahman
- Department of Pharmacology, University of Cambridge, Cambridge, CB2 1PD, UK
- Pedro J Ballester
- Department of Bioengineering, Imperial College London, London, SW7 2AZ, UK
42
|
Zhang Y, Li S, Meng K, Sun S. Machine Learning for Sequence and Structure-Based Protein-Ligand Interaction Prediction. J Chem Inf Model 2024; 64:1456-1472. [PMID: 38385768 DOI: 10.1021/acs.jcim.3c01841] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/23/2024]
Abstract
Developing new drugs is too expensive and time-consuming. Accurately predicting the interaction between drugs and targets will likely change how drugs are discovered. Machine learning-based protein-ligand interaction prediction has demonstrated significant potential. In this paper, computational methods that focus on sequence and structure to study protein-ligand interactions are examined. The paper starts by presenting an overview of the data sets applied in this area, as well as the various approaches applied for representing proteins and ligands. Then, sequence-based and structure-based classification criteria are utilized to categorize and summarize both the classical machine learning models and deep learning models employed in protein-ligand interaction studies. Moreover, the evaluation methods and interpretability of these models are proposed. Furthermore, the diverse applications of protein-ligand interaction models in drug research are presented. Lastly, the current challenges and future directions in this field are addressed.
Affiliation(s)
- Yunjiang Zhang
- Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, P. R. China
- Shuyuan Li
- Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, P. R. China
- Kong Meng
- Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, P. R. China
- Shaorui Sun
- Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, P. R. China
43
|
Smith MD, Darryl Quarles L, Demerdash O, Smith JC. Drugging the entire human proteome: Are we there yet? Drug Discov Today 2024; 29:103891. [PMID: 38246414 DOI: 10.1016/j.drudis.2024.103891] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2023] [Revised: 01/12/2024] [Accepted: 01/16/2024] [Indexed: 01/23/2024]
Abstract
Each of the ∼20,000 proteins in the human proteome is a potential target for compounds that bind to it and modify its function. The 3D structures of most of these proteins are now available. Here, we discuss the prospects for using these structures to perform proteome-wide virtual HTS (VHTS). We compare physics-based (docking) and AI VHTS approaches, some of which are now being applied with large databases of compounds to thousands of targets. Although preliminary proteome-wide screens are now within our grasp, further methodological developments are expected to improve the accuracy of the results.
Affiliation(s)
- Micholas Dean Smith
- University of Tennessee/Oak Ridge National Laboratory Center for Molecular Biophysics, Oak Ridge, TN 37830, USA; Department of Biochemistry and Cellular and Molecular Biology, University of Tennessee, Knoxville, TN 37996, USA
- L Darryl Quarles
- Departments of Medicine, University of Tennessee Health Science Center, Memphis, TN 38163, USA; ORRxD LLC, 3404 Olney Drive, Durham, NC 27705, USA
- Omar Demerdash
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37830, USA
- Jeremy C Smith
- University of Tennessee/Oak Ridge National Laboratory Center for Molecular Biophysics, Oak Ridge, TN 37830, USA; Department of Biochemistry and Cellular and Molecular Biology, University of Tennessee, Knoxville, TN 37996, USA
44
|
Bansal N, Wang Y, Sciabola S. Machine Learning Methods as a Cost-Effective Alternative to Physics-Based Binding Free Energy Calculations. Molecules 2024; 29:830. [PMID: 38398581 PMCID: PMC10893267 DOI: 10.3390/molecules29040830] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2023] [Revised: 01/24/2024] [Accepted: 02/09/2024] [Indexed: 02/25/2024]
Abstract
The rank ordering of ligands remains one of the most attractive challenges in drug discovery. While physics-based in silico binding affinity methods dominate the field, they still have problems, which largely revolve around forcefield accuracy and sampling. Recent advances in machine learning have gained traction for protein-ligand binding affinity predictions in early drug discovery programs. In this article, we perform retrospective binding free energy evaluations for 172 compounds from our internal collection spread over four different protein targets and five congeneric ligand series. We compared multiple state-of-the-art free energy methods ranging from physics-based methods with different levels of complexity and conformational sampling to state-of-the-art machine-learning-based methods that were available to us. Overall, we found that physics-based methods behaved particularly well when the ligand perturbations were made in the solvation region, and they did not perform as well when accounting for large conformational changes in protein active sites. On the other end, machine-learning-based methods offer a good cost-effective alternative for binding free energy calculations, but the accuracy of their predictions is highly dependent on the experimental data available for training the model.
Affiliation(s)
- Nupur Bansal
- Biotherapeutic and Medicinal Sciences, Biogen, 225 Binney Street, Cambridge, MA 02142, USA; (Y.W.); (S.S.)
45
Xiao F, Ding X, Shi Y, Wang D, Wang Y, Cui C, Zhu T, Chen K, Xiang P, Luo X. Application of ensemble learning for predicting GABAA receptor agonists. Comput Biol Med 2024; 169:107958. [PMID: 38194778 DOI: 10.1016/j.compbiomed.2024.107958] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2023] [Revised: 12/29/2023] [Accepted: 01/01/2024] [Indexed: 01/11/2024]
Abstract
BACKGROUND: Over the past few decades, agonists binding to the benzodiazepine site of the GABAA receptor have been successfully developed as clinical drugs. Different modulators (agonist, antagonist, and reverse agonist) bound to benzodiazepine sites exhibit different or even opposite pharmacological effects; however, their structures are so similar that it is difficult to distinguish them based solely on molecular skeleton. This study aims to develop classification models for predicting the agonists. METHODS: 306 agonists or non-agonists were collected from the literature. Six machine learning algorithms (RF, XGBoost, AdaBoost, GBoost, SVM, and ANN) were employed for model development. Six descriptor types (1D/2D descriptors, ECFP4, 2D-Pharmacophore, MACCS, PubChem, and Estate fingerprints) were used to characterize chemical structures. Model interpretability was explored with the SHAP method. RESULTS: The best model demonstrated an AUC value of 0.905 and an MCC value of 0.808 for the test set. The PubMac-based model (PubMac-GB) achieved the best AUC value of 0.935 for the test set. The SHAP analysis results emphasized that MaccsFP62, ECFP_624, ECFP_724, and PubchemFP213 were the crucial molecular features. Applicability domain analysis was also performed to determine reliable prediction boundaries for the model. The PubMac-GB model was applied to virtual screening for potential GABAA agonists, and the top 100 compounds were given. CONCLUSION: Overall, our ensemble learning-based model (PubMac-GB) achieved comparable performance and would be helpful in effectively identifying agonists of GABAA receptors.
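For readers unfamiliar with the MCC value reported above, the Matthews correlation coefficient summarizes a binary confusion matrix in a single number between -1 and 1. The sketch below uses the standard textbook formula; the function name and the confusion-matrix counts are hypothetical and are not taken from the paper.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero (a common convention,
    adopted here as an assumption).
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical agonist/non-agonist test set, not the paper's data.
perfect = matthews_corrcoef(tp=5, tn=5, fp=0, fn=0)
inverted = matthews_corrcoef(tp=0, tn=0, fp=5, fn=5)
```

Unlike accuracy, MCC stays informative when the two classes are imbalanced, which is one reason it is often reported alongside AUC.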
Affiliation(s)
- Fu Xiao
- School of Chinese Materia Medica, Nanjing University of Chinese Medicine, Nanjing, 210023, China; Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- Xiaoyu Ding
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China; University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Yan Shi
- Academy of Forensic Science, Shanghai Key Laboratory of Forensic Medicine, Shanghai Forensic Service Platform, Key Laboratory of Forensic Science, Ministry of Justice, Shanghai, 200063, China
- Dingyan Wang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China; University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Yitian Wang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China; University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Chen Cui
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China; University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Tingfei Zhu
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China; University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Kaixian Chen
- School of Chinese Materia Medica, Nanjing University of Chinese Medicine, Nanjing, 210023, China; Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China; University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Ping Xiang
- Academy of Forensic Science, Shanghai Key Laboratory of Forensic Medicine, Shanghai Forensic Service Platform, Key Laboratory of Forensic Science, Ministry of Justice, Shanghai, 200063, China
- Xiaomin Luo
- School of Chinese Materia Medica, Nanjing University of Chinese Medicine, Nanjing, 210023, China; Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China; University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
46
Zhou H, Skolnick J. FRAGSITE2: A structure and fragment-based approach for virtual ligand screening. Protein Sci 2024; 33:e4869. [PMID: 38100293 PMCID: PMC10751727 DOI: 10.1002/pro.4869] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2023] [Revised: 12/06/2023] [Accepted: 12/09/2023] [Indexed: 12/17/2023]
Abstract
Protein function annotation and drug discovery often involve finding small molecule binders. In the early stages of drug discovery, virtual ligand screening (VLS) is frequently applied to identify possible hits before experimental testing. While our recent ligand homology modeling (LHM)-machine learning VLS method FRAGSITE outperformed approaches that combined traditional docking to generate protein-ligand poses and deep learning scoring functions to rank ligands, a more robust approach that could identify a more diverse set of binding ligands is needed. Here, we describe FRAGSITE2, which shows significant improvement on protein targets lacking known small molecule binders and confident LHM-identified template ligands when benchmarked on two commonly used VLS datasets: for both the DUD-E and DEKOIS2.0 sets, and for ligands having a Tanimoto coefficient (TC) < 0.7 to the template ligands, the 1% enrichment factor (EF1%) of FRAGSITE2 is significantly better than that of FINDSITEcomb2.0, an earlier LHM algorithm. For the DUD-E set, FRAGSITE2 also shows better ROC enrichment factor and AUPR (area under the precision-recall curve) than the deep learning DenseFS scoring function. Compared with RF-score-VS on the 76-target subset of DEKOIS2.0, with a TC < 0.99 to training DUD-E ligands, FRAGSITE2 has double the EF1%. Its boosted tree regression method provides more robust performance than a deep learning multiple layer perceptron method. When compared with the pretrained language model for protein target features, FRAGSITE2 also shows much better performance. Thus, FRAGSITE2 is a promising approach that can discover novel hits for protein targets. FRAGSITE2's web service is freely available to academic users at http://sites.gatech.edu/cssb/FRAGSITE2.
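The TC < 0.7 and TC < 0.99 cutoffs above refer to the Tanimoto coefficient between molecular fingerprints. As a rough illustration only, assuming each fingerprint is represented as the set of its "on" bit positions (the function name and bit sets below are hypothetical, and the paper's fingerprint type is not restated here), the coefficient is the size of the intersection divided by the size of the union:

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits.

    Two empty fingerprints are assigned 0.0 here; that convention is an
    assumption of this sketch.
    """
    union = bits_a | bits_b
    if not union:
        return 0.0
    return len(bits_a & bits_b) / len(union)


# Hypothetical fingerprints for a query ligand and a template ligand.
query_bits = {1, 2, 3}
template_bits = {2, 3, 4}
tc = tanimoto(query_bits, template_bits)
is_dissimilar = tc < 0.7  # the kind of cutoff used in the benchmark above
```

Filtering test ligands by such a cutoff is a way to check that a method is not simply rediscovering close analogs of its template or training ligands.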
Affiliation(s)
- Hongyi Zhou
- Center for the Study of Systems Biology, School of Biological Sciences, Georgia Institute of Technology, Atlanta, Georgia, USA
- Jeffrey Skolnick
- Center for the Study of Systems Biology, School of Biological Sciences, Georgia Institute of Technology, Atlanta, Georgia, USA
47
Menchon G, Maveyraud L, Czaplicki G. Molecular Dynamics as a Tool for Virtual Ligand Screening. Methods Mol Biol 2024; 2714:33-83. [PMID: 37676592 DOI: 10.1007/978-1-0716-3441-7_3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/08/2023]
Abstract
Rational drug design is essential for new drugs to emerge, especially when the structure of a target protein or nucleic acid is known. To that purpose, high-throughput virtual ligand screening campaigns aim at computationally discovering new binding molecules or fragments to modulate particular biomolecular interactions or biological activities related to a disease process. The structure-based virtual ligand screening process primarily relies on docking methods, which allow predicting the binding of a molecule to a biological target structure with a correct conformation and the best possible affinity. The docking method itself is not sufficient, as it suffers from several crucial limitations (lack of full protein flexibility information, no solvation and ion effects, poor scoring functions, and unreliable molecular affinity estimation). At the interface of computer techniques and drug discovery, molecular dynamics (MD) allows introducing protein flexibility before or after a docking protocol, refining the structure of protein-drug complexes in the presence of water, ions, and even in membrane-like environments, describing more precisely the temporal evolution of the biological complex, and ranking these complexes with more accurate binding energy calculations. In this chapter, we describe up-to-date MD methods, which play the role of supporting tools in the virtual ligand screening (VS) process. Without a doubt, using docking in combination with MD is an attractive approach in structure-based drug discovery protocols nowadays. It has proved its efficiency through many examples in the literature and is a powerful method to significantly reduce the amount of required wet experimentation (Tarcsay et al, J Chem Inf Model 53:2990-2999, 2013; Barakat et al, PLoS One 7:e51329, 2012; De Vivo et al, J Med Chem 59:4035-4061, 2016; Durrant, McCammon, BMC Biol 9:71-79, 2011; Galeazzi, Curr Comput Aided Drug Des 5:225-240, 2009; Hospital et al, Adv Appl Bioinforma Chem 8:37-47, 2015; Jiang et al, Molecules 20:12769-12786, 2015; Kundu et al, J Mol Graph Model 61:160-174, 2015; Mirza et al, J Mol Graph Model 66:99-107, 2016; Moroy et al, Future Med Chem 7:2317-2331, 2015; Naresh et al, J Mol Graph Model 61:272-280, 2015; Nichols et al, J Chem Inf Model 51:1439-1446, 2011; Nichols et al, Methods Mol Biol 819:93-103, 2012; Okimoto et al, PLoS Comput Biol 5:e1000528, 2009; Rodriguez-Bussey et al, Biopolymers 105:35-42, 2016; Sliwoski et al, Pharmacol Rev 66:334-395, 2014).
Affiliation(s)
- Grégory Menchon
- Inserm U1242, Oncogenesis, Stress and Signaling (OSS), Université de Rennes 1, Rennes, France
- Laurent Maveyraud
- Institut de Pharmacologie et de Biologie Structurale (IPBS), Université de Toulouse, CNRS, Université Toulouse III - Paul Sabatier (UT3), Toulouse, France
- Georges Czaplicki
- Institut de Pharmacologie et de Biologie Structurale (IPBS), Université de Toulouse, CNRS, Université Toulouse III - Paul Sabatier (UT3), Toulouse, France
48
Agarwal R, T RR, Smith JC. Comparative Assessment of Pose Prediction Accuracy in RNA-Ligand Docking. J Chem Inf Model 2023; 63:7444-7452. [PMID: 37972310 DOI: 10.1021/acs.jcim.3c01533] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/19/2023]
Abstract
Structure-based virtual high-throughput screening is used in early-stage drug discovery. Over the years, docking protocols and scoring functions for protein-ligand complexes have evolved to improve the accuracy in the computation of binding strengths and poses. In the past decade, RNA has also emerged as a target class for new small-molecule drugs. However, most ligand docking programs have been validated and tested for proteins and not RNA. Here, we test the docking power (pose prediction accuracy) of three state-of-the-art docking protocols on 173 RNA-small molecule crystal structures. The programs are AutoDock4 (AD4) and AutoDock Vina (Vina), which were designed for protein targets, and rDock, which was designed for both protein and nucleic acid targets. AD4 performed relatively poorly. For RNA targets for which a crystal structure of a bound ligand used to limit the docking search space is available and for which the goal is to identify new molecules for the same pocket, rDock performs slightly better than Vina, with success rates of 48% and 63%, respectively. However, in the more common type of early-stage drug discovery setting, in which no structure of a ligand-target complex is known and for which a larger search space is defined, rDock performed similarly to Vina, with a low success rate of ∼27%. Vina was found to have bias for ligands with certain physicochemical properties, whereas rDock performs similarly for all ligand properties. Thus, for projects where no ligand-protein structure already exists, Vina and rDock are both applicable. However, the relatively poor performance of all methods relative to protein-target docking illustrates a need for further methods refinement.
Affiliation(s)
- Rupesh Agarwal
- UT/ORNL Center for Molecular Biophysics, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831-6309, United States
- Department of Biochemistry and Cellular and Molecular Biology, University of Tennessee, Knoxville, Tennessee 37996-1939, United States
- Rajitha Rajeshwar T
- UT/ORNL Center for Molecular Biophysics, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831-6309, United States
- Department of Biochemistry and Cellular and Molecular Biology, University of Tennessee, Knoxville, Tennessee 37996-1939, United States
- Jeremy C Smith
- UT/ORNL Center for Molecular Biophysics, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831-6309, United States
- Department of Biochemistry and Cellular and Molecular Biology, University of Tennessee, Knoxville, Tennessee 37996-1939, United States
49
Xia S, Chen E, Zhang Y. Integrated Molecular Modeling and Machine Learning for Drug Design. J Chem Theory Comput 2023; 19:7478-7495. [PMID: 37883810 PMCID: PMC10653122 DOI: 10.1021/acs.jctc.3c00814] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 07/26/2023] [Revised: 10/10/2023] [Accepted: 10/11/2023] [Indexed: 10/28/2023]
Abstract
Modern therapeutic development often involves several stages that are interconnected, and multiple iterations are usually required to bring a new drug to the market. Computational approaches have increasingly become an indispensable part of helping reduce the time and cost of the research and development of new drugs. In this Perspective, we summarize our recent efforts on integrating molecular modeling and machine learning to develop computational tools for modulator design, including a pocket-guided rational design approach based on AlphaSpace to target protein-protein interactions, delta machine learning scoring functions for protein-ligand docking as well as virtual screening, and state-of-the-art deep learning models to predict calculated and experimental molecular properties based on molecular mechanics optimized geometries. Meanwhile, we discuss remaining challenges and promising directions for further development and use a retrospective example of FDA approved kinase inhibitor Erlotinib to demonstrate the use of these newly developed computational tools.
Affiliation(s)
- Song Xia
- Department of Chemistry, New York University, New York, New York 10003, United States
- Eric Chen
- Department of Chemistry, New York University, New York, New York 10003, United States
- Yingkai Zhang
- Department of Chemistry, New York University, New York, New York 10003, United States
- Simons Center for Computational Physical Chemistry at New York University, New York, New York 10003, United States
- NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China
50
Libouban PY, Aci-Sèche S, Gómez-Tamayo JC, Tresadern G, Bonnet P. The Impact of Data on Structure-Based Binding Affinity Predictions Using Deep Neural Networks. Int J Mol Sci 2023; 24:16120. [PMID: 38003312 PMCID: PMC10671244 DOI: 10.3390/ijms242216120] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/14/2023] [Revised: 10/30/2023] [Accepted: 11/01/2023] [Indexed: 11/26/2023]
Abstract
Artificial intelligence (AI) has gained significant traction in the field of drug discovery, with deep learning (DL) algorithms playing a crucial role in predicting protein-ligand binding affinities. Despite advancements in neural network architectures, system representation, and training techniques, the performance of DL affinity prediction has reached a plateau, prompting the question of whether it is truly solved or if the current performance is overly optimistic and reliant on biased, easily predictable data. Like other DL-related problems, this issue seems to stem from the training and test sets used when building the models. In this work, we investigate the impact of several parameters related to the input data on the performance of neural network affinity prediction models. Notably, we identify the size of the binding pocket as a critical factor influencing the performance of our statistical models; furthermore, it is more important to train a model with as much data as possible than to restrict the training to only high-quality datasets. Finally, we also confirm the bias in the typically used current test sets. Therefore, several types of evaluation and benchmarking are required to understand models' decision-making processes and accurately compare the performance of models.
Affiliation(s)
- Pierre-Yves Libouban
- Institute of Organic and Analytical Chemistry (ICOA), UMR7311, Université d’Orléans, CNRS, Pôle de Chimie rue de Chartres, 45067 Orléans, CEDEX 2, France; (P.-Y.L.); (S.A.-S.)
- Samia Aci-Sèche
- Institute of Organic and Analytical Chemistry (ICOA), UMR7311, Université d’Orléans, CNRS, Pôle de Chimie rue de Chartres, 45067 Orléans, CEDEX 2, France; (P.-Y.L.); (S.A.-S.)
- Jose Carlos Gómez-Tamayo
- Computational Chemistry, Janssen Research & Development, Janssen Pharmaceutica N. V., B-2340 Beerse, Belgium; (J.C.G.-T.); (G.T.)
- Gary Tresadern
- Computational Chemistry, Janssen Research & Development, Janssen Pharmaceutica N. V., B-2340 Beerse, Belgium; (J.C.G.-T.); (G.T.)
- Pascal Bonnet
- Institute of Organic and Analytical Chemistry (ICOA), UMR7311, Université d’Orléans, CNRS, Pôle de Chimie rue de Chartres, 45067 Orléans, CEDEX 2, France; (P.-Y.L.); (S.A.-S.)