1
Jia X, Teutonico D, Dhakal S, Psarellis YM, Abos A, Zhu H, Mavroudis PD, Pillai N. Application of Machine Learning and Mechanistic Modeling to Predict Intravenous Pharmacokinetic Profiles in Humans. J Med Chem 2025; 68:7737-7750. [PMID: 40146185 PMCID: PMC11998014 DOI: 10.1021/acs.jmedchem.5c00340] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2025] [Revised: 03/14/2025] [Accepted: 03/20/2025] [Indexed: 03/28/2025]
Abstract
Accurate prediction of new compounds' pharmacokinetic (PK) profile in humans is crucial for drug discovery. Traditional methods, including allometric scaling and mechanistic modeling, rely on parameters from in vitro or in vivo testing, which are labor-intensive and involve ethical concerns. This study leverages machine learning (ML) to overcome these limitations by developing data-driven models. We compiled a large data set of small molecules' physicochemical and PK properties from public sources and digitized human plasma concentration-time profiles for approximately 800 compounds from the literature. We introduced a hybrid modeling framework that combines ML with physiologically based pharmacokinetic modeling and a hierarchical ML framework that employs two steps of learning to directly estimate PK profiles. Tested on 106 drugs, these frameworks demonstrated prediction accuracies within a 2-fold and 5-fold error for 40-60% and 80-90% of compounds, respectively, in both AUC and Cmax. The proposed approaches could enhance early molecular screening and design, advancing drug discovery capabilities.
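The 2-fold and 5-fold accuracy figures quoted above are fractions of compounds whose predicted exposure metric lies within a given fold of the observed value. The sketch below shows one way to compute such fractions, assuming the common symmetric definition of fold error, max(predicted/observed, observed/predicted). The abstract does not state the exact formula, and the function names and AUC values are hypothetical illustrations, not data from the study.

```python
def fold_error(predicted, observed):
    """Symmetric fold error (assumed definition); >= 1 for positive inputs."""
    if predicted <= 0 or observed <= 0:
        raise ValueError("fold error is only defined for positive values")
    ratio = predicted / observed
    return max(ratio, 1.0 / ratio)


def fraction_within_fold(predicted, observed, fold):
    """Fraction of compounds whose fold error does not exceed `fold`."""
    errors = [fold_error(p, o) for p, o in zip(predicted, observed)]
    return sum(e <= fold for e in errors) / len(errors)


# Hypothetical AUC values for four compounds (arbitrary units).
observed_auc = [10.0, 20.0, 5.0, 8.0]
predicted_auc = [15.0, 70.0, 4.0, 50.0]

within_2_fold = fraction_within_fold(predicted_auc, observed_auc, 2.0)
within_5_fold = fraction_within_fold(predicted_auc, observed_auc, 5.0)
print(f"within 2-fold: {within_2_fold:.0%}, within 5-fold: {within_5_fold:.0%}")
```

The same helper applies unchanged to Cmax or any other strictly positive PK metric.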
Affiliation(s)
- Xuelian Jia
  - Center for Biomedical Informatics and Genomics, Tulane University, New Orleans, Louisiana 70112, United States
  - Department of Chemistry and Biochemistry, Rowan University, Glassboro, New Jersey 08028, United States
- Donato Teutonico
  - Quantitative Pharmacology - Pharmacometrics, Sanofi, Vitry-sur-Seine 94400, France
- Saroj Dhakal
  - Quantitative Pharmacology - Pharmacometrics, Sanofi, Cambridge, Massachusetts 02141, United States
- Yorgos M. Psarellis
  - Quantitative Pharmacology - Pharmacometrics, Sanofi, Cambridge, Massachusetts 02141, United States
- Alexandra Abos
  - Commercial Data and Analytics, Sanofi, Barcelona 08016, Spain
- Hao Zhu
  - Center for Biomedical Informatics and Genomics, Tulane University, New Orleans, Louisiana 70112, United States
  - Department of Chemistry and Biochemistry, Rowan University, Glassboro, New Jersey 08028, United States
- Panteleimon D. Mavroudis
  - Quantitative Pharmacology - Pharmacometrics, Sanofi, Cambridge, Massachusetts 02141, United States
- Nikhil Pillai
  - Quantitative Pharmacology - Pharmacometrics, Sanofi, Cambridge, Massachusetts 02141, United States
2
Yang C, Wang X, Zhao X, Wu Y, Lin J, Zhao Y, Xu Y, Sun K, Zhang C, Wan Z, Zhao W, Xiao Y, Sun H, Chen D, Dong W, Wang T, Wang W. Effect of Fluorine Atoms and Piperazine Rings on Biotoxicity of Norfloxacin Analogues: Combined Experimental and Theoretical Study. Environment & Health (Washington, D.C.) 2024; 2:886-901. [PMID: 39722844 PMCID: PMC11667292 DOI: 10.1021/envhealth.4c00095] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/22/2024] [Revised: 08/13/2024] [Accepted: 08/14/2024] [Indexed: 12/28/2024]
Abstract
To clarify the effect of the fluorine atom and piperazine ring on norfloxacin (NOR), NOR degradation products (NOR-DPs, P1-P8) were generated via UV combined with hydrogen peroxide (UV/H2O2) technology. NOR degradation did not significantly affect cytotoxicity of NOR against BV2, A549, HepG2, and Vero E6 cells. Compared with that of NOR, mutagenicity and median lethal concentration of P1-P8 in fathead minnow were increased, and bioaccumulation factor and oral median lethal dose of P1-P8 in rats were decreased. Molecular docking was used to evaluate the inhibitory effect of NOR-DPs on DNA gyrase A (gyrA) to determine the molecular-level mechanism and establish the structure-activity relationship. Results indicated that the most common amino acid residues were Ile13, Ser27, Val28, Gly31, Asp36, Arg46, Arg47, Asp157, and Gly340; hydrogen bonds and hydrophobic interactions played key roles in the inhibitory effect. Binding area (BA) decreased from 350.80 Å² (NOR) to 346.21 Å² (P1), and the absolute value of binding energy (|BE|) changed from 2.53 kcal/mol (NOR) to 2.54 kcal/mol (P1), indicating that the fluorine atom mainly affects BA. The piperazine ring clearly influenced BA and |BE|. "Yang ChuanXi Rules" were used to explain effects of molecular weight (MW), BA, |BE|, and sum of η1 + η2 (η1: normalization of BA, η2: normalization of |BE|) and predict biotoxicity of NOR-DPs based on half-maximum inhibitory concentration (IC50), half-minimal inhibitory concentration (MIC50), and half-minimal bactericidal concentration (MBC50) values.
Affiliation(s)
- Chuanxi Yang
  - School of Environmental and Municipal Engineering, Qingdao University of Technology, Qingdao 266520, China
- Xiaoning Wang
  - School of Environmental and Municipal Engineering, Qingdao University of Technology, Qingdao 266520, China
- Xinyan Zhao
  - Business School, Qingdao University of Technology, Qingdao 266520, China
- Yongkun Wu
  - School of Environmental and Municipal Engineering, Qingdao University of Technology, Qingdao 266520, China
- Jingyan Lin
  - School of Environmental and Municipal Engineering, Qingdao University of Technology, Qingdao 266520, China
- Yuhan Zhao
  - School of Environmental and Municipal Engineering, Qingdao University of Technology, Qingdao 266520, China
- Yiyong Xu
  - School of Environmental and Municipal Engineering, Qingdao University of Technology, Qingdao 266520, China
- Kaipeng Sun
  - School of Environmental and Municipal Engineering, Qingdao University of Technology, Qingdao 266520, China
- Chao Zhang
  - School of Environmental and Municipal Engineering, Qingdao University of Technology, Qingdao 266520, China
- Ziheng Wan
  - School of Environmental and Municipal Engineering, Qingdao University of Technology, Qingdao 266520, China
- Weihua Zhao
  - School of Environmental and Municipal Engineering, Qingdao University of Technology, Qingdao 266520, China
- Yihua Xiao
  - School of Environmental and Municipal Engineering, Qingdao University of Technology, Qingdao 266520, China
- Haofen Sun
  - School of Environmental and Municipal Engineering, Qingdao University of Technology, Qingdao 266520, China
- Dong Chen
  - School of Environmental and Municipal Engineering, Qingdao University of Technology, Qingdao 266520, China
- Wenping Dong
  - School of Environmental and Municipal Engineering, Qingdao University of Technology, Qingdao 266520, China
- Tieyu Wang
  - Guangdong Provincial Key Laboratory of Marine Disaster Prediction and Prevention, Shantou University, Shantou 515063, China
- Weiliang Wang
  - School of Environmental and Municipal Engineering, Qingdao University of Technology, Qingdao 266520, China
3
Patne AY, Dhulipala SM, Lawless W, Prakash S, Mohapatra SS, Mohapatra S. Drug Discovery in the Age of Artificial Intelligence: Transformative Target-Based Approaches. Int J Mol Sci 2024; 25:12233. [PMID: 39596300 PMCID: PMC11594879 DOI: 10.3390/ijms252212233] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2024] [Revised: 11/01/2024] [Accepted: 11/06/2024] [Indexed: 11/28/2024] Open
Abstract
The complexities inherent in drug development are multi-faceted and often hamper accuracy, speed and efficiency, thereby limiting success. This review explores how recent developments in machine learning (ML) are significantly impacting target-based drug discovery, particularly in small-molecule approaches. The Simplified Molecular Input Line Entry System (SMILES), which translates a chemical compound's three-dimensional structure into a string of symbols, is now widely used in drug design, mining, and repurposing. Utilizing ML and natural language processing techniques, SMILES has revolutionized lead identification, high-throughput screening and virtual screening. ML models enhance the accuracy of predicting binding affinity and selectivity, reducing the need for extensive experimental screening. Additionally, deep learning, with its strengths in analyzing spatial and sequential data through convolutional neural networks (CNNs) and recurrent neural networks (RNNs), shows promise for virtual screening, target identification, and de novo drug design. Fragment-based approaches also benefit from ML algorithms and techniques like generative adversarial networks (GANs), which predict fragment properties and binding affinities, aiding in hit selection and design optimization. Structure-based drug design, which relies on high-resolution protein structures, leverages ML models for accurate predictions of binding interactions. While challenges such as interpretability and data quality remain, ML's transformative impact accelerates target-based drug discovery, increasing efficiency and innovation. Its potential to deliver new and improved treatments for various diseases is significant.
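Treating SMILES as text for natural language processing models starts with splitting each string into chemically meaningful tokens. The sketch below illustrates that step with a regular expression of the kind often used in such pipelines. The pattern and the `tokenize_smiles` helper are assumptions made for illustration and are not taken from the review; the pattern covers common organic-subset SMILES only.

```python
import re

# Assumed token pattern: bracket atoms, two-letter halogens, organic-subset atoms,
# aromatic atoms, branches, bond symbols, and ring-closure digits.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+\]|Br|Cl|[BCNOSPFI]|[bcnosp]|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize_smiles(smiles):
    """Split a SMILES string into tokens; reject characters the pattern misses."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognized characters in SMILES: {smiles!r}")
    return tokens


# Aspirin written as SMILES.
aspirin_tokens = tokenize_smiles("CC(=O)Oc1ccccc1C(=O)O")
print(aspirin_tokens)
```

Matching `Br` and `Cl` before the single-letter atoms keeps two-letter halogens as one token, which matters when the tokens are later mapped to vocabulary indices.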
Affiliation(s)
- Akshata Yashwant Patne
  - Center for Research and Education in Nanobioengineering, Department of Internal Medicine, Morsani College of Medicine, University of South Florida, Tampa, FL 33612, USA
  - Taneja College of Pharmacy Graduate Programs, MDC30, 12908 USF Health Drive, Tampa, FL 33612, USA
- Sai Madhav Dhulipala
  - Department of Molecular Medicine, Morsani College of Medicine, University of South Florida, Tampa, FL 33612, USA
- William Lawless
  - Department of Molecular Medicine, Morsani College of Medicine, University of South Florida, Tampa, FL 33612, USA
  - Research Service, James A. Haley Veterans Hospital, Tampa, FL 33612, USA
- Satya Prakash
  - Biomedical Technology and Cell Therapy Research Laboratory, Department of Biomedical Engineering, Faculty of Medicine and Health Sciences, McGill University, 3775 University Street, Montreal, QC H3A 2B4, Canada
- Shyam S. Mohapatra
  - Center for Research and Education in Nanobioengineering, Department of Internal Medicine, Morsani College of Medicine, University of South Florida, Tampa, FL 33612, USA
  - Taneja College of Pharmacy Graduate Programs, MDC30, 12908 USF Health Drive, Tampa, FL 33612, USA
  - Research Service, James A. Haley Veterans Hospital, Tampa, FL 33612, USA
- Subhra Mohapatra
  - Center for Research and Education in Nanobioengineering, Department of Internal Medicine, Morsani College of Medicine, University of South Florida, Tampa, FL 33612, USA
  - Taneja College of Pharmacy Graduate Programs, MDC30, 12908 USF Health Drive, Tampa, FL 33612, USA
  - Department of Molecular Medicine, Morsani College of Medicine, University of South Florida, Tampa, FL 33612, USA
  - Research Service, James A. Haley Veterans Hospital, Tampa, FL 33612, USA
4
Abduljalil JM, Elfiky AA. Machine-Learning Approach to Identify Potential Dengue Virus Protease Inhibitors: A Computational Perspective. J Phys Chem B 2024; 128:11229-11242. [PMID: 39484814 DOI: 10.1021/acs.jpcb.4c05388] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2024]
Abstract
The global prevalence of dengue virus (DENV), a widespread flavivirus, has led to varied epidemiological impacts, economic burdens, and health consequences. The alarming increase in infections is exacerbated by the absence of approved antiviral agents against the DENV. Within flaviviruses, the NS3/NS2B serine protease plays a pivotal role in processing the viral polyprotein into distinct components, making it an attractive target for antiviral drug development. In this study, machine-learning (ML) techniques were employed to build predictive models for the screening of a library containing 32,000 protease inhibitors. Utilizing GNINA for structure-based virtual screening, the top potential candidates underwent a subsequent evaluation of their absorption, distribution, metabolism, excretion, and toxicity properties. Selected compounds were subjected to molecular dynamics simulations and binding free energy calculations via MM/GBSA. The results suggest that comp530 possesses binding potential to DENV protease as a noncovalent inhibitor with multiple positions for chemical substitutions, presenting opportunities for optimizing their selectivity and specificity. However, other compounds predicted via ML models may still provide a promising start for covalent inhibitors.
Affiliation(s)
- Jameel M Abduljalil
  - School of Life and Environmental Sciences, Faculty of Science, The University of Sydney, Sydney, New South Wales 2006, Australia
- Abdo A Elfiky
  - Department of Biophysics, Faculty of Science, Cairo University, Giza 12613, Egypt
5
van den Maagdenberg HW, Šícho M, Araripe DA, Luukkonen S, Schoenmaker L, Jespers M, Béquignon OJM, González MG, van den Broek RL, Bernatavicius A, van Hasselt JGC, van der Graaf PH, van Westen GJP. QSPRpred: a Flexible Open-Source Quantitative Structure-Property Relationship Modelling Tool. J Cheminform 2024; 16:128. [PMID: 39543652 PMCID: PMC11566221 DOI: 10.1186/s13321-024-00908-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2024] [Accepted: 09/17/2024] [Indexed: 11/17/2024] Open
Abstract
Building reliable and robust quantitative structure-property relationship (QSPR) models is a challenging task. First, the experimental data needs to be obtained, analyzed and curated. Second, the number of available methods is continuously growing and evaluating different algorithms and methodologies can be arduous. Finally, the last hurdle that researchers face is to ensure the reproducibility of their models and facilitate their transferability into practice. In this work, we introduce QSPRpred, a toolkit for analysis of bioactivity data sets and QSPR modelling, which attempts to address the aforementioned challenges. QSPRpred's modular Python API enables users to intuitively describe different parts of a modelling workflow using a plethora of pre-implemented components, but also integrates customized implementations in a "plug-and-play" manner. QSPRpred data sets and models are directly serializable, which means they can be readily reproduced and put into operation after training as the models are saved with all required data pre-processing steps to make predictions on new compounds directly from SMILES strings. The general-purpose character of QSPRpred is also demonstrated by inclusion of support for multi-task and proteochemometric modelling. The package is extensively documented and comes with a large collection of tutorials to help new users. In this paper, we describe all of QSPRpred's functionalities and also conduct a small benchmarking case study to illustrate how different components can be leveraged to compare a diverse set of models. QSPRpred is fully open-source and available at https://github.com/CDDLeiden/QSPRpred.
Scientific Contribution: QSPRpred aims to provide a complex, but comprehensive Python API to conduct all tasks encountered in QSPR modelling from data preparation and analysis to model creation and model deployment. In contrast to similar packages, QSPRpred offers a wider and more exhaustive range of capabilities and integrations with many popular packages that also go beyond QSPR modelling. A significant contribution of QSPRpred is also in its automated and highly standardized serialization scheme, which significantly improves reproducibility and transferability of models.
Affiliation(s)
- Helle W van den Maagdenberg
  - Computational Drug Discovery, Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, Leiden, 2333 CC, The Netherlands
- Martin Šícho
  - Computational Drug Discovery, Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, Leiden, 2333 CC, The Netherlands
  - CZ-OPENSCREEN: National Infrastructure for Chemical Biology, Department of Informatics and Chemistry, Faculty of Chemical Technology, University of Chemistry and Technology Prague, Technická 5, Prague, A-4040, Czech Republic
- David Alencar Araripe
  - Computational Drug Discovery, Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, Leiden, 2333 CC, The Netherlands
  - Department of Human Genetics, Leiden University Medical Center, Einthovenweg 20, Leiden, 2333ZC, The Netherlands
- Sohvi Luukkonen
  - Computational Drug Discovery, Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, Leiden, 2333 CC, The Netherlands
  - ELLIS Unit Linz and LIT AI Lab, Institute for Machine Learning, Johannes Kepler University, Altenberger Straße 69, Linz, 610101, Austria
- Linde Schoenmaker
  - Computational Drug Discovery, Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, Leiden, 2333 CC, The Netherlands
- Michiel Jespers
  - Computational Drug Discovery, Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, Leiden, 2333 CC, The Netherlands
- Olivier J M Béquignon
  - Computational Drug Discovery, Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, Leiden, 2333 CC, The Netherlands
  - Department of Neurosurgery, Brain Tumor Center Amsterdam, Amsterdam University Medical Center, Cancer Center Amsterdam, De Boelelaan 1117, Amsterdam, 1081 HV, The Netherlands
- Marina Gorostiola González
  - Computational Drug Discovery, Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, Leiden, 2333 CC, The Netherlands
  - Oncode Institute, Utrecht, The Netherlands
- Remco L van den Broek
  - Computational Drug Discovery, Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, Leiden, 2333 CC, The Netherlands
- Andrius Bernatavicius
  - Computational Drug Discovery, Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, Leiden, 2333 CC, The Netherlands
  - Leiden Institute of Advanced Computer Science, Leiden University, Niels Bohrweg 1, Leiden, 2333 CA, The Netherlands
- J G Coen van Hasselt
  - Computational Drug Discovery, Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, Leiden, 2333 CC, The Netherlands
- Piet H van der Graaf
  - Computational Drug Discovery, Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, Leiden, 2333 CC, The Netherlands
  - Certara UK, University Road, Canterbury Innovation Centre, Unit 43, Canterbury, Kent, CT2 7FG, UK
- Gerard J P van Westen
  - Computational Drug Discovery, Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, Leiden, 2333 CC, The Netherlands
6
Geci R, Gadaleta D, de Lomana MG, Ortega-Vallbona R, Colombo E, Serrano-Candelas E, Paini A, Kuepfer L, Schaller S. Systematic evaluation of high-throughput PBK modelling strategies for the prediction of intravenous and oral pharmacokinetics in humans. Arch Toxicol 2024; 98:2659-2676. [PMID: 38722347 PMCID: PMC11272695 DOI: 10.1007/s00204-024-03764-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/12/2024] [Accepted: 04/23/2024] [Indexed: 07/26/2024]
Abstract
Physiologically based kinetic (PBK) modelling offers a mechanistic basis for predicting the pharmaco-/toxicokinetics of compounds and thereby provides critical information for integrating toxicity and exposure data to replace animal testing with in vitro or in silico methods. However, traditional PBK modelling depends on animal and human data, which limits its usefulness for non-animal methods. To address this limitation, high-throughput PBK modelling aims to rely exclusively on in vitro and in silico data for model generation. Here, we evaluate a variety of in silico tools and different strategies to parameterise PBK models with input values from various sources in a high-throughput manner. We gather 2000+ publicly available human in vivo concentration-time profiles of 200+ compounds (IV and oral administration), as well as in silico, in vitro and in vivo determined compound-specific parameters required for the PBK modelling of these compounds. Then, we systematically evaluate all possible PBK model parametrisation strategies in PK-Sim and quantify their prediction accuracy against the collected in vivo concentration-time profiles. Our results show that even simple, generic high-throughput PBK modelling can provide accurate predictions of the pharmacokinetics of most compounds (87% of Cmax and 84% of AUC within tenfold). Nevertheless, we also observe major differences in prediction accuracies between the different parameterisation strategies, as well as between different compounds. Finally, we outline a strategy for high-throughput PBK modelling that relies exclusively on freely available tools. Our findings contribute to a more robust understanding of the reliability of high-throughput PBK modelling, which is essential to establish the confidence necessary for its utilisation in Next-Generation Risk Assessment.
Affiliation(s)
- René Geci
  - esqLABS GmbH, Saterland, Germany
  - Institute for Systems Medicine with Focus on Organ Interaction, University Hospital RWTH Aachen, Aachen, Germany
- Marina García de Lomana
  - Machine Learning Research, Research and Development, Pharmaceuticals, Bayer AG, Berlin, Germany
- Erika Colombo
  - Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Milan, Italy
- Lars Kuepfer
  - Institute for Systems Medicine with Focus on Organ Interaction, University Hospital RWTH Aachen, Aachen, Germany
7
Ekins S, Lane TR, Urbina F, Puhl AC. In silico ADME/tox comes of age: twenty years later. Xenobiotica 2024; 54:352-358. [PMID: 37539466 PMCID: PMC10850432 DOI: 10.1080/00498254.2023.2245049] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/22/2023] [Revised: 08/01/2023] [Accepted: 08/02/2023] [Indexed: 08/05/2023]
Abstract
In the early 2000s pharmaceutical drug discovery was beginning to use computational approaches for absorption, distribution, metabolism, excretion and toxicity (ADME/Tox, also known as ADMET) prediction. This emphasis on prediction was an effort to reduce the risk of later stage failures from ADME/Tox. Much has been written in the intervening twenty plus years and significant expenditure has occurred in companies developing these in silico capabilities which can be gleaned from publications. It is therefore an appropriate time to briefly reflect on what was proposed then and what the reality is today. Twenty years ago, we tended to optimise bioactivity and perhaps one ADME/Tox property at a time. Previously pharmaceutical companies needed a whole infrastructure for models: in silico and in vitro experts, IT, champions on a project team, educators and management support. Now we are in the age of generative de novo design where bioactivity and many ADME/Tox properties can be optimised and large language model technologies are available. There are also some challenges such as the focus on very large molecules which may be outside of current ADME/Tox models. We provide an opportunity to look forward with the increasing public data for ADME/Tox as well as expanded types of algorithms available.
Affiliation(s)
- Sean Ekins
  - Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC 27606, USA
- Thomas R. Lane
  - Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC 27606, USA
- Fabio Urbina
  - Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC 27606, USA
- Ana C. Puhl
  - Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC 27606, USA
8
Retchin M, Wang Y, Takaba K, Chodera JD. DrugGym: A testbed for the economics of autonomous drug discovery. bioRxiv: the preprint server for biology 2024:2024.05.28.596296. [PMID: 38854082 PMCID: PMC11160604 DOI: 10.1101/2024.05.28.596296] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2024]
Abstract
Drug discovery is stochastic. The effectiveness of candidate compounds in satisfying design objectives is unknown ahead of time, and the tools used for prioritization (predictive models and assays) are inaccurate and noisy. In a typical discovery campaign, thousands of compounds may be synthesized and tested before design objectives are achieved, with many others ideated but deprioritized. These challenges are well-documented, but assessing potential remedies has been difficult. We introduce DrugGym, a framework for modeling the stochastic process of drug discovery. Emulating biochemical assays with realistic surrogate models, we simulate the progression from weak hits to sub-micromolar leads with viable ADME. We use this testbed to examine how different ideation, scoring, and decision-making strategies impact statistical measures of utility, such as the probability of program success within predefined budgets and the expected costs to achieve target candidate profile (TCP) goals. We also assess the influence of affinity model inaccuracy, chemical creativity, batch size, and multi-step reasoning. Our findings suggest that reducing affinity model inaccuracy from 2 to 0.5 pIC50 units improves budget-constrained success rates tenfold. DrugGym represents a realistic testbed for machine learning methods applied to the hit-to-lead phase. Source code is available at www.drug-gym.org.
Affiliation(s)
- Michael Retchin
  - Tri-Institutional PhD Program in Computational Biology and Medicine, Weill Cornell Medical College, Cornell University, New York, NY 10065
- Yuanqing Wang
  - Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY 10065
  - Simons Center for Computational Chemistry and Center for Data Science, New York University, New York, NY 10004
- Kenichiro Takaba
  - Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY 10065
  - Pharmaceutical Research Center, Advanced Drug Discovery, Asahi Kasei Pharma Corporation, Shizuoka 410-2321, Japan
- John D. Chodera
  - Tri-Institutional PhD Program in Computational Biology and Medicine, Weill Cornell Medical College, Cornell University, New York, NY 10065
  - Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY 10065
9
Pillai N, Abos A, Teutonico D, Mavroudis PD. Machine learning framework to predict pharmacokinetic profile of small molecule drugs based on chemical structure. Clin Transl Sci 2024; 17:e13824. [PMID: 38752574 PMCID: PMC11097621 DOI: 10.1111/cts.13824] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/10/2024] [Revised: 04/09/2024] [Accepted: 04/30/2024] [Indexed: 05/19/2024] Open
Abstract
Accurate prediction of a new compound's pharmacokinetic (PK) profile is pivotal for the success of drug discovery programs. An initial assessment of PK in preclinical species and humans is typically performed through allometric scaling and mathematical modeling. These methods use parameters estimated from in vitro or in vivo experiments, which, although helpful for an initial estimation, require extensive animal experiments. Furthermore, mathematical models are limited by the mechanistic underpinning of the drugs' absorption, distribution, metabolism, and elimination (ADME), which are largely unknown in the early stages of drug discovery. In this work, we propose a novel methodology in which the concentration versus time profile of small molecules in rats is directly predicted by machine learning (ML) using structure-driven molecular properties as input, thus mitigating the need for animal experimentation. The proposed framework initially predicts ADME properties based on molecular structure and then uses them as input to an ML model to predict the PK profile. For the compounds tested, our results demonstrate that PK profiles can be adequately predicted using the proposed algorithm. In particular, for compounds with a Tanimoto score greater than 0.5, the average mean absolute percentage error between the predicted and observed PK profiles was less than 150%. The suggested framework aims to facilitate PK predictions and thus support molecular screening and design earlier in the drug discovery process.
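The two quantities named in this abstract, the Tanimoto score and the mean absolute percentage error (MAPE), have standard textbook definitions that can be sketched in a few lines. The example below assumes fingerprints are represented as sets of "on" bit indices and that the Tanimoto score compares a query compound with a reference compound; the abstract does not specify the fingerprint type or the reference set. All names and numbers are hypothetical.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bits."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)


def mape(predicted, observed):
    """Mean absolute percentage error; observed values must be nonzero."""
    errors = [abs(p - o) / abs(o) for p, o in zip(predicted, observed)]
    return 100.0 * sum(errors) / len(errors)


# Hypothetical fingerprints: 3 shared bits out of 6 distinct bits.
query_bits = {1, 2, 3, 4}
reference_bits = {2, 3, 4, 5, 6}
similarity = tanimoto(query_bits, reference_bits)

# Hypothetical concentrations at three time points (arbitrary units).
observed_conc = [10.0, 5.0, 2.0]
predicted_conc = [12.0, 4.0, 3.0]
profile_mape = mape(predicted_conc, observed_conc)

print(f"Tanimoto: {similarity:.2f}, MAPE: {profile_mape:.1f}%")
```

Because MAPE divides by the observed value, time points with concentrations near zero dominate the average, which is one reason a threshold as high as 150% can still correspond to a usable profile shape.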
Affiliation(s)
- Nikhil Pillai
  - Global DMPK Modeling & Simulation, Sanofi, Cambridge, Massachusetts, USA
- Donato Teutonico
  - Translational Medicine & Early Development, Sanofi, Vitry-sur-Seine, France
10
Li G, Sun Y, Zhu L. Application of machine learning combined with population pharmacokinetics to improve individual prediction of vancomycin clearance in simulated adult patients. Front Pharmacol 2024; 15:1352113. [PMID: 38562463 PMCID: PMC10982467 DOI: 10.3389/fphar.2024.1352113] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2023] [Accepted: 03/07/2024] [Indexed: 04/04/2024] Open
Abstract
Background and aim: Vancomycin is a glycopeptide antimicrobial drug. Population pharmacokinetics (PPK) has difficulty accurately reflecting inter-individual differences, and a PPK model may not be accurate enough to predict individual pharmacokinetic parameters. The aim of this study is therefore to investigate whether machine learning combined with the PPK method can improve the prediction of vancomycin clearance (CL) in adult Chinese patients. Methods: In the first step, a vancomycin CL prediction model for Chinese adult patients is given by PPK, and Hamiltonian Monte Carlo sampling is used to obtain the reference CL of 1,000 patients. In the second step, the final prediction model is obtained by machine learning, using an appropriate model for the predictive factors and the reference CL. In the third step, a total of 250 patients are randomly selected from the simulated data to evaluate prediction performance. Results: The XGBoost model is selected as the final machine learning model. The predicted vancomycin CL values of more than four-fifths of the subjects are improved by machine learning combined with PPK. Machine learning combined with PPK is more stable in performance than the PPK method alone. Conclusion: This is the first combination of PPK and machine learning for predictive modeling of vancomycin clearance in adult patients. It provides a reference for clinical pharmacists or clinicians to optimize the initial dosage and so ensure the effectiveness and safety of drug therapy for each patient.
Affiliation(s)
- Guodong Li
- Department of Mathematics, Guilin University of Electronic Technology, Guilin, China
| | - Yubo Sun
- Department of Mathematics, Guilin University of Electronic Technology, Guilin, China
| | - Liping Zhu
- Department of Mathematics, Changji University, Xinjiang, China
| |
|
11
|
Geng C, Wang Z, Tang Y. Machine learning in Alzheimer's disease drug discovery and target identification. Ageing Res Rev 2024; 93:102172. [PMID: 38104638 DOI: 10.1016/j.arr.2023.102172] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2023] [Revised: 11/28/2023] [Accepted: 12/13/2023] [Indexed: 12/19/2023]
Abstract
Alzheimer's disease (AD) stands as a formidable neurodegenerative ailment that poses a substantial threat to the elderly population, with no known curative or disease-slowing drugs in existence. Among the vital and time-consuming stages in the drug discovery process, disease modeling and target identification hold particular significance. Disease modeling allows for a deeper comprehension of disease progression mechanisms and potential therapeutic avenues. On the other hand, target identification serves as the foundational step in drug development, exerting a profound influence on all subsequent phases and ultimately determining the success rate of drug development endeavors. Machine learning (ML) techniques have ushered in transformative breakthroughs in the realm of target discovery. Leveraging the strengths of large dataset analysis, multifaceted data processing, and the exploration of intricate biological mechanisms, ML has become instrumental in the quest for effective AD treatments. In this comprehensive review, we offer an account of how ML methodologies are being deployed in the pursuit of drug discovery for AD. Furthermore, we provide an overview of the utilization of ML in uncovering potential intervention strategies and prospective therapeutic targets for AD. Finally, we discuss the principal challenges and limitations currently faced by these approaches. We also explore the avenues for future research that hold promise in addressing these challenges.
Affiliation(s)
- Chaofan Geng
- Department of Neurology & Innovation Center for Neurological Disorders, Xuanwu Hospital, Capital Medical University, National Center for Neurological Disorders, Beijing, China
| | - ZhiBin Wang
- Department of Neurology & Innovation Center for Neurological Disorders, Xuanwu Hospital, Capital Medical University, National Center for Neurological Disorders, Beijing, China
| | - Yi Tang
- Department of Neurology & Innovation Center for Neurological Disorders, Xuanwu Hospital, Capital Medical University, National Center for Neurological Disorders, Beijing, China; Neurodegenerative Laboratory of Ministry of Education of the People's Republic of China, Beijing, China.
| |
|
12
|
Keefer CE, Chang G, Di L, Woody NA, Tess DA, Osgood SM, Kapinos B, Racich J, Carlo AA, Balesano A, Ferguson N, Orozco C, Zueva L, Luo L. The Comparison of Machine Learning and Mechanistic In Vitro-In Vivo Extrapolation Models for the Prediction of Human Intrinsic Clearance. Mol Pharm 2023; 20:5616-5630. [PMID: 37812508 DOI: 10.1021/acs.molpharmaceut.3c00502] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 10/11/2023]
Abstract
Accurate prediction of human pharmacokinetics (PK) remains one of the key objectives of drug metabolism and PK (DMPK) scientists in drug discovery projects. This is typically performed by using in vitro-in vivo extrapolation (IVIVE) based on mechanistic PK models. In recent years, machine learning (ML), with its ability to harness patterns from previous outcomes to predict future events, has gained increased popularity in application to absorption, distribution, metabolism, and excretion (ADME) sciences. This study compares the performance of various ML and mechanistic models for the prediction of human IV clearance for a large (645) set of diverse compounds with literature human IV PK data, as well as measured relevant in vitro end points. ML models were built using multiple approaches for the descriptors: (1) calculated physical properties and structural descriptors based on chemical structure alone (classical QSAR/QSPR); (2) in vitro measured inputs only with no structure-based descriptors (ML IVIVE); and (3) in silico ML IVIVE using in silico model predictions for the in vitro inputs. For the mechanistic models, well-stirred and parallel-tube liver models were considered with and without the use of empirical scaling factors and with and without renal clearance. The best ML model for the prediction of in vivo human intrinsic clearance (CLint) was an in vitro ML IVIVE model using only six in vitro inputs with an average absolute fold error (AAFE) of 2.5. The best mechanistic model used the parallel-tube liver model, with empirical scaling factors resulting in an AAFE of 2.8. The corresponding mechanistic model with full in silico inputs achieved an AAFE of 3.3. These relative performances of the models were confirmed with the prediction of 16 Pfizer drug candidates that were not part of the original data set. Results show that ML IVIVE models are comparable to or superior to their best mechanistic counterparts. 
We also show that ML IVIVE models can be used to derive insights into factors for the improvement of mechanistic PK prediction.
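The abstract above compares models by average absolute fold error (AAFE) without stating the formula. A minimal sketch follows, assuming the commonly used definition AAFE = 10^(mean |log10(predicted/observed)|); whether the authors used exactly this form is an assumption, and the function name `aafe` is hypothetical.

```python
import math


def aafe(predicted, observed):
    """Average absolute fold error, assuming the common definition
    10 ** mean(|log10(predicted / observed)|). All values must be positive."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    # Absolute log-ratio treats 2-fold over- and under-prediction alike.
    abs_logs = [abs(math.log10(p / o)) for p, o in zip(predicted, observed)]
    return 10 ** (sum(abs_logs) / len(abs_logs))


if __name__ == "__main__":
    # One 2-fold over-prediction and one 2-fold under-prediction.
    print(aafe([20.0, 5.0], [10.0, 10.0]))
```

Under this definition a perfect model scores 1, and a score of 2.5 means predictions are off by a factor of 2.5 on (geometric) average, in either direction.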
Affiliation(s)
- Christopher E Keefer
- Translational Modeling and Simulation, Pfizer Worldwide Research and Development, Groton, Connecticut 06340, United States
| | - George Chang
- Translational Modeling and Simulation, Pfizer Worldwide Research and Development, Groton, Connecticut 06340, United States
| | - Li Di
- Pharmacokinetics, Dynamics and Metabolism, Pfizer Worldwide Research and Development, Groton, Connecticut 06340, United States
| | - Nathaniel A Woody
- Translational Modeling and Simulation, Pfizer Worldwide Research and Development, Groton, Connecticut 06340, United States
| | - David A Tess
- Translational Modeling and Simulation, Pfizer Worldwide Research and Development, Cambridge, Massachusetts 02139, United States
| | - Sarah M Osgood
- Pharmacokinetics, Dynamics and Metabolism, Pfizer Worldwide Research and Development, Groton, Connecticut 06340, United States
| | - Brendon Kapinos
- Discovery Sciences, Pfizer Worldwide Research and Development, Groton, Connecticut 06340, United States
| | - Jill Racich
- Discovery Sciences, Pfizer Worldwide Research and Development, Groton, Connecticut 06340, United States
| | - Anthony A Carlo
- Discovery Sciences, Pfizer Worldwide Research and Development, Groton, Connecticut 06340, United States
| | - Amanda Balesano
- Pharmacokinetics, Dynamics and Metabolism, Pfizer Worldwide Research and Development, Groton, Connecticut 06340, United States
| | - Nicholas Ferguson
- Pharmacokinetics, Dynamics and Metabolism, Pfizer Worldwide Research and Development, Groton, Connecticut 06340, United States
| | - Christine Orozco
- Pharmacokinetics, Dynamics and Metabolism, Pfizer Worldwide Research and Development, Groton, Connecticut 06340, United States
| | - Larisa Zueva
- Pharmacokinetics, Dynamics and Metabolism, Pfizer Worldwide Research and Development, Groton, Connecticut 06340, United States
| | - Lina Luo
- Pharmacokinetics, Dynamics and Metabolism, Pfizer Worldwide Research and Development, Groton, Connecticut 06340, United States
| |
|
13
|
Allam T, Balderston DE, Chahal MK, Hilton KLF, Hind CK, Keers OB, Lilley RJ, Manwani C, Overton A, Popoola PIA, Thompson LR, White LJ, Hiscock JR. Tools to enable the study and translation of supramolecular amphiphiles. Chem Soc Rev 2023; 52:6892-6917. [PMID: 37753825 DOI: 10.1039/d3cs00480e] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 09/28/2023]
Abstract
This tutorial review focuses on providing a summary of the key techniques used for the characterisation of supramolecular amphiphiles and their self-assembled aggregates; from the understanding of low-level molecular interactions, to materials analysis, use of data to support computer-aided molecular design and finally, the translation of this class of compounds for real world application, specifically within the clinical setting. We highlight the common methodologies used for the study of traditional amphiphiles and build to provide specific examples that enable the study of specialist supramolecular systems. This includes the use of nuclear magnetic resonance spectroscopy, mass spectrometry, X-ray scattering techniques (small- and wide-angle X-ray scattering and single crystal X-ray diffraction), critical aggregation (or micelle) concentration determination methodologies, machine learning, and various microscopy techniques. Furthermore, this review provides guidance for working with supramolecular amphiphiles in in vitro and in vivo settings, as well as the use of accessible software programs, to facilitate screening and selection of druggable molecules. Each section provides: a methodology overview - information that may be derived from the use of the methodology described; a case study - examples for the application of these methodologies; and a summary section - providing methodology specific benefits, limitations and future applications.
Affiliation(s)
- Thomas Allam
- School of Chemistry, University of Southampton, University Road, Southampton, SO17 1BJ, UK
| | - Dominick E Balderston
- School of Chemistry and Forensic Science, University of Kent, Canterbury, CT2 7NH, UK.
| | - Mandeep K Chahal
- School of Chemistry and Forensic Science, University of Kent, Canterbury, CT2 7NH, UK.
| | - Kira L F Hilton
- School of Chemistry and Forensic Science, University of Kent, Canterbury, CT2 7NH, UK.
| | - Charlotte K Hind
- Research and Evaluation, UKHSA, Porton Down, Salisbury SP4 0JG, UK
| | - Olivia B Keers
- School of Chemistry and Forensic Science, University of Kent, Canterbury, CT2 7NH, UK.
| | - Rebecca J Lilley
- School of Chemistry and Forensic Science, University of Kent, Canterbury, CT2 7NH, UK.
| | - Chandni Manwani
- School of Chemistry and Forensic Science, University of Kent, Canterbury, CT2 7NH, UK.
| | - Alix Overton
- School of Chemistry and Forensic Science, University of Kent, Canterbury, CT2 7NH, UK.
| | - Precious I A Popoola
- School of Chemistry and Forensic Science, University of Kent, Canterbury, CT2 7NH, UK.
| | - Lisa R Thompson
- School of Chemistry and Forensic Science, University of Kent, Canterbury, CT2 7NH, UK.
| | - Lisa J White
- School of Chemistry and Forensic Science, University of Kent, Canterbury, CT2 7NH, UK.
| | - Jennifer R Hiscock
- School of Chemistry and Forensic Science, University of Kent, Canterbury, CT2 7NH, UK.
| |
|
14
|
Wu K, Karapetyan E, Schloss J, Vadgama J, Wu Y. Advancements in small molecule drug design: A structural perspective. Drug Discov Today 2023; 28:103730. [PMID: 37536390 PMCID: PMC10543554 DOI: 10.1016/j.drudis.2023.103730] [Citation(s) in RCA: 24] [Impact Index Per Article: 12.0] [Received: 04/23/2023] [Revised: 07/19/2023] [Accepted: 07/27/2023] [Indexed: 08/05/2023]
Abstract
In this review, we outline recent advancements in small molecule drug design from a structural perspective. We compare protein structure prediction methods and explore the role of the ligand binding pocket in structure-based drug design. We examine various structural features used to optimize drug candidates, including functional groups, stereochemistry, and molecular weight. Computational tools such as molecular docking and virtual screening are discussed for predicting and optimizing drug candidate structures. We present examples of drug candidates designed based on their molecular structure and discuss future directions in the field. By effectively integrating structural information with other valuable data sources, we can improve the drug discovery process, leading to the identification of novel therapeutics with improved efficacy, specificity, and safety profiles.
Affiliation(s)
- Ke Wu
- Division of Cancer Research and Training, Department of Internal Medicine, Charles R. Drew University of Medicine and Science, David Geffen UCLA School of Medicine and UCLA Jonsson Comprehensive Cancer Center, Los Angeles, CA 90095, USA
| | - Eduard Karapetyan
- Division of Cancer Research and Training, Department of Internal Medicine, Charles R. Drew University of Medicine and Science, David Geffen UCLA School of Medicine and UCLA Jonsson Comprehensive Cancer Center, Los Angeles, CA 90095, USA
| | - John Schloss
- Division of Cancer Research and Training, Department of Internal Medicine, Charles R. Drew University of Medicine and Science, David Geffen UCLA School of Medicine and UCLA Jonsson Comprehensive Cancer Center, Los Angeles, CA 90095, USA; School of Pharmacy, American University of Health Sciences, Signal Hill, CA 90755, USA
| | - Jaydutt Vadgama
- Division of Cancer Research and Training, Department of Internal Medicine, Charles R. Drew University of Medicine and Science, David Geffen UCLA School of Medicine and UCLA Jonsson Comprehensive Cancer Center, Los Angeles, CA 90095, USA; School of Pharmacy, American University of Health Sciences, Signal Hill, CA 90755, USA.
| | - Yong Wu
- Division of Cancer Research and Training, Department of Internal Medicine, Charles R. Drew University of Medicine and Science, David Geffen UCLA School of Medicine and UCLA Jonsson Comprehensive Cancer Center, Los Angeles, CA 90095, USA.
| |
|
15
|
Wang L, Zhou Y, Chen Q. AMMVF-DTI: A Novel Model Predicting Drug-Target Interactions Based on Attention Mechanism and Multi-View Fusion. Int J Mol Sci 2023; 24:14142. [PMID: 37762445 PMCID: PMC10531525 DOI: 10.3390/ijms241814142] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 08/09/2023] [Revised: 09/09/2023] [Accepted: 09/12/2023] [Indexed: 09/29/2023]
Abstract
Accurate identification of potential drug-target interactions (DTIs) is a crucial task in drug development and repositioning. Despite the remarkable progress achieved in recent years, improving the performance of DTI prediction still presents significant challenges. In this study, we propose a novel end-to-end deep learning model called AMMVF-DTI (attention mechanism and multi-view fusion), which leverages a multi-head self-attention mechanism to explore varying degrees of interaction between drugs and target proteins. More importantly, AMMVF-DTI extracts interactive features between drugs and proteins from both node-level and graph-level embeddings, enabling a more effective modeling of DTIs. This advantage is generally lacking in existing DTI prediction models. Consequently, when compared to many of the state-of-the-art methods, AMMVF-DTI demonstrated excellent performance on the human, C. elegans, and DrugBank baseline datasets, which can be attributed to its ability to incorporate interactive information and mine features from both local and global structures. The results from additional ablation experiments also confirmed the importance of each module in our AMMVF-DTI model. Finally, a case study is presented utilizing our model for COVID-19-related DTI prediction. We believe the AMMVF-DTI model can not only achieve reasonable accuracy in DTI prediction, but also provide insights into the understanding of potential interactions between drugs and targets.
|
16
|
Venkatraman V. FP-MAP: an extensive library of fingerprint-based molecular activity prediction tools. Front Chem 2023; 11:1239467. [PMID: 37649967 PMCID: PMC10462816 DOI: 10.3389/fchem.2023.1239467] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2023] [Accepted: 07/31/2023] [Indexed: 09/01/2023]
Abstract
Discovering new drugs for disease treatment is challenging, requiring a multidisciplinary effort as well as time and resources. With a view to improving hit discovery and lead compound identification, machine learning (ML) approaches are being increasingly used in the decision-making process. Although a number of ML-based studies have been published, most studies only report fragments of the wider range of bioactivities wherein each model typically focuses on a particular disease. This study introduces FP-MAP, an extensive atlas of fingerprint-based prediction models that covers a diverse range of activities including neglected tropical diseases (caused by viral, bacterial and parasitic pathogens) as well as other targets implicated in diseases such as Alzheimer's. To arrive at the best predictive models, the performance of ≈4,000 classification/regression models was evaluated on different bioactivity data sets using 12 different molecular fingerprints. The best performing models, which achieved test set AUC values of 0.62-0.99, have been integrated into an easy-to-use graphical user interface that can be downloaded from https://gitlab.com/vishsoft/fpmap.
Affiliation(s)
- Vishwesh Venkatraman
- Department of Chemistry, Norwegian University of Science and Technology, Trondheim, Norway
| |
|
17
|
Jiang J, Zhang R, Yuan Y, Li T, Li G, Zhao Z, Yu Z. NoiseMol: A noise-robusted data augmentation via perturbing noise for molecular property prediction. J Mol Graph Model 2023; 121:108454. [PMID: 36963306 DOI: 10.1016/j.jmgm.2023.108454] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/10/2023] [Revised: 03/05/2023] [Accepted: 03/13/2023] [Indexed: 03/17/2023]
Abstract
Simplified Molecular-Input Line-Entry System (SMILES) is one of the widely used molecular representation methods for molecular property prediction. We conjecture that all the characters in the SMILES string of a molecule are essential for making up the molecule, but most of them contribute little to determining a particular property of the molecule. We verified this conjecture in a preliminary experiment. Motivated by the result, we propose to inject proper noisy information into the SMILES to augment the training data by increasing the diversity of the labeled molecules. To this end, we explore injecting perturbing noise into the original labeled SMILES strings to construct augmented data, alleviating the scarcity of labeled compound data and helping the model extract more useful molecular representations for molecular property prediction. Specifically, we directly adopt mask, swap, deletion, and fusion operations on SMILES strings to randomly mask, swap, and delete atoms in SMILES strings. The augmented data is then used in one of two strategies: each epoch alternately feeds the original and perturbed noisy molecules, or each batch alternately feeds the original and perturbed noisy molecules. We conduct experiments on both Transformer and BiGRU models to validate the effectiveness of the method, using widely used datasets from MoleculeNet and ZINC. Experimental results demonstrate that the proposed method outperforms strong baselines on all the datasets. NoiseMol obtains the best performance on BBBP and FDA when compared with state-of-the-art methods. In addition, NoiseMol achieves the best accuracy on LogP. Therefore, injecting perturbing noise into the labeled SMILES strings is an effective and efficient method, which improves the prediction performance, generalization, and robustness of deep learning models.
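The mask, swap, and deletion operations named in the abstract can be illustrated with a small sketch. This is not the NoiseMol implementation: the tokenizer regex, the use of the SMILES wildcard `*` as the mask symbol, and the names `tokenize`, `is_atom`, and `perturb` are all assumptions made for illustration, and the "fusion" operation is not sketched because the abstract does not describe it.

```python
import random
import re

# Hypothetical tokenizer: bracket atoms, two-letter halogens,
# single-letter atoms, then any other single symbol (bonds, rings, branches).
TOKEN_RE = re.compile(r"\[[^\]]+\]|Br|Cl|[A-Za-z]|[^A-Za-z]")


def tokenize(smiles):
    """Split a SMILES string into atom and non-atom tokens."""
    return TOKEN_RE.findall(smiles)


def is_atom(token):
    """Treat bracket expressions and alphabetic tokens as atoms."""
    return token[0] == "[" or token[0].isalpha()


def perturb(smiles, op, rng):
    """Apply one noise operation ('mask', 'swap', or 'delete') to a random atom.

    The result is not guaranteed to be a chemically valid SMILES string;
    this sketch treats it purely as a noisy training input.
    """
    tokens = tokenize(smiles)
    atoms = [i for i, t in enumerate(tokens) if is_atom(t)]
    if len(atoms) < 2:
        return smiles  # too small to perturb meaningfully
    if op == "mask":
        tokens[rng.choice(atoms)] = "*"  # assumed mask symbol
    elif op == "delete":
        del tokens[rng.choice(atoms)]
    elif op == "swap":
        i, j = rng.sample(atoms, 2)
        tokens[i], tokens[j] = tokens[j], tokens[i]
    else:
        raise ValueError(f"unknown operation: {op}")
    return "".join(tokens)


if __name__ == "__main__":
    rng = random.Random(0)  # seeded for repeatable noise
    for op in ("mask", "swap", "delete"):
        print(op, perturb("CC(=O)Oc1ccccc1C(=O)O", op, rng))
```

Each perturbed string would keep the label of its source molecule, so the augmented set grows without new measurements.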
Affiliation(s)
- Jing Jiang
- School of Information Science and Engineering, Lanzhou University, Lanzhou, Gansu, China; Key Laboratory of China's Ethnic Languages and Information Technology of Ministry of Education, Northwest Minzu University, Lanzhou, Gansu, China.
| | - Ruisheng Zhang
- School of Information Science and Engineering, Lanzhou University, Lanzhou, Gansu, China.
| | - Yongna Yuan
- School of Information Science and Engineering, Lanzhou University, Lanzhou, Gansu, China.
| | - Tongfeng Li
- School of Information Science and Engineering, Lanzhou University, Lanzhou, Gansu, China; Computer College, Qinghai Normal University, Xining, Qinghai, China.
| | - Gaili Li
- School of Information Science and Engineering, Lanzhou University, Lanzhou, Gansu, China.
| | - Zhili Zhao
- School of Information Science and Engineering, Lanzhou University, Lanzhou, Gansu, China.
| | - Zhixuan Yu
- School of Information Science and Engineering, Lanzhou University, Lanzhou, Gansu, China.
| |
|
18
|
Wu S, Pan Z, Li X, Wang Y, Tang J, Li H, Lu G, Li J, Feng Z, He Y, Liu X. Machine Learning Assisted Photothermal Conversion Efficiency Prediction of Anticancer Photothermal Agents. Chem Eng Sci 2023. [DOI: 10.1016/j.ces.2023.118619] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/11/2023]
|
19
|
Sharma A, Lim J, Lah MS. Strategies for designing metal–organic frameworks with superprotonic conductivity. Coord Chem Rev 2023. [DOI: 10.1016/j.ccr.2022.214995] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/29/2022]
|
20
|
Abstract
Chemometrics and machine learning are artificial intelligence-based methods stirring a transformative change in chemistry. Organic synthesis, drug discovery and analytical techniques are incorporating machine learning techniques at an accelerated pace. However, machine-assisted chemistry faces challenges in solving critical problems in chemistry because of complex relationships in data sets. Even with increasing publishing volumes on machine learning, its application in areas of chemistry is not a straightforward endeavour. A particular concern in applying machine learning in chemistry is data availability and reproducibility. The present review article discusses the various chemometric methods, expert systems, and machine learning techniques developed for solving problems of organic synthesis and drug discovery with selected examples. Further, a concise discussion of chemometrics and ML deployed in analytical techniques such as spectroscopy, microscopy and chromatography is presented. Finally, the review reflects on the challenges, opportunities and future perspectives of machine learning and automation in chemistry. The review concludes by pondering some tough questions on applying machine learning and the possibility of navigating the different terrains of chemistry.
Affiliation(s)
- Payal B. Joshi
- Operations and Method Development, Shefali Research Laboratories, Ambernath (East), Thane, Maharashtra 421501 India
| |
|
21
|
Data-Driven Prediction of the Formation of Co-Amorphous Systems. Pharmaceutics 2023; 15:pharmaceutics15020347. [PMID: 36839668 PMCID: PMC9968185 DOI: 10.3390/pharmaceutics15020347] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2022] [Revised: 01/16/2023] [Accepted: 01/17/2023] [Indexed: 01/22/2023]
Abstract
Co-amorphous systems (COAMS) have raised increasing interest in the pharmaceutical industry, since they combine the increased solubility and/or faster dissolution of amorphous forms with the stability of crystalline forms. However, the choice of the co-former is critical for the formation of a COAMS. While some models exist to predict the potential formation of COAMS, they often focus on a limited group of compounds. Here, four classes of combinations of an active pharmaceutical ingredient (API) with (1) another API, (2) an amino acid, (3) an organic acid, or (4) another substance were considered. A model using gradient boosting methods was developed to predict the successful formation of COAMS for all four classes. The model was tested on data not seen during training and predicted 15 out of 19 examples correctly. In addition, the model was used to screen for new COAMS in binary systems of two APIs for inhalation therapy, as diseases such as tuberculosis, asthma, and COPD usually require complex multidrug-therapy. Three of these new API-API combinations were selected for experimental testing and co-processed via milling. The experiments confirmed the predictions of the model in all three cases. This data-driven model will facilitate and expedite the screening phase for new binary COAMS.
|
22
|
Watanabe R, Kawata T, Ueda S, Shinbo T, Higashimori M, Natsume-Kitatani Y, Mizuguchi K. Prediction of the Contribution Ratio of a Target Metabolic Enzyme to Clearance from Chemical Structure Information. Mol Pharm 2023; 20:419-426. [PMID: 36538346 PMCID: PMC9812024 DOI: 10.1021/acs.molpharmaceut.2c00698] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 08/17/2022] [Revised: 11/22/2022] [Accepted: 11/22/2022] [Indexed: 12/24/2022]
Abstract
The contribution ratio of metabolic enzymes such as cytochrome P450 to in vivo clearance (fraction metabolized: fm) is a pharmacokinetic index that is particularly important for the quantitative evaluation of drug-drug interactions. Since obtaining experimental in vivo fm values is challenging, those derived from in vitro experiments have often been used alternatively. This study aimed to explore the possibility of constructing machine learning models for predicting in vivo fm using chemical structure information alone. We collected in vivo fm values and chemical structures of 319 compounds from a public database with careful manual curation and constructed predictive models using several machine learning methods. The results showed that in vivo fm values can be obtained from structural information alone with a performance comparable to that based on in vitro experimental values and that the prediction accuracy for the compounds involved in CYP induction or inhibition is significantly higher than that by using in vitro values. Our new approach to predicting in vivo fm values in the early stages of drug discovery should help improve the efficiency of the drug optimization process.
Affiliation(s)
- Reiko Watanabe
- Artificial Intelligence Center for Health and Biomedical Research, National Institutes of Biomedical Innovation, Health and Nutrition, Osaka 567-0085, Japan
- Institute for Protein Research, Osaka University, Osaka 567-0085, Japan
| | - Toshio Kawata
- Science Enablement Department, Data Science & Innovation Division, Research & Development, AstraZeneca K.K., Osaka 530-0011, Japan
| | - Shinya Ueda
- Science Enablement Department, Data Science & Innovation Division, Research & Development, AstraZeneca K.K., Osaka 530-0011, Japan
| | - Takumi Shinbo
- Science Enablement Department, Data Science & Innovation Division, Research & Development, AstraZeneca K.K., Osaka 530-0011, Japan
| | - Mitsuo Higashimori
- Science Enablement Department, Data Science & Innovation Division, Research & Development, AstraZeneca K.K., Osaka 530-0011, Japan
| | - Yayoi Natsume-Kitatani
- Artificial Intelligence Center for Health and Biomedical Research, National Institutes of Biomedical Innovation, Health and Nutrition, Osaka 567-0085, Japan
- Institute of Advanced Medical Sciences, Tokushima University, Tokushima 567-0085, Japan
| | - Kenji Mizuguchi
- Artificial Intelligence Center for Health and Biomedical Research, National Institutes of Biomedical Innovation, Health and Nutrition, Osaka 567-0085, Japan
- Institute for Protein Research, Osaka University, Osaka 567-0085, Japan
| |
|
23
|
Kong W, Huang W, Peng C, Zhang B, Duan G, Ma W, Huang Z. Multiple machine learning methods aided virtual screening of NaV1.5 inhibitors. J Cell Mol Med 2022; 27:266-276. [PMID: 36573431 PMCID: PMC9843531 DOI: 10.1111/jcmm.17652] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 06/28/2022] [Revised: 10/30/2022] [Accepted: 12/06/2022] [Indexed: 12/28/2022]
Abstract
Nav 1.5 sodium channels contribute to the generation of the rapid upstroke of the myocardial action potential and thereby play a central role in the excitability of myocardial cells. At present, the patch clamp method is the gold standard for ion channel inhibitor screening. However, this method has disadvantages such as high technical difficulty, high cost and low speed. In this study, novel machine learning models to screen chemical blockers were developed to overcome these shortcomings. Data from the ChEMBL database were used to establish the machine learning models. First, six molecular fingerprints together with five machine learning algorithms were used to develop 30 classification models to predict effective inhibitors. A validation set and a test set were used to evaluate the performance of the models. Subsequently, the privileged substructures tightly associated with the inhibition of the Nav 1.5 ion channel were extracted using the bioalerts Python package. In the validation set, the RF-Graph model performed best. Similarly, RF-Graph produced the best result in the test set, in which the prediction accuracy (Q) was 0.9309 and the Matthews correlation coefficient was 0.8627, further indicating that the model had high classification ability. The results for the privileged substructures indicated that sulfa structures and fragments with large steric hindrance tend to block Nav 1.5. In the unsupervised learning task of identifying sulfa drugs, MACCS and Graph fingerprints gave good results. In summary, effective machine learning models have been constructed that help to screen potential inhibitors of the Nav 1.5 ion channel, and key privileged substructures with high affinity were also extracted.
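The two test-set metrics quoted in the abstract, prediction accuracy (Q) and the Matthews correlation coefficient (MCC), both follow from a binary confusion matrix. A minimal sketch is given below, assuming Q is plain accuracy, (TP + TN) / N; the function names are hypothetical and the counts in the usage example are invented for illustration, not taken from the paper.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of correct predictions: (TP + TN) / (TP + TN + FP + FN)."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when a marginal total is zero, a common convention
    for the otherwise undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Illustrative counts only (not the paper's data).
    tp, tn, fp, fn = 45, 45, 5, 5
    print(accuracy(tp, tn, fp, fn), mcc(tp, tn, fp, fn))
```

MCC ranges from -1 to 1 and, unlike accuracy, stays informative when inhibitors and non-inhibitors are unevenly represented, which is one reason the two are often reported together.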
Affiliation(s)
- Weikaixin Kong
- Department of Molecular and Cellular Pharmacology, School of Pharmaceutical Sciences, Peking University Health Science Center, Beijing, China; Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland; Institute Sanqu Technology (Hangzhou) Co., Ltd., Hangzhou, China
| | - Weiran Huang
- Department of Molecular and Cellular Pharmacology, School of Pharmaceutical Sciences, Peking University Health Science Center, Beijing, China
| | - Chao Peng
- Department of Molecular and Cellular Pharmacology, School of Pharmaceutical Sciences, Peking University Health Science Center, Beijing, China
| | - Bowen Zhang
- ComMedX (Computational Medicine Beijing Co., Ltd.), Beijing, China
| | - Guifang Duan
- Department of Molecular and Cellular Pharmacology, School of Pharmaceutical Sciences, Peking University Health Science Center, Beijing, China
| | - Weining Ma
- Department of Neurology, Shengjing Hospital affiliated to China Medical University, Shenyang, China
| | - Zhuo Huang
- Department of Molecular and Cellular Pharmacology, School of Pharmaceutical Sciences, Peking University Health Science Center, Beijing, China; State Key Laboratory of Natural and Biomimetic Drugs, Department of Molecular and Cellular Pharmacology, School of Pharmaceutical Sciences, Peking University Health Science Center, Beijing, China
| |
|
24
|
Combining machine‐learning and molecular‐modeling methods for drug‐target affinity predictions. WIRES COMPUTATIONAL MOLECULAR SCIENCE 2022. [DOI: 10.1002/wcms.1653] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/28/2022]
|