1
Ree N, Wollschläger JM, Göller AH, Jensen JH. Atom-based machine learning for estimating nucleophilicity and electrophilicity with applications to retrosynthesis and chemical stability. Chem Sci 2025; 16:5676-5687. [PMID: 40041802 PMCID: PMC11875096 DOI: 10.1039/d4sc07297a] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2024] [Accepted: 02/23/2025] [Indexed: 03/28/2025]
Abstract
Nucleophilicity and electrophilicity are important properties for evaluating the reactivity and selectivity of chemical reactions. They allow nucleophiles and electrophiles to be ranked on reactivity scales, enabling a better understanding and prediction of reaction outcomes. Building upon our recent work (N. Ree, A. H. Göller and J. H. Jensen, Automated quantum chemistry for estimating nucleophilicity and electrophilicity with applications to retrosynthesis and covalent inhibitors, Digit. Discov., 2024, 3, 347-354), we introduce an atom-based machine learning (ML) approach for predicting methyl cation affinities (MCAs) and methyl anion affinities (MAAs) to estimate nucleophilicity and electrophilicity, respectively. The ML models are trained and validated on QM-derived data from around 50 000 neutral drug-like molecules, achieving Pearson correlation coefficients of 0.97 for MCA and 0.95 for MAA on the held-out test sets. In addition, we demonstrate the ML approach in two different applications: first, as a general tool for filtering retrosynthetic routes based on chemical selectivity predictions, and second, as a tool for assessing the chemical stability of esters and carbamates towards hydrolysis reactions. The code is freely available on GitHub under the MIT open source license and as a web application at https://www.esnuel.org.
Affiliation(s)
- Nicolai Ree
  - Department of Chemistry, University of Copenhagen, Universitetsparken 5, 2100 Copenhagen Ø, Denmark
- Jan M Wollschläger
  - Bayer AG, Pharmaceuticals, R&D, Machine Learning Research, 13353 Berlin, Germany
- Andreas H Göller
  - Bayer AG, Pharmaceuticals, R&D, Computational Molecular Design, 42096 Wuppertal, Germany
- Jan H Jensen
  - Department of Chemistry, University of Copenhagen, Universitetsparken 5, 2100 Copenhagen Ø, Denmark
2
Liu Z, Lin Y, He Q, Dai L, Tan Q, Jin B, Lee PW, Zhang X, Zhang L. Atom-Driven and Knowledge-Based Hydrolysis Metabolite Assessment for Environmental Organic Chemicals. Molecules 2025; 30:234. [PMID: 39860105 PMCID: PMC11767695 DOI: 10.3390/molecules30020234] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2024] [Revised: 01/03/2025] [Accepted: 01/07/2025] [Indexed: 01/27/2025]
Abstract
The metabolism of environmental organic chemicals often relies on the catalytic action of specific enzymes at the nanoscale, which is critical for assessing their environmental impact, safety, and efficacy. Hydrolysis is one of the primary metabolic and degradation reaction pathways. Traditionally, hydrolysis product identification has relied on experimental methods that are both time-consuming and costly. In this study, machine-learning-based atomic-driven models were constructed to predict the hydrolysis reactions for environmental organic chemicals, including four main hydrolysis sites: N-Hydrolysis, O-Hydrolysis, C-Hydrolysis, and Global-Hydrolysis. These machine learning models were further integrated with a knowledge-based expert system to create a global hydrolysis model, which utilizes predicted hydrolysis site probabilities to prioritize potential hydrolysis products. For an external test set of 75 chemicals, the global hydrolysis site prediction model achieved an accuracy of 93%. Additionally, among 99 experimental hydrolysis products, our model successfully predicted 90, with a hit rate of 90%. This model offers significant potential for identifying hydrolysis metabolites in environmental organic chemicals.
Affiliation(s)
- Zhe Liu
  - Innovation Center of Pesticide Research, Department of Applied Chemistry, College of Science, China Agricultural University, Beijing 100193, China
- Yufan Lin
  - Innovation Center of Pesticide Research, Department of Applied Chemistry, College of Science, China Agricultural University, Beijing 100193, China
- Qi He
  - Innovation Center of Pesticide Research, Department of Applied Chemistry, College of Science, China Agricultural University, Beijing 100193, China
- Lingjie Dai
  - Innovation Center of Pesticide Research, Department of Applied Chemistry, College of Science, China Agricultural University, Beijing 100193, China
- Qinyan Tan
  - Innovation Center of Pesticide Research, Department of Applied Chemistry, College of Science, China Agricultural University, Beijing 100193, China
- Binyan Jin
  - Innovation Center of Pesticide Research, Department of Applied Chemistry, College of Science, China Agricultural University, Beijing 100193, China
- Philip W. Lee
  - Innovation Center of Pesticide Research, Department of Applied Chemistry, College of Science, China Agricultural University, Beijing 100193, China
- Xiaoming Zhang
  - Innovation Center of Pesticide Research, Department of Applied Chemistry, College of Science, China Agricultural University, Beijing 100193, China
- Li Zhang
  - Innovation Center of Pesticide Research, Department of Applied Chemistry, College of Science, China Agricultural University, Beijing 100193, China
  - Key Laboratory of National Forestry and Grassland Administration on Pest Chemical Control, China Agricultural University, Beijing 100193, China
3
Vyas SK, Das A, Suryanarayana Murty U, Dixit VA. Sulfotransferase-mediated phase II drug metabolism prediction of substrates and sites using accessibility and reactivity-based algorithms. Mol Inform 2024; 43:e202400008. [PMID: 39110066 DOI: 10.1002/minf.202400008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2024] [Revised: 05/18/2024] [Accepted: 06/24/2024] [Indexed: 10/16/2024]
Abstract
Sulphotransferases (SULTs) are a major phase II metabolic enzyme class contributing ~20% to the Phase II metabolism of FDA-approved drugs. Ignoring the potential for SULT-mediated metabolism leaves a strong potential for drug-drug interactions, often causing late-stage drug discovery failures or black-box warnings on FDA labels. The existing models use only accessibility descriptors and machine learning (ML) methods for class and site of sulfonation (SOS) predictions for SULT. In this study, a variety of accessibility, reactivity, and hybrid models and algorithms have been developed to make accurate substrate and SOS predictions. Unlike in the literature models, a reactivity parameter for the aliphatic or aromatic hydroxyl groups (R/Ar-O-H), the Bond Dissociation Energy (BDE), gave accurate models with a True Positive Rate (TPR)=0.84 for SOS predictions. We offer mechanistic insights to explain these novel findings that are not recognized in the literature. Accessibility parameters such as the ratio of Chemgauss4 Score (CGS) to Molecular Weight (MW), CGS/MW, and the distance from the cofactor (Dis) were essential for class predictions and showed TPR=0.72. Substrates consistently had lower BDE, Dis, and CGS/MW than non-substrates. Hybrid models also performed acceptably for SOS predictions. Using the best models, the algorithms gave an acceptable performance in class prediction: TPR=0.62, False Positive Rate (FPR)=0.24, Balanced accuracy (BA)=0.69, and SOS prediction: TPR=0.98, FPR=0.60, and BA=0.69. A rule-based method was added to improve the predictive performance, which improved the algorithm TPR, FPR, and BA. Validation using an external dataset of drug-like compounds gave class prediction: TPR=0.67, FPR=0.00, and SOS prediction: TPR=0.80 and FPR=0.44 for the best algorithm. Comparisons with standard ML models also show that our algorithm has higher predictive performance for classification on external datasets.
Overall, these models and algorithms (SOS predictor) give accurate substrate class and site (SOS) predictions for SULT-mediated Phase II metabolism and will be valuable to the drug discovery community in academia and industry. The SOS predictor is freely available for academic/non-profit research via the GitHub link.
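The TPR, FPR, and BA figures quoted in this abstract can be cross-checked against each other. Balanced accuracy is conventionally defined as the mean of the true positive rate and the true negative rate (1 − FPR); the abstract does not state which definition the authors used, so this is an assumption. The Python sketch below is not from the paper, and the helper name `balanced_accuracy` is hypothetical; it simply recomputes BA from the reported TPR and FPR values.

```python
def balanced_accuracy(tpr: float, fpr: float) -> float:
    """Mean of sensitivity (TPR) and specificity (TNR = 1 - FPR).

    Assumes the conventional definition of balanced accuracy;
    the abstract itself does not spell out the formula.
    """
    tnr = 1.0 - fpr
    return (tpr + tnr) / 2.0


# (TPR, FPR, BA) triples as quoted in the abstract.
reported = {
    "class prediction": (0.62, 0.24, 0.69),
    "SOS prediction": (0.98, 0.60, 0.69),
}

for task, (tpr, fpr, ba_reported) in reported.items():
    ba = balanced_accuracy(tpr, fpr)
    print(f"{task}: recomputed BA = {ba:.2f}, reported BA = {ba_reported:.2f}")
```

Under this definition both recomputed values round to the reported 0.69, which also shows why the SOS result needs care: a very high TPR (0.98) is offset by a high FPR (0.60), so the two tasks end up with the same balanced accuracy.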
Affiliation(s)
- Shivam Kumar Vyas
  - Department of Medicinal Chemistry, Department of Pharmaceuticals, Ministry of Chemicals & Fertilizers, Govt. of India, Sila Katamur (Halugurisuk), P.O.: Changsari, Dist: Kamrup, Pin, National Institute of Pharmaceutical Education and Research, Guwahati, (NIPER Guwahati), Guwahati, Assam, 781101, India
- Avik Das
  - Department of Pharmacy, Birla Institute of Technology and Sciences Pilani (BITS-Pilani), Vidya Vihar Campus 41, Pilani, Rajasthan, 333031, India
  - Current address: Department of Primary Intelligence, IQVIA, Sarjapur-Marathahalli Outer Ring Road Embassy Tech Square, Bangalore, 560103 Karnataka, India
- Upadhyayula Suryanarayana Murty
  - Department of Medicinal Chemistry, Department of Pharmaceuticals, Ministry of Chemicals & Fertilizers, Govt. of India, Sila Katamur (Halugurisuk), P.O.: Changsari, Dist: Kamrup, Pin, National Institute of Pharmaceutical Education and Research, Guwahati, (NIPER Guwahati), Guwahati, Assam, 781101, India
- Vaibhav A Dixit
  - Department of Medicinal Chemistry, Department of Pharmaceuticals, Ministry of Chemicals & Fertilizers, Govt. of India, Sila Katamur (Halugurisuk), P.O.: Changsari, Dist: Kamrup, Pin, National Institute of Pharmaceutical Education and Research, Guwahati, (NIPER Guwahati), Guwahati, Assam, 781101, India
4
Mao W, Zhou T, Zhang F, Qian M, Xie J, Li Z, Shu Y, Li Y, Xu H. Pan-cancer single-cell landscape of drug-metabolizing enzyme genes. Pharmacogenet Genomics 2024; 34:217-225. [PMID: 38814173 DOI: 10.1097/fpc.0000000000000538] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/31/2024]
Abstract
OBJECTIVE: Varied expression of drug-metabolizing enzyme (DME) genes dictates the intensity and duration of drug response in cancer treatment. This study aimed to investigate the transcriptional profile of DMEs in the tumor microenvironment (TME) at the single-cell level and their impact on individual responses to anticancer therapy. METHODS: Over 1.3 million cells from 481 normal/tumor samples across 9 solid cancer types were integrated to profile changes in the expression of DME genes. A ridge regression model based on the PRISM database was constructed to predict the influence of DME gene expression on drug sensitivity. RESULTS: Distinct expression patterns of DME genes were revealed at single-cell resolution across different cancer types. Several DME genes were highly enriched in epithelial cells (e.g. GPX2, TST and CYP3A5) or different TME components (e.g. CYP4F3 in monocytes). In particular, GPX2 and TST were differentially expressed in epithelial cells from tumor samples compared to those from normal samples. Utilizing the PRISM database, we found that elevated expression of GPX2 and CYP3A5 and reduced expression of TST were linked to enhanced sensitivity to particular chemo-drugs (e.g. gemcitabine, daunorubicin, dasatinib, vincristine, paclitaxel and oxaliplatin). CONCLUSION: Our findings underscore the varied expression pattern of DME genes in cancer cells and TME components, highlighting their potential as biomarkers for selecting appropriate chemotherapy agents.
Affiliation(s)
- Wei Mao
  - Department of Laboratory Medicine/Research Centre of Clinical Laboratory Medicine, State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu, Sichuan
- Tao Zhou
  - Department of Laboratory Medicine/Research Centre of Clinical Laboratory Medicine, State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu, Sichuan
- Feng Zhang
  - Center for Precision Medicine, The Quzhou Affiliated Hospital of Wenzhou Medical University, Quzhou People's Hospital, Quzhou, Zhejiang
- Maoxiang Qian
  - Institute of Pediatrics and Department of Hematology and Oncology, National Children's Medical Center, Children's Hospital of Fudan University, Shanghai
- Jianqiang Xie
  - Department of Medicine and Surgery, Sichuan Second Veterans Hospital
- Zhengyan Li
  - Department of Radiology, West China Hospital, Sichuan University
- Yang Shu
  - Gastric Cancer Center, West China Hospital, Sichuan University
- Yuan Li
  - Institute of Digestive Surgery, State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu, Sichuan, China
- Heng Xu
  - Department of Laboratory Medicine/Research Centre of Clinical Laboratory Medicine, State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu, Sichuan
5
Borup RM, Ree N, Jensen JH. pKalculator: A pKa predictor for C-H bonds. Beilstein J Org Chem 2024; 20:1614-1622. [PMID: 39076289 PMCID: PMC11285060 DOI: 10.3762/bjoc.20.144] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2024] [Accepted: 07/02/2024] [Indexed: 07/31/2024]
Abstract
Determining the pKa values of various C-H sites in organic molecules offers valuable insights for synthetic chemists in predicting reaction sites. As molecular complexity increases, this task becomes more challenging. This paper introduces pKalculator, a quantum chemistry (QM)-based workflow for automatic computations of C-H pKa values, which is used to generate a training dataset for a machine learning (ML) model. The QM workflow is benchmarked against 695 experimentally determined C-H pKa values in DMSO. The ML model is trained on a diverse dataset of 775 molecules with 3910 C-H sites. Our ML model predicts C-H pKa values with a mean absolute error (MAE) and a root mean squared error (RMSE) of 1.24 and 2.15 pKa units, respectively. Furthermore, we employ our model on 1043 pKa-dependent reactions (aldol, Claisen, and Michael) and successfully indicate the reaction sites with a Matthews correlation coefficient (MCC) of 0.82.
Affiliation(s)
- Rasmus M Borup
  - Department of Chemistry, University of Copenhagen, Copenhagen, DK-2100, Denmark
- Nicolai Ree
  - Department of Chemistry, University of Copenhagen, Copenhagen, DK-2100, Denmark
- Jan H Jensen
  - Department of Chemistry, University of Copenhagen, Copenhagen, DK-2100, Denmark
6
King-Smith E, Faber FA, Reilly U, Sinitskiy AV, Yang Q, Liu B, Hyek D, Lee AA. Predictive Minisci late stage functionalization with transfer learning. Nat Commun 2024; 15:426. [PMID: 38225239 PMCID: PMC10789750 DOI: 10.1038/s41467-023-42145-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2023] [Accepted: 10/01/2023] [Indexed: 01/17/2024]
Abstract
Structural diversification of lead molecules is a key component of drug discovery to explore chemical space. Late-stage functionalizations (LSFs) are versatile methodologies capable of installing functional handles on richly decorated intermediates to deliver numerous diverse products in a single reaction. Predicting the regioselectivity of LSF is still an open challenge in the field. Numerous efforts from chemoinformatics and machine learning (ML) groups have made strides in this area. However, it is arduous to isolate and characterize the multitude of LSF products generated, limiting available data and hindering pure ML approaches. We report the development of an approach that combines a message passing neural network and 13C NMR-based transfer learning to predict the atom-wise probabilities of functionalization for Minisci and P450-based functionalizations. We validated our model both retrospectively and with a series of prospective experiments, showing that it accurately predicts the outcomes of Minisci-type and P450 transformations and outperforms the well-established Fukui-based reactivity indices and other machine learning reactivity-based algorithms.
Affiliation(s)
- Emma King-Smith
  - Cavendish Laboratory, University of Cambridge, Cambridge, UK
- Felix A Faber
  - Cavendish Laboratory, University of Cambridge, Cambridge, UK
- Usa Reilly
  - Development & Medical, Pfizer Worldwide Research, Groton, CT, USA
- Anton V Sinitskiy
  - Machine Learning Computational Sciences, Pfizer Worldwide Research, Cambridge, MA, USA
- Qingyi Yang
  - Development & Medical, Pfizer Worldwide Research, Cambridge, MA, USA
- Bo Liu
  - Spectrix Analytic Services, LLC., North Haven, CT, USA
- Dennis Hyek
  - Spectrix Analytic Services, LLC., North Haven, CT, USA
- Alpha A Lee
  - Cavendish Laboratory, University of Cambridge, Cambridge, UK
7
Smith AME, Lanevskij K, Sazonovas A, Harris J. Impact of Established and Emerging Software Tools on the Metabolite Identification Landscape. Front Toxicol 2022; 4:932445. [PMID: 35800176 PMCID: PMC9253584 DOI: 10.3389/ftox.2022.932445] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/29/2022] [Accepted: 05/30/2022] [Indexed: 11/25/2022]
Abstract
Scientists' ability to detect drug-related metabolites at trace concentrations has improved over recent decades. High-resolution instruments enable collection of large amounts of raw experimental data. In fact, the quantity of data produced has become a challenge due to the effort required to convert raw data into useful insights. Various cheminformatics tools have been developed to address these metabolite identification challenges. This article describes the current state of these tools. They can be split into two categories: pre-experimental metabolite generation and post-experimental data analysis. The former can be subdivided into rule-based, machine learning-based, and docking-based approaches. Post-experimental tools help scientists automatically perform chromatographic deconvolution of LC/MS data and identify metabolites. They can use pre-experimental predictions to improve metabolite identification, but they are not limited to these predictions: unexpected metabolites can also be discovered through fractional mass filtering. In addition to a review of available software tools, we present a description of pre-experimental and post-experimental metabolite structure generation using MetaSense. These software tools improve upon manual techniques, increasing scientist productivity and enabling efficient handling of large datasets. However, the trend of increasingly large datasets and highly data-driven workflows requires a more sophisticated informatics transition in metabolite identification labs. Experimental work has traditionally been separated from the information technology tools that handle our data. We argue that these IT tools can help scientists draw connections via data visualizations and preserve and share results via searchable centralized databases. In addition, data marshalling and homogenization techniques enable future data mining and machine learning.
8
Ertl P, Gerebtzoff G, Lewis RA, Muenkler H, Schneider N, Sirockin F, Stiefl N, Tosco P. Chemical reactivity prediction: current methods and different application areas. Mol Inform 2021; 41:e2100277. [PMID: 34964302 DOI: 10.1002/minf.202100277] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 10/14/2021] [Accepted: 12/28/2021] [Indexed: 11/10/2022]
Abstract
The ability to predict the chemical reactivity of a molecule is highly desirable in drug discovery, both ex vivo (synthetic route planning, formulation, stability) and in vivo: metabolic reactions determine pharmacodynamics, pharmacokinetics and potential toxic effects, and early assessment of liabilities is vital to reduce attrition rates in later stages of development. Quantum mechanics offers a precise description of the interactions between electrons and orbitals in the breaking and forming of bonds. Modern algorithms and faster computers have allowed the study of more complex systems in a timely and accurate fashion, and answers to chemical questions around stability and reactivity can now be provided. Through machine learning, predictive models can be built out of descriptors derived from quantum mechanics and cheminformatics, even in the absence of experimental data to train on. In this article, current progress on computational reactivity prediction is reviewed: applications to problems in drug design, such as modelling of metabolism and covalent inhibition, are highlighted and unmet challenges are posed.
Affiliation(s)
- Richard A Lewis
  - Computer-Aided Drug Design, Eli Lilly and Company Limited, Windlesham, Switzerland
- Hagen Muenkler
  - Novartis Institutes for BioMedical Research Inc, Switzerland
- Paolo Tosco
  - Novartis Institutes for BioMedical Research Inc, Switzerland
9
Muller C, Rabal O, Diaz Gonzalez C. Artificial Intelligence, Machine Learning, and Deep Learning in Real-Life Drug Design Cases. Methods Mol Biol 2021; 2390:383-407. [PMID: 34731478 DOI: 10.1007/978-1-0716-1787-8_16] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 02/06/2023]
Abstract
The discovery and development of drugs is a long and expensive process with a high attrition rate. Computational drug discovery contributes to ligand discovery and optimization, by using models that describe the properties of ligands and their interactions with biological targets. In recent years, artificial intelligence (AI) has made remarkable modeling progress, driven by new algorithms and by the increase in computing power and storage capacities, which allow the processing of large amounts of data in a short time. This review provides the current state of the art of AI methods applied to drug discovery, with a focus on structure- and ligand-based virtual screening, library design and high-throughput analysis, drug repurposing and drug sensitivity, de novo design, chemical reactions and synthetic accessibility, ADMET, and quantum mechanics.
Affiliation(s)
- Christophe Muller
  - Evotec (France) SAS, Computational Drug Discovery, Integrated Drug Discovery, Toulouse, France
- Obdulia Rabal
  - Evotec (France) SAS, Computational Drug Discovery, Integrated Drug Discovery, Toulouse, France
10
Machine Learning Applied to the Modeling of Pharmacological and ADMET Endpoints. Methods Mol Biol 2021. [PMID: 34731464 DOI: 10.1007/978-1-0716-1787-8_2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2023]
Abstract
The well-known concept of quantitative structure-activity relationships (QSAR) has been gaining significant interest in recent years. Data, descriptors, and algorithms are the main pillars for building useful models that support more efficient drug discovery processes with in silico methods. Significant advances in all three areas are the reason for the regained interest in these models. In this book chapter we review various machine learning (ML) approaches that make use of measured in vitro/in vivo data of many compounds. We put these in context with other digital drug discovery methods and present some application examples.
11
Matveieva M, Polishchuk P. Benchmarks for interpretation of QSAR models. J Cheminform 2021; 13:41. [PMID: 34039411 PMCID: PMC8157407 DOI: 10.1186/s13321-021-00519-x] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Received: 02/23/2021] [Accepted: 05/15/2021] [Indexed: 01/06/2023]
Abstract
Interpretation of QSAR models is useful to understand the complex nature of biological or physicochemical processes, guide structural optimization or perform knowledge-based validation of QSAR models. Highly predictive models are usually complex and their interpretation is non-trivial. This is particularly true for modern neural networks. Various approaches to interpretation of these models exist. However, it is difficult to evaluate and compare the performance and applicability of these ever-emerging methods. Herein, we developed several benchmark data sets with end-points determined by pre-defined patterns. These data sets are intended for evaluating the ability of interpretation approaches to retrieve these patterns. They represent tasks with different complexity levels: from simple atom-based additive properties to pharmacophore hypotheses. We proposed several quantitative metrics of interpretation performance. Applicability of benchmarks and metrics was demonstrated on a set of conventional models and end-to-end graph convolutional neural networks, interpreted by the previously suggested universal ML-agnostic approach for structural interpretation. We anticipate these benchmarks to be useful in evaluation of new interpretation approaches and investigation of decision making of complex "black box" models.
Affiliation(s)
- Mariia Matveieva
  - Institute of Molecular and Translational Medicine, Faculty of Medicine and Dentistry, Palacky University, University Hospital in Olomouc, Hnevotinska 5, 77900, Olomouc, Czech Republic
- Pavel Polishchuk
  - Institute of Molecular and Translational Medicine, Faculty of Medicine and Dentistry, Palacky University, University Hospital in Olomouc, Hnevotinska 5, 77900, Olomouc, Czech Republic
12
Abiri R, Atabaki N, Sanusi R, Malik S, Abiri R, Safa P, Shukor NAA, Abdul-Hamid H. New Insights into the Biological Properties of Eucalyptus-Derived Essential Oil: A Promising Green Anti-Cancer Drug. Food Rev Int 2021. [DOI: 10.1080/87559129.2021.1877300] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/22/2022]
Affiliation(s)
- Rambod Abiri
  - Department of Forestry Science and Biodiversity, Faculty of Forestry and Environment, Universiti Putra Malaysia, Serdang, Selangor DE 43400 UPM, Malaysia
- Narges Atabaki
  - Department of Biochemistry, Faculty of Biotechnology and Biomolecular Sciences, Universiti Putra Malaysia, Serdang, Selangor DE 43400 UPM, Malaysia
- Ruzana Sanusi
  - Department of Forestry Science and Biodiversity, Faculty of Forestry and Environment, Universiti Putra Malaysia, Serdang, Selangor DE 43400 UPM, Malaysia
  - Laboratory of Bioresource Management, Institute of Tropical Forestry and Forest Products (INTROP), Universiti Putra Malaysia, Serdang DE 43400 UPM, Malaysia
- Sonia Malik
  - Health Science Graduate Program, Biological & Health Sciences Centre, Federal University of Maranhao, Sao Luis, MA, Brazil
- Ramin Abiri
  - Department of Medical Microbiology, School of Medicine, Kermanshah University of Medical Sciences, Kermanshah, Iran
- Parastoo Safa
  - Department of Human Anatomy, Faculty of Medicine and Health Sciences, Universiti Putra Malaysia, Serdang, Selangor DE 43400 UPM, Malaysia
- Nor Aini Ab Shukor
  - Department of Forestry Science and Biodiversity, Faculty of Forestry and Environment, Universiti Putra Malaysia, Serdang, Selangor DE 43400 UPM, Malaysia
  - Laboratory of Bioresource Management, Institute of Tropical Forestry and Forest Products (INTROP), Universiti Putra Malaysia, Serdang DE 43400 UPM, Malaysia
- Hazandy Abdul-Hamid
  - Department of Forestry Science and Biodiversity, Faculty of Forestry and Environment, Universiti Putra Malaysia, Serdang, Selangor DE 43400 UPM, Malaysia
  - Laboratory of Bioresource Management, Institute of Tropical Forestry and Forest Products (INTROP), Universiti Putra Malaysia, Serdang DE 43400 UPM, Malaysia
13
Kim H, Kim E, Lee I, Bae B, Park M, Nam H. Artificial Intelligence in Drug Discovery: A Comprehensive Review of Data-driven and Machine Learning Approaches. Biotechnol Bioprocess Eng 2021; 25:895-930. [PMID: 33437151 PMCID: PMC7790479 DOI: 10.1007/s12257-020-0049-y] [Citation(s) in RCA: 42] [Impact Index Per Article: 10.5] [Received: 02/13/2020] [Revised: 05/27/2020] [Accepted: 06/03/2020] [Indexed: 02/07/2023]
Abstract
As expenditure on drug development increases exponentially, the overall drug discovery process requires a sustainable revolution. Since artificial intelligence (AI) is leading the fourth industrial revolution, AI can be considered a viable solution for unstable drug research and development. Generally, AI is applied to fields with sufficient data such as computer vision and natural language processing, but there are many efforts to revolutionize the existing drug discovery process by applying AI. This review provides a comprehensive, organized summary of the recent research trends in the AI-guided drug discovery process, including target identification, hit identification, ADMET prediction, lead optimization, and drug repositioning. The main data sources in each field are also summarized in this review. In addition, an in-depth analysis of the remaining challenges and limitations is provided, along with proposals for promising future directions in each of the aforementioned areas.
Affiliation(s)
- Hyunho Kim
  - School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Gwangju, 61005 Korea
- Eunyoung Kim
  - School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Gwangju, 61005 Korea
- Ingoo Lee
  - School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Gwangju, 61005 Korea
- Bongsung Bae
  - School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Gwangju, 61005 Korea
- Minsu Park
  - School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Gwangju, 61005 Korea
- Hojung Nam
  - School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Gwangju, 61005 Korea
14
Göller AH, Kuhnke L, Montanari F, Bonin A, Schneckener S, Ter Laak A, Wichard J, Lobell M, Hillisch A. Bayer's in silico ADMET platform: a journey of machine learning over the past two decades. Drug Discov Today 2020; 25:1702-1709. [PMID: 32652309 DOI: 10.1016/j.drudis.2020.07.001] [Citation(s) in RCA: 101] [Impact Index Per Article: 20.2] [Received: 03/30/2020] [Revised: 06/16/2020] [Accepted: 07/02/2020] [Indexed: 12/20/2022]
Abstract
Over the past two decades, an in silico absorption, distribution, metabolism, and excretion (ADMET) platform has been created at Bayer Pharma with the goal of generating models for a variety of pharmacokinetic and physicochemical endpoints in early drug discovery. These tools are accessible to all scientists within the company and can be useful in assisting with the selection and design of novel leads, as well as the process of lead optimization. Here, we discuss the development of machine-learning (ML) approaches with special emphasis on data, descriptors, and algorithms. We show that high company internal data quality and tailored descriptors, as well as a thorough understanding of the experimental endpoints, are essential to the utility of our models. We discuss the recent impact of deep neural networks and show selected application examples.
Affiliation(s)
- Andreas H Göller
- Bayer AG, Pharmaceuticals, R&D, Computational Molecular Design, 42096 Wuppertal, Germany
- Lara Kuhnke
- Bayer AG, Pharmaceuticals, R&D, Computational Molecular Design, 13342 Berlin, Germany
- Floriane Montanari
- Bayer AG, Pharmaceuticals, R&D, Machine Learning Research, 13342 Berlin, Germany
- Anne Bonin
- Bayer AG, Pharmaceuticals, R&D, Computational Molecular Design, 42096 Wuppertal, Germany
- Antonius Ter Laak
- Bayer AG, Pharmaceuticals, R&D, Computational Molecular Design, 13342 Berlin, Germany
- Jörg Wichard
- Bayer AG, Pharmaceuticals, R&D, Genetic Toxicology, 13342 Berlin, Germany
- Mario Lobell
- Bayer AG, Pharmaceuticals, R&D, Computational Molecular Design, 42096 Wuppertal, Germany
- Alexander Hillisch
- Bayer AG, Pharmaceuticals, R&D, Computational Molecular Design, 42096 Wuppertal, Germany
|
15
|
Göller AH. The art of atom descriptor design. Drug Discovery Today: Technologies 2020; 32-33:37-43. [PMID: 33386093 DOI: 10.1016/j.ddtec.2020.06.004] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 04/21/2020] [Revised: 06/19/2020] [Accepted: 06/19/2020] [Indexed: 02/03/2023]
Abstract
This review provides an overview of descriptions of atoms applied to the understanding of phenomena like chemical reactivity and selectivity, pKa values, site of metabolism prediction, or hydrogen bond strengths, but also to the substitution of quantum mechanical calculations by machine learning models for energies, forces, or even spectroscopic properties, and finally to the fast calculation of atomic charges for force field parametrization. The descriptor space ranges from derivatives of the wavefunctions or electron density, via quantum mechanics derived descriptors, to classical descriptions of atoms and their embedding in a molecule. The common denominator for all approaches is the thorough understanding of the physics of the chemical problem that guided the design of the atom descriptor. Quantum mechanics (QM) and machine learning (ML) are finally converging into a new discipline, namely QM/ML.
Affiliation(s)
- Andreas H Göller
- Bayer AG, Pharmaceuticals, R&D, Digital Technologies, Computational Molecular Design, 42096 Wuppertal, Germany.
|
16
|
Bauer CA, Schneider G, Göller AH. Machine learning models for hydrogen bond donor and acceptor strengths using large and diverse training data generated by first-principles interaction free energies. J Cheminform 2019; 11:59. [PMID: 33430967 PMCID: PMC6737620 DOI: 10.1186/s13321-019-0381-4] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 02/04/2019] [Accepted: 08/10/2019] [Indexed: 02/06/2023] Open
Abstract
We present machine learning (ML) models for hydrogen bond acceptor (HBA) and hydrogen bond donor (HBD) strengths. Quantum chemical (QC) free energies in solution for 1:1 hydrogen-bonded complex formation to the reference molecules 4-fluorophenol and acetone serve as our target values. Our acceptor and donor databases are the largest on record, with 4426 and 1036 data points, respectively. After scanning over radial atomic descriptors and ML methods, our final trained HBA and HBD ML models achieve RMSEs of 3.8 kJ mol⁻¹ (acceptors) and 2.3 kJ mol⁻¹ (donors) on experimental test sets. This performance is comparable with previous models that are trained on experimental hydrogen bonding free energies, indicating that molecular QC data can serve as a substitute for experiment. This could potentially lead to a full replacement of wet-lab chemistry by QC for HBA/HBD strength determination. As a possible chemical application of our ML models, we highlight our predicted HBA and HBD strengths as possible descriptors in two case studies on trends in intramolecular hydrogen bonding.
Affiliation(s)
- Christoph A Bauer
- Department of Chemistry and Applied Biosciences, Swiss Federal Institute of Technology (ETH), 8093, Zurich, Switzerland
- Gisbert Schneider
- Department of Chemistry and Applied Biosciences, Swiss Federal Institute of Technology (ETH), 8093, Zurich, Switzerland
|
17
|
Šícho M, Stork C, Mazzolari A, de Bruyn Kops C, Pedretti A, Testa B, Vistoli G, Svozil D, Kirchmair J. FAME 3: Predicting the Sites of Metabolism in Synthetic Compounds and Natural Products for Phase 1 and Phase 2 Metabolic Enzymes. J Chem Inf Model 2019; 59:3400-3412. [PMID: 31361490 DOI: 10.1021/acs.jcim.9b00376] [Citation(s) in RCA: 62] [Impact Index Per Article: 10.3] [Indexed: 12/22/2022]
Abstract
In this work we present the third generation of FAst MEtabolizer (FAME 3), a collection of extra trees classifiers for the prediction of sites of metabolism (SoMs) in small molecules such as drugs, druglike compounds, natural products, agrochemicals, and cosmetics. FAME 3 was derived from the MetaQSAR database (Pedretti et al. J. Med. Chem. 2018, 61, 1019), a recently published data resource on xenobiotic metabolism that contains more than 2100 substrates annotated with more than 6300 experimentally confirmed SoMs related to redox reactions, hydrolysis and other nonredox reactions, and conjugation reactions. In tests with holdout data, FAME 3 models reached competitive performance, with Matthews correlation coefficients (MCCs) ranging from 0.50 for a global model covering phase 1 and phase 2 metabolism, to 0.75 for a focused model for phase 2 metabolism. A model focused on cytochrome P450 metabolism yielded an MCC of 0.57. Results from case studies with several synthetic compounds, natural products, and natural product derivatives demonstrate the agreement between model predictions and literature data even for molecules with structural patterns clearly distinct from those present in the training data. The applicability domains of the individual models were estimated by a new, atom-based distance measure (FAMEscore) that is based on a nearest-neighbor search in the space of atom environments. FAME 3 is available via a public web service at https://nerdd.zbh.uni-hamburg.de/ and as a self-contained Java software package, free for academic and noncommercial research.
Affiliation(s)
- Martin Šícho
- Faculty of Mathematics, Informatics and Natural Sciences, Department of Informatics, Center for Bioinformatics, Universität Hamburg, 20146 Hamburg, Germany; Faculty of Chemical Technology, Department of Informatics and Chemistry, CZ-OPENSCREEN: National Infrastructure for Chemical Biology, University of Chemistry and Technology Prague, 166 28 Prague 6, Czech Republic
- Conrad Stork
- Faculty of Mathematics, Informatics and Natural Sciences, Department of Informatics, Center for Bioinformatics, Universität Hamburg, 20146 Hamburg, Germany
- Angelica Mazzolari
- Facoltà di Scienze del Farmaco, Dipartimento di Scienze Farmaceutiche "Pietro Pratesi", Università degli Studi di Milano, I-20133 Milan, Italy
- Christina de Bruyn Kops
- Faculty of Mathematics, Informatics and Natural Sciences, Department of Informatics, Center for Bioinformatics, Universität Hamburg, 20146 Hamburg, Germany
- Alessandro Pedretti
- Facoltà di Scienze del Farmaco, Dipartimento di Scienze Farmaceutiche "Pietro Pratesi", Università degli Studi di Milano, I-20133 Milan, Italy
- Bernard Testa
- University of Lausanne, 1015 Lausanne, Switzerland
- Giulio Vistoli
- Facoltà di Scienze del Farmaco, Dipartimento di Scienze Farmaceutiche "Pietro Pratesi", Università degli Studi di Milano, I-20133 Milan, Italy
- Daniel Svozil
- Faculty of Chemical Technology, Department of Informatics and Chemistry, CZ-OPENSCREEN: National Infrastructure for Chemical Biology, University of Chemistry and Technology Prague, 166 28 Prague 6, Czech Republic
- Johannes Kirchmair
- Faculty of Mathematics, Informatics and Natural Sciences, Department of Informatics, Center for Bioinformatics, Universität Hamburg, 20146 Hamburg, Germany
|
18
|
Kaitoh K, Kotera M, Funatsu K. Novel Electrotopological Atomic Descriptors for the Prediction of Xenobiotic Cytochrome P450 Reactions. Mol Inform 2019; 38:e1900010. [PMID: 31187601 DOI: 10.1002/minf.201900010] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 01/25/2019] [Accepted: 04/28/2019] [Indexed: 01/06/2023]
Abstract
Cytochrome P450 (CYP) is an enzyme family that plays a crucial role in metabolism, mainly metabolizing xenobiotics to produce non-toxic structures; however, some metabolized products can cause hepatotoxicity. Hence, predicting the structures of CYP products is an important task in designing non-hepatotoxic drugs. Here, we have developed novel atomic descriptors to predict the sites of metabolism (SoM) in CYP substrates. The proposed descriptors describe topological and electrostatic characteristics of CYP substrates using Gasteiger charges. They were applied to CYP3A4 data analysis as a case study. Descriptor selection yielded a gradient boosting decision tree-based SoM classification model that used 139 existing descriptors and the 45 proposed descriptors, and the model performed well in terms of the Matthews correlation coefficient. We also developed a structure converter to predict CYP products. This converter correctly generated 51 structural formulas of experimentally observed CYP3A4 products according to a manual evaluation.
Affiliation(s)
- Kazuma Kaitoh
- Department of Chemical System Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8656, Japan
- Masaaki Kotera
- Department of Chemical System Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8656, Japan
- Kimito Funatsu
- Department of Chemical System Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8656, Japan
|
19
|
Cronin MT, Madden JC, Yang C, Worth AP. Unlocking the potential of in silico chemical safety assessment - A report on a cross-sector symposium on current opportunities and future challenges. Computational Toxicology (Amsterdam, Netherlands) 2019; 10:38-43. [PMID: 31218266 PMCID: PMC6559213 DOI: 10.1016/j.comtox.2018.12.006] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 12/15/2018] [Accepted: 12/17/2018] [Indexed: 12/21/2022]
Abstract
In silico chemical safety assessment can support the evaluation of hazard and risk following potential exposure to a substance. A symposium identified a number of opportunities and challenges to implement in silico methods, such as quantitative structure-activity relationships (QSARs) and read-across, to assess the potential harm of a substance in a variety of exposure scenarios, e.g. pharmaceuticals, personal care products, and industrial chemicals. To initiate the process of in silico safety assessment, clear and unambiguous problem formulation is required to provide the context for these methods. These approaches must be built on data of defined quality, while acknowledging the possibility of novel data resources tapping into on-going progress with data sharing. Models need to be developed that cover appropriate toxicity and kinetic endpoints, and that are documented appropriately with defined uncertainties. The application and implementation of in silico models in chemical safety requires a flexible technological framework that enables the integration of multiple strands of data and evidence. The findings of the symposium allowed for the identification of priorities to progress in silico chemical safety assessment towards the animal-free assessment of chemicals.
Affiliation(s)
- Mark T.D. Cronin
- Liverpool John Moores University, School of Pharmacy and Biomolecular Sciences, Byrom Street, Liverpool L3 3AF, United Kingdom
- Judith C. Madden
- Liverpool John Moores University, School of Pharmacy and Biomolecular Sciences, Byrom Street, Liverpool L3 3AF, United Kingdom
- Chihae Yang
- Molecular Networks GmbH, Neumeyerstraße 28, 90411 Nürnberg, Germany
- Andrew P. Worth
- European Commission, Joint Research Centre (JRC), Ispra, Italy
|
20
|
Tyzack JD, Kirchmair J. Computational methods and tools to predict cytochrome P450 metabolism for drug discovery. Chem Biol Drug Des 2019; 93:377-386. [PMID: 30471192 PMCID: PMC6590657 DOI: 10.1111/cbdd.13445] [Citation(s) in RCA: 110] [Impact Index Per Article: 18.3] [Received: 10/11/2018] [Revised: 11/05/2018] [Accepted: 11/11/2018] [Indexed: 01/08/2023]
Abstract
In this review, we present important, recent developments in the computational prediction of cytochrome P450 (CYP) metabolism in the context of drug discovery. We discuss in silico models for the various aspects of CYP metabolism prediction, including CYP substrate and inhibitor predictors, site of metabolism predictors (i.e., metabolically labile sites within potential substrates) and metabolite structure predictors. We summarize the different approaches taken by these models, such as rule‐based methods, machine learning, data mining, quantum chemical methods, molecular interaction fields, and docking. We highlight the scope and limitations of each method and discuss future implications for the field of metabolism prediction in drug discovery.
Affiliation(s)
- Johannes Kirchmair
- Department of Chemistry, University of Bergen, Bergen, Norway; Computational Biology Unit (CBU), University of Bergen, Bergen, Norway; Center for Bioinformatics, Universität Hamburg, Hamburg, Germany
|
21
|
Bauer CA, Schneider G, Göller AH. Gaussian Process Regression Models for the Prediction of Hydrogen Bond Acceptor Strengths. Mol Inform 2018; 38:e1800115. [DOI: 10.1002/minf.201800115] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 09/14/2018] [Accepted: 10/16/2018] [Indexed: 12/12/2022]
Affiliation(s)
- Christoph A. Bauer
- Swiss Federal Institute of Technology (ETH), Department of Chemistry and Applied Biosciences, 8093 Zurich, Switzerland
- Gisbert Schneider
- Swiss Federal Institute of Technology (ETH), Department of Chemistry and Applied Biosciences, 8093 Zurich, Switzerland
|