1
Humer C, Nicholls R, Heberle H, Heckmann M, Pühringer M, Wolf T, Lübbesmeyer M, Heinrich J, Hillenbrand J, Volpin G, Streit M. CIME4R: Exploring iterative, AI-guided chemical reaction optimization campaigns in their parameter space. J Cheminform 2024; 16:51. [PMID: 38730469 DOI: 10.1186/s13321-024-00840-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2023] [Accepted: 04/05/2024] [Indexed: 05/12/2024] [Open access]
Abstract
Chemical reaction optimization (RO) is an iterative process that results in large, high-dimensional datasets. Current tools allow for only limited analysis and understanding of parameter spaces, making it hard for scientists to review or follow changes throughout the process. With the recent emergence of using artificial intelligence (AI) models to aid RO, another level of complexity has been added. Helping to assess the quality of a model's prediction and understand its decision is critical to supporting human-AI collaboration and trust calibration. To address this, we propose CIME4R, an open-source interactive web application for analyzing RO data and AI predictions. CIME4R supports users in (i) comprehending a reaction parameter space, (ii) investigating how an RO process developed over iterations, (iii) identifying critical factors of a reaction, and (iv) understanding model predictions. This facilitates making informed decisions during the RO process and helps users to review a completed RO process, especially in AI-guided RO. CIME4R aids decision-making through the interaction between humans and AI by combining the strengths of expert experience and high computational precision. We developed and tested CIME4R with domain experts and verified its usefulness in three case studies. Using CIME4R the experts were able to produce valuable insights from past RO campaigns and to make informed decisions on which experiments to perform next. We believe that CIME4R is the beginning of an open-source community project with the potential to improve the workflow of scientists working in the reaction optimization domain.
SCIENTIFIC CONTRIBUTION: To the best of our knowledge, CIME4R is the first open-source interactive web application tailored to the peculiar analysis requirements of reaction optimization (RO) campaigns. Due to the growing use of AI in RO, we developed CIME4R with a special focus on facilitating human-AI collaboration and understanding of AI models. We developed and evaluated CIME4R in collaboration with domain experts to verify its practical usefulness.
Affiliation(s)
- Rachel Nicholls
- Division Crop Science, Bayer AG, Monheim am Rhein, 40789, Germany
- Henry Heberle
- Division Crop Science, Bayer AG, Monheim am Rhein, 40789, Germany
- Thomas Wolf
- Division Crop Science, Bayer AG, Frankfurt, 65926, Germany
- Julian Heinrich
- Division Crop Science, Bayer AG, Monheim am Rhein, 40789, Germany
- Giulio Volpin
- Division Crop Science, Bayer AG, Frankfurt, 65926, Germany
- Marc Streit
- Johannes Kepler University Linz, Linz, 4040, Austria
- datavisyn GmbH, Linz, 4040, Austria
2
Vittoria Togo M, Mastrolorito F, Orfino A, Graps EA, Tondo AR, Altomare CD, Ciriaco F, Trisciuzzi D, Nicolotti O, Amoroso N. Where developmental toxicity meets explainable artificial intelligence: state-of-the-art and perspectives. Expert Opin Drug Metab Toxicol 2023:1-17. [PMID: 38141160 DOI: 10.1080/17425255.2023.2298827] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/05/2023] [Accepted: 12/20/2023] [Indexed: 12/24/2023]
Abstract
INTRODUCTION: The application of Artificial Intelligence (AI) to predictive toxicology is rapidly increasing, particularly aiming to develop non-testing methods that effectively address ethical concerns and reduce economic costs. In this context, Developmental Toxicity (Dev Tox) stands as a key human health endpoint, especially significant for safeguarding maternal and child well-being.
AREAS COVERED: This review outlines the existing methods employed in Dev Tox predictions and underscores the benefits of utilizing New Approach Methodologies (NAMs), specifically focusing on eXplainable Artificial Intelligence (XAI), which proves highly efficient in constructing reliable and transparent models aligned with recommendations from international regulatory bodies.
EXPERT OPINION: The limited availability of high-quality data and the absence of dependable Dev Tox methodologies render XAI an appealing avenue for systematically developing interpretable and transparent models, which hold immense potential for both scientific evaluations and regulatory decision-making.
Affiliation(s)
- Maria Vittoria Togo
- Department of Pharmacy - Pharmaceutical Sciences, Università degli Studi di Bari "Aldo Moro", Bari, Italy
- Fabrizio Mastrolorito
- Department of Pharmacy - Pharmaceutical Sciences, Università degli Studi di Bari "Aldo Moro", Bari, Italy
- Angelica Orfino
- Department of Pharmacy - Pharmaceutical Sciences, Università degli Studi di Bari "Aldo Moro", Bari, Italy
- Elisabetta Anna Graps
- ARESS Puglia - Agenzia Regionale strategica per la Salute ed il Sociale, Presidenza della Regione Puglia, Bari, Italy
- Anna Rita Tondo
- Department of Pharmacy - Pharmaceutical Sciences, Università degli Studi di Bari "Aldo Moro", Bari, Italy
- Cosimo Damiano Altomare
- Department of Pharmacy - Pharmaceutical Sciences, Università degli Studi di Bari "Aldo Moro", Bari, Italy
- Fulvio Ciriaco
- Department of Chemistry, Università degli Studi di Bari "Aldo Moro", Bari, Italy
- Daniela Trisciuzzi
- Department of Pharmacy - Pharmaceutical Sciences, Università degli Studi di Bari "Aldo Moro", Bari, Italy
- Orazio Nicolotti
- Department of Pharmacy - Pharmaceutical Sciences, Università degli Studi di Bari "Aldo Moro", Bari, Italy
- Nicola Amoroso
- Department of Pharmacy - Pharmaceutical Sciences, Università degli Studi di Bari "Aldo Moro", Bari, Italy
3
Orsi M, Probst D, Schwaller P, Reymond JL. Alchemical analysis of FDA approved drugs. Digit Discov 2023; 2:1289-1296. [PMID: 38013905 PMCID: PMC10561545 DOI: 10.1039/d3dd00039g] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2023] [Accepted: 08/29/2023] [Indexed: 11/29/2023]
Abstract
Chemical space maps help visualize similarities within molecular sets. However, there are many different molecular similarity measures resulting in a confusing number of possible comparisons. To overcome this limitation, we exploit the fact that tools designed for reaction informatics also work for alchemical processes that do not obey Lavoisier's principle, such as the transmutation of lead into gold. We start by using the differential reaction fingerprint (DRFP) to create tree-maps (TMAPs) representing the chemical space of pairs of drugs selected as being similar according to various molecular fingerprints. We then use the Transformer-based RXNMapper model to understand structural relationships between drugs, and its confidence score to distinguish between pairs related by chemically feasible transformations and pairs related by alchemical transmutations. This analysis reveals a diversity of structural similarity relationships that are otherwise difficult to analyze simultaneously. We exemplify this approach by visualizing FDA-approved drugs, EGFR inhibitors, and polymyxin B analogs.
Affiliation(s)
- Markus Orsi
- Department of Chemistry, Biochemistry and Pharmaceutical Sciences, University of Bern, Freiestrasse 3, 3012 Bern, Switzerland
- Daniel Probst
- Ecole Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Jean-Louis Reymond
- Department of Chemistry, Biochemistry and Pharmaceutical Sciences, University of Bern, Freiestrasse 3, 3012 Bern, Switzerland
4
Amara K, Rodríguez-Pérez R, Jiménez-Luna J. Explaining compound activity predictions with a substructure-aware loss for graph neural networks. J Cheminform 2023; 15:67. [PMID: 37491407 PMCID: PMC10369817 DOI: 10.1186/s13321-023-00733-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2023] [Accepted: 07/08/2023] [Indexed: 07/27/2023] [Open access]
Abstract
Explainable machine learning is increasingly used in drug discovery to help rationalize compound property predictions. Feature attribution techniques are popular choices to identify which molecular substructures are responsible for a predicted property change. However, established molecular feature attribution methods have so far displayed low performance for popular deep learning algorithms such as graph neural networks (GNNs), especially when compared with simpler modeling alternatives such as random forests coupled with atom masking. To mitigate this problem, a modification of the regression objective for GNNs is proposed to specifically account for common core structures between pairs of molecules. The presented approach shows higher accuracy on a recently-proposed explainability benchmark. This methodology has the potential to assist with model explainability in drug discovery pipelines, particularly in lead optimization efforts where specific chemical series are investigated.
Affiliation(s)
- Kenza Amara
- Microsoft Research AI4Science, 21 Station Rd., Cambridge, CB1 2FB UK
- Department of Computer Science, ETH Zurich, Andreasstrasse 5, 8050 Zurich, Switzerland
- José Jiménez-Luna
- Microsoft Research AI4Science, 21 Station Rd., Cambridge, CB1 2FB UK
5
Abstract
Chemists can be skeptical in using deep learning (DL) in decision making, due to the lack of interpretability in "black-box" models. Explainable artificial intelligence (XAI) is a branch of artificial intelligence (AI) which addresses this drawback by providing tools to interpret DL models and their predictions. We review the principles of XAI in the domain of chemistry and emerging methods for creating and evaluating explanations. Then, we focus on methods developed by our group and their applications in predicting solubility, blood-brain barrier permeability, and the scent of molecules. We show that XAI methods like chemical counterfactuals and descriptor explanations can explain DL predictions while giving insight into structure-property relationships. Finally, we discuss how a two-step process of developing a black-box model and explaining predictions can uncover structure-property relationships.
Affiliation(s)
- Geemi P Wellawatte
- Department of Chemistry, University of Rochester, Rochester, New York 14627, United States
- Heta A Gandhi
- Department of Chemical Engineering, University of Rochester, Rochester, New York 14627, United States
- Aditi Seshadri
- Department of Chemical Engineering, University of Rochester, Rochester, New York 14627, United States
- Andrew D White
- Department of Chemical Engineering, University of Rochester, Rochester, New York 14627, United States
6
Heberle H, Zhao L, Schmidt S, Wolf T, Heinrich J. XSMILES: interactive visualization for molecules, SMILES and XAI attribution scores. J Cheminform 2023; 15:2. [PMID: 36609340 DOI: 10.1186/s13321-022-00673-w] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 09/30/2022] [Accepted: 12/17/2022] [Indexed: 01/09/2023] [Open access]
Abstract
BACKGROUND: Explainable artificial intelligence (XAI) methods have shown increasing applicability in chemistry. In this context, visualization techniques can highlight regions of a molecule to reveal their influence over a predicted property. For this purpose, some XAI techniques calculate attribution scores associated with tokens of SMILES strings or with atoms of a molecule. While an association of a score with an atom can be directly visually represented on a molecule diagram, scores computed for SMILES non-atom tokens cannot. For instance, a substring [N+] contains 3 non-atom tokens, i.e., [, +, and ], and their attributions, depending on the model, are not necessarily revealing an influence of the nitrogen atom over the predicted property; for that reason, it is not possible to represent the scores on a molecule diagram. Moreover, SMILES's notation is complex, foregrounding the need for techniques to facilitate the analysis of explanations associated with their tokens.
RESULTS: We propose XSMILES, an interactive visualization technique, to explore explainable artificial intelligence attribution scores and support the interpretation of SMILES. Users can input any type of score attributed to atom and non-atom tokens and visualize them on top of a 2D molecule diagram coordinated with a bar chart that represents a SMILES string. We demonstrate how attributions calculated for SMILES strings can be evaluated and better interpreted through interactivity with two use cases.
CONCLUSIONS: Data scientists can use XSMILES to understand their models' behavior and compare multiple modeling approaches. The tool provides a set of parameters to adapt the visualization to users' needs and it can be integrated into different platforms. We believe XSMILES can support data scientists to develop, improve, and communicate their models by making it easier to identify patterns and compare attributions through interactive exploratory visualization.
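The distinction between atom and non-atom tokens can be illustrated with a minimal Python sketch. This is not the XSMILES tokenizer: the regular expression, `tokenize`, and `non_atom_tokens` are hypothetical names introduced here, and the rule "any letter is an atom" is a deliberate simplification (it would, for example, count the hydrogen label inside a bracket atom as an atom token and does not handle two-letter elements other than Cl and Br).

```python
import re

# Simplified pattern (an assumption, not the XSMILES implementation):
# try the two-letter halogens first, then any single letter,
# then fall back to any other single character.
TOKEN_RE = re.compile(r"Cl|Br|[A-Za-z]|.")


def tokenize(smiles):
    """Split a SMILES string into (token, is_atom) pairs."""
    return [(tok, tok[0].isalpha()) for tok in TOKEN_RE.findall(smiles)]


def non_atom_tokens(smiles):
    """Return only the tokens that do not name an atom."""
    return [tok for tok, is_atom in tokenize(smiles) if not is_atom]


if __name__ == "__main__":
    # The bracket atom from the abstract: three non-atom tokens around N.
    print(non_atom_tokens("[N+]"))
    # Branch parentheses are non-atom tokens as well.
    print(tokenize("C[N+](C)(C)C"))
```

Under these assumptions, `[N+]` yields one atom token (`N`) and the three non-atom tokens named in the abstract; a score attached to any of those three has no atom of its own to be drawn on in a molecule diagram.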
7
Parrott N, Manevski N, Olivares-Morales A. Can We Predict Clinical Pharmacokinetics of Highly Lipophilic Compounds by Integration of Machine Learning or In Vitro Data into Physiologically Based Models? A Feasibility Study Based on 12 Development Compounds. Mol Pharm 2022; 19:3858-3868. [PMID: 36150125 DOI: 10.1021/acs.molpharmaceut.2c00350] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Abstract
While high lipophilicity tends to improve potency, its effects on pharmacokinetics (PK) are complex and often unfavorable. To predict clinical PK in early drug discovery, we built human physiologically based PK (PBPK) models integrating either (i) machine learning (ML)-predicted properties or (ii) discovery stage in vitro data. Our test set was composed of 12 challenging development compounds with high lipophilicity (mean calculated log P 4.2), low plasma-free fraction (50% of compounds with fu,p < 1%), and low aqueous solubility. Predictions focused on key human PK parameters, including plasma clearance (CL), volume of distribution at steady state (Vss), and oral bioavailability (%F). For predictions of CL, the ML inputs showed acceptable accuracy and slight underprediction bias [an average absolute fold error (AAFE) of 3.55; an average fold error (AFE) of 0.95]. Surprisingly, use of measured data only slightly improved accuracy but introduced an overprediction bias (AAFE = 3.35; AFE = 2.63). Predictions of Vss were more successful, with both ML (AAFE = 2.21; AFE = 0.90) and in vitro (AAFE = 2.24; AFE = 1.72) inputs showing good accuracy and moderate bias. The %F was poorly predicted using ML inputs [average absolute prediction error (AAPE) of 45%], and use of measured data for solubility and permeability improved this to 34%. Sensitivity analysis showed that predictions of CL limited the overall accuracy of human PK predictions, partly due to high nonspecific binding of lipophilic compounds, leading to uncertainty of unbound clearance. For accurate predictions of %F, solubility was the key factor. Despite current limitations, this work encourages further development of ML models and integration of their results within PBPK models to enable human PK prediction at the drug design stage, even before compounds are synthesized. Further evaluation of this approach with more diverse chemical types is warranted.
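The abstract reports AFE and AAFE without stating how they are computed. The Python sketch below assumes the definitions commonly used in pharmacokinetic prediction work: AFE is 10 raised to the mean of log10(predicted/observed), and AAFE is 10 raised to the mean of the absolute values of the same logs. The function name `fold_errors` and the input values are hypothetical and are not data from the study.

```python
import math


def fold_errors(predicted, observed):
    """Return (AFE, AAFE) for paired predicted and observed values.

    Assumed definitions (not given in the abstract):
      AFE  = 10 ** mean(log10(pred / obs))    -> direction of bias
      AAFE = 10 ** mean(|log10(pred / obs)|)  -> size of error, sign ignored
    """
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    logs = [math.log10(p / o) for p, o in zip(predicted, observed)]
    afe = 10 ** (sum(logs) / len(logs))
    aafe = 10 ** (sum(abs(x) for x in logs) / len(logs))
    return afe, aafe


if __name__ == "__main__":
    # Hypothetical values: one 2-fold overprediction, one 2-fold underprediction.
    afe, aafe = fold_errors([2.0, 0.5], [1.0, 1.0])
    print(f"AFE = {afe:.2f}, AAFE = {aafe:.2f}")
```

With these definitions, an AFE below 1 indicates underprediction on average and an AFE above 1 indicates overprediction, which matches how the abstract reads 0.95 as a slight underprediction bias and 2.63 as an overprediction bias. The AAFE can stay large while the AFE is close to 1, because errors in opposite directions cancel in the AFE but not in the AAFE.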
Affiliation(s)
- Neil Parrott
- Pharmaceutical Sciences, Pharma Research and Early Development, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Grenzacherstrasse 124, CH-4070 Basel, Switzerland
- Nenad Manevski
- Pharmaceutical Sciences, Pharma Research and Early Development, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Grenzacherstrasse 124, CH-4070 Basel, Switzerland
- Andrés Olivares-Morales
- Pharmaceutical Sciences, Pharma Research and Early Development, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Grenzacherstrasse 124, CH-4070 Basel, Switzerland