1
van den Maagdenberg HW, de Mol van Otterloo J, van Hasselt JGC, van der Graaf PH, van Westen GJP. Integrating Pharmacokinetics and Quantitative Systems Pharmacology Approaches in Generative Drug Design. J Chem Inf Model 2025; 65:4783-4796. [PMID: 40343729 PMCID: PMC12117666 DOI: 10.1021/acs.jcim.5c00107] [Received: 01/17/2025] [Revised: 04/09/2025] [Accepted: 04/10/2025] [Indexed: 05/11/2025]
Abstract
Integrated understanding of pharmacokinetics (PK) and pharmacodynamics (PD) is a key aspect of successful drug discovery. Yet in generative computational drug design, the focus often lies on optimizing potency. Here we integrate PK property predictions in DrugEx, a generative drug design framework, and we explore the generated compounds' PD through simulations with a quantitative systems pharmacology (QSP) model. Quantitative structure-property relationship models were developed to predict molecule PK (clearance, volume of distribution and unbound fraction) and affinity for the adenosine A2A receptor (A2AR), a drug target in immuno-oncology. These models were used to score compounds in a reinforcement learning framework to generate molecules with a specific PK profile and high affinity for the A2AR. We predicted the expected tumor growth inhibition profiles using the QSP model for selected candidate molecules with varying PK and affinity profiles. We show that optimizing affinity to the A2AR, while minimizing or maximizing a PK property, shifts the type of molecular scaffolds that are generated. The difference in physicochemical properties of the compounds with different predicted PK parameters was found to correspond with the differences observed in the PK data set. We demonstrated the use of the QSP model by simulating the effect of a broad range of compound properties on the predicted tumor volume. In conclusion, our proposed integrated workflow incorporating affinity predictions with PKPD may provide a template for the next generation of advanced generative computational drug design.
Affiliation(s)
- J. G. Coen van Hasselt
  - Leiden Academic Centre of Drug Research, Leiden University, 2333, Leiden, The Netherlands
- Piet H. van der Graaf
  - Leiden Academic Centre of Drug Research, Leiden University, 2333, Leiden, The Netherlands
  - Certara, CT2 7FG Canterbury, U.K.
2
Morelli FM, Raschke M, Jungmann N, Bairlein M, García de Lomana M. Predicting in vitro assays related to liver function using probabilistic machine learning. Toxicology 2025; 516:154195. [PMID: 40398507 DOI: 10.1016/j.tox.2025.154195] [Received: 03/24/2025] [Revised: 05/15/2025] [Accepted: 05/15/2025] [Indexed: 05/23/2025]
Abstract
While machine learning has gained traction in toxicological assessments, the limited data availability requires the quantification of uncertainty of in silico predictions for reliable decision-making. This study addresses the challenge of predicting the outcome of in vitro assays associated with liver function by systematically comparing various probabilistic methods. Our research fills a critical gap by integrating multiple data modalities - chemical descriptors, gene expression, and morphological profiles - into a probabilistic framework aimed at predicting in vitro assays and quantifying uncertainty. We present a comprehensive evaluation of the performance of these data modalities and describe how this framework and the in vitro assay predictions can be integrated to estimate the probability of drug-induced liver injury (DILI) occurrence. Additionally, we contribute new experimental data for reactive oxygen species generation and hepatocyte toxicity assays, providing valuable resources for future research. Our findings underscore the importance of incorporating uncertainty quantification in toxicity predictions, potentially leading to a safer drug development process and reduced reliance on animal testing.
Affiliation(s)
- Flavio M Morelli
  - R&D Machine Learning Research, Bayer AG, Pharmaceuticals Division, Berlin, Germany; Department of Mathematics and Computer Science, Free University of Berlin, Berlin, Germany
- Marian Raschke
  - R&D Preclinical Development, Bayer AG, Pharmaceuticals Division, Berlin, Germany
- Natalia Jungmann
  - R&D Preclinical Development, Bayer AG, Pharmaceuticals Division, Berlin, Germany
- Michaela Bairlein
  - R&D Preclinical Development, Bayer AG, Pharmaceuticals Division, Berlin, Germany
3
Liu W, Zhao Z. Scupa: single-cell unified polarization assessment of immune cells using the single-cell foundation model. Bioinformatics 2025; 41:btaf090. [PMID: 39999031 PMCID: PMC11893155 DOI: 10.1093/bioinformatics/btaf090] [Received: 10/18/2024] [Revised: 01/15/2025] [Accepted: 02/21/2025] [Indexed: 02/27/2025] Open
Abstract
MOTIVATION Immune cells undergo cytokine-driven polarization in response to diverse stimuli, altering their transcriptional profiles and functional states. This dynamic process is central to immune responses in health and diseases, yet a systematic approach to assess cytokine-driven polarization in single-cell RNA sequencing data has been lacking. RESULTS To address this gap, we developed single-cell unified polarization assessment (Scupa), the first computational method for comprehensive immune cell polarization assessment. Scupa leverages data from the Immune Dictionary, which characterizes cytokine-driven polarization states across 14 immune cell types. By integrating cell embeddings from the single-cell foundation model Universal Cell Embeddings, Scupa effectively identifies polarized cells across different species and experimental conditions. Applications of Scupa in independent datasets demonstrated its accuracy in classifying polarized cells and further revealed distinct polarization profiles in tumor-infiltrating myeloid cells across cancers. Scupa complements conventional single-cell data analysis by providing new insights into dynamic immune cell states, and holds potential for advancing therapeutic insights, particularly in cytokine-based therapies. AVAILABILITY AND IMPLEMENTATION The code is available at https://github.com/bsml320/Scupa.
Affiliation(s)
- Wendao Liu
  - The University of Texas MD Anderson Cancer Center UTHealth Houston Graduate School of Biomedical Sciences, Houston, TX 77030, United States
  - Center for Precision Health, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, United States
- Zhongming Zhao
  - The University of Texas MD Anderson Cancer Center UTHealth Houston Graduate School of Biomedical Sciences, Houston, TX 77030, United States
  - Center for Precision Health, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, United States
4
Boger RS, Chithrananda S, Angelopoulos AN, Yoon PH, Jordan MI, Doudna JA. Functional protein mining with conformal guarantees. Nat Commun 2025; 16:85. [PMID: 39747192 PMCID: PMC11695924 DOI: 10.1038/s41467-024-55676-y] [Received: 07/12/2024] [Accepted: 12/20/2024] [Indexed: 01/04/2025] Open
Abstract
Molecular structure prediction and homology detection offer promising paths to discovering protein function and evolutionary relationships. However, current approaches lack statistical reliability assurances, limiting their practical utility for selecting proteins for further experimental and in-silico characterization. To address this challenge, we introduce a statistically principled approach to protein search leveraging principles from conformal prediction, offering a framework that ensures statistical guarantees with user-specified risk and provides calibrated probabilities (rather than raw ML scores) for any protein search model. Our method (1) lets users select many biologically-relevant loss metrics (e.g., false discovery rate) and assigns reliable functional probabilities for annotating genes of unknown function; (2) achieves state-of-the-art performance in enzyme classification without training new models; and (3) robustly and rapidly pre-filters proteins for computationally intensive structural alignment algorithms. Our framework enhances the reliability of protein homology detection and enables the discovery of uncharacterized proteins with likely desirable functional properties.
Affiliation(s)
- Ron S Boger
  - Biophysics Graduate Group, University of California, Berkeley, Berkeley, CA, USA
  - Innovative Genomics Institute; University of California, Berkeley, CA, USA
  - California Institute for Quantitative Biosciences, University of California, Berkeley, Berkeley, CA, USA
- Seyone Chithrananda
  - Innovative Genomics Institute; University of California, Berkeley, CA, USA
  - Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, Berkeley, CA, USA
- Anastasios N Angelopoulos
  - Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, Berkeley, CA, USA
  - Department of Statistics, University of California, Berkeley, Berkeley, CA, USA
- Peter H Yoon
  - Innovative Genomics Institute; University of California, Berkeley, CA, USA
  - Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA
- Michael I Jordan
  - Department of Statistics, University of California, Berkeley, Berkeley, CA, USA
  - Howard Hughes Medical Institute, University of California, Berkeley, Berkeley, CA, USA
- Jennifer A Doudna
  - Innovative Genomics Institute; University of California, Berkeley, CA, USA
  - California Institute for Quantitative Biosciences, University of California, Berkeley, Berkeley, CA, USA
  - Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA
  - Molecular Biophysics and Integrated Bioimaging Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
  - Department of Chemistry, University of California, Berkeley, Berkeley, CA, USA
  - Gladstone Institutes, San Francisco, CA, USA
  - Gladstone-UCSF Institute of Genomic Immunology, San Francisco, CA, USA
5
Tanoli Z, Schulman A, Aittokallio T. Validation guidelines for drug-target prediction methods. Expert Opin Drug Discov 2025; 20:31-45. [PMID: 39568436 DOI: 10.1080/17460441.2024.2430955] [Received: 04/30/2024] [Accepted: 11/14/2024] [Indexed: 11/22/2024]
Abstract
INTRODUCTION Mapping the interactions between pharmaceutical compounds and their molecular targets is a fundamental aspect of drug discovery and repurposing. Drug-target interactions are important for elucidating mechanisms of action and optimizing drug efficacy and safety profiles. Several computational methods have been developed to systematically predict drug-target interactions. However, computational and experimental validation of the drug-target predictions varies greatly across studies. AREAS COVERED Through a PubMed query, a corpus comprising 3,286 articles on drug-target interaction prediction published within the past decade was covered. Natural language processing was used for automated abstract classification to study the evolution of computational methods, validation strategies and performance assessment metrics in the 3,286 articles. Additionally, a manual analysis of 259 studies that performed experimental validation of computational predictions revealed prevalent experimental protocols. EXPERT OPINION Starting from 2014, there has been a noticeable increase in articles focusing on drug-target interaction prediction. Docking and regression stand out as the most commonly used techniques among computational methods, and cross-validation is frequently employed as the computational validation strategy. Testing the predictions using multiple, orthogonal validation strategies is recommended and should be reported for the specific target prediction applications. Experimental validation remains relatively rare and should be performed more routinely to evaluate the biological relevance of predictions.
Affiliation(s)
- Ziaurrehman Tanoli
  - Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland
  - iCAN Digital Precision Cancer Medicine Flagship, University of Helsinki and Helsinki University Hospital, Helsinki, Finland
- Aron Schulman
  - Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland
- Tero Aittokallio
  - Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland
  - iCAN Digital Precision Cancer Medicine Flagship, University of Helsinki and Helsinki University Hospital, Helsinki, Finland
  - Institute for Cancer Research, Department of Cancer Genetics, Oslo University Hospital, Oslo, Norway
  - Oslo Centre for Biostatistics and Epidemiology (OCBE), Faculty of Medicine, University of Oslo, Oslo, Norway
6
Urbina F, Jones T, Harris JS, Snyder SH, Lane TR, Ekins S. Predicting the Hallucinogenic Potential of Molecules Using Artificial Intelligence. ACS Chem Neurosci 2024; 15:3078-3089. [PMID: 39092989 PMCID: PMC11338697 DOI: 10.1021/acschemneuro.4c00405] [Indexed: 08/04/2024] Open
Abstract
The development of new drugs addressing serious mental health and other disorders should avoid the psychedelic experience. Analogs of psychedelic drugs can have clinical utility and are termed "psychoplastogens". These represent promising candidates for treating opioid use disorder to reduce drug dependence, with rarely reported serious adverse effects. This drug abuse cessation is linked to the induction of neuritogenesis and increased neuroplasticity, a hallmark of psychedelic molecules, such as lysergic acid diethylamide. Some, but not all, psychoplastogens may act through the G-protein coupled receptor (GPCR) 5HT2A, whereas others may display very different polypharmacology, making prediction of hallucinogenic potential challenging. In the process of developing tools to help design new psychoplastogens, we have used artificial intelligence in the form of machine learning classification models for predicting psychedelic effects using a published in vitro data set from PsychLight (support vector classification (SVC), area under the curve (AUC) 0.74) and in vivo human data derived from books from Shulgin and Shulgin (SVC, AUC 0.72) with nested five-fold cross validation. We have also explored conformal predictors with ECFP6 and electrostatic descriptors in an effort to optimize them. These models have been used to predict known 5HT2A agonists to assess their potential to act as psychedelics and induce hallucinations for PsychLight (SVC, AUC 0.97) and Shulgin and Shulgin (random forest, AUC 0.71). We have tested these models with head twitch data from the mouse. This predictive capability is desirable to reliably design new psychoplastogens that lack in vivo hallucinogenic potential and help assess existing and future molecules for this potential. These efforts also provide useful insights into understanding the psychedelic structure-activity relationship.
Affiliation(s)
- Fabio Urbina
  - Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC 27606, USA
- Thane Jones
  - Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC 27606, USA
- Joshua S. Harris
  - Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC 27606, USA
- Scott H. Snyder
  - Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC 27606, USA
- Thomas R. Lane
  - Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC 27606, USA
- Sean Ekins
  - Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC 27606, USA
7
Xu Y, Liaw A, Sheridan RP, Svetnik V. Development and Evaluation of Conformal Prediction Methods for Quantitative Structure-Activity Relationship. ACS OMEGA 2024; 9:29478-29490. [PMID: 39005801 PMCID: PMC11238240 DOI: 10.1021/acsomega.4c02017] [Received: 02/29/2024] [Revised: 06/10/2024] [Accepted: 06/12/2024] [Indexed: 07/16/2024]
Abstract
The quantitative structure-activity relationship (QSAR) regression model is a commonly used technique for predicting the biological activities of compounds using their molecular descriptors. Besides accurate activity estimation, obtaining a prediction uncertainty metric like a prediction interval is highly desirable. Quantifying prediction uncertainty is an active research area in statistical and machine learning (ML), but the implementation for QSAR remains challenging. However, most ML algorithms with high predictive performance require add-on companions for estimating the uncertainty of their prediction. Conformal prediction (CP) is a promising approach as its main components are agnostic to the prediction modes, and it produces valid prediction intervals under weak assumptions on the data distribution. We proposed computationally efficient CP algorithms tailored to the most widely used ML models, including random forests, deep neural networks, and gradient boosting. The algorithms use a novel approach to the derivation of nonconformity scores from the estimates of prediction uncertainty generated by the ensembles of point predictions. The validity and efficiency of proposed algorithms are demonstrated on a diverse collection of QSAR data sets as well as simulation studies. The provided software implementing our algorithms can be used as stand-alone or easily incorporated into other ML software packages for QSAR modeling.
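As a rough illustration of the idea in this abstract (nonconformity scores derived from ensemble-based uncertainty estimates), the sketch below implements generic split conformal regression with normalized scores. It is not the authors' algorithm or their software: the function name `conformal_intervals`, its arguments, and the choice of the ensemble standard deviation as the normalizer are assumptions made here for illustration only.

```python
import numpy as np


def conformal_intervals(cal_pred, cal_std, cal_y, test_pred, test_std,
                        alpha=0.1, eps=1e-8):
    """Split conformal prediction intervals with normalized scores.

    Hypothetical helper (not from the cited paper).
    cal_pred / cal_std: ensemble mean and spread on a held-out calibration set.
    cal_y: observed activities for the calibration set.
    test_pred / test_std: ensemble mean and spread for new compounds.
    alpha: target miscoverage (0.1 aims for 90% coverage).
    eps: small constant guarding against a zero spread.
    """
    cal_pred = np.asarray(cal_pred, dtype=float)
    cal_std = np.asarray(cal_std, dtype=float)
    cal_y = np.asarray(cal_y, dtype=float)
    test_pred = np.asarray(test_pred, dtype=float)
    test_std = np.asarray(test_std, dtype=float)

    # Nonconformity score: absolute error scaled by the ensemble spread.
    scores = np.abs(cal_y - cal_pred) / (cal_std + eps)

    # Finite-sample rank ceil((n + 1) * (1 - alpha)) among the sorted scores.
    n = scores.size
    k = int(np.ceil((n + 1) * (1 - alpha)))
    # Too few calibration points for the requested level: infinite interval.
    q = np.inf if k > n else np.sort(scores)[k - 1]

    # Interval half-width scales with each test compound's own spread.
    half_width = q * (test_std + eps)
    return test_pred - half_width, test_pred + half_width
```

With calibration errors 1 through 9, unit spread, `alpha=0.5` and `eps=0`, the calibrated quantile is 5, so a test compound with prediction 3 and spread 2 gets the interval [-7, 13]. Compounds on which the ensemble members disagree more receive proportionally wider intervals, which is the practical appeal of normalized scores.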
Affiliation(s)
- Yuting Xu
  - Early Development Statistics, Merck & Co., Inc., Rahway, New Jersey 07065, United States
- Andy Liaw
  - Early Development Statistics, Merck & Co., Inc., Rahway, New Jersey 07065, United States
- Robert P. Sheridan
  - Modeling and Informatics, Merck & Co., Inc., Rahway, New Jersey 07033, United States
- Vladimir Svetnik
  - Early Development Statistics, Merck & Co., Inc., Rahway, New Jersey 07065, United States
8
Dutschmann TM, Schlenker V, Baumann K. Chemoinformatic regression methods and their applicability domain. Mol Inform 2024; 43:e202400018. [PMID: 38803302 DOI: 10.1002/minf.202400018] [Received: 01/16/2024] [Revised: 03/24/2024] [Accepted: 03/25/2024] [Indexed: 05/29/2024]
Abstract
The growing interest in chemoinformatic model uncertainty calls for a summary of the most widely used regression techniques and how to estimate their reliability. Regression models learn a mapping from the space of explanatory variables to the space of continuous output values. Among other limitations, the predictive performance of the model is restricted by the training data used for model fitting. Identification of unusual objects by outlier detection methods can improve model performance. Additionally, proper model evaluation necessitates defining the limitations of the model, often called the applicability domain. Comparable to certain classifiers, some regression techniques come with built-in methods or augmentations to quantify their (un)certainty, while others rely on generic procedures. The theoretical background of their working principles and how to deduce specific and general definitions for their domain of applicability shall be explained.
Affiliation(s)
- Thomas-Martin Dutschmann
  - Institute of Medicinal and Pharmaceutical Chemistry, University of Technology Braunschweig, 38106, Braunschweig, Germany
- Valerie Schlenker
  - Institute of Medicinal and Pharmaceutical Chemistry, University of Technology Braunschweig, 38106, Braunschweig, Germany
- Knut Baumann
  - Institute of Medicinal and Pharmaceutical Chemistry, University of Technology Braunschweig, 38106, Braunschweig, Germany
9
Arvidsson McShane S, Norinder U, Alvarsson J, Ahlberg E, Carlsson L, Spjuth O. CPSign: conformal prediction for cheminformatics modeling. J Cheminform 2024; 16:75. [PMID: 38943219 PMCID: PMC11214261 DOI: 10.1186/s13321-024-00870-9] [Received: 11/29/2023] [Accepted: 06/11/2024] [Indexed: 07/01/2024] Open
Abstract
Conformal prediction has seen many applications in pharmaceutical science, being able to calibrate outputs of machine learning models and producing valid prediction intervals. We here present the open source software CPSign that is a complete implementation of conformal prediction for cheminformatics modeling. CPSign implements inductive and transductive conformal prediction for classification and regression, and probabilistic prediction with the Venn-ABERS methodology. The main chemical representation is signatures but other types of descriptors are also supported. The main modeling methodology is support vector machines (SVMs), but additional modeling methods are supported via an extension mechanism, e.g. DeepLearning4J models. We also describe features for visualizing results from conformal models including calibration and efficiency plots, as well as features to publish predictive models as REST services. We compare CPSign against other common cheminformatics modeling approaches including random forest, and a directed message-passing neural network. The results show that CPSign produces robust predictive performance with comparative predictive efficiency, with superior runtime and lower hardware requirements compared to neural network based models. CPSign has been used in several studies and is in production-use in multiple organizations. The ability to work directly with chemical input files, perform descriptor calculation and modeling with SVM in the conformal prediction framework, with a single software package having a low footprint and fast execution time makes CPSign a convenient and yet flexible package for training, deploying, and predicting on chemical data. CPSign can be downloaded from GitHub at https://github.com/arosbio/cpsign. Scientific contribution: CPSign provides a single software package that allows users to perform data preprocessing, modeling and make predictions directly on chemical structures, using conformal and probabilistic prediction. Building and evaluating new models can be achieved at a high abstraction level, without sacrificing flexibility and predictive performance, as showcased by a method evaluation against contemporary modeling approaches, where CPSign performs on par with a state-of-the-art deep learning based model.
Affiliation(s)
- Staffan Arvidsson McShane
  - Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, Uppsala, 75124, Sweden
- Ulf Norinder
  - Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, Uppsala, 75124, Sweden
  - Department of Computer and Systems Sciences, Stockholm University, Stockholm, 10587, Sweden
  - MTM Research Centre, School of Science and Technology, Örebro University, Örebro, 70182, Sweden
- Jonathan Alvarsson
  - Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, Uppsala, 75124, Sweden
- Ernst Ahlberg
  - Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, Uppsala, 75124, Sweden
  - Department of Computer Science, Royal Holloway University of London, Egham, TW20 0EX, UK
- Lars Carlsson
  - Department of Computer Science, Royal Holloway University of London, Egham, TW20 0EX, UK
  - Department of Computing, Jönköping University, Jönköping, 55111, Sweden
- Ola Spjuth
  - Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, Uppsala, 75124, Sweden
10
Lenhof K, Eckhart L, Rolli LM, Volkamer A, Lenhof HP. Reliable anti-cancer drug sensitivity prediction and prioritization. Sci Rep 2024; 14:12303. [PMID: 38811639 PMCID: PMC11137046 DOI: 10.1038/s41598-024-62956-6] [Received: 11/28/2023] [Accepted: 05/23/2024] [Indexed: 05/31/2024] Open
Abstract
The application of machine learning (ML) to solve real-world problems bears not only great potential but also high risk. One fundamental challenge in risk mitigation is to ensure the reliability of the ML predictions, i.e., the model error should be minimized, and the prediction uncertainty should be estimated. Especially for medical applications, the importance of reliable predictions cannot be overstated. Here, we address this challenge for anti-cancer drug sensitivity prediction and prioritization. To this end, we present a novel drug sensitivity prediction and prioritization approach guaranteeing user-specified certainty levels. The developed conformal prediction approach is applicable to classification, regression, and simultaneous regression and classification. Additionally, we propose a novel drug sensitivity measure that is based on clinically relevant drug concentrations and enables a straightforward prioritization of drugs for a given cancer sample.
Affiliation(s)
- Kerstin Lenhof
  - Center for Bioinformatics, Chair for Bioinformatics, Saarland Informatics Campus (E2.1) Saarland University, Campus, 66123, Saarbrücken, Saarland, Germany
- Lea Eckhart
  - Center for Bioinformatics, Chair for Bioinformatics, Saarland Informatics Campus (E2.1) Saarland University, Campus, 66123, Saarbrücken, Saarland, Germany
- Lisa-Marie Rolli
  - Center for Bioinformatics, Chair for Bioinformatics, Saarland Informatics Campus (E2.1) Saarland University, Campus, 66123, Saarbrücken, Saarland, Germany
- Andrea Volkamer
  - Center for Bioinformatics, Chair for Data Driven Drug Design, Saarland Informatics Campus (E2.1) Saarland University, Campus, 66123, Saarbrücken, Saarland, Germany
- Hans-Peter Lenhof
  - Center for Bioinformatics, Chair for Bioinformatics, Saarland Informatics Campus (E2.1) Saarland University, Campus, 66123, Saarbrücken, Saarland, Germany
11
Lambert B, Forbes F, Doyle S, Dehaene H, Dojat M. Trustworthy clinical AI solutions: A unified review of uncertainty quantification in Deep Learning models for medical image analysis. Artif Intell Med 2024; 150:102830. [PMID: 38553168 DOI: 10.1016/j.artmed.2024.102830] [Received: 06/21/2023] [Revised: 02/28/2024] [Accepted: 03/01/2024] [Indexed: 04/02/2024]
Abstract
The full acceptance of Deep Learning (DL) models in the clinical field is rather low with respect to the quantity of high-performing solutions reported in the literature. End users are particularly reluctant to rely on the opaque predictions of DL models. Uncertainty quantification methods have been proposed in the literature as a potential solution, to reduce the black-box effect of DL models and increase the interpretability and the acceptability of the result by the final user. In this review, we propose an overview of the existing methods to quantify uncertainty associated with DL predictions. We focus on applications to medical image analysis, which present specific challenges due to the high dimensionality of images and their variable quality, as well as constraints associated with real-world clinical routine. Moreover, we discuss the concept of structural uncertainty, a corpus of methods to facilitate the alignment of segmentation uncertainty estimates with clinical attention. We then discuss the evaluation protocols to validate the relevance of uncertainty estimates. Finally, we highlight the open challenges for uncertainty quantification in the medical field.
Affiliation(s)
- Benjamin Lambert
  - Univ. Grenoble Alpes, Inserm, U1216, Grenoble Institut des Neurosciences, Grenoble, 38000, France; Pixyl Research and Development Laboratory, Grenoble, 38000, France
- Florence Forbes
  - Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, LJK, Grenoble, 38000, France
- Senan Doyle
  - Pixyl Research and Development Laboratory, Grenoble, 38000, France
- Harmonie Dehaene
  - Pixyl Research and Development Laboratory, Grenoble, 38000, France
- Michel Dojat
  - Univ. Grenoble Alpes, Inserm, U1216, Grenoble Institut des Neurosciences, Grenoble, 38000, France
12
Yang M, Chen H, Hu W, Mischi M, Shan C, Li J, Long X, Liu C. Development and Validation of an Interpretable Conformal Predictor to Predict Sepsis Mortality Risk: Retrospective Cohort Study. J Med Internet Res 2024; 26:e50369. [PMID: 38498038 PMCID: PMC10985608 DOI: 10.2196/50369] [Received: 06/29/2023] [Revised: 10/16/2023] [Accepted: 01/24/2024] [Indexed: 03/19/2024] Open
Abstract
BACKGROUND Early and reliable identification of patients with sepsis who are at high risk of mortality is important to improve clinical outcomes. However, 3 major barriers to artificial intelligence (AI) models, including the lack of interpretability, the difficulty in generalizability, and the risk of automation bias, hinder the widespread adoption of AI models for use in clinical practice.
OBJECTIVE This study aimed to develop and validate (internally and externally) a conformal predictor of sepsis mortality risk in patients who are critically ill, leveraging AI-assisted prediction modeling. The proposed approach enables explaining the model output and assessing its confidence level.
METHODS We retrospectively extracted data on adult patients with sepsis from a database collected in a teaching hospital at Beth Israel Deaconess Medical Center for model training and internal validation. A large multicenter critical care database from the Philips eICU Research Institute was used for external validation. A total of 103 clinical features were extracted from the first day after admission. We developed an AI model using gradient-boosting machines to predict the mortality risk of sepsis and used Mondrian conformal prediction to estimate the prediction uncertainty. The Shapley additive explanation method was used to explain the model.
RESULTS A total of 16,746 (80%) patients from Beth Israel Deaconess Medical Center were used to train the model. When tested on the internal validation population of 4187 (20%) patients, the model achieved an area under the receiver operating characteristic curve of 0.858 (95% CI 0.845-0.871), which was reduced to 0.800 (95% CI 0.789-0.811) when externally validated on 10,362 patients from the Philips eICU database. At a specified confidence level of 90% for the internal validation cohort, the percentage of error predictions (n=438) out of all predictions (n=4187) was 10.5%, with 1229 (29.4%) predictions requiring clinician review. In contrast, the AI model without conformal prediction made 1449 (34.6%) errors. When externally validated, more predictions (n=4004, 38.6%) were flagged for clinician review due to interdatabase heterogeneity. Nevertheless, the model still produced significantly lower error rates compared to the point predictions by AI (n=1221, 11.8% vs n=4540, 43.8%). The most important predictors identified in this predictive model were Acute Physiology Score III, age, urine output, vasopressors, and pulmonary infection. Clinically relevant risk factors contributing to a single patient were also examined to show how the risk arose.
CONCLUSIONS By combining model explanation and conformal prediction, AI-based systems can be better translated into medical practice for clinical decision-making.
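As a rough illustration of the Mondrian (class-conditional) conformal prediction named in this abstract, the sketch below calibrates nonconformity scores separately per class and returns a prediction set for a new case. It is not the authors' implementation: the nonconformity function, the toy probabilities, and the names `calibrate` and `predict_set` are assumptions chosen for illustration. A set containing both labels (or none) would correspond to a prediction flagged for review.

```python
# Minimal Mondrian inductive conformal classifier (binary), standard library only.
# Assumption: the underlying model outputs a probability for the positive class.

def nonconformity(prob_pos, label):
    """High score when the model assigns low probability to `label`."""
    return 1.0 - prob_pos if label == 1 else prob_pos

def calibrate(cal_probs, cal_labels):
    """Collect calibration scores separately for each true class."""
    scores = {0: [], 1: []}
    for p, y in zip(cal_probs, cal_labels):
        scores[y].append(nonconformity(p, y))
    return scores

def p_value(class_scores, score):
    """Fraction of same-class calibration scores at least as large as `score`."""
    at_least = sum(1 for s in class_scores if s >= score)
    return (at_least + 1) / (len(class_scores) + 1)

def predict_set(scores, prob_pos, significance=0.1):
    """Labels whose class-conditional p-value exceeds the significance level."""
    return {y for y in (0, 1)
            if p_value(scores[y], nonconformity(prob_pos, y)) > significance}

# Hypothetical calibration data: model probabilities and true outcomes.
cal_probs = [0.05, 0.10, 0.20, 0.30, 0.45, 0.55, 0.70, 0.80, 0.90, 0.95]
cal_labels = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
scores = calibrate(cal_probs, cal_labels)

confident = predict_set(scores, 0.97, significance=0.2)  # single label
uncertain = predict_set(scores, 0.50, significance=0.2)  # both labels
print(confident, uncertain)
```

Calibrating per class is what distinguishes the Mondrian variant: the error rate is controlled within each class rather than only on average, which matters when one outcome is much rarer than the other.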
Affiliation(s)
- Meicheng Yang
  - State Key Laboratory of Digital Medical Engineering, School of Instrument Science and Engineering, Southeast University, Nanjing, China
- Hui Chen
  - Department of Critical Care Medicine, Jiangsu Provincial Key Laboratory of Critical Care Medicine, Zhongda Hospital, Southeast University, Nanjing, China
- Wenhan Hu
  - Department of Critical Care Medicine, Jiangsu Provincial Key Laboratory of Critical Care Medicine, Zhongda Hospital, Southeast University, Nanjing, China
- Massimo Mischi
  - Department of Electrical Engineering, Eindhoven University of Technology, Eindhoven, Netherlands
- Caifeng Shan
  - College of Electrical Engineering and Automation, Shandong University of Science and Technology, Qingdao, China
  - School of Intelligence Science and Technology, Nanjing University, Nanjing, China
- Jianqing Li
  - State Key Laboratory of Digital Medical Engineering, School of Instrument Science and Engineering, Southeast University, Nanjing, China
  - School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, China
- Xi Long
  - Department of Electrical Engineering, Eindhoven University of Technology, Eindhoven, Netherlands
- Chengyu Liu
  - State Key Laboratory of Digital Medical Engineering, School of Instrument Science and Engineering, Southeast University, Nanjing, China
|
13
|
Kaneko H. Evaluation and Optimization Methods for Applicability Domain Methods and Their Hyperparameters, Considering the Prediction Performance of Machine Learning Models. ACS Omega 2024; 9:11453-11458. [PMID: 38496944 PMCID: PMC10938389 DOI: 10.1021/acsomega.3c08036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2023] [Revised: 01/19/2024] [Accepted: 02/12/2024] [Indexed: 03/19/2024]
Abstract
In molecular, material, and process design and control, the applicability domain (AD) of a mathematical model y = f(x) between properties or activities y and features x is constructed. As there are multiple AD methods, each with its own set of hyperparameters, it is necessary to select an appropriate AD method and hyperparameters for each data set and mathematical model. However, there is no established method for optimizing the AD model. This study proposes a method for evaluating and optimizing the AD model for each data set and mathematical model. Using the predictions of double cross-validation with all samples, the relationship between coverage and root-mean-squared error (RMSE) was calculated for all combinations of AD methods and their hyperparameters, and the area under the coverage and RMSE curve (AUCR) was calculated. The AD model with the lowest AUCR value was selected as the optimal fit for the mathematical model. The proposed method was validated using eight data sets, including molecules, materials, and spectra, demonstrating that the proposed method could generate optimal AD models for all data sets. The Python code for the proposed method is available at https://github.com/hkaneko1985/dcekit.
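The coverage-RMSE relationship and its area can be sketched as follows. This is one plausible reading of the abstract, not the dcekit implementation: it assumes a larger AD score means a sample lies further inside the domain, adds samples in order of decreasing AD score, and integrates RMSE over coverage with the trapezoidal rule. The function names and toy data are hypothetical.

```python
import math

def coverage_rmse_curve(ad_scores, y_true, y_pred):
    """RMSE over the k samples with the highest AD score, for k = 1..n."""
    order = sorted(range(len(ad_scores)), key=lambda i: ad_scores[i], reverse=True)
    curve, squared_error_sum = [], 0.0
    for k, i in enumerate(order, start=1):
        squared_error_sum += (y_true[i] - y_pred[i]) ** 2
        curve.append((k / len(order), math.sqrt(squared_error_sum / k)))
    return curve

def aucr(curve):
    """Trapezoidal area under the (coverage, RMSE) curve; lower is better."""
    return sum((c1 - c0) * (r0 + r1) / 2
               for (c0, r0), (c1, r1) in zip(curve, curve[1:]))

# Toy cross-validated predictions: only the last sample is badly predicted.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 6.0]

# An AD model that ranks the bad sample last, and one that ranks it first.
aucr_good = aucr(coverage_rmse_curve([0.9, 0.8, 0.7, 0.1], y_true, y_pred))
aucr_bad = aucr(coverage_rmse_curve([0.1, 0.8, 0.7, 0.9], y_true, y_pred))
print(aucr_good, aucr_bad)
```

An AD method that pushes poorly predicted samples to the low-score end keeps RMSE small over most of the coverage range, so it gets the smaller area and would be the one selected.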
Affiliation(s)
- Hiromasa Kaneko
  - Department of Applied Chemistry, School of Science and Technology, Meiji University, 1-1-1 Higashi-Mita, Tama-ku, Kawasaki, Kanagawa 214-8571, Japan
|
14
|
Sun ED, Ma R, Navarro Negredo P, Brunet A, Zou J. TISSUE: uncertainty-calibrated prediction of single-cell spatial transcriptomics improves downstream analyses. Nat Methods 2024; 21:444-454. [PMID: 38347138 DOI: 10.1038/s41592-024-02184-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2023] [Accepted: 01/12/2024] [Indexed: 02/27/2024]
Abstract
Whole-transcriptome spatial profiling of genes at single-cell resolution remains a challenge. To address this limitation, spatial gene expression prediction methods have been developed to infer the spatial expression of unmeasured transcripts, but the quality of these predictions can vary greatly. Here we present Transcript Imputation with Spatial Single-cell Uncertainty Estimation (TISSUE) as a general framework for estimating uncertainty for spatial gene expression predictions and providing uncertainty-aware methods for downstream inference. Leveraging conformal inference, TISSUE provides well-calibrated prediction intervals for predicted expression values across 11 benchmark datasets. Moreover, it consistently reduces the false discovery rate for differential gene expression analysis, improves clustering and visualization of predicted spatial transcriptomics and improves the performance of supervised learning models trained on predicted gene expression profiles. Applying TISSUE to a MERFISH spatial transcriptomics dataset of the adult mouse subventricular zone, we identified subtypes within the neural stem cell lineage and developed subtype-specific regional classifiers.
Affiliation(s)
- Eric D Sun
  - Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Rong Ma
  - Department of Statistics, Stanford University, Stanford, CA, USA
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Anne Brunet
  - Department of Genetics, Stanford University, Stanford, CA, USA
  - Wu Tsai Neurosciences Institute, Stanford University, Stanford, CA, USA
  - Glenn Center for the Biology of Aging, Stanford University, Stanford, CA, USA
- James Zou
  - Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
|
15
|
Kopańska K, Rodríguez-Belenguer P, Llopis-Lorente J, Trenor B, Saiz J, Pastor M. Uncertainty assessment of proarrhythmia predictions derived from multi-level in silico models. Arch Toxicol 2023; 97:2721-2740. [PMID: 37528229 PMCID: PMC10474996 DOI: 10.1007/s00204-023-03557-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/28/2023] [Accepted: 07/12/2023] [Indexed: 08/03/2023]
Abstract
In silico methods can be used for an early assessment of arrhythmogenic properties of drug candidates. However, their use for decision-making is conditioned by the possibility to estimate the predictions' uncertainty. This work describes our efforts to develop uncertainty quantification methods for the predictions produced by multi-level proarrhythmia models. In silico models used in this field usually start with experimental or predicted IC50 values that describe drug-induced ion channel blockade. Using such inputs, an electrophysiological model computes how the ion channel inhibition, exerted by a drug in a certain concentration, translates to an altered shape and duration of the action potential in cardiac cells, which can be represented as arrhythmogenic risk biomarkers such as the APD90. Using this framework, we identify the main sources of aleatory and epistemic uncertainties and propose a method based on probabilistic simulations that replaces single-point estimates predicted using multiple input values, including the IC50s and the electrophysiological parameters, by distributions of values. Two selected variability types associated with these inputs are then propagated through the multi-level model to estimate their impact on the uncertainty levels in the output, expressed by means of intervals. The proposed approach yields single predictions of arrhythmogenic risk biomarkers together with value intervals, providing a more comprehensive and realistic description of drug effects on a human population. The methodology was tested by predicting arrhythmogenic biomarkers on a series of twelve well-characterised marketed drugs, belonging to different arrhythmogenic risk classes.
Affiliation(s)
- Karolina Kopańska
  - Research Programme on Biomedical Informatics (GRIB), Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Hospital del Mar Research Institute, Barcelona, Spain
- Pablo Rodríguez-Belenguer
  - Research Programme on Biomedical Informatics (GRIB), Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Hospital del Mar Research Institute, Barcelona, Spain
  - Department of Pharmacy and Pharmaceutical Technology and Parasitology, Universitat de València, Valencia, Spain
- Jordi Llopis-Lorente
  - Centro de Investigación e Innovación en Bioingeniería (Ci2B), Universitat Politècnica de València, Valencia, Spain
- Beatriz Trenor
  - Centro de Investigación e Innovación en Bioingeniería (Ci2B), Universitat Politècnica de València, Valencia, Spain
- Javier Saiz
  - Centro de Investigación e Innovación en Bioingeniería (Ci2B), Universitat Politècnica de València, Valencia, Spain
- Manuel Pastor
  - Research Programme on Biomedical Informatics (GRIB), Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Hospital del Mar Research Institute, Barcelona, Spain
|
16
|
Oršolić D, Šmuc T. Dynamic applicability domain (dAD): compound-target binding affinity estimates with local conformal prediction. Bioinformatics 2023; 39:btad465. [PMID: 37594752 PMCID: PMC10457664 DOI: 10.1093/bioinformatics/btad465] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/30/2022] [Revised: 04/26/2023] [Accepted: 08/17/2023] [Indexed: 08/19/2023]
Abstract
MOTIVATION Increasing efforts are being made in the field of machine learning to advance the learning of robust and accurate models from experimentally measured data and enable more efficient drug discovery processes. The prediction of binding affinity is one of the most frequent tasks of compound bioactivity modelling. Learned models for binding affinity prediction are assessed by their average performance on unseen samples, but point predictions are typically not provided with a rigorous confidence assessment. Approaches such as the conformal predictor framework equip conventional models with a more rigorous assessment of confidence for individual point predictions. In this article, we extend the inductive conformal prediction framework to interaction data, in particular the compound-target binding affinity prediction task. The new framework is based on dynamically defined calibration sets that are specific to each testing pair and provides prediction assessment in the context of calibration pairs from its compound-target neighbourhood, enabling improved estimates based on the local properties of the prediction model.
RESULTS The effectiveness of the approach is benchmarked on several publicly available datasets and tested in realistic use-case scenarios with increasing levels of difficulty on a complex compound-target binding affinity space. We demonstrate that in such scenarios, the novel approach, which combines the applicability domain paradigm with the conformal prediction framework, produces superior confidence assessment with valid and more informative prediction regions compared to other 'state-of-the-art' conformal prediction approaches.
AVAILABILITY AND IMPLEMENTATION The dataset and the code are available on GitHub (https://github.com/mlkr-rbi/dAD).
Affiliation(s)
- Davor Oršolić
  - Division of Electronics, Ruđer Bošković Institute, Bijenička cesta 54, Zagreb 10000, Croatia
- Tomislav Šmuc
  - Division of Electronics, Ruđer Bošković Institute, Bijenička cesta 54, Zagreb 10000, Croatia
|
17
|
Herman S, Arvidsson McShane S, Zjukovskaja C, Khoonsari PE, Svenningsson A, Burman J, Spjuth O, Kultima K. Disease phenotype prediction in multiple sclerosis. iScience 2023; 26:106906. [PMID: 37332601 PMCID: PMC10275960 DOI: 10.1016/j.isci.2023.106906] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2022] [Revised: 03/09/2023] [Accepted: 05/12/2023] [Indexed: 06/20/2023]
Abstract
Progressive multiple sclerosis (PMS) is currently diagnosed retrospectively. Here, we work toward a set of biomarkers that could assist in early diagnosis of PMS. A selection of cerebrospinal fluid metabolites (n = 15) was shown to differentiate between PMS and its preceding phenotype in an independent cohort (AUC = 0.93). Complementing the classifier with conformal prediction showed that highly confident predictions could be made, and that three out of eight patients developing PMS within three years of sample collection were predicted as PMS at that time point. Finally, this methodology was applied to PMS patients as part of a clinical trial for intrathecal treatment with rituximab. The methodology showed that 68% of the patients decreased their similarity to the PMS phenotype one year after treatment. In conclusion, the inclusion of confidence predictors provides more information than traditional machine learning, and this information is relevant for disease monitoring.
Affiliation(s)
- Stephanie Herman
  - Department of Medical Sciences, Clinical Chemistry, Uppsala University, Uppsala, Sweden
  - Department of Pharmaceutical Biosciences, Uppsala University, Uppsala, Sweden
- Payam Emami Khoonsari
  - Department of Medical Sciences, Clinical Chemistry, Uppsala University, Uppsala, Sweden
  - Department of Biochemistry and Biophysics, National Bioinformatics Infrastructure Sweden, Science for Life Laboratory, Stockholm University, Box 1031, 17121 Solna, Sweden
- Anders Svenningsson
  - Department of Clinical Sciences, Danderyd Hospital, Karolinska Institutet, Stockholm, Sweden
- Joachim Burman
  - Department of Neuroscience, Neurology, Uppsala University, Uppsala, Sweden
- Ola Spjuth
  - Department of Pharmaceutical Biosciences, Uppsala University, Uppsala, Sweden
- Kim Kultima
  - Department of Medical Sciences, Clinical Chemistry, Uppsala University, Uppsala, Sweden
|
18
|
March-Vila E, Ferretti G, Terricabras E, Ardao I, Brea JM, Varela MJ, Arana Á, Rubiolo JA, Sanz F, Loza MI, Sánchez L, Alonso H, Pastor M. A continuous in silico learning strategy to identify safety liabilities in compounds used in the leather and textile industry. Arch Toxicol 2023; 97:1091-1111. [PMID: 36781432 PMCID: PMC10025185 DOI: 10.1007/s00204-023-03459-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2022] [Accepted: 02/02/2023] [Indexed: 02/15/2023]
Abstract
There is a widely recognized need to reduce human activity's impact on the environment. Many industries of the leather and textile sector (LTI), being aware of producing a significant amount of residues (Keßler et al. 2021; Liu et al. 2021), are adopting measures to reduce the impact of their processes on the environment, starting with a more comprehensive characterization of the chemical risk associated with the substances commonly used in LTI. The present work contributes to these efforts by compiling and toxicologically annotating the substances used in LTI, supporting a continuous learning strategy for characterizing their chemical safety. This strategy combines data collection from public sources, experimental methods and in silico predictions for characterizing four different endpoints: CMR, ED, PBT, and vPvB. We present the results of a prospective validation exercise in which we confirm that in silico methods can produce reasonably good hazard estimations and fill knowledge gaps in the LTI chemical space. The proposed protocol can speed the process and optimize the use of resources including the lives of experimental animals, contributing to identifying potentially harmful substances and their possible replacement by safer alternatives, thus reducing the environmental footprint and impact on human health.
Affiliation(s)
- Eric March-Vila
  - Department of Medicine and Life Sciences, Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Medical Research Institute (IMIM), Universitat Pompeu Fabra, Barcelona, Spain
- Giacomo Ferretti
  - Department of Medicine and Life Sciences, Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Medical Research Institute (IMIM), Universitat Pompeu Fabra, Barcelona, Spain
- Emma Terricabras
  - Department of Medicine and Life Sciences, Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Medical Research Institute (IMIM), Universitat Pompeu Fabra, Barcelona, Spain
- Inés Ardao
  - Department of Pharmacology, Pharmacy and Pharmaceutical Technology, Innopharma Drug Screening and Pharmacogenomics Platform. BioFarma Research Group. Center for Research in Molecular Medicine and Chronic Diseases (CiMUS), University of Santiago de Compostela, Santiago de Compostela, Spain
- José Manuel Brea
  - Department of Pharmacology, Pharmacy and Pharmaceutical Technology, Innopharma Drug Screening and Pharmacogenomics Platform. BioFarma Research Group. Center for Research in Molecular Medicine and Chronic Diseases (CiMUS), University of Santiago de Compostela, Santiago de Compostela, Spain
- María José Varela
  - Department of Pharmacology, Pharmacy and Pharmaceutical Technology, Innopharma Drug Screening and Pharmacogenomics Platform. BioFarma Research Group. Center for Research in Molecular Medicine and Chronic Diseases (CiMUS), University of Santiago de Compostela, Santiago de Compostela, Spain
- Álvaro Arana
  - Department of Zoology, Genetics and Physical Anthropology, Universidad de Santiago de Compostela, Campus de Lugo, 27002, Lugo, Spain
- Juan Andrés Rubiolo
  - Department of Zoology, Genetics and Physical Anthropology, Universidad de Santiago de Compostela, Campus de Lugo, 27002, Lugo, Spain
- Ferran Sanz
  - Department of Medicine and Life Sciences, Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Medical Research Institute (IMIM), Universitat Pompeu Fabra, Barcelona, Spain
- María Isabel Loza
  - Department of Pharmacology, Pharmacy and Pharmaceutical Technology, Innopharma Drug Screening and Pharmacogenomics Platform. BioFarma Research Group. Center for Research in Molecular Medicine and Chronic Diseases (CiMUS), University of Santiago de Compostela, Santiago de Compostela, Spain
- Laura Sánchez
  - Department of Zoology, Genetics and Physical Anthropology, Universidad de Santiago de Compostela, Campus de Lugo, 27002, Lugo, Spain
  - Preclinical Animal Models Group, Health Research Institute of Santiago de Compostela (IDIS), 15782, Santiago de Compostela, Spain
- Héctor Alonso
  - Department of Sustainability, INDITEX, Av. da Deputación, 15412, Arteixo, Spain
- Manuel Pastor
  - Department of Medicine and Life Sciences, Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Medical Research Institute (IMIM), Universitat Pompeu Fabra, Barcelona, Spain
|
19
|
Duran-Frigola M, Cigler M, Winter GE. Advancing Targeted Protein Degradation via Multiomics Profiling and Artificial Intelligence. J Am Chem Soc 2023; 145:2711-2732. [PMID: 36706315 PMCID: PMC9912273 DOI: 10.1021/jacs.2c11098] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 10/19/2022] [Indexed: 01/28/2023]
Abstract
Only around 20% of the human proteome is considered to be druggable with small-molecule antagonists. This leaves some of the most compelling therapeutic targets outside the reach of ligand discovery. The concept of targeted protein degradation (TPD) promises to overcome some of these limitations. In brief, TPD is dependent on small molecules that induce the proximity between a protein of interest (POI) and an E3 ubiquitin ligase, causing ubiquitination and degradation of the POI. In this perspective, we want to reflect on current challenges in the field, and discuss how advances in multiomics profiling, artificial intelligence, and machine learning (AI/ML) will be vital in overcoming them. The presented roadmap is discussed in the context of small-molecule degraders but is equally applicable for other emerging proximity-inducing modalities.
Affiliation(s)
- Miquel Duran-Frigola
  - CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria
  - Ersilia Open Source Initiative, 28 Belgrave Road, CB1 3DE, Cambridge, United Kingdom
- Marko Cigler
  - CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria
- Georg E. Winter
  - CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria
|
20
|
Fagerholm U, Hellberg S, Alvarsson J, Spjuth O. In Silico Prediction of Human Clinical Pharmacokinetics with ANDROMEDA by Prosilico: Predictions for an Established Benchmarking Data Set, a Modern Small Drug Data Set, and a Comparison with Laboratory Methods. Altern Lab Anim 2023; 51:39-54. [PMID: 36572567 DOI: 10.1177/02611929221148447] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/28/2022]
Abstract
There is an ongoing aim to replace animal and in vitro laboratory models with in silico methods. Such replacement requires the successful validation and comparably good performance of the alternative methods. We have developed an in silico prediction system for human clinical pharmacokinetics, based on machine learning, conformal prediction and a new physiologically-based pharmacokinetic model, i.e. ANDROMEDA. The objectives of this study were: a) to evaluate how well ANDROMEDA predicts the human clinical pharmacokinetics of a previously proposed benchmarking data set comprising 24 physicochemically diverse drugs and 28 small drug molecules new to the market in 2021; b) to compare its predictive performance with that of laboratory methods; and c) to investigate and describe the pharmacokinetic characteristics of the modern drugs. Median and maximum prediction errors for the selected major parameters were ca 1.2 to 2.5-fold and 16-fold for both data sets, respectively. Prediction accuracy was on par with, or better than, the best laboratory-based prediction methods (superior performance for a vast majority of the comparisons), and the prediction range was considerably broader. The modern drugs have higher average molecular weight than those in the benchmarking set from 15 years earlier (ca 200 g/mol higher), and were predicted to (generally) have relatively complex pharmacokinetics, including permeability and dissolution limitations and significant renal, biliary and/or gut-wall elimination. In conclusion, the results were overall better than those obtained with laboratory methods, and thus serve to further validate the ANDROMEDA in silico system for the prediction of human clinical pharmacokinetics of modern and physicochemically diverse drugs.
Affiliation(s)
- Jonathan Alvarsson
  - Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- Ola Spjuth
  - Prosilico AB, Huddinge, Sweden
  - Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, Uppsala, Sweden
|
21
|
Estimating diagnostic uncertainty in artificial intelligence assisted pathology using conformal prediction. Nat Commun 2022; 13:7761. [PMID: 36522311 PMCID: PMC9755280 DOI: 10.1038/s41467-022-34945-8] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 05/02/2022] [Accepted: 11/08/2022] [Indexed: 12/16/2022]
Abstract
Unreliable predictions can occur when an artificial intelligence (AI) system is presented with data it has not been exposed to during training. We demonstrate the use of conformal prediction to detect unreliable predictions, using histopathological diagnosis and grading of prostate biopsies as an example. We digitized 7788 prostate biopsies from 1192 men in the STHLM3 diagnostic study, used for training, and 3059 biopsies from 676 men used for testing. With conformal prediction, 1 in 794 (0.1%) predictions is incorrect for cancer diagnosis (compared to 14 errors [2%] without conformal prediction) while 175 (22%) of the predictions are flagged as unreliable when the AI-system is presented with new data from the same lab and scanner that it was trained on. Conformal prediction could with small samples (N = 49 for external scanner, N = 10 for external lab and scanner, and N = 12 for external lab, scanner and pathology assessment) detect systematic differences in external data leading to worse predictive performance. The AI-system with conformal prediction commits 3 (2%) errors for cancer detection in cases of atypical prostate tissue compared to 44 (25%) without conformal prediction, while the system flags 143 (80%) unreliable predictions. We conclude that conformal prediction can increase patient safety of AI-systems.
|
22
|
Sreenivasan AP, Harrison PJ, Schaal W, Matuszewski DJ, Kultima K, Spjuth O. Predicting protein network topology clusters from chemical structure using deep learning. J Cheminform 2022; 14:47. [PMID: 35841114 PMCID: PMC9284831 DOI: 10.1186/s13321-022-00622-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/11/2021] [Accepted: 06/06/2022] [Indexed: 11/10/2022]
Abstract
Comparing chemical structures to infer protein targets and functions is a common approach, but basing comparisons on chemical similarity alone can be misleading. Here we present a methodology for predicting target protein clusters using deep neural networks. The model is trained on clusters of compounds based on similarities calculated from combined compound-protein and protein-protein interaction data using a network topology approach. We compare several deep learning architectures including both convolutional and recurrent neural networks. The best performing method, the recurrent neural network architecture MolPMoFiT, achieved an F1 score approaching 0.9 on a held-out test set of 8907 compounds. In addition, in-depth analysis on a set of eleven well-studied chemical compounds with known functions showed that predictions were justifiable for all but one of the chemicals. Four of the compounds, similar in their molecular structure but with dissimilarities in their function, revealed advantages of our method compared to using chemical similarity.
Affiliation(s)
- Akshai P Sreenivasan
  - Department of Pharmaceutical Biosciences, Uppsala University, Box 591, 75124, Uppsala, Sweden
  - Department of Medical Sciences, Uppsala University, Uppsala, Sweden
- Philip J Harrison
  - Department of Pharmaceutical Biosciences, Uppsala University, Box 591, 75124, Uppsala, Sweden
- Wesley Schaal
  - Department of Pharmaceutical Biosciences, Uppsala University, Box 591, 75124, Uppsala, Sweden
- Damian J Matuszewski
  - Centre for Image Analysis, Department of Information Technology, Uppsala University, Uppsala, Sweden
- Kim Kultima
  - Department of Medical Sciences, Uppsala University, Uppsala, Sweden
- Ola Spjuth
  - Department of Pharmaceutical Biosciences, Uppsala University, Box 591, 75124, Uppsala, Sweden
|
23
|
Morger A, Garcia de Lomana M, Norinder U, Svensson F, Kirchmair J, Mathea M, Volkamer A. Studying and mitigating the effects of data drifts on ML model performance at the example of chemical toxicity data. Sci Rep 2022; 12:7244. [PMID: 35508546 PMCID: PMC9068909 DOI: 10.1038/s41598-022-09309-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 09/28/2021] [Accepted: 03/17/2022] [Indexed: 11/09/2022]
Abstract
Machine learning models are widely applied to predict molecular properties or the biological activity of small molecules on a specific protein. Models can be integrated in a conformal prediction (CP) framework which adds a calibration step to estimate the confidence of the predictions. CP models present the advantage of ensuring a predefined error rate under the assumption that test and calibration set are exchangeable. In cases where the test data have drifted away from the descriptor space of the training data, or where assay setups have changed, this assumption might not be fulfilled and the models are not guaranteed to be valid. In this study, the performance of internally valid CP models when applied to either newer time-split data or to external data was evaluated. In detail, temporal data drifts were analysed based on twelve datasets from the ChEMBL database. In addition, discrepancies between models trained on publicly-available data and applied to proprietary data for the liver toxicity and MNT in vivo endpoints were investigated. In most cases, a drastic decrease in the validity of the models was observed when applied to the time-split or external (holdout) test sets. To overcome the decrease in model validity, a strategy for updating the calibration set with data more similar to the holdout set was investigated. Updating the calibration set generally improved the validity, restoring it completely to its expected value in many cases. The restored validity is the first requisite for applying the CP models with confidence. However, the increased validity comes at the cost of a decrease in model efficiency, as more predictions are identified as inconclusive. This study presents a strategy to recalibrate CP models to mitigate the effects of data drifts. Updating the calibration sets without having to retrain the model has proven to be a useful approach to restore the validity of most models.
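The recalibration idea can be shown with a small sketch. The study concerns conformal classification of toxicity endpoints; the example below deliberately swaps in split-conformal regression, because coverage and interval width make the validity and efficiency trade-off easy to see. All numbers are invented and the function names are hypothetical. The underlying model is never retrained: only the calibration residuals change.

```python
import math

def conformal_half_width(abs_residuals, significance):
    """Split-conformal interval half-width from calibration residuals.

    Uses the ceil((n + 1) * (1 - significance))-th smallest residual,
    or infinity when the calibration set is too small for that rank.
    """
    n = len(abs_residuals)
    rank = math.ceil(round((n + 1) * (1 - significance), 9))
    if rank > n:
        return math.inf
    return sorted(abs_residuals)[rank - 1]

def coverage(abs_residuals, half_width):
    """Fraction of test residuals that fall inside the interval."""
    return sum(r <= half_width for r in abs_residuals) / len(abs_residuals)

# Invented absolute residuals of one fixed model on three data sets.
old_cal = [0.1 * k for k in range(1, 20)]           # original calibration set
drift_cal = [0.2 * k for k in range(1, 20)]         # calibration data resembling the new data
drift_test = [0.2 * k - 0.1 for k in range(1, 20)]  # drifted test set

significance = 0.2  # target: at least 80% coverage
half_old = conformal_half_width(old_cal, significance)
half_new = conformal_half_width(drift_cal, significance)

print(coverage(drift_test, half_old))  # well below 0.8: validity is lost under drift
print(coverage(drift_test, half_new))  # at least 0.8 after swapping the calibration set
print(half_old, half_new)              # the restored validity costs wider intervals
```

The wider interval after recalibration plays the same role as the larger share of inconclusive predictions reported in the abstract: validity is restored, and efficiency is what pays for it.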
Affiliation(s)
- Andrea Morger
  - In Silico Toxicology and Structural Bioinformatics, Institute of Physiology, Charité Universitätsmedizin Berlin, Berlin, 10117, Germany
- Marina Garcia de Lomana
  - BASF SE, 67056, Ludwigshafen, Germany
  - Division of Pharmaceutical Chemistry, Department of Pharmaceutical Sciences, University of Vienna, Vienna, 1090, Austria
- Ulf Norinder
  - Department of Pharmaceutical Biosciences, Uppsala University, Uppsala, 751 24, Sweden
  - Dept Computer and Systems Sciences, Stockholm University, Kista, 164 07, Sweden
  - MTM Research Centre, School of Science and Technology, 701 82, Örebro, Sweden
- Fredrik Svensson
  - Alzheimer's Research UK UCL Drug Discovery Institute, London, WC1E 6BT, UK
- Johannes Kirchmair
  - Division of Pharmaceutical Chemistry, Department of Pharmaceutical Sciences, University of Vienna, Vienna, 1090, Austria
- Andrea Volkamer
  - In Silico Toxicology and Structural Bioinformatics, Institute of Physiology, Charité Universitätsmedizin Berlin, Berlin, 10117, Germany
|
24
|
In silico predictions of the gastrointestinal uptake of macrocycles in man using conformal prediction methodology. J Pharm Sci 2022; 111:2614-2619. [DOI: 10.1016/j.xphs.2022.05.010] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/26/2022] [Revised: 05/16/2022] [Accepted: 05/16/2022] [Indexed: 11/17/2022]

25
Tajmouati S, EL Wahbi B, Dakkon M. Applying regression conformal prediction with nearest neighbors to time series data. Commun Stat Simul Comput 2022. [DOI: 10.1080/03610918.2022.2057538] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2022]
Affiliation(s)
- Samya Tajmouati
  - Department of Mathematics, Ibn Tofail University, Faculty of Sciences, Kénitra, Morocco
- Bouazza EL Wahbi
  - Department of Mathematics, Ibn Tofail University, Faculty of Sciences, Kénitra, Morocco
- Mohamed Dakkon
  - Department of Economics and Management, Abdelmalek Essaâdi University, FSJES, Tétouan, Morocco

26
Fagerholm U, Hellberg S, Alvarsson J, Spjuth O. In silico predictions of the human pharmacokinetics/toxicokinetics of 65 chemicals from various classes using conformal prediction methodology. Xenobiotica 2022; 52:113-118. [PMID: 35238270 DOI: 10.1080/00498254.2022.2049397] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/20/2023]
Abstract
Pharmacokinetic/toxicokinetic (PK/TK) information for chemicals in humans is generally lacking. Here we applied machine learning, conformal prediction and a new physiologically-based PK/TK model for prediction of the human PK/TK of 65 chemicals from different classes, including carcinogens, food constituents and preservatives, vitamins, sweeteners, dyes and colours, pesticides, alternative medicines, flame retardants, psychoactive drugs, dioxins, poisons, UV-absorbents, surfactants, solvents and cosmetics. About 80% of the main human PK/TK (fraction absorbed, oral bioavailability, half-life, unbound fraction in plasma, clearance, volume of distribution, fraction excreted) for the selected chemicals was missing in the literature. This information was now added (from in silico predictions). Median and mean prediction errors for these parameters were 1.3- to 2.7-fold and 1.4- to 4.8-fold, respectively. In total, 59 and 86% of predictions had errors <2- and <5-fold, respectively. Predicted and observed PK/TK for the chemicals were generally within the range for pharmaceutical drugs. The results validated the new integrated system for prediction of the human PK/TK for different chemicals and added important missing information. No general difference in PK/TK-characteristics was found between the selected chemicals and pharmaceutical drugs.
Affiliation(s)
- Jonathan Alvarsson
  - Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, Box 591, Uppsala, SE-751 24 Sweden
- Ola Spjuth
  - Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, Box 591, Uppsala, 75124 Sweden

27
Wang D, Wang P, Wang C, Wang P. Calibrating probabilistic predictions of quantile regression forests with conformal predictive systems. Pattern Recognit Lett 2022. [DOI: 10.1016/j.patrec.2022.02.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]

28
Klutzny S, Kornhuber M, Morger A, Schönfelder G, Volkamer A, Oelgeschläger M, Dunst S. Quantitative high-throughput phenotypic screening for environmental estrogens using the E-Morph Screening Assay in combination with in silico predictions. Environ Int 2022; 158:106947. [PMID: 34717173 DOI: 10.1016/j.envint.2021.106947] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 07/14/2021] [Revised: 10/14/2021] [Accepted: 10/18/2021] [Indexed: 06/13/2023]
Abstract
BACKGROUND Exposure to environmental chemicals that interfere with normal estrogen function can lead to adverse health effects, including cancer. High-throughput screening (HTS) approaches facilitate the efficient identification and characterization of such substances. OBJECTIVES We recently described the development of the E-Morph Assay, which measures changes at adherens junctions as a clinically-relevant phenotypic readout for estrogen receptor (ER) alpha signaling activity. Here, we describe its further development and application for automated robotic HTS. METHODS Using the advanced E-Morph Screening Assay, we screened a substance library comprising 430 toxicologically-relevant industrial chemicals, biocides, and plant protection products to identify novel substances with estrogenic activities. Based on the primary screening data and the publicly available ToxCast dataset, we performed an in silico similarity search to identify further substances with potential estrogenic activity for follow-up hit expansion screening, and built seven in silico ER models using the conformal prediction (CP) framework to evaluate the HTS results. RESULTS The primary and hit confirmation screens identified 27 'known' estrogenic substances with potencies correlating very well with the published ToxCast ER Agonist Score (r=+0.95). We additionally detected potential 'novel' estrogenic activities for 10 primary hit substances and for another nine out of 20 structurally similar substances from in silico predictions and follow-up hit expansion screening. The concordance of the E-Morph Screening Assay with the ToxCast ER reference data and the generated CP ER models was 71% and 73%, respectively, with a high predictivity for ER active substances of up to 87%, which is particularly important for regulatory purposes. DISCUSSION These data provide a proof-of-concept for the combination of in vitro HTS approaches with in silico methods (similarity search, CP models) for efficient analysis of large substance libraries in order to prioritize substances with potential estrogenic activity for subsequent testing against higher tier human endpoints.
Affiliation(s)
- Saskia Klutzny
  - Experimental Toxicology and ZEBET, German Federal Institute for Risk Assessment (BfR), German Centre for the Protection of Laboratory Animals (Bf3R), Berlin, Germany
- Marja Kornhuber
  - Experimental Toxicology and ZEBET, German Federal Institute for Risk Assessment (BfR), German Centre for the Protection of Laboratory Animals (Bf3R), Berlin, Germany; Freie Universität Berlin, Berlin, Germany
- Andrea Morger
  - In silico Toxicology and Structural Bioinformatics, Institute of Physiology, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Berlin, Germany
- Gilbert Schönfelder
  - Experimental Toxicology and ZEBET, German Federal Institute for Risk Assessment (BfR), German Centre for the Protection of Laboratory Animals (Bf3R), Berlin, Germany; Institute of Clinical Pharmacology and Toxicology, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Berlin, Germany
- Andrea Volkamer
  - In silico Toxicology and Structural Bioinformatics, Institute of Physiology, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Berlin, Germany
- Michael Oelgeschläger
  - Experimental Toxicology and ZEBET, German Federal Institute for Risk Assessment (BfR), German Centre for the Protection of Laboratory Animals (Bf3R), Berlin, Germany
- Sebastian Dunst
  - Experimental Toxicology and ZEBET, German Federal Institute for Risk Assessment (BfR), German Centre for the Protection of Laboratory Animals (Bf3R), Berlin, Germany

29
Miljković F, Rodríguez-Pérez R, Bajorath J. Impact of Artificial Intelligence on Compound Discovery, Design, and Synthesis. ACS Omega 2021; 6:33293-33299. [PMID: 34926881 PMCID: PMC8674916 DOI: 10.1021/acsomega.1c05512] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 10/04/2021] [Accepted: 11/18/2021] [Indexed: 05/17/2023]
Abstract
As in other areas, artificial intelligence (AI) is heavily promoted in different scientific fields, including chemistry. Although chemistry traditionally tends to be a conservative field and slower than others to adopt new concepts, AI is increasingly being investigated across chemical disciplines. In medicinal chemistry, supported by computer-aided drug design and cheminformatics, computational methods have long been employed to aid in the search for and optimization of active compounds. We are currently witnessing a multitude of AI-related publications in the medicinal-chemistry-relevant literature and anticipate that the numbers will further increase. Often, advances through AI promoted in such reports are difficult to reconcile or remain questionable, which hampers the acceptance of computational work in interdisciplinary environments. Herein we attempt to highlight selected investigations in which AI has shown promise to impact medicinal chemistry in areas such as compound design and synthesis.
Affiliation(s)
- Filip Miljković
  - Department of Life Science Informatics and Data Science, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Friedrich-Hirzebruch-Allee 6, D-53115 Bonn, Germany
  - Data Science and AI, Imaging and Data Analytics, Clinical Pharmacology & Safety Sciences, R&D, AstraZeneca, SE-431 83 Gothenburg, Sweden
- Raquel Rodríguez-Pérez
  - Department of Life Science Informatics and Data Science, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Friedrich-Hirzebruch-Allee 6, D-53115 Bonn, Germany
  - Novartis Institutes for Biomedical Research, Novartis Campus, CH-4002 Basel, Switzerland
- Jürgen Bajorath
  - Department of Life Science Informatics and Data Science, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Friedrich-Hirzebruch-Allee 6, D-53115 Bonn, Germany
  - Phone: 49-228-7369-100.

30
Fagerholm U, Hellberg S, Alvarsson J, Arvidsson McShane S, Spjuth O. In silico prediction of volume of distribution of drugs in man using conformal prediction performs on par with animal data-based models. Xenobiotica 2021; 51:1366-1371. [PMID: 34845977 DOI: 10.1080/00498254.2021.2011471] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 10/19/2022]
Abstract
Volume of distribution at steady state (Vss) is an important pharmacokinetic endpoint. In this study we apply machine learning and conformal prediction for human Vss prediction, and make a head-to-head comparison with rat-to-man scaling, allometric scaling and the Rodgers-Lukova method on combined in silico and in vitro data, using a test set of 105 compounds with experimentally observed Vss. The mean prediction error and % with <2-fold prediction error for our method were 2.4-fold and 64%, respectively. 69% of test compounds had an observed Vss within the prediction interval at a 70% confidence level. In comparison, 2.2-, 2.9- and 3.1-fold mean errors and 69, 64 and 61% of predictions with <2-fold error were reached with rat-to-man scaling, allometric scaling and the Rodgers-Lukova method, respectively. We conclude that our method has theoretically proven validity that was empirically confirmed, and shows predictive accuracy on par with animal models and superior to an alternative widely used in silico-based method. The option for the user to select the level of confidence in predictions offers better guidance on how to optimise Vss in drug discovery applications.
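The accuracy figures quoted in this abstract (mean fold error, share of predictions within 2-fold, and prediction-interval coverage) are simple to compute. The sketch below assumes that fold error means the larger of predicted/observed and observed/predicted, which the abstract does not spell out. The Vss values, intervals and function names are invented for illustration.

```python
from statistics import mean
from typing import List, Tuple


def fold_error(predicted: float, observed: float) -> float:
    """Symmetric fold error: 1.0 for a perfect prediction, larger otherwise."""
    return max(predicted / observed, observed / predicted)


def summarize(
    predicted: List[float],
    observed: List[float],
    intervals: List[Tuple[float, float]],
) -> Tuple[float, float, float]:
    """Return (mean fold error, fraction with <2-fold error, interval coverage)."""
    errors = [fold_error(p, o) for p, o in zip(predicted, observed)]
    within_2fold = sum(e < 2.0 for e in errors) / len(errors)
    covered = sum(lo <= o <= hi for o, (lo, hi) in zip(observed, intervals))
    return mean(errors), within_2fold, covered / len(observed)


# Hypothetical Vss values (L/kg) and prediction intervals for four compounds.
predicted = [1.0, 0.75, 6.0, 2.0]
observed = [2.0, 0.5, 1.5, 2.0]
intervals = [(0.5, 3.0), (0.3, 1.0), (2.0, 10.0), (1.0, 4.0)]

mean_fe, frac_2fold, coverage = summarize(predicted, observed, intervals)
print(mean_fe, frac_2fold, coverage)
```

For a valid conformal predictor, the coverage should on average be close to the chosen confidence level, which is the comparison the abstract makes between 69% observed coverage and the 70% confidence level.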
Affiliation(s)
- Jonathan Alvarsson
  - Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- Staffan Arvidsson McShane
  - Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- Ola Spjuth
  - Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, Uppsala, Sweden

31
Gandouz M, Holzmann H, Heider D. Machine learning with asymmetric abstention for biomedical decision-making. BMC Med Inform Decis Mak 2021; 21:294. [PMID: 34702225 PMCID: PMC8549182 DOI: 10.1186/s12911-021-01655-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/11/2021] [Accepted: 10/13/2021] [Indexed: 02/08/2023] Open
Abstract
Machine learning and artificial intelligence have entered biomedical decision-making for diagnostics, prognostics, or therapy recommendations. However, these methods need to be interpreted with care because of the severe consequences for patients. In contrast to human decision-making, computational models typically make a decision also with low confidence. Machine learning with abstention better reflects human decision-making by introducing a reject option for samples with low confidence. The abstention intervals are typically symmetric intervals around the decision boundary. In the current study, we use asymmetric abstention intervals, which we demonstrate to be better suited for biomedical data that is typically highly imbalanced. We evaluate symmetric and asymmetric abstention on three real-world biomedical datasets and show that both approaches can significantly improve classification performance. However, asymmetric abstention rejects as many or fewer samples compared to symmetric abstention and thus, should be used in imbalanced data.
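The difference between symmetric and asymmetric abstention can be shown with a small reject-option rule. This is only a sketch under assumptions: the abstract does not say how the abstention bounds are chosen, so the bounds, scores and the function name `decide` below are hypothetical fixed values rather than the authors' procedure.

```python
from typing import List, Optional


def decide(score: float, lower: float, upper: float) -> Optional[int]:
    """Classify a score in [0, 1]; return None (abstain) inside [lower, upper]."""
    if score < lower:
        return 0
    if score > upper:
        return 1
    return None


scores: List[float] = [0.10, 0.42, 0.55, 0.65, 0.90]

# Symmetric interval: equal width on both sides of the 0.5 decision boundary.
symmetric = [decide(s, 0.40, 0.60) for s in scores]

# Asymmetric interval: narrower below the boundary, wider above it.
asymmetric = [decide(s, 0.45, 0.70) for s in scores]

print(symmetric)
print(asymmetric)
```

Both rules abstain on two of the five samples here, but on different ones: the asymmetric rule commits to class 0 for the score 0.42 and withholds a decision for 0.65. Shifting the interval in this way is one option when errors on one class matter more or when one class is rare.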
Affiliation(s)
- Mariem Gandouz
  - Department of Data Science in Biomedicine, Faculty of Mathematics and Computer Science, University of Marburg, 35032, Marburg, Germany
- Hajo Holzmann
  - Department of Statistics, Faculty of Mathematics and Computer Science, University of Marburg, 35032, Marburg, Germany
- Dominik Heider
  - Department of Data Science in Biomedicine, Faculty of Mathematics and Computer Science, University of Marburg, 35032, Marburg, Germany

32
Norinder U, Spjuth O, Svensson F. Synergy conformal prediction applied to large-scale bioactivity datasets and in federated learning. J Cheminform 2021; 13:77. [PMID: 34600569 PMCID: PMC8487527 DOI: 10.1186/s13321-021-00555-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/06/2021] [Accepted: 09/15/2021] [Indexed: 12/05/2022] Open
Abstract
Confidence predictors can deliver predictions with the associated confidence required for decision making and can play an important role in drug discovery and toxicity predictions. In this work we investigate a recently introduced version of conformal prediction, synergy conformal prediction, focusing on the predictive performance when applied to bioactivity data. We compare the performance to other variants of conformal predictors for multiple partitioned datasets and demonstrate the utility of synergy conformal predictors for federated learning where data cannot be pooled in one location. Our results show that synergy conformal predictors based on training data randomly sampled with replacement can compete with other conformal setups, while using completely separate training sets often results in worse performance. However, in a federated setup where no method has access to all the data, synergy conformal prediction is shown to give promising results. Based on our study, we conclude that synergy conformal predictors are a valuable addition to the conformal prediction toolbox.
Affiliation(s)
- Ulf Norinder
  - Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, Box 591, SE-75124, Uppsala, Sweden
  - Department of Computer and Systems Sciences, Stockholm University, Box 7003, 164 07, Kista, Sweden
  - MTM Research Centre, School of Science and Technology, Örebro University, 70182, Örebro, Sweden
- Ola Spjuth
  - Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, Box 591, SE-75124, Uppsala, Sweden
- Fredrik Svensson
  - Alzheimer's Research UK UCL Drug Discovery Institute, University College London, The Cruciform Building, Gower Street, London, WC1E 6BT, UK

33
Arvidsson McShane S, Ahlberg E, Noeske T, Spjuth O. Machine Learning Strategies When Transitioning between Biological Assays. J Chem Inf Model 2021; 61:3722-3733. [PMID: 34152755 PMCID: PMC8317157 DOI: 10.1021/acs.jcim.1c00293] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/28/2022]
Abstract
Machine learning is widely used in drug development to predict activity in biological assays based on chemical structure. However, the process of transitioning from one experimental setup to another for the same biological endpoint has not been extensively studied. In a retrospective study, we here explore different modeling strategies for combining data from the old and new assays when training conformal prediction models using data from hERG and NaV assays. We suggest continuously monitoring the validity and efficiency of models as more data is accumulated from the new assay and selecting a modeling strategy based on these metrics. In order to maximize the utility of data from the old assay, we propose a strategy that augments the proper training set of an inductive conformal predictor by adding data from the old assay but only having data from the new assay in the calibration set, which results in valid (well-calibrated) models with improved efficiency compared to other strategies. We study the results for varying sizes of new and old assays, allowing for discussion of different practical scenarios. We also conclude that our proposed assay transition strategy is more beneficial, and the value of data from the new assay is higher, for the harder case of regression compared to classification problems.
Affiliation(s)
- Staffan Arvidsson McShane
  - Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, 751 24 Uppsala, Sweden
- Ernst Ahlberg
  - Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, 751 24 Uppsala, Sweden
  - Stena Line Scandinavia AB, AI & Data, 405 19 Gothenburg, Sweden
  - Predictive Compound ADME & Safety, Drug Safety & Metabolism, AstraZeneca IMED Biotech Unit, 431 50 Gothenburg, Sweden
- Tobias Noeske
  - Imaging and Data Analytics, Clinical Pharmacology & Safety Sciences, R&D, AstraZeneca, 431 50 Gothenburg, Sweden
- Ola Spjuth
  - Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, 751 24 Uppsala, Sweden

34
Nigam A, Pollice R, Hurley MFD, Hickman RJ, Aldeghi M, Yoshikawa N, Chithrananda S, Voelz VA, Aspuru-Guzik A. Assigning confidence to molecular property prediction. Expert Opin Drug Discov 2021; 16:1009-1023. [DOI: 10.1080/17460441.2021.1925247] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 12/11/2022]
Affiliation(s)
- AkshatKumar Nigam
  - Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, Canada
  - Department of Computer Science, University of Toronto, Toronto, Canada
- Robert Pollice
  - Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, Canada
  - Department of Computer Science, University of Toronto, Toronto, Canada
- Riley J. Hickman
  - Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, Canada
  - Department of Computer Science, University of Toronto, Toronto, Canada
- Matteo Aldeghi
  - Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, Canada
  - Department of Computer Science, University of Toronto, Toronto, Canada
  - Vector Institute for Artificial Intelligence, University Ave Suite 710, Toronto, Canada
- Naruki Yoshikawa
  - Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, Canada
  - Department of Computer Science, University of Toronto, Toronto, Canada
- Alán Aspuru-Guzik
  - Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, Canada
  - Department of Computer Science, University of Toronto, Toronto, Canada
  - Vector Institute for Artificial Intelligence, University Ave Suite 710, Toronto, Canada
  - Canadian Institute for Advanced Research (CIFAR), University Ave, Toronto, Canada

35
Esposito C, Landrum GA, Schneider N, Stiefl N, Riniker S. GHOST: Adjusting the Decision Threshold to Handle Imbalanced Data in Machine Learning. J Chem Inf Model 2021; 61:2623-2640. [PMID: 34100609 DOI: 10.1021/acs.jcim.1c00160] [Citation(s) in RCA: 46] [Impact Index Per Article: 11.5] [Indexed: 12/21/2022]
Abstract
Machine learning classifiers trained on class imbalanced data are prone to overpredict the majority class. This leads to a larger misclassification rate for the minority class, which in many real-world applications is the class of interest. For binary data, the classification threshold is set by default to 0.5 which, however, is often not ideal for imbalanced data. Adjusting the decision threshold is a good strategy to deal with the class imbalance problem. In this work, we present two different automated procedures for the selection of the optimal decision threshold for imbalanced classification. A major advantage of our procedures is that they do not require retraining of the machine learning models or resampling of the training data. The first approach is specific for random forest (RF), while the second approach, named GHOST, can be potentially applied to any machine learning classifier. We tested these procedures on 138 public drug discovery data sets containing structure-activity data for a variety of pharmaceutical targets. We show that both thresholding methods improve significantly the performance of RF. We tested the use of GHOST with four different classifiers in combination with two molecular descriptors, and we found that most classifiers benefit from threshold optimization. GHOST also outperformed other strategies, including random undersampling and conformal prediction. Finally, we show that our thresholding procedures can be effectively applied to real-world drug discovery projects, where the imbalance and characteristics of the data vary greatly between the training and test sets.
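The general idea of moving the decision threshold away from 0.5, without retraining or resampling, can be sketched as follows. This is a simplified illustration and not the GHOST procedure itself: the abstract does not state the optimisation criterion, so Cohen's kappa is used here as an assumption, and the labels, probabilities, candidate thresholds and function names are hypothetical.

```python
from typing import List, Sequence


def cohen_kappa(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Cohen's kappa for binary labels (0/1)."""
    n = len(y_true)
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    p_obs = (tp + tn) / n
    p_exp = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    if p_exp == 1.0:  # degenerate case: one class only, agreement is trivial
        return 0.0
    return (p_obs - p_exp) / (1.0 - p_exp)


def kappa_at(y_true: Sequence[int], probs: Sequence[float], threshold: float) -> float:
    """Kappa obtained when predicting class 1 for probabilities >= threshold."""
    return cohen_kappa(y_true, [1 if p >= threshold else 0 for p in probs])


def best_threshold(y_true: Sequence[int], probs: Sequence[float], candidates: List[float]) -> float:
    """Pick the candidate threshold with the highest kappa on the given data."""
    return max(candidates, key=lambda th: kappa_at(y_true, probs, th))


# Hypothetical imbalanced data: 8 inactives, 2 actives, with the classifier's
# predicted probabilities for the active class skewed towards low values.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
probs = [0.05, 0.1, 0.1, 0.15, 0.2, 0.2, 0.25, 0.3, 0.35, 0.45]

chosen = best_threshold(y_true, probs, [0.1, 0.2, 0.3, 0.4, 0.5])
print(chosen, kappa_at(y_true, probs, chosen), kappa_at(y_true, probs, 0.5))
```

In this toy data set the default threshold of 0.5 predicts every compound as inactive (kappa of 0), while a lower threshold recovers both actives at the cost of one false positive. In practice the threshold would be selected on training or held-out data and then applied unchanged to the test set.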
Affiliation(s)
- Carmen Esposito
  - Laboratory of Physical Chemistry, ETH Zurich, Vladimir-Prelog-Weg 2, 8093 Zurich, Switzerland
- Gregory A Landrum
  - Laboratory of Physical Chemistry, ETH Zurich, Vladimir-Prelog-Weg 2, 8093 Zurich, Switzerland
  - T5 Informatics GmbH, Spalenring 11, 4055 Basel, Switzerland
- Nadine Schneider
  - Novartis Institutes for BioMedical Research, Novartis Pharma AG, Novartis Campus, 4002 Basel, Switzerland
- Nikolaus Stiefl
  - Novartis Institutes for BioMedical Research, Novartis Pharma AG, Novartis Campus, 4002 Basel, Switzerland
- Sereina Riniker
  - Laboratory of Physical Chemistry, ETH Zurich, Vladimir-Prelog-Weg 2, 8093 Zurich, Switzerland

36
Morger A, Svensson F, Arvidsson McShane S, Gauraha N, Norinder U, Spjuth O, Volkamer A. Assessing the calibration in toxicological in vitro models with conformal prediction. J Cheminform 2021; 13:35. [PMID: 33926567 PMCID: PMC8082859 DOI: 10.1186/s13321-021-00511-5] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 02/07/2021] [Accepted: 04/10/2021] [Indexed: 11/11/2022] Open
Abstract
Machine learning methods are widely used in drug discovery and toxicity prediction. While showing overall good performance in cross-validation studies, their predictive power (often) drops in cases where the query samples have drifted from the training data’s descriptor space. Thus, the assumption for applying machine learning algorithms, that training and test data stem from the same distribution, might not always be fulfilled. In this work, conformal prediction is used to assess the calibration of the models. Deviations from the expected error may indicate that training and test data originate from different distributions. Exemplified on the Tox21 datasets, composed of chronologically released Tox21Train, Tox21Test and Tox21Score subsets, we observed that while internally valid models could be trained using cross-validation on Tox21Train, predictions on the external Tox21Score data resulted in higher error rates than expected. To improve the prediction on the external sets, a strategy exchanging the calibration set with more recent data, such as Tox21Test, has successfully been introduced. We conclude that conformal prediction can be used to diagnose data drifts and other issues related to model calibration. The proposed improvement strategy—exchanging the calibration data only—is convenient as it does not require retraining of the underlying model.
Affiliation(s)
- Andrea Morger
  - In Silico Toxicology and Structural Bioinformatics, Institute of Physiology, Charité Universitätsmedizin, Berlin, Germany
- Fredrik Svensson
  - Alzheimer's Research UK UCL Drug Discovery Institute, London, WC1E 6BT, UK
- Staffan Arvidsson McShane
  - Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, 751 24, Uppsala, Sweden
- Niharika Gauraha
  - Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, 751 24, Uppsala, Sweden
  - Division of Computational Science and Technology, KTH, 100 44, Stockholm, Sweden
- Ulf Norinder
  - Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, 751 24, Uppsala, Sweden
  - Dept. Computer and Systems Sciences, Stockholm University, Box 7003, 164 07, Kista, Sweden
  - MTM Research Centre, School of Science and Technology, Örebro University, 70 182, Örebro, Sweden
- Ola Spjuth
  - Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, 751 24, Uppsala, Sweden
- Andrea Volkamer
  - In Silico Toxicology and Structural Bioinformatics, Institute of Physiology, Charité Universitätsmedizin, Berlin, Germany

37
Karmacharya U, Chaudhary P, Lim D, Dahal S, Awasthi BP, Park HD, Kim JA, Jeong BS. Synthesis and anticancer evaluation of 6-azacyclonol-2,4,6-trimethylpyridin-3-ol derivatives: M3 muscarinic acetylcholine receptor-mediated anticancer activity of a cyclohexyl derivative in androgen-refractory prostate cancer. Bioorg Chem 2021; 110:104805. [PMID: 33725508 DOI: 10.1016/j.bioorg.2021.104805] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/10/2021] [Revised: 02/20/2021] [Accepted: 03/02/2021] [Indexed: 12/24/2022]
Abstract
We recently reported 2,4,5-trimethylpyridin-3-ol with C(6)-azacyclonol, whose code name is BJ-1207, showing a promising anticancer activity by inhibiting NOX-derived ROS in A549 human lung cancer cells. The present study was focused on structural modification of the azacyclonol moiety of BJ-1207 to find a compound with better anticancer activity. Ten new compounds (3A-3J) were prepared and evaluated for their inhibitory actions against proliferation of eighteen cancer cell lines as a primary screening. Among the ten derivatives of BJ-1207, the effects of compounds 3A and 3J on DU145 and PC-3, androgen-refractory prostate cancer (ARPC) cell lines, were greater than those of the parent compound, and compound 3A showed better activity than 3J. Antitumor activity of compound 3A was also observed in a DU145-xenografted chorioallantoic membrane (CAM) tumor model. In addition, the ligand-based target prediction and molecular docking study using the DeepZema® server showed that compound 3A was a ligand of the M3 muscarinic acetylcholine receptor (M3R), which is overexpressed in ARPC. Carbachol, a muscarinic receptor agonist, concentration-dependently increased proliferation of DU145 in the absence of serum, and it also activated NADPH oxidase (NOX). The carbachol-induced proliferation and NOX activity were significantly blocked by compound 3A in a concentration-dependent manner. This finding might become a new milestone in the development of pyridinol-based anti-cancer agents against ARPC.
Affiliation(s)
- Ujjwala Karmacharya
  - College of Pharmacy, Yeungnam University, 280 Daehak-ro, Gyeongsan 38541, Republic of Korea
- Prakash Chaudhary
  - College of Pharmacy, Yeungnam University, 280 Daehak-ro, Gyeongsan 38541, Republic of Korea
- Dongchul Lim
  - Innovo Therapeutics Inc., Daeduck Biz Center C-313, 17 Techno 4-ro, Yuseong-gu, Daejeon 34013, Republic of Korea
- Sadan Dahal
  - College of Pharmacy, Yeungnam University, 280 Daehak-ro, Gyeongsan 38541, Republic of Korea
- Bhuwan Prasad Awasthi
  - College of Pharmacy, Yeungnam University, 280 Daehak-ro, Gyeongsan 38541, Republic of Korea
- Hee Dong Park
  - Innovo Therapeutics Inc., Daeduck Biz Center C-313, 17 Techno 4-ro, Yuseong-gu, Daejeon 34013, Republic of Korea
- Jung-Ae Kim
  - College of Pharmacy, Yeungnam University, 280 Daehak-ro, Gyeongsan 38541, Republic of Korea
- Byeong-Seon Jeong
  - College of Pharmacy, Yeungnam University, 280 Daehak-ro, Gyeongsan 38541, Republic of Korea