1
Smajić A, Rami I, Sosnin S, Ecker GF. Identifying Differences in the Performance of Machine Learning Models for Off-Targets Trained on Publicly Available and Proprietary Data Sets. Chem Res Toxicol 2023; 36:1300-1312. [PMID: 37439496 PMCID: PMC10445286 DOI: 10.1021/acs.chemrestox.3c00042] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2023] [Indexed: 07/14/2023]
Abstract
Each year, publicly available databases are updated with new compounds from different research institutions. Positive experimental outcomes are more likely to be reported; therefore, they account for a considerable fraction of these entries. Established publicly available databases such as ChEMBL allow researchers to use information without restrictions and create predictive tools for a broad spectrum of applications in the field of toxicology. We therefore investigated the distribution of positive and nonpositive entries within ChEMBL for a set of off-targets and its impact on the performance of classification models when applied to pharmaceutical industry data sets. Results indicate that models trained on publicly available data tend to overpredict positives, and models based on industry data sets predict negatives more often than those built using publicly available data sets. This finding is further supported by the visualization of the prediction space for a set of 10,000 compounds, which makes it possible to identify regions in the chemical space where predictions converge. Finally, we highlight the utilization of these models for consensus modeling for potential adverse events prediction.
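The two ideas in this abstract, label imbalance and consensus between differently trained models, can be illustrated with a minimal Python sketch. The function names, the 0/1 label encoding, and the "agree or abstain" consensus rule are assumptions made for illustration; the abstract does not state how the authors combine their models.

```python
from typing import List, Optional


def positive_fraction(labels: List[int]) -> float:
    """Share of positive (1) entries in a labelled data set (labels are 0 or 1)."""
    return sum(labels) / len(labels)


def consensus(pred_public: int, pred_industry: int) -> Optional[int]:
    """Hypothetical consensus rule: keep a prediction only when the model
    trained on public data and the model trained on industry data agree;
    return None (inconclusive) when they disagree."""
    return pred_public if pred_public == pred_industry else None


# A data set dominated by reported positives, as described for public sources.
public_labels = [1, 1, 1, 0]
print(positive_fraction(public_labels))
print(consensus(1, 1), consensus(1, 0))
```

Abstaining on disagreement is only one possible design; a weighted vote or a probability average would be equally plausible readings of "consensus modeling".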
Affiliation(s)
- Aljoša Smajić
  - Department of Pharmaceutical Sciences, University of Vienna, Josef-Holaubek-Platz 2, 1090 Vienna, Austria
- Iris Rami
  - Department of Pharmaceutical Sciences, University of Vienna, Josef-Holaubek-Platz 2, 1090 Vienna, Austria
- Sergey Sosnin
  - Department of Pharmaceutical Sciences, University of Vienna, Josef-Holaubek-Platz 2, 1090 Vienna, Austria
- Gerhard F. Ecker
  - Department of Pharmaceutical Sciences, University of Vienna, Josef-Holaubek-Platz 2, 1090 Vienna, Austria
2
Using chemical and biological data to predict drug toxicity. SLAS Discovery: Advancing Life Sciences R&D 2023; 28:53-64. [PMID: 36639032 DOI: 10.1016/j.slasd.2022.12.003] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 10/30/2022] [Revised: 12/19/2022] [Accepted: 12/31/2022] [Indexed: 01/12/2023]
Abstract
Various sources of information can be used to better understand and predict compound activity and safety-related endpoints, including biological data such as gene expression and cell morphology. In this review, we first introduce types of chemical, in vitro and in vivo information that can be used to describe compounds and adverse effects. We then explore how compound descriptors based on chemical structure or biological perturbation response can be used to predict safety-related endpoints, and how especially biological data can help us to better understand adverse effects mechanistically. Overall, the described applications demonstrate how large-scale biological information presents new opportunities to anticipate and understand the biological effects of compounds, and how this can support predictive toxicology and drug discovery projects.
3
Smajić A, Grandits M, Ecker GF. Using Jupyter Notebooks for re-training machine learning models. J Cheminform 2022; 14:54. [PMID: 35964049 PMCID: PMC9375336 DOI: 10.1186/s13321-022-00635-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/26/2021] [Accepted: 07/31/2022] [Indexed: 11/10/2022] [Open access]
Abstract
Machine learning (ML) models require an extensive, user-driven selection of molecular descriptors in order to learn from chemical structures and predict actives and inactives with high reliability. In addition, privacy concerns often restrict access to sufficient data, leading to models that cover a narrow chemical space. Therefore, we propose a framework of re-trainable models that can be transferred from one local instance to another and that require a less extensive descriptor selection. The models are shared via a Jupyter Notebook, allowing the evaluation and implementation of a broader chemical space while keeping most of the tunable parameters pre-defined. This enables the models to be updated in a decentralized, facile, and fast manner. Herein, the method was evaluated with six transporter datasets (BCRP, BSEP, OATP1B1, OATP1B3, MRP3, P-gp), which revealed the general applicability of this approach.
Affiliation(s)
- Aljoša Smajić
  - Department of Pharmaceutical Sciences, University of Vienna, Vienna, Austria
- Melanie Grandits
  - Department of Pharmaceutical Sciences, University of Vienna, Vienna, Austria
- Gerhard F Ecker
  - Department of Pharmaceutical Sciences, University of Vienna, Vienna, Austria
4
Sarntivijai S, Blomberg N, Lauer KB, Briggs K, Steger-Hartmann T, van der Lei J, Sauer JM, Liwski R, Mourby M, Camprubi M. eTRANSAFE: Building a sustainable framework to share reproducible drug safety knowledge with the public domain. F1000Res 2022; 11. [PMID: 35602243 PMCID: PMC9096149 DOI: 10.12688/f1000research.74024.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 01/18/2022] [Indexed: 11/20/2022] [Open access]
Abstract
Integrative drug safety research in translational health informatics has evolved rapidly and now draws on data from many resources, combining diverse data that are either reused from (curated) repositories or newly generated at source. Each resource imposes its own set of metadata rules on the incoming data. The data cannot be readily combined without the intervention of data stewardship and top-down policy guidelines that supervise and inform the combination process to aid meaningful interpretation and analysis of such data. As part of its effort to drive integrative drug safety research at a large scale, the eTRANSAFE Consortium presents here the lessons learnt and the solutions proposed in the guidelines used in practice in this Innovative Medicines Initiative (IMI) project. Recommendations in these guidelines were compiled from feedback received from key stakeholders in regulatory agencies, EFPIA companies, and academic partners. The research reproducibility guidelines presented in this study lay the foundation for comprehensive data sharing and knowledge management plans that account for research data management in the drug safety space: the FAIR data sharing guidelines and the model verification guidelines are generic deliverables and best practices that can be reused by other members of the scientific community at large. FAIR data sharing is a dynamic landscape that evolves rapidly with fast-paced technological advances. The research reproducibility guidelines for drug safety introduced in this study provide a reusable framework that can be adopted by other research communities that aim to integrate public and private data in the biomedical research space.
Affiliation(s)
- Niklas Blomberg
  - ELIXIR Hub, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Katharine Briggs
  - Lhasa Limited, Granary Wharf House, 2 Canal Wharf, Leeds, LS11 5PS, UK
- Thomas Steger-Hartmann
  - Bayer AG, Research & Development, Pharmaceuticals, Investigational Toxicology, 13342 Berlin, Germany
- Johan van der Lei
  - Department of Medical Informatics, Erasmus University Rotterdam, EUR - Erasmus Medical Center (MC), Rotterdam, The Netherlands
- John-Michael Sauer
  - Predictive Safety Testing Consortium, Critical Path Institute, Tucson, Arizona, 85718, USA
- Richard Liwski
  - Predictive Safety Testing Consortium, Critical Path Institute, Tucson, Arizona, 85718, USA
- Miranda Mourby
  - Centre for Health, Law and Emerging Technologies (HeLEX), Faculty of Law, University of Oxford, Oxford, OX2 7DD, UK
- Montse Camprubi
  - Synapse Research Management Partners S.L., C. Diputació 237, Àtic 3a, 08007, Barcelona, Spain
5
Pastor M, Sanz F, Bringezu F. Development of In Silico Methods for Toxicity Prediction in Collaboration Between Academia and the Pharmaceutical Industry. Methods Mol Biol 2022; 2425:119-131. [PMID: 35188630 DOI: 10.1007/978-1-0716-1960-5_5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
The pharmaceutical industry would benefit from the collaboration with academic groups in the development of predictive safety models using the newest computational technologies. However, this collaboration is sometimes hampered by the handling of confidential proprietary information and different working practices in both environments. In this manuscript, we propose a strategy for facilitating this collaboration, based on the use of modeling frameworks developed for facilitating the use of sensitive data, as well as the development, interchange, hosting, and use of predictive models in production. The strategy is illustrated with a real example in which we used Flame, an open-source modeling framework developed in our group, for the development of an in silico eye irritation model. The model was based on bibliographic data, refined during the company-academic group collaboration, and enriched with the incorporation of confidential data, yielding a useful model that was validated experimentally.
Affiliation(s)
- Manuel Pastor
  - Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Medical Research Institute (IMIM), Department of Experimental and Health Sciences, Universitat Pompeu Fabra, Barcelona, Spain
- Ferran Sanz
  - Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Medical Research Institute (IMIM), Department of Experimental and Health Sciences, Universitat Pompeu Fabra, Barcelona, Spain
- Frank Bringezu
  - Chemical and Preclinical Safety, Merck Healthcare KGaA, Darmstadt, Germany
6
Morger A, Svensson F, Arvidsson McShane S, Gauraha N, Norinder U, Spjuth O, Volkamer A. Assessing the calibration in toxicological in vitro models with conformal prediction. J Cheminform 2021; 13:35. [PMID: 33926567 PMCID: PMC8082859 DOI: 10.1186/s13321-021-00511-5] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 02/07/2021] [Accepted: 04/10/2021] [Indexed: 11/11/2022] [Open access]
Abstract
Machine learning methods are widely used in drug discovery and toxicity prediction. While showing overall good performance in cross-validation studies, their predictive power (often) drops in cases where the query samples have drifted from the training data’s descriptor space. Thus, the assumption for applying machine learning algorithms, that training and test data stem from the same distribution, might not always be fulfilled. In this work, conformal prediction is used to assess the calibration of the models. Deviations from the expected error may indicate that training and test data originate from different distributions. Exemplified on the Tox21 datasets, composed of chronologically released Tox21Train, Tox21Test and Tox21Score subsets, we observed that while internally valid models could be trained using cross-validation on Tox21Train, predictions on the external Tox21Score data resulted in higher error rates than expected. To improve the prediction on the external sets, a strategy exchanging the calibration set with more recent data, such as Tox21Test, has successfully been introduced. We conclude that conformal prediction can be used to diagnose data drifts and other issues related to model calibration. The proposed improvement strategy—exchanging the calibration data only—is convenient as it does not require retraining of the underlying model.
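The calibration check and the "exchange the calibration set only" strategy described in this abstract can be sketched in plain Python as a class-conditional (Mondrian) inductive conformal predictor. This is a minimal sketch under stated assumptions, not the authors' implementation: the nonconformity measure `1 - p(class)`, the function names, and the toy probabilities are all illustrative, and the underlying model is represented only by its predicted class probabilities.

```python
from typing import Dict, List, Sequence, Set


def nonconformity(prob_of_class: float) -> float:
    """Assumed nonconformity measure: one minus the predicted class probability."""
    return 1.0 - prob_of_class


def calibrate(probs: Sequence[Sequence[float]], labels: Sequence[int]) -> Dict[int, List[float]]:
    """Collect calibration scores per true class (Mondrian calibration)."""
    scores: Dict[int, List[float]] = {}
    for p, y in zip(probs, labels):
        scores.setdefault(y, []).append(nonconformity(p[y]))
    return scores


def p_value(cal_scores: Sequence[float], score: float) -> float:
    """Fraction of calibration scores at least as nonconforming as the test score."""
    at_least = sum(1 for s in cal_scores if s >= score)
    return (at_least + 1) / (len(cal_scores) + 1)


def prediction_set(cal: Dict[int, List[float]], prob: Sequence[float], significance: float) -> Set[int]:
    """Classes whose p-value exceeds the chosen significance level."""
    return {c for c, s in cal.items() if p_value(s, nonconformity(prob[c])) > significance}


def error_rate(cal: Dict[int, List[float]], probs: Sequence[Sequence[float]],
               labels: Sequence[int], significance: float) -> float:
    """Observed error: share of samples whose true label is missing from the set."""
    misses = sum(1 for p, y in zip(probs, labels)
                 if y not in prediction_set(cal, p, significance))
    return misses / len(labels)


# Toy calibration scores and external predictions (illustrative values only).
cal = {0: [0.1, 0.2, 0.3], 1: [0.1, 0.2, 0.3]}
external_probs = [(0.95, 0.05), (0.05, 0.95)]
external_labels = [0, 0]
print(error_rate(cal, external_probs, external_labels, significance=0.3))
```

For a well-calibrated predictor the observed error rate should stay close to the significance level; an observed rate well above it on external data is the kind of deviation the abstract interprets as a sign of data drift. Replacing `cal` with scores computed from more recent data changes the p-values without touching the underlying model.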
Affiliation(s)
- Andrea Morger
  - In Silico Toxicology and Structural Bioinformatics, Institute of Physiology, Charité Universitätsmedizin, Berlin, Germany
- Fredrik Svensson
  - Alzheimer's Research UK UCL Drug Discovery Institute, London, WC1E 6BT, UK
- Staffan Arvidsson McShane
  - Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, 751 24, Uppsala, Sweden
- Niharika Gauraha
  - Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, 751 24, Uppsala, Sweden; Division of Computational Science and Technology, KTH, 100 44, Stockholm, Sweden
- Ulf Norinder
  - Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, 751 24, Uppsala, Sweden; Dept. Computer and Systems Sciences, Stockholm University, Box 7003, 164 07, Kista, Sweden; MTM Research Centre, School of Science and Technology, Örebro University, 70 182, Örebro, Sweden
- Ola Spjuth
  - Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, 751 24, Uppsala, Sweden
- Andrea Volkamer
  - In Silico Toxicology and Structural Bioinformatics, Institute of Physiology, Charité Universitätsmedizin, Berlin, Germany
7
Pastor M, Gómez-Tamayo JC, Sanz F. Flame: an open source framework for model development, hosting, and usage in production environments. J Cheminform 2021; 13:31. [PMID: 33875019 PMCID: PMC8054391 DOI: 10.1186/s13321-021-00509-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/10/2020] [Accepted: 04/08/2021] [Indexed: 01/17/2023] [Open access]
Abstract
This article describes Flame, an open source software package for building predictive models and supporting their use in production environments. Flame is a web application with a web-based graphic interface, which can be used as a desktop application or installed on a server receiving requests from multiple users. Models can be built starting from any collection of biologically annotated chemical structures, since the software supports structural normalization, molecular descriptor calculation, and machine learning model generation using predefined workflows. The model building workflow can be customized from the graphic interface by selecting the type of normalization, molecular descriptors, and machine learning algorithm to be used from a panel of state-of-the-art methods implemented natively. Moreover, Flame implements a mechanism for extending its source code, allowing unlimited model customization. Models generated with Flame can be easily exported, facilitating collaborative model development. All models are stored in a model repository supporting model versioning. Models are identified by unique model IDs and include detailed documentation formatted using widely accepted standards. The current version is the result of nearly 3 years of development in collaboration with users from the pharmaceutical industry within the IMI eTRANSAFE project, which aims, among other objectives, to develop high-quality predictive models based on shared legacy data for assessing the safety of drug candidates.
Affiliation(s)
- Manuel Pastor
  - Research Programme on Biomedical Informatics (GRIB), Department of Experimental and Health Sciences, Hospital del Mar Medical Research Institute (IMIM), Universitat Pompeu Fabra, Barcelona, Spain
- José Carlos Gómez-Tamayo
  - Research Programme on Biomedical Informatics (GRIB), Department of Experimental and Health Sciences, Hospital del Mar Medical Research Institute (IMIM), Universitat Pompeu Fabra, Barcelona, Spain
- Ferran Sanz
  - Research Programme on Biomedical Informatics (GRIB), Department of Experimental and Health Sciences, Hospital del Mar Medical Research Institute (IMIM), Universitat Pompeu Fabra, Barcelona, Spain
8
Thompson DC, Bentzien J. Crowdsourcing and open innovation in drug discovery: recent contributions and future directions. Drug Discov Today 2020; 25:2284-2293. [PMID: 33011343 PMCID: PMC7529695 DOI: 10.1016/j.drudis.2020.09.020] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 05/15/2020] [Revised: 08/27/2020] [Accepted: 09/17/2020] [Indexed: 01/03/2023]
Abstract
The past decade has seen significant growth in the use of 'crowdsourcing' and open innovation approaches to engage 'citizen scientists' to perform novel scientific research. Here, we quantify and summarize the current state of adoption of open innovation by major pharmaceutical companies. We also highlight recent crowdsourcing and open innovation research contributions to the field of drug discovery, and interesting future directions.
Affiliation(s)
- Jörg Bentzien
  - Alkermes, Inc. 852 Winter Street, Waltham, MA 02451-1420, USA
9
Cronin MT, Madden JC, Yang C, Worth AP. Unlocking the potential of in silico chemical safety assessment - A report on a cross-sector symposium on current opportunities and future challenges. Computational Toxicology (Amsterdam, Netherlands) 2019; 10:38-43. [PMID: 31218266 PMCID: PMC6559213 DOI: 10.1016/j.comtox.2018.12.006] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 12/15/2018] [Accepted: 12/17/2018] [Indexed: 12/21/2022]
Abstract
In silico chemical safety assessment can support the evaluation of hazard and risk following potential exposure to a substance. A symposium identified a number of opportunities and challenges to implement in silico methods, such as quantitative structure-activity relationships (QSARs) and read-across, to assess the potential harm of a substance in a variety of exposure scenarios, e.g. pharmaceuticals, personal care products, and industrial chemicals. To initiate the process of in silico safety assessment, clear and unambiguous problem formulation is required to provide the context for these methods. These approaches must be built on data of defined quality, while acknowledging the possibility of novel data resources tapping into on-going progress with data sharing. Models need to be developed that cover appropriate toxicity and kinetic endpoints, and that are documented appropriately with defined uncertainties. The application and implementation of in silico models in chemical safety requires a flexible technological framework that enables the integration of multiple strands of data and evidence. The findings of the symposium allowed for the identification of priorities to progress in silico chemical safety assessment towards the animal-free assessment of chemicals.
Affiliation(s)
- Mark T.D. Cronin
  - Liverpool John Moores University, School of Pharmacy and Biomolecular Sciences, Byrom Street, Liverpool L3 3AF, United Kingdom
- Judith C. Madden
  - Liverpool John Moores University, School of Pharmacy and Biomolecular Sciences, Byrom Street, Liverpool L3 3AF, United Kingdom
- Chihae Yang
  - Molecular Networks GmbH, Neumeyerstraße 28, 90411 Nürnberg, Germany
- Andrew P. Worth
  - European Commission, Joint Research Centre (JRC), Ispra, Italy