1
Wu L, Xu J, Tong W. PERForm: assessing model performance with predictivity and explainability readiness formula. Journal of Environmental Science and Health. Part C, Toxicology and Carcinogenesis 2024:1-16. [PMID: 38619534 DOI: 10.1080/26896583.2024.2340391] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/16/2024]
Abstract
In the rapidly evolving field of artificial intelligence (AI), explainability has traditionally been assessed in a post-modeling process and is often subjective. In contrast, many quantitative metrics are routinely used to assess a model's performance. We propose a unified formula named PERForm, which incorporates explainability as a weight into existing statistical metrics to provide an integrated, quantitative measure of both predictivity and explainability to guide model selection, application, and evaluation. PERForm was designed as a generic formula and can be applied to any data type. We applied PERForm to a range of diverse datasets, including DILIst, Tox21, and three MAQC-II benchmark datasets, using various modeling algorithms to predict a total of 73 distinct endpoints. For example, AdaBoost exhibited superior performance in DILIst prediction (PERForm AUC of 0.129 for AdaBoost versus 0 for linear regression), whereas linear regression outperformed other models in the majority of Tox21 endpoints (average PERForm AUC of 0.301 for linear regression versus 0.283 for AdaBoost). This research marks a significant step toward comprehensively evaluating the utility of an AI model to advance transparency and interpretability, where the tradeoff between a model's performance and its interpretability can have profound implications.
Affiliation(s)
- Leihong Wu
  - Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, FDA, Jefferson, AR, USA
- Joshua Xu
  - Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, FDA, Jefferson, AR, USA
- Weida Tong
  - Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, FDA, Jefferson, AR, USA
2
Di Stefano M, Galati S, Piazza L, Granchi C, Mancini S, Fratini F, Macchia M, Poli G, Tuccinardi T. VenomPred 2.0: A Novel In Silico Platform for an Extended and Human Interpretable Toxicological Profiling of Small Molecules. J Chem Inf Model 2024; 64:2275-2289. [PMID: 37676238 PMCID: PMC11005041 DOI: 10.1021/acs.jcim.3c00692] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 05/08/2023] [Indexed: 09/08/2023]
Abstract
The application of artificial intelligence and machine learning (ML) methods is becoming increasingly popular in computational toxicology and drug design; it is considered a promising solution for assessing the safety profile of compounds, particularly in lead optimization and ADMET studies, and for meeting the principles of the 3Rs, which call for the replacement, reduction, and refinement of animal testing. In this context, we herein present the development of VenomPred 2.0 (http://www.mmvsl.it/wp/venompred2/), the new and improved version of our free-of-charge web tool for toxicological predictions, which now represents a powerful web-based platform for multifaceted and human-interpretable in silico toxicity profiling of chemicals. VenomPred 2.0 presents an extended set of toxicity endpoints (androgenicity, skin irritation, eye irritation, and acute oral toxicity, in addition to the already available carcinogenicity, mutagenicity, hepatotoxicity, and estrogenicity) that can be evaluated through an exhaustive consensus prediction strategy based on multiple ML models. Moreover, we implemented a new utility based on the Shapley Additive exPlanations (SHAP) method that allows human-interpretable toxicological profiling of small molecules, highlighting the features that strongly contribute to the toxicological predictions in order to derive structural toxicophores.
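As a rough illustration of what a consensus prediction over multiple ML models can look like, the sketch below averages per-model probabilities and applies a cutoff. The abstract does not state how VenomPred 2.0 combines its models, so the averaging rule, the 0.5 threshold, the function name `consensus_predict`, and the input values are all assumptions made for illustration.

```python
from statistics import mean


def consensus_predict(probabilities, threshold=0.5):
    """Combine per-model toxicity probabilities into one consensus call.

    probabilities: one float in [0, 1] per model (hypothetical inputs).
    threshold: assumed cutoff for labelling a compound as toxic.
    Returns (consensus_score, is_toxic).
    """
    if not probabilities:
        raise ValueError("at least one model probability is required")
    # Assumed rule: the consensus score is the plain average over models.
    score = mean(probabilities)
    return score, score >= threshold


# Hypothetical outputs of four models for a single compound.
score, is_toxic = consensus_predict([0.9, 0.7, 0.4, 0.8])
print(f"consensus score = {score:.2f}, predicted toxic = {is_toxic}")
```

Averaging probabilities is only one option; a majority vote over per-model labels would be an equally plausible consensus rule.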
Affiliation(s)
- Miriana Di Stefano
  - Department of Pharmacy, University of Pisa, Via Bonanno 6, 56126 Pisa, Italy
  - Department of Life Sciences, University of Siena, 53100 Siena, Italy
- Salvatore Galati
  - Department of Pharmacy, University of Pisa, Via Bonanno 6, 56126 Pisa, Italy
- Lisa Piazza
  - Department of Pharmacy, University of Pisa, Via Bonanno 6, 56126 Pisa, Italy
- Carlotta Granchi
  - Department of Pharmacy, University of Pisa, Via Bonanno 6, 56126 Pisa, Italy
- Simone Mancini
  - Department of Veterinary Sciences, University of Pisa, Viale Delle Piagge 2, 56124 Pisa, Italy
- Filippo Fratini
  - Department of Veterinary Sciences, University of Pisa, Viale Delle Piagge 2, 56124 Pisa, Italy
- Marco Macchia
  - Department of Pharmacy, University of Pisa, Via Bonanno 6, 56126 Pisa, Italy
- Giulio Poli
  - Department of Pharmacy, University of Pisa, Via Bonanno 6, 56126 Pisa, Italy
- Tiziano Tuccinardi
  - Department of Pharmacy, University of Pisa, Via Bonanno 6, 56126 Pisa, Italy
3
Zubrod JP, Galic N, Vaugeois M, Dreier DA. Bio-QSARs 2.0: Unlocking a new level of predictive power for machine learning-based ecotoxicity predictions by exploiting chemical and biological information. Environment International 2024; 186:108607. [PMID: 38593686 DOI: 10.1016/j.envint.2024.108607] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2024] [Revised: 03/07/2024] [Accepted: 03/25/2024] [Indexed: 04/11/2024]
Abstract
Practical, legal, and ethical reasons necessitate the development of methods to replace animal experiments. Computational techniques to acquire information that traditionally relied on animal testing are considered a crucial pillar among these so-called new approach methodologies. In this light, we recently introduced the Bio-QSAR concept for multispecies aquatic toxicity regression tasks. These machine learning models, trained on both chemical and biological information, are capable of both cross-chemical and cross-species predictions. Here, we significantly extend these models' applicability. This was realized by increasing the quantity of training data by a factor of approximately 20, accomplished by considering both additional chemicals and aquatic organisms. Additionally, variable test durations and associated random effects were accommodated by employing a machine learning algorithm that combines tree-boosting with mixed-effects modeling (i.e., Gaussian Process Boosting). We also explored various biological descriptors including Dynamic Energy Budget model parameters, taxonomic distances, as well as genus-specific traits and investigated the inclusion of mode-of-action information. Through these efforts, we developed Bio-QSARs for fish and aquatic invertebrates with exceptional predictive power (R squared of up to 0.92 on independent test sets). Moreover, we made considerable strides to make models applicable for a range of use cases in environmental risk assessment as well as research and development of chemicals. Models were made fully explainable by implementing an algorithmic multicollinearity correction combined with SHapley Additive exPlanations. Furthermore, we devised novel approaches for applicability domain construction that take feature importance into account. 
We are hence confident these models, which are available via open access, will make a significant contribution towards the implementation of new approach methodologies and ultimately have the potential to support "Green Chemistry" and "Green Toxicology".
Affiliation(s)
- Nika Galic
  - Syngenta Crop Protection AG, 4058 Basel, Switzerland
4
Srithanyarat T, Taoma K, Sutthibutpong T, Ruengjitchatchawalya M, Liangruksa M, Laomettachit T. Interpreting drug synergy in breast cancer with deep learning using target-protein inhibition profiles. BioData Min 2024; 17:8. [PMID: 38424554 PMCID: PMC10905801 DOI: 10.1186/s13040-024-00359-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2023] [Accepted: 02/23/2024] [Indexed: 03/02/2024]
Abstract
BACKGROUND: Breast cancer is the most common malignancy among women worldwide. Despite advances in treating breast cancer over the past decades, drug resistance and adverse effects remain challenging. Recent therapeutic progress has shifted toward using drug combinations for better treatment efficiency. However, with a growing number of potential small-molecule cancer inhibitors, in silico strategies to predict pharmacological synergy before experimental trials are required to compensate for time and cost restrictions. Many deep learning models have been previously proposed to predict the synergistic effects of drug combinations with high performance. However, these models heavily relied on a large number of drug chemical structural fingerprints as their main features, which made model interpretation a challenge.
RESULTS: This study developed a deep neural network model that predicts synergy between small-molecule pairs based on their inhibitory activities against 13 selected key proteins. The synergy prediction model achieved a Pearson correlation coefficient between model predictions and experimental data of 0.63 across five breast cancer cell lines. BT-549 and MCF-7 achieved the highest correlation of 0.67 when considering individual cell lines. Despite achieving a moderate correlation compared to previous deep learning models, our model offers a distinctive advantage in terms of interpretability. Using the inhibitory activities against key protein targets as the main features allowed a straightforward interpretation of the model since the individual features had direct biological meaning. By tracing the synergistic interactions of compounds through their target proteins, we gained insights into the patterns our model recognized as indicative of synergistic effects.
CONCLUSIONS: The framework employed in the present study lays the groundwork for future advancements, especially in model interpretation. By combining deep learning techniques and target-specific models, this study shed light on potential patterns of target-protein inhibition profiles that could be exploited in breast cancer treatment.
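The Pearson correlation coefficient reported in the results measures how linearly the predicted synergy scores track the experimental ones. A minimal sketch of the calculation follows; the function name `pearson_r` and the score values are hypothetical and are not taken from the study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of centred cross-products and centred squares.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical predicted and experimental synergy scores for five drug pairs.
predicted = [12.0, -3.5, 7.2, 0.4, 15.1]
experimental = [10.5, -1.0, 9.8, -2.2, 11.3]
print(f"Pearson r = {pearson_r(predicted, experimental):.2f}")
```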
Affiliation(s)
- Thanyawee Srithanyarat
  - Bioinformatics and Systems Biology Program, School of Bioresources and Technology, King Mongkut's University of Technology Thonburi, Bangkok, 10150, Thailand
  - School of Information Technology, King Mongkut's University of Technology Thonburi, Bangkok, 10140, Thailand
- Kittisak Taoma
  - Bioinformatics and Systems Biology Program, School of Bioresources and Technology, King Mongkut's University of Technology Thonburi, Bangkok, 10150, Thailand
  - School of Information Technology, King Mongkut's University of Technology Thonburi, Bangkok, 10140, Thailand
- Thana Sutthibutpong
  - Department of Physics, Faculty of Science, King Mongkut's University of Technology Thonburi, Bangkok, 10140, Thailand
  - Theoretical and Computational Physics Group, Center of Excellence in Theoretical and Computational Science, King Mongkut's University of Technology Thonburi, Bangkok, 10140, Thailand
- Marasri Ruengjitchatchawalya
  - Bioinformatics and Systems Biology Program, School of Bioresources and Technology, King Mongkut's University of Technology Thonburi, Bangkok, 10150, Thailand
  - Biotechnology Program, School of Bioresources and Technology, King Mongkut's University of Technology Thonburi, Bangkok, 10150, Thailand
- Monrudee Liangruksa
  - National Nanotechnology Center (NANOTEC), National Science and Technology Development Agency (NSTDA), Pathum Thani, 12120, Thailand
- Teeraphan Laomettachit
  - Bioinformatics and Systems Biology Program, School of Bioresources and Technology, King Mongkut's University of Technology Thonburi, Bangkok, 10150, Thailand
  - Theoretical and Computational Physics Group, Center of Excellence in Theoretical and Computational Science, King Mongkut's University of Technology Thonburi, Bangkok, 10140, Thailand
5
Gurmessa DK, Jimma W. Explainable machine learning for breast cancer diagnosis from mammography and ultrasound images: a systematic review. BMJ Health Care Inform 2024; 31:e100954. [PMID: 38307616 PMCID: PMC10840064 DOI: 10.1136/bmjhci-2023-100954] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2023] [Accepted: 01/21/2024] [Indexed: 02/04/2024]
Abstract
BACKGROUND: Breast cancer is the most common disease in women. Recently, explainable artificial intelligence (XAI) approaches have been applied to the investigation of breast cancer, and a large number of studies on XAI for breast cancer have been published. Therefore, this study aims to review XAI for breast cancer diagnosis from mammography and ultrasound (US) images. We investigated how XAI methods for breast cancer diagnosis have been evaluated, the existing ethical challenges, research gaps, the XAI methods used, and the relation between the accuracy and explainability of algorithms.
METHODS: The Preferred Reporting Items for Systematic Reviews and Meta-Analyses checklist and diagram were used. Peer-reviewed articles and conference proceedings from the PubMed, IEEE Xplore, ScienceDirect, Scopus and Google Scholar databases were searched. No date limit was applied to filter the papers. The papers were searched on 19 September 2023, using various combinations of the search terms 'breast cancer', 'explainable', 'interpretable', 'machine learning', 'artificial intelligence' and 'XAI'. The Rayyan online platform was used to detect duplicates and to manage the inclusion and exclusion of papers.
RESULTS: This study identified 14 primary studies employing XAI for breast cancer diagnosis from mammography and US images. Of the 14 selected studies, only one evaluated humans' confidence in using the XAI system. Additionally, 92.86% of the identified papers reported datasets and dataset-related issues as research gaps and future directions. The results showed that further research and evaluation are needed to determine the most effective XAI method for breast cancer.
CONCLUSION: It is not yet established that XAI increases users' and doctors' trust in the system. For real-world application, effective and systematic evaluation of its trustworthiness in this scenario is lacking.
PROSPERO REGISTRATION NUMBER: CRD42023458665.
Affiliation(s)
- Daraje Kaba Gurmessa
  - Department of Information Science, Jimma Institute of Technology, Jimma University, Jimma, Oromia, Ethiopia
  - Computer Science, Mattu University, Mattu, Oromīya, Ethiopia
- Worku Jimma
  - Department of Information Science, Jimma Institute of Technology, Jimma University, Jimma, Oromia, Ethiopia
6
Li T, Liu Z, Thakkar S, Roberts R, Tong W. DeepAmes: A deep learning-powered Ames test predictive model with potential for regulatory application. Regul Toxicol Pharmacol 2023; 144:105486. [PMID: 37633327 DOI: 10.1016/j.yrtph.2023.105486] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/17/2023] [Revised: 07/14/2023] [Accepted: 08/23/2023] [Indexed: 08/28/2023]
Abstract
The Ames assay is required by regulatory agencies worldwide to assess the mutagenic potential of consumer products. Alongside this in vitro assay, in silico approaches have been widely used to predict Ames test results, as outlined in the International Council for Harmonization (ICH) guidelines. Building on this in silico approach, here we describe DeepAmes, a high-performance and robust model developed with a novel deep learning (DL) approach for potential utility in regulatory science. DeepAmes was developed with a large and consistent Ames dataset (>10,000 compounds) and was compared with five other standard machine learning (ML) methods. Using a test set of 1,543 compounds, DeepAmes was the best performer in predicting the outcome of the Ames assay. In addition, DeepAmes yielded the best and most stable performance until compounds were >30% outside of the applicability domain (AD). Regarding the potential for regulatory application, a revised version of DeepAmes achieved a much-improved sensitivity of 0.87, up from 0.47. In conclusion, DeepAmes provides a DL-powered model for predicting the results of Ames tests; with its defined AD and clear context of use, DeepAmes has potential for utility in regulatory application.
Affiliation(s)
- Ting Li
  - National Center for Toxicological Research, Food and Drug Administration, Jefferson, AR, USA
- Zhichao Liu
  - National Center for Toxicological Research, Food and Drug Administration, Jefferson, AR, USA
- Shraddha Thakkar
  - Office of Translational Sciences, Center for Drug Evaluation and Research, Food and Drug Administration, Silver Spring, MD, USA
- Ruth Roberts
  - ApconiX Ltd, Alderley Park, Alderley Edge, SK10 4TG, UK; University of Birmingham, Edgbaston, Birmingham, B15 2TT, UK
- Weida Tong
  - National Center for Toxicological Research, Food and Drug Administration, Jefferson, AR, USA
7
Zubrod JP, Galic N, Vaugeois M, Dreier DA. Physiological variables in machine learning QSARs allow for both cross-chemical and cross-species predictions. Ecotoxicology and Environmental Safety 2023; 263:115250. [PMID: 37487435 DOI: 10.1016/j.ecoenv.2023.115250] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/11/2023] [Revised: 06/23/2023] [Accepted: 07/09/2023] [Indexed: 07/26/2023]
Abstract
A major challenge in ecological risk assessment is estimating chemical-induced effects across taxa without species-specific testing. Where ecotoxicological data may be more challenging to gather, information on species physiology is more available for a broad range of taxa. Physiology is known to drive species sensitivity but understanding about the relative contribution of specific underlying processes is still elusive. Consequently, there remains a need to understand which physiological processes lead to differences in species sensitivity. The objective of our study was to utilize existing knowledge about organismal physiology to both understand and predict differences in species sensitivity. Machine learning models were trained to predict chemical- and species-specific endpoints as a function of both chemical fingerprints/descriptors and physiological properties represented by dynamic energy budget (DEB) parameters. We found that random forest models were able to predict chemical- and species-specific endpoints, and that DEB parameters were relatively important in the models, particularly for invertebrates. Our approach illuminates how physiological properties may drive species sensitivity, which will allow more realistic predictions of effects across species without the need for additional animal testing.
Affiliation(s)
- Nika Galic
  - Syngenta Crop Protection AG, Basel, Switzerland
- Maxime Vaugeois
  - Syngenta Crop Protection, LLC, Greensboro, NC, United States
- David A Dreier
  - Syngenta Crop Protection, LLC, Greensboro, NC, United States
8
Liu W, Wang Z, Chen J, Tang W, Wang H. Machine Learning Model for Screening Thyroid Stimulating Hormone Receptor Agonists Based on Updated Datasets and Improved Applicability Domain Metrics. Chem Res Toxicol 2023. [PMID: 37209109 DOI: 10.1021/acs.chemrestox.3c00074] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Indexed: 05/22/2023]
Abstract
Machine learning (ML) models for screening endocrine-disrupting chemicals (EDCs), such as thyroid stimulating hormone receptor (TSHR) agonists, are essential for sound management of chemicals. Previous models for screening TSHR agonists were built on imbalanced datasets and lacked applicability domain (AD) characterization essential for regulatory application. Herein, an updated TSHR agonist dataset was built, for which the ratio of active to inactive compounds greatly increased to 1:2.6, and chemical spaces of structure-activity landscapes (SALs) were enhanced. Resulting models based on 7 molecular representations and 4 ML algorithms were proven to outperform previous ones. Weighted similarity density (ρs) and weighted inconsistency of activities (IA) were proposed to characterize the SALs, and a state-of-the-art AD characterization methodology ADSAL{ρs, IA} was established. An optimal classifier developed with PubChem fingerprints and the random forest algorithm, coupled with ADSAL{ρs ≥ 0.15, IA ≤ 0.65}, exhibited good performance on the validation set with the area under the receiver operating characteristic curve being 0.984 and balanced accuracy being 0.941 and identified 90 TSHR agonist classes that could not be found previously. The classifier together with the ADSAL{ρs, IA} may serve as efficient tools for screening EDCs, and the AD characterization methodology may be applied to other ML models.
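Balanced accuracy, one of the two validation metrics quoted above, is the mean of sensitivity and specificity, which makes it less sensitive than plain accuracy to an imbalanced active-to-inactive ratio. A minimal sketch of the calculation follows; the function name `balanced_accuracy` and the labels are hypothetical and do not come from the TSHR dataset.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = active)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)  # fraction of actives correctly identified
    specificity = tn / (tn + fp)  # fraction of inactives correctly identified
    return (sensitivity + specificity) / 2


# Hypothetical labels: 2 actives and 4 inactives.
y_true = [1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1]
print(balanced_accuracy(y_true, y_pred))  # sensitivity 0.5, specificity 0.75 -> 0.625
```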
Affiliation(s)
- Wenjia Liu
  - Key Laboratory of Industrial Ecology and Environmental Engineering (Ministry of Education), Dalian Key Laboratory on Chemicals Risk Control and Pollution Prevention Technology, School of Environmental Science and Technology, Dalian University of Technology, Dalian 116024, China
- Zhongyu Wang
  - Key Laboratory of Industrial Ecology and Environmental Engineering (Ministry of Education), Dalian Key Laboratory on Chemicals Risk Control and Pollution Prevention Technology, School of Environmental Science and Technology, Dalian University of Technology, Dalian 116024, China
- Jingwen Chen
  - Key Laboratory of Industrial Ecology and Environmental Engineering (Ministry of Education), Dalian Key Laboratory on Chemicals Risk Control and Pollution Prevention Technology, School of Environmental Science and Technology, Dalian University of Technology, Dalian 116024, China
- Weihao Tang
  - Key Laboratory of Industrial Ecology and Environmental Engineering (Ministry of Education), Dalian Key Laboratory on Chemicals Risk Control and Pollution Prevention Technology, School of Environmental Science and Technology, Dalian University of Technology, Dalian 116024, China
- Haobo Wang
  - Key Laboratory of Industrial Ecology and Environmental Engineering (Ministry of Education), Dalian Key Laboratory on Chemicals Risk Control and Pollution Prevention Technology, School of Environmental Science and Technology, Dalian University of Technology, Dalian 116024, China
9
Patlewicz G, Paul-Friedman K, Houck K, Zhang L, Huang R, Xia M, Brown J, Simmons SO. Evaluating the utility of a high throughput thiol-containing fluorescent probe to screen for reactivity: A case study with the Tox21 library. Computational Toxicology (Amsterdam, Netherlands) 2023; 26:10.1016/j.comtox.2023.100271. [PMID: 37388277 PMCID: PMC10304587 DOI: 10.1016/j.comtox.2023.100271] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/01/2023]
Abstract
High-throughput screening (HTS) assays for bioactivity in the Tox21 program aim to evaluate an array of different biological targets and pathways, but a significant barrier to interpretation of these data is the lack of HTS assays intended to identify non-specific reactive chemicals. This is an important aspect for prioritising chemicals to test in specific assays, identifying promiscuous chemicals based on their reactivity, as well as addressing hazards such as skin sensitisation, which are not necessarily initiated by a receptor-mediated effect but act through a non-specific mechanism. Herein, a fluorescence-based HTS assay that allows the identification of thiol-reactive compounds was used to screen 7,872 unique chemicals in the Tox21 10K chemical library. Active chemicals were compared with profiling outcomes using structural alerts encoding electrophilic information. Random Forest classification models based on chemical fingerprints were developed to predict assay outcomes and evaluated through 10-fold stratified cross validation (CV). The mean CV Balanced Accuracy of the validation set was 0.648. The model developed shows promise as a tool to screen untested chemicals for their potential electrophilic reactivity based solely on chemical structural features.
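Stratified cross validation keeps the ratio of active to inactive chemicals roughly the same in every fold, which matters when actives are rare. The sketch below shows one simple way to build such folds; the function name `stratified_folds`, the round-robin assignment, and the example labels are assumptions for illustration, and a real workflow would normally shuffle within each class and rely on an established library routine.

```python
from collections import defaultdict


def stratified_folds(labels, k=10):
    """Assign sample indices to k folds while preserving class proportions."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        # Deal each class's samples round-robin across the folds.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


# Hypothetical assay outcomes: 20 actives (1) and 80 inactives (0).
labels = [1] * 20 + [0] * 80
folds = stratified_folds(labels, k=10)
for number, fold in enumerate(folds, start=1):
    actives = sum(labels[i] for i in fold)
    print(f"fold {number}: {len(fold)} samples, {actives} actives")
```

Each fold then serves once as the validation set while the other nine are used for training, and the per-fold metric is averaged.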
Affiliation(s)
- Grace Patlewicz
  - Center for Computational Toxicology & Exposure (CCTE), U.S. Environmental Protection Agency, Research Triangle Park, Durham, NC, 27709, USA
- Katie Paul-Friedman
  - Center for Computational Toxicology & Exposure (CCTE), U.S. Environmental Protection Agency, Research Triangle Park, Durham, NC, 27709, USA
- Keith Houck
  - Center for Computational Toxicology & Exposure (CCTE), U.S. Environmental Protection Agency, Research Triangle Park, Durham, NC, 27709, USA
- Li Zhang
  - National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, Bethesda, MD 20892, USA
- Ruili Huang
  - National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, Bethesda, MD 20892, USA
- Menghang Xia
  - National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, Bethesda, MD 20892, USA
- Jason Brown
  - Center for Computational Toxicology & Exposure (CCTE), U.S. Environmental Protection Agency, Research Triangle Park, Durham, NC, 27709, USA
- Steven O. Simmons
  - Center for Computational Toxicology & Exposure (CCTE), U.S. Environmental Protection Agency, Research Triangle Park, Durham, NC, 27709, USA
10
Escher BI, Altenburger R, Blüher M, Colbourne JK, Ebinghaus R, Fantke P, Hein M, Köck W, Kümmerer K, Leipold S, Li X, Scheringer M, Scholz S, Schloter M, Schweizer PJ, Tal T, Tetko I, Traidl-Hoffmann C, Wick LY, Fenner K. Modernizing persistence-bioaccumulation-toxicity (PBT) assessment with high throughput animal-free methods. Arch Toxicol 2023; 97:1267-1283. [PMID: 36952002 PMCID: PMC10110678 DOI: 10.1007/s00204-023-03485-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/26/2022] [Accepted: 03/13/2023] [Indexed: 03/24/2023]
Abstract
The assessment of persistence (P), bioaccumulation (B), and toxicity (T) of a chemical is a crucial first step at ensuring chemical safety and is a cornerstone of the European Union's chemicals regulation REACH (Registration, Evaluation, Authorization, and Restriction of Chemicals). Existing methods for PBT assessment are overly complex and cumbersome, have produced incorrect conclusions, and rely heavily on animal-intensive testing. We explore how new-approach methodologies (NAMs) can overcome the limitations of current PBT assessment. We propose two innovative hazard indicators, termed cumulative toxicity equivalents (CTE) and persistent toxicity equivalents (PTE). Together they are intended to replace existing PBT indicators and can also accommodate the emerging concept of PMT (where M stands for mobility). The proposed "toxicity equivalents" can be measured with high throughput in vitro bioassays. CTE refers to the toxic effects measured directly in any given sample, including single chemicals, substitution products, or mixtures. PTE is the equivalent measure of cumulative toxicity equivalents measured after simulated environmental degradation of the sample. With an appropriate panel of animal-free or alternative in vitro bioassays, CTE and PTE comprise key environmental and human health hazard indicators. CTE and PTE do not require analytical identification of transformation products and mixture components but instead prompt two key questions: is the chemical or mixture toxic, and is this toxicity persistent or can it be attenuated by environmental degradation? Taken together, the proposed hazard indicators CTE and PTE have the potential to integrate P, B/M and T assessment into one high-throughput experimental workflow that sidesteps the need for analytical measurements and will support the Chemicals Strategy for Sustainability of the European Union.
Collapse
Affiliation(s)
- Beate I Escher
- Helmholtz Centre for Environmental Research-UFZ, Permoserstr. 15, E04318, Leipzig, Germany.
- Environmental Toxicology, Department of Geosciences, Eberhard Karls University Tübingen, Schnarrenbergstr. 94-96, E72076, Tübingen, Germany.
| | - Rolf Altenburger
- Helmholtz Centre for Environmental Research-UFZ, Permoserstr. 15, E04318, Leipzig, Germany
| | - Matthias Blüher
- Helmholtz Institute for Metabolic, Obesity and Vascular Research (HI-MAG) of the Helmholtz Munich-German Research Centre for Environmental Health (GmbH) at the University of Leipzig and University Hospital Leipzig, Leipzig, Germany
| | - John K Colbourne
- Environmental Genomics Group, School of Biosciences, University of Birmingham, Birmingham, B15 2TT, UK
| | - Ralf Ebinghaus
- Institute of Coastal Environmental Chemistry, Helmholtz Zentrum Hereon, Max-Planck-Straße 1, 21502, Geesthacht, Germany
| | - Peter Fantke
- Quantitative Sustainability Assessment, Department of Environmental and Resource Engineering, Technical University of Denmark, Produktionstorvet 424, 2800, Kgs. Lyngby, Denmark
| | - Michaela Hein
- Helmholtz Centre for Environmental Research-UFZ, Permoserstr. 15, E04318, Leipzig, Germany
| | - Wolfgang Köck
- Helmholtz Centre for Environmental Research-UFZ, Permoserstr. 15, E04318, Leipzig, Germany
| | - Klaus Kümmerer
- Institute of Sustainable and Environmental Chemistry, Leuphana University Lüneburg, Universitätsallee 1, 21335, Lüneburg, Germany
- International Sustainable Chemistry Collaboration Centre (ISC3), Friedrich-Ebert-Allee 32 + 36, D-53113, Bonn, Germany
| | - Sina Leipold
- Helmholtz Centre for Environmental Research-UFZ, Permoserstr. 15, E04318, Leipzig, Germany
- Department for Political Science, Friedrich-Schiller-University Jena, Bachstr. 18k, 07743, Jena, Germany
| | - Xiaojing Li
- Environmental Genomics Group, School of Biosciences, University of Birmingham, Birmingham, B15 2TT, UK
| | - Martin Scheringer
- Institute of Biogeochemistry and Pollutant Dynamics, ETH Zürich, 8092, Zurich, Switzerland
| | - Stefan Scholz
- Helmholtz Centre for Environmental Research-UFZ, Permoserstr. 15, E04318, Leipzig, Germany
| | - Michael Schloter
- Comparative Microbiome Analysis, Environmental Health Centre, Helmholtz Munich - German Research Centre for Environmental Health (GmbH), Ingolstädter Landstr. 1, 85764, Neuherberg, Germany
| | - Pia-Johanna Schweizer
- Research Institute for Sustainability-Helmholtz Centre Potsdam, Berliner Strasse 130, 14467, Potsdam, Germany
| | - Tamara Tal
- Helmholtz Centre for Environmental Research-UFZ, Permoserstr. 15, E04318, Leipzig, Germany
| | - Igor Tetko
- Institute of Structural Biology, Molecular Targets and Therapeutics Centre, Helmholtz Munich - German Research Centre for Environmental Health (GmbH), Ingolstädter Landstr. 1, 85764, Neuherberg, Germany
| | - Claudia Traidl-Hoffmann
- Environmental Medicine Faculty of Medicine, University of Augsburg, Stenglinstrasse 2, 86156, Augsburg, Germany
- Institute of Environmental Medicine, Environmental Health Centre, Helmholtz Munich - German Research Centre for Environmental Health (GmbH), Ingolstädter Landstr. 1, 85764, Neuherberg, Germany
| | - Lukas Y Wick
- Helmholtz Centre for Environmental Research-UFZ, Permoserstr. 15, E04318, Leipzig, Germany
| | - Kathrin Fenner
- Department of Environmental Chemistry, Swiss Federal Institute of Aquatic Science and Technology (Eawag), 8600, Dübendorf, Switzerland
- Department of Chemistry, University of Zürich, 8057, Zurich, Switzerland
|
11
|
Molecular Property Prediction by Combining LSTM and GAT. Biomolecules 2023; 13:biom13030503. [PMID: 36979438 PMCID: PMC10046625 DOI: 10.3390/biom13030503] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 01/06/2023] [Revised: 02/10/2023] [Accepted: 03/06/2023] [Indexed: 03/12/2023] Open
Abstract
Molecular property prediction is an important direction in computer-aided drug design. In this paper, to fully exploit the information in the SMILES strings and graph data of molecules, we combined the SALSTM and GAT methods to mine molecular feature information from both sequences and graphs. Atom embeddings are first obtained from SMILES strings through SALSTM; they are then combined with graph node features and fed into the GAT to extract the global molecular representation. At the same time, data augmentation is used to enlarge the training dataset and improve the performance of the model. Finally, to enhance the interpretability of the model, the attention layers of both models are fused to highlight the key atoms. Comparison with other graph-based and sequence-based methods on multiple datasets shows that our method can achieve high prediction accuracy with good generalizability.
|
12
|
John L, Mahanta HJ, Soujanya Y, Sastry GN. Assessing machine learning approaches for predicting failures of investigational drug candidates during clinical trials. Comput Biol Med 2023; 153:106494. [PMID: 36587568 DOI: 10.1016/j.compbiomed.2022.106494] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/01/2022] [Revised: 11/30/2022] [Accepted: 12/27/2022] [Indexed: 12/30/2022]
Abstract
One of the major challenges in drug development is maintaining acceptable levels of efficacy and safety throughout all phases of clinical trials, followed by a successful market launch. While many factors, such as molecular properties, toxicity parameters, and mechanism of action at the target site, regulate the therapeutic action of a compound, a holistic approach directed towards data-driven studies will invariably strengthen the predictive toxicological sciences. The aim of the current study is to identify reasons why an investigational candidate would fail in clinical trials after multiple iterations of refinement and optimization. We compiled a dataset comprising approved and withdrawn drugs as well as toxic compounds, and used a time-split approach to generate the training and validation sets. Five highly robust and scalable machine learning binary classifiers were used to develop the predictive models, which were trained with features such as molecular descriptors and fingerprints and then validated rigorously against a set of performance metrics. The mean AUC scores for the five classifiers on the hold-out test set were in the range of 0.66-0.71. The models were further used to predict the probability score for the clinical candidate dataset. The top compounds predicted to be toxic were analyzed to estimate different dimensions of toxicity. Through this study, we propose that with the appropriate use of feature extraction and machine learning methods, one can estimate the likelihood of success or failure of investigational drug candidates, thereby opening an avenue for future work in computational toxicology. The models developed in the study can be accessed at https://github.com/gnsastry/predicting_clinical_trials.git.
Affiliation(s)
- Lijo John
- Advanced Computation and Data Sciences Division, CSIR- North East Institute of Science and Technology, Jorhat, 785006, Assam, India; Polymers and Functional Materials Division, CSIR-Indian Institute of Chemical Technology, Hyderabad, 500007, India; Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201002, Uttar Pradesh, India
| | - Hridoy Jyoti Mahanta
- Advanced Computation and Data Sciences Division, CSIR- North East Institute of Science and Technology, Jorhat, 785006, Assam, India; Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201002, Uttar Pradesh, India
| | - Y Soujanya
- Polymers and Functional Materials Division, CSIR-Indian Institute of Chemical Technology, Hyderabad, 500007, India; Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201002, Uttar Pradesh, India
| | - G Narahari Sastry
- Advanced Computation and Data Sciences Division, CSIR- North East Institute of Science and Technology, Jorhat, 785006, Assam, India; Polymers and Functional Materials Division, CSIR-Indian Institute of Chemical Technology, Hyderabad, 500007, India; Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201002, Uttar Pradesh, India.
|
13
|
Belfield SJ, Cronin MTD, Enoch SJ, Firman JW. Guidance for good practice in the application of machine learning in development of toxicological quantitative structure-activity relationships (QSARs). PLoS One 2023; 18:e0282924. [PMID: 37163504 PMCID: PMC10171609 DOI: 10.1371/journal.pone.0282924] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 10/11/2022] [Accepted: 02/26/2023] [Indexed: 05/12/2023] Open
Abstract
Recent years have seen substantial growth in the adoption of machine learning approaches for quantitative structure-activity relationship (QSAR) development. This trend has coincided with a desire to shift the focus of methodology employed within chemical safety assessment: away from traditional reliance upon animal-intensive in vivo protocols, and towards increased application of in silico (or computational) predictive toxicology. With QSAR central amongst the techniques applied in this area, algorithms trained through machine learning with the objective of toxicity estimation have, quite naturally, emerged. On account of the pattern-recognition capabilities of the underlying methods, the statistical power of the ensuing models is potentially considerable, appropriate even for the handling of vast, heterogeneous datasets. However, such potency comes at a price, manifesting as the general practical deficits observed with respect to the reproducibility, interpretability and generalisability of the resulting tools. Unsurprisingly, these elements have served to hinder broader uptake (most notably within a regulatory setting). Areas of uncertainty liable to accompany (and hence detract from the applicability of) toxicological QSAR have previously been highlighted, accompanied by suggestions for "best practice" aimed at mitigating their influence. However, the scope of such exercises has remained limited to "classical" QSAR, that is, QSAR conducted through linear regression and related techniques, with the adoption of comparatively few features or descriptors. Accordingly, the intention of this study has been to extend the remit of best practice guidance, so as to address concerns specific to the employment of machine learning within the field. In doing so, the impact of strategies aimed at enhancing the transparency (feature importance, feature reduction), generalisability (cross-validation) and predictive power (hyperparameter optimisation) of algorithms, trained upon real toxicity data through six common learning approaches, is evaluated.
Affiliation(s)
- Samuel J Belfield
- School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, Liverpool, United Kingdom
| | - Mark T D Cronin
- School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, Liverpool, United Kingdom
| | - Steven J Enoch
- School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, Liverpool, United Kingdom
| | - James W Firman
- School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, Liverpool, United Kingdom
|
14
|
Hasannejadasl H, Osong B, Bermejo I, van der Poel H, Vanneste B, van Roermund J, Aben K, Zhang Z, Kiemeney L, Van Oort I, Verwey R, Hochstenbach L, Bloemen E, Dekker A, Fijten RRR. A comparison of machine learning models for predicting urinary incontinence in men with localized prostate cancer. Front Oncol 2023; 13:1168219. [PMID: 37124522 PMCID: PMC10130634 DOI: 10.3389/fonc.2023.1168219] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/17/2023] [Accepted: 03/13/2023] [Indexed: 05/02/2023] Open
Abstract
Introduction: Urinary incontinence (UI) is a common side effect of prostate cancer treatment, but in clinical practice it is difficult to predict. Machine learning (ML) models have shown promising results in predicting outcomes, yet the lack of transparency in complex "black-box" models has made clinicians wary of relying on them in sensitive decisions. Therefore, finding a balance between accuracy and explainability is crucial for the implementation of ML models. The aim of this study was to employ three different ML classifiers to predict the probability of experiencing UI in men with localized prostate cancer 1 year and 2 years after treatment and to compare their accuracy and explainability. Methods: We used the ProZIB dataset from the Netherlands Comprehensive Cancer Organization (Integraal Kankercentrum Nederland; IKNL), which contained clinical, demographic, and PROM data of 964 patients from 65 Dutch hospitals. Logistic Regression (LR), Random Forest (RF), and Support Vector Machine (SVM) algorithms were applied to predict (in)continence after prostate cancer treatment. Results: All models were externally validated according to the TRIPOD Type 3 guidelines, and their performance was assessed by accuracy, sensitivity, specificity, and AUC. While all three models demonstrated similar performance, LR showed slightly better accuracy than RF and SVM in predicting the risk of UI one year after prostate cancer treatment, achieving an accuracy of 0.75, a sensitivity of 0.82, and an AUC of 0.79. All models for the 2-year outcome performed poorly in the validation set, with an accuracy of 0.6 for LR, 0.65 for RF, and 0.54 for SVM. Conclusion: The outcomes of our study demonstrate the promise of using non-black-box models, such as LR, to assist clinicians in recognizing high-risk patients and making informed treatment choices. The coefficients of the LR model show the importance of each feature in predicting results, and the generated nomogram provides an accessible illustration of how each feature impacts the predicted outcome. Additionally, the model's simplicity and interpretability make it a more appropriate option in scenarios where comprehending the model's predictions is essential.
Affiliation(s)
- Hajar Hasannejadasl
- Department of Radiation Oncology (MAASTRO), GROW School for Oncology and Reproduction, Maastricht University Medical Center, Maastricht, Netherlands
| | - Biche Osong
- Department of Radiation Oncology (MAASTRO), GROW School for Oncology and Reproduction, Maastricht University Medical Center, Maastricht, Netherlands
| | - Inigo Bermejo
- Department of Radiation Oncology (MAASTRO), GROW School for Oncology and Reproduction, Maastricht University Medical Center, Maastricht, Netherlands
| | - Henk van der Poel
- Department of Urology, Netherlands Cancer Institute, Amsterdam, and Amsterdam University Medical Centers, Amsterdam, Netherlands
| | - Ben Vanneste
- Department of Radiation Oncology (MAASTRO), GROW School for Oncology and Reproduction, Maastricht University Medical Center, Maastricht, Netherlands
- Department of Human Structure and Repair, Department of Radiation Oncology, Ghent University Hospital, Ghent, Belgium
| | - Joep van Roermund
- Department of Urology, Maastricht University Medical Center, Maastricht, Netherlands
| | - Katja Aben
- Department of Research and Development, Netherlands Comprehensive Cancer Organization, Utrecht, Netherlands
- Radboud Institute for Health Sciences, Radboud University Medical Center, Nijmegen, Netherlands
| | - Zhen Zhang
- Department of Radiation Oncology (MAASTRO), GROW School for Oncology and Reproduction, Maastricht University Medical Center, Maastricht, Netherlands
| | - Lambertus Kiemeney
- Radboud Institute for Health Sciences, Radboud University Medical Center, Nijmegen, Netherlands
| | - Inge Van Oort
- Department of Urology, Radboud University Medical Center, Nijmegen, Netherlands
| | - Renee Verwey
- Center of Expertise for Innovative Care and Technology (EIZT), School of Nursing, Zuyd University of Applied Sciences, Heerlen, Netherlands
| | - Laura Hochstenbach
- Center of Expertise for Innovative Care and Technology (EIZT), School of Nursing, Zuyd University of Applied Sciences, Heerlen, Netherlands
| | - Esther Bloemen
- Center of Expertise for Innovative Care and Technology (EIZT), School of Nursing, Zuyd University of Applied Sciences, Heerlen, Netherlands
- Expertise Center Empowering Healthy Behavior, Fontys University of Applied Sciences, Eindhoven, Netherlands
| | - Andre Dekker
- Department of Radiation Oncology (MAASTRO), GROW School for Oncology and Reproduction, Maastricht University Medical Center, Maastricht, Netherlands
| | - Rianne R. R. Fijten
- Department of Radiation Oncology (MAASTRO), GROW School for Oncology and Reproduction, Maastricht University Medical Center, Maastricht, Netherlands
- *Correspondence: Rianne R. R. Fijten,
|
15
|
Bifarin OO. Interpretable machine learning with tree-based shapley additive explanations: Application to metabolomics datasets for binary classification. PLoS One 2023; 18:e0284315. [PMID: 37141218 PMCID: PMC10159207 DOI: 10.1371/journal.pone.0284315] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/29/2022] [Accepted: 03/28/2023] [Indexed: 05/05/2023] Open
Abstract
Machine learning (ML) models are used in clinical metabolomics studies, most notably for biomarker discovery, to identify metabolites that discriminate between a case and a control group. To improve understanding of the underlying biomedical problem and to bolster confidence in these discoveries, model interpretability is germane. In metabolomics, partial least squares discriminant analysis (PLS-DA) and its variants are widely used, partly due to the model's interpretability through Variable Influence in Projection (VIP) scores, a global interpretability method. Herein, Tree-based Shapley Additive explanations (SHAP), an interpretable ML method grounded in game theory, was used to explain ML models with local explanation properties. In this study, ML experiments (binary classification) were conducted for three published metabolomics datasets using PLS-DA, random forests, gradient boosting, and extreme gradient boosting (XGBoost). Using one of the datasets, the PLS-DA model was explained using VIP scores, while one of the best-performing models, a random forest model, was interpreted using Tree SHAP. The results show that SHAP offers greater explanatory depth than PLS-DA's VIP, making it a powerful method for rationalizing machine learning predictions from metabolomics studies.
Affiliation(s)
- Olatomiwa O Bifarin
- Department of Biochemistry and Molecular Biology, University of Georgia, Athens, Georgia, United States of America
|
16
|
Luo Y, Cuneo KC, Lawrence TS, Matuszak MM, Dawson LA, Niraula D, Ten Haken RK, El Naqa I. A human-in-the-loop based Bayesian network approach to improve imbalanced radiation outcomes prediction for hepatocellular cancer patients with stereotactic body radiotherapy. Front Oncol 2022; 12:1061024. [PMID: 36568208 PMCID: PMC9782976 DOI: 10.3389/fonc.2022.1061024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2022] [Accepted: 11/01/2022] [Indexed: 12/13/2022] Open
Abstract
Background: Imbalanced outcomes are a common characteristic of oncology datasets. Current machine learning approaches have limitations in learning from such datasets. Here, we propose to resolve this problem by utilizing a human-in-the-loop (HITL) approach, which we hypothesize will also lead to more accurate and explainable outcome prediction models. Methods: A total of 119 HCC patients with 163 tumors were used in the study. 81 patients with 104 tumors from the University of Michigan Hospital treated with SBRT were considered as a discovery dataset for radiation outcomes model building. The external testing dataset included 59 tumors from 38 patients with SBRT from Princess Margaret Hospital. In the discovery dataset, 100 tumors from 77 patients had local control (LC) (96% of 104 tumors) and 23 patients had at least one grade increment of ALBI (I-ALBI) during six-month follow-up (28% of 81 patients). Each patient had a total of 110 features, of which 15 or 20 were identified by physicians as expert knowledge features (EKFs) for LC or I-ALBI prediction, respectively. We proposed a HITL-based Bayesian network (HITL-BN) approach to enhance the capability of selecting important features from imbalanced data, in terms of accuracy and explainability, through human participation by integrating feature importance ranking and Markov blanket algorithms. A pure data-driven Bayesian network (PD-BN) method was applied to the same discovery dataset of HCC patients as a benchmark. Results: For the HITL-BN models predicting LC and I-ALBI during SBRT, the areas under the receiver operating characteristic curves were 0.85 (95% confidence interval: 0.75-0.95) and 0.89 (0.81-0.95) in the training phase, and 0.77 and 0.78 in the testing phase, respectively. According to DeLong tests on the discovery cross-validation and testing datasets, they significantly outperformed the during-treatment PD-BN model in predicting LC or I-ALBI. Conclusion: By allowing the human expert to be part of the model building process, the HITL-BN approach yielded significantly improved accuracy as well as better explainability than the PD-BN method when dealing with imbalanced outcomes in the prediction of post-SBRT treatment response of HCC patients.
Affiliation(s)
- Yi Luo
- Department of Machine Learning, Moffitt Cancer Center, Tampa, FL, United States,*Correspondence: Yi Luo,
| | - Kyle C. Cuneo
- Department of Radiation Oncology, University of Michigan, Ann Arbor, MI, United States
| | - Theodore S. Lawrence
- Department of Radiation Oncology, University of Michigan, Ann Arbor, MI, United States
| | - Martha M. Matuszak
- Department of Radiation Oncology, University of Michigan, Ann Arbor, MI, United States
| | - Laura A. Dawson
- Department of Radiation Oncology, University of Toronto, Toronto, ON, Canada
| | - Dipesh Niraula
- Department of Machine Learning, Moffitt Cancer Center, Tampa, FL, United States
| | - Randall K. Ten Haken
- Department of Radiation Oncology, University of Michigan, Ann Arbor, MI, United States
| | - Issam El Naqa
- Department of Machine Learning, Moffitt Cancer Center, Tampa, FL, United States
|
17
|
Chen P, Wang R, Chen G, An B, Liu M, Wang Q, Tao Y. Thyroid endocrine disruption and hepatotoxicity induced by bisphenol AF: Integrated zebrafish embryotoxicity test and deep learning. THE SCIENCE OF THE TOTAL ENVIRONMENT 2022; 822:153639. [PMID: 35131240 DOI: 10.1016/j.scitotenv.2022.153639] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/18/2021] [Revised: 01/28/2022] [Accepted: 01/29/2022] [Indexed: 06/14/2023]
Abstract
Bisphenol AF (BPAF) is an emerging contaminant prevalent in the environment as one of the main substitutes of bisphenol A (BPA). Our previous study found that BPAF exhibited estrogenic effects in zebrafish larvae, while little is known about its effects on the thyroid and liver. A 7 d zebrafish embryotoxicity test was conducted to study the potential thyroid disruption and hepatotoxicity of BPAF. BPAF decreased levels of thyroid hormones and deiodinases but increased expression of transthyretin at 12.5 and 125 μg/L after 7 d exposure, indicating that both the metabolism and transport of thyroid hormones were perturbed. The thyroid hormone receptor (TR) levels decreased significantly upon exposure to ≥12.5 μg/L BPAF, implying that BPAF acts as a TR antagonist, which coincided well with the prediction from the Directed Message Passing Neural Network (DMPNN). Liver impairment (mainly necrosis of hepatocytes) and apoptosis were triggered by 125 μg/L and ≥12.5 μg/L BPAF, respectively, accompanied by increased activities of caspase 3 and caspase 9. Thus BPAF might not be a safe alternative to BPA given its thyroid and liver toxicity. DMPNN appears useful for screening for thyroid-disrupting activity from molecular structures.
Affiliation(s)
- Pengyu Chen
- College of Oceanography, Hohai University, Nanjing 210024, China
| | - Ruihan Wang
- College of Chemistry, Sichuan University, Chengdu, Sichuan 610064, China
| | - Geng Chen
- School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou 330106, China
| | - Baihui An
- College of Oceanography, Hohai University, Nanjing 210024, China
| | - Ming Liu
- College of Oceanography, Hohai University, Nanjing 210024, China
| | - Qiang Wang
- Agro-Environmental Protection Institute, Ministry of Agriculture and Rural Affairs, Tianjin 300191, China
| | - Yuqiang Tao
- College of Oceanography, Hohai University, Nanjing 210024, China.
|
18
|
Jeong K, Lee JY, Woo S, Kim D, Jeon Y, Ryu TI, Hwang SR, Jeong WH. Vapor Pressure and Toxicity Prediction for Novichok Agent Candidates Using Machine Learning Model: Preparation for Unascertained Nerve Agents after Chemical Weapons Convention Schedule 1 Update. Chem Res Toxicol 2022; 35:774-781. [PMID: 35317551 DOI: 10.1021/acs.chemrestox.1c00410] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 11/29/2022]
Abstract
The recent terrorist attacks using Novichok agents and subsequent operations have necessitated an understanding of their physicochemical properties, such as vapor pressure and toxicity, as well as of unascertained nerve agent structures. To prevent continued threats from new types of nerve agents, the Organisation for the Prohibition of Chemical Weapons (OPCW) updated the Chemical Weapons Convention (CWC) Schedule 1 list. However, this information is vague and may encompass more than 10 000 possible chemical structures, which makes it almost impossible to synthesize them and measure their properties and toxicity. To assist this effort, we developed machine learning (ML) models to predict the vapor pressure to help with escape and removal operations. When applied to Novichok materials, the model shows robust, high-accuracy performance and gives predictions with reasonable errors. An ML classification model was also built for the swallow Globally Harmonized System class of organophosphorus (OP) compounds for toxicity predictions. The tuned ML model was used to predict the toxicity of Novichok agents, as described in the CWC list. Although its accuracy and linearity can be improved, this ML model is expected to be a firm basis for developing more accurate models for predicting the vapor pressure and toxicity of nerve agents, to help handle future terror attacks with unknown nerve agents.
Affiliation(s)
- Keunhong Jeong
- Department of Chemistry, Korea Military Academy, Seoul 01805, South Korea
| | - Jin-Young Lee
- Agency for Defense Development (ADD), P.O. Box 35, Yuseong-gu, Daejeon 34186, South Korea
| | - Seungmin Woo
- Department of Nuclear and Energy Engineering, Jeju National University, Jeju, 63243, South Korea
| | - Dongwoo Kim
- Department of Chemistry, Korea Military Academy, Seoul 01805, South Korea
| | - Yonggoon Jeon
- Department of Chemistry, Korea Military Academy, Seoul 01805, South Korea
| | - Tae In Ryu
- Accident Coordination and Training Division, National Institute of Chemical Safety (NICS), 90 Gajeongbuk-rO, Yuseong-gu, Daejeon 34114, South Korea
| | - Seung-Ryul Hwang
- Accident Coordination and Training Division, National Institute of Chemical Safety (NICS), 90 Gajeongbuk-rO, Yuseong-gu, Daejeon 34114, South Korea
| | - Woo-Hyeon Jeong
- Agency for Defense Development (ADD), P.O. Box 35, Yuseong-gu, Daejeon 34186, South Korea
|
19
|
Liu X, Lu D, Zhang A, Liu Q, Jiang G. Data-Driven Machine Learning in Environmental Pollution: Gains and Problems. ENVIRONMENTAL SCIENCE & TECHNOLOGY 2022; 56:2124-2133. [PMID: 35084840 DOI: 10.1021/acs.est.1c06157] [Citation(s) in RCA: 64] [Impact Index Per Article: 32.0] [Indexed: 06/14/2023]
Abstract
The complexity and dynamics of the environment make it extremely difficult to directly predict and trace the temporal and spatial changes in pollution. In the past decade, the unprecedented accumulation of data, the development of high-performance computing power, and the rise of diverse machine learning (ML) methods have provided new opportunities for environmental pollution research. ML methodology has been used in satellite data processing to obtain ground-level concentrations of atmospheric pollutants, in pollution source apportionment, and in spatial distribution modeling of water pollutants. However, unlike the active use of ML in chemical toxicity prediction, advanced algorithms such as deep neural networks are still rarely applied in environmental process studies of pollutants. In addition, over 40% of the environmental applications of ML concern air pollution, and its application range and acceptance in other areas of environmental science remain to be expanded. The use of ML methods to revolutionize environmental science and its problem-solving scenarios has its own challenges. Several issues should be taken into consideration, such as the tradeoff between model performance and interpretability, prerequisites of the machine learning model, model selection, and data sharing.
Affiliation(s)
- Xian Liu
- State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, People's Republic of China
| | - Dawei Lu
- State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, People's Republic of China
| | - Aiqian Zhang
- State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, People's Republic of China
- School of Environment, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou 310012, People's Republic of China
- College of Resources and Environment, University of Chinese Academy of Sciences, Beijing 100190, People's Republic of China
- Institute of Environment and Health, Jianghan University, Wuhan 430056, People's Republic of China
| | - Qian Liu
- State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, People's Republic of China
- College of Resources and Environment, University of Chinese Academy of Sciences, Beijing 100190, People's Republic of China
- Institute of Environment and Health, Jianghan University, Wuhan 430056, People's Republic of China
| | - Guibin Jiang
- State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, People's Republic of China
- School of Environment, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou 310012, People's Republic of China
|
20
|
Guan S, Fu N. Class imbalance learning with Bayesian optimization applied in drug discovery. Sci Rep 2022; 12:2069. [PMID: 35136094 PMCID: PMC8827090 DOI: 10.1038/s41598-022-05717-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2021] [Accepted: 01/11/2022] [Indexed: 11/12/2022] Open
Abstract
Machine intelligence (MI), including machine learning and deep learning, has been regarded as a promising means of reducing the prohibitively high cost of drug development. However, a dilemma within MI has limited its wide application: machine learning models are easier to interpret but yield worse predictive performance than deep learning models. Therefore, we propose a pipeline called Class Imbalance Learning with Bayesian Optimization (CILBO) to improve the performance of machine learning models in drug discovery. To demonstrate the efficacy of the CILBO pipeline, we developed an example model to predict antibacterial candidates. Comparison of the antibacterial prediction performance between our model and a well-known deep learning model published by Stokes et al. suggests that our model can perform as well as the deep learning model in drug activity prediction. The CILBO pipeline we propose provides a simple, alternative approach to accelerate preliminary screenings and decrease the cost of drug discovery.
Affiliation(s)
- Shenmin Guan
- Shanghai GenomSeqCare Biotechnology Co. Ltd., Shanghai, 200052, China.
| | - Ning Fu
- Shanghai GenomSeqCare Biotechnology Co. Ltd., Shanghai, 200052, China
|
21
|
Hao Y, Moore JH. TargetTox: A Feature Selection Pipeline for Identifying Predictive Targets Associated with Drug Toxicity. J Chem Inf Model 2021; 61:5386-5394. [PMID: 34757743 DOI: 10.1021/acs.jcim.1c00733] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 11/28/2022]
Abstract
In silico assessment of drug toxicity is becoming a critical step in drug development. Conventional ligand-based models are limited by low accuracy and lack of interpretability. Further, they often fail to explain cellular mechanisms underlying structure-toxicity associations. We addressed these limitations by incorporating the target profile as an intermediate connecting structure to toxicity. To accommodate the high-dimensional feature space, we developed a pipeline named TargetTox that can identify a subset of predictive features. We implemented TargetTox to study 569 targets and 815 adverse events. The features identified by TargetTox comprise less than 10% of the original feature space; nevertheless, they accurately predicted binding outcomes for 377 targets and toxicity outcomes for 36 adverse events. We demonstrated that predictive targets tend to be differentially expressed in the tissue of toxicity. We also rediscovered key cellular functions associated with cardiotoxicity from the predictive targets, as well as markers of skin and liver diseases. Furthermore, we found evidence supporting diagnostic and therapeutic applications of some predictive targets in hepatotoxicity and nephrotoxicity. Our findings highlighted the critical role of predictive targets in cellular mechanisms leading to toxicity. In general, our study improved the interpretability of toxicity prediction without sacrificing accuracy. Our novel pipeline may benefit future studies of high-dimensional data sets.
Affiliation(s)
- Yun Hao
- Genomics and Computational Biology (GCB) Graduate Program, University of Pennsylvania, Philadelphia, Pennsylvania 19104, United States
- Jason H Moore
- Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania 19104, United States
|
22
|
Fuhrman JD, Gorre N, Hu Q, Li H, El Naqa I, Giger ML. A review of explainable and interpretable AI with applications in COVID-19 imaging. Med Phys 2021; 49:1-14. [PMID: 34796530 PMCID: PMC8646613 DOI: 10.1002/mp.15359] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Received: 06/28/2021] [Revised: 10/14/2021] [Accepted: 10/25/2021] [Indexed: 12/24/2022] Open
Abstract
The development of medical imaging artificial intelligence (AI) systems for evaluating COVID‐19 patients has demonstrated potential for improving clinical decision making and assessing patient outcomes during the recent COVID‐19 pandemic. These systems have been applied to many medical imaging tasks, including disease diagnosis and patient prognosis, and have augmented other clinical measurements to better inform treatment decisions. Because these systems are used in life‐or‐death decisions, clinical implementation relies on user trust in the AI output. This has caused many developers to utilize explainability techniques in an attempt to help a user understand when an AI algorithm is likely to succeed as well as which cases may be problematic for automatic assessment, thus increasing the potential for rapid clinical translation. AI application to COVID‐19 has been marred by controversy recently. This review discusses several aspects of explainable and interpretable AI as it pertains to the evaluation of COVID‐19 disease and how it can restore trust in AI application to this disease. This includes the identification of common tasks that are relevant to explainable medical imaging AI, an overview of several modern approaches for producing explainable output as appropriate for a given imaging scenario, a discussion of how to evaluate explainable AI, and recommendations for best practices in explainable/interpretable AI implementation. This review will allow developers of AI systems for COVID‐19 to quickly understand the basics of several explainable AI techniques and assist in the selection of an approach that is both appropriate and effective for a given scenario.
Affiliation(s)
- Jordan D Fuhrman
- Medical Imaging and Data Resource Center (MIDRC), The University of Chicago, Chicago, Illinois, USA; Department of Radiology, The University of Chicago, Chicago, Illinois, USA
- Naveena Gorre
- Medical Imaging and Data Resource Center (MIDRC), The University of Chicago, Chicago, Illinois, USA; Department of Machine Learning, Moffitt Cancer Center, Tampa, Florida, USA
- Qiyuan Hu
- Medical Imaging and Data Resource Center (MIDRC), The University of Chicago, Chicago, Illinois, USA; Department of Radiology, The University of Chicago, Chicago, Illinois, USA
- Hui Li
- Medical Imaging and Data Resource Center (MIDRC), The University of Chicago, Chicago, Illinois, USA; Department of Radiology, The University of Chicago, Chicago, Illinois, USA
- Issam El Naqa
- Medical Imaging and Data Resource Center (MIDRC), The University of Chicago, Chicago, Illinois, USA; Department of Machine Learning, Moffitt Cancer Center, Tampa, Florida, USA
- Maryellen L Giger
- Medical Imaging and Data Resource Center (MIDRC), The University of Chicago, Chicago, Illinois, USA; Department of Radiology, The University of Chicago, Chicago, Illinois, USA
|
23
|
Kleinstreuer NC, Tetko IV, Tong W. Introduction to Special Issue: Computational Toxicology. Chem Res Toxicol 2021; 34:171-175. [PMID: 33583184 DOI: 10.1021/acs.chemrestox.1c00032] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 02/07/2023]
|