1
Arturi K, Harris EJ, Gasser L, Escher BI, Braun G, Bosshard R, Hollender J. MLinvitroTox reloaded for high-throughput hazard-based prioritization of high-resolution mass spectrometry data. J Cheminform 2025; 17:14. [PMID: 39891244; PMCID: PMC11786476; DOI: 10.1186/s13321-025-00950-4] [Received: 08/31/2024] [Accepted: 01/06/2025] [Indexed: 02/03/2025] Open
Abstract
MLinvitroTox is an automated Python pipeline developed for high-throughput hazard-driven prioritization of toxicologically relevant signals detected in complex environmental samples through high-resolution tandem mass spectrometry (HRMS/MS). MLinvitroTox is a machine learning (ML) framework comprising 490 independent XGBoost classifiers trained on molecular fingerprints from chemical structures and target-specific endpoints from the ToxCast/Tox21 invitroDBv4.1 database. For each analyzed HRMS feature, MLinvitroTox generates a 490-bit bioactivity fingerprint used as a basis for prioritization, focusing the time-consuming molecular identification efforts on features most likely to cause adverse effects. The practical advantages of MLinvitroTox are demonstrated for groundwater HRMS data. Among the 874 features for which molecular fingerprints were derived from spectra, including 630 nontargets, 185 spectral matches, and 59 targets, around 4% of the feature/endpoint relationship pairs were predicted to be active. Cross-checking the predictions for targets and spectral matches with invitroDB data confirmed the bioactivity of 120 active and 6791 nonactive pairs while mislabeling 88 active and 56 non-active relationships. By filtering according to bioactivity probability, endpoint scores, and similarity to the training data, the number of potentially toxic features was reduced by at least one order of magnitude. This refinement makes the analytical confirmation of the toxicologically most relevant features feasible, offering significant benefits for cost-efficient chemical risk assessment.
Scientific Contribution: In contrast to the classical ML-based approaches for toxicity prediction, MLinvitroTox predicts bioactivity for HRMS features (i.e., distinct m/z signals) based on MS2 fragmentation spectra rather than the chemical structures from the identified features. While the original proof of concept study was accompanied by the release of a MLinvitroTox v1 KNIME workflow, in this study, we release a Python MLinvitroTox v2 package, which, in addition to automation, expands functionality to include predicting toxicity from structures, cleaning up and generating chemical fingerprints, customizing models, and retraining on custom data. Furthermore, as a result of improvements in bioactivity data processing, realized in the concurrently released pytcpl Python package for the custom processing of invitroDBv4.1 input data used for training MLinvitroTox, the current release introduces enhancements in model accuracy, coverage of biological mechanistic targets, and overall interpretability.
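As a rough check on the cross-checking figures quoted in the abstract, the four reported counts can be turned into agreement rates. The sketch below is illustrative arithmetic only and is not part of MLinvitroTox; the variable names are invented here. Because the abstract does not state whether the 88 mislabeled active pairs are missed actives or false alarms, the second ratio is deliberately not labeled as recall or precision.

```python
# Counts of feature/endpoint pairs as reported in the abstract
# (targets and spectral matches cross-checked against invitroDB).
confirmed_active = 120
confirmed_nonactive = 6791
mislabeled_active = 88
mislabeled_nonactive = 56

# All pairs for which a prediction could be compared with invitroDB data.
total_pairs = (confirmed_active + confirmed_nonactive
               + mislabeled_active + mislabeled_nonactive)

# Fraction of compared pairs where prediction and invitroDB agree.
overall_agreement = (confirmed_active + confirmed_nonactive) / total_pairs

# Agreement restricted to the pairs the abstract counts as "active"
# (confirmed plus mislabeled); the exact metric this corresponds to
# depends on how "mislabeled active" is defined in the paper.
active_agreement = confirmed_active / (confirmed_active + mislabeled_active)

print(f"pairs compared:    {total_pairs}")
print(f"overall agreement: {overall_agreement:.3f}")
print(f"active agreement:  {active_agreement:.3f}")
```

The overall agreement comes out at roughly 98%, but it is dominated by the much larger non-active class; among the active pairs the agreement is closer to 58%, which is why the additional filters named in the abstract (bioactivity probability, endpoint scores, similarity to the training data) matter for prioritization.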
Affiliation(s)
- Katarzyna Arturi
  - Department of Environmental Chemistry, Swiss Federal Institute of Aquatic Science and Technology (Eawag), Überlandstrasse 133, 8600, Dübendorf, Switzerland.
- Eliza J Harris
  - Swiss Data Science Center (SDSC), Andreasstrasse 5, 8092, Zürich, Switzerland
  - Now at: Climate and Environmental Physics Division, University of Bern, Sidlerstrasse 5, 3012, Bern, Switzerland
- Lilian Gasser
  - Swiss Data Science Center (SDSC), Andreasstrasse 5, 8092, Zürich, Switzerland
- Beate I Escher
  - Cell Toxicology, Helmholtz Centre for Environmental Research (UFZ), Permoserstr. 15, 04318, Leipzig, Germany
- Georg Braun
  - Cell Toxicology, Helmholtz Centre for Environmental Research (UFZ), Permoserstr. 15, 04318, Leipzig, Germany
- Robin Bosshard
  - Department of Computer Science, Eidgenössische Technische Hochschule Zürich (ETH Zürich), Universitätstrasse 6, 8092, Zürich, Switzerland
- Juliane Hollender
  - Department of Environmental Chemistry, Swiss Federal Institute of Aquatic Science and Technology (Eawag), Überlandstrasse 133, 8600, Dübendorf, Switzerland.
  - Institute of Biogeochemistry and Pollutant Dynamics, Eidgenössische Technische Hochschule Zürich (ETH Zürich), Rämistrasse 101, 8092, Zürich, Switzerland.
2
Alvarez-Mora I, Arturi K, Béen F, Buchinger S, El Mais AER, Gallampois C, Hahn M, Hollender J, Houtman C, Johann S, Krauss M, Lamoree M, Margalef M, Massei R, Brack W, Muz M. Progress, applications, and challenges in high-throughput effect-directed analysis for toxicity driver identification - is it time for HT-EDA? Anal Bioanal Chem 2025; 417:451-472. [PMID: 38992177; DOI: 10.1007/s00216-024-05424-4] [Received: 04/30/2024] [Revised: 06/21/2024] [Accepted: 06/24/2024] [Indexed: 07/13/2024]
Abstract
The rapid increase in the production and global use of chemicals and their mixtures has raised concerns about their potential impact on human and environmental health. With advances in analytical techniques, in particular, high-resolution mass spectrometry (HRMS), thousands of compounds and transformation products with potential adverse effects can now be detected in environmental samples. However, identifying and prioritizing the toxicity drivers among these compounds remain a significant challenge. Effect-directed analysis (EDA) emerged as an important tool to address this challenge, combining biotesting, sample fractionation, and chemical analysis to unravel toxicity drivers in complex mixtures. Traditional EDA workflows are labor-intensive and time-consuming, hindering large-scale applications. The concept of high-throughput (HT) EDA has recently gained traction as a means of accelerating these workflows. Key features of HT-EDA include the combination of microfractionation and downscaled bioassays, automation of sample preparation and biotesting, and efficient data processing workflows supported by novel computational tools. In addition to microplate-based fractionation, high-performance thin-layer chromatography (HPTLC) offers an interesting alternative to HPLC in HT-EDA. This review provides an updated perspective on the state-of-the-art in HT-EDA, and novel methods/tools that can be incorporated into HT-EDA workflows. It also discusses recent studies on HT-EDA, HT bioassays, and computational prioritization tools, along with considerations regarding HPTLC. By identifying current gaps in HT-EDA and proposing new approaches to overcome them, this review aims to bring HT-EDA a step closer to monitoring applications.
Affiliation(s)
- Iker Alvarez-Mora
  - Department of Exposure Science, Helmholtz Centre for Environmental Research, UFZ, Leipzig, Germany.
  - Research Centre for Experimental Marine Biology and Biotechnology (PIE), University of the Basque Country (UPV/EHU), Plentzia, Basque Country, Spain.
- Katarzyna Arturi
  - Eawag, Swiss Federal Institute of Aquatic Science and Technology, Dübendorf, Switzerland
- Frederic Béen
  - KWR Water Research Institute, Nieuwegein, the Netherlands
  - Chemistry for Environment and Health, Amsterdam Institute for Life and Environment (A-LIFE), Vrije Universiteit Amsterdam, Amsterdam, the Netherlands
- Sebastian Buchinger
  - Department of Biochemistry and Ecotoxicology, Federal Institute of Hydrology (BfG), Koblenz, Germany
- Meike Hahn
  - Department of Biochemistry and Ecotoxicology, Federal Institute of Hydrology (BfG), Koblenz, Germany
- Juliane Hollender
  - Eawag, Swiss Federal Institute of Aquatic Science and Technology, Dübendorf, Switzerland
  - Institute of Biogeochemistry and Pollutant Dynamics, ETH Zurich, Zürich, Switzerland
- Corine Houtman
  - Chemistry for Environment and Health, Amsterdam Institute for Life and Environment (A-LIFE), Vrije Universiteit Amsterdam, Amsterdam, the Netherlands
  - The Water Laboratory, Haarlem, the Netherlands
- Sarah Johann
  - Department of Evolutionary Ecology and Environmental Toxicology, Goethe University Frankfurt, Frankfurt Am Main, Germany
- Martin Krauss
  - Department of Exposure Science, Helmholtz Centre for Environmental Research, UFZ, Leipzig, Germany
- Marja Lamoree
  - Chemistry for Environment and Health, Amsterdam Institute for Life and Environment (A-LIFE), Vrije Universiteit Amsterdam, Amsterdam, the Netherlands
- Maria Margalef
  - Chemistry for Environment and Health, Amsterdam Institute for Life and Environment (A-LIFE), Vrije Universiteit Amsterdam, Amsterdam, the Netherlands
- Riccardo Massei
  - Department of Monitoring and Exploration Technologies, Research Data Management Team (RDM), Helmholtz Centre for Environmental Research, UFZ, Leipzig, Germany
  - Department of Ecotoxicology, Group of Integrative Toxicology (iTox), Helmholtz Centre for Environmental Research, UFZ, Leipzig, Germany
- Werner Brack
  - Department of Exposure Science, Helmholtz Centre for Environmental Research, UFZ, Leipzig, Germany
  - Department of Evolutionary Ecology and Environmental Toxicology, Goethe University Frankfurt, Frankfurt Am Main, Germany
- Melis Muz
  - Department of Exposure Science, Helmholtz Centre for Environmental Research, UFZ, Leipzig, Germany
3
Gustavsson M, Käll S, Svedberg P, Inda-Diaz JS, Molander S, Coria J, Backhaus T, Kristiansson E. Transformers enable accurate prediction of acute and chronic chemical toxicity in aquatic organisms. Sci Adv 2024; 10:eadk6669. [PMID: 38446886; PMCID: PMC10917336; DOI: 10.1126/sciadv.adk6669] [Received: 09/04/2023] [Accepted: 01/30/2024] [Indexed: 03/08/2024]
Abstract
Environmental hazard assessments are reliant on toxicity data that cover multiple organism groups. Generating experimental toxicity data is, however, resource-intensive and time-consuming. Computational methods are fast and cost-efficient alternatives, but the low accuracy and narrow applicability domains have made their adaptation slow. Here, we present an AI-based model for predicting chemical toxicity. The model uses transformers to capture toxicity-specific features directly from the chemical structures and deep neural networks to predict effect concentrations. The model showed high predictive performance for all tested organism groups (algae, aquatic invertebrates, and fish) and has, in comparison to commonly used QSAR methods, a larger applicability domain and a considerably lower error. When the model was trained on data with multiple effect concentrations (EC50/EC10), the performance was further improved. We conclude that deep learning and transformers have the potential to markedly advance computational prediction of chemical toxicity.
Affiliation(s)
- Mikael Gustavsson
  - Department of Economics, University of Gothenburg, Gothenburg, Sweden
- Styrbjörn Käll
  - Department of Mathematical Sciences, Chalmers University of Technology/University of Gothenburg, Gothenburg, Sweden
- Patrik Svedberg
  - Department of Biological and Environmental Sciences, University of Gothenburg, Gothenburg, Sweden
- Juan S. Inda-Diaz
  - Department of Mathematical Sciences, Chalmers University of Technology/University of Gothenburg, Gothenburg, Sweden
- Sverker Molander
  - Division of Environmental Systems Analysis, Department of Technology Management and Economics, Chalmers University of Technology, Gothenburg, Sweden
- Jessica Coria
  - Department of Economics, University of Gothenburg, Gothenburg, Sweden
- Thomas Backhaus
  - Department of Biological and Environmental Sciences, University of Gothenburg, Gothenburg, Sweden
- Erik Kristiansson
  - Department of Mathematical Sciences, Chalmers University of Technology/University of Gothenburg, Gothenburg, Sweden
4
Soulios K, Scheibe P, Bernt M, Hackermüller J, Schor J. deepFPlearn+: enhancing toxicity prediction across the chemical universe using graph neural networks. Bioinformatics 2023; 39:btad713. [PMID: 38011648; PMCID: PMC10724847; DOI: 10.1093/bioinformatics/btad713] [Received: 10/03/2023] [Revised: 11/06/2023] [Accepted: 11/26/2023] [Indexed: 11/29/2023] Open
Abstract
Summary: Sophisticated approaches for the in silico prediction of toxicity are required to support the risk assessment of chemicals. The number of chemicals on the global chemical market and the speed of chemical innovation stand in massive contrast to the capacity for regularizing chemical use. We recently proved our ready-to-use application deepFPlearn as a suitable approach for this task. Here, we present its extension deepFPlearn+ incorporating (i) a graph neural network to feed our AI with a more sophisticated molecular structure representation and (ii) alternative train-test splitting strategies that involve scaffold structures and the molecular weights of chemicals. We show that the GNNs outperform the previous model substantially and that our models can generalize on unseen data even with a more robust and challenging test set. Therefore, we highly recommend the application of deepFPlearn+ on the chemical inventory to prioritize chemicals for experimental testing or any chemical subset of interest in monitoring studies.
Availability and implementation: The software is compatible with Python 3.6 or higher, and the source code can be found on our GitHub repository: https://github.com/yigbt/deepFPlearn. The data underlying this article are available in Zenodo, and can be accessed with the link below: https://zenodo.org/record/8146252. Detailed installation guides via Docker, Singularity, and Conda are provided within the repository for operability across all operating systems.
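The idea behind a scaffold-aware train-test split, as mentioned in the abstract, can be illustrated with a generic group-wise holdout in which every compound sharing a scaffold lands on the same side of the split. This is a minimal sketch under stated assumptions and is not deepFPlearn+ code: the function `group_split`, the toy compound records, and the scaffold labels are all hypothetical, and in practice the scaffold keys would be computed with a cheminformatics toolkit rather than written by hand.

```python
from collections import defaultdict


def group_split(records, key, test_fraction=0.2):
    """Split records so that no group (e.g. a scaffold) spans both sets."""
    groups = defaultdict(list)
    for rec in records:
        groups[key(rec)].append(rec)

    # Fill the training set with the largest groups first, so the test
    # set ends up holding the rarer, less familiar groups.
    ordered = sorted(groups.values(), key=len, reverse=True)
    n_train_target = len(records) - round(len(records) * test_fraction)

    train, test = [], []
    for members in ordered:
        if len(train) + len(members) <= n_train_target:
            train.extend(members)
        else:
            test.extend(members)
    return train, test


# Hypothetical toy data: (compound id, scaffold label).
compounds = [
    ("c1", "benzene"), ("c2", "benzene"), ("c3", "benzene"),
    ("c4", "pyridine"), ("c5", "pyridine"),
    ("c6", "indole"), ("c7", "indole"),
    ("c8", "furan"), ("c9", "thiophene"), ("c10", "imidazole"),
]

train, test = group_split(compounds, key=lambda rec: rec[1])
train_scaffolds = {scaffold for _, scaffold in train}
test_scaffolds = {scaffold for _, scaffold in test}
print(len(train), len(test), train_scaffolds & test_scaffolds)
```

Because whole groups are assigned at once, the test compounds carry scaffolds the model never saw during training, which is what makes this kind of split a harder test of generalization than a random one. A molecular-weight-based split could follow the same pattern with a weight bin as the group key.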
Affiliation(s)
- Kyriakos Soulios
  - Department of Computational Biology, Helmholtz Centre for Environmental Research – UFZ, 04318 Leipzig, Germany
  - Department of Computer Science, Faculty of Mathematics and Computer Science, University of Leipzig, 04109 Leipzig, Germany
- Patrick Scheibe
  - Department of Neurophysics, Max Planck Institute for Human Cognitive and Brain Sciences, 04103 Leipzig, Saxony, Germany
- Matthias Bernt
  - Department of Computational Biology, Helmholtz Centre for Environmental Research – UFZ, 04318 Leipzig, Germany
- Jörg Hackermüller
  - Department of Computational Biology, Helmholtz Centre for Environmental Research – UFZ, 04318 Leipzig, Germany
  - Department of Computer Science, Faculty of Mathematics and Computer Science, University of Leipzig, 04109 Leipzig, Germany
- Jana Schor
  - Department of Computational Biology, Helmholtz Centre for Environmental Research – UFZ, 04318 Leipzig, Germany