1
Canchola A, Tran LN, Woo W, Tian L, Lin YH, Chou WC. Advancing non-target analysis of emerging environmental contaminants with machine learning: Current status and future implications. Environment International 2025; 198:109404. [PMID: 40139034 DOI: 10.1016/j.envint.2025.109404] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2024] [Revised: 03/03/2025] [Accepted: 03/18/2025] [Indexed: 03/29/2025]
Abstract
Emerging environmental contaminants (EECs) such as pharmaceuticals, pesticides, and industrial chemicals pose significant challenges for detection and identification due to their structural diversity and lack of analytical standards. Traditional targeted screening methods often fail to detect these compounds, making non-target analysis (NTA) using high-resolution mass spectrometry (HRMS) essential for identifying unknown or suspected contaminants. However, interpreting the vast datasets generated by HRMS is complex and requires advanced data processing techniques. Recent advancements in machine learning (ML) models offer great potential for enhancing NTA applications. As such, we review key developments, including workflow optimization using computational tools, improved chemical structure identification, advanced quantification methods, and enhanced toxicity prediction capabilities. We also discuss challenges and future perspectives in the field, such as refining ML tools for complex mixtures, improving inter-laboratory validation, and further integrating computational models into environmental risk assessment frameworks. By addressing these challenges, ML-assisted NTA can significantly enhance the detection, quantification, and evaluation of EECs, ultimately contributing to more effective environmental monitoring and public health protection.
Affiliation(s)
- Alexa Canchola
  - Environmental Toxicology Graduate Program, University of California, Riverside, CA 92521, United States
  - Department of Environmental Sciences, College of Natural & Agricultural Sciences, University of California, Riverside, CA 92521, United States
- Lillian N Tran
  - Environmental Toxicology Graduate Program, University of California, Riverside, CA 92521, United States
- Wonsik Woo
  - Environmental Toxicology Graduate Program, University of California, Riverside, CA 92521, United States
- Linhui Tian
  - Department of Environmental Sciences, College of Natural & Agricultural Sciences, University of California, Riverside, CA 92521, United States
- Ying-Hsuan Lin
  - Environmental Toxicology Graduate Program, University of California, Riverside, CA 92521, United States
  - Department of Environmental Sciences, College of Natural & Agricultural Sciences, University of California, Riverside, CA 92521, United States
- Wei-Chun Chou
  - Environmental Toxicology Graduate Program, University of California, Riverside, CA 92521, United States
  - Department of Environmental Sciences, College of Natural & Agricultural Sciences, University of California, Riverside, CA 92521, United States
2
Hupatz H, Rahu I, Wang WC, Peets P, Palm EH, Kruve A. Critical review on in silico methods for structural annotation of chemicals detected with LC/HRMS non-targeted screening. Anal Bioanal Chem 2025; 417:473-493. [PMID: 39138659 DOI: 10.1007/s00216-024-05471-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2024] [Revised: 07/22/2024] [Accepted: 07/24/2024] [Indexed: 08/15/2024]
Abstract
Non-targeted screening with liquid chromatography coupled to high-resolution mass spectrometry (LC/HRMS) is increasingly leveraging in silico methods, including machine learning, to obtain candidate structures for structural annotation of LC/HRMS features and their further prioritization. Candidate structures are commonly retrieved based on the tandem mass spectral information either from spectral or structural databases; however, the vast majority of the detected LC/HRMS features remain unannotated, constituting what we refer to as a part of the unknown chemical space. Recently, the exploration of this chemical space has become accessible through generative models. Furthermore, the evaluation of the candidate structures benefits from the complementary empirical analytical information such as retention time, collision cross section values, and ionization type. In this critical review, we provide an overview of the current approaches for retrieving and prioritizing candidate structures. These approaches come with their own set of advantages and limitations, as we showcase in the example of structural annotation of ten known and ten unknown LC/HRMS features. We emphasize that these limitations stem from both experimental and computational considerations. Finally, we highlight three key considerations for the future development of in silico methods.
Affiliation(s)
- Henrik Hupatz
  - Department of Materials and Environmental Chemistry, Stockholm University, Svante Arrhenius Väg 16, 114 18, Stockholm, Sweden
  - Stockholm University Center for Circular and Sustainable Systems (SUCCeSS), Stockholm University, 106 91, Stockholm, Sweden
- Ida Rahu
  - Department of Materials and Environmental Chemistry, Stockholm University, Svante Arrhenius Väg 16, 114 18, Stockholm, Sweden
- Wei-Chieh Wang
  - Department of Materials and Environmental Chemistry, Stockholm University, Svante Arrhenius Väg 16, 114 18, Stockholm, Sweden
- Pilleriin Peets
  - Institute of Biodiversity, Faculty of Biological Science, Cluster of Excellence Balance of the Microverse, Friedrich Schiller University Jena, 07743, Jena, Germany
- Emma H Palm
  - Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, 6 Avenue du Swing, 4367, Belvaux, Luxembourg
- Anneli Kruve
  - Department of Materials and Environmental Chemistry, Stockholm University, Svante Arrhenius Väg 16, 114 18, Stockholm, Sweden
  - Stockholm University Center for Circular and Sustainable Systems (SUCCeSS), Stockholm University, 106 91, Stockholm, Sweden
  - Department of Environmental Science, Stockholm University, Svante Arrhenius Väg 8, 114 18, Stockholm, Sweden
3
Abrahamsson D, Koronaiou LA, Johnson T, Yang J, Ji X, Lambropoulou DA. Modeling the relative response factor of small molecules in positive electrospray ionization. RSC Adv 2024; 14:37470-37482. [PMID: 39582938 PMCID: PMC11583891 DOI: 10.1039/d4ra06695b] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/17/2024] [Accepted: 11/15/2024] [Indexed: 11/26/2024]
Abstract
Technological advancements in liquid chromatography (LC) electrospray ionization (ESI) high-resolution mass spectrometry (HRMS) have made it an increasingly popular analytical technique in non-targeted analysis (NTA) of environmental and biological samples. One critical limitation of current methods in NTA is the lack of available analytical standards for many of the compounds detected in biological and environmental samples. Computational approaches can provide estimates of concentrations by modeling the relative response factor (RRF) of a compound, expressed as the peak area of a given peak divided by its concentration. In this paper, we explore the application of molecular dynamics (MD) in the development of a computational workflow for predicting RRF. We obtained measurements for 48 compounds with LC-quadrupole time-of-flight (QTOF) MS and calculated their RRF. We used the CGenFF force field to generate the topologies and GROMACS to conduct the MD simulations. We calculated the Lennard-Jones and Coulomb interactions between the analytes and all other molecules in the ESI droplet, which were then sampled to construct a multilinear regression model for predicting RRF using Monte Carlo simulations. The best performing model showed a coefficient of determination (R²) of 0.82 and a mean absolute error (MAE) of 0.13 log units. This performance is comparable to other predictive models including machine learning models. While there is a need for further evaluation of diverse chemical structures, our approach showed promise in predictions of RRF.
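The RRF definition in the abstract above (peak area divided by concentration) can be illustrated with a minimal Python sketch. It is not the authors' workflow: the function names and any numbers passed to them are hypothetical, and the sketch only shows how a predicted log10 RRF could be inverted to estimate a concentration, and what an error expressed in log units means as a multiplicative factor.

```python
def relative_response_factor(peak_area: float, concentration: float) -> float:
    """RRF as defined in the abstract: peak area divided by concentration."""
    if concentration <= 0:
        raise ValueError("concentration must be positive")
    return peak_area / concentration


def estimate_concentration(peak_area: float, log10_rrf: float) -> float:
    """Invert the definition: concentration = peak area / RRF.

    log10_rrf is assumed to come from some predictive model (hypothetical here).
    """
    return peak_area / (10 ** log10_rrf)


def fold_error(log10_error: float) -> float:
    """Convert an error in log10 units into a multiplicative (fold) factor."""
    return 10 ** log10_error
```

Under these assumptions, an error of 0.13 log units corresponds to `fold_error(0.13)`, a factor of roughly 1.35 in the estimated concentration.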
Affiliation(s)
- Dimitri Abrahamsson
  - Department of Pediatrics, New York University Grossman School of Medicine New York 10016 USA
  - Department of Obstetrics, Gynecology and Reproductive Sciences, School of Medicine, University of California San Francisco California 94158 USA
- Lelouda-Athanasia Koronaiou
  - Laboratory of Environmental Pollution Control, Department of Chemistry, Aristotle University of Thessaloniki University Campus 54124 Thessaloniki Greece
  - Center for Interdisciplinary Research and Innovation (CIRI-AUTH), Balkan Center Thessaloniki 57001 Greece
- Trevor Johnson
  - Department of Pediatrics, New York University Grossman School of Medicine New York 10016 USA
- Junjie Yang
  - Department of Obstetrics, Gynecology and Reproductive Sciences, School of Medicine, University of California San Francisco California 94158 USA
- Xiaowen Ji
  - Department of Pediatrics, New York University Grossman School of Medicine New York 10016 USA
- Dimitra A Lambropoulou
  - Laboratory of Environmental Pollution Control, Department of Chemistry, Aristotle University of Thessaloniki University Campus 54124 Thessaloniki Greece
  - Center for Interdisciplinary Research and Innovation (CIRI-AUTH), Balkan Center Thessaloniki 57001 Greece
4
Singh S, Kaur N, Gehlot A. Application of artificial intelligence in drug design: A review. Comput Biol Med 2024; 179:108810. [PMID: 38991316 DOI: 10.1016/j.compbiomed.2024.108810] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 03/18/2024] [Revised: 05/31/2024] [Accepted: 06/24/2024] [Indexed: 07/13/2024]
Abstract
Artificial intelligence (AI) is a field of computer science that involves acquiring information, developing rule bases, and mimicking human behaviour. The fundamental concept behind AI is to create intelligent computer systems that can operate with minimal human intervention or without any intervention at all. These rule-based systems are developed using various machine learning and deep learning models, enabling them to solve complex problems. AI is integrated with these models to learn, understand, and analyse provided data. The rapid advancement of AI is reshaping numerous industries, with the pharmaceutical sector experiencing a notable transformation. AI is increasingly being employed to automate, optimize, and personalize various facets of the pharmaceutical industry, particularly in pharmacological research. Traditional drug development methods are known for being time-consuming, expensive, and less efficient, often taking around a decade and costing billions of dollars. The integration of AI techniques addresses these challenges by enabling the examination of compounds with desired properties from a vast pool of input drugs. Furthermore, it plays a crucial role in drug screening by predicting toxicity, bioactivity, ADME properties (absorption, distribution, metabolism, and excretion), physicochemical properties, and more. AI enhances the drug design process by improving the efficiency and accuracy of predicting drug behaviour, interactions, and properties. These approaches further significantly improve the precision of drug discovery processes and decrease clinical trial costs, leading to the development of more effective drugs.
Affiliation(s)
- Simrandeep Singh
  - Department of Electronics & Communication Engineering, UCRD, Chandigarh University, Gharuan, Punjab, India
- Navjot Kaur
  - Department of Pharmacognosy, Amar Shaheed Baba Ajit Singh Jujhar Singh Memorial College of Pharmacy, Bela, Ropar, India
- Anita Gehlot
  - Uttaranchal Institute of Technology, Uttaranchal University, Dehradun, India
5
Bland GD, Abrahamsson D, Wang M, Zlatnik MG, Morello-Frosch R, Park JS, Sirota M, Woodruff TJ. Exploring applications of non-targeted analysis in the characterization of the prenatal exposome. The Science of the Total Environment 2024; 912:169458. [PMID: 38142008 PMCID: PMC10947484 DOI: 10.1016/j.scitotenv.2023.169458] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2023] [Revised: 12/15/2023] [Accepted: 12/15/2023] [Indexed: 12/25/2023]
Abstract
Capturing the breadth of chemical exposures in utero is critical in understanding their long-term health effects for mother and child. We explored methodological adaptations in a Non-Targeted Analysis (NTA) pipeline and evaluated the effects on chemical annotation and discovery for maternal and infant exposure. We focus on lesser-known/underreported chemicals in maternal and umbilical cord serum analyzed with liquid chromatography-quadrupole time-of-flight mass spectrometry (LC-QTOF/MS). The samples were collected from a demographically diverse cohort of 296 maternal-cord pairs (n = 592) recruited in the San Francisco Bay Area. We developed and evaluated two data processing pipelines, primarily differing by detection frequency cut-off, to extract chemical features from NTA. We annotated the detected chemical features by matching with the EPA CompTox Chemicals Dashboard (n = 860,000 chemicals) and the Human Metabolome Database (n = 3140 chemicals) and applied a Kendrick Mass Defect filter to detect homologous series. We collected fragmentation spectra (MS/MS) on a subset of serum samples and matched them to an experimental MS/MS database within the MS-Dial website and to other experimental MS/MS spectra collected from standards in our lab. We annotated ~72% of the features (total features = 32,197, levels 1-4). We confirmed 22 compounds with analytical standards, tentatively identified 88 compounds with MS/MS spectra, and annotated 4862 exogenous chemicals with an in-house developed annotation algorithm. We detected 36 chemicals that appear not to have been previously reported in human blood and 9 chemicals that were reported in fewer than five studies. Our findings underline the importance of NTA in the discovery of lesser-known/unreported chemicals important to characterize human exposures.
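The Kendrick Mass Defect filter mentioned in the abstract above can be sketched as follows. This is a simplified illustration, not the authors' pipeline: it assumes CH2 as the repeating unit (the abstract does not say which unit was used), uses one common sign and rounding convention for the defect, and groups masses purely by similar defect with a hypothetical tolerance. A fuller filter would also check that member masses are spaced by multiples of the repeating unit.

```python
from typing import List

CH2_EXACT = 14.01565   # exact mass of the assumed CH2 repeating unit
CH2_NOMINAL = 14.0     # nominal mass of CH2


def kendrick_mass_defect(mass: float) -> float:
    """Rescale the mass so CH2 weighs exactly 14, then take nominal minus exact."""
    kendrick_mass = mass * CH2_NOMINAL / CH2_EXACT
    return round(kendrick_mass) - kendrick_mass


def homologous_series(masses: List[float], tol: float = 0.002,
                      min_members: int = 2) -> List[List[float]]:
    """Group masses whose Kendrick mass defects lie within tol of a neighbour."""
    ordered = sorted(masses, key=kendrick_mass_defect)
    groups: List[List[float]] = []
    current: List[float] = []
    for mass in ordered:
        # Start a new group when the defect jumps by more than the tolerance.
        if current and abs(kendrick_mass_defect(mass)
                           - kendrick_mass_defect(current[-1])) > tol:
            groups.append(current)
            current = []
        current.append(mass)
    if current:
        groups.append(current)
    # Keep only groups large enough to count as a candidate series.
    return [sorted(group) for group in groups if len(group) >= min_members]
```

For example, three monoisotopic masses that differ by one CH2 each (116.08373, 130.09938, 144.11503) share nearly the same defect and end up in one group, while an unrelated mass such as 194.08038 is left out.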
Affiliation(s)
- Garret D Bland
  - Department of Obstetrics, Gynecology and Reproductive Sciences, Program on Reproductive Health and the Environment, University of California San Francisco, San Francisco, CA, United States
- Dimitri Abrahamsson
  - Department of Obstetrics, Gynecology and Reproductive Sciences, Program on Reproductive Health and the Environment, University of California San Francisco, San Francisco, CA, United States
- Miaomiao Wang
  - Department of Toxic Substances Control, California Environmental Protection Agency, Berkeley, CA, United States
- Marya G Zlatnik
  - Department of Obstetrics, Gynecology and Reproductive Sciences, Program on Reproductive Health and the Environment, University of California San Francisco, San Francisco, CA, United States
- Rachel Morello-Frosch
  - Department of Environmental Science, Policy and Management, School of Public Health, University of California Berkeley, Berkeley, CA, United States
- June-Soo Park
  - Department of Obstetrics, Gynecology and Reproductive Sciences, Program on Reproductive Health and the Environment, University of California San Francisco, San Francisco, CA, United States
  - Department of Toxic Substances Control, California Environmental Protection Agency, Berkeley, CA, United States
- Marina Sirota
  - Bakar Computational Health Sciences Institute, Department of Pediatrics, University of California San Francisco, San Francisco 94158, CA, United States
- Tracey J Woodruff
  - Department of Obstetrics, Gynecology and Reproductive Sciences, Program on Reproductive Health and the Environment, University of California San Francisco, San Francisco, CA, United States
6
Johnson TA, Abrahamsson DP. Quantification of chemicals in non-targeted analysis without analytical standards - Understanding the mechanism of electrospray ionization and making predictions. Current Opinion in Environmental Science & Health 2024; 37:100529. [PMID: 38312491 PMCID: PMC10836048 DOI: 10.1016/j.coesh.2023.100529] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/06/2024]
Abstract
The constant creation and release of new chemicals to the environment is forming an ever-widening gap between available analytical standards and known chemicals. Developing non-targeted analysis (NTA) methods that have the ability to detect a broad spectrum of compounds is critical for research and analysis of emerging contaminants. There is a need to develop methods that make it possible to identify compound structures from their MS and MS/MS information and quantify them without analytical standards. Method refinements that utilize machine learning algorithms and chemical descriptors to estimate the instrument response of particular compounds have made progress in recent years. This narrative review seeks to summarize the current state of the field of non-targeted analysis (NTA) toward quantification of unknowns without the use of analytical standards. Despite the limited accumulation of validation studies on real samples, the ongoing enhancement in data processing and refinement of machine learning tools could lead to more comprehensive chemical coverage of NTA and validated quantitative NTA methods, thus boosting confidence in their usage and enhancing the utility of quantitative NTA.
Affiliation(s)
- Trevor A Johnson
  - Division of Environmental Pediatrics, Department of Pediatrics, Grossman School of Medicine, New York University
- Dimitri P Abrahamsson
  - Division of Environmental Pediatrics, Department of Pediatrics, Grossman School of Medicine, New York University
7
Abrahamsson D, Brueck CL, Prasse C, Lambropoulou DA, Koronaiou LA, Wang M, Park JS, Woodruff TJ. Extracting Structural Information from Physicochemical Property Measurements Using Machine Learning─A New Approach for Structure Elucidation in Non-targeted Analysis. Environmental Science & Technology 2023; 57:14827-14838. [PMID: 37746919 PMCID: PMC10569036 DOI: 10.1021/acs.est.3c03003] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/20/2023] [Revised: 08/29/2023] [Accepted: 08/30/2023] [Indexed: 09/26/2023]
Abstract
Non-targeted analysis (NTA) has made critical contributions in the fields of environmental chemistry and environmental health. One critical bottleneck is the lack of available analytical standards for most chemicals in the environment. Our study aims to explore a novel approach that links measurements of equilibrium partition ratios between organic solvents and water (K_SW) to predictions of molecular structures. These properties can be used as a fingerprint, which a machine learning algorithm can convert into a series of functional groups (RDKit fragments) that can be used to search chemical databases. We conducted partitioning experiments using a chemical mixture containing 185 chemicals in 10 different organic solvents and water. Both a liquid chromatography quadrupole time-of-flight mass spectrometer (LC-QTOF MS) and an LC-Orbitrap MS were used to assess the feasibility of the experimental method and the accuracy of the algorithm at predicting the correct functional groups. The two methods showed differences in log K_SW, with the QTOF method showing a mean absolute error (MAE) of 0.22 and the Orbitrap method 0.33. The differences also culminated in errors in the predictions of RDKit fragments, with the MAE for the QTOF method being 0.23 and for the Orbitrap method being 0.31. Our approach presents a new angle in structure elucidation for NTA and showed promise in assisting with compound identification.
Affiliation(s)
- Dimitri Abrahamsson
  - Department of Pediatrics, New York University Grossman School of Medicine, New York, New York 10016, United States
  - Department of Obstetrics, Gynecology and Reproductive Sciences, Program on Reproductive Health and the Environment, University of California, San Francisco, California 94107, United States
- Christopher L. Brueck
  - Department of Environmental Health and Engineering, Johns Hopkins University, Baltimore, Maryland 21205, United States
  - Exponent, Environmental and Earth Sciences Practice, Bellevue, Washington 98007, United States
- Carsten Prasse
  - Department of Environmental Health and Engineering, Johns Hopkins University, Baltimore, Maryland 21205, United States
  - Risk Sciences and Public Policy Institute, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, Maryland 21205, United States
- Dimitra A. Lambropoulou
  - Department of Chemistry, Aristotle University of Thessaloniki, University Campus, 54124 Thessaloniki Greece
  - Laboratory of Environmental Pollution Control, Department of Chemistry, Aristotle University of Thessaloniki, GR-541 24 Thessaloniki, Greece
  - Center for Interdisciplinary Research and Innovation (CIRI-AUTH), Balkan Center, Thessaloniki, GR-57001, Greece
- Lelouda-Athanasia Koronaiou
  - Department of Chemistry, Aristotle University of Thessaloniki, University Campus, 54124 Thessaloniki Greece
  - Laboratory of Environmental Pollution Control, Department of Chemistry, Aristotle University of Thessaloniki, GR-541 24 Thessaloniki, Greece
  - Center for Interdisciplinary Research and Innovation (CIRI-AUTH), Balkan Center, Thessaloniki, GR-57001, Greece
- Miaomiao Wang
  - Department of Toxic Substances Control, Environmental Chemistry Laboratory, California Environmental Agency, Berkeley, California 94710, United States
- June-Soo Park
  - Department of Obstetrics, Gynecology and Reproductive Sciences, Program on Reproductive Health and the Environment, University of California, San Francisco, California 94107, United States
  - Department of Toxic Substances Control, Environmental Chemistry Laboratory, California Environmental Agency, Berkeley, California 94710, United States
- Tracey J. Woodruff
  - Department of Obstetrics, Gynecology and Reproductive Sciences, Program on Reproductive Health and the Environment, University of California, San Francisco, California 94107, United States