1
Charest N, Lowe CN, Ramsland C, Meyer B, Samano V, Williams AJ. Improving predictions of compound amenability for liquid chromatography-mass spectrometry to enhance non-targeted analysis. Anal Bioanal Chem 2024:10.1007/s00216-024-05229-5. [PMID: 38530399 DOI: 10.1007/s00216-024-05229-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2023] [Revised: 02/14/2024] [Accepted: 02/16/2024] [Indexed: 03/28/2024]
Abstract
Mass-spectrometry-based non-targeted analysis (NTA), in which mass spectrometric signals are assigned chemical identities based on a systematic collation of evidence, is a growing area of interest for toxicological risk assessment. Successful NTA results in better identification of potentially hazardous pollutants within the environment, facilitating the development of targeted analytical strategies to best characterize risks to human and ecological health. A supporting component of the NTA process involves assessing whether suspected chemicals are amenable to the mass spectrometric method, which is necessary in order to assign an observed signal to the chemical structure. Prior work from this group involved the development of a random forest model for predicting the amenability of 5517 unique chemical structures to liquid chromatography-mass spectrometry (LC-MS). This work improves the interpretability of the group's prior model of the same endpoint, as well as integrating 1348 more data points across negative and positive ionization modes. We enhance interpretability by feature engineering, a machine learning practice that reduces the input dimensionality while attempting to preserve performance statistics. We emphasize the importance of interpretable machine learning models within the context of building confidence in NTA identification. The novel data were curated by the labeling of compounds as amenable or unamenable by expert curators, resulting in an enhanced set of chemical compounds to expand the applicability domain of the prior model. The balanced accuracy benchmark of the newly developed model is comparable to performance previously reported (mean CV BA is 0.84 vs. 0.82 in positive mode, and 0.85 vs. 0.82 in negative mode), while on a novel external set, derived from this work's data, the Matthews correlation coefficients (MCC) for the novel models are 0.66 and 0.68 for positive and negative mode, respectively. 
Our group's prior published models scored MCC of 0.55 and 0.54 on the same external sets. This demonstrates appreciable improvement over the chemical space captured by the expanded dataset. This work forms part of our ongoing efforts to develop models with higher interpretability and higher performance to support NTA efforts.
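The abstract above reports balanced accuracy (BA) and the Matthews correlation coefficient (MCC). For readers unfamiliar with these metrics, the following is a minimal sketch of their standard definitions computed from binary confusion-matrix counts. The function names and the example counts are hypothetical illustrations and are not taken from the paper or its dataset.

```python
from math import sqrt


def balanced_accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Mean of sensitivity (recall on positives) and specificity (recall on negatives)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def matthews_corrcoef(tp: int, fp: int, tn: int, fn: int) -> float:
    """MCC from confusion counts; returns 0.0 when the denominator vanishes."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts for an amenable / unamenable classifier (not real data).
example = dict(tp=80, fp=20, tn=70, fn=30)
ba = balanced_accuracy(**example)
mcc = matthews_corrcoef(**example)
print(f"BA = {ba:.3f}, MCC = {mcc:.3f}")
```

Unlike plain accuracy, both metrics remain informative when the amenable and unamenable classes differ in size, which is a common reason for reporting them together.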
Affiliation(s)
- Nathaniel Charest
  - Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina, 27711, USA
- Charles N Lowe
  - Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina, 27711, USA
- Brian Meyer
  - Senior Environmental Employment Program, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina, 27711, USA
- Vicente Samano
  - Senior Environmental Employment Program, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina, 27711, USA
- Antony J Williams
  - Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina, 27711, USA
2
Buckley TJ, Egeghy PP, Isaacs K, Richard AM, Ring C, Sayre RR, Sobus JR, Thomas RS, Ulrich EM, Wambaugh JF, Williams AJ. Cutting-edge computational chemical exposure research at the U.S. Environmental Protection Agency. Environment International 2023; 178:108097. [PMID: 37478680 PMCID: PMC10588682 DOI: 10.1016/j.envint.2023.108097] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2023] [Revised: 06/05/2023] [Accepted: 07/12/2023] [Indexed: 07/23/2023]
Abstract
Exposure science is evolving from its traditional "after the fact" and "one chemical at a time" approach to forecasting chemical exposures rapidly enough to keep pace with the constantly expanding landscape of chemicals and exposures. In this article, we provide an overview of the approaches, accomplishments, and plans for advancing computational exposure science within the U.S. Environmental Protection Agency's Office of Research and Development (EPA/ORD). First, to characterize the universe of chemicals in commerce and the environment, a carefully curated, web-accessible chemical resource has been created. This DSSTox database unambiguously identifies >1.2 million unique substances reflecting potential environmental and human exposures and includes computationally accessible links to each compound's corresponding data resources. Next, EPA is developing, applying, and evaluating predictive exposure models. These models increasingly rely on data, computational tools like quantitative structure activity relationship (QSAR) models, and machine learning/artificial intelligence to provide timely and efficient prediction of chemical exposure (and associated uncertainty) for thousands of chemicals at a time. Integral to this modeling effort, EPA is developing data resources across the exposure continuum that includes application of high-resolution mass spectrometry (HRMS) non-targeted analysis (NTA) methods providing measurement capability at scale with the number of chemicals in commerce. These research efforts are integrated and well-tailored to support population exposure assessment to prioritize chemicals for exposure as a critical input to risk management. In addition, the exposure forecasts will allow a wide variety of stakeholders to explore sustainable initiatives like green chemistry to achieve economic, social, and environmental prosperity and protection of future generations.
Affiliation(s)
- Timothy J Buckley
  - U.S. Environmental Protection Agency, Office of Research & Development, Center for Computational Toxicology & Exposure (CCTE), 109 TW Alexander Drive, Research Triangle Park, NC 27711, United States
- Peter P Egeghy
  - U.S. Environmental Protection Agency, Office of Research & Development, Center for Computational Toxicology & Exposure (CCTE), 109 TW Alexander Drive, Research Triangle Park, NC 27711, United States
- Kristin Isaacs
  - U.S. Environmental Protection Agency, Office of Research & Development, Center for Computational Toxicology & Exposure (CCTE), 109 TW Alexander Drive, Research Triangle Park, NC 27711, United States
- Ann M Richard
  - U.S. Environmental Protection Agency, Office of Research & Development, Center for Computational Toxicology & Exposure (CCTE), 109 TW Alexander Drive, Research Triangle Park, NC 27711, United States
- Caroline Ring
  - U.S. Environmental Protection Agency, Office of Research & Development, Center for Computational Toxicology & Exposure (CCTE), 109 TW Alexander Drive, Research Triangle Park, NC 27711, United States
- Risa R Sayre
  - U.S. Environmental Protection Agency, Office of Research & Development, Center for Computational Toxicology & Exposure (CCTE), 109 TW Alexander Drive, Research Triangle Park, NC 27711, United States
- Jon R Sobus
  - U.S. Environmental Protection Agency, Office of Research & Development, Center for Computational Toxicology & Exposure (CCTE), 109 TW Alexander Drive, Research Triangle Park, NC 27711, United States
- Russell S Thomas
  - U.S. Environmental Protection Agency, Office of Research & Development, Center for Computational Toxicology & Exposure (CCTE), 109 TW Alexander Drive, Research Triangle Park, NC 27711, United States
- Elin M Ulrich
  - U.S. Environmental Protection Agency, Office of Research & Development, Center for Computational Toxicology & Exposure (CCTE), 109 TW Alexander Drive, Research Triangle Park, NC 27711, United States
- John F Wambaugh
  - U.S. Environmental Protection Agency, Office of Research & Development, Center for Computational Toxicology & Exposure (CCTE), 109 TW Alexander Drive, Research Triangle Park, NC 27711, United States
- Antony J Williams
  - U.S. Environmental Protection Agency, Office of Research & Development, Center for Computational Toxicology & Exposure (CCTE), 109 TW Alexander Drive, Research Triangle Park, NC 27711, United States
3
Sinclair G, Thillainadarajah I, Meyer B, Samano V, Sivasupramaniam S, Adams L, Willighagen EL, Richard AM, Walker M, Williams AJ. Wikipedia on the CompTox Chemicals Dashboard: Connecting Resources to Enrich Public Chemical Data. J Chem Inf Model 2022; 62:4888-4905. [PMID: 36215146 PMCID: PMC9597659 DOI: 10.1021/acs.jcim.2c00886] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Abstract
The online encyclopedia Wikipedia aggregates a large amount of data on chemistry, encompassing well over 20,000 individual Wikipedia pages and serves the general public as well as the chemistry community. Many other chemical databases and services utilize these data, and previous projects have focused on methods to index, search, and extract it for review and use. We present a comprehensive effort that combines bulk automated data extraction over tens of thousands of pages, semiautomated data extraction over hundreds of pages, and fine-grained manual extraction of individual lists and compounds of interest. We then correlate these data with the existing contents of the U.S. Environmental Protection Agency’s (EPA) Distributed Structure-Searchable Toxicity (DSSTox) database. This was performed with a number of intentions including ensuring as complete a mapping as possible between the Dashboard and Wikipedia so that relevant snippets of the article are loaded for the user to review. Conflicts between Dashboard content and Wikipedia in terms of, for example, identifiers such as chemical registry numbers, names, and InChIs and structure-based collisions such as SMILES were identified and used as the basis of curation of both DSSTox and Wikipedia. This work also allowed us to evaluate available data for sets of chemicals of interest to the Agency, such as synthetic cannabinoids, and expand the content in DSSTox as appropriate. This work also led to improved bidirectional linkage of the detailed chemistry and usage information from Wikipedia with expert-curated structure and identifier data from DSSTox for a new list of nearly 20,000 chemicals. All of this work ultimately enhances the data mappings that allow for the display of the introduction of the Wikipedia article in the community-accessible web-based EPA CompTox Chemicals Dashboard, enhancing the user experience for the thousands of users per day accessing the resource.
Affiliation(s)
- Gabriel Sinclair
  - ORAU Student Services Contractor to Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina 27711, United States
- Inthirany Thillainadarajah
  - Senior Environmental Employment Program, US Environmental Protection Agency, Research Triangle Park, North Carolina 27711, United States
- Brian Meyer
  - Senior Environmental Employment Program, US Environmental Protection Agency, Research Triangle Park, North Carolina 27711, United States
- Vicente Samano
  - Senior Environmental Employment Program, US Environmental Protection Agency, Research Triangle Park, North Carolina 27711, United States
- Sakuntala Sivasupramaniam
  - Senior Environmental Employment Program, US Environmental Protection Agency, Research Triangle Park, North Carolina 27711, United States
- Linda Adams
  - Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina 27711, United States
- Egon L Willighagen
  - Department of Bioinformatics - BiGCaT, Maastricht University, 6229 ER Maastricht, The Netherlands
- Ann M Richard
  - Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina 27711, United States
- Martin Walker
  - SUNY Potsdam - Chemistry, 44 Pierrepont Avenue, Potsdam, New York 13676, United States
- Antony J Williams
  - Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina 27711, United States
4
Phillips AL, Williams AJ, Sobus JR, Ulrich EM, Gundersen J, Langlois-Miller C, Newton SR. A Framework for Utilizing High-Resolution Mass Spectrometry and Nontargeted Analysis in Rapid Response and Emergency Situations. Environmental Toxicology and Chemistry 2022; 41:1117-1130. [PMID: 34416028 PMCID: PMC9280853 DOI: 10.1002/etc.5196] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 06/28/2021] [Revised: 07/26/2021] [Accepted: 08/17/2021] [Indexed: 05/03/2023]
Abstract
Unknown chemical releases constitute a large portion of the rapid response situations to which the US Environmental Protection Agency is called on to respond. Workflows used to address unknown chemical releases currently involve screening for a large array of known compounds using many different targeted methods. When matches are not found, expert analytical chemistry knowledge is used to propose possible candidates from the available data, which generally includes low-resolution mass spectra and situational clues such as the location of the release, nearby industrial operations, and other field-reported facts. The past decade has witnessed dramatic improvements in capabilities for identifying unknown compounds using high-resolution mass spectrometry (HRMS) and nontargeted analysis (NTA) approaches. Complementary developments in cheminformatics tools have further enabled an increase in NTA throughput and identification confidence. Together with the expanding availability of HRMS instrumentation in monitoring laboratories, these advancements make NTA highly relevant to rapid response scenarios. In this article, we introduce the concept of NTA as it relates to rapid response needs and describe how it can be applied to address unknown chemical releases. We advocate for the consideration of HRMS-based NTA approaches to support future rapid response scenarios. Environ Toxicol Chem 2022;41:1117-1130. Published 2021. This article is a U.S. Government work and is in the public domain in the USA.
Affiliation(s)
- Allison L. Phillips
  - U.S. Environmental Protection Agency, Office of Research & Development, Center for Public Health and Environmental Assessment, Research Triangle Park, NC 27711
- Antony J. Williams
  - U.S. Environmental Protection Agency, Office of Research & Development, Center for Computational Toxicology and Exposure, Research Triangle Park, NC 27711
- Jon R. Sobus
  - U.S. Environmental Protection Agency, Office of Research & Development, Center for Computational Toxicology and Exposure, Research Triangle Park, NC 27711
- Elin M. Ulrich
  - U.S. Environmental Protection Agency, Office of Research & Development, Center for Computational Toxicology and Exposure, Research Triangle Park, NC 27711
- Jennifer Gundersen
  - U.S. Environmental Protection Agency, Office of Research & Development, Center for Environmental Measurement and Modeling, Narragansett, RI 02882
- Christina Langlois-Miller
  - U.S. Environmental Protection Agency, Office of Land and Emergency Management, Office of Emergency Management, Washington D.C. 20460
- Seth R. Newton
  - U.S. Environmental Protection Agency, Office of Research & Development, Center for Computational Toxicology and Exposure, Research Triangle Park, NC 27711
  - Corresponding author contact information: Seth R. Newton, , Mail: 109 T.W. Alexander Drive E205-05, RTP, NC 27711
5
Sussman EM, Oktem B, Isayeva IS, Liu J, Wickramasekara S, Chandrasekar V, Nahan K, Shin HY, Zheng J. Chemical Characterization and Non-targeted Analysis of Medical Device Extracts: A Review of Current Approaches, Gaps, and Emerging Practices. ACS Biomater Sci Eng 2022; 8:939-963. [PMID: 35171560 DOI: 10.1021/acsbiomaterials.1c01119] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/11/2022]
Abstract
The developers of medical devices evaluate the biocompatibility of their device prior to FDA's review and subsequent introduction to the market. Chemical characterization, described in ISO 10993-18:2020, can generate information for toxicological risk assessment and is an alternative approach for addressing some biocompatibility end points (e.g., systemic toxicity, genotoxicity, carcinogenicity, reproductive/developmental toxicity) that can reduce the time and cost of testing and the need for animal testing. Additionally, chemical characterization can be used to determine whether modifications to the materials and manufacturing processes alter the chemistry of a patient-contacting device to an extent that could impact device safety. Extractables testing is one approach to chemical characterization that employs combinations of non-targeted analysis, non-targeted screening, and/or targeted analysis to establish the identities and quantities of the various chemical constituents that can be released from a device. Due to the difficulty in obtaining a priori information on all the constituents in finished devices, information generation strategies in the form of analytical chemistry testing are often used. Identified and quantified extractables are then assessed using toxicological risk assessment approaches to determine if reported quantities are sufficiently low to overcome the need for further chemical analysis, biological evaluation of select end points, or risk control. For extractables studies to be useful as a screening tool, comprehensive and reliable non-targeted methods are needed. Although non-targeted methods have been adopted by many laboratories, they are laboratory-specific and require expensive analytical instruments and advanced technical expertise to perform. 
In this Perspective, we describe the elements of extractables studies and provide an overview of the current practices, identified gaps, and emerging practices that may be adopted on a wider scale in the future. This Perspective is outlined according to the steps of an extractables study: information gathering, extraction, extract sample processing, system selection, qualification, quantification, and identification.
Affiliation(s)
- Eric M Sussman
  - Center for Devices and Radiological Health, U.S. Food and Drug Administration, Silver Spring, Maryland 20993, United States
- Berk Oktem
  - Center for Devices and Radiological Health, U.S. Food and Drug Administration, Silver Spring, Maryland 20993, United States
- Irada S Isayeva
  - Center for Devices and Radiological Health, U.S. Food and Drug Administration, Silver Spring, Maryland 20993, United States
- Jinrong Liu
  - Center for Devices and Radiological Health, U.S. Food and Drug Administration, Silver Spring, Maryland 20993, United States
- Samanthi Wickramasekara
  - Center for Devices and Radiological Health, U.S. Food and Drug Administration, Silver Spring, Maryland 20993, United States
- Vaishnavi Chandrasekar
  - Center for Devices and Radiological Health, U.S. Food and Drug Administration, Silver Spring, Maryland 20993, United States
- Keaton Nahan
  - Center for Devices and Radiological Health, U.S. Food and Drug Administration, Silver Spring, Maryland 20993, United States
- Hainsworth Y Shin
  - Center for Devices and Radiological Health, U.S. Food and Drug Administration, Silver Spring, Maryland 20993, United States
- Jiwen Zheng
  - Center for Devices and Radiological Health, U.S. Food and Drug Administration, Silver Spring, Maryland 20993, United States
6
Shrivastava AD, Swainston N, Samanta S, Roberts I, Wright Muelas M, Kell DB. MassGenie: A Transformer-Based Deep Learning Method for Identifying Small Molecules from Their Mass Spectra. Biomolecules 2021; 11:1793. [PMID: 34944436 PMCID: PMC8699281 DOI: 10.3390/biom11121793] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/24/2021] [Revised: 11/14/2021] [Accepted: 11/27/2021] [Indexed: 12/15/2022]
Abstract
The 'inverse problem' of mass spectrometric molecular identification ('given a mass spectrum, calculate/predict the 2D structure of the molecule whence it came') is largely unsolved, and is especially acute in metabolomics where many small molecules remain unidentified. This is largely because the number of experimentally available electrospray mass spectra of small molecules is quite limited. However, the forward problem ('calculate a small molecule's likely fragmentation and hence at least some of its mass spectrum from its structure alone') is much more tractable, because the strengths of different chemical bonds are roughly known. This kind of molecular identification problem may be cast as a language translation problem in which the source language is a list of high-resolution mass spectral peaks and the 'translation' a representation (for instance in SMILES) of the molecule. It is thus suitable for attack using the deep neural networks known as transformers. We here present MassGenie, a method that uses a transformer-based deep neural network, trained on ~6 million chemical structures with augmented SMILES encoding and their paired molecular fragments as generated in silico, explicitly including the protonated molecular ion. This architecture (containing some 400 million elements) is used to predict the structure of a molecule from the various fragments that may be expected to be observed when some of its bonds are broken. Despite being given essentially no detailed nor explicit rules about molecular fragmentation methods, isotope patterns, rearrangements, neutral losses, and the like, MassGenie learns the effective properties of the mass spectral fragment and valency space, and can generate candidate molecular structures that are very close or identical to those of the 'true' molecules. We also use VAE-Sim, a previously published variational autoencoder, to generate candidate molecules that are 'similar' to the top hit. 
In addition to using the 'top hits' directly, we can produce a rank order of these by 'round-tripping' candidate molecules and comparing them with the true molecules, where known. As a proof of principle, we confine ourselves to positive electrospray mass spectra from molecules with a molecular mass of 500Da or lower, including those in the last CASMI challenge (for which the results are known), getting 49/93 (53%) precisely correct. The transformer method, applied here for the first time to mass spectral interpretation, works extremely effectively both for mass spectra generated in silico and on experimentally obtained mass spectra from pure compounds. It seems to act as a Las Vegas algorithm, in that it either gives the correct answer or simply states that it cannot find one. The ability to create and to 'learn' millions of fragmentation patterns in silico, and therefrom generate candidate structures (that do not have to be in existing libraries) directly, thus opens up entirely the field of de novo small molecule structure prediction from experimental mass spectra.
Affiliation(s)
- Aditya Divyakant Shrivastava
  - Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, Faculty of Health and Life Sciences, University of Liverpool, Crown St, Liverpool L69 7ZB, UK
  - Department of Computer Science and Engineering, Nirma University, Ahmedabad 382481, India
- Neil Swainston
  - Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, Faculty of Health and Life Sciences, University of Liverpool, Crown St, Liverpool L69 7ZB, UK
  - Mellizyme Biotechnology Ltd., Liverpool Science Park IC1, 131 Mount Pleasant, Liverpool L3 5TF, UK
- Soumitra Samanta
  - Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, Faculty of Health and Life Sciences, University of Liverpool, Crown St, Liverpool L69 7ZB, UK
- Ivayla Roberts
  - Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, Faculty of Health and Life Sciences, University of Liverpool, Crown St, Liverpool L69 7ZB, UK
- Marina Wright Muelas
  - Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, Faculty of Health and Life Sciences, University of Liverpool, Crown St, Liverpool L69 7ZB, UK
- Douglas B. Kell
  - Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, Faculty of Health and Life Sciences, University of Liverpool, Crown St, Liverpool L69 7ZB, UK
  - Mellizyme Biotechnology Ltd., Liverpool Science Park IC1, 131 Mount Pleasant, Liverpool L3 5TF, UK
  - Novo Nordisk Foundation Centre for Biosustainability, Technical University of Denmark, Building 220, Kemitorvet, 2800 Kongens Lyngby, Denmark
7
Predicting compound amenability with liquid chromatography-mass spectrometry to improve non-targeted analysis. Anal Bioanal Chem 2021; 413:7495-7508. [PMID: 34648052 DOI: 10.1007/s00216-021-03713-w] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 07/28/2021] [Revised: 09/22/2021] [Accepted: 10/01/2021] [Indexed: 10/20/2022]
Abstract
With the increasing availability of high-resolution mass spectrometers, suspect screening and non-targeted analysis are becoming popular compound identification tools for environmental researchers. Samples of interest often contain a large (unknown) number of chemicals spanning the detectable mass range of the instrument. In an effort to separate these chemicals prior to injection into the mass spectrometer, a chromatography method is often utilized. There are numerous types of gas and liquid chromatographs that can be coupled to commercially available mass spectrometers. Depending on the type of instrument used for analysis, the researcher is likely to observe a different subset of compounds based on the amenability of those chemicals to the selected experimental techniques and equipment. It would be advantageous if this subset of chemicals could be predicted prior to conducting the experiment, in order to minimize potential false-positive and false-negative identifications. In this work, we utilize experimental datasets to predict the amenability of chemical compounds to detection with liquid chromatography-electrospray ionization-mass spectrometry (LC-ESI-MS). The assembled dataset totals 5517 unique chemicals either explicitly detected or not detected with LC-ESI-MS. The resulting detected/not-detected matrix has been modeled using specific molecular descriptors to predict which chemicals are amenable to LC-ESI-MS, and to which form(s) of ionization. Random forest models, including a measure of the applicability domain of the model for both positive and negative modes of the electrospray ionization source, were successfully developed. The outcome of this work will help to inform future suspect screening and non-targeted analyses of chemicals by better defining the potential LC-ESI-MS detectable chemical landscape of interest.
8
Williams AJ, Lambert JC, Thayer K, Dorne JLCM. Sourcing data on chemical properties and hazard data from the US-EPA CompTox Chemicals Dashboard: A practical guide for human risk assessment. Environment International 2021; 154:106566. [PMID: 33934018 PMCID: PMC9667884 DOI: 10.1016/j.envint.2021.106566] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 12/14/2020] [Revised: 04/04/2021] [Accepted: 04/05/2021] [Indexed: 05/19/2023]
Abstract
For the past six decades, human health risk assessment of chemicals has relied on in vivo data from human epidemiological and experimental animal toxicological studies to inform the derivation of non-cancer toxicity values. The ongoing evolution of this risk assessment paradigm in an environmental landscape of data-poor chemicals has highlighted the need to develop and implement non-testing methods, so-called New Approach Methodologies (NAMs). NAMs include a growing number of in silico and in vitro data streams designed to inform hazard properties of chemicals, including kinetics and dynamics at different levels of biological organization, environmental fate and transport, and exposure. NAMs provide a fit-for-purpose science-basis for human hazard and risk characterization of chemicals ranging from data-gap filling applications to broad evidence-based decision-making. Systematic assembly and delivery of empirical and predicted data for chemicals are paramount to advancing chemical evaluation, and software tools serve an essential role in delivering these data to the scientific community. The CompTox Chemicals Dashboard (from here on referred to as the "Dashboard") is one such tool and is a publicly available web-based application developed by the US Environmental Protection Agency to provide access to chemistry, toxicity and exposure information for ~900,000 chemicals. The Dashboard is increasingly becoming a valuable resource for assessors tasked with the evaluation of potential human health risks associated with chemical exposures. 
In this context, the significant amount of information present in the Dashboard facilitates: 1) assembly of information on physicochemical properties and environmental fate and transport and exposure parameters and metrics; 2) identification of cancer and non-cancer health effects from extant human and experimental animal studies in the public domain and/or information not available in the public domain (i.e., "grey literature"); 3) systematic literature searching and review for developing cancer and non-cancer hazard evidence bases; and 4) access to mechanistic information that can aid or augment the analysis of traditional toxicology evidence bases, or potentially, serve as the primary basis for informing hazard identification and dose-response when traditional bioassay data are lacking. Finally, in silico predictive tools developed to conduct structure-activity or read-across analyses are also available within the Dashboard. This practical tutorial is intended to address key questions from the human health risk assessment community dealing with chemicals in both food and in the environment. Perspectives for future development or refinement of the Dashboard highlight foreseen activities to further support the research and risk assessment community in cancer and non-cancer chemical evaluations.
Affiliation(s)
- Antony J Williams
- Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency (U.S. EPA), Research Triangle Park, NC, USA.
- Jason C Lambert
- Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency (U.S. EPA), Research Triangle Park, NC, USA
- Kris Thayer
- Center for Public Health and Environmental Assessment, Office of Research and Development, U.S. Environmental Protection Agency (U.S. EPA), Research Triangle Park, NC, USA
- Jean-Lou C M Dorne
- Scientific Committee and Emerging Risks Unit, Department of Risk Assessment and Scientific Assistance, European Food Safety Authority, 43126 Parma, Italy
9
González-Gaya B, Lopez-Herguedas N, Bilbao D, Mijangos L, Iker AM, Etxebarria N, Irazola M, Prieto A, Olivares M, Zuloaga O. Suspect and non-target screening: the last frontier in environmental analysis. Anal Methods 2021; 13:1876-1904. [PMID: 33913946 DOI: 10.1039/d1ay00111f] [Citation(s) in RCA: 35] [Impact Index Per Article: 11.7] [Indexed: 06/12/2023]
Abstract
Suspect and non-target screening (SNTS) techniques are arising as new analytical strategies useful to disentangle the environmental occurrence of the thousands of exogenous chemicals present in our ecosystems. The unbiased discovery of the wide number of substances present over environmental analysis needs to find a consensus with powerful technical and computational requirements, as well as with the time-consuming unequivocal identification of discovered analytes. Within these boundaries, the potential applications of SNTS include the studies of environmental pollution in aquatic, atmospheric, solid and biological samples, the assessment of new compounds, transformation products and metabolites, contaminant prioritization, bioremediation or soil/water treatment evaluation, and retrospective data analysis, among many others. In this review, we evaluate the state of the art of SNTS techniques going over the normalized workflow from sampling and sample treatment to instrumental analysis, data processing and a brief review of the more recent applications of SNTS in environmental occurrence and exposure to xenobiotics. The main issues related to harmonization and knowledge gaps are critically evaluated and the challenges of their implementation are assessed in order to ensure a proper use of these promising techniques in the near future.
Affiliation(s)
- B González-Gaya
- Department of Analytical Chemistry, University of the Basque Country (UPV/EHU), 48940 Leioa, Basque Country, Spain.
10
Krettler CA, Thallinger GG. A map of mass spectrometry-based in silico fragmentation prediction and compound identification in metabolomics. Brief Bioinform 2021; 22:6184408. [PMID: 33758925 DOI: 10.1093/bib/bbab073] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 11/26/2020] [Revised: 01/29/2021] [Accepted: 02/12/2021] [Indexed: 12/27/2022] Open
Abstract
Metabolomics, the comprehensive study of the metabolome, and lipidomics, the large-scale study of pathways and networks of cellular lipids, are major driving forces in enabling personalized medicine. Complicated and error-prone data analysis still remains a bottleneck, however, especially for identifying novel metabolites. Comparing experimental mass spectra to curated databases containing reference spectra has been the gold standard for identification of compounds, but constructing such databases is a costly and time-demanding task. Many software applications try to circumvent this process by utilizing cutting-edge advances in computational methods, including quantum chemistry and machine learning, and simulate mass spectra by performing theoretical, so-called in silico fragmentations of compounds. Other solutions concentrate directly on experimental spectra and try to identify structural properties by investigating reoccurring patterns and the relationships between them. The considerable progress made in the field allows recent approaches to provide valuable clues to expedite annotation of experimental mass spectra. This review sheds light on individual strengths and weaknesses of these tools, and attempts to evaluate them, especially in view of lipidomics, when considering complex mixtures found in biological samples as well as mass spectrometer inter-instrument variability.
Affiliation(s)
- Christoph A Krettler
- Institute of Biomedical Informatics, Graz University of Technology, Stremayrgasse 16/I, 8010, Graz, Austria; Omics Center Graz, BioTechMed-Graz, Stiftingtalstrasse 24, 8010, Graz, Austria
- Gerhard G Thallinger
- Institute of Biomedical Informatics, Graz University of Technology, Stremayrgasse 16/I, 8010, Graz, Austria; Omics Center Graz, BioTechMed-Graz, Stiftingtalstrasse 24, 8010, Graz, Austria
11
Pleil JD, Lowe CN, Wallace MAG, Williams AJ. Using the US EPA CompTox Chemicals Dashboard to interpret targeted and non-targeted GC-MS analyses from human breath and other biological media. J Breath Res 2021; 15:025001. [PMID: 33734097 DOI: 10.1088/1752-7163/abdb03] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 12/15/2022]
Abstract
The U.S. EPA CompTox Chemicals Dashboard is a freely available web-based application providing access to chemistry, toxicity, and exposure data for ∼900 000 chemicals. Data, search functionality, and prediction models within the Dashboard can help identify chemicals found in environmental analyses and human biomonitoring. It was designed to deliver data generated to support computational toxicology to reduce chemical testing on animals and provide access to new approach methodologies including prediction models. The inclusion of mass and formula-based searches, together with relevant ranking approaches, allows for the identification and prioritization of exogenous (environmental) chemicals from high resolution mass spectrometry in need of further evaluation. The Dashboard includes chemicals that can be detected by liquid chromatography, gas chromatography-mass spectrometry (GC-MS) and direct-MS analyses, and chemical lists have been added that highlight breath-borne volatile and semi-volatile organic compounds. The Dashboard can be searched using various chemical identifiers (e.g. chemical synonyms, CASRN and InChIKeys), chemical formula, MS-ready formulae, monoisotopic mass, consumer product categories and assays/genes associated with high-throughput screening data. An integrated search at a chemical level performs searches against PubMed to identify relevant published literature. This article describes specific procedures using the Dashboard as a first-stop tool for exploring both targeted and non-targeted results from GC-MS analyses of chemicals found in breath, exhaled breath condensate, and associated aerosols.
Affiliation(s)
- Joachim D Pleil
- Gillings School of Public Health, University of North Carolina, Chapel Hill, NC, United States of America