1
Harrill JA, Everett LJ, Haggard DE, Word LJ, Bundy JL, Chambers B, Harris F, Willis C, Thomas RS, Shah I, Judson R. Signature analysis of high-throughput transcriptomics screening data for mechanistic inference and chemical grouping. Toxicol Sci 2024; 202:103-122. [PMID: 39177380 DOI: 10.1093/toxsci/kfae108] [Indexed: 08/24/2024]
Abstract
High-throughput transcriptomics (HTTr) uses gene expression profiling to characterize the biological activity of chemicals in in vitro cell-based test systems. As an extension of a previous study testing 44 chemicals, HTTr was used to screen an additional 1,751 unique chemicals from the EPA's ToxCast collection in MCF7 cells using 8 concentrations and an exposure duration of 6 h. We hypothesized that concentration-response modeling of signature scores could be used to identify putative molecular targets and cluster chemicals with similar bioactivity. Clustering and enrichment analyses were conducted based on signature catalog annotations and ToxPrint chemotypes to facilitate molecular target prediction and grouping of chemicals with similar bioactivity profiles. Enrichment analysis based on signature catalog annotation identified known mechanisms of action (MeOAs) associated with well-studied chemicals and generated putative MeOAs for other active chemicals. Chemicals with predicted MeOAs included those targeting estrogen receptor (ER), glucocorticoid receptor (GR), retinoic acid receptor (RAR), the NRF2/KEAP/ARE pathway, AP-1 activation, and others. Using reference chemicals for ER modulation, the study demonstrated that HTTr in MCF7 cells was able to stratify chemicals in terms of agonist potency, distinguish ER agonists from antagonists, and cluster chemicals with similar activities as predicted by the ToxCast ER Pathway model. Uniform manifold approximation and projection (UMAP) embedding of signature-level results identified novel ER modulators with no ToxCast ER Pathway model predictions. Finally, UMAP combined with ToxPrint chemotype enrichment was used to explore the biological activity of structurally related chemicals. The study demonstrates that HTTr can be used to inform chemical risk assessment by determining in vitro points of departure, predicting chemicals' MeOA and grouping chemicals with similar bioactivity profiles.
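The concentration-response modeling mentioned in this abstract can be illustrated with a Hill curve, one of the model shapes commonly fitted to such data. The sketch below is not the authors' pipeline: the function names, the parameter values, and the use of a fixed response cutoff to read off a point-of-departure-like concentration are all assumptions made for illustration, and it assumes a positive-going response.

```python
def hill(conc, top, ac50, n):
    """Hill response at concentration `conc` (same units as `ac50`).

    top  : maximal response (assumed positive here)
    ac50 : concentration giving half-maximal response
    n    : Hill slope
    """
    return top * conc**n / (ac50**n + conc**n)


def conc_at_response(cutoff, top, ac50, n):
    """Invert the Hill curve: concentration at which the response equals `cutoff`.

    Illustrative stand-in for a benchmark-style potency estimate; the
    cutoff itself is a hypothetical choice, not a value from the study.
    """
    if not 0 < cutoff < top:
        raise ValueError("cutoff must lie strictly between 0 and top")
    return ac50 * (cutoff / (top - cutoff)) ** (1.0 / n)


if __name__ == "__main__":
    # Hypothetical signature-score curve: top=1.0, AC50=5 uM, slope=2.
    potency = conc_at_response(cutoff=0.2, top=1.0, ac50=5.0, n=2.0)
    print(f"concentration at cutoff: {potency:.2f} uM")
```

Inverting the fitted curve at a response cutoff, rather than reporting the AC50 alone, gives a potency value that depends on both the efficacy and the slope of the curve.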
Affiliation(s)
- Joshua A Harrill
  - Center for Computational Toxicology & Exposure, Office of Research and Development, US Environmental Protection Agency, Durham, NC 27711, United States
- Logan J Everett
  - Center for Computational Toxicology & Exposure, Office of Research and Development, US Environmental Protection Agency, Durham, NC 27711, United States
- Derik E Haggard
  - Center for Computational Toxicology & Exposure, Office of Research and Development, US Environmental Protection Agency, Durham, NC 27711, United States
- Laura J Word
  - Center for Computational Toxicology & Exposure, Office of Research and Development, US Environmental Protection Agency, Durham, NC 27711, United States
- Joseph L Bundy
  - Center for Computational Toxicology & Exposure, Office of Research and Development, US Environmental Protection Agency, Durham, NC 27711, United States
- Bryant Chambers
  - Center for Computational Toxicology & Exposure, Office of Research and Development, US Environmental Protection Agency, Durham, NC 27711, United States
- Felix Harris
  - Center for Computational Toxicology & Exposure, Office of Research and Development, US Environmental Protection Agency, Durham, NC 27711, United States
  - Oak Ridge Associated Universities (ORAU) National Student Services Contractor, Oak Ridge, TN 37831, United States
- Clinton Willis
  - Center for Computational Toxicology & Exposure, Office of Research and Development, US Environmental Protection Agency, Durham, NC 27711, United States
- Russell S Thomas
  - Center for Computational Toxicology & Exposure, Office of Research and Development, US Environmental Protection Agency, Durham, NC 27711, United States
- Imran Shah
  - Center for Computational Toxicology & Exposure, Office of Research and Development, US Environmental Protection Agency, Durham, NC 27711, United States
- Richard Judson
  - Center for Computational Toxicology & Exposure, Office of Research and Development, US Environmental Protection Agency, Durham, NC 27711, United States
2
Chambers BA, Basili D, Word L, Baker N, Middleton A, Judson RS, Shah I. Searching for LINCS to Stress: Using Text Mining to Automate Reference Chemical Curation. Chem Res Toxicol 2024; 37:878-893. [PMID: 38736322 PMCID: PMC11447707 DOI: 10.1021/acs.chemrestox.3c00335] [Indexed: 05/14/2024]
Abstract
Adaptive stress response pathways (SRPs) restore cellular homeostasis following perturbation but may activate terminal outcomes like apoptosis, autophagy, or cellular senescence if disruption exceeds critical thresholds. Because SRPs hold the key to vital cellular tipping points, they are targeted for therapeutic interventions and assessed as biomarkers of toxicity. Hence, we are developing a public database of chemicals that perturb SRPs to enable new data-driven tools to improve public health. Here, we report on the automated text-mining pipeline we used to build and curate the first version of this database. We started with 100 reference SRP chemicals gathered from published biomarker studies to bootstrap the database. Second, we used information retrieval to find co-occurrences of reference chemicals with SRP terms in PubMed abstracts and determined pairwise mutual information thresholds to filter biologically relevant relationships. Third, we applied these thresholds to find 1206 putative SRP perturbagens within thousands of substances in the Library of Integrated Network-Based Cellular Signatures (LINCS). To assign SRP activity to LINCS chemicals, domain experts had to manually review at least three publications for each of 1206 chemicals out of 181,805 total abstracts. To accomplish this efficiently, we implemented a machine learning approach to predict SRP classifications from texts to prioritize abstracts. In 5-fold cross-validation testing with a corpus derived from the 100 reference chemicals, artificial neural networks performed the best (F1-macro = 0.678) and prioritized 2479/181,805 abstracts for expert review, which resulted in 457 chemicals annotated with SRP activities. An independent analysis of enriched mechanisms of action and chemical use class supported the text-mined chemical associations (p < 0.05): heat shock inducers were linked with HSP90 and DNA damage inducers to topoisomerase inhibition. 
This database will enable novel applications of LINCS data to evaluate SRP activities and to further develop tools for biomedical information extraction from the literature.
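The co-occurrence filtering step described in this abstract can be sketched with pointwise mutual information (PMI) computed from abstract counts. This is an assumption about the general technique, not the authors' implementation: the function names, the counts, and the threshold value below are hypothetical.

```python
import math


def pmi(n_xy, n_x, n_y, n_total):
    """Pointwise mutual information, in bits, between two terms.

    n_xy    : abstracts mentioning both the chemical and the SRP term
    n_x     : abstracts mentioning the chemical
    n_y     : abstracts mentioning the SRP term
    n_total : abstracts in the corpus
    Returns -inf when the pair never co-occurs.
    """
    if min(n_xy, n_x, n_y) <= 0:
        return float("-inf")
    return math.log2((n_xy * n_total) / (n_x * n_y))


def filter_pairs(counts, n_total, threshold):
    """Keep (chemical, term) pairs whose PMI meets the threshold.

    `counts` maps (chemical, term) -> (n_xy, n_x, n_y).
    """
    return {
        pair: pmi(n_xy, n_x, n_y, n_total)
        for pair, (n_xy, n_x, n_y) in counts.items()
        if pmi(n_xy, n_x, n_y, n_total) >= threshold
    }


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    counts = {
        ("chem_a", "heat shock"): (50, 100, 200),
        ("chem_b", "heat shock"): (1, 500, 200),
    }
    print(filter_pairs(counts, n_total=10_000, threshold=2.0))
```

A positive PMI means the pair co-occurs more often than independent mentions would predict, which is why a threshold on it can separate incidental co-mentions from candidate relationships.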
Affiliation(s)
- Bryant A. Chambers
  - Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina, USA
- Danilo Basili
  - Unilever, Safety and Environmental Assurance Centre (SEAC), Colworth Science Park, Sharnbrook, Bedfordshire MK44 1LQ, U.K.
- Laura Word
  - Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina, USA
- Alistair Middleton
  - Unilever, Safety and Environmental Assurance Centre (SEAC), Colworth Science Park, Sharnbrook, Bedfordshire MK44 1LQ, U.K.
- Richard S. Judson
  - Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina, USA
- Imran Shah
  - Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina, USA
3
Harrill JA, Everett LJ, Haggard DE, Bundy JL, Willis CM, Shah I, Friedman KP, Basili D, Middleton A, Judson RS. Exploring the effects of experimental parameters and data modeling approaches on in vitro transcriptomic point-of-departure estimates. Toxicology 2024; 501:153694. [PMID: 38043774 PMCID: PMC11917498 DOI: 10.1016/j.tox.2023.153694] [Received: 08/23/2023] [Revised: 11/24/2023] [Accepted: 11/29/2023] [Indexed: 12/05/2023]
Abstract
Multiple new approach methods (NAMs) are being developed to rapidly screen large numbers of chemicals to aid in hazard evaluation and risk assessments. High-throughput transcriptomics (HTTr) in human cell lines has been proposed as a first-tier screening approach for determining the types of bioactivity a chemical can cause (activation of specific targets vs. generalized cell stress) and for calculating transcriptional points of departure (tPODs) based on changes in gene expression. In the present study, we examine a range of computational methods to calculate tPODs from HTTr data, using six data sets in which MCF7 cells cultured in two different media formulations were treated with a panel of 44 chemicals for 3 different exposure durations (6, 12, 24 hr). The tPOD calculation methods use data at the level of individual genes and gene set signatures, and compare data processed using the ToxCast Pipeline 2 (tcplfit2), BMDExpress and PLIER (Pathway Level Information ExtractoR). Methods were evaluated by comparing to in vitro PODs from a validated set of high-throughput screening (HTS) assays for a set of estrogenic compounds. Key findings include: (1) for a given chemical and set of experimental conditions, tPODs calculated by different methods can vary by several orders of magnitude; (2) tPODs are at least as sensitive to computational methods as to experimental conditions; (3) in comparison to an external reference set of PODs, some methods give generally higher values, principally PLIER and BMDExpress; and (4) the tPODs from HTTr in this one cell type are mostly higher than the overall PODs from a broad battery of targeted in vitro ToxCast assays, reflecting the need to test chemicals in multiple cell types and readout technologies for in vitro hazard screening.
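Finding (1), that tPODs from different methods can differ by several orders of magnitude, is easiest to read on a log10 scale. The sketch below shows one way to quantify that spread for a single chemical. The method labels and concentration values are hypothetical placeholders, not results from the study.

```python
import math


def log10_span(tpods_um):
    """Spread of tPOD estimates, in orders of magnitude.

    `tpods_um` maps a method label -> tPOD in micromolar.
    Non-positive values are ignored because they have no logarithm.
    """
    values = [v for v in tpods_um.values() if v > 0]
    if len(values) < 2:
        raise ValueError("need at least two positive tPOD values")
    return math.log10(max(values) / min(values))


if __name__ == "__main__":
    # Hypothetical tPODs for one chemical under one set of conditions.
    tpods = {"method_a": 0.01, "method_b": 1.0, "method_c": 30.0}
    print(f"span: {log10_span(tpods):.2f} orders of magnitude")
```

Computing the same span across experimental conditions with the method held fixed would give the comparison behind finding (2).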
Affiliation(s)
- Joshua A Harrill
  - Center for Computational Toxicology and Exposure, Office of Research and Development, US Environmental Protection Agency, Research Triangle Park, NC, USA
- Logan J Everett
  - Center for Computational Toxicology and Exposure, Office of Research and Development, US Environmental Protection Agency, Research Triangle Park, NC, USA
- Derik E Haggard
  - Center for Computational Toxicology and Exposure, Office of Research and Development, US Environmental Protection Agency, Research Triangle Park, NC, USA; Oak Ridge Institute for Science and Education (ORISE), USA
- Joseph L Bundy
  - Center for Computational Toxicology and Exposure, Office of Research and Development, US Environmental Protection Agency, Research Triangle Park, NC, USA
- Clinton M Willis
  - Center for Computational Toxicology and Exposure, Office of Research and Development, US Environmental Protection Agency, Research Triangle Park, NC, USA; Oak Ridge Associated Universities (ORAU), USA
- Imran Shah
  - Center for Computational Toxicology and Exposure, Office of Research and Development, US Environmental Protection Agency, Research Triangle Park, NC, USA
- Katie Paul Friedman
  - Center for Computational Toxicology and Exposure, Office of Research and Development, US Environmental Protection Agency, Research Triangle Park, NC, USA
- Danilo Basili
  - Unilever Safety and Environmental Assurance Centre (SEAC), Colworth Science Park, Sharnbrook, Bedfordshire MK44 1LQ, UK
- Alistair Middleton
  - Unilever Safety and Environmental Assurance Centre (SEAC), Colworth Science Park, Sharnbrook, Bedfordshire MK44 1LQ, UK
- Richard S Judson
  - Center for Computational Toxicology and Exposure, Office of Research and Development, US Environmental Protection Agency, Research Triangle Park, NC, USA
4
Shah I, Bundy J, Chambers B, Everett LJ, Haggard D, Harrill J, Judson RS, Nyffeler J, Patlewicz G. Navigating Transcriptomic Connectivity Mapping Workflows to Link Chemicals with Bioactivities. Chem Res Toxicol 2022; 35:1929-1949. [PMID: 36301716 PMCID: PMC10483698 DOI: 10.1021/acs.chemrestox.2c00245] [Indexed: 01/09/2023]
Abstract
Screening new compounds for potential bioactivities against cellular targets is vital for drug discovery and chemical safety. Transcriptomics offers an efficient approach for assessing global gene expression changes, but interpreting chemical mechanisms from these data is often challenging. Connectivity mapping is a potential data-driven avenue for linking chemicals to mechanisms based on the observation that many biological processes are associated with unique gene expression signatures (gene signatures). However, mining the effects of a chemical on gene signatures for biological mechanisms is challenging because transcriptomic data contain thousands of noisy genes. New connectivity mapping approaches seeking to distinguish signal from noise continue to be developed, spurred by the promise of discovering chemical mechanisms, new drugs, and disease targets from burgeoning transcriptomic data. Here, we analyze these approaches in terms of diverse transcriptomic technologies, public databases, gene signatures, pattern-matching algorithms, and statistical evaluation criteria. To navigate the complexity of connectivity mapping, we propose a harmonized scheme to coherently organize and compare published workflows. We first standardize concepts underlying transcriptomic profiles and gene signatures based on various transcriptomic technologies such as microarrays, RNA-Seq, and L1000 and discuss the widely used data sources such as Gene Expression Omnibus, ArrayExpress, and MSigDB. Next, we generalize connectivity mapping as a pattern-matching task for finding similarity between a query (e.g., transcriptomic profile for new chemical) and a reference (e.g., gene signature of known target). Published pattern-matching approaches fall into two main categories: vector-based use metrics like correlation, Jaccard index, etc., and aggregation-based use parametric and nonparametric statistics (e.g., gene set enrichment analysis). 
The statistical methods for evaluating the performance of different approaches are described, along with comparisons reported in the literature on benchmark transcriptomic data sets. Lastly, we review connectivity mapping applications in toxicology and offer guidance on evaluating chemical-induced toxicity with concentration-response transcriptomic data. In addition to serving as a high-level guide and tutorial for understanding and implementing connectivity mapping workflows, we hope this review will stimulate new algorithms for evaluating chemical safety and drug discovery using transcriptomic data.
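The vector-based category of pattern matching named in this abstract can be illustrated with two of the metrics it mentions, the Jaccard index and correlation. The sketch below is a minimal illustration rather than any specific published workflow: the function names, gene lists, and fold-change values are assumptions chosen for the example.

```python
import math


def jaccard(query_genes, reference_genes):
    """Jaccard index between two gene sets (0 = disjoint, 1 = identical)."""
    q, r = set(query_genes), set(reference_genes)
    union = q | r
    return len(q & r) / len(union) if union else 0.0


def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("vectors must have equal length of at least 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant vector carries no pattern to match
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Illustrative inputs only: genes up-regulated by a query chemical
    # versus a reference signature, then matched log2 fold-change vectors.
    query_up = {"GREB1", "PGR", "TFF1"}
    reference_up = {"PGR", "TFF1", "MYC"}
    print("Jaccard:", jaccard(query_up, reference_up))
    print("Pearson:", pearson([1.2, 0.8, -0.5, 0.1], [1.0, 0.9, -0.4, 0.0]))
```

The Jaccard index compares only set membership after thresholding, while correlation uses the magnitudes across a shared gene ordering, so the two can rank the same query-reference pairs differently.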
Affiliation(s)
- Imran Shah
  - Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina 27711, USA
- Joseph Bundy
  - Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina 27711, USA
- Bryant Chambers
  - Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina 27711, USA
- Logan J. Everett
  - Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina 27711, USA
- Derik Haggard
  - Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina 27711, USA
- Joshua Harrill
  - Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina 27711, USA
- Richard S. Judson
  - Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina 27711, USA
- Johanna Nyffeler
  - Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina 27711, USA
  - Oak Ridge Institute for Science and Education (ORISE) Postdoctoral Fellow, Oak Ridge, Tennessee 37831, USA
- Grace Patlewicz
  - Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina 27711, USA
5
Basili D, Reynolds J, Houghton J, Malcomber S, Chambers B, Liddell M, Muller I, White A, Shah I, Everett LJ, Middleton A, Bender A. Latent Variables Capture Pathway-Level Points of Departure in High-Throughput Toxicogenomic Data. Chem Res Toxicol 2022; 35:670-683. [PMID: 35333521 PMCID: PMC9019810 DOI: 10.1021/acs.chemrestox.1c00444] [Received: 12/23/2021] [Indexed: 11/28/2022]
Abstract
Estimation of points of departure (PoDs) from high-throughput transcriptomic data (HTTr) represents a key step in the development of next-generation risk assessment (NGRA). Current approaches mainly rely on single key gene targets, which are constrained by the information currently available in the knowledge base and make interpretation challenging as scientists need to interpret PoDs for thousands of genes or hundreds of pathways. In this work, we aimed to address these issues by developing a computational workflow to investigate the pathway concentration-response relationships in a way that is not fully constrained by known biology and also facilitates interpretation. We employed the Pathway-Level Information ExtractoR (PLIER) to identify latent variables (LVs) describing biological activity and then investigated in vitro LVs' concentration-response relationships using the ToxCast pipeline. We applied this methodology to a published transcriptomic concentration-response data set for 44 chemicals in MCF-7 cells and showed that our workflow can capture known biological activity and discriminate between estrogenic and antiestrogenic compounds as well as activity not aligning with the existing knowledge base, which may be relevant in a risk assessment scenario. Moreover, we were able to identify the known estrogen activity in compounds that are not well-established ER agonists/antagonists supporting the use of the workflow in read-across. Next, we transferred its application to chemical compounds tested in HepG2, HepaRG, and MCF-7 cells and showed that PoD estimates are in strong agreement with those estimated using a recently developed Bayesian approach (cor = 0.89) and in weak agreement with those estimated using a well-established approach such as BMDExpress2 (cor = 0.57). 
These results demonstrate the effectiveness of using PLIER in a concentration-response scenario to investigate pathway activity in a way that is not fully constrained by the knowledge base and to ease the biological interpretation and support the development of an NGRA framework with the ability to improve current risk assessment strategies for chemicals using new approach methodologies.
Affiliation(s)
- Danilo Basili
  - Department of Chemistry, University of Cambridge, Cambridge CB2 1EW, U.K.
  - Unilever, Safety and Environmental Assurance Centre (SEAC), Colworth Science Park, Sharnbrook, Bedfordshire MK44 1LQ, U.K.
- Joe Reynolds
  - Unilever, Safety and Environmental Assurance Centre (SEAC), Colworth Science Park, Sharnbrook, Bedfordshire MK44 1LQ, U.K.
- Jade Houghton
  - Unilever, Safety and Environmental Assurance Centre (SEAC), Colworth Science Park, Sharnbrook, Bedfordshire MK44 1LQ, U.K.
- Sophie Malcomber
  - Unilever, Safety and Environmental Assurance Centre (SEAC), Colworth Science Park, Sharnbrook, Bedfordshire MK44 1LQ, U.K.
- Bryant Chambers
  - Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina 27711, United States
- Mark Liddell
  - Unilever, Safety and Environmental Assurance Centre (SEAC), Colworth Science Park, Sharnbrook, Bedfordshire MK44 1LQ, U.K.
- Iris Muller
  - Unilever, Safety and Environmental Assurance Centre (SEAC), Colworth Science Park, Sharnbrook, Bedfordshire MK44 1LQ, U.K.
- Andrew White
  - Unilever, Safety and Environmental Assurance Centre (SEAC), Colworth Science Park, Sharnbrook, Bedfordshire MK44 1LQ, U.K.
- Imran Shah
  - Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina 27711, United States
- Logan J. Everett
  - Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina 27711, United States
- Alistair Middleton
  - Unilever, Safety and Environmental Assurance Centre (SEAC), Colworth Science Park, Sharnbrook, Bedfordshire MK44 1LQ, U.K.
- Andreas Bender
  - Department of Chemistry, University of Cambridge, Cambridge CB2 1EW, U.K.