1
Kugic A, Pfeifer B, Schulz S, Kreuzthaler M. Embedding-based terminology expansion via secondary use of large clinical real-world datasets. J Biomed Inform 2023; 147:104497. [PMID: 37777164 DOI: 10.1016/j.jbi.2023.104497] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/06/2023] [Revised: 06/06/2023] [Accepted: 09/08/2023] [Indexed: 10/02/2023]
Abstract
A log-likelihood based co-occurrence analysis of ∼1.9 million de-identified ICD-10 codes and related short textual problem list entries generated possible term candidates at a significance level of p<0.01. These top 10 term candidates, consisting of 1 to 5-grams, were used as seed terms for an embedding based nearest neighbor approach to fetch additional synonyms, hypernyms and hyponyms in the respective n-gram embedding spaces by leveraging two different language models. This was done to analyze the lexicality of the resulting term candidates and to compare the term classifications of both models. We found no difference in system performance during the processing of lexical and non-lexical content, i.e. abbreviations, acronyms, etc. Additionally, an application-oriented analysis of the SapBERT (Self-Alignment Pretraining for Biomedical Entity Representations) language model indicates suitable performance for the extraction of all term classifications such as synonyms, hypernyms, and hyponyms.
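The abstract does not state which log-likelihood statistic was used. As a minimal sketch, assuming the common G² (log-likelihood ratio) test on a 2x2 contingency table of code/term co-occurrence counts, the significance check could look like the following. The function name, the count layout, and the example counts are illustrative assumptions, not taken from the paper.

```python
import math

# Chi-square critical value for 1 degree of freedom at p = 0.01.
CRITICAL_P01 = 6.635

def log_likelihood_ratio(k11, k12, k21, k22):
    """G^2 statistic for a 2x2 table (hypothetical layout):
    k11: entries with the code and the term
    k12: entries with the code but not the term
    k21: entries with the term but not the code
    k22: entries with neither
    """
    n = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    observed = ((k11, k12), (k21, k22))
    total = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a zero cell contributes nothing to the sum
                expected = rows[i] * cols[j] / n
                total += o * math.log(o / expected)
    return 2.0 * total

# Made-up counts: a term that co-occurs strongly with a code.
score = log_likelihood_ratio(50, 5, 5, 940)
is_candidate = score > CRITICAL_P01
```

A term whose score exceeds the critical value would be kept as a candidate at p<0.01 under these assumptions.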
Affiliation(s)
- Amila Kugic
  - Institute for Medical Informatics, Statistics and Documentation, Medical University of Graz, Graz, Austria
- Bastian Pfeifer
  - Institute for Medical Informatics, Statistics and Documentation, Medical University of Graz, Graz, Austria
- Stefan Schulz
  - Institute for Medical Informatics, Statistics and Documentation, Medical University of Graz, Graz, Austria
- Markus Kreuzthaler
  - Institute for Medical Informatics, Statistics and Documentation, Medical University of Graz, Graz, Austria
2
Koval LE, Dionisio KL, Friedman KP, Isaacs KK, Rager JE. Environmental mixtures and breast cancer: identifying co-exposure patterns between understudied vs breast cancer-associated chemicals using chemical inventory informatics. J Expo Sci Environ Epidemiol 2022; 32:794-807. [PMID: 35710593 DOI: 10.15139/s3/umpckw] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/31/2022] [Revised: 05/27/2022] [Accepted: 05/31/2022] [Indexed: 05/28/2023]
Abstract
BACKGROUND: Although evidence linking environmental chemicals to breast cancer is growing, mixtures-based exposure evaluations are lacking.
OBJECTIVE: This study aimed to identify environmental chemicals in use inventories that co-occur and share properties with chemicals that have an association with breast cancer, highlighting exposure combinations that may alter disease risk.
METHODS: The occurrence of chemicals within chemical use categories was characterized using the Chemical and Products Database. Co-exposure patterns were evaluated for chemicals that have an association with breast cancer (BC), no known association (NBC), and understudied chemicals (UC) identified through query of the Silent Spring Institute's Mammary Carcinogens Review Database and the U.S. Environmental Protection Agency's Toxicity Reference Database. UCs were ranked based on structure and physicochemical similarities and co-occurrence patterns with BCs within environmentally relevant exposure sources.
RESULTS: A total of 6793 chemicals had data available for exposure source occurrence analyses. 50 top-ranking UCs spanning five clusters of co-occurring chemicals were prioritized, based on shared properties with co-occurring BCs, including chemicals used in food production and consumer/personal care products, as well as potential endocrine system modulators.
SIGNIFICANCE: Results highlight important co-exposure conditions that are likely prevalent within our everyday environments and that warrant further evaluation for possible breast cancer risk.
IMPACT STATEMENT: Most environmental studies on breast cancer have focused on evaluating relationships between individual, well-known chemicals and breast cancer risk. This study set out to expand this research field by identifying understudied chemicals and mixtures that may occur in everyday environments due to their patterns of commercial use. Analyses focused on those that co-occur alongside chemicals associated with breast cancer, based upon in silico chemical database querying and analysis. Particularly in instances when understudied chemicals share physicochemical properties and structural features with carcinogens, these chemical mixtures represent conditions that should be studied in future clinical, epidemiological, and toxicological studies.
Affiliation(s)
- Lauren E Koval
  - Department of Environmental Sciences and Engineering, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
  - The Institute for Environmental Health Solutions, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Kathie L Dionisio
  - Immediate Office of the Assistant Administrator, Office of Research and Development, US Environmental Protection Agency, Research Triangle Park, NC, USA
- Katie Paul Friedman
  - Center for Computational Toxicology and Exposure, Office of Research and Development, US Environmental Protection Agency, Research Triangle Park, NC, USA
- Kristin K Isaacs
  - Center for Computational Toxicology and Exposure, Office of Research and Development, US Environmental Protection Agency, Research Triangle Park, NC, USA
- Julia E Rager
  - Department of Environmental Sciences and Engineering, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
  - The Institute for Environmental Health Solutions, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
  - Curriculum in Toxicology and Environmental Medicine, School of Medicine, University of North Carolina, Chapel Hill, NC, USA
3
Koval LE, Dionisio KL, Friedman KP, Isaacs KK, Rager JE. Environmental mixtures and breast cancer: identifying co-exposure patterns between understudied vs breast cancer-associated chemicals using chemical inventory informatics. J Expo Sci Environ Epidemiol 2022; 32:794-807. [PMID: 35710593 PMCID: PMC9742149 DOI: 10.1038/s41370-022-00451-8] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 01/31/2022] [Revised: 05/27/2022] [Accepted: 05/31/2022] [Indexed: 05/15/2023]
4
Martin MM, Baker NC, Boyes WK, Carstens KE, Culbreth ME, Gilbert ME, Harrill JA, Nyffeler J, Padilla S, Friedman KP, Shafer TJ. An expert-driven literature review of "negative" chemicals for developmental neurotoxicity (DNT) in vitro assay evaluation. Neurotoxicol Teratol 2022; 93:107117. [PMID: 35908584 DOI: 10.1016/j.ntt.2022.107117] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2022] [Revised: 06/27/2022] [Accepted: 07/18/2022] [Indexed: 11/26/2022]
Abstract
To date, approximately 200 chemicals have been tested in US Environmental Protection Agency (EPA) or Organization for Economic Co-operation and Development (OECD) developmental neurotoxicity (DNT) guideline studies, leaving thousands of chemicals without traditional animal information on DNT hazard potential. To address this data gap, a battery of in vitro DNT new approach methodologies (NAMs) has been proposed. Evaluation of the performance of this battery will increase the confidence in its use to determine DNT chemical hazards. One approach to evaluate DNT NAM performance is to use a set of chemicals to evaluate sensitivity and specificity. Since a list of chemicals with potential evidence of in vivo DNT has been established, this study aims to develop a curated list of "negative" chemicals for inclusion in a "DNT NAM evaluation set". A workflow, including a literature search followed by an expert-driven literature review, was used to systematically screen 39 chemicals for lack of DNT effect. Expert panel members evaluated the scientific robustness of relevant studies to inform chemical categorizations. Following review, the panel discussed each chemical and made categorical determinations of "Favorable", "Not Favorable", or "Indeterminate" reflecting acceptance, lack of suitability, or uncertainty given specific limitations and considerations, respectively. The panel determined that 10, 22, and 7 chemicals met the criteria for "Favorable", "Not Favorable", and "Indeterminate", for use as negatives in a DNT NAM evaluation set. Ultimately, this approach not only supports DNT NAM performance evaluation but also highlights challenges in identifying large numbers of negative DNT chemicals.
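For readers unfamiliar with the two performance measures named in the abstract, the sketch below shows how sensitivity and specificity would be computed once an assay battery has been run against a reference set of positive and negative chemicals. The function name and the counts are hypothetical and are not results from the paper.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    tp: reference positives the battery flags as active
    fn: reference positives the battery misses
    tn: reference negatives the battery leaves unflagged
    fp: reference negatives the battery flags as active
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative chemical")
    sensitivity = tp / (tp + fn)   # share of true positives detected
    specificity = tn / (tn + fp)   # share of true negatives correctly cleared
    return sensitivity, specificity

# Made-up example: 10 reference positives, 10 reference negatives.
sens, spec = sensitivity_specificity(tp=8, fn=2, tn=9, fp=1)
```

Specificity can only be estimated when reliable negatives exist, which is why a curated "negative" list of the kind described above matters.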
Affiliation(s)
- Melissa M Martin
  - Rapid Assay Development Branch, Biomolecular and Computational Toxicology Division, CCTE/ORD, U.S. Environmental Protection Agency, Research Triangle Park, NC 27711, USA
- Nancy C Baker
  - Leidos, Research Triangle Park, Research Triangle Park, NC 27711, USA
- William K Boyes
  - Neurological and Endocrine Toxicology Branch, Public Health and Integrated Toxicology Division, CPHEA/ORD, U.S. Environmental Protection Agency, Research Triangle Park, NC 27711, USA
- Kelly E Carstens
  - Rapid Assay Development Branch, Biomolecular and Computational Toxicology Division, CCTE/ORD, U.S. Environmental Protection Agency, Research Triangle Park, NC 27711, USA
- Megan E Culbreth
  - Rapid Assay Development Branch, Biomolecular and Computational Toxicology Division, CCTE/ORD, U.S. Environmental Protection Agency, Research Triangle Park, NC 27711, USA
- Mary E Gilbert
  - Neurological and Endocrine Toxicology Branch, Public Health and Integrated Toxicology Division, CPHEA/ORD, U.S. Environmental Protection Agency, Research Triangle Park, NC 27711, USA
- Joshua A Harrill
  - Rapid Assay Development Branch, Biomolecular and Computational Toxicology Division, CCTE/ORD, U.S. Environmental Protection Agency, Research Triangle Park, NC 27711, USA
- Johanna Nyffeler
  - Rapid Assay Development Branch, Biomolecular and Computational Toxicology Division, CCTE/ORD, U.S. Environmental Protection Agency, Research Triangle Park, NC 27711, USA; Oak Ridge Institute for Science and Education (ORISE), Oak Ridge, TN, USA
- Stephanie Padilla
  - Rapid Assay Development Branch, Biomolecular and Computational Toxicology Division, CCTE/ORD, U.S. Environmental Protection Agency, Research Triangle Park, NC 27711, USA
- Katie Paul Friedman
  - Computational Toxicology & Bioinformatics Branch, Biomolecular and Computational Toxicology Division, CCTE/ORD, U.S. Environmental Protection Agency, Research Triangle Park, NC 27711, USA
- Timothy J Shafer
  - Rapid Assay Development Branch, Biomolecular and Computational Toxicology Division, CCTE/ORD, U.S. Environmental Protection Agency, Research Triangle Park, NC 27711, USA
5
Smith DP, Oechsle O, Rawling MJ, Savory E, Lacoste AMB, Richardson PJ. Expert-Augmented Computational Drug Repurposing Identified Baricitinib as a Treatment for COVID-19. Front Pharmacol 2021; 12:709856. [PMID: 34393789 PMCID: PMC8356560 DOI: 10.3389/fphar.2021.709856] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 05/14/2021] [Accepted: 06/24/2021] [Indexed: 12/15/2022] Open
Abstract
The onset of the 2019 Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic necessitated the identification of approved drugs to treat the disease, before the development, approval and widespread administration of suitable vaccines. To identify such a drug, we used a visual analytics workflow where computational tools applied over an AI-enhanced biomedical knowledge graph were combined with human expertise. The workflow comprised rapid augmentation of knowledge graph information from recent literature using machine learning (ML) based extraction, with human-guided iterative queries of the graph. Using this workflow, we identified the rheumatoid arthritis drug baricitinib as both an antiviral and anti-inflammatory therapy. The effectiveness of baricitinib was substantiated by the recent publication of the data from the ACTT-2 randomised Phase 3 trial, followed by emergency approval for use by the FDA, and a report from the CoV-BARRIER trial confirming significant reductions in mortality with baricitinib compared to standard of care. Such methods that iteratively combine computational tools with human expertise hold promise for the identification of treatments for rare and neglected diseases and, beyond drug repurposing, in areas of biological research where relevant data may be lacking or hidden in the mass of available biomedical literature.
Affiliation(s)
- Ed Savory
  - BenevolentAI, London, United Kingdom
6
Zurlinden TJ, Saili KS, Baker NC, Toimela T, Heinonen T, Knudsen TB. A cross-platform approach to characterize and screen potential neurovascular unit toxicants. Reprod Toxicol 2020; 96:300-15. [PMID: 32590145 DOI: 10.1016/j.reprotox.2020.06.010] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/10/2019] [Revised: 05/28/2020] [Accepted: 06/15/2020] [Indexed: 12/24/2022]
Abstract
Development of the neurovascular unit (NVU) is a complex, multistage process that requires orchestrated cell signaling mechanisms across several cell types and ultimately results in formation of the blood-brain barrier. Typical high-throughput screening (HTS) assays investigate single biochemical or single cell responses following chemical insult. As the NVU comprises multiple cell types interacting at various stages of development, a methodology combining high-throughput results across pertinent cell-based assays is needed to investigate potential chemical-induced disruption to the development of this complex cell system. To this end, we implemented a novel method for screening putative NVU disruptors across diverse assay platforms to predict chemical perturbation of the developing NVU. HTS assay results measuring chemical-induced perturbations to cellular key events across angiogenic and neurogenic outcomes in vitro were combined to create a cell-based prioritization of NVU hazard. Chemicals were grouped according to similar modes of action to train a logistic regression literature model on a training set of 38 chemicals. This model utilizes the chemical-specific pairwise mutual information score for PubMed MeSH annotations to represent a quantitative measure of previously published results. Taken together, this study presents a methodology to investigate NVU developmental hazard using cell-based HTS assays and literature evidence to prioritize screening of putative NVU disruptors towards a knowledge-driven characterization of neurovascular developmental toxicity. The results from these screening efforts demonstrate that chemicals representing a range of putative vascular disrupting compound (pVDC) scores can also produce effects on neurogenic outcomes, and they characterize possible modes of action for disrupting the developing NVU.
7
Watford S, Edwards S, Angrish M, Judson RS, Paul Friedman K. Progress in data interoperability to support computational toxicology and chemical safety evaluation. Toxicol Appl Pharmacol 2019; 380:114707. [PMID: 31404555 PMCID: PMC7705611 DOI: 10.1016/j.taap.2019.114707] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 05/28/2019] [Revised: 07/29/2019] [Accepted: 08/06/2019] [Indexed: 12/20/2022]
Abstract
New approach methodologies (NAMs) in chemical safety evaluation are being explored to address the current public health implications of human environmental exposures to chemicals with limited or no data for assessment. For over a decade since a push toward "Toxicity Testing in the 21st Century," the field has focused on massive data generation efforts to inform computational approaches for preliminary hazard identification, adverse outcome pathways that link molecular initiating events and key events to apical outcomes, and high-throughput approaches to risk-based ratios of bioactivity and exposure to inform relative priority and safety assessment. Projects like the interagency Tox21 program and the US EPA ToxCast program have generated dose-response information on thousands of chemicals, identified and aggregated information from legacy systems, and created tools for access and analysis. The resulting information has been used to develop computational models as viable options for regulatory applications. This progress has introduced challenges in data management that are new, but not unique, to toxicology. Some of the key questions require critical thinking and solutions to promote semantic interoperability, including: (1) identification of bioactivity information from NAMs that might be related to a biological process; (2) identification of legacy hazard information that might be related to a key event or apical outcomes of interest; and, (3) integration of these NAM and traditional data for computational modeling and prediction of complex apical outcomes such as carcinogenesis. This work reviews a number of toxicology-related efforts specifically related to bioactivity and toxicological data interoperability based on the goals established by Findable, Accessible, Interoperable, and Reusable (FAIR) Data Principles. These efforts are essential to enable better integration of NAM and traditional toxicology information to support data-driven toxicology applications.
Affiliation(s)
- Sean Watford
  - Booz Allen Hamilton, Rockville, MD 20852, USA; National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, NC 27711, USA
- Stephen Edwards
  - Research Triangle Institute International, Research Triangle Park, NC 27709, USA
- Michelle Angrish
  - National Center for Environmental Assessment, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, NC 27711, USA
- Richard S Judson
  - National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, NC 27711, USA
- Katie Paul Friedman
  - National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, NC 27711, USA
8
Watford S, Ly Pham L, Wignall J, Shin R, Martin MT, Friedman KP. ToxRefDB version 2.0: Improved utility for predictive and retrospective toxicology analyses. Reprod Toxicol 2019; 89:145-158. [PMID: 31340180 PMCID: PMC6944327 DOI: 10.1016/j.reprotox.2019.07.012] [Citation(s) in RCA: 49] [Impact Index Per Article: 9.8] [Received: 02/27/2019] [Revised: 05/31/2019] [Accepted: 07/12/2019] [Indexed: 02/08/2023]
Abstract
The Toxicity Reference Database (ToxRefDB) structures information from over 5000 in vivo toxicity studies, conducted largely to guidelines or specifications from the US Environmental Protection Agency and the National Toxicology Program, into a public resource for training and validation of predictive models. Herein, ToxRefDB version 2.0 (ToxRefDBv2) development is described. Endpoints were annotated (e.g. required, not required) according to guidelines for subacute, subchronic, chronic, developmental, and multigenerational reproductive designs, distinguishing negative responses from untested. Quantitative data were extracted, and dose-response models for nearly 28,000 datasets from nearly 400 endpoints were generated using Benchmark Dose (BMD) Modeling Software and stored. Implementation of controlled vocabulary improved data quality; standardization to guideline requirements and cross-referencing with the Unified Medical Language System (UMLS) connect ToxRefDBv2 observations to vocabularies linked to UMLS, including PubMed medical subject headings. ToxRefDBv2 allows for increased connections to other resources and has greatly enhanced quantitative and qualitative utility for predictive toxicology.
Affiliation(s)
- Sean Watford
  - ORAU, Contractor to U.S. Environmental Protection Agency through the National Student Services Contract, United States; National Center for Computational Toxicology, Office of Research and Development, US Environmental Protection Agency, United States
- Ly Ly Pham
  - ORAU, Contractor to U.S. Environmental Protection Agency through the National Student Services Contract, United States; ORISE Postdoctoral Research Participant, United States
- Matthew T Martin
  - ORAU, Contractor to U.S. Environmental Protection Agency through the National Student Services Contract, United States; Currently at Drug Safety Research and Development, Global Investigative Toxicology, Pfizer, Groton, CT, United States
- Katie Paul Friedman
  - National Center for Computational Toxicology, Office of Research and Development, US Environmental Protection Agency, United States
9
Grashow RG, De La Rosa VY, Watford SM, Ackerman JM, Rudel RA. BCScreen: A gene panel to test for breast carcinogenesis in chemical safety screening. Comput Toxicol 2018; 5:16-24. [PMID: 31218268 PMCID: PMC6583811 DOI: 10.1016/j.comtox.2017.11.003] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Indexed: 12/15/2022]
Abstract
Targeted gene lists have been used in clinical settings to specify breast tumor type, and to predict breast cancer prognosis and response to treatment. Separately, panels have been curated to predict systemic toxicity and xenoestrogen activity as a part of chemical screening strategies. However, currently available panels do not specifically target biological processes relevant to breast development and carcinogenesis. We have developed a gene panel called the Breast Carcinogen Screen (BCScreen) as a tool to identify potential breast carcinogens and characterize mechanisms of toxicity. First, we used four seminal reviews to identify 14 key characteristics of breast carcinogenesis, such as apoptosis, immunomodulation, and genotoxicity. Then, using a hybrid data and knowledge-driven framework, we systematically combined information from whole transcriptome data from genomic databases, biomedical literature, the CTD chemical-gene interaction database, and primary literature review to generate a panel of 500 genes relevant to breast carcinogenesis. We used normalized pointwise mutual information (NPMI) to rank genes that frequently co-occurred with key characteristics in biomedical literature. We found that many genes identified for BCScreen were not included in prognostic breast cancer or systemic toxicity panels. For example, more than half of BCScreen genes were not included in the Tox21 S1500+ general toxicity gene list. Of the 230 that did overlap between the two panels, representation varied across characteristics of carcinogenesis ranging from 21% for genes associated with epigenetics to 82% for genes associated with xenobiotic metabolism. Enrichment analysis of BCScreen identified pathways and processes including response to steroid hormones, cancer, cell cycle, apoptosis, DNA damage and breast cancer. 
The biologically-based systematic approach to gene prioritization demonstrated here provides a flexible framework for creating disease-focused gene panels to support discovery related to etiology. With validation, BCScreen may also be useful for toxicological screening relevant to breast carcinogenesis.
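The abstract names normalized pointwise mutual information (NPMI) as the ranking measure but gives no formula. A minimal sketch using the standard definition, NPMI(x, y) = ln(p(x,y) / (p(x) p(y))) / (-ln p(x,y)), computed from document counts, is shown below. The function name, argument names, and counts are illustrative assumptions; the paper's actual counting scheme is not described here.

```python
import math

def npmi(n_xy, n_x, n_y, n_total):
    """Normalized pointwise mutual information from document counts.

    n_xy: documents mentioning both the gene and the characteristic
    n_x, n_y: documents mentioning each one individually
    n_total: total documents in the corpus
    Returns a value in [-1, 1]: -1 for no co-occurrence, 0 for
    independence, 1 for complete co-occurrence.
    """
    if n_xy == 0:
        return -1.0  # conventional value when the pair never co-occurs
    p_xy = n_xy / n_total
    if p_xy == 1.0:
        return 1.0   # limit value; avoids dividing by -ln(1) = 0
    p_x = n_x / n_total
    p_y = n_y / n_total
    pmi = math.log(p_xy / (p_x * p_y))
    return pmi / -math.log(p_xy)

# Made-up counts: a gene and a characteristic that always appear together.
score = npmi(n_xy=10, n_x=10, n_y=10, n_total=100)
```

Sorting genes by this score for each key characteristic would give a ranking of the kind the abstract describes, under the stated assumptions.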
Affiliation(s)
- Rachel G. Grashow
  - Silent Spring Institute, 320 Nevada Street, Newton, MA 02460, United States
- Vanessa Y. De La Rosa
  - Silent Spring Institute, 320 Nevada Street, Newton, MA 02460, United States
  - Social Science Environmental Health Research Institute, Northeastern University, Boston, MA, United States
- Sean M. Watford
  - Department of Environmental Sciences and Engineering, Gillings School of Global Public Health, UNC-Chapel Hill, Chapel Hill, NC, United States
- Janet M. Ackerman
  - Silent Spring Institute, 320 Nevada Street, Newton, MA 02460, United States
- Ruthann A. Rudel
  - Silent Spring Institute, 320 Nevada Street, Newton, MA 02460, United States