Foster C, Wignall J, Kovach S, Choksi N, Allen D, Trgovcich J, Rochester JR, Ceger P, Daniel A, Hamm J, Truax J, Blake B, McIntyre B, Sutherland V, Stout MD, Kleinstreuer N. Standardizing Extracted Data Using Automated Application of Controlled Vocabularies. ENVIRONMENTAL HEALTH PERSPECTIVES 2024; 132:27006. [PMID: 38349723 PMCID: PMC10863721 DOI: 10.1289/ehp13215] [Received: 04/24/2023] [Revised: 01/05/2024] [Accepted: 01/08/2024] [Indexed: 02/15/2024]
Abstract
BACKGROUND Extraction of toxicological end points from primary sources is a central component of systematic reviews and human health risk assessments. To ensure optimal use of these data, consistent language should be used for end point descriptions. However, primary source language describing treatment-related end points can vary greatly, resulting in large labor efforts to manually standardize extractions before data are fit for use.
OBJECTIVES To minimize these labor efforts, we applied an augmented intelligence approach and developed automated tools to support standardization of extracted information via application of preexisting controlled vocabularies.
METHODS We created and applied a harmonized controlled vocabulary crosswalk, consisting of Unified Medical Language System (UMLS) codes, German Federal Institute for Risk Assessment (BfR) DevTox harmonized terms, and the Organization for Economic Co-operation and Development (OECD) end point vocabularies, to roughly 34,000 extractions from prenatal developmental toxicology studies conducted by the National Toxicology Program (NTP) and 6,400 extractions from European Chemicals Agency (ECHA) prenatal developmental toxicology studies, all recorded based on the original study report language.
RESULTS We automatically applied standardized controlled vocabulary terms to 75% of the NTP extracted end points and 57% of the ECHA extracted end points. Of all the standardized extracted end points, about half (51%) required manual review for potential extraneous matches or inaccuracies. Extracted end points that were not mapped to standardized terms tended to be too general or required human logic to find a good match. We estimate that this augmented intelligence approach saved >350 hours of manual effort and yielded valuable resources including a controlled vocabulary crosswalk, organized related terms lists, code for implementing an automated mapping workflow, and a computationally accessible dataset.
DISCUSSION Augmenting manual efforts with automation tools increased the efficiency of producing a findable, accessible, interoperable, and reusable (FAIR) dataset of regulatory guideline studies. This open-source approach can be readily applied to other legacy developmental toxicology datasets, and the code design is customizable for other study types. https://doi.org/10.1289/EHP13215.
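The general idea described in METHODS and RESULTS, automatically matching free-text end point extractions against a controlled vocabulary and routing uncertain matches to manual review, might be sketched as follows. This is only an illustration under stated assumptions, not the authors' published code: the crosswalk entries, the `TERM-00x` identifiers (placeholders, not real UMLS, BfR DevTox, or OECD codes), the function names, and the exact-match-then-phrase-match rule are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class VocabTerm:
    code: str        # placeholder identifier, NOT a real vocabulary code
    label: str       # standardized term
    synonyms: tuple  # related terms that should map to the same code


# Hypothetical three-entry crosswalk, for illustration only.
CROSSWALK = (
    VocabTerm("TERM-001", "cleft palate", ("palate cleft", "palatoschisis")),
    VocabTerm("TERM-002", "fetal body weight", ("fetal weight",)),
    VocabTerm("TERM-003", "rib, wavy", ("wavy rib", "wavy ribs")),
)


def normalize(text):
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return " ".join(text.split())


def build_index(crosswalk):
    """Map every normalized label or synonym to the set of codes it names."""
    index = {}
    for term in crosswalk:
        for phrase in (term.label, *term.synonyms):
            index.setdefault(normalize(phrase), set()).add(term.code)
    return index


def map_endpoint(extracted, index):
    """Return (codes, status) for one extracted end point.

    status is "auto" for an exact match on the normalized text, "review"
    when a vocabulary phrase only appears inside a longer extraction, and
    "unmapped" when nothing matches.
    """
    text = normalize(extracted)
    if text in index:
        return sorted(index[text]), "auto"
    padded = f" {text} "  # pad so phrases match on whole-word boundaries
    hits = set()
    for phrase, codes in index.items():
        if f" {phrase} " in padded:
            hits |= codes
    if hits:
        return sorted(hits), "review"
    return [], "unmapped"


index = build_index(CROSSWALK)
for raw in ("Cleft palate", "Wavy ribs (bilateral)", "General observations"):
    print(raw, "->", map_endpoint(raw, index))
```

Separating the "auto" and "review" outcomes mirrors the abstract's distinction between end points standardized automatically and those flagged for manual review; a real workflow would likely need richer matching than whole-phrase containment.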
Affiliation(s)
- Neepa Choksi
  ILS, Research Triangle Park, North Carolina, USA
- Dave Allen
  ILS, Research Triangle Park, North Carolina, USA
- Amber Daniel
  ILS, Research Triangle Park, North Carolina, USA
- Jon Hamm
  ILS, Research Triangle Park, North Carolina, USA
- Jim Truax
  ILS, Research Triangle Park, North Carolina, USA
- Bevin Blake
  Division of Translational Toxicology (DTT), NIEHS, NIH, Research Triangle Park, North Carolina, USA
- Barry McIntyre
  Division of Translational Toxicology (DTT), NIEHS, NIH, Research Triangle Park, North Carolina, USA
- Vicki Sutherland
  Division of Translational Toxicology (DTT), NIEHS, NIH, Research Triangle Park, North Carolina, USA
- Matthew D. Stout
  Division of Translational Toxicology (DTT), NIEHS, NIH, Research Triangle Park, North Carolina, USA
- Nicole Kleinstreuer
  National Toxicology Program Interagency Center for the Evaluation of Alternative Toxicological Methods (NICEATM), DTT, NIEHS, NIH, Research Triangle Park, North Carolina, USA