1
Roberts MC, Holt KE, Del Fiol G, Baccarelli AA, Allen CG. Precision public health in the era of genomics and big data. Nat Med 2024; 30:1865-1873. [PMID: 38992127 DOI: 10.1038/s41591-024-03098-0] [Received: 03/18/2024] [Accepted: 05/29/2024] [Indexed: 07/13/2024]
Abstract
Precision public health (PPH) considers the interplay between genetics, lifestyle and the environment to improve disease prevention, diagnosis and treatment on a population level, thereby delivering the right interventions to the right populations at the right time. In this Review, we explore the concept of PPH as the next generation of public health. We discuss the historical context of using individual-level data in public health interventions and examine recent advancements in how data from human and pathogen genomics and social, behavioral and environmental research, as well as artificial intelligence, have transformed public health. Real-world examples of PPH are discussed, emphasizing how these approaches are becoming a mainstay in public health, as well as outstanding challenges in their development, implementation and sustainability. Data sciences, ethical, legal and social implications research, capacity building, equity research and implementation science will have a crucial role in realizing the potential for 'precision' to enhance traditional public health approaches.
Affiliation(s)
- Megan C Roberts
  - Division of Pharmaceutical Outcomes and Policy, University of North Carolina Eshelman School of Pharmacy, Chapel Hill, NC, USA
- Kathryn E Holt
  - Department of Infection Biology, London School of Hygiene & Tropical Medicine, London, UK
  - Department of Infectious Diseases, School of Translational Medicine, Monash University, Melbourne, Victoria, Australia
- Guilherme Del Fiol
  - Biomedical Informatics, School of Medicine, University of Utah, Salt Lake City, UT, USA
- Andrea A Baccarelli
  - Department of Environmental Health, Harvard T. H. Chan School of Public Health, Boston, MA, USA
- Caitlin G Allen
  - Department of Public Health Sciences, College of Medicine, Medical University of South Carolina, Charleston, SC, USA
2
Mahbub M, Goethert I, Danciu I, Knight K, Srinivasan S, Tamang S, Rozenberg-Ben-Dror K, Solares H, Martins S, Trafton J, Begoli E, Peterson GD. Question-answering system extracts information on injection drug use from clinical notes. Communications Medicine 2024; 4:61. [PMID: 38570620 PMCID: PMC10991373 DOI: 10.1038/s43856-024-00470-6] [Received: 05/30/2023] [Accepted: 02/29/2024] [Indexed: 04/05/2024]
Abstract
BACKGROUND Injection drug use (IDU) can increase mortality and morbidity. Therefore, identifying IDU early and initiating harm reduction interventions can benefit individuals at risk. However, extracting IDU behaviors from patients' electronic health records (EHR) is difficult because there is no other structured data available, such as International Classification of Disease (ICD) codes, and IDU is most often documented in unstructured free-text clinical notes. Although natural language processing can efficiently extract this information from unstructured data, there are no validated tools. METHODS To address this gap in clinical information, we design a question-answering (QA) framework to extract information on IDU from clinical notes for use in clinical operations. Our framework involves two main steps: (1) generating a gold-standard QA dataset and (2) developing and testing the QA model. We use 2323 clinical notes of 1145 patients curated from the US Department of Veterans Affairs (VA) Corporate Data Warehouse to construct the gold-standard dataset for developing and evaluating the QA model. We also demonstrate the QA model's ability to extract IDU-related information from temporally out-of-distribution data. RESULTS Here, we show that for a strict match between gold-standard and predicted answers, the QA model achieves a 51.65% F1 score. For a relaxed match between the gold-standard and predicted answers, the QA model obtains a 78.03% F1 score, along with 85.38% Precision and 79.02% Recall scores. Moreover, the QA model demonstrates consistent performance when subjected to temporally out-of-distribution data. CONCLUSIONS Our study introduces a QA framework designed to extract IDU information from clinical notes, aiming to enhance the accurate and efficient detection of people who inject drugs, extract relevant information, and ultimately facilitate informed patient care.
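The gap between the strict-match and relaxed-match scores above can be illustrated with a small sketch. The abstract does not define "relaxed match", so the code below assumes it means token overlap between gold and predicted answers (a common convention in extractive QA evaluation); the answer pairs and function names are hypothetical.

```python
from collections import Counter

def strict_match(gold: str, pred: str) -> float:
    """Return 1.0 only when the normalized answers are identical."""
    return float(gold.lower().split() == pred.lower().split())

def token_prf(gold: str, pred: str) -> tuple[float, float, float]:
    """Token-overlap precision, recall and F1 for one answer pair."""
    gold_tokens = gold.lower().split()
    pred_tokens = pred.lower().split()
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(gold_tokens) & Counter(pred_tokens)).values())
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return precision, recall, 2 * precision * recall / (precision + recall)

# Hypothetical (gold, predicted) answer pairs, not taken from the study.
pairs = [
    ("injects heroin daily", "injects heroin daily"),
    ("last used IV drugs in 2019", "IV drugs in 2019"),
    ("denies injection drug use", "shares needles"),
]

mean_strict = sum(strict_match(g, p) for g, p in pairs) / len(pairs)
relaxed = [token_prf(g, p) for g, p in pairs]
mean_precision = sum(r[0] for r in relaxed) / len(pairs)
mean_recall = sum(r[1] for r in relaxed) / len(pairs)
mean_f1 = sum(r[2] for r in relaxed) / len(pairs)

print(f"strict={mean_strict:.3f} relaxed F1={mean_f1:.3f} "
      f"P={mean_precision:.3f} R={mean_recall:.3f}")
```

When scores are averaged per answer pair, as in this sketch, the mean F1 need not equal the harmonic mean of the mean precision and mean recall. That may be why a reported F1 can sit below both of the other two figures, although the abstract does not say how the scores were aggregated.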
Affiliation(s)
- Maria Mahbub
  - Cyber Resilience and Intelligence Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA
- Ian Goethert
  - Information Technology Services Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA
- Ioana Danciu
  - Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA
  - Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA
- Kathryn Knight
  - Information Technology Services Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA
- Sudarshan Srinivasan
  - Cyber Resilience and Intelligence Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA
- Suzanne Tamang
  - Program Evaluation and Resource Center, Office of Mental Health and Suicide Prevention, Department of Veterans Affairs, Menlo Park, CA, USA
  - Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Hugo Solares
  - Program Evaluation and Resource Center, Office of Mental Health and Suicide Prevention, Department of Veterans Affairs, Menlo Park, CA, USA
- Susana Martins
  - Program Evaluation and Resource Center, Office of Mental Health and Suicide Prevention, Department of Veterans Affairs, Menlo Park, CA, USA
- Jodie Trafton
  - Program Evaluation and Resource Center, Office of Mental Health and Suicide Prevention, Department of Veterans Affairs, Menlo Park, CA, USA
- Edmon Begoli
  - Cyber Resilience and Intelligence Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA
- Gregory D Peterson
  - Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, Knoxville, TN, USA
3
Ezell JM, Ajayi BP, Parikh T, Miller K, Rains A, Scales D. Drug Use and Artificial Intelligence: Weighing Concerns and Possibilities for Prevention. Am J Prev Med 2024; 66:568-572. [PMID: 38056683 DOI: 10.1016/j.amepre.2023.11.024] [Received: 09/29/2023] [Revised: 11/30/2023] [Accepted: 11/30/2023] [Indexed: 12/08/2023]
Affiliation(s)
- Jerel M Ezell
  - Community Health Sciences, School of Public Health, University of California Berkeley, Berkeley, California; Berkeley Center for Cultural Humility, University of California Berkeley, Berkeley, California
- Babatunde Patrick Ajayi
  - Community Health Sciences, School of Public Health, University of California Berkeley, Berkeley, California
- Tapan Parikh
  - Information Science, The College of Arts & Sciences, Cornell University, New York, New York
- Kyle Miller
  - Department of Medicine, Southern Illinois University, Carbondale, Illinois
- Alex Rains
  - Pritzker School of Medicine, The University of Chicago, Chicago, Illinois
- David Scales
  - Division of General Internal Medicine, Joan and Sanford I. Weill Department of Medicine, Weill Cornell Medicine, New York, New York
4
Allen B, Schell RC, Jent VA, Krieger M, Pratty C, Hallowell BD, Goedel WC, Basta M, Yedinak JL, Li Y, Cartus AR, Marshall BDL, Cerdá M, Ahern J, Neill DB. PROVIDENT: Development and Validation of a Machine Learning Model to Predict Neighborhood-level Overdose Risk in Rhode Island. Epidemiology 2024; 35:232-240. [PMID: 38180881 PMCID: PMC10842082 DOI: 10.1097/ede.0000000000001695] [Indexed: 01/07/2024]
Abstract
BACKGROUND Drug overdose persists as a leading cause of death in the United States, but resources to address it remain limited. As a result, health authorities must consider where to allocate scarce resources within their jurisdictions. Machine learning offers a strategy to identify areas with increased future overdose risk to proactively allocate overdose prevention resources. This modeling study is embedded in a randomized trial to measure the effect of proactive resource allocation on statewide overdose rates in Rhode Island (RI). METHODS We used statewide data from RI from 2016 to 2020 to develop an ensemble machine learning model predicting neighborhood-level fatal overdose risk. Our ensemble model integrated gradient boosting machine and super learner base models in a moving window framework to make predictions in 6-month intervals. Our performance target, developed a priori with the RI Department of Health, was to identify the 20% of RI neighborhoods containing at least 40% of statewide overdose deaths, including at least one neighborhood per municipality. The model was validated after trial launch. RESULTS Our model selected priority neighborhoods capturing 40.2% of statewide overdose deaths during the test periods and 44.1% of statewide overdose deaths during validation periods. Our ensemble outperformed the base models during the test periods and performed comparably to the best-performing base model during the validation periods. CONCLUSIONS We demonstrated the capacity for machine learning models to predict neighborhood-level fatal overdose risk to a degree of accuracy suitable for practitioners. Jurisdictions may consider predictive modeling as a tool to guide allocation of scarce resources.
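The performance target described above (the 20% of neighborhoods containing at least 40% of deaths, with at least one neighborhood per municipality) can be sketched as a constrained ranking. This is a minimal illustration under stated assumptions: the abstract does not describe how the per-municipality constraint was enforced, so the two-pass rule, the function name `select_priority`, and all data below are hypothetical.

```python
def select_priority(neighborhoods, share_pct=20):
    """Pick share_pct% of neighborhoods by predicted risk, seeding the
    list with the highest-risk neighborhood of every municipality."""
    budget = len(neighborhoods) * share_pct // 100
    ranked = sorted(neighborhoods, key=lambda n: n["risk"], reverse=True)

    chosen, chosen_ids, covered = [], set(), set()
    # First pass: the top-ranked neighborhood in each municipality.
    for n in ranked:
        if n["town"] not in covered:
            chosen.append(n)
            chosen_ids.add(n["id"])
            covered.add(n["town"])
    if len(chosen) > budget:
        raise ValueError("budget too small to cover every municipality")
    # Second pass: fill the remaining slots strictly by predicted risk.
    for n in ranked:
        if len(chosen) >= budget:
            break
        if n["id"] not in chosen_ids:
            chosen.append(n)
            chosen_ids.add(n["id"])
    return chosen

# Hypothetical rows: (id, municipality, predicted risk, observed deaths).
rows = [
    ("A1", "A", 0.90, 6), ("A2", "A", 0.80, 5), ("A3", "A", 0.50, 2),
    ("A4", "A", 0.40, 1), ("A5", "A", 0.30, 1), ("A6", "A", 0.20, 0),
    ("A7", "A", 0.10, 0), ("A8", "A", 0.05, 1),
    ("B1", "B", 0.35, 2), ("B2", "B", 0.25, 1), ("B3", "B", 0.15, 0),
    ("B4", "B", 0.12, 1), ("B5", "B", 0.08, 0), ("B6", "B", 0.06, 0),
    ("B7", "B", 0.02, 0),
]
neighborhoods = [dict(zip(("id", "town", "risk", "deaths"), r)) for r in rows]

priority = select_priority(neighborhoods)
captured = (sum(n["deaths"] for n in priority)
            / sum(n["deaths"] for n in neighborhoods))
print([n["id"] for n in priority], f"captured share = {captured:.2f}")
```

In this toy data the coverage constraint pulls in B1 ahead of higher-risk neighborhoods in municipality A, which shows the trade-off such a constraint introduces between geographic coverage and the share of deaths captured.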
Affiliation(s)
- Bennett Allen
  - Center for Opioid Epidemiology and Policy, Department of Population Health, Grossman School of Medicine, New York University, New York, NY, USA
- Robert C Schell
  - Division of Health Policy and Management, School of Public Health, University of California, Berkeley, Berkeley, CA, USA
- Victoria A Jent
  - Center for Opioid Epidemiology and Policy, Department of Population Health, Grossman School of Medicine, New York University, New York, NY, USA
- Maxwell Krieger
  - Department of Epidemiology, School of Public Health, Brown University, Providence, RI, USA
- Claire Pratty
  - Department of Epidemiology, School of Public Health, Brown University, Providence, RI, USA
- Benjamin D Hallowell
  - Center for Health Data and Analysis, Rhode Island Department of Health, Providence, RI, USA
- William C Goedel
  - Department of Epidemiology, School of Public Health, Brown University, Providence, RI, USA
- Melissa Basta
  - Center for Health Data and Analysis, Rhode Island Department of Health, Providence, RI, USA
- Jesse L Yedinak
  - Department of Epidemiology, School of Public Health, Brown University, Providence, RI, USA
- Yu Li
  - Department of Epidemiology, School of Public Health, Brown University, Providence, RI, USA
- Abigail R Cartus
  - Department of Epidemiology, School of Public Health, Brown University, Providence, RI, USA
- Brandon D L Marshall
  - Department of Epidemiology, School of Public Health, Brown University, Providence, RI, USA
- Magdalena Cerdá
  - Center for Opioid Epidemiology and Policy, Department of Population Health, Grossman School of Medicine, New York University, New York, NY, USA
- Jennifer Ahern
  - Division of Epidemiology, School of Public Health, University of California, Berkeley, CA, USA
- Daniel B Neill
  - Center for Urban Science and Progress, New York University, New York, NY, USA
  - Department of Computer Science, Courant Institute for Mathematical Sciences, New York University, New York, NY, USA
  - Robert F. Wagner Graduate School of Public Service, New York University, New York, NY, USA
5
Graham SS, Shifflet S, Amjad M, Claborn K. An interpretable machine learning framework for opioid overdose surveillance from emergency medical services records. PLoS One 2024; 19:e0292170. [PMID: 38289927 PMCID: PMC10826931 DOI: 10.1371/journal.pone.0292170] [Received: 01/28/2023] [Accepted: 09/14/2023] [Indexed: 02/01/2024]
Abstract
The goal of this study is to develop and validate a lightweight, interpretable machine learning (ML) classifier to identify opioid overdoses in emergency medical services (EMS) records. We conducted a comparative assessment of three feature engineering approaches designed for use with unstructured narrative data. Opioid overdose annotations were provided by two harm reduction paramedics and two supporting annotators trained to reliably match expert annotations. Candidate feature engineering techniques included term frequency-inverse document frequency (TF-IDF), a highly performant approach to concept vectorization, and a custom approach based on the count of empirically identified keywords. Each feature set was trained using four model architectures: generalized linear model (GLM), Naïve Bayes, neural network, and Extreme Gradient Boosting (XGBoost). Ensembles of trained models were also evaluated. The custom feature models were also assessed for variable importance to aid interpretation. Models trained using TF-IDF feature engineering ranged from AUROC = 0.59 (95% CI: 0.53-0.66) for the Naïve Bayes to AUROC = 0.76 (95% CI: 0.71-0.81) for the neural network. Models trained using concept vectorization features ranged from AUROC = 0.83 (95% CI: 0.78-0.88) for the Naïve Bayes to AUROC = 0.89 (95% CI: 0.85-0.94) for the ensemble. Models trained using custom features were the most performant, with benchmarks ranging from AUROC = 0.92 (95% CI: 0.88-0.95) with the GLM to 0.93 (95% CI: 0.90-0.96) for the ensemble. The custom features model achieved positive predictive values (PPV) ranging from 80 to 100%, which represent substantial improvements over previously published EMS encounter opioid overdose classifiers. The application of this approach to county EMS data can productively inform local and targeted harm reduction initiatives.
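As a rough illustration of the keyword-count idea and of the AUROC metric reported above, the sketch below scores each narrative by the number of keyword hits and computes AUROC as the probability that a randomly chosen positive record outscores a randomly chosen negative one. The keyword list and the EMS narratives are invented for illustration; the study's empirically identified keywords and its trained models are not reproduced here.

```python
import re

# Hypothetical keyword list; the study's actual keywords are not given
# in the abstract.
KEYWORDS = ("naloxone", "narcan", "opioid", "heroin", "pinpoint")

def keyword_count(narrative: str) -> int:
    """Count keyword occurrences in a lowercased, tokenized narrative."""
    tokens = re.findall(r"[a-z]+", narrative.lower())
    return sum(tokens.count(k) for k in KEYWORDS)

def auroc(scores, labels):
    """AUROC as P(score of a positive > score of a negative), ties = 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Invented narratives with labels (1 = opioid overdose, 0 = other call).
records = [
    ("Pt unresponsive, pinpoint pupils, narcan given with good response", 1),
    ("Heroin use reported by bystander, naloxone administered", 1),
    ("Found down, respiratory depression, bystander CPR", 1),
    ("Fall from ladder, left wrist pain", 0),
    ("Chest pain radiating to arm, no opioid use", 0),
    ("Seizure, history of epilepsy", 0),
]

scores = [keyword_count(text) for text, _ in records]
labels = [label for _, label in records]
print(scores, f"AUROC = {auroc(scores, labels):.3f}")
```

The fifth record ("no opioid use") still earns a keyword hit, which shows why raw counts usually need negation handling or a trained model on top before they are used for surveillance.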
Affiliation(s)
- S. Scott Graham
  - Department of Rhetoric & Writing, Center for Health Communication, University of Texas at Austin, Austin, TX, United States of America
  - Addiction Research Institute, University of Texas at Austin, Austin, TX, United States of America
- Savannah Shifflet
  - Addiction Research Institute, University of Texas at Austin, Austin, TX, United States of America
- Maaz Amjad
  - Addiction Research Institute, University of Texas at Austin, Austin, TX, United States of America
- Kasey Claborn
  - Addiction Research Institute, University of Texas at Austin, Austin, TX, United States of America
  - Steve Hicks School of Social Work, University of Texas at Austin, Austin, TX, United States of America
6
Driscoll DL, O'Donnell H, Patel M, Cattell-Gordon DC. Assessing and Addressing the Determinants of Appalachian Population Health: A Scoping Review. Journal of Appalachian Health 2023; 5:85-102. [PMID: 38784141 PMCID: PMC11110904 DOI: 10.13023/jah.0503.07] [Indexed: 05/25/2024]
Abstract
Introduction Residents of Appalachia experience elevated rates of morbidity and mortality compared to national averages, and these disparities are associated with inequitable exposures to various determinants of population health. Social and environmental determinants of health are a useful lens through which to develop and evaluate programs to mitigate regional health disparities. Methods This 2023 scoping review examined studies linking determinants of Appalachian health with leading causes of regional mortality and morbidity. The search strategy employed a keyword search that included geographic terms for the Appalachian Region and the primary adverse health outcomes in that region. Studies meeting the following inclusion criteria were reviewed: original article, published in the last five years, involving an Appalachian population, and including a rigorous assessment of an association between a population health determinant and one or more leading causes of Appalachian morbidity and mortality. Results The search returned 221 research articles, including 30 interventional studies. The top three health outcomes were cancer (43.59%), diseases of despair (23.08%), and diabetes (12.82%). Access to care (27.3%), rurality (18.9%), and education (14.8%) were the most common population health determinants identified. Interventional studies were categorized by program types: education, technology, partnerships, and multilevel interventions. Due to the heterogeneity of study types, the studies were combined using a narrative synthesis. Implications The results of this work can inform the development and evaluation of additional programs to promote Appalachian population health. Our study team will use these results to inform community-based discussions that develop strategic plans to mitigate health disparities in Central and Southcentral Appalachian Virginia.
Affiliation(s)
- David C Cattell-Gordon
  - Center for Telehealth at the University of Virginia (retired); ThreadEx Consulting LLC (current)
7
Tang LA, Korona-Bailey J, Zaras D, Roberts A, Mukhopadhyay S, Espy S, Walsh CG. Using Natural Language Processing to Predict Fatal Drug Overdose From Autopsy Narrative Text: Algorithm Development and Validation Study. JMIR Public Health Surveill 2023; 9:e45246. [PMID: 37204824 PMCID: PMC10238956 DOI: 10.2196/45246] [Received: 12/21/2022] [Revised: 02/17/2023] [Accepted: 03/07/2023] [Indexed: 03/09/2023]
Abstract
BACKGROUND Fatal drug overdose surveillance informs prevention but is often delayed because of autopsy report processing and death certificate coding. Autopsy reports contain narrative text describing scene evidence and medical history (similar to preliminary death scene investigation reports) and may serve as early data sources for identifying fatal drug overdoses. To facilitate timely fatal overdose reporting, natural language processing was applied to narrative texts from autopsies. OBJECTIVE This study aimed to develop a natural language processing-based model that predicts the likelihood that an autopsy report narrative describes an accidental or undetermined fatal drug overdose. METHODS Autopsy reports of all manners of death (2019-2021) were obtained from the Tennessee Office of the State Chief Medical Examiner. The text was extracted from autopsy reports (PDFs) using optical character recognition. Three common narrative text sections were identified, concatenated, and preprocessed (bag-of-words) using term frequency-inverse document frequency scoring. Logistic regression, support vector machine (SVM), random forest, and gradient boosted tree classifiers were developed and validated. Models were trained and calibrated using autopsies from 2019 to 2020 and tested using those from 2021. Model discrimination was evaluated using the area under the receiver operating characteristic, precision, recall, F1-score, and F2-score (prioritizes recall over precision). Calibration was performed using logistic regression (Platt scaling) and evaluated using the Spiegelhalter z test. Shapley additive explanations values were generated for models compatible with this method. In a post hoc subgroup analysis of the random forest classifier, model discrimination was evaluated by forensic center, race, age, sex, and education level. RESULTS A total of 17,342 autopsies (n=5934, 34.22% cases) were used for model development and validation. 
The training set included 10,215 autopsies (n=3342, 32.72% cases), the calibration set included 538 autopsies (n=183, 34.01% cases), and the test set included 6589 autopsies (n=2409, 36.56% cases). The vocabulary set contained 4002 terms. All models showed excellent performance (area under the receiver operating characteristic curve ≥0.95, precision ≥0.94, recall ≥0.92, F1-score ≥0.94, and F2-score ≥0.92). The SVM and random forest classifiers achieved the highest F2-scores (0.948 and 0.947, respectively). The logistic regression and random forest were calibrated (P=.95 and P=.85, respectively), whereas the SVM and gradient boosted tree classifiers were miscalibrated (P=.03 and P<.001, respectively). "Fentanyl" and "accident" had the highest Shapley additive explanations values. Post hoc subgroup analyses revealed lower F2-scores for autopsies from forensic centers D and E. Lower F2-scores were observed for the American Indian, Asian, ≤14 years, and ≥65 years subgroups, but larger sample sizes are needed to validate these findings. CONCLUSIONS The random forest classifier may be suitable for identifying potential accidental and undetermined fatal overdose autopsies. Further validation studies should be conducted to ensure early detection of accidental and undetermined fatal drug overdoses across all subgroups.
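The F2-score mentioned above is the beta = 2 case of the general F-beta score, which weights recall more heavily than precision as beta grows. The sketch below uses the standard formula; the precision and recall values are hypothetical and are not taken from the study.

```python
def f_beta(precision: float, recall: float, beta: float) -> float:
    """F-beta = (1 + beta^2) * P * R / (beta^2 * P + R)."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Hypothetical classifier A: higher precision than recall.
f1_a = f_beta(0.90, 0.80, beta=1)
f2_a = f_beta(0.90, 0.80, beta=2)

# Hypothetical classifier B: the same two numbers, swapped.
f1_b = f_beta(0.80, 0.90, beta=1)
f2_b = f_beta(0.80, 0.90, beta=2)

print(f"A: F1={f1_a:.3f} F2={f2_a:.3f}")
print(f"B: F1={f1_b:.3f} F2={f2_b:.3f}")
```

Both classifiers have the same F1, but B scores higher on F2 because its recall is higher. That is the behavior wanted when missing a true overdose case is costlier than reviewing a false alarm.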
Affiliation(s)
- Leigh Anne Tang
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, United States
  - Office of Informatics and Analytics, Tennessee Department of Health, Nashville, TN, United States
- Jessica Korona-Bailey
  - Office of Informatics and Analytics, Tennessee Department of Health, Nashville, TN, United States
- Dimitrios Zaras
  - Office of Informatics and Analytics, Tennessee Department of Health, Nashville, TN, United States
- Allison Roberts
  - Office of Informatics and Analytics, Tennessee Department of Health, Nashville, TN, United States
- Sutapa Mukhopadhyay
  - Office of Informatics and Analytics, Tennessee Department of Health, Nashville, TN, United States
- Stephen Espy
  - Office of Informatics and Analytics, Tennessee Department of Health, Nashville, TN, United States
- Colin G Walsh
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, United States
8
Surveillance of communicable diseases using social media: A systematic review. PLoS One 2023; 18:e0282101. [PMID: 36827297 PMCID: PMC9956027 DOI: 10.1371/journal.pone.0282101] [Received: 04/04/2022] [Accepted: 02/07/2023] [Indexed: 02/25/2023]
Abstract
BACKGROUND Communicable diseases pose a severe threat to public health and economic growth. The traditional methods that are used for public health surveillance, however, involve many drawbacks, such as being labor intensive to operate and resulting in a lag between data collection and reporting. To effectively address the limitations of these traditional methods and to mitigate the adverse effects of these diseases, a proactive and real-time public health surveillance system is needed. Previous studies have indicated the usefulness of performing text mining on social media. OBJECTIVE To conduct a systematic review of the literature that used textual content published to social media for the purpose of the surveillance and prediction of communicable diseases. METHODOLOGY Broad search queries were formulated and performed in four databases. Both journal articles and conference materials were included. The quality of the studies, operationalized as reliability and validity, was assessed. This qualitative systematic review was guided by the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. RESULTS Twenty-three publications were included in this systematic review. All studies reported positive results for using textual social media content to surveil communicable diseases. Most studies used Twitter as a source for these data. Influenza was studied most frequently, while other communicable diseases received far less attention. Journal articles had a higher quality (reliability and validity) than conference papers. However, studies often failed to provide important information about procedures and implementation. CONCLUSION Text mining of health-related content published on social media can serve as a novel and powerful tool for the automated, real-time, and remote monitoring of public health and for the surveillance and prediction of communicable diseases in particular. 
This tool can address limitations related to traditional surveillance methods, and it has the potential to supplement traditional methods for public health surveillance.
9
Carpenter KA, Altman RB. Using GPT-3 to Build a Lexicon of Drugs of Abuse Synonyms for Social Media Pharmacovigilance. Biomolecules 2023; 13:387. [PMID: 36830756 PMCID: PMC9953178 DOI: 10.3390/biom13020387] [Received: 01/11/2023] [Revised: 02/09/2023] [Accepted: 02/16/2023] [Indexed: 02/22/2023]
Abstract
Drug abuse is a serious problem in the United States, with over 90,000 drug overdose deaths nationally in 2020. A key step in combating drug abuse is detecting, monitoring, and characterizing its trends over time and location, also known as pharmacovigilance. While federal reporting systems accomplish this to a degree, they often have high latency and incomplete coverage. Social-media-based pharmacovigilance has zero latency, is easily accessible and unfiltered, and benefits from drug users being willing to share their experiences online pseudo-anonymously. However, unlike highly structured official data sources, social media text is rife with misspellings and slang, making automated analysis difficult. Generative Pretrained Transformer 3 (GPT-3) is a large autoregressive language model specialized for few-shot learning that was trained on text from the entire internet. We demonstrate that GPT-3 can be used to generate slang and common misspellings of terms for drugs of abuse. We repeatedly queried GPT-3 for synonyms of drugs of abuse and filtered the generated terms using automated Google searches and cross-references to known drug names. When generated terms for alprazolam were manually labeled, we found that our method produced 269 synonyms for alprazolam, 221 of which were new discoveries not included in an existing drug lexicon for social media. We repeated this process for 98 drugs of abuse, of which 22 are widely-discussed drugs of abuse, building a lexicon of colloquial drug synonyms that can be used for pharmacovigilance on social media.
Affiliation(s)
- Kristy A. Carpenter
  - Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA
- Russ B. Altman
  - Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA
  - Departments of Bioengineering, Genetics, and Medicine, Stanford University, Stanford, CA 94305, USA
10
Ward PJ, Young AM, Slavova S, Liford M, Daniels L, Lucas R, Kavuluru R. Deep Neural Networks for Fine-Grained Surveillance of Overdose Mortality. Am J Epidemiol 2023; 192:257-266. [PMID: 36222700 DOI: 10.1093/aje/kwac180] [Received: 07/30/2021] [Revised: 08/16/2022] [Accepted: 10/10/2022] [Indexed: 02/07/2023]
Abstract
Surveillance of drug overdose deaths relies on death certificates for identification of the substances that caused death. Drugs and drug classes can be identified through the International Classification of Diseases, Tenth Revision (ICD-10), codes present on death certificates. However, ICD-10 codes do not always provide high levels of specificity in drug identification. To achieve more fine-grained identification of substances on death certificates, the free-text cause-of-death section, completed by the medical certifier, must be analyzed. Current methods for analyzing free-text death certificates rely solely on lookup tables for identifying specific substances, which must be frequently updated and maintained. To improve identification of drugs on death certificates, a deep-learning named-entity recognition model was developed, utilizing data from the Kentucky Drug Overdose Fatality Surveillance System (2014-2019), which achieved an F1-score of 99.13%. This model can identify new drug misspellings and novel substances that are not present on current surveillance lookup tables, enhancing the surveillance of drug overdose deaths.
11
Fisher S, Rosella LC. Priorities for successful use of artificial intelligence by public health organizations: a literature review. BMC Public Health 2022; 22:2146. [PMID: 36419010 PMCID: PMC9682716 DOI: 10.1186/s12889-022-14422-z] [Received: 09/30/2022] [Accepted: 10/23/2022] [Indexed: 11/24/2022]
Abstract
Artificial intelligence (AI) has the potential to improve public health's ability to promote the health of all people in all communities. To successfully realize this potential and use AI for public health functions, it is important for public health organizations to thoughtfully develop strategies for AI implementation. Six key priorities for successful use of AI technologies by public health organizations are discussed: 1) Contemporary data governance; 2) Investment in modernized data and analytic infrastructure and procedures; 3) Addressing the skills gap in the workforce; 4) Development of strategic collaborative partnerships; 5) Use of good AI practices for transparency and reproducibility; and 6) Explicit consideration of equity and bias.
Affiliation(s)
- Stacey Fisher
  - Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada
  - Public Health Ontario, Toronto, ON, Canada
  - Vector Institute for Artificial Intelligence, Toronto, ON, Canada
  - ICES, Toronto, ON, Canada
- Laura C. Rosella
  - Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada
  - Vector Institute for Artificial Intelligence, Toronto, ON, Canada
  - ICES, Toronto, ON, Canada
  - Institute for Better Health, Trillium Health Partners, Mississauga, ON, Canada
  - Department of Laboratory Medicine and Pathobiology, Temerty Faculty of Medicine, University of Toronto, Toronto, ON, Canada
12
Goodman-Meza D, Tang A, Aryanfar B, Vazquez S, Gordon AJ, Goto M, Goetz MB, Shoptaw S, Bui AAT. Natural Language Processing and Machine Learning to Identify People Who Inject Drugs in Electronic Health Records. Open Forum Infect Dis 2022; 9:ofac471. [PMID: 36168546 PMCID: PMC9511274 DOI: 10.1093/ofid/ofac471] [Received: 08/04/2022] [Accepted: 09/08/2022] [Indexed: 11/15/2022]
Abstract
Background Improving the identification of people who inject drugs (PWID) in electronic medical records can improve clinical decision making, risk assessment and mitigation, and health service research. Identification of PWID currently consists of heterogeneous, nonspecific International Classification of Diseases (ICD) codes as proxies. Natural language processing (NLP) and machine learning (ML) methods may have better diagnostic metrics than nonspecific ICD codes for identifying PWID. Methods We manually reviewed 1000 records of patients diagnosed with Staphylococcus aureus bacteremia admitted to Veterans Health Administration hospitals from 2003 through 2014. The manual review was the reference standard. We developed and trained NLP/ML algorithms with and without regular expression filters for negation (NegEx) and compared these with 11 proxy combinations of ICD codes to identify PWID. Data were split 70% for training and 30% for testing. We calculated diagnostic metrics and estimated 95% confidence intervals (CIs) by bootstrapping the hold-out test set. Best models were determined by best F-score, a summary of sensitivity and positive predictive value. Results Random forest with and without NegEx were the best-performing NLP/ML algorithms in the training set. Random forest with NegEx outperformed all ICD-based algorithms. F-score for the best NLP/ML algorithm was 0.905 (95% CI, .786-.967) and 0.592 (95% CI, .550-.632) for the best ICD-based algorithm. The NLP/ML algorithm had a sensitivity of 92.6% and specificity of 95.4%. Conclusions NLP/ML outperformed ICD-based coding algorithms at identifying PWID in electronic health records. NLP/ML models should be considered in identifying cohorts of PWID to improve clinical decision making, health services research, and administrative surveillance.
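The evaluation step described in this abstract (an F-score on a hold-out test set with bootstrapped 95% CIs) can be sketched roughly as follows. This is an illustration only, not the authors' code: it assumes the F-score is the F1 variant (harmonic mean of sensitivity and positive predictive value) and a percentile bootstrap, and the function names `f_score` and `bootstrap_ci` are hypothetical.

```python
import random


def f_score(y_true, y_pred):
    """F1: harmonic mean of sensitivity (recall) and PPV (precision)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No true positives: define the score as 0 to avoid dividing by zero.
        return 0.0
    sensitivity = tp / (tp + fn)
    ppv = tp / (tp + fp)
    return 2 * sensitivity * ppv / (sensitivity + ppv)


def bootstrap_ci(y_true, y_pred, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the F-score (assumed resampling scheme)."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_boot):
        # Resample test-set records with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(f_score([y_true[i] for i in idx],
                              [y_pred[i] for i in idx]))
    scores.sort()
    lower = scores[int((alpha / 2) * n_boot)]
    upper = scores[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper
```

Here `y_true` would hold the manual-review labels (1 = PWID, 0 = not) and `y_pred` the classifier output for the same hold-out records.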
Affiliation(s)
- David Goodman-Meza
- Correspondence: David Goodman-Meza, MD, MAS, David Geffen School of Medicine at UCLA, 10833 Le Conte Ave, CHS 52-215, Los Angeles, CA, 90095-1688
| | - Amber Tang
- Department of Internal Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, California, USA
| | - Babak Aryanfar
- Veterans Affairs Greater Los Angeles Healthcare System, Los Angeles, California, USA
| | - Sergio Vazquez
- Undergraduate Studies, Dartmouth College, Hanover, New Hampshire, USA
| | - Adam J Gordon
- Informatics, Decision-Enhancement, and Analytic Sciences Center, Veterans Affairs Salt Lake City Health Care System, Salt Lake City, Utah, USA
- Division of Epidemiology, Department of Internal Medicine, University of Utah School of Medicine, Salt Lake City, Utah, USA
| | - Michihiko Goto
- Department of Internal Medicine, University of Iowa, Iowa City, Iowa, USA
- Center for Access and Delivery Research and Evaluation, Iowa City Veterans Affairs Medical Center, Iowa City, Iowa, USA
| | - Matthew Bidwell Goetz
- Veterans Affairs Greater Los Angeles Healthcare System, Los Angeles, California, USA
- Department of Internal Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, California, USA
| | - Steven Shoptaw
- Department of Family Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, California, USA
|
13
|
Goodman-Meza D, Shover CL, Medina JA, Tang AB, Shoptaw S, Bui AAT. Development and Validation of Machine Models Using Natural Language Processing to Classify Substances Involved in Overdose Deaths. JAMA Netw Open 2022; 5:e2225593. [PMID: 35939303 PMCID: PMC9361079 DOI: 10.1001/jamanetworkopen.2022.25593] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/14/2022]
Abstract
IMPORTANCE Overdose is one of the leading causes of death in the US; however, surveillance data lag considerably from medical examiner determination of the death to reporting in national surveillance reports. OBJECTIVE To automate the classification of deaths related to substances in medical examiner data using natural language processing (NLP) and machine learning (ML). DESIGN, SETTING, AND PARTICIPANTS Diagnostic study comparing different natural language processing and machine learning algorithms to identify substances related to overdose in 10 health jurisdictions in the US from January 1, 2020, to December 31, 2020. Unstructured text from 35 433 medical examiner and coroners' death records was examined. EXPOSURES Text from each case was manually classified to a substance that was related to the death. Three feature representation methods were used and compared: text frequency-inverse document frequency (TF-IDF), global vectors for word representations (GloVe), and concept unique identifier (CUI) embeddings. Several ML algorithms were trained and best models were selected based on F-scores. The best models were tested on a hold-out test set and results were reported with 95% CIs. MAIN OUTCOMES AND MEASURES Text data from death certificates were classified as any opioid, fentanyl, alcohol, cocaine, methamphetamine, heroin, prescription opioid, and an aggregate of other substances. Diagnostic metrics and 95% CIs were calculated for each combination of feature extraction method and machine learning classifier. RESULTS Of 35 433 death records analyzed (decedent median age, 58 years [IQR, 41-72 years]; 24 449 [69%] were male), the most common substances related to deaths included any opioid (5739 [16%]), fentanyl (4758 [13%]), alcohol (2866 [8%]), cocaine (2247 [6%]), methamphetamine (1876 [5%]), heroin (1613 [5%]), prescription opioids (1197 [3%]), and any benzodiazepine (1076 [3%]). 
The CUI embeddings had similar or better diagnostic metrics compared with word embeddings and TF-IDF for all substances except alcohol. ML classifiers had perfect or near perfect performance in classifying deaths related to any opioids, heroin, fentanyl, prescription opioids, methamphetamine, cocaine, and alcohol. Classification of benzodiazepines was suboptimal using all 3 feature extraction methods. CONCLUSIONS AND RELEVANCE In this diagnostic study, NLP/ML algorithms demonstrated excellent diagnostic performance at classifying substances related to overdoses. These algorithms should be integrated into workflows to decrease the lag time in reporting overdose surveillance data.
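Of the three feature representations compared in this abstract, TF-IDF is the simplest to illustrate. The sketch below is a minimal version under stated assumptions, not the study's pipeline: it uses whitespace tokenization, length-normalized term frequency and an unsmoothed `log(N / df)` inverse document frequency (libraries often apply smoothing or other variants), and the function name `tf_idf` and the sample strings are invented for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    tf  = term count / document length
    idf = log(N / number of documents containing the term)
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Invented cause-of-death strings, for illustration only.
notes = [
    "acute fentanyl toxicity",
    "acute alcohol toxicity",
    "acute cocaine and fentanyl toxicity",
]
weights = tf_idf(notes)
```

Terms that occur in every record (here "acute" and "toxicity") get a weight of zero, while substance names that occur in fewer records receive higher weights, which is what makes the representation usable as classifier input.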
Affiliation(s)
- David Goodman-Meza
- Division of Infectious Diseases, David Geffen School of Medicine at University of California, Los Angeles
| | - Chelsea L. Shover
- Division of General Internal Medicine, David Geffen School of Medicine at University of California, Los Angeles
| | - Jesus A. Medina
- David Geffen School of Medicine at University of California, Los Angeles
| | - Amber B. Tang
- David Geffen School of Medicine at University of California, Los Angeles
| | - Steven Shoptaw
- Department of Family Medicine, David Geffen School of Medicine at University of California, Los Angeles
| | - Alex A. T. Bui
- Medical & Imaging Informatics (MII) Group, Department of Radiological Sciences, University of California, Los Angeles
|
14
|
Volkow ND, Chandler RK, Villani J. Need for comprehensive and timely data to address the opioid overdose epidemic without a blindfold. Addiction 2022; 117:2132-2134. [PMID: 35611646 DOI: 10.1111/add.15957] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Indexed: 11/28/2022]
Affiliation(s)
- Nora D Volkow
- National Institute on Drug Abuse, National Institutes of Health, Bethesda, MD, USA
| | - Redonna K Chandler
- National Institute on Drug Abuse, National Institutes of Health, Bethesda, MD, USA
| | - Jennifer Villani
- National Institute on Drug Abuse, National Institutes of Health, Bethesda, MD, USA
|
15
|
Marshall BDL, Alexander-Scott N, Yedinak JL, Hallowell BD, Goedel WC, Allen B, Schell RC, Li Y, Krieger MS, Pratty C, Ahern J, Neill DB, Cerdá M. Preventing Overdose Using Information and Data from the Environment (PROVIDENT): protocol for a randomized, population-based, community intervention trial. Addiction 2022; 117:1152-1162. [PMID: 34729851 PMCID: PMC8904285 DOI: 10.1111/add.15731] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 04/30/2021] [Accepted: 10/08/2021] [Indexed: 12/28/2022]
Abstract
BACKGROUND AND AIMS In light of the accelerating drug overdose epidemic in North America, new strategies are needed to identify communities most at risk to prioritize geographically the existing public health resources (e.g. street outreach, naloxone distribution efforts). We aimed to develop PROVIDENT (Preventing Overdose using Information and Data from the Environment), a machine learning-based forecasting tool to predict future overdose deaths at the census block group (i.e. neighbourhood) level. DESIGN Randomized, population-based, community intervention trial. SETTING Rhode Island, USA. PARTICIPANTS All people who reside in Rhode Island during the study period may contribute data to either the model or the trial outcomes. INTERVENTION Each of the state's 39 municipalities will be randomized to the intervention (PROVIDENT) or comparator condition. An interactive, web-based tool will be developed to visualize the PROVIDENT model predictions. Municipalities assigned to the treatment arm will receive neighbourhood risk predictions from the PROVIDENT model, and state agencies and community-based organizations will direct resources to neighbourhoods identified as high risk. Municipalities assigned to the control arm will continue to receive surveillance information and overdose prevention resources, but they will not receive neighbourhood risk predictions. MEASUREMENTS The primary outcome is the municipal-level rate of fatal and non-fatal drug overdoses. Fatal overdoses will be defined as unintentional drug-related death; non-fatal overdoses will be defined as an emergency department visit for a suspected overdose reported through the state's syndromic surveillance system. Intervention efficacy will be assessed using Poisson or negative binomial regression to estimate incidence rate ratios comparing fatal and non-fatal overdose rates in treatment vs. control municipalities. 
COMMENTS The findings will inform the utility of predictive modelling as a tool to improve public health decision-making and inform resource allocation to communities that should be prioritized for prevention, treatment, recovery and overdose rescue services.
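The incidence rate ratio named as the effect measure in this protocol can be illustrated with a crude two-group calculation. This is a simplified sketch, not the trial's analysis: it ignores covariates, clustering and overdispersion (which the Poisson or negative binomial regression in the protocol would handle), uses a Wald interval on the log scale, and the function name and the counts in the usage note are hypothetical.

```python
import math


def incidence_rate_ratio(events_trt, persontime_trt,
                         events_ctl, persontime_ctl, z=1.96):
    """Crude IRR (treatment vs. control) with a Wald 95% CI.

    With a single binary arm indicator and no other covariates, a Poisson
    regression gives this same point estimate.
    """
    rate_trt = events_trt / persontime_trt
    rate_ctl = events_ctl / persontime_ctl
    irr = rate_trt / rate_ctl

    # Standard error of log(IRR) under a Poisson assumption for the counts.
    se_log = math.sqrt(1 / events_trt + 1 / events_ctl)
    lower = math.exp(math.log(irr) - z * se_log)
    upper = math.exp(math.log(irr) + z * se_log)
    return irr, lower, upper
```

For example, `incidence_rate_ratio(50, 1000.0, 100, 1000.0)` with made-up counts gives an IRR of 0.5, i.e. half the overdose rate in the treatment arm; an interval that excludes 1 would suggest a difference between arms under these simplifying assumptions.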
Affiliation(s)
- Brandon D. L. Marshall
- Department of Epidemiology, Brown University School of Public Health, Providence, RI, USA
| | | | - Jesse L. Yedinak
- Department of Epidemiology, Brown University School of Public Health, Providence, RI, USA
| | | | - William C. Goedel
- Department of Epidemiology, Brown University School of Public Health, Providence, RI, USA
| | - Bennett Allen
- Center for Opioid Epidemiology and Policy, Department of Population Health, Grossman School of Medicine, New York University, New York, NY, USA
| | - Robert C. Schell
- Division of Health Policy and Management, School of Public Health, University of California, Berkeley, Berkeley, CA, USA
| | - Yu Li
- Department of Epidemiology, Brown University School of Public Health, Providence, RI, USA
| | - Maxwell S. Krieger
- Department of Epidemiology, Brown University School of Public Health, Providence, RI, USA
| | - Claire Pratty
- Department of Epidemiology, Brown University School of Public Health, Providence, RI, USA
| | - Jennifer Ahern
- Division of Epidemiology, School of Public Health, University of California, Berkeley, Berkeley, CA, USA
| | - Daniel B. Neill
- Center for Urban Science and Progress, New York University, New York, NY, USA
- Courant Institute of Mathematical Sciences, Department of Computer Science, New York University, New York, NY, USA
- Robert F. Wagner Graduate School of Public Service, New York University, New York, NY, USA
| | - Magdalena Cerdá
- Center for Opioid Epidemiology and Policy, Department of Population Health, Grossman School of Medicine, New York University, New York, NY, USA
|
16
|
Using Machine Learning for Pharmacovigilance: A Systematic Review. Pharmaceutics 2022; 14:266. [PMID: 35213998 PMCID: PMC8924891 DOI: 10.3390/pharmaceutics14020266] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/17/2021] [Revised: 01/13/2022] [Accepted: 01/21/2022] [Indexed: 02/04/2023]
Abstract
Pharmacovigilance is a science that involves the ongoing monitoring of adverse drug reactions to existing medicines. Traditional approaches in this field can be expensive and time-consuming. The application of natural language processing (NLP) to analyze user-generated content is hypothesized as an effective supplemental source of evidence. In this systematic review, a broad and multi-disciplinary literature search was conducted involving four databases. A total of 5318 publications were initially found. Studies were considered relevant if they reported on the application of NLP to understand user-generated text for pharmacovigilance. A total of 16 relevant publications were included in this systematic review. All studies were evaluated to have medium reliability and validity. For all types of drugs, 14 publications reported positive findings with respect to the identification of adverse drug reactions, providing consistent evidence that natural language processing can be used effectively and accurately on user-generated textual content that was published to the Internet to identify adverse drug reactions for the purpose of pharmacovigilance. The evidence presented in this review suggests that the analysis of textual data has the potential to complement the traditional system of pharmacovigilance.
|
17
|
Bharat C, Hickman M, Barbieri S, Degenhardt L. Big data and predictive modelling for the opioid crisis: existing research and future potential. Lancet Digit Health 2021; 3:e397-e407. [PMID: 34045004 DOI: 10.1016/s2589-7500(21)00058-3] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 11/24/2020] [Revised: 01/21/2021] [Accepted: 03/24/2021] [Indexed: 12/30/2022]
Abstract
A need exists to accurately estimate overdose risk and improve understanding of how to deliver treatments and interventions in people with opioid use disorder in a way that reduces such risk. We consider opportunities for predictive analytics and routinely collected administrative data to evaluate how overdose could be reduced among people with opioid use disorder. Specifically, we summarise global trends in opioid use and overdoses; describe the use of big data in research into opioid overdose; consider the potential for predictive modelling, including machine learning, for prevention and monitoring of opioid overdoses; and outline the challenges and risks relating to the use of big data and machine learning in reducing harms that are related to opioid use. Future research for improving the coverage and provision of existing interventions, treatments, and resources for opioid use disorder requires collaboration of multiple agencies. Predictive modelling could transport the concept of stratified medicine to public health through novel methods, such as predictive modelling and emulated trials for evaluating diagnoses and prognoses of opioid use disorder, predicting treatment response, and providing targeted treatment recommendations.
Affiliation(s)
- Chrianna Bharat
- National Drug and Alcohol Research Centre, University of New South Wales, Sydney, NSW, Australia.
| | - Matthew Hickman
- Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, UK
| | - Sebastiano Barbieri
- Centre for Big Data Research in Health, University of New South Wales, Sydney, NSW, Australia
| | - Louisa Degenhardt
- National Drug and Alcohol Research Centre, University of New South Wales, Sydney, NSW, Australia
|
18
|
Harris DR, Eisinger C, Wang Y, Delcher C. Challenges and Barriers in Applying Natural Language Processing to Medical Examiner Notes from Fatal Opioid Poisoning Cases. PROCEEDINGS : ... IEEE INTERNATIONAL CONFERENCE ON BIG DATA. IEEE INTERNATIONAL CONFERENCE ON BIG DATA 2020; 2020:3727-3736. [PMID: 35282306 PMCID: PMC8910776 DOI: 10.1109/bigdata50022.2020.9378443] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/14/2023]
Abstract
We detail the challenges and barriers in applying natural language processing techniques to a collection of medical examiner case investigation notes related to fatal opioid poisonings. Major advances in biomedical informatics have made natural language processing (NLP) of medical texts both a realistic and useful task. Biomedical NLP tools are typically designed to process documents originating from biomedical libraries or electronic health records (EHRs). The usefulness of biomedical NLP tools on texts authored outside of EHRs is unclear, despite an abundance of medicolegal documents existing at the intersection of medicine and law. In particular, we detail our experiences processing unstructured text and extracting semantic concepts using case investigation notes; these notes were authored by trained investigative professionals working in a medical examiner's office and describe cases containing deaths related to fatal opioid poisonings. Applying NLP to case notes is a particularly important step in generalizing the advances of biomedical NLP for other related domains and giving guidance to data scientists working with unstructured data generated outside of EHRs.
Affiliation(s)
- Daniel R Harris
- Institute for Pharmaceutical Outcomes and Policy, University of Kentucky, Lexington, Kentucky 40506
- Center for Clinical and Translational Sciences, University of Kentucky, Lexington, Kentucky 40506
| | - Christian Eisinger
- Institute for Pharmaceutical Outcomes and Policy, University of Kentucky, Lexington, Kentucky 40506
| | - Yanning Wang
- Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, Florida 32611
| | - Chris Delcher
- Institute for Pharmaceutical Outcomes and Policy, University of Kentucky, Lexington, Kentucky 40506
|