1
Wyss R, Plasek JM, Zhou L, Bessette LG, Schneeweiss S, Rassen JA, Tsacogianis T, Lin KJ. Scalable Feature Engineering from Electronic Free Text Notes to Supplement Confounding Adjustment of Claims-Based Pharmacoepidemiologic Studies. Clin Pharmacol Ther 2023; 113:832-838. [PMID: 36528788 PMCID: PMC10913938 DOI: 10.1002/cpt.2826] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2022] [Accepted: 12/08/2022] [Indexed: 12/23/2022]
Abstract
Natural language processing (NLP) tools turn free-text notes (FTNs) from electronic health records (EHRs) into data features that can supplement confounding adjustment in pharmacoepidemiologic studies. However, current applications are difficult to scale. We used unsupervised NLP to generate high-dimensional feature spaces from FTNs to improve prediction of drug exposure and outcomes compared with claims-based analyses. We linked Medicare claims with EHR data to generate three cohort studies comparing different classes of medications on the risk of various clinical outcomes. We used "bag-of-words" to generate features for the top 20,000 most prevalent terms from FTNs. We compared machine learning (ML) prediction algorithms using different sets of candidate predictors: Set1 (39 researcher-specified variables), Set2 (Set1 + ML-selected claims codes), and Set3 (Set1 + ML-selected NLP-generated features), vs. Set4 (Set1 + 2 + 3). When modeling treatment choice, we observed a consistent pattern across the examples: ML models utilizing Set4 performed best followed by Set2, Set3, then Set1. When modeling the outcome risk, there was little to no improvement beyond models based on Set1. Supplementing claims data with NLP-generated features from free text notes improved prediction of prescribing choices but had little or no improvement on clinical risk prediction. These findings have implications for strategies to improve confounding using EHR data in pharmacoepidemiologic studies.
Affiliation(s)
- Richard Wyss
  - Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA
- Joseph M. Plasek
  - Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA
- Li Zhou
  - Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA
- Lily G. Bessette
  - Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA
- Sebastian Schneeweiss
  - Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA
- Theodore Tsacogianis
  - Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA
- Kueiyu Joshua Lin
  - Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA
  - Department of Medicine, Massachusetts General Hospital, Harvard Medical School
2
de Ridder MAJ, de Wilde M, de Ben C, Leyba AR, Mosseveld BMT, Verhamme KMC, van der Lei J, Rijnbeek PR. Data Resource Profile: The Integrated Primary Care Information (IPCI) database, The Netherlands. Int J Epidemiol 2022; 51:e314-e323. [PMID: 35182144 PMCID: PMC9749682 DOI: 10.1093/ije/dyac026] [Citation(s) in RCA: 30] [Impact Index Per Article: 15.0] [Received: 09/17/2021] [Accepted: 02/03/2022] [Indexed: 01/21/2023]
Affiliation(s)
- Maria A J de Ridder
  - Department of Medical Informatics, Erasmus University Medical Center, Rotterdam, The Netherlands
- Marcel de Wilde
  - Department of Medical Informatics, Erasmus University Medical Center, Rotterdam, The Netherlands
- Christina de Ben
  - Department of Medical Informatics, Erasmus University Medical Center, Rotterdam, The Netherlands
- Armando R Leyba
  - Department of Medical Informatics, Erasmus University Medical Center, Rotterdam, The Netherlands
- Katia M C Verhamme
  - Department of Medical Informatics, Erasmus University Medical Center, Rotterdam, The Netherlands
- Johan van der Lei
  - Department of Medical Informatics, Erasmus University Medical Center, Rotterdam, The Netherlands
- Peter R Rijnbeek
  - Department of Medical Informatics, Erasmus University Medical Center, Rotterdam, The Netherlands
3
Wyss R, Yanover C, El-Hay T, Bennett D, Platt RW, Zullo AR, Sari G, Wen X, Ye Y, Yuan H, Gokhale M, Patorno E, Lin KJ. Machine learning for improving high-dimensional proxy confounder adjustment in healthcare database studies: an overview of the current literature. Pharmacoepidemiol Drug Saf 2022; 31:932-943. [PMID: 35729705 PMCID: PMC9541861 DOI: 10.1002/pds.5500] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2021] [Revised: 06/01/2022] [Accepted: 06/05/2022] [Indexed: 11/10/2022]
Abstract
Controlling for large numbers of variables that collectively serve as 'proxies' for unmeasured factors can often improve confounding control in pharmacoepidemiologic studies utilizing administrative healthcare databases. There is a growing body of evidence showing that data-driven machine learning algorithms for high-dimensional proxy confounder adjustment can supplement investigator-specified variables to improve confounding control compared to adjustment based on investigator-specified variables alone. Consequently, there has been a recent focus on the development of data-driven methods for high-dimensional proxy confounder adjustment. In this paper, we discuss the considerations underpinning three areas for data-driven high-dimensional proxy confounder adjustment: 1) feature generation: transforming raw data into covariates (or features) to be used for proxy adjustment; 2) covariate prioritization, selection and adjustment; and 3) diagnostic assessment. We survey current approaches and recent advancements within each area, including the most widely used approach to proxy confounder adjustment in healthcare database studies (the high-dimensional propensity score or hdPS). We also discuss limitations of the hdPS and outline recent advancements that incorporate the principles of proxy adjustment with machine learning extensions to improve performance. We further discuss challenges and avenues of future development within each area. This manuscript is endorsed by the International Society for Pharmacoepidemiology (ISPE).
Affiliation(s)
- Richard Wyss
  - Division of Pharmacoepidemiology and Pharmacoeconomics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Tal El-Hay
  - KI Research Institute, Kfar Malal, Israel
  - IBM Research-Haifa Labs, Haifa, Israel
- Dimitri Bennett
  - Global Evidence and Outcomes, Takeda Pharmaceutical Company Ltd., Cambridge, MA, USA
- Andrew R Zullo
  - Department of Health Services, Policy, and Practice, Brown University School of Public Health and Center of Innovation in Long-Term Services and Supports, Providence Veterans Affairs Medical Center, Providence, RI, USA
- Grammati Sari
  - Real World Evidence Strategy Lead, Visible Analytics Ltd, Oxford, UK
- Xuerong Wen
  - Health Outcomes, Pharmacy Practice, College of Pharmacy, University of Rhode Island, Kingston, RI, USA
- Yizhou Ye
  - Global Epidemiology, AbbVie Inc., North Chicago, IL, USA
- Hongbo Yuan
  - Canadian Agency for Drugs and Technologies in Health, Ottawa, Canada
- Mugdha Gokhale
  - Pharmacoepidemiology, Center for Observational and Real-world Evidence, Merck, PA, USA
- Elisabetta Patorno
  - Division of Pharmacoepidemiology and Pharmacoeconomics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Kueiyu Joshua Lin
  - Division of Pharmacoepidemiology and Pharmacoeconomics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
  - Department of Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA