1
Yang CT, Ngan K, Kim DH, Yang J, Liu J, Lin KJ. Establishing a Validation Framework of Treatment Discontinuation in Claims Data Using Natural Language Processing and Electronic Health Records. Clin Pharmacol Ther 2025; 118:138-145. [PMID: 40197528 PMCID: PMC12167144 DOI: 10.1002/cpt.3650] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2024] [Accepted: 03/06/2025] [Indexed: 04/10/2025]
Abstract
Measuring medication discontinuation in claims data primarily relies on the gaps between prescription fills, but such definitions are rarely validated. This study aimed to establish a natural language processing (NLP)-based validation framework to evaluate the performance of claims-based discontinuation algorithms for commonly used medications against NLP-based reference standards from electronic health records (EHRs). A total of 36,656 patients receiving antipsychotic medications (APMs), benzodiazepines (BZDs), warfarin, or direct oral anticoagulants (DOACs) were identified from the Mass General Brigham EHRs in 2007-2020. These EHR data were linked with 97,900 Medicare Part D claims. An NLP-aided chart review was applied to determine medication discontinuation from EHR (reference standard). In claims data, discontinuation was defined by having a prescription gap larger than 15-90 days (claims-based algorithms). Sensitivity, specificity, and predictive values of claims-based algorithms against the reference standard were measured. The sensitivity and specificity of 90-day-gap-based algorithms were 0.46 and 0.79 for haloperidol, 0.41 and 0.85 for atypical APMs, 0.47 and 0.75 for BZDs, 0.33 and 0.80 for warfarin, and 0.38 and 0.87 for DOACs, respectively. The corresponding estimates for 15-day-gap-based algorithms were 0.68 and 0.55 for haloperidol, 0.59 and 0.62 for atypical APMs, 0.71 and 0.45 for BZDs, 0.61 and 0.49 for warfarin, and 0.58 and 0.64 for DOACs, respectively. Positive predictive values were primarily affected by medication discontinuation rates and less by gap lengths. The overall accuracy of claims-based discontinuation algorithms differs by medications. This study demonstrates the scalability and utility of the NLP-based validation framework for multiple medications.
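The gap-based definition and the validation metrics described in this abstract can be sketched as follows. This is a minimal illustration, not the study's algorithm: the function names, the `(fill_date, days_supply)` input shape, and the handling of overlapping supply are assumptions, and the toy data are invented.

```python
from datetime import date, timedelta

def discontinued(fills, gap_days, follow_up_end):
    """Return True if any gap in drug supply exceeds gap_days.

    fills: iterable of (fill_date, days_supply) tuples (assumed input shape).
    follow_up_end: date on which observation stops.
    """
    supply_end = None
    for fill_date, days_supply in sorted(fills):
        # Gap = days between the end of covered supply and the next fill.
        if supply_end is not None and (fill_date - supply_end).days > gap_days:
            return True
        this_end = fill_date + timedelta(days=days_supply)
        supply_end = this_end if supply_end is None else max(supply_end, this_end)
    # A gap can also open between the last covered day and the end of follow-up.
    return supply_end is not None and (follow_up_end - supply_end).days > gap_days

def sensitivity_specificity(predicted, reference):
    """Compare claims-based flags against reference-standard flags."""
    pairs = list(zip(predicted, reference))
    tp = sum(p and r for p, r in pairs)
    tn = sum((not p) and (not r) for p, r in pairs)
    fn = sum((not p) and r for p, r in pairs)
    fp = sum(p and (not r) for p, r in pairs)
    return tp / (tp + fn), tn / (tn + fp)

# Invented example: two 30-day fills, follow-up ending 45 days after supply runs out.
fills = [(date(2020, 1, 1), 30), (date(2020, 2, 15), 30)]
print(discontinued(fills, 15, date(2020, 4, 30)))  # flagged under a 15-day gap
print(discontinued(fills, 90, date(2020, 4, 30)))  # not flagged under a 90-day gap
```

The same patient is flagged under the short gap but not the long one, which is consistent with the pattern reported above: shorter gaps raise sensitivity and lower specificity.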
Affiliation(s)
- Chun-Ting Yang
- Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA
- Kerry Ngan
- Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA
- Dae Hyun Kim
- Hinda and Arthur Marcus Institute for Aging Research, Hebrew SeniorLife, Harvard Medical School, Boston, MA, USA
- Division of Gerontology, Department of Medicine, Beth Israel Deaconess Medical Center, Massachusetts, USA
- Jie Yang
- Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA
- Jun Liu
- Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA
- Kueiyu Joshua Lin
- Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA
- Department of Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
2
Neves B, Moreira JM, Gonçalves S, Cerejo J, da Silva NA, Leite F, Silva MJ. Zero-shot learning for clinical phenotyping: Comparing LLMs and rule-based methods. Comput Biol Med 2025; 192:110181. [PMID: 40273817 DOI: 10.1016/j.compbiomed.2025.110181] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2024] [Revised: 03/14/2025] [Accepted: 04/07/2025] [Indexed: 04/26/2025]
Abstract
BACKGROUND: Phenotyping, the process of systematically identifying and classifying conditions within clinical data, is a crucial first step in any data science work involving Electronic Health Records (EHRs). Traditional approaches require extensive manual annotation efforts and face challenges with scalability. METHODS: We investigated the use of Large Language Models (LLMs) for zero-shot phenotyping of 20 prevalent chronic conditions based on synthetic patient summaries generated from real structured EHR codes. We evaluated the performance of multiple LLMs, including GPT-4o, GPT-3.5, and LLaMA 3 models with 8-billion, 70-billion, and 405-billion parameters, comparing them against traditional rule-based methods. For the analysis, we used a dataset of 1,000 patients from Hospital da Luz Lisboa. RESULTS: GPT-4o outperformed both traditional rule-based methods and alternative LLMs, achieving superior recall (0.97) and macro-F1 score (0.92). Rule-based phenotyping, while highly precise (0.92), showed lower recall (0.36). The integration of rule-based methods with LLMs optimized phenotyping accuracy by targeting manual annotation efforts on discordant cases. CONCLUSION: Zero-shot learning with LLMs, particularly GPT-4o, offers a powerful and efficient approach for phenotyping chronic conditions from EHRs, significantly reducing the need for extensive labeled datasets while maintaining high accuracy and interpretability.
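For readers unfamiliar with the metrics named in this abstract, the sketch below shows how a macro-F1 score is computed across conditions and how discordant cases between two labelers can be singled out for manual review. The function names and the toy labels are assumptions made for illustration; nothing here reproduces the study's data or pipeline.

```python
def precision_recall_f1(predicted, truth):
    """Binary precision, recall, and F1 for one condition."""
    tp = sum(1 for p, t in zip(predicted, truth) if p and t)
    fp = sum(1 for p, t in zip(predicted, truth) if p and not t)
    fn = sum(1 for p, t in zip(predicted, truth) if not p and t)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def macro_f1(predicted_by_condition, truth_by_condition):
    """Unweighted mean of per-condition F1 scores."""
    scores = [
        precision_recall_f1(predicted_by_condition[c], truth_by_condition[c])[2]
        for c in truth_by_condition
    ]
    return sum(scores) / len(scores)

def discordant_cases(rule_labels, llm_labels):
    """Indices where the two labelers disagree (candidates for manual annotation)."""
    return [i for i, (r, m) in enumerate(zip(rule_labels, llm_labels)) if r != m]

# Invented toy labels for two conditions and four patients.
truth = {"diabetes": [1, 0, 1, 0], "hypertension": [1, 0, 1, 0]}
llm = {"diabetes": [1, 1, 0, 0], "hypertension": [1, 0, 1, 0]}
print(macro_f1(llm, truth))
print(discordant_cases([1, 0, 0, 0], llm["diabetes"]))
```

Macro averaging weights every condition equally, so a rare condition with poor recall pulls the score down as much as a common one.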
Affiliation(s)
- Bernardo Neves
- Hospital da Luz Learning Health, Luz Saúde, Lisboa, Portugal; Internal Medicine Department, Hospital da Luz Lisboa, Lisboa, Portugal; INESC-ID, Instituto Superior Técnico, Universidade de Lisboa, Portugal; Católica Medical School, Universidade Católica Portuguesa, Portugal.
- Simão Gonçalves
- Hospital da Luz Learning Health, Luz Saúde, Lisboa, Portugal
- Jorge Cerejo
- Hospital da Luz Learning Health, Luz Saúde, Lisboa, Portugal
- Nuno A da Silva
- Hospital da Luz Learning Health, Luz Saúde, Lisboa, Portugal
- Francisca Leite
- Hospital da Luz Learning Health, Luz Saúde, Lisboa, Portugal; Católica Medical School, Universidade Católica Portuguesa, Portugal
- Mário J Silva
- INESC-ID, Instituto Superior Técnico, Universidade de Lisboa, Portugal
3
Appah-Sampong A, Balaji A, Casey JH, Kotturu N, Montano D, Manchella M, Bacare B, Fitzgibbon JJ, Heindel P, Dey T, Bikdeli B, Hussain MA. A scoping review of electronic phenotyping methodologies used to identify peripheral artery disease in observational studies. Vasc Med 2025; 30:330-342. [PMID: 40340584 DOI: 10.1177/1358863x251328671] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/10/2025]
Abstract
Billing data including International Classification of Diseases (ICD) codes are increasingly used to identify cohorts of patients with peripheral artery disease (PAD) in electronic health records (EHRs) and administrative claims databases (ACDs). However, the validity of common PAD phenotyping approaches is a central challenge to the utilization of EHR and ACD data. We present a scoping review of contemporary PAD observational studies to describe the electronic phenotyping strategies employed in PAD identification and propose recommendations for improvement. We searched two databases, MEDLINE and Web of Science, identifying a total of 748 articles that underwent title and abstract review. Of these articles, 163 met the criteria for full-text review, with 84 articles ultimately included in the study. We demonstrate that 19.0% of eligible studies utilized ICD, Ninth Revision (ICD-9) codes, 11.9% utilized ICD, Tenth Revision (ICD-10) codes, and 69.0% of studies utilized a combination of ICD-9 and ICD-10 codes in their electronic phenotyping methodology. Of the included studies, 76.2% utilized a single-code query approach for electronic phenotyping despite low diagnostic yield, and 21.4% utilized rule-based methods. Only five studies utilized logistic regression modeling, despite the demonstrated effectiveness of this method. The current study demonstrates high utilization of unreliable electronic phenotyping methods such as single-code-based queries, which severely limits research quality. Improvements in electronic phenotyping methods are necessary to leverage data from EHRs and ACDs for high-quality research.
Affiliation(s)
- Abena Appah-Sampong
- Department of Surgery, Division of Vascular and Endovascular Surgery, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Center for Surgery and Public Health, Brigham and Women's Hospital, Boston, MA, USA
- Ascharya Balaji
- Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, Buffalo, NY, USA
- Jack H Casey
- Royal College of Surgeons in Ireland School of Medicine, Dublin, Ireland
- Navya Kotturu
- Department of Surgery, Division of Vascular and Endovascular Surgery, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Danielle Montano
- Department of Surgery, Division of Vascular and Endovascular Surgery, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Mohit Manchella
- Department of Surgery, Division of Vascular and Endovascular Surgery, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Bassil Bacare
- Thrombosis Research Group, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- James J Fitzgibbon
- Department of Surgery, Division of Vascular and Endovascular Surgery, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Center for Surgery and Public Health, Brigham and Women's Hospital, Boston, MA, USA
- Patrick Heindel
- Department of Surgery, Division of Vascular and Endovascular Surgery, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Center for Surgery and Public Health, Brigham and Women's Hospital, Boston, MA, USA
- Tanujit Dey
- Center for Surgery and Public Health, Brigham and Women's Hospital, Boston, MA, USA
- Behnood Bikdeli
- Thrombosis Research Group, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Department of Medicine, Division of Cardiovascular Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Yale Center for Outcomes Research and Evaluation (CORE), New Haven, CT, USA
- Mohamad A Hussain
- Department of Surgery, Division of Vascular and Endovascular Surgery, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Center for Surgery and Public Health, Brigham and Women's Hospital, Boston, MA, USA
4
Masison J, Lehmann HP, Wan J. Utilization of Computable Phenotypes in Electronic Health Record Research: A Review and Case Study in Atopic Dermatitis. J Invest Dermatol 2025; 145:1008-1016. [PMID: 39488781 PMCID: PMC12018156 DOI: 10.1016/j.jid.2024.08.025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2024] [Revised: 08/05/2024] [Accepted: 08/18/2024] [Indexed: 11/04/2024]
Abstract
Querying electronic health records databases to accurately identify specific cohorts of patients has countless observational and interventional research applications. Computable phenotypes are computationally executable, explicit sets of selection criteria composed of data elements, logical expressions, and a combination of natural language processing and machine learning techniques enabling expedited patient cohort identification. Phenotyping encompasses a range of implementations, each with advantages and use cases. In this paper, the dermatologic computable phenotype literature is reviewed. We identify and evaluate approaches and community supports for computable phenotyping that have been used both generally and within dermatology and, as a case study, focus on studied phenotypes for atopic dermatitis.
Affiliation(s)
- Joseph Masison
- University of Connecticut School of Medicine, Farmington, Connecticut, USA
- Harold P Lehmann
- Division of General Internal Medicine, Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
- Joy Wan
- Department of Dermatology, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA.
5
Lewicki P, Benhalim Y, Bradin J, Dryden K, Hakim H, Heasman B, Taylor A, Aqeel J, Vejalla A, Conte M, Richesson R, Stensland K. Development and Evaluation of an Electronic Health Record-Derived Computable Phenotype to Identify Patients Undergoing Prostate Cancer Screening. JCO Clin Cancer Inform 2025; 9:e2400261. [PMID: 40279529 DOI: 10.1200/cci-24-00261] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2024] [Revised: 01/21/2025] [Accepted: 03/12/2025] [Indexed: 04/27/2025]
Abstract
PURPOSE: Given challenges with randomized trials, tumor registries, and insurance claims, electronic health record data are an appealing resource for studying prostate-specific antigen (PSA) screening for prostate cancer. Transparent, well-evaluated computable phenotypes that observe a stringent definition of screening (versus for-cause diagnosis- or symptom-directed testing) are critical for reproducibility and comparison with prospective cohorts. METHODS: A cohort of patients who underwent PSA testing in a primary care setting at a large, tertiary health care system was identified. Gold-standard labels for screening versus not screening were created via a combination of clinical note text review and exclusionary diagnosis codes. Ten computable phenotype definitions were created by urology content experts and then evaluated for sensitivity, specificity, positive predictive value (PPV), and negative predictive value against gold-standard labels. RESULTS: Three hundred fifty-five patients with gold-standard labels were included in the final study cohort. Varying by how missing text data were classified (not applicable versus screening), 149 (50.3%) and 208 (58.6%) patients underwent screening. No single phenotype optimized both sensitivity and PPV, although a composite definition that included either (1) absence of symptoms or (2) presence of an encounter for screening code achieved a very high PPV of 0.99 (95% CI, 0.96 to 1.00) with a reasonable sensitivity of 0.82 (95% CI, 0.75 to 0.88). CONCLUSION: We identify code-based PSA screening phenotypes with a range of performance characteristics. The prevalence of for-cause diagnosis- and symptom-directed testing is significant and may contaminate cohorts that do not take related codes into account.
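The composite definition mentioned in this abstract (absence of symptoms, or presence of an encounter-for-screening code) can be expressed as a simple rule over a patient's diagnosis codes. The sketch below is hypothetical: the abstract does not list the code sets used, so `SCREENING_CODES` and `SYMPTOM_CODES` are illustrative ICD-10-CM choices, and the five patients are invented.

```python
# Illustrative code sets; the study's actual lists are not given in the abstract.
SCREENING_CODES = {"Z12.5"}                  # encounter for screening, prostate
SYMPTOM_CODES = {"R31.9", "R39.15", "N40.1"}  # assumed symptom/for-cause codes

def is_screening(codes):
    """Composite rule: no symptom codes present, OR a screening-encounter code present."""
    codes = set(codes)
    no_symptoms = codes.isdisjoint(SYMPTOM_CODES)
    has_screening_code = not codes.isdisjoint(SCREENING_CODES)
    return no_symptoms or has_screening_code

def ppv_and_sensitivity(predicted, gold):
    """Positive predictive value and sensitivity against gold-standard labels."""
    tp = sum(1 for p, g in zip(predicted, gold) if p and g)
    fp = sum(1 for p, g in zip(predicted, gold) if p and not g)
    fn = sum(1 for p, g in zip(predicted, gold) if not p and g)
    return tp / (tp + fp), tp / (tp + fn)

# Invented patients: (diagnosis codes around the PSA test, gold-standard label).
patients = [
    (["Z12.5"], True),
    (["R31.9"], False),
    (["I10"], True),
    (["R39.15", "Z12.5"], False),
    (["N40.1"], True),
]
predictions = [is_screening(codes) for codes, _ in patients]
ppv, sensitivity = ppv_and_sensitivity(predictions, [gold for _, gold in patients])
print(predictions, ppv, sensitivity)
```

Because the two criteria are joined with OR, a patient with both a symptom code and a screening code is still classified as screened, which is one way such a rule can admit false positives.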
Affiliation(s)
- Patrick Lewicki
- Department of Urology, University of Michigan, Ann Arbor, MI
- Yasmin Benhalim
- Department of Urology, University of Michigan, Ann Arbor, MI
- Joshua Bradin
- Department of Urology, University of Michigan, Ann Arbor, MI
- Kim Dryden
- Department of Urology, University of Michigan, Ann Arbor, MI
- Husain Hakim
- Department of Urology, University of Michigan, Ann Arbor, MI
- Ana Taylor
- Department of Urology, University of Michigan, Ann Arbor, MI
- Jawad Aqeel
- Department of Urology, University of Michigan, Ann Arbor, MI
- Anuush Vejalla
- Department of Urology, University of Michigan, Ann Arbor, MI
- Marisa Conte
- Department of Learning Health Sciences, University of Michigan, Ann Arbor, MI
- Rachel Richesson
- Department of Learning Health Sciences, University of Michigan, Ann Arbor, MI
- Kristian Stensland
- Department of Urology, University of Michigan, Ann Arbor, MI
- Department of Learning Health Sciences, University of Michigan, Ann Arbor, MI
6
Mankowski MA, Bae S, Strauss AT, Lonze BE, Orandi BJ, Stewart D, Massie AB, McAdams-DeMarco MA, Oermann EK, Habal M, Iturrate E, Gentry SE, Segev DL, Axelrod D. Generalizability of kidney transplant data in electronic health records - The Epic Cosmos database vs the Scientific Registry of Transplant Recipients. Am J Transplant 2025; 25:744-755. [PMID: 39550008 PMCID: PMC11972892 DOI: 10.1016/j.ajt.2024.11.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2024] [Revised: 10/26/2024] [Accepted: 11/06/2024] [Indexed: 11/18/2024]
Abstract
Developing real-world evidence from electronic health records (EHR) is vital to advancing kidney transplantation (KT). We assessed the feasibility of studying KT using the Epic Cosmos aggregated EHR data set, which includes 274 million unique individuals cared for in 238 US health systems, by comparing it with the Scientific Registry of Transplant Recipients (SRTR). We identified 69 418 KT recipients who underwent transplants between January 2014 and December 2022 in Cosmos (39.4% of all US KT transplants during this period). The demographics and clinical characteristics of recipients captured in Cosmos were consistent with the overall SRTR cohort. Survival estimates were generally comparable, although there were some differences in long-term survival. At 7 years posttransplant, patient survival was 80.4% in Cosmos and 77.8% in SRTR. Multivariable Cox regression showed consistent associations between clinical factors and mortality in both cohorts, with minor discrepancies in the associations between death and both age and race. In summary, Cosmos provides a reliable platform for KT research, allowing EHR-level clinical granularity not available with either the transplant registry or health care claims. Consequently, Cosmos will enable novel analyses to improve our understanding of KT management on a national scale.
Affiliation(s)
- Michal A Mankowski
- Department of Surgery, NYU Grossman School of Medicine, New York, New York, USA.
- Sunjae Bae
- Department of Surgery, NYU Grossman School of Medicine, New York, New York, USA; Department of Population Health, NYU Grossman School of Medicine, New York, New York, USA
- Alexandra T Strauss
- Department of Medicine, Johns Hopkins University, School of Medicine, Baltimore, Maryland, USA
- Bonnie E Lonze
- Department of Surgery, NYU Grossman School of Medicine, New York, New York, USA
- Babak J Orandi
- Department of Surgery, NYU Grossman School of Medicine, New York, New York, USA; Center for Data Science, New York University, New York, New York, USA
- Darren Stewart
- Department of Surgery, NYU Grossman School of Medicine, New York, New York, USA
- Allan B Massie
- Department of Surgery, NYU Grossman School of Medicine, New York, New York, USA; Department of Population Health, NYU Grossman School of Medicine, New York, New York, USA
- Mara A McAdams-DeMarco
- Department of Surgery, NYU Grossman School of Medicine, New York, New York, USA; Department of Population Health, NYU Grossman School of Medicine, New York, New York, USA
- Eric K Oermann
- Center for Data Science, New York University, New York, New York, USA; Department of Neurosurgery, NYU Grossman School of Medicine, New York, New York, USA; Department of Radiology, NYU Langone Health, New York, New York, USA; Neuroscience Institute, NYU Langone Health, New York, New York, USA
- Marlena Habal
- Department of Medicine, NYU Grossman School of Medicine, New York, New York, USA
- Eduardo Iturrate
- Department of Medicine, NYU Grossman School of Medicine, New York, New York, USA
- Sommer E Gentry
- Department of Surgery, NYU Grossman School of Medicine, New York, New York, USA; Department of Population Health, NYU Grossman School of Medicine, New York, New York, USA
- Dorry L Segev
- Department of Surgery, NYU Grossman School of Medicine, New York, New York, USA; Department of Population Health, NYU Grossman School of Medicine, New York, New York, USA
- David Axelrod
- Department of Surgery, University Hospitals, Cleveland, Ohio, USA
7
Al-Sultani Z, Inglis TJ, McFadden B, Thomas E, Reynolds M. Sepsis in silico: definition, development and application of an electronic phenotype for sepsis. J Med Microbiol 2025; 74. [PMID: 40153307 DOI: 10.1099/jmm.0.001986] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/30/2025]
Abstract
Repurposing electronic health record (EHR) or electronic medical record (EMR) data holds significant promise for evidence-based epidemic intelligence and research. Key challenges include sepsis recognition by physicians and issues with EHR and EMR data. Recent advances in data-driven techniques, alongside initiatives like the Surviving Sepsis Campaign and the Severe Sepsis and Septic Shock Management Bundle (SEP-1), have improved sepsis definition, early detection, subtype characterization, prognostication and personalized treatment. This includes identifying potential biomarkers or digital signatures to enhance diagnosis, guide therapy and optimize clinical management. Machine learning applications play a crucial role in identifying biomarkers and digital signatures associated with sepsis and its sub-phenotypes. Additionally, electronic phenotyping, leveraging EHR and EMR data, has emerged as a valuable tool for evidence-based sepsis identification and management. This review examines methods for identifying sepsis cohorts, focusing on two main approaches: utilizing health administrative data with standardized diagnostic coding via the International Classification of Diseases and integrating clinical data. This overview provides a comprehensive analysis of current cohort identification and electronic phenotyping strategies for sepsis, highlighting their potential applications and challenges. The accuracy of an electronic phenotype or signature is pivotal for precision medicine, enabling a shift from subjective clinical descriptions to data-driven insights.
Affiliation(s)
- Zahraa Al-Sultani
- School of Physics, Maths and Computing, Computer Science and Software Engineering, University of Western Australia, Crawley, WA 6009, Australia
- Timothy JJ Inglis
- Division of Pathology and Laboratory Medicine, School of Medicine, University of Western Australia, Crawley, WA 6009, Australia
- PathWest Laboratory Medicine WA, QEII Medical Centre, Nedlands, WA 6009, Australia
- Benjamin McFadden
- School of Physics, Maths and Computing, Computer Science and Software Engineering, University of Western Australia, Crawley, WA 6009, Australia
- Elizabeth Thomas
- Curtin School of Population Health, Curtin University, Bentley, WA 6845, Australia
- Mark Reynolds
- School of Physics, Maths and Computing, Computer Science and Software Engineering, University of Western Australia, Crawley, WA 6009, Australia
8
Afrifa‐Yamoah E, Adua E, Peprah‐Yamoah E, Anto EO, Opoku‐Yamoah V, Acheampong E, Macartney MJ, Hashmi R. Pathways to chronic disease detection and prediction: Mapping the potential of machine learning to the pathophysiological processes while navigating ethical challenges. Chronic Dis Transl Med 2025; 11:1-21. [PMID: 40051825 PMCID: PMC11880127 DOI: 10.1002/cdt3.137] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 01/29/2024] [Revised: 04/03/2024] [Accepted: 05/27/2024] [Indexed: 03/09/2025]
Abstract
Chronic diseases such as heart disease, cancer, and diabetes are leading drivers of mortality worldwide, underscoring the need for improved efforts around early detection and prediction. The pathophysiology and management of chronic diseases have benefitted from emerging fields in molecular biology like genomics, transcriptomics, proteomics, glycomics, and lipidomics. The complex biomarker and mechanistic data from these "omics" studies present analytical and interpretive challenges, especially for traditional statistical methods. Machine learning (ML) techniques offer considerable promise in unlocking new pathways for data-driven chronic disease risk assessment and prognosis. This review provides a comprehensive overview of state-of-the-art applications of ML algorithms for chronic disease detection and prediction across datasets, including medical imaging, genomics, wearables, and electronic health records. Specifically, we review and synthesize key studies leveraging major ML approaches ranging from traditional techniques such as logistic regression and random forests to modern deep learning neural network architectures. We consolidate existing literature to date around ML for chronic disease prediction to synthesize major trends and trajectories that may inform both future research and clinical translation efforts in this growing field. While highlighting the critical innovations and successes emerging in this space, we identify the key challenges and limitations that remain to be addressed. Finally, we discuss pathways forward toward scalable, equitable, and clinically implementable ML solutions for transforming chronic disease screening and prevention.
Affiliation(s)
- Eric Adua
- Rural Clinical School, Medicine and Health, University of New South Wales, Sydney, New South Wales, Australia
- School of Medical and Health Sciences, Edith Cowan University, Joondalup, Western Australia, Australia
- Enoch O. Anto
- School of Medical and Health Sciences, Edith Cowan University, Joondalup, Western Australia, Australia
- Department of Medical Diagnostics, Faculty of Allied Health Sciences, College of Health Sciences, Kwame Nkrumah University of Science and Technology, Kumasi, Ghana
- Victor Opoku‐Yamoah
- School of Optometry and Vision Science, University of Waterloo, Waterloo, Ontario, Canada
- Emmanuel Acheampong
- Department of Genetics and Genome Biology, Leicester Cancer Research Centre, University of Leicester, Leicester, UK
- Michael J. Macartney
- Faculty of Science Medicine and Health, University of Wollongong, Wollongong, New South Wales, Australia
- Rashid Hashmi
- Rural Clinical School, Medicine and Health, University of New South Wales, Sydney, New South Wales, Australia
9
Chen J, Li XN, Lu CC, Yuan S, Yung G, Ye J, Tian H, Lin J. Considerations for master protocols using external controls. J Biopharm Stat 2025; 35:297-319. [PMID: 38363805 DOI: 10.1080/10543406.2024.2311248] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/03/2023] [Accepted: 01/24/2024] [Indexed: 02/18/2024]
Abstract
There has been an increasing use of master protocols in oncology clinical trials because of their efficiency in accelerating cancer drug development and their flexibility in accommodating multiple substudies. Depending on the study objective and design, a master protocol trial can be a basket trial, an umbrella trial, a platform trial, or any other form of trial in which multiple investigational products and/or subpopulations are studied under a single protocol. Master protocols can use external data and evidence (e.g. external controls) for treatment effect estimation, which can further improve the efficiency of master protocol trials. This paper provides an overview of different types of external controls and their unique features when used in master protocols. Some key considerations in master protocols with external controls are discussed, including construction of estimands, assessment of fit-for-use real-world data, and considerations for different types of master protocols. Similarities and differences between regular randomized controlled trials and master protocols when using external controls are discussed. A targeted learning-based causal roadmap is presented, which comprises three key steps: (1) define a target statistical estimand that aligns with the causal estimand for the study objective, (2) use an efficient estimator to estimate the target statistical estimand and its uncertainty, and (3) evaluate the impact of causal assumptions on the study conclusion by performing sensitivity analyses. Two illustrative examples of master protocols using external controls are discussed for their merits and possible improvements in causal effect estimation.
Affiliation(s)
- Jie Chen
- Data Sciences, ECR Global, Shanghai, China
- Sammy Yuan
- Oncology Statistics, GlaxoSmithKline, Collegeville, Pennsylvania, USA
- Godwin Yung
- Product Development Data and Statistical Sciences, Genentech/Roche, South San Francisco, California, USA
- Jingjing Ye
- Global Statistics and Data Sciences, BeiGene, Fulton, Maryland, USA
- Hong Tian
- Global Statistics, BeiGene, Ridgefield Park, New Jersey, USA
- Jianchang Lin
- Statistical & Quantitative Sciences, Takeda, Cambridge, Massachusetts, USA
10
Liu M, Deng K, Wang M, He Q, Xu J, Li G, Zou K, Sun X, Wang W. Methods for identifying health status from routinely collected health data: An overview. Integr Med Res 2025; 14:101100. [PMID: 39897572 PMCID: PMC11786076 DOI: 10.1016/j.imr.2024.101100] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2024] [Revised: 11/01/2024] [Accepted: 11/13/2024] [Indexed: 02/04/2025]
Abstract
Routinely collected health data (RCD) are currently accelerating publications that evaluate the effectiveness and safety of medicines and medical devices. One of the fundamental steps in using these data is developing algorithms to identify health status that can be used for observational studies. However, the process and methodologies for identifying health status from RCD remain insufficiently understood. While most current methods rely on International Classification of Diseases (ICD) codes, they may not be universally applicable. Although machine learning methods hold promise for more accurately identifying the health status, they remain underutilized in RCD studies. To address these significant methodological gaps, we outline key steps and methodological considerations for identifying health statuses in observational studies using RCD. This review has the potential to boost the credibility of findings from observational studies that use RCD.
Affiliation(s)
- Mei Liu
- Institute of Integrated Traditional Chinese and Western Medicine, Chinese Evidence-based Medicine and Cochrane China Center, West China Hospital, Sichuan University, Chengdu, China
- Hospital of Chengdu University of Traditional Chinese Medicine, Chengdu, China
- NMPA Key Laboratory for Real World Data Research and Evaluation in Hainan, Chengdu, China
- Sichuan Center of Technology Innovation for Real World Data, Chengdu, China
- Ke Deng
- Institute of Integrated Traditional Chinese and Western Medicine, Chinese Evidence-based Medicine and Cochrane China Center, West China Hospital, Sichuan University, Chengdu, China
- NMPA Key Laboratory for Real World Data Research and Evaluation in Hainan, Chengdu, China
- Sichuan Center of Technology Innovation for Real World Data, Chengdu, China
- Mingqi Wang
- Institute of Integrated Traditional Chinese and Western Medicine, Chinese Evidence-based Medicine and Cochrane China Center, West China Hospital, Sichuan University, Chengdu, China
- NMPA Key Laboratory for Real World Data Research and Evaluation in Hainan, Chengdu, China
- Sichuan Center of Technology Innovation for Real World Data, Chengdu, China
- Qiao He
- Institute of Integrated Traditional Chinese and Western Medicine, Chinese Evidence-based Medicine and Cochrane China Center, West China Hospital, Sichuan University, Chengdu, China
- NMPA Key Laboratory for Real World Data Research and Evaluation in Hainan, Chengdu, China
- Sichuan Center of Technology Innovation for Real World Data, Chengdu, China
- Jiayue Xu
- Institute of Integrated Traditional Chinese and Western Medicine, Chinese Evidence-based Medicine and Cochrane China Center, West China Hospital, Sichuan University, Chengdu, China
- NMPA Key Laboratory for Real World Data Research and Evaluation in Hainan, Chengdu, China
- Sichuan Center of Technology Innovation for Real World Data, Chengdu, China
- Guowei Li
- Department of Health Research Methods, Evidence and Impact, McMaster University, Hamilton, ON, Canada
- Center for Clinical Epidemiology and Methodology, Guangdong Second Provincial General Hospital, Guangzhou, Guangdong, China
- Biostatistics Unit, Research Institute at St. Joseph's Healthcare Hamilton, Hamilton, ON, Canada
- Kang Zou
- Institute of Integrated Traditional Chinese and Western Medicine, Chinese Evidence-based Medicine and Cochrane China Center, West China Hospital, Sichuan University, Chengdu, China
- NMPA Key Laboratory for Real World Data Research and Evaluation in Hainan, Chengdu, China
- Sichuan Center of Technology Innovation for Real World Data, Chengdu, China
- Xin Sun
- Institute of Integrated Traditional Chinese and Western Medicine, Chinese Evidence-based Medicine and Cochrane China Center, West China Hospital, Sichuan University, Chengdu, China
- NMPA Key Laboratory for Real World Data Research and Evaluation in Hainan, Chengdu, China
- Sichuan Center of Technology Innovation for Real World Data, Chengdu, China
- West China School of Public Health and West China Fourth Hospital, Sichuan University, Chengdu, China
- Wen Wang
- Institute of Integrated Traditional Chinese and Western Medicine, Chinese Evidence-based Medicine and Cochrane China Center, West China Hospital, Sichuan University, Chengdu, China
- NMPA Key Laboratory for Real World Data Research and Evaluation in Hainan, Chengdu, China
- Sichuan Center of Technology Innovation for Real World Data, Chengdu, China
|
11
|
Takeuchi T, Horinouchi H, Takasawa K, Mukai M, Masuda K, Shinno Y, Okuma Y, Yoshida T, Goto Y, Yamamoto N, Ohe Y, Miyake M, Watanabe H, Kusumoto M, Aoki T, Nishimura K, Hamamoto R. A series of natural language processing for predicting tumor response evaluation and survival curve from electronic health records. BMC Med Inform Decis Mak 2025; 25:85. [PMID: 39962486 PMCID: PMC11834625 DOI: 10.1186/s12911-025-02928-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2024] [Accepted: 02/11/2025] [Indexed: 02/20/2025] Open
Abstract
BACKGROUND: The clinical information housed within unstructured electronic health records (EHRs) has the potential to promote cancer research. The National Cancer Center Hospital (NCCH) is widely recognized as a leading institution for the treatment of thoracic malignancies in Japan. Information on medical treatment, particularly the characteristics of malignant tumors that occur in patients, tumor response evaluation, and adverse events, was compiled into the databases of each NCCH department from EHRs. However, there have been few opportunities for integrated analysis of data from both the hospital and the research institute. METHODS: We developed a method for predicting tumor response evaluation and survival curves of drug therapy from the EHRs of lung cancer patients using natural language processing. First, we developed a rule-based algorithm to predict treatment duration using a dictionary of anticancer drugs and regimens used for lung cancer treatment. Thereafter, we applied supervised learning to radiology reports during each treatment period and constructed a classification model to predict the tumor response evaluation of anticancer drugs and the date when progressive disease (PD) was determined. The predicted response and PD date can be used to draw a survival curve for progression-free survival. RESULTS: We used the EHRs of 716 lung cancer treatments at the NCCH and structured data of the cases as labels for the training and testing of supervised learning. The structured data were manually curated by physicians and CRCs. We investigated the results and performance of the proposed method. The accuracy of individual predictions of tumor response evaluation and PD date was not especially high. However, the final predicted survival curves closely resembled the actual survival curves. CONCLUSIONS: Although it is difficult to construct a fully automated system using our method, we believe that it achieves sufficient performance to support physicians and CRCs in constructing the database and to provide clinical information that helps researchers identify opportunities for clinical studies.
Affiliation(s)
- Toshiki Takeuchi
- Xcoo, Inc., Hongo-Sanchome TH Bldg., 6F, 2-40-8, Hongo, Bunkyo-ku, Tokyo, 113-0033, Japan
- Hidehito Horinouchi
- Department of Thoracic Oncology, National Cancer Center Hospital, 5-1-1 Tsukiji, Chuo-Ku, Tokyo, 104-0045, Japan
- Ken Takasawa
- Division of Medical AI Research and Development, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo, 104-0045, Japan
- Masami Mukai
- Division of Medical Informatics, National Cancer Center Hospital, 5-1-1 Tsukiji, Chuo-ku, Tokyo, 104-0045, Japan
- Ken Masuda
- Department of Thoracic Oncology, National Cancer Center Hospital, 5-1-1 Tsukiji, Chuo-Ku, Tokyo, 104-0045, Japan
- Yuki Shinno
- Department of Thoracic Oncology, National Cancer Center Hospital, 5-1-1 Tsukiji, Chuo-Ku, Tokyo, 104-0045, Japan
- Yusuke Okuma
- Department of Thoracic Oncology, National Cancer Center Hospital, 5-1-1 Tsukiji, Chuo-Ku, Tokyo, 104-0045, Japan
- Tatsuya Yoshida
- Department of Thoracic Oncology, National Cancer Center Hospital, 5-1-1 Tsukiji, Chuo-Ku, Tokyo, 104-0045, Japan
- Yasushi Goto
- Department of Thoracic Oncology, National Cancer Center Hospital, 5-1-1 Tsukiji, Chuo-Ku, Tokyo, 104-0045, Japan
- Noboru Yamamoto
- Department of Thoracic Oncology, National Cancer Center Hospital, 5-1-1 Tsukiji, Chuo-Ku, Tokyo, 104-0045, Japan
- Yuichiro Ohe
- Department of Thoracic Oncology, National Cancer Center Hospital, 5-1-1 Tsukiji, Chuo-Ku, Tokyo, 104-0045, Japan
- Mototaka Miyake
- Department of Diagnostic Radiology, National Cancer Center Hospital, 5-1-1 Tsukiji, Chuo-ku, Tokyo, 104-0045, Japan
- Hirokazu Watanabe
- Department of Diagnostic Radiology, National Cancer Center Hospital, 5-1-1 Tsukiji, Chuo-ku, Tokyo, 104-0045, Japan
- Masahiko Kusumoto
- Department of Diagnostic Radiology, National Cancer Center Hospital, 5-1-1 Tsukiji, Chuo-ku, Tokyo, 104-0045, Japan
- Takashi Aoki
- Xcoo, Inc., Hongo-Sanchome TH Bldg., 6F, 2-40-8, Hongo, Bunkyo-ku, Tokyo, 113-0033, Japan
- Kunihiro Nishimura
- Xcoo, Inc., Hongo-Sanchome TH Bldg., 6F, 2-40-8, Hongo, Bunkyo-ku, Tokyo, 113-0033, Japan
- Ryuji Hamamoto
- Division of Medical AI Research and Development, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo, 104-0045, Japan
|
12
|
Gregory ME, Kasthurirathne SN, Magoc T, McNamee C, Harle CA, Vest JR. Development and validation of computable social phenotypes for health-related social needs. JAMIA Open 2025; 8:ooae150. [PMID: 39776620 PMCID: PMC11706536 DOI: 10.1093/jamiaopen/ooae150] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2024] [Revised: 09/09/2024] [Accepted: 12/18/2024] [Indexed: 01/11/2025] Open
Abstract
Objective: Measurement of health-related social needs (HRSNs) is complex. We sought to develop and validate computable phenotypes (CPs) using structured electronic health record (EHR) data for food insecurity, housing instability, financial insecurity, transportation barriers, and a composite-type measure of these, using human-defined rule-based and machine learning (ML) classifier approaches. Materials and Methods: We collected HRSN surveys as the reference standard and obtained EHR data from 1550 patients in 3 health systems from 2 states. We followed a Delphi-like approach to develop the human-defined rule-based CP. For the ML classifier approach, we trained supervised ML (XGBoost) models using 78 features. Using surveys as the reference standard, we calculated sensitivity, specificity, positive predictive values, and area under the curve (AUC). We compared AUCs using the DeLong test and other performance measures using McNemar's test, and checked for differential performance. Results: Most patients (63%) reported at least one HRSN on the reference standard survey. Human-defined rule-based CPs exhibited poor performance (AUCs = .52 to .68). ML classifier CPs performed significantly better, but performance remained poor to fair (AUCs = .68 to .75). Significant differences for race/ethnicity were found for ML classifier CPs (higher AUCs for White non-Hispanic patients). Important features included number of encounters and Medicaid insurance. Discussion: Using a supervised ML classifier approach, HRSN CPs approached thresholds of fair performance, but exhibited differential performance by race/ethnicity. Conclusion: CPs may help to identify patients who may benefit from additional social needs screening. Future work should explore the use of area-level features via geospatial data and natural language processing to improve model performance.
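To make the performance measures named in this abstract concrete, the following minimal sketch computes sensitivity, specificity, and positive predictive value for a binary computable phenotype against a survey reference standard. The function name `binary_metrics` and the toy labels are hypothetical illustrations and are not taken from the study.

```python
from typing import Dict, Sequence


def binary_metrics(reference: Sequence[int], predicted: Sequence[int]) -> Dict[str, float]:
    """Sensitivity, specificity and PPV of `predicted` against `reference`.

    Both inputs are 0/1 labels of equal length (1 = social need present).
    """
    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r == 1 and p == 1)  # true positives
    fn = sum(1 for r, p in pairs if r == 1 and p == 0)  # false negatives
    tn = sum(1 for r, p in pairs if r == 0 and p == 0)  # true negatives
    fp = sum(1 for r, p in pairs if r == 0 and p == 1)  # false positives
    nan = float("nan")
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else nan,
        "specificity": tn / (tn + fp) if tn + fp else nan,
        "ppv": tp / (tp + fp) if tp + fp else nan,
    }


# Hypothetical toy labels: survey answers (reference) vs. phenotype output.
survey = [1, 1, 1, 0, 0, 1, 0, 1]
phenotype = [1, 0, 1, 0, 1, 1, 0, 0]
metrics = binary_metrics(survey, phenotype)
print(metrics)
```

In this toy example the phenotype finds 3 of the 5 patients who reported a need and wrongly flags 1 of the 3 who did not.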
Affiliation(s)
- Megan E Gregory
- Department of Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL 32610, United States
- Tanja Magoc
- Quality and Patient Safety, College of Medicine, University of Florida, Gainesville, FL 32610, United States
- Cassidy McNamee
- Department of Health Policy & Management, Indiana University Richard M. Fairbanks School of Public Health—Indianapolis, Indianapolis, IN 46202, United States
- Christopher A Harle
- Center for Biomedical Informatics, Regenstrief Institute, Indianapolis, IN 46202, United States
- Department of Health Policy & Management, Indiana University Richard M. Fairbanks School of Public Health—Indianapolis, Indianapolis, IN 46202, United States
- Joshua R Vest
- Center for Biomedical Informatics, Regenstrief Institute, Indianapolis, IN 46202, United States
- Department of Health Policy & Management, Indiana University Richard M. Fairbanks School of Public Health—Indianapolis, Indianapolis, IN 46202, United States
|
13
|
Shah NH, Jain SS. From Better Models to Better Care. JACC. HEART FAILURE 2025; 13:88-90. [PMID: 39779184 DOI: 10.1016/j.jchf.2024.09.021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2024] [Accepted: 09/23/2024] [Indexed: 01/11/2025]
Affiliation(s)
- Nigam H Shah
- Department of Medicine, Stanford School of Medicine, Stanford, California, USA; Clinical Excellence Research Center, Stanford School of Medicine, Stanford, California, USA; Technology and Digital Solutions, Stanford Health Care, Palo Alto, California, USA.
- Sneha S Jain
- Department of Medicine, Stanford School of Medicine, Stanford, California, USA
|
14
|
Josephson CB, Aronica E, Beniczky S, Boyce D, Cavalleri G, Denaxas S, French J, Jehi L, Koh H, Kwan P, McDonald C, Mitchell JW, Rampp S, Sadleir L, Sisodiya SM, Wang I, Wiebe S, Yasuda C, Youngerman B, the ILAE Big Data Commission. Big data research is everyone's research-Making epilepsy data science accessible to the global community: Report of the ILAE big data commission. Epileptic Disord 2024; 26:733-752. [PMID: 39446076 PMCID: PMC11651381 DOI: 10.1002/epd2.20288] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/23/2024] [Revised: 07/24/2024] [Accepted: 09/04/2024] [Indexed: 10/25/2024]
Abstract
Epilepsy care generates multiple sources of high-dimensional data, including clinical, imaging, electroencephalographic, genomic, and neuropsychological information, that are collected routinely to establish the diagnosis and guide management. Thanks to high-performance computing, sophisticated graphics processing units, and advanced analytics, we are now on the cusp of being able to use these data to significantly improve individualized care for people with epilepsy. Despite this, many clinicians, health care providers, and people with epilepsy are apprehensive about implementing Big Data and accompanying technologies such as artificial intelligence (AI). Practical, ethical, privacy, and climate issues represent real and enduring concerns that have yet to be completely resolved. Similarly, Big Data and AI-related biases have the potential to exacerbate local and global disparities. These are highly germane concerns to the field of epilepsy, given its high burden in developing nations and areas of socioeconomic deprivation. This educational paper from the International League Against Epilepsy's (ILAE) Big Data Commission aims to help clinicians caring for people with epilepsy become familiar with how Big Data is collected and processed and how it is applied to studies using AI, and to outline the immense potential positive impact Big Data can have on diagnosis and management.
Affiliation(s)
- Colin B. Josephson
- Department of Clinical Neurosciences, Cumming School of Medicine, University of Calgary, Calgary, Alberta, Canada
- Hotchkiss Brain Institute, University of Calgary, Calgary, Alberta, Canada
- Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Alberta, Canada
- O'Brien Institute for Public Health, University of Calgary, Calgary, Alberta, Canada
- Centre for Health Informatics, University of Calgary, Calgary, Alberta, Canada
- Institute for Health Informatics, University College London, London, UK
- Eleonora Aronica
- Department of (Neuro)Pathology, Amsterdam UMC, University of Amsterdam, Amsterdam Neuroscience, Amsterdam, The Netherlands
- Stichting Epilepsie Instellingen Nederland (SEIN), Heemstede, The Netherlands
- Sandor Beniczky
- Department of Neurology, Albert Szent-Györgyi Medical School, University of Szeged, Szeged, Hungary
- Department of Neurophysiology, Danish Epilepsy Center, Dianalund, Denmark
- Department of Clinical Medicine, Aarhus University and Department of Clinical Neurophysiology, Aarhus University Hospital, Aarhus, Denmark
- Danielle Boyce
- Tufts University School of Medicine, Boston, Massachusetts, USA
- Johns Hopkins University Biomedical Informatics and Data Science Section, Baltimore, Maryland, USA
- West Chester University Department of Public Policy and Administration, West Chester, Pennsylvania, USA
- Gianpiero Cavalleri
- School of Pharmacy and Biomolecular Sciences, The Royal College of Surgeons in Ireland, Dublin, Ireland
- FutureNeuro SFI Research Centre, The Royal College of Surgeons in Ireland, Dublin, Ireland
- Spiros Denaxas
- Institute for Health Informatics, University College London, London, UK
- British Heart Foundation Data Science Center, Health Data Research UK, London, UK
- Jacqueline French
- Department of Neurology, Grossman School of Medicine, New York University, New York, New York, USA
- Lara Jehi
- Epilepsy Center, Cleveland Clinic, Cleveland, Ohio, USA
- Center for Computational Life Sciences, Cleveland, Ohio, USA
- Hyunyong Koh
- Harvard Brain Science Initiative, Harvard University, Boston, Massachusetts, USA
- Patrick Kwan
- Department of Neuroscience, School of Translational Medicine, Monash University, Melbourne, Victoria, Australia
- Department of Neurology, Alfred Health, Melbourne, Victoria, Australia
- Department of Neurology, The Royal Melbourne Hospital, Parkville, Victoria, Australia
- Carrie McDonald
- Department of Radiation Medicine and Applied Sciences & Psychiatry, University of California, San Diego, California, USA
- James W. Mitchell
- Institute of Systems, Molecular and Integrative Biology (ISMIB), University of Liverpool, Liverpool, UK
- Department of Neurology, The Walton Centre NHS Foundation Trust, Liverpool, UK
- Stefan Rampp
- Department of Neurosurgery and Department of Neuroradiology, University Hospital Erlangen, Department of Neurosurgery, University Hospital Halle (Saale), Halle (Saale), Germany
- Lynette Sadleir
- Department of Paediatrics and Child Health, University of Otago, Wellington, New Zealand
- Sanjay M. Sisodiya
- Department of Clinical and Experimental Epilepsy, UCL Queen Square Institute of Neurology, London WC1N 3BG and Chalfont Centre for Epilepsy, London, UK
- Irene Wang
- Epilepsy Center, Neurological Institute, Cleveland Clinic, Cleveland, Ohio, USA
- Samuel Wiebe
- Department of Clinical Neurosciences, Cumming School of Medicine, University of Calgary, Calgary, Alberta, Canada
- Hotchkiss Brain Institute, University of Calgary, Calgary, Alberta, Canada
- Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Alberta, Canada
- O'Brien Institute for Public Health, University of Calgary, Calgary, Alberta, Canada
- Clinical Research Unit, Cumming School of Medicine, University of Calgary, Calgary, Alberta, Canada
- Brett Youngerman
- Department of Neurological Surgery, Columbia University Vagelos College of Physicians and Surgeons, New York, New York, USA
|
15
|
Neuraz A, Vaillant G, Arias C, Birot O, Huynh KT, Fabacher T, Rogier A, Garcelon N, Lerner I, Rance B, Coulet A. Facilitating phenotyping from clinical texts: the medkit library. Bioinformatics 2024; 40:btae681. [PMID: 39546377 PMCID: PMC11645105 DOI: 10.1093/bioinformatics/btae681] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2024] [Revised: 10/16/2024] [Accepted: 11/12/2024] [Indexed: 11/17/2024] Open
Abstract
SUMMARY: Phenotyping consists of applying algorithms to identify individuals associated with a specific, potentially complex, trait or condition, typically out of a collection of Electronic Health Records (EHRs). Because much of the clinical information in EHRs lies in free text, phenotyping from text plays an important role in studies that rely on the secondary use of EHRs. However, the heterogeneity and highly specialized nature of both the content and form of clinical texts make this task particularly tedious and are a source of time and cost constraints in observational studies. To facilitate the development, evaluation and reproducibility of phenotyping pipelines, we developed an open-source Python library named medkit. It enables composing data processing pipelines made of easy-to-reuse software bricks, named medkit operations. In addition to the core of the library, we share the operations and pipelines we have already developed and invite the phenotyping community to reuse and enrich them. AVAILABILITY AND IMPLEMENTATION: medkit is available at https://github.com/medkit-lib/medkit.
Affiliation(s)
- Antoine Neuraz
- Inria Paris, Paris 75013, France
- Centre de Recherche des Cordeliers, Inserm UMR 1138, Université Paris Cité, Sorbonne Université, Paris 75006, France
- Hôpital Necker, Assistance Publique—Hôpitaux de Paris, Paris 75015, France
- Ghislain Vaillant
- Inria Paris, Paris 75013, France
- Centre de Recherche des Cordeliers, Inserm UMR 1138, Université Paris Cité, Sorbonne Université, Paris 75006, France
- Camila Arias
- Inria Paris, Paris 75013, France
- Centre de Recherche des Cordeliers, Inserm UMR 1138, Université Paris Cité, Sorbonne Université, Paris 75006, France
- Olivier Birot
- Inria Paris, Paris 75013, France
- Centre de Recherche des Cordeliers, Inserm UMR 1138, Université Paris Cité, Sorbonne Université, Paris 75006, France
- Kim-Tam Huynh
- Inria Paris, Paris 75013, France
- Centre de Recherche des Cordeliers, Inserm UMR 1138, Université Paris Cité, Sorbonne Université, Paris 75006, France
- Thibaut Fabacher
- Inria Paris, Paris 75013, France
- Centre de Recherche des Cordeliers, Inserm UMR 1138, Université Paris Cité, Sorbonne Université, Paris 75006, France
- University Hospital of Strasbourg, Strasbourg 67000, France
- Alice Rogier
- Inria Paris, Paris 75013, France
- Centre de Recherche des Cordeliers, Inserm UMR 1138, Université Paris Cité, Sorbonne Université, Paris 75006, France
- Hôpital Européen Georges Pompidou, Assistance Publique—Hôpitaux de Paris, Paris 75015, France
- Nicolas Garcelon
- Inria Paris, Paris 75013, France
- Centre de Recherche des Cordeliers, Inserm UMR 1138, Université Paris Cité, Sorbonne Université, Paris 75006, France
- Imagine Institute, Inserm UMR 1163, Université Paris Cité, Paris 75015, France
- Ivan Lerner
- Inria Paris, Paris 75013, France
- Centre de Recherche des Cordeliers, Inserm UMR 1138, Université Paris Cité, Sorbonne Université, Paris 75006, France
- Hôpital Européen Georges Pompidou, Assistance Publique—Hôpitaux de Paris, Paris 75015, France
- Bastien Rance
- Inria Paris, Paris 75013, France
- Centre de Recherche des Cordeliers, Inserm UMR 1138, Université Paris Cité, Sorbonne Université, Paris 75006, France
- Hôpital Européen Georges Pompidou, Assistance Publique—Hôpitaux de Paris, Paris 75015, France
- Adrien Coulet
- Inria Paris, Paris 75013, France
- Centre de Recherche des Cordeliers, Inserm UMR 1138, Université Paris Cité, Sorbonne Université, Paris 75006, France
|
16
|
Tekumalla R, Banda JM. Towards automated phenotype definition extraction using large language models. Genomics Inform 2024; 22:21. [PMID: 39482749 PMCID: PMC11529293 DOI: 10.1186/s44342-024-00023-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2024] [Accepted: 09/29/2024] [Indexed: 11/03/2024] Open
Abstract
Electronic phenotyping involves a detailed analysis of both structured and unstructured data, employing rule-based methods, machine learning, natural language processing, and hybrid approaches. Currently, the development of accurate phenotype definitions demands extensive literature reviews and clinical experts, rendering the process time-consuming and inherently unscalable. Large language models offer a promising avenue for automating phenotype definition extraction but come with significant drawbacks, including reliability issues, the tendency to generate non-factual data ("hallucinations"), misleading results, and potential harm. To address these challenges, our study pursued two key objectives: (1) defining a standard evaluation set to ensure that large language model outputs are both useful and reliable and (2) evaluating various prompting approaches to extract phenotype definitions from large language models, assessing them with our established evaluation task. Our findings reveal promising results that still require human evaluation and validation for this task. However, enhanced phenotype extraction is possible, reducing the amount of time spent in literature review and evaluation.
Affiliation(s)
- Juan M Banda
- Stanford Health Care, Stanford, CA, USA.
- Observational Health Data Sciences and Informatics, New York, NY, USA.
|
17
|
Herr K, Lu P, Diamreyan K, Xu H, Mendonca E, Weaver KN, Chen J. Estimating prevalence of rare genetic disease diagnoses using electronic health records in a children's hospital. HGG ADVANCES 2024; 5:100341. [PMID: 39148290 PMCID: PMC11401171 DOI: 10.1016/j.xhgg.2024.100341] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2024] [Revised: 08/09/2024] [Accepted: 08/09/2024] [Indexed: 08/17/2024] Open
Abstract
Rare genetic diseases (RGDs) affect a significant number of individuals, particularly in pediatric populations. This study investigates the efficacy of identifying RGD diagnoses through electronic health records (EHRs) and natural language processing (NLP) tools, and analyzes the prevalence of identified RGDs for potential underdiagnosis at Cincinnati Children's Hospital Medical Center (CCHMC). EHR data from 659,139 pediatric patients at CCHMC were utilized. Diagnoses corresponding to RGDs in Orphanet were identified using rule-based and machine learning-based NLP methods. Manual evaluation assessed the precision of the NLP strategies, with 100 diagnosis descriptions reviewed for each method. The rule-based method achieved a precision of 97.5% (95% CI: 91.5%, 99.4%), while the machine-learning-based method had a precision of 73.5% (95% CI: 63.6%, 81.6%). A manual chart review of 70 randomly selected patients with RGD diagnoses confirmed the diagnoses in 90.3% (95% CI: 82.0%, 95.2%) of cases. A total of 37,326 pediatric patients were identified with 977 RGD diagnoses based on the rule-based method, resulting in a prevalence of 5.66% in this population. While a majority of the disorders showed a higher prevalence at CCHMC compared with Orphanet, some diseases, such as 1p36 deletion syndrome, indicated potential underdiagnosis. Analyses further uncovered disparities in RGD prevalence and age of diagnosis across gender and racial groups. This study demonstrates the utility of employing EHR data with NLP tools to systematically investigate RGD diagnoses in large cohorts. The identified disparities underscore the need for enhanced approaches to guarantee timely and accurate diagnosis and management of pediatric RGDs.
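The prevalence figure in this abstract can be checked with simple arithmetic, as the sketch below shows using the two counts reported above. The abstract does not say how its 95% confidence intervals were computed, so the Wilson score interval here is only one common choice, shown with hypothetical counts (90 confirmed out of 100 reviewed), and is not expected to reproduce the published intervals. The function names are illustrative.

```python
import math
from typing import Tuple


def prevalence(cases: int, population: int) -> float:
    """Proportion of the population identified as cases."""
    return cases / population


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 for about 95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


# Counts reported in the abstract: 37,326 patients with an RGD diagnosis
# among 659,139 pediatric patients.
rgd_prevalence = prevalence(37_326, 659_139)
print(f"{rgd_prevalence:.2%}")  # 5.66%

# Hypothetical review outcome, for illustration only.
low, high = wilson_interval(90, 100)
print(f"{low:.3f} to {high:.3f}")
```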
Affiliation(s)
- Kate Herr
- Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, USA
- Peixin Lu
- Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, USA
- Kessi Diamreyan
- University of Cincinnati College of Medicine, Cincinnati, OH 45229, USA
- Huan Xu
- Division of Human Genetics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, USA
- Eneida Mendonca
- Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, USA; University of Cincinnati College of Medicine, Cincinnati, OH 45229, USA
- K Nicole Weaver
- University of Cincinnati College of Medicine, Cincinnati, OH 45229, USA; Division of Human Genetics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, USA; Heart Institute, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, USA
- Jing Chen
- Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, USA; University of Cincinnati College of Medicine, Cincinnati, OH 45229, USA.
|
18
|
Conderino S, Anthopolos R, Albrecht SS, Farley SM, Divers J, Titus AR, Thorpe LE. Addressing Information Biases Within Electronic Health Record Data to Improve the Examination of Epidemiologic Associations With Diabetes Prevalence Among Young Adults: Cross-Sectional Study. JMIR Med Inform 2024; 12:e58085. [PMID: 39353204 PMCID: PMC11460830 DOI: 10.2196/58085] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/05/2024] [Revised: 07/10/2024] [Accepted: 07/10/2024] [Indexed: 10/04/2024] Open
Abstract
Background: Electronic health records (EHRs) are increasingly used for epidemiologic research to advance public health practice. However, key variables are susceptible to missing data or misclassification within EHRs, including demographic information or disease status, which could affect the estimation of disease prevalence or risk factor associations. Objective: In this paper, we applied methods from the literature on missing data and causal inference to assess whether we could mitigate information biases when estimating measures of association between potential risk factors and diabetes among a patient population of New York City young adults. Methods: We estimated the odds ratio (OR) for diabetes by race or ethnicity and asthma status using EHR data from NYU Langone Health. Methods from the missing data and causal inference literature were then applied to assess the ability to control for misclassification of health outcomes in the EHR data. We compared EHR-based associations with associations observed from 2 national health surveys, the Behavioral Risk Factor Surveillance System (BRFSS) and the National Health and Nutrition Examination Survey, representing traditional public health surveillance systems. Results: Observed EHR-based associations between race or ethnicity and diabetes were comparable to health survey-based estimates, but the association between asthma and diabetes was significantly overestimated (OR_EHR 3.01, 95% CI 2.86-3.18 vs OR_BRFSS 1.23, 95% CI 1.09-1.40). Missing data and causal inference methods reduced information biases in these estimates, yielding relative differences from traditional estimates below 50% (OR_MissingData 1.79, 95% CI 1.67-1.92 and OR_Causal 1.42, 95% CI 1.34-1.51). Conclusions: Findings suggest that without bias adjustment, EHR analyses may yield biased measures of association, driven in part by subgroup differences in health care use. However, applying missing data or causal inference frameworks can help control for and, importantly, characterize residual information biases in these estimates.
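The "relative differences ... below 50%" statement can be illustrated with the odds ratios quoted in this abstract. The sketch below assumes that relative difference means (estimate − reference) / reference with the BRFSS odds ratio as the reference; the abstract does not spell out the formula, so this definition and the variable names are assumptions.

```python
def relative_difference(estimate: float, reference: float) -> float:
    """Relative difference of an estimate from a reference value (assumed definition)."""
    return (estimate - reference) / reference


# Odds ratios for the asthma-diabetes association quoted in the abstract.
OR_BRFSS = 1.23  # survey-based estimate, used here as the reference
estimates = {
    "EHR (unadjusted)": 3.01,
    "missing-data adjusted": 1.79,
    "causal adjusted": 1.42,
}

rel = {name: relative_difference(value, OR_BRFSS) for name, value in estimates.items()}
for name, value in rel.items():
    print(f"{name}: {value:.1%}")
```

Under this assumed definition, the unadjusted EHR estimate lies well above 100% of the reference, while both adjusted estimates fall under the 50% mark mentioned in the abstract.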
Affiliation(s)
- Sarah Conderino
- Department of Population Health, New York University Grossman School of Medicine, New York, NY, United States
- Rebecca Anthopolos
- Department of Population Health, New York University Grossman School of Medicine, New York, NY, United States
- Sandra S Albrecht
- Department of Epidemiology, Mailman School of Public Health at Columbia University, New York, NY, United States
- Jasmin Divers
- Department of Population Health, New York University Grossman School of Medicine, New York, NY, United States
- Department of Foundations of Medicine, New York University Long Island School of Medicine, Mineola, NY, United States
- Andrea R Titus
- Department of Population Health, New York University Grossman School of Medicine, New York, NY, United States
- Lorna E Thorpe
- Department of Population Health, New York University Grossman School of Medicine, New York, NY, United States
|
19
|
Xian S, Grabowska ME, Kullo IJ, Luo Y, Smoller JW, Wei WQ, Jarvik G, Mooney S, Crosslin D. Language-model-based patient embedding using electronic health records facilitates phenotyping, disease forecasting, and progression analysis. RESEARCH SQUARE 2024:rs.3.rs-4708839. [PMID: 39399661 PMCID: PMC11469380 DOI: 10.21203/rs.3.rs-4708839/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/15/2024]
Abstract
Current studies regarding the secondary use of electronic health records (EHR) predominantly rely on domain expertise and existing medical knowledge. Though significant efforts have been devoted to investigating the application of machine learning algorithms in the EHR, efficient and powerful representation of patients is needed to unleash the potential of discovering new medical patterns underlying the EHR. Here, we present an unsupervised method for embedding high-dimensional EHR data at the patient level, aimed at characterizing patient heterogeneity in complex diseases and identifying new disease patterns associated with clinical outcome disparities. Inspired by the architecture of modern language models, specifically transformers with attention mechanisms, we use patient diagnosis and procedure codes as vocabularies and treat each patient as a sentence to perform the patient embedding. We applied this approach to 34,851 unique medical codes across 1,046,649 longitudinal patient events, including 102,739 patients from the electronic Medical Records and GEnomics (eMERGE) Network. The resulting patient vectors demonstrated excellent performance in predicting future disease events (median AUROC = 0.87 within one year) and bulk phenotyping (median AUROC = 0.84). We then illustrated the utility of these patient vectors in revealing heterogeneous comorbidity patterns, exemplified by disease subtypes in colorectal cancer and systemic lupus erythematosus, and capturing distinct longitudinal disease trajectories. External validation using EHR data from the University of Washington confirmed robust model performance, with median AUROCs of 0.83 and 0.84 for bulk phenotyping tasks and disease onset prediction, respectively. Importantly, the model reproduced the clustering results of disease subtypes identified in the eMERGE cohort and uncovered variations in overall mortality among these subtypes. Together, these results underscore the potential of representation learning in EHRs to enhance patient characterization and associated clinical outcomes, thereby advancing disease forecasting and facilitating personalized medicine.
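The "patient as a sentence" idea in this abstract can be illustrated with a small preprocessing sketch in Python. It only shows how ordered medical codes might be turned into fixed-length token sequences of the kind a transformer consumes; it is not the authors' pipeline, and the function names, the padding scheme, and the example codes are assumptions made for illustration.

```python
def build_vocab(patients):
    """Map each distinct code to an integer id; id 0 is reserved for padding."""
    codes = sorted({code for events in patients.values() for code in events})
    return {code: i + 1 for i, code in enumerate(codes)}


def encode(patients, vocab, max_len):
    """Turn each patient's ordered codes into a fixed-length list of ids."""
    encoded = {}
    for pid, events in patients.items():
        ids = [vocab[code] for code in events][:max_len]  # truncate long histories
        encoded[pid] = ids + [0] * (max_len - len(ids))   # right-pad short ones
    return encoded


# Hypothetical patients: each "sentence" is a time-ordered list of diagnosis codes.
patients = {
    "p1": ["E11.9", "I10", "N18.3"],
    "p2": ["I10", "E11.9"],
}
vocab = build_vocab(patients)
sequences = encode(patients, vocab, max_len=4)
```

The integer sequences would then be fed to an embedding layer; how the model pools token vectors into one patient vector is not specified in the abstract and is left out here.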
Affiliation(s)
- Su Xian
- Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA
| | - Monika E Grabowska
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN
| | - Iftikhar J Kullo
- Department of Cardiovascular Medicine and the Gonda Vascular Center, Mayo Clinic Rochester Minnesota
| | - Yuan Luo
- Department of Preventive Medicine, Northwestern University Feinberg School of Medicine
| | - Jordan W Smoller
- Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA
- Center for Precision Psychiatry, Department of Psychiatry, Massachusetts General Hospital, Boston, MA
| | - Wei-Qi Wei
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN
| | - Gail Jarvik
- Department of Medicine, Division of Medical Genetics, University of Washington, Seattle, WA
| | - Sean Mooney
- Center for Information Technology, National Institutes of Health
| | - David Crosslin
- Department of Medicine, Division of Biomedical Informatics and Genomics, Tulane University, New Orleans, LA
|
20
|
Dai HJ, Wang CK, Chen CC, Liou CS, Lu AT, Lai CH, Shain BT, Ke CR, Wang WYC, Mir TH, Simanjuntak M, Kao HY, Tsai MJ, Tseng VS. Evaluating a Natural Language Processing-Driven, AI-Assisted International Classification of Diseases, 10th Revision, Clinical Modification, Coding System for Diagnosis Related Groups in a Real Hospital Environment: Algorithm Development and Validation Study. J Med Internet Res 2024; 26:e58278. [PMID: 39302714 PMCID: PMC11452756 DOI: 10.2196/58278] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2024] [Revised: 06/28/2024] [Accepted: 07/17/2024] [Indexed: 09/22/2024] Open
Abstract
BACKGROUND International Classification of Diseases codes are widely used to describe diagnosis information, but manual coding relies heavily on human interpretation, which can be expensive, time consuming, and prone to errors. With the transition from the International Classification of Diseases, Ninth Revision, to the International Classification of Diseases, Tenth Revision (ICD-10), the coding process has become more complex, highlighting the need for automated approaches to enhance coding efficiency and accuracy. Inaccurate coding can result in substantial financial losses for hospitals, and a precise assessment of outcomes generated by a natural language processing (NLP)-driven autocoding system thus assumes a critical role in safeguarding the accuracy of the Taiwan diagnosis related groups (Tw-DRGs). OBJECTIVE This study aims to evaluate the feasibility of applying an International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM), autocoding system that can automatically determine diagnoses and codes based on free-text discharge summaries to facilitate the assessment of Tw-DRGs, specifically principal diagnosis and major diagnostic categories (MDCs). METHODS Using the patient discharge summaries from Kaohsiung Medical University Chung-Ho Memorial Hospital (KMUCHH) from April 2019 to December 2020 as a reference data set, we developed artificial intelligence (AI)-assisted ICD-10-CM coding systems based on deep learning models. We constructed a web-based user interface for the AI-assisted coding system and deployed the system to the workflow of the certified coding specialists (CCSs) of KMUCHH. The data used for the assessment of Tw-DRGs were manually curated by a CCS, with the principal diagnosis and MDC determined from discharge summaries collected at KMUCHH from February 2023 to April 2023.
RESULTS Both the reference data set and real hospital data were used to assess performance in determining ICD-10-CM coding, principal diagnosis, and MDC for Tw-DRGs. Among all methods, the GPT-2 (OpenAI)-based model achieved the highest F1-score, 0.667 (F1-score 0.851 for the top 50 codes), on the KMUCHH test set and a slightly lower F1-score, 0.621, in real hospital data. Cohen κ evaluation for the agreement of MDC between the models and the CCS revealed that the overall average κ value for GPT-2 (κ=0.714) was approximately 12.2 percentage points higher than that of the hierarchy attention network (κ=0.592). GPT-2 demonstrated superior agreement with the CCS across 6 categories of MDC, with an average κ value of approximately 0.869 (SD 0.033), underscoring the effectiveness of the developed AI-assisted coding system in supporting the work of CCSs. CONCLUSIONS An NLP-driven AI-assisted coding system can assist CCSs in ICD-10-CM coding by offering coding references via a user interface, demonstrating the potential to reduce the manual workload and expedite Tw-DRG assessment. Consistency in performance affirmed the effectiveness of the system in supporting CCSs in ICD-10-CM coding and the judgment of Tw-DRGs.
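The agreement statistic reported in this abstract, Cohen's κ, compares the observed agreement p_o between two raters with the agreement p_e expected by chance: κ = (p_o − p_e) / (1 − p_e). The Python sketch below computes it for two lists of categorical labels. The MDC labels are invented for illustration and are not data from the study.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of items on which both raters agree.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over labels of the product of marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical MDC labels from a model and from a certified coding specialist.
model = ["MDC04", "MDC05", "MDC05", "MDC06", "MDC04", "MDC06"]
coder = ["MDC04", "MDC05", "MDC06", "MDC06", "MDC04", "MDC05"]
kappa = cohen_kappa(model, coder)
```

In this toy case the raters agree on 4 of 6 items (p_o = 2/3) while chance agreement is 1/3, so κ is 0.5.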
Affiliation(s)
- Hong-Jie Dai
- Intelligent System Lab, College of Electrical Engineering and Computer Science, Department of Electrical Engineering, National Kaohsiung University of Science and Technology, Kaohsiung, Taiwan
- National Institute of Cancer Research, National Health Research Institutes, Tainan, Taiwan
- Center for Big Data Research, Kaohsiung Medical University, Kaohsiung, Taiwan
| | - Chen-Kai Wang
- Intelligent System Lab, College of Electrical Engineering and Computer Science, Department of Electrical Engineering, National Kaohsiung University of Science and Technology, Kaohsiung, Taiwan
- Department of Computer Science, National Yang Ming Chiao Tung University, Hsinchu, Taiwan
- Advanced Technology Laboratory, Chunghwa Telecom Laboratories, Taoyuan, Taiwan
| | - Chien-Chang Chen
- Electromagnetic Sensing Control and AI Computing System Laboratory, Department of Electrical Engineering, College of Electrical Engineering and Computer Science, National Kaohsiung University of Science and Technology, Kaohsiung, Taiwan
| | - Chong-Sin Liou
- Intelligent System Lab, College of Electrical Engineering and Computer Science, Department of Electrical Engineering, National Kaohsiung University of Science and Technology, Kaohsiung, Taiwan
| | - An-Tai Lu
- School of Post-Baccalaureate Medicine, Kaohsiung Medical University, Kaohsiung, Taiwan
| | - Chia-Hsin Lai
- Intelligent System Lab, College of Electrical Engineering and Computer Science, Department of Electrical Engineering, National Kaohsiung University of Science and Technology, Kaohsiung, Taiwan
| | - Bo-Tsz Shain
- Intelligent System Lab, College of Electrical Engineering and Computer Science, Department of Electrical Engineering, National Kaohsiung University of Science and Technology, Kaohsiung, Taiwan
| | - Cheng-Rong Ke
- Intelligent System Lab, College of Electrical Engineering and Computer Science, Department of Electrical Engineering, National Kaohsiung University of Science and Technology, Kaohsiung, Taiwan
| | | | - Tatheer Hussain Mir
- Intelligent System Lab, College of Electrical Engineering and Computer Science, Department of Electrical Engineering, National Kaohsiung University of Science and Technology, Kaohsiung, Taiwan
| | - Mutiara Simanjuntak
- Intelligent System Lab, College of Electrical Engineering and Computer Science, Department of Electrical Engineering, National Kaohsiung University of Science and Technology, Kaohsiung, Taiwan
| | - Hao-Yun Kao
- Department of Healthcare Administration and Medical Informatics, College of Health Sciences, Kaohsiung Medical University, Kaohsiung, Taiwan
| | - Ming-Ju Tsai
- Division of Pulmonary and Critical Care Medicine, Department of Internal Medicine, Kaohsiung Medical University Hospital, Kaohsiung Medical University, Kaohsiung, Taiwan
| | - Vincent S Tseng
- Department of Computer Science, National Yang Ming Chiao Tung University, Hsinchu, Taiwan
|
21
|
Yan C, Ong HH, Grabowska ME, Krantz MS, Su WC, Dickson AL, Peterson JF, Feng Q, Roden DM, Stein CM, Kerchberger VE, Malin BA, Wei WQ. Large language models facilitate the generation of electronic health record phenotyping algorithms. J Am Med Inform Assoc 2024; 31:1994-2001. [PMID: 38613820 PMCID: PMC11339509 DOI: 10.1093/jamia/ocae072] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2023] [Revised: 02/21/2024] [Accepted: 03/22/2024] [Indexed: 04/15/2024] Open
Abstract
OBJECTIVES Phenotyping is a core task in observational health research utilizing electronic health records (EHRs). Developing an accurate algorithm demands substantial input from domain experts, involving extensive literature review and evidence synthesis. This burdensome process limits scalability and delays knowledge discovery. We investigate the potential for leveraging large language models (LLMs) to enhance the efficiency of EHR phenotyping by generating high-quality algorithm drafts. MATERIALS AND METHODS We prompted four LLMs (GPT-4 and GPT-3.5 of ChatGPT, Claude 2, and Bard) in October 2023, asking them to generate executable phenotyping algorithms in the form of SQL queries adhering to a common data model (CDM) for three phenotypes (ie, type 2 diabetes mellitus, dementia, and hypothyroidism). Three phenotyping experts evaluated the returned algorithms across several critical metrics. We further implemented the top-rated algorithms and compared them against clinician-validated phenotyping algorithms from the Electronic Medical Records and Genomics (eMERGE) network. RESULTS GPT-4 and GPT-3.5 exhibited significantly higher overall expert evaluation scores in instruction following, algorithmic logic, and SQL executability, when compared to Claude 2 and Bard. Although GPT-4 and GPT-3.5 effectively identified relevant clinical concepts, they exhibited immature capability in organizing phenotyping criteria with the proper logic, leading to phenotyping algorithms that were either excessively restrictive (with low recall) or overly broad (with low positive predictive values). CONCLUSION GPT versions 3.5 and 4 are capable of drafting phenotyping algorithms by identifying relevant clinical criteria aligned with a CDM. However, expertise in informatics and clinical experience is still required to assess and further refine generated algorithms.
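To make the "excessively restrictive" versus "overly broad" distinction concrete, the Python sketch below runs two SQL phenotype queries against a tiny in-memory SQLite database. It is an illustration only: the abstract does not name the CDM, so the table and column names are assumed (they follow OMOP naming), and the concept IDs 9001 and 9002 are placeholders, not real vocabulary codes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE condition_occurrence (person_id INTEGER, condition_concept_id INTEGER);
    CREATE TABLE drug_exposure (person_id INTEGER, drug_concept_id INTEGER);
    INSERT INTO condition_occurrence VALUES (1, 9001), (2, 9001), (3, 1234);
    INSERT INTO drug_exposure VALUES (2, 9002), (3, 9002), (4, 5678);
""")

DX, RX = 9001, 9002  # placeholder concept IDs for a diagnosis and a medication

# Restrictive logic: require BOTH the diagnosis and the medication.
restrictive_sql = """
    SELECT DISTINCT c.person_id
    FROM condition_occurrence AS c
    JOIN drug_exposure AS d ON d.person_id = c.person_id
    WHERE c.condition_concept_id = ? AND d.drug_concept_id = ?
    ORDER BY c.person_id
"""

# Broad logic: accept EITHER the diagnosis or the medication.
broad_sql = """
    SELECT person_id FROM condition_occurrence WHERE condition_concept_id = ?
    UNION
    SELECT person_id FROM drug_exposure WHERE drug_concept_id = ?
    ORDER BY person_id
"""

restrictive = [row[0] for row in conn.execute(restrictive_sql, (DX, RX))]
broad = [row[0] for row in conn.execute(broad_sql, (DX, RX))]
conn.close()
```

With this toy data the restrictive query returns only person 2, while the broad query returns persons 1, 2, and 3. Whether a real algorithm loses recall or positive predictive value depends on the reference standard, which this sketch does not model.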
Affiliation(s)
- Chao Yan
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
| | - Henry H Ong
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
| | - Monika E Grabowska
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
| | - Matthew S Krantz
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
| | - Wu-Chen Su
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
| | - Alyson L Dickson
- Department of Medicine, Vanderbilt University Medical Center, Nashville, TN 37203, United States
| | - Josh F Peterson
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
- Department of Medicine, Vanderbilt University Medical Center, Nashville, TN 37203, United States
| | - QiPing Feng
- Department of Medicine, Vanderbilt University Medical Center, Nashville, TN 37203, United States
| | - Dan M Roden
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
| | - C Michael Stein
- Department of Medicine, Vanderbilt University Medical Center, Nashville, TN 37203, United States
| | - V Eric Kerchberger
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
- Department of Medicine, Vanderbilt University Medical Center, Nashville, TN 37203, United States
| | - Bradley A Malin
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
- Department of Computer Science, Vanderbilt University, Nashville, TN 37203, United States
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
| | - Wei-Qi Wei
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
- Department of Computer Science, Vanderbilt University, Nashville, TN 37203, United States
|
22
|
Wang W, Jin YH, Liu M, He Q, Xu JY, Wang MQ, Li GW, Fu B, Yan SY, Zou K, Sun X. Guidance of development, validation, and evaluation of algorithms for populating health status in observational studies of routinely collected data (DEVELOP-RCD). Mil Med Res 2024; 11:52. [PMID: 39107834 PMCID: PMC11302358 DOI: 10.1186/s40779-024-00559-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/22/2024] [Accepted: 07/24/2024] [Indexed: 03/17/2025] Open
Abstract
BACKGROUND In recent years, there has been a growing trend in the utilization of observational studies that make use of routinely collected healthcare data (RCD). These studies rely on algorithms to identify specific health conditions (e.g. diabetes or sepsis) for statistical analyses. However, there has been substantial variation in algorithm development and validation, leading to frequently suboptimal performance and posing a significant threat to the validity of study findings. Unfortunately, these issues are often overlooked. METHODS We systematically developed guidance for the development, validation, and evaluation of algorithms designed to identify health status (DEVELOP-RCD). Our initial efforts involved conducting both a narrative review and a systematic review of published studies on the concepts and methodological issues related to algorithm development, validation, and evaluation. Subsequently, we conducted an empirical study on an algorithm for identifying sepsis. Based on these findings, we formulated a specific workflow and recommendations for algorithm development, validation, and evaluation within the guidance. Finally, the guidance underwent independent review by a panel of 20 external experts who then convened a consensus meeting to finalize it. RESULTS A standardized workflow for algorithm development, validation, and evaluation was established. Guided by specific health status considerations, the workflow comprises four integrated steps: assessing an existing algorithm's suitability for the target health status; developing a new algorithm using recommended methods; validating the algorithm using prescribed performance measures; and evaluating the impact of the algorithm on study results. Additionally, 13 good practice recommendations were formulated with detailed explanations. Furthermore, a practical study on sepsis identification was included to demonstrate the application of this guidance.
CONCLUSIONS The establishment of guidance is intended to aid researchers and clinicians in the appropriate and accurate development and application of algorithms for identifying health status from RCD. This guidance has the potential to enhance the credibility of findings from observational studies involving RCD.
Affiliation(s)
- Wen Wang
- Institute of Integrated Traditional Chinese and Western Medicine, Chinese Evidence-Based Medicine and Cochrane China Center, West China Hospital, Sichuan University, Chengdu, 610041, China.
- NMPA Key Laboratory for Real World Data Research and Evaluation in Hainan, Chengdu, 610041, China.
- Sichuan Center of Technology Innovation for Real World Data, Chengdu, 610041, China.
| | - Ying-Hui Jin
- Center for Evidence-Based and Translational Medicine, Zhongnan Hospital of Wuhan University, Wuhan, 430071, China
| | - Mei Liu
- Institute of Integrated Traditional Chinese and Western Medicine, Chinese Evidence-Based Medicine and Cochrane China Center, West China Hospital, Sichuan University, Chengdu, 610041, China
- NMPA Key Laboratory for Real World Data Research and Evaluation in Hainan, Chengdu, 610041, China
- Sichuan Center of Technology Innovation for Real World Data, Chengdu, 610041, China
| | - Qiao He
- Institute of Integrated Traditional Chinese and Western Medicine, Chinese Evidence-Based Medicine and Cochrane China Center, West China Hospital, Sichuan University, Chengdu, 610041, China
- NMPA Key Laboratory for Real World Data Research and Evaluation in Hainan, Chengdu, 610041, China
- Sichuan Center of Technology Innovation for Real World Data, Chengdu, 610041, China
| | - Jia-Yue Xu
- Institute of Integrated Traditional Chinese and Western Medicine, Chinese Evidence-Based Medicine and Cochrane China Center, West China Hospital, Sichuan University, Chengdu, 610041, China
- NMPA Key Laboratory for Real World Data Research and Evaluation in Hainan, Chengdu, 610041, China
- Sichuan Center of Technology Innovation for Real World Data, Chengdu, 610041, China
| | - Ming-Qi Wang
- Institute of Integrated Traditional Chinese and Western Medicine, Chinese Evidence-Based Medicine and Cochrane China Center, West China Hospital, Sichuan University, Chengdu, 610041, China
- NMPA Key Laboratory for Real World Data Research and Evaluation in Hainan, Chengdu, 610041, China
- Sichuan Center of Technology Innovation for Real World Data, Chengdu, 610041, China
| | - Guo-Wei Li
- Department of Health Research Methods, Evidence and Impact, McMaster University, Hamilton, ON, L8S 4L8, Canada
- Center for Clinical Epidemiology and Methodology, Guangdong Second Provincial General Hospital, Guangzhou, 510317, China
- Biostatistics Unit, Research Institute at St. Joseph's Healthcare Hamilton, Hamilton, ON, L8N 4A6, Canada
| | - Bo Fu
- School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, 611731, China
| | - Si-Yu Yan
- Center for Evidence-Based and Translational Medicine, Zhongnan Hospital of Wuhan University, Wuhan, 430071, China
| | - Kang Zou
- Institute of Integrated Traditional Chinese and Western Medicine, Chinese Evidence-Based Medicine and Cochrane China Center, West China Hospital, Sichuan University, Chengdu, 610041, China
- NMPA Key Laboratory for Real World Data Research and Evaluation in Hainan, Chengdu, 610041, China
- Sichuan Center of Technology Innovation for Real World Data, Chengdu, 610041, China
| | - Xin Sun
- Institute of Integrated Traditional Chinese and Western Medicine, Chinese Evidence-Based Medicine and Cochrane China Center, West China Hospital, Sichuan University, Chengdu, 610041, China.
- NMPA Key Laboratory for Real World Data Research and Evaluation in Hainan, Chengdu, 610041, China.
- Sichuan Center of Technology Innovation for Real World Data, Chengdu, 610041, China.
- West China School of Public Health and West China Fourth Hospital, Sichuan University, Chengdu, 610041, China.
|
23
|
Ding S, Zhang S, Hu X, Zou N. Identify and mitigate bias in electronic phenotyping: A comprehensive study from computational perspective. J Biomed Inform 2024; 156:104671. [PMID: 38876452 DOI: 10.1016/j.jbi.2024.104671] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/01/2023] [Revised: 05/26/2024] [Accepted: 06/05/2024] [Indexed: 06/16/2024]
Abstract
Electronic phenotyping is a fundamental task that identifies specific groups of patients and plays an important role in precision medicine in the era of digital health. Phenotyping provides real-world evidence for other related biomedical research and clinical tasks, e.g., disease diagnosis, drug development, and clinical trials. With the development of electronic health records, the performance of electronic phenotyping has been significantly boosted by advanced machine learning techniques. In the healthcare domain, precision and fairness are both essential aspects that should be taken into consideration. However, most related efforts are put into designing phenotyping models with higher accuracy. Little attention is paid to the fairness perspective of phenotyping. Neglecting bias in phenotyping leads to subgroups of patients being underrepresented, which further affects downstream healthcare activities such as patient recruitment in clinical trials. In this work, we are motivated to bridge this gap through a comprehensive experimental study to identify the bias existing in electronic phenotyping models and evaluate the performance of widely used debiasing methods on these models. We choose pneumonia and sepsis as our phenotyping target diseases. We benchmark 9 kinds of electronic phenotyping methods spanning from rule-based to data-driven methods. Meanwhile, we evaluate the performance of 5 bias mitigation strategies covering pre-processing, in-processing, and post-processing. Through extensive experiments, we summarize several insightful findings about the bias identified in phenotyping and key points of the bias mitigation strategies.
Affiliation(s)
- Sirui Ding
- Department of Computer Science & Engineering, Texas A&M University, College Station, TX, United States
| | - Shenghan Zhang
- Department of Biomedical Informatics, Harvard University, Boston, MA, United States
| | - Xia Hu
- Department of Computer Science, Rice University, Houston, TX, United States
| | - Na Zou
- Department of Industrial Engineering, University of Houston, Houston, TX, United States.
|
24
|
Barron AG, Fenick AM, Maciejewski KR, Turer CB, Sharifi M. External Validation of an Electronic Phenotyping Algorithm Detecting Attention to High Body Mass Index in Pediatric Primary Care. Appl Clin Inform 2024; 15:700-708. [PMID: 39197473 PMCID: PMC11387092 DOI: 10.1055/s-0044-1787975] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/01/2024] Open
Abstract
OBJECTIVES The lack of feasible and meaningful measures of clinicians' behavior hinders efforts to assess and improve obesity management in pediatric primary care. In this study, we examined the external validity of a novel algorithm, previously validated in a single geographic region, using structured electronic health record (EHR) data to identify phenotypes of clinicians' attention to elevated body mass index (BMI) and weight-related comorbidities. METHODS We extracted structured EHR data for 300 randomly selected 6- to 12-year-old children with elevated BMI seen for well-child visits from June 2018 to May 2019 at pediatric primary care practices affiliated with Yale. Using diagnosis codes, laboratory orders, referrals, and medications adapted from the original algorithm, we categorized encounters as having evidence of attention to BMI only, weight-related comorbidities only, or both BMI and comorbidities. We evaluated the algorithm's sensitivity and specificity for detecting any attention to BMI and/or comorbidities using chart review as the reference standard. RESULTS The adapted algorithm yielded a sensitivity of 79.2% and specificity of 94.0% for identifying any attention to high BMI/comorbidities in clinical documentation. Of 86 encounters labeled as "no attention" by the algorithm, 83% had evidence of attention in free-text components of the progress note. The likelihood of classification as "any attention" by both chart review and the algorithm varied by BMI category and by clinician type (p < 0.001). CONCLUSION The electronic phenotyping algorithm had high specificity for detecting attention to high BMI and/or comorbidities in structured EHR inputs. The algorithm's performance may be improved by incorporating unstructured data from clinical notes.
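Sensitivity and specificity against a chart-review reference standard, as reported in this abstract, follow directly from the four cells of a confusion matrix: sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). The Python sketch below computes both from paired labels. The ten example encounters are invented and do not reproduce the study's data.

```python
def sensitivity_specificity(reference, predicted):
    """Return (sensitivity, specificity) for paired boolean labels."""
    if len(reference) != len(predicted):
        raise ValueError("label sequences must have equal length")
    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r and p)          # true positives
    fn = sum(1 for r, p in pairs if r and not p)      # false negatives
    tn = sum(1 for r, p in pairs if not r and not p)  # true negatives
    fp = sum(1 for r, p in pairs if not r and p)      # false positives
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative reference label")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical encounters: True means "any attention" was identified.
chart_review = [True, True, True, True, False, False, False, False, False, False]
algorithm = [True, True, True, False, False, False, False, False, False, True]
sens, spec = sensitivity_specificity(chart_review, algorithm)
```

Here the algorithm finds 3 of 4 reference positives (sensitivity 0.75) and correctly rejects 5 of 6 reference negatives (specificity about 0.83).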
Affiliation(s)
- Anya G Barron
- Department of Medicine, University of North Carolina School of Medicine, Chapel Hill, North Carolina, United States
| | - Ada M Fenick
- Department of Pediatrics, Yale School of Medicine, New Haven, Connecticut, United States
| | - Kaitlin R Maciejewski
- Yale Center for Analytical Sciences, Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, United States
| | - Christy B Turer
- Departments of Pediatrics and Medicine, University of Texas Southwestern Medical Center and Children's Health, Dallas, Texas, United States
| | - Mona Sharifi
- Department of Pediatrics, Yale School of Medicine, New Haven, Connecticut, United States
|
25
|
McCaw ZR, Gao J, Lin X, Gronsbell J. Synthetic surrogates improve power for genome-wide association studies of partially missing phenotypes in population biobanks. Nat Genet 2024; 56:1527-1536. [PMID: 38872030 PMCID: PMC11955959 DOI: 10.1038/s41588-024-01793-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/14/2023] [Accepted: 05/08/2024] [Indexed: 06/15/2024]
Abstract
Within population biobanks, incomplete measurement of certain traits limits the power for genetic discovery. Machine learning is increasingly used to impute the missing values from the available data. However, performing genome-wide association studies (GWAS) on imputed traits can introduce spurious associations, identifying genetic variants that are not associated with the original trait. Here we introduce a new method, synthetic surrogate (SynSurr) analysis, which makes GWAS on imputed phenotypes robust to imputation errors. Rather than replacing missing values, SynSurr jointly analyzes the original and imputed traits. We show that SynSurr estimates the same genetic effect as standard GWAS and improves power in proportion to the quality of the imputations. SynSurr requires a commonly made missing-at-random assumption but relaxes the requirements of existing imputation methods by not requiring correct model specification. We present extensive simulations and ablation analyses to validate SynSurr and apply it to empower the GWAS of dual-energy X-ray absorptiometry traits within the UK Biobank.
Affiliation(s)
- Zachary R McCaw
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
| | - Jianhui Gao
- Department of Statistical Sciences, University of Toronto, Toronto, Ontario, Canada
| | - Xihong Lin
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Department of Statistics, Harvard University, Cambridge, MA, USA
| | - Jessica Gronsbell
- Department of Statistical Sciences, University of Toronto, Toronto, Ontario, Canada.
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada.
- Department of Family & Community Medicine, University of Toronto, Toronto, Ontario, Canada.
|
26
|
Thayer DS, Mumtaz S, Elmessary MA, Scanlon I, Zinnurov A, Coldea AI, Scanlon J, Chapman M, Curcin V, John A, DelPozo-Banos M, Davies H, Karwath A, Gkoutos GV, Fitzpatrick NK, Quint JK, Varma S, Milner C, Oliveira C, Parkinson H, Denaxas S, Hemingway H, Jefferson E. Creating a next-generation phenotype library: the health data research UK Phenotype Library. JAMIA Open 2024; 7:ooae049. [PMID: 38895652 PMCID: PMC11182945 DOI: 10.1093/jamiaopen/ooae049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/19/2023] [Revised: 02/12/2024] [Accepted: 05/20/2024] [Indexed: 06/21/2024] Open
Abstract
Objective To enable reproducible research at scale by creating a platform that enables health data users to find, access, curate, and re-use electronic health record phenotyping algorithms. Materials and Methods We undertook a structured approach to identifying requirements for a phenotype algorithm platform by engaging with key stakeholders. User experience analysis was used to inform the design, which we implemented as a web application featuring a novel metadata standard for defining phenotyping algorithms, access via Application Programming Interface (API), support for computable data flows, and version control. The application has creation and editing functionality, enabling researchers to submit phenotypes directly. Results We created and launched the Phenotype Library in October 2021. The platform currently hosts 1049 phenotype definitions defined against 40 health data sources and >200K terms across 16 medical ontologies. We present several case studies demonstrating its utility for supporting and enabling research: the library hosts curated phenotype collections for the BREATHE respiratory health research hub and the Adolescent Mental Health Data Platform, and it is supporting the development of an informatics tool to generate clinical evidence for clinical guideline development groups. Discussion This platform makes an impact by being open to all health data users and accepting all appropriate content, as well as implementing key features that have not been widely available, including managing structured metadata, access via an API, and support for computable phenotypes. Conclusions We have created the first openly available, programmatically accessible resource enabling the global health research community to store and manage phenotyping algorithms. Removing barriers to describing, sharing, and computing phenotypes will help unleash the potential benefit of health data for patients and the public.
Affiliation(s)
- Daniel S Thayer
  - SAIL Databank, Medical School, Swansea University, Swansea, SA2 8PP, United Kingdom
- Shahzad Mumtaz
  - Health Informatics Centre, School of Medicine, University of Dundee, Dundee, DD1 9SY, United Kingdom
  - School of Natural and Computing Sciences, University of Aberdeen, Aberdeen, AB24 3UE, United Kingdom
- Muhammad A Elmessary
  - SAIL Databank, Medical School, Swansea University, Swansea, SA2 8PP, United Kingdom
- Ieuan Scanlon
  - SAIL Databank, Medical School, Swansea University, Swansea, SA2 8PP, United Kingdom
- Artur Zinnurov
  - SAIL Databank, Medical School, Swansea University, Swansea, SA2 8PP, United Kingdom
- Alex-Ioan Coldea
  - SAIL Databank, Medical School, Swansea University, Swansea, SA2 8PP, United Kingdom
- Jack Scanlon
  - SAIL Databank, Medical School, Swansea University, Swansea, SA2 8PP, United Kingdom
- Martin Chapman
  - Department of Population Health Sciences, King’s College London, London, SE1 1UL, United Kingdom
- Vasa Curcin
  - Department of Population Health Sciences, King’s College London, London, SE1 1UL, United Kingdom
- Ann John
  - Adolescent Mental Health Data Platform and DATAMIND, Swansea University, Swansea, SA2 8PP, United Kingdom
- Marcos DelPozo-Banos
  - Adolescent Mental Health Data Platform and DATAMIND, Swansea University, Swansea, SA2 8PP, United Kingdom
- Hannah Davies
  - SAIL Databank, Medical School, Swansea University, Swansea, SA2 8PP, United Kingdom
- Andreas Karwath
  - Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham, B15 2TT, United Kingdom
- Georgios V Gkoutos
  - Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham, B15 2TT, United Kingdom
- Natalie K Fitzpatrick
  - Institute of Health Informatics, University College London, London, NW1 2DA, United Kingdom
- Jennifer K Quint
  - School of Public Health and National Heart and Lung Institute, Imperial College London, London, W12 0BZ, United Kingdom
- Susheel Varma
  - Health Data Research United Kingdom, London, NW1 2BE, United Kingdom
- Chris Milner
  - Health Data Research United Kingdom, London, NW1 2BE, United Kingdom
- Carla Oliveira
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom
- Helen Parkinson
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom
- Spiros Denaxas
  - Institute of Health Informatics, University College London, London, NW1 2DA, United Kingdom
  - University College London Hospitals National Institute of Health Research Biomedical Research Centre, London, NW1 2BU, United Kingdom
  - British Heart Foundation Data Science Center, Health Data Research United Kingdom, London, NW1 2BE, United Kingdom
- Harry Hemingway
  - Institute of Health Informatics, University College London, London, NW1 2DA, United Kingdom
  - University College London Hospitals National Institute of Health Research Biomedical Research Centre, London, NW1 2BU, United Kingdom
- Emily Jefferson
  - Health Informatics Centre, School of Medicine, University of Dundee, Dundee, DD1 9SY, United Kingdom
  - Health Data Research United Kingdom, London, NW1 2BE, United Kingdom
|
27
|
Lam BD, Chrysafi P, Chiasakul T, Khosla H, Karagkouni D, McNichol M, Adamski A, Reyes N, Abe K, Mantha S, Vlachos IS, Zwicker JI, Patell R. Machine learning natural language processing for identifying venous thromboembolism: systematic review and meta-analysis. Blood Adv 2024; 8:2991-3000. [PMID: 38522096 PMCID: PMC11215191 DOI: 10.1182/bloodadvances.2023012200] [Received: 11/16/2023] [Revised: 02/22/2024] [Accepted: 02/22/2024] [Indexed: 03/26/2024]
Abstract
Venous thromboembolism (VTE) is a leading cause of preventable in-hospital mortality. Monitoring VTE cases is limited by the challenges of manual medical record review and diagnosis code interpretation. Natural language processing (NLP) can automate the process. Rule-based NLP methods are effective but time-consuming. Machine learning (ML)-NLP methods present a promising solution. We conducted a systematic review and meta-analysis of studies published before May 2023 that use ML-NLP to identify VTE diagnoses in electronic health records. Four reviewers screened all manuscripts, excluding studies that only used a rule-based method. A meta-analysis evaluated the pooled performance of each study's best performing model that evaluated for pulmonary embolism and/or deep vein thrombosis. Pooled sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) with confidence interval (CI) were calculated by DerSimonian and Laird method using a random-effects model. Study quality was assessed using an adapted TRIPOD (Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis) tool. Thirteen studies were included in the systematic review and 8 had data available for meta-analysis. Pooled sensitivity was 0.931 (95% CI, 0.881-0.962), specificity 0.984 (95% CI, 0.967-0.992), PPV 0.910 (95% CI, 0.865-0.941) and NPV 0.985 (95% CI, 0.977-0.990). All studies met at least 13 of the 21 NLP-modified TRIPOD items, demonstrating fair quality. The highest performing models used vectorization rather than bag-of-words and deep-learning techniques such as convolutional neural networks. There was significant heterogeneity in the studies, and only 4 validated their model on an external data set. Further standardization of ML studies can help progress this novel technology toward real-world implementation.
Affiliation(s)
- Barbara D. Lam
  - Division of Hematology, Department of Medicine, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA
  - Division of Clinical Informatics, Department of Medicine, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA
- Pavlina Chrysafi
  - Department of Medicine, Mount Auburn Hospital, Harvard Medical School, Boston, MA
- Thita Chiasakul
  - Center of Excellence in Translational Hematology, Division of Hematology, Department of Medicine, Faculty of Medicine, Chulalongkorn University and King Chulalongkorn Memorial Hospital, Thai Red Cross Society, Bangkok, Thailand
- Harshit Khosla
  - Department of Medicine, Saint Vincent Hospital, Worcester, MA
- Dimitra Karagkouni
  - Department of Pathology, Cancer Research Institute, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA
- Megan McNichol
  - Library Sciences, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA
- Alys Adamski
  - Division of Blood Disorders, National Center on Birth Defects and Developmental Disabilities, Centers for Disease Control and Prevention, Atlanta, GA
- Nimia Reyes
  - Division of Blood Disorders, National Center on Birth Defects and Developmental Disabilities, Centers for Disease Control and Prevention, Atlanta, GA
- Karon Abe
  - Division of Blood Disorders, National Center on Birth Defects and Developmental Disabilities, Centers for Disease Control and Prevention, Atlanta, GA
- Simon Mantha
  - Division of Hematology, Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, NY
- Ioannis S. Vlachos
  - Department of Pathology, Cancer Research Institute, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA
- Jeffrey I. Zwicker
  - Division of Hematology, Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, NY
- Rushad Patell
  - Division of Hematology, Department of Medicine, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA
|
28
|
Newby D, Taylor N, Joyce DW, Winchester LM. Optimising the use of electronic medical records for large scale research in psychiatry. Transl Psychiatry 2024; 14:232. [PMID: 38824136 PMCID: PMC11144247 DOI: 10.1038/s41398-024-02911-1] [Received: 02/17/2023] [Revised: 04/13/2024] [Accepted: 04/15/2024] [Indexed: 06/03/2024]
Abstract
The explosion and abundance of digital data could facilitate large-scale research for psychiatry and mental health. Research using so-called "real world data", such as electronic medical/health records, can be resource-efficient, facilitate rapid hypothesis generation and testing, complement existing evidence (e.g. from trials and evidence-synthesis) and may enable a route to translate evidence into clinically effective, outcomes-driven care for patient populations that may be under-represented. However, the interpretation and processing of real-world data sources is complex because the clinically important 'signal' is often contained in both structured and unstructured (narrative or "free-text") data. Techniques for extracting meaningful information (signal) from unstructured text exist and have advanced the re-use of routinely collected clinical data, but these techniques require cautious evaluation. In this paper, we survey the opportunities, risks and progress made in the use of electronic medical record (real-world) data for psychiatric research.
Affiliation(s)
- Danielle Newby
  - Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, Centre for Statistics in Medicine, University of Oxford, Oxford, UK
- Niall Taylor
  - Department of Psychiatry, University of Oxford, Oxford, UK
- Dan W Joyce
  - Department of Primary Care and Mental Health and Civic Health, Innovation Labs, Institute of Population Health, University of Liverpool, Liverpool, UK
|
29
|
Ma SP, Hosgur E, Corbin CK, Lopez I, Chang A, Chen JH. Electronic Phenotyping of Urinary Tract Infections as a Silver Standard Label for Machine Learning. AMIA Joint Summits on Translational Science Proceedings. AMIA Joint Summits on Translational Science 2024; 2024:182-189. [PMID: 38827068 PMCID: PMC11141812] [Indexed: 06/04/2024]
Abstract
This study explored the efficacy of electronic phenotyping in data labeling for machine learning with a focus on urinary tract infections (UTIs). We contrasted labels from electronic phenotyping against previously published labels such as urine culture positivity. In comparison, electronic phenotyping showed the potential to enhance specificity in UTI labeling while maintaining similar sensitivity and was easily scaled for application to a large dataset suitable for machine learning, which we used to train and validate a machine learning model. Electronic phenotyping offers a valuable method for machine learning label generation in healthcare, with potential benefits for patient care and antimicrobial stewardship. Further research will expand its application and optimize techniques for increased performance.
Affiliation(s)
- Stephen P Ma
  - Stanford University School of Medicine, Stanford, CA, USA
- Ebru Hosgur
  - Stanford University School of Medicine, Stanford, CA, USA
- Conor K Corbin
  - Stanford University School of Medicine, Stanford, CA, USA
- Ivan Lopez
  - Stanford University School of Medicine, Stanford, CA, USA
- Amy Chang
  - Stanford University School of Medicine, Stanford, CA, USA
|
30
|
El-Helaly M. Artificial Intelligence and Occupational Health and Safety, Benefits and Drawbacks. La Medicina del Lavoro 2024; 115:e2024014. [PMID: 38686574 PMCID: PMC11181216 DOI: 10.23749/mdl.v115i2.15835] [Received: 03/18/2024] [Accepted: 04/05/2024] [Indexed: 05/02/2024]
Abstract
This paper discusses the impact of artificial intelligence (AI) on occupational health and safety. Although the integration of AI into the field of occupational health and safety is still in its early stages, it has numerous applications in the workplace. Some of these applications offer numerous benefits for the health and safety of workers, such as continuous monitoring of workers' health and safety and the workplace environment through wearable devices and sensors. However, AI might have negative impacts in the workplace, such as ethical worries and data privacy concerns. To maximize the benefits and minimize the drawbacks of AI in the workplace, certain measures should be applied, such as training for both employers and employees and setting policies and guidelines regulating the integration of AI in the workplace.
Affiliation(s)
- Mohamed El-Helaly
  - Occupational and Environmental Medicine, Faculty of Medicine, Mansoura University, Mansoura City, Egypt
  - Faculty of Medicine, New Mansoura University, New Mansoura City, Egypt
|
31
|
Esteban S, Szmulewicz A. Making causal inferences from transactional data: A narrative review of opportunities and challenges when implementing the target trial framework. J Int Med Res 2024; 52:3000605241241920. [PMID: 38548473 PMCID: PMC10981242 DOI: 10.1177/03000605241241920] [Received: 09/14/2023] [Accepted: 03/10/2024] [Indexed: 04/01/2024]
Abstract
The target trial framework has emerged as a powerful tool for addressing causal questions in clinical practice and in public health. In the healthcare sector, where decision-making is increasingly data-driven, transactional databases, such as electronic health records (EHR) and insurance claims, present an untapped potential for answering complex causal questions. This narrative review explores the potential of the integration of the target trial framework with real-world data to enhance healthcare decision-making processes. We outline essential elements of the target trial framework, and identify pertinent challenges in data quality, privacy concerns, and methodological limitations, proposing solutions to overcome these obstacles and optimize the framework's application.
Affiliation(s)
- Santiago Esteban
  - Instituto de Efectividad Clínica y Sanitaria, Centro de Implementación e Innovación en Políticas de Salud, Buenos Aires, Argentina
  - Hospital Italiano de Buenos Aires, Family and Community Medicine Division Buenos Aires, Buenos Aires, Argentina
|
32
|
Davidoff C, Cheville A. Telemedicine in Cancer Rehabilitation: Applications and Opportunities Across the Cancer Care Continuum. Am J Phys Med Rehabil 2024; 103:S52-S57. [PMID: 38364031 DOI: 10.1097/phm.0000000000002421] [Indexed: 02/18/2024]
Abstract
Advancements in telemedicine have revolutionized the landscape of healthcare delivery, with particular implications for cancer rehabilitation. This journal article provides a comprehensive review of the utilization and application of telemedicine in cancer rehabilitation, spanning the entire cancer care continuum. The integration of telemedicine in cancer rehabilitation services is explored from diagnosis through survivorship, addressing the unique challenges and opportunities at each stage.
Affiliation(s)
- Chanel Davidoff
  - From the Department of Physical Medicine and Rehabilitation, Lenox Hill Hospital/Northwell Health, Zucker School of Medicine at Hofstra/Northwell, New York, New York (CD); and Department of Physical Medicine and Rehabilitation, Mayo Clinic, Rochester, Minnesota (AC)
|
33
|
Yan C, Ong HH, Grabowska ME, Krantz MS, Su WC, Dickson AL, Peterson JF, Feng Q, Roden DM, Stein CM, Kerchberger VE, Malin BA, Wei WQ. Large Language Models Facilitate the Generation of Electronic Health Record Phenotyping Algorithms. medRxiv 2024:2023.12.19.23300230. [PMID: 38196578 PMCID: PMC10775330 DOI: 10.1101/2023.12.19.23300230] [Indexed: 01/11/2024]
Abstract
Objectives Phenotyping is a core task in observational health research utilizing electronic health records (EHRs). Developing an accurate algorithm demands substantial input from domain experts, involving extensive literature review and evidence synthesis. This burdensome process limits scalability and delays knowledge discovery. We investigate the potential for leveraging large language models (LLMs) to enhance the efficiency of EHR phenotyping by generating high-quality algorithm drafts. Materials and Methods We prompted four LLMs (GPT-4 and GPT-3.5 of ChatGPT, Claude 2, and Bard) in October 2023, asking them to generate executable phenotyping algorithms in the form of SQL queries adhering to a common data model (CDM) for three phenotypes (i.e., type 2 diabetes mellitus, dementia, and hypothyroidism). Three phenotyping experts evaluated the returned algorithms across several critical metrics. We further implemented the top-rated algorithms and compared them against clinician-validated phenotyping algorithms from the Electronic Medical Records and Genomics (eMERGE) network. Results GPT-4 and GPT-3.5 exhibited significantly higher overall expert evaluation scores in instruction following, algorithmic logic, and SQL executability, when compared to Claude 2 and Bard. Although GPT-4 and GPT-3.5 effectively identified relevant clinical concepts, they exhibited immature capability in organizing phenotyping criteria with the proper logic, leading to phenotyping algorithms that were either excessively restrictive (with low recall) or overly broad (with low positive predictive values). Conclusion GPT versions 3.5 and 4 are capable of drafting phenotyping algorithms by identifying relevant clinical criteria aligned with a CDM. However, expertise in informatics and clinical experience is still required to assess and further refine generated algorithms.
Affiliation(s)
- Chao Yan
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN
- Henry H. Ong
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN
- Monika E. Grabowska
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN
- Matthew S. Krantz
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN
- Wu-Chen Su
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN
- Alyson L. Dickson
  - Department of Medicine, Vanderbilt University Medical Center, Nashville, TN
- Josh F. Peterson
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN
  - Department of Medicine, Vanderbilt University Medical Center, Nashville, TN
- QiPing Feng
  - Department of Medicine, Vanderbilt University Medical Center, Nashville, TN
- Dan M. Roden
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN
- C. Michael Stein
  - Department of Medicine, Vanderbilt University Medical Center, Nashville, TN
- V. Eric Kerchberger
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN
  - Department of Medicine, Vanderbilt University Medical Center, Nashville, TN
- Bradley A. Malin
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN
  - Department of Computer Science, Vanderbilt University, Nashville, TN
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN
- Wei-Qi Wei
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN
  - Department of Computer Science, Vanderbilt University, Nashville, TN
|
34
|
Gao J, Bonzel CL, Hong C, Varghese P, Zakir K, Gronsbell J. Semi-supervised ROC analysis for reliable and streamlined evaluation of phenotyping algorithms. J Am Med Inform Assoc 2024; 31:640-650. [PMID: 38128118 PMCID: PMC10873838 DOI: 10.1093/jamia/ocad226] [Received: 05/03/2023] [Revised: 09/22/2023] [Accepted: 11/20/2023] [Indexed: 12/23/2023]
Abstract
OBJECTIVE High-throughput phenotyping will accelerate the use of electronic health records (EHRs) for translational research. A critical roadblock is the extensive medical supervision required for phenotyping algorithm (PA) estimation and evaluation. To address this challenge, numerous weakly-supervised learning methods have been proposed. However, there is a paucity of methods for reliably evaluating the predictive performance of PAs when a very small proportion of the data is labeled. To fill this gap, we introduce a semi-supervised approach (ssROC) for estimation of the receiver operating characteristic (ROC) parameters of PAs (eg, sensitivity, specificity). MATERIALS AND METHODS ssROC uses a small labeled dataset to nonparametrically impute missing labels. The imputations are then used for ROC parameter estimation to yield more precise estimates of PA performance relative to classical supervised ROC analysis (supROC) using only labeled data. We evaluated ssROC with synthetic, semi-synthetic, and EHR data from Mass General Brigham (MGB). RESULTS ssROC produced ROC parameter estimates with minimal bias and significantly lower variance than supROC in the simulated and semi-synthetic data. For the 5 PAs from MGB, the estimates from ssROC are 30% to 60% less variable than supROC on average. DISCUSSION ssROC enables precise evaluation of PA performance without demanding large volumes of labeled data. ssROC is also easily implementable in open-source R software. CONCLUSION When used in conjunction with weakly-supervised PAs, ssROC facilitates the reliable and streamlined phenotyping necessary for EHR-based research.
Affiliation(s)
- Jianhui Gao
  - Department of Statistical Sciences, University of Toronto, Toronto, ON, Canada
- Clara-Lea Bonzel
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, United States
- Chuan Hong
  - Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, United States
- Paul Varghese
  - Health Informatics, Verily Life Sciences, Cambridge, MA, United States
- Karim Zakir
  - Department of Statistical Sciences, University of Toronto, Toronto, ON, Canada
- Jessica Gronsbell
  - Department of Statistical Sciences, University of Toronto, Toronto, ON, Canada
  - Department of Family and Community Medicine, University of Toronto, Toronto, ON, Canada
  - Department of Computer Science, University of Toronto, Toronto, ON, Canada
|
35
|
Oh W, Jayaraman P, Tandon P, Chaddha US, Kovatch P, Charney AW, Glicksberg BS, Nadkarni GN. A novel method leveraging time series data to improve subphenotyping and application in critically ill patients with COVID-19. Artif Intell Med 2024; 148:102750. [PMID: 38325922 PMCID: PMC10864255 DOI: 10.1016/j.artmed.2023.102750] [Received: 03/06/2023] [Revised: 12/12/2023] [Accepted: 12/14/2023] [Indexed: 02/09/2024]
Abstract
Computational subphenotyping, a data-driven approach to understanding disease subtypes, is a prominent topic in medical research. Numerous ongoing studies are dedicated to developing advanced computational subphenotyping methods for cross-sectional data. However, the potential of time-series data has been underexplored until now. Here, we propose a Multivariate Levenshtein Distance (MLD) that can account for correlation in multiple discrete features over time-series data. Our algorithm has two distinct components: it integrates an optimal threshold score to enhance the sensitivity in discriminating between pairs of instances, and the MLD itself. We applied the proposed distance metric to the k-means clustering algorithm to derive temporal subphenotypes from time-series data of biomarkers and treatment administrations from 1039 critically ill patients with COVID-19 and compared its effectiveness with standard methods. In conclusion, the Multivariate Levenshtein Distance metric is a novel method to quantify the distance from multiple discrete features over time-series data and demonstrates superior clustering performance among competing time-series distance metrics.
Affiliation(s)
- Wonsuk Oh
  - Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Division of Data-Driven and Digital Medicine, Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Pushkala Jayaraman
  - Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Division of Data-Driven and Digital Medicine, Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Pranai Tandon
  - Division of Pulmonary, Critical Care and Sleep Medicine, Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Udit S Chaddha
  - Division of Pulmonary, Critical Care and Sleep Medicine, Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Patricia Kovatch
  - Department of Scientific Computing, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Alexander W Charney
  - Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Department of Neuroscience, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Benjamin S Glicksberg
  - Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Character Biosciences, New York, NY, USA
- Girish N Nadkarni
  - Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Division of Data-Driven and Digital Medicine, Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Division of Nephrology, Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
|
36
|
Bali V, Turzhitsky V, Schelfhout J, Paudel M, Hulbert E, Peterson-Brandt J, Hertzberg J, Kelly NR, Patel RH. Machine learning to identify chronic cough from administrative claims data. Sci Rep 2024; 14:2449. [PMID: 38291064 PMCID: PMC10828499 DOI: 10.1038/s41598-024-51522-9] [Received: 02/02/2023] [Accepted: 01/06/2024] [Indexed: 02/01/2024]
Abstract
Accurate identification of patient populations is an essential component of clinical research, especially for medical conditions such as chronic cough that are inconsistently defined and diagnosed. We aimed to develop and compare machine learning models to identify chronic cough from medical and pharmacy claims data. In this retrospective observational study, we compared 3 machine learning algorithms based on XGBoost, logistic regression, and neural network approaches using a large claims and electronic health record database. Of the 327,423 patients who met the study criteria, 4,818 had chronic cough based on linked claims-electronic health record data. The XGBoost model showed the best performance, achieving a Receiver-Operator Characteristic Area Under the Curve (ROC-AUC) of 0.916. We selected a cutoff that favors a high positive predictive value (PPV) to minimize false positives, resulting in a sensitivity, specificity, PPV, and negative predictive value of 18.0%, 99.6%, 38.7%, and 98.8%, respectively, on the held-out testing set (n = 82,262). Logistic regression and neural network models achieved slightly lower ROC-AUCs of 0.907 and 0.838, respectively. The XGBoost and logistic regression models maintained their robust performance in subgroups of individuals with higher rates of chronic cough. Machine learning algorithms are one way of identifying conditions that are not coded in medical records, and can help identify individuals with chronic cough from claims data with a high degree of classification value.
Affiliation(s)
- Vishal Bali
  - Center for Observational and Real-World Evidence (CORE), Merck & Co, Rahway, NJ, USA
- Vladimir Turzhitsky
  - Center for Observational and Real-World Evidence (CORE), Merck & Co, Rahway, NJ, USA
- Jonathan Schelfhout
  - Center for Observational and Real-World Evidence (CORE), Merck & Co, Rahway, NJ, USA
- Misti Paudel
  - Health Economics and Outcomes Research (HEOR), Optum Insight, Eden Prairie, MN, USA
- Erin Hulbert
  - Health Economics and Outcomes Research (HEOR), Optum Insight, Eden Prairie, MN, USA
|
37
|
Lyu T, Liang C. Computational Phenotyping of OMOP CDM Normalized EHR for Prenatal and Postpartum Episodes: An Informatics Framework and Clinical Implementation on All of Us. AMIA ... Annual Symposium Proceedings. AMIA Symposium 2024; 2023:1096-1104. [PMID: 38222375 PMCID: PMC10785883] [Indexed: 01/16/2024]
Abstract
The use of Electronic Health Records (EHR) in pregnancy care and obstetrics-gynecology (OB/GYN) research has increased in recent years. In pregnancy, timing is important because clinical characteristics, risks, and patient management are different in each stage of pregnancy. However, the difficulty of accurately differentiating pregnancy episodes and temporal information of clinical events presents unique challenges for EHR phenotyping. In this work, we introduced the concept of time relativity and proposed a comprehensive framework of computational phenotyping for prenatal and postpartum episodes based on the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM). We implemented it on the All of Us national EHR database and identified 6,280 pregnancies with accurate start and end dates among 5,399 female patients. With the ability to identify different episodes in pregnancy care, this framework provides new opportunities for phenotyping complex clinical events and gestational morbidities for pregnant women, thus improving maternal and infant health.
Affiliation(s)
- Tianchu Lyu
  - University of South Carolina, Columbia, South Carolina, USA
- Chen Liang
  - University of South Carolina, Columbia, South Carolina, USA
  - National Institutes of Health, Bethesda, Maryland, USA
|
38
|
Davis H, Tang LA, Picou EM, Bastarache L, Tharpe AM. The Use of Electronic Health Records for Behavioral Phenotyping of School-Age Children With Unilateral Hearing Loss: A Methodological Approach. Journal of Speech, Language, and Hearing Research: JSLHR 2024; 67:254-268. [PMID: 38056484 DOI: 10.1044/2023_jslhr-22-00610] [Indexed: 12/08/2023]
Abstract
PURPOSE This methodological study describes a technique for extracting information from de-identified electronic health records (EHRs) to identify occurrences of permanent unilateral hearing loss (UHL) and associated educational comorbidities. METHOD This was an exploratory methodological study utilizing approximately 3.3 million de-identified medical records. Structured and unstructured data were extracted using both automated and manual methods. When both methods were available, positive and negative predictive values were calculated to evaluate the utility of using automated methods. RESULTS We defined a cohort of 471 records that met our criteria of school-age children with permanent UHL and no additional significant disabilities/diagnoses. Fifty-one percent of the children reflected in this cohort had indicators of adverse educational progress, defined as documentation of receiving educational services, speech-language therapy, and/or parental/teacher concern, with 12% of records reflecting overlapping services/concerns. Negative predictive values were generally high and positive predictive values were generally low, suggesting automated searches are useful for excluding factors of interest, but not finding them. CONCLUSIONS This study demonstrates the feasibility of using EHRs in examining UHL in school-age children. By restricting our cohort to individuals who were seen in audiology clinic, we were able to capture variables such as educational difficulty that are not routinely ascertained in medical contexts. The proportion of children in this cohort demonstrating a marker of adverse educational progress is consistent with numerous prior observational studies, thus providing validity to this ascertainment approach. We describe challenges encountered in creating this cohort and detail our hybrid approach to ascertaining key variables accurately.
Affiliation(s)
- Hilary Davis
  - Department of Hearing and Speech Sciences, Vanderbilt University Medical Center, Nashville, TN
- Leigh Anne Tang
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN
- Erin M Picou
  - Department of Hearing and Speech Sciences, Vanderbilt University Medical Center, Nashville, TN
- Lisa Bastarache
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN
- Anne Marie Tharpe
  - Department of Hearing and Speech Sciences, Vanderbilt University Medical Center, Nashville, TN

39
Chen F, Ahimaz P, Wang K, Chung WK, Ta C, Weng C, Liu C. Phenotype-Driven Molecular Genetic Test Recommendation for Diagnosing Pediatric Rare Disorders. Research Square 2023:rs.3.rs-3593490. [PMID: 38045411 PMCID: PMC10690317 DOI: 10.21203/rs.3.rs-3593490/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/05/2023]
Abstract
Rare disease patients often endure prolonged diagnostic odysseys and may still remain undiagnosed for years. Selecting the appropriate genetic tests is crucial to lead to timely diagnosis. Phenotypic features offer great potential for aiding genomic diagnosis in rare disease cases. We see great promise in effective integration of phenotypic information into genetic test selection workflow. In this study, we present a phenotype-driven molecular genetic test recommendation (Phen2Test) for pediatric rare disease diagnosis. Phen2Test was constructed using frequency matrix of phecodes and demographic data from the EHR before ordering genetic tests, with the objective to streamline the selection of molecular genetic tests (whole-exome / whole-genome sequencing, or gene panels) for clinicians with minimum genetic training expertise. We developed and evaluated binary classifiers based on 1,005 individuals referred to genetic counselors for potential genetic evaluation. In the evaluation using the gold standard cohort, the model achieved strong performance with an AUROC of 0.82 and an AUPRC of 0.92. Furthermore, we tested the model on another silver standard cohort (n=6,458), achieving an overall AUROC of 0.72 and an AUPRC of 0.671. Phen2Test was adjusted to align with current clinical guidelines, showing superior performance with more recent data, demonstrating its potential for use within a learning healthcare system as a genomic medicine intervention that adapts to guideline updates. This study showcases the practical utility of phenotypic features in recommending molecular genetic tests with performance comparable to clinical geneticists. Phen2Test could assist clinicians with limited genetic training and knowledge to order appropriate genetic tests.
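The AUROC figures quoted in the abstract above measure how often a binary classifier ranks a positive case above a negative one. The following is a minimal pure-Python sketch of that pairwise definition. The labels and scores are made-up illustrative values, and the function is not part of Phen2Test.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U form).

    labels: iterable of 0/1 ground-truth classes.
    scores: iterable of classifier scores, higher meaning "more likely positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    # Each positive/negative pair scores 1 if ranked correctly, 0.5 on a tie.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical example: 5 positives, 3 negatives.
labels = [1, 1, 1, 0, 0, 1, 0, 1]
scores = [0.9, 0.8, 0.35, 0.4, 0.2, 0.7, 0.6, 0.55]
example_auroc = auroc(labels, scores)
print(f"AUROC = {example_auroc:.2f}")
```

The pairwise form is quadratic in the number of cases, so it is meant for illustration only. A rank-based implementation would be the usual choice for a cohort of several thousand patients.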
Affiliation(s)
- Fangyi Chen
  - Department of Biomedical Informatics, Columbia University, New York, NY, USA
- Priyanka Ahimaz
  - Department of Pediatrics, Columbia University, New York, NY, USA
  - Institute of Genomic Medicine, Columbia University, New York, NY, USA
- Kai Wang
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA, USA
- Wendy K. Chung
  - Department of Pediatrics, Boston Children’s Hospital, Harvard Medical School, Boston, MA, USA
- Casey Ta
  - Department of Biomedical Informatics, Columbia University, New York, NY, USA
- Chunhua Weng
  - Department of Biomedical Informatics, Columbia University, New York, NY, USA
- Cong Liu
  - Department of Biomedical Informatics, Columbia University, New York, NY, USA

40
Byon HD, Harris C, Crandall M, Song J, Topaz M. Identifying Type II workplace violence from clinical notes using natural language processing. Workplace Health Saf 2023; 71:484-490. [PMID: 37387505 DOI: 10.1177/21650799231176078] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 07/01/2023]
Abstract
BACKGROUND Type II workplace violence in health care, perpetrated by patients/clients toward home healthcare nurses, is a serious health and safety issue. A significant portion of violent incidents are not officially reported. Natural language processing can detect these "hidden cases" from clinical notes. In this study, we computed the 12-month prevalence of Type II workplace violence from home healthcare nurses' clinical notes by developing and utilizing a natural language processing system. METHODS Nearly 600,000 clinical visit notes from two large U.S.-based home healthcare agencies were analyzed. The notes were recorded from January 1, 2019 to December 31, 2019. Rule- and machine-learning-based natural language processing algorithms were applied to identify clinical notes containing workplace violence descriptions. RESULTS The natural language processing algorithms identified 236 clinical notes that included Type II workplace violence toward home healthcare nurses. The prevalence of physical violence was 0.067 incidents per 10,000 home visits. The prevalence of nonphysical violence was 3.76 incidents per 10,000 home visits. The prevalence of any violence was four incidents per 10,000 home visits. In comparison, no Type II workplace violence incidents were recorded in the official incident report systems of the two agencies in this same time period. CONCLUSIONS AND APPLICATION TO PRACTICE Natural language processing can be an effective tool to augment formal reporting by capturing violence incidents from daily, ongoing, large volumes of clinical notes. It can enable managers and clinicians to stay informed of potential violence risks and keep their practice environment safe.
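The prevalence figures in the abstract above are rates per 10,000 home visits. The following is a minimal sketch of that arithmetic. The abstract gives the denominator only as "nearly 600,000" notes, so the value 600,000 and the one-note-per-visit mapping are assumptions made for illustration.

```python
def rate_per_10k(events, visits):
    """Events per 10,000 visits."""
    if visits <= 0:
        raise ValueError("visits must be positive")
    return events / visits * 10_000


# Assumption: exactly 600,000 visits, one note per visit.
# The abstract reports only "nearly 600,000" notes.
ASSUMED_VISITS = 600_000
any_violence_rate = rate_per_10k(236, ASSUMED_VISITS)
print(f"{any_violence_rate:.2f} incidents per 10,000 visits")
```

Under this assumed denominator, the 236 flagged notes give a rate just under four per 10,000 visits, in line with the "four incidents per 10,000 home visits" reported for any violence.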
Affiliation(s)
- Ha Do Byon
  - University of Virginia School of Nursing
- Mary Crandall
  - University of Virginia School of Nursing
  - Continuum Home Health Care, UVA Health
- Maxim Topaz
  - Columbia University School of Nursing
  - Columbia University Data Science Institute
  - VNS Health

41
Callahan A, Ashley E, Datta S, Desai P, Ferris TA, Fries JA, Halaas M, Langlotz CP, Mackey S, Posada JD, Pfeffer MA, Shah NH. The Stanford Medicine data science ecosystem for clinical and translational research. JAMIA Open 2023; 6:ooad054. [PMID: 37545984 PMCID: PMC10397535 DOI: 10.1093/jamiaopen/ooad054] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 03/29/2022] [Revised: 03/14/2023] [Accepted: 07/19/2023] [Indexed: 08/08/2023] Open
Abstract
Objective To describe the infrastructure, tools, and services developed at Stanford Medicine to maintain its data science ecosystem and research patient data repository for clinical and translational research. Materials and Methods The data science ecosystem, dubbed the Stanford Data Science Resources (SDSR), includes infrastructure and tools to create, search, retrieve, and analyze patient data, as well as services for data deidentification, linkage, and processing to extract high-value information from healthcare IT systems. Data are made available via self-service and concierge access, on HIPAA compliant secure computing infrastructure supported by in-depth user training. Results The Stanford Medicine Research Data Repository (STARR) functions as the SDSR data integration point, and includes electronic medical records, clinical images, text, bedside monitoring data and HL7 messages. SDSR tools include tools for electronic phenotyping, cohort building, and a search engine for patient timelines. The SDSR supports patient data collection, reproducible research, and teaching using healthcare data, and facilitates industry collaborations and large-scale observational studies. Discussion Research patient data repositories and their underlying data science infrastructure are essential to realizing a learning health system and advancing the mission of academic medical centers. Challenges to maintaining the SDSR include ensuring sufficient financial support while providing researchers and clinicians with maximal access to data and digital infrastructure, balancing tool development with user training, and supporting the diverse needs of users. Conclusion Our experience maintaining the SDSR offers a case study for academic medical centers developing data science and research informatics infrastructure.
Affiliation(s)
- Alison Callahan
  - Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, California, USA
- Euan Ashley
  - Department of Medicine, School of Medicine, Stanford University, Stanford, California, USA
  - Department of Genetics, School of Medicine, Stanford University, Stanford, California, USA
  - Department of Biomedical Data Science, School of Medicine, Stanford University, Stanford, California, USA
- Somalee Datta
  - Technology and Digital Solutions, Stanford Medicine, Stanford University, Stanford, California, USA
- Priyamvada Desai
  - Technology and Digital Solutions, Stanford Medicine, Stanford University, Stanford, California, USA
- Todd A Ferris
  - Technology and Digital Solutions, Stanford Medicine, Stanford University, Stanford, California, USA
- Jason A Fries
  - Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, California, USA
- Michael Halaas
  - Technology and Digital Solutions, Stanford Medicine, Stanford University, Stanford, California, USA
- Curtis P Langlotz
  - Department of Radiology, School of Medicine, Stanford University, Stanford, California, USA
- Sean Mackey
  - Department of Anesthesia, School of Medicine, Stanford University, Stanford, California, USA
- José D Posada
  - Technology and Digital Solutions, Stanford Medicine, Stanford University, Stanford, California, USA
- Michael A Pfeffer
  - Technology and Digital Solutions, Stanford Medicine, Stanford University, Stanford, California, USA
- Nigam H Shah
  - Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, California, USA
  - Technology and Digital Solutions, Stanford Medicine, Stanford University, Stanford, California, USA
  - Clinical Excellence Research Center, School of Medicine, Stanford University, Stanford, California, USA

42
Flothow A, Novelli A, Sundmacher L. Analytical methods for identifying sequences of utilization in health data: a scoping review. BMC Med Res Methodol 2023; 23:212. [PMID: 37759162 PMCID: PMC10523647 DOI: 10.1186/s12874-023-02019-y] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 11/07/2022] [Accepted: 08/08/2023] [Indexed: 09/29/2023] Open
Abstract
BACKGROUND Healthcare, as with other sectors, has undergone progressive digitalization, generating an ever-increasing wealth of data that enables research and the analysis of patient movement. This can help to evaluate treatment processes and outcomes, and in turn improve the quality of care. This scoping review provides an overview of the algorithms and methods that have been used to identify care pathways from healthcare utilization data. METHOD This review was conducted according to the methodology of the Joanna Briggs Institute and the Preferred Reporting Items for Systematic Reviews Extension for Scoping Reviews (PRISMA-ScR) Checklist. The PubMed, Web of Science, Scopus, and EconLit databases were searched and studies published in English between 2000 and 2021 considered. The search strategy used keywords divided into three categories: the method of data analysis, the requirement profile for the data, and the intended presentation of results. Criteria for inclusion were that health data were analyzed, the methodology used was described and that the chronology of care events was considered. In a two-stage review process, records were reviewed by two researchers independently for inclusion. Results were synthesized narratively. RESULTS The literature search yielded 2,865 entries; 51 studies met the inclusion criteria. Health data from different countries ([Formula: see text]) and of different types of disease ([Formula: see text]) were analyzed with respect to different care events. Applied methods can be divided into those identifying subsequences of care and those describing full care trajectories. Variants of pattern mining or Markov models were mostly used to extract subsequences, with clustering often applied to find care trajectories. Statistical algorithms such as rule mining, probability-based machine learning algorithms or a combination of methods were also applied. Clustering methods were sometimes used for data preparation or result compression. 
Further characteristics of the included studies are presented. CONCLUSION Various data mining methods are already being applied to gain insight from health data. The great heterogeneity of the methods used shows the need for a scoping review. We performed a narrative review and found that clustering methods currently dominate the literature for identifying complete care trajectories, while variants of pattern mining dominate for identifying subsequences of limited length.
Affiliation(s)
- Amelie Flothow
  - Chair of Health Economics, Technical University of Munich, Georg-Brauchle-Ring, Munich, Bavaria, 80992, Germany
- Anna Novelli
  - Chair of Health Economics, Technical University of Munich, Georg-Brauchle-Ring, Munich, Bavaria, 80992, Germany
- Leonie Sundmacher
  - Chair of Health Economics, Technical University of Munich, Georg-Brauchle-Ring, Munich, Bavaria, 80992, Germany

43
Singhal P, Tan ALM, Drivas TG, Johnson KB, Ritchie MD, Beaulieu-Jones BK. Opportunities and challenges for biomarker discovery using electronic health record data. Trends Mol Med 2023; 29:765-776. [PMID: 37474378 PMCID: PMC10530198 DOI: 10.1016/j.molmed.2023.06.006] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 03/21/2023] [Revised: 06/16/2023] [Accepted: 06/22/2023] [Indexed: 07/22/2023]
Abstract
Electronic health records (EHRs) have become increasingly relied upon as a source for biomedical research. One important research application of EHRs is the identification of biomarkers associated with specific patient states, especially within complex conditions. However, using EHRs for biomarker identification can be challenging because the EHR was not designed with research as the primary focus. Despite this challenge, the EHR offers huge potential for biomarker discovery research to transform our understanding of disease etiology and treatment and generate biological insights informing precision medicine initiatives. This review paper provides an in-depth analysis of how EHR data is currently used for phenotyping and identifying molecular biomarkers, current challenges and limitations, and strategies we can take to mitigate challenges going forward.
Affiliation(s)
- P Singhal
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- A L M Tan
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- T G Drivas
  - Department of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- K B Johnson
  - Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, PA, USA
  - Department of Pediatrics, University of Pennsylvania, Philadelphia, PA, USA
- M D Ritchie
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
  - Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, PA, USA

44
Hovaguimian F, Beeler PE, Müllhaupt B, Günthard HF, Maeschli B, Bruggmann P, Fehr JS, Kouyos RD. Mortality and morbidity related to hepatitis C virus infection in hospitalized adults-A propensity score matched analysis. J Viral Hepat 2023; 30:765-774. [PMID: 37309273 DOI: 10.1111/jvh.13861] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/25/2023] [Revised: 05/01/2023] [Accepted: 05/30/2023] [Indexed: 06/14/2023]
Abstract
The World Health Organization (WHO) aims to reduce HCV mortality, but estimates are difficult to obtain. We aimed to identify electronic health records of individuals with HCV infection, and assess mortality and morbidity. We applied electronic phenotyping strategies on routinely collected data from patients hospitalized at a tertiary referral hospital in Switzerland between 2009 and 2017. Individuals with HCV infection were identified using International Classification of Disease (ICD)-10 codes, prescribed medications and laboratory results (antibody, PCR, antigen or genotype test). Controls were selected using propensity score methods (matching by age, sex, intravenous drug use, alcohol abuse and HIV co-infection). Main outcomes were in-hospital mortality and attributable mortality (in HCV cases and study population). The non-matched dataset included records from 165,972 individuals (287,255 hospital stays). Electronic phenotyping identified 2285 stays with evidence of HCV infection (1677 individuals). Propensity score matching yielded 6855 stays (2285 with HCV, 4570 controls). In-hospital mortality was higher in HCV cases (RR 2.10, 95%CI 1.64 to 2.70). Among those infected, 52.5% of the deaths were attributable to HCV (95%CI 38.9 to 63.1). When cases were matched, the fraction of deaths attributable to HCV was 26.9% (HCV prevalence: 33%), whilst in the non-matched dataset, it was 0.92% (HCV prevalence: 0.8%). In this study, HCV infection was strongly associated with increased mortality. Our methodology may be used to monitor the efforts towards meeting the WHO elimination targets and underline the importance of electronic cohorts as a basis for national longitudinal surveillance.
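The attributable fractions in the abstract above can be related to the reported relative risk with two standard epidemiological formulas. The following is a minimal sketch, and it is an assumption that these textbook formulas (attributable fraction among the exposed and Levin's population attributable fraction) correspond to the authors' method. The inputs are the rounded relative risk (2.10) and the matched-set prevalence (2285 of 6855 stays) from the abstract.

```python
def attributable_fraction_exposed(rr):
    """Fraction of deaths among the exposed attributable to exposure: (RR - 1) / RR."""
    return (rr - 1) / rr


def population_attributable_fraction(prevalence, rr):
    """Levin's formula: p(RR - 1) / (1 + p(RR - 1)), with p the exposure prevalence."""
    excess = prevalence * (rr - 1)
    return excess / (1 + excess)


rr = 2.10                      # relative risk as reported (rounded)
matched_prevalence = 2285 / 6855  # HCV stays / all stays in the matched set

af_exposed = attributable_fraction_exposed(rr)
paf_matched = population_attributable_fraction(matched_prevalence, rr)
print(f"AF among exposed: {af_exposed:.1%}")
print(f"PAF (matched):    {paf_matched:.1%}")
```

With the rounded relative risk, the results land close to the reported 52.5% and 26.9%. The small gaps are consistent with rounding of the published estimate. The sketch does not attempt the non-matched figure, because the relative risk for that dataset is not given in the abstract.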
Affiliation(s)
- Frédérique Hovaguimian
  - Division of Infectious Diseases and Hospital Epidemiology, University Hospital of Zurich, University of Zurich, Zurich, Switzerland
  - Department of Public and Global Health, Epidemiology, Biostatistics and Prevention Institute, University of Zurich, Zurich, Switzerland
- Patrick E Beeler
  - Division of Occupational and Environmental Medicine, Epidemiology, Biostatistics and Prevention Institute, University of Zurich and University Hospital Zurich, Zurich, Switzerland
  - Center for Primary and Community Care, University of Lucerne, Lucerne, Switzerland
- Beat Müllhaupt
  - Department of Gastroenterology and Hepatology, University Hospital of Zurich, Zurich, Switzerland
- Huldrych F Günthard
  - Division of Infectious Diseases and Hospital Epidemiology, University Hospital of Zurich, University of Zurich, Zurich, Switzerland
  - Institute of Medical Virology, University of Zurich, Zurich, Switzerland
- Philip Bruggmann
  - Swiss Hepatitis, Zurich, Switzerland
  - Arud Centre for Addiction Medicine, Zurich, Switzerland
- Jan S Fehr
  - Department of Public and Global Health, Epidemiology, Biostatistics and Prevention Institute, University of Zurich, Zurich, Switzerland
- Roger D Kouyos
  - Division of Infectious Diseases and Hospital Epidemiology, University Hospital of Zurich, University of Zurich, Zurich, Switzerland
  - Institute of Medical Virology, University of Zurich, Zurich, Switzerland

45
Krishnan G, Singh S, Pathania M, Gosavi S, Abhishek S, Parchani A, Dhar M. Artificial intelligence in clinical medicine: catalyzing a sustainable global healthcare paradigm. Front Artif Intell 2023; 6:1227091. [PMID: 37705603 PMCID: PMC10497111 DOI: 10.3389/frai.2023.1227091] [Citation(s) in RCA: 84] [Impact Index Per Article: 42.0] [Received: 05/22/2023] [Accepted: 08/09/2023] [Indexed: 09/15/2023] Open
Abstract
As the demand for quality healthcare increases, healthcare systems worldwide are grappling with time constraints and excessive workloads, which can compromise the quality of patient care. Artificial intelligence (AI) has emerged as a powerful tool in clinical medicine, revolutionizing various aspects of patient care and medical research. The integration of AI in clinical medicine has not only improved diagnostic accuracy and treatment outcomes, but also contributed to more efficient healthcare delivery, reduced costs, and facilitated better patient experiences. This review article provides an extensive overview of AI applications in history taking, clinical examination, imaging, therapeutics, prognosis and research. Furthermore, it highlights the critical role AI has played in transforming healthcare in developing nations.
Affiliation(s)
- Gokul Krishnan
  - Department of Internal Medicine, Kasturba Medical College, Manipal, India
- Shiana Singh
  - Department of Emergency Medicine, All India Institute of Medical Sciences, Rishikesh, India
- Monika Pathania
  - Department of Geriatric Medicine, All India Institute of Medical Sciences, Rishikesh, India
- Siddharth Gosavi
  - Department of Internal Medicine, Kasturba Medical College, Manipal, India
- Shuchi Abhishek
  - Department of Internal Medicine, Kasturba Medical College, Manipal, India
- Ashwin Parchani
  - Department of Geriatric Medicine, All India Institute of Medical Sciences, Rishikesh, India
- Minakshi Dhar
  - Department of Geriatric Medicine, All India Institute of Medical Sciences, Rishikesh, India

46
Liu P, Wang Z, Liu N, Peres MA. A scoping review of the clinical application of machine learning in data-driven population segmentation analysis. J Am Med Inform Assoc 2023; 30:1573-1582. [PMID: 37369006 PMCID: PMC10436153 DOI: 10.1093/jamia/ocad111] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2023] [Revised: 06/08/2023] [Accepted: 06/16/2023] [Indexed: 06/29/2023] Open
Abstract
OBJECTIVE Data-driven population segmentation is commonly used in clinical settings to separate the heterogeneous population into multiple relatively homogenous groups with similar healthcare features. In recent years, machine learning (ML) based segmentation algorithms have garnered interest for their potential to speed up and improve algorithm development across many phenotypes and healthcare situations. This study evaluates ML-based segmentation with respect to (1) the populations applied, (2) the segmentation details, and (3) the outcome evaluations. MATERIALS AND METHODS MEDLINE, Embase, Web of Science, and Scopus were used following the PRISMA-ScR criteria. Peer-reviewed studies in the English language that used data-driven population segmentation analysis on structured data from January 2000 to October 2022 were included. RESULTS We identified 6077 articles and included 79 for the final analysis. Data-driven population segmentation analysis was employed in various clinical settings. K-means clustering is the most prevalent unsupervised ML paradigm. The most common settings were healthcare institutions. The most common targeted population was the general population. DISCUSSION Although all the studies did internal validation, only 11 papers (13.9%) did external validation, and 23 papers (29.1%) conducted methods comparison. The existing papers discussed little validating the robustness of ML modeling. CONCLUSION Existing ML applications on population segmentation need more evaluations regarding giving tailored, efficient integrated healthcare solutions compared to traditional segmentation analysis. Future ML applications in the field should emphasize methods' comparisons and external validation and investigate approaches to evaluate individual consistency using different methods.
Affiliation(s)
- Pinyan Liu
  - Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore, Singapore
- Ziwen Wang
  - Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore, Singapore
- Nan Liu
  - Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore, Singapore
  - Programme in Health Services and Systems Research, Duke-NUS Medical School, Singapore, Singapore
  - Institute of Data Science, National University of Singapore, Singapore, Singapore
- Marco Aurélio Peres
  - Programme in Health Services and Systems Research, Duke-NUS Medical School, Singapore, Singapore
  - National Dental Research Institute Singapore, National Dental Centre Singapore, Singapore, Singapore

47
Banda JM, Shah NH, Periyakoil VS. Characterizing subgroup performance of probabilistic phenotype algorithms within older adults: a case study for dementia, mild cognitive impairment, and Alzheimer's and Parkinson's diseases. JAMIA Open 2023; 6:ooad043. [PMID: 37397506 PMCID: PMC10307941 DOI: 10.1093/jamiaopen/ooad043] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2023] [Revised: 06/06/2023] [Accepted: 06/22/2023] [Indexed: 07/04/2023] Open
Abstract
Objective Biases within probabilistic electronic phenotyping algorithms are largely unexplored. In this work, we characterize differences in subgroup performance of phenotyping algorithms for Alzheimer's disease and related dementias (ADRD) in older adults. Materials and methods We created an experimental framework to characterize the performance of probabilistic phenotyping algorithms under different racial distributions allowing us to identify which algorithms may have differential performance, by how much, and under what conditions. We relied on rule-based phenotype definitions as reference to evaluate probabilistic phenotype algorithms created using the Automated PHenotype Routine for Observational Definition, Identification, Training and Evaluation framework. Results We demonstrate that some algorithms have performance variations anywhere from 3% to 30% for different populations, even when not using race as an input variable. We show that while performance differences in subgroups are not present for all phenotypes, they do affect some phenotypes and groups more disproportionately than others. Discussion Our analysis establishes the need for a robust evaluation framework for subgroup differences. The underlying patient populations for the algorithms showing subgroup performance differences have great variance between model features when compared with the phenotypes with little to no differences. Conclusion We have created a framework to identify systematic differences in the performance of probabilistic phenotyping algorithms specifically in the context of ADRD as a use case. Differences in subgroup performance of probabilistic phenotyping algorithms are not widespread nor do they occur consistently. This highlights the great need for careful ongoing monitoring to evaluate, measure, and try to mitigate such differences.
Affiliation(s)
- Juan M Banda
  - Corresponding Author: Juan M. Banda, PhD, Department of Computer Science, College of Arts and Sciences, Georgia State University, 25 Park Place, Suite 752, Atlanta, GA 30303, USA
- Nigam H Shah
  - Stanford Center for Biomedical Informatics Research, Stanford University School of Medicine, Stanford, California, USA
- Vyjeyanthi S Periyakoil
  - Stanford Department of Medicine, Palo Alto, California, USA
  - VA Palo Alto Health Care System, Palo Alto, California, USA

48
Eskofier BM, Klucken J. Predictive Models for Health Deterioration: Understanding Disease Pathways for Personalized Medicine. Annu Rev Biomed Eng 2023; 25:131-156. [PMID: 36854259 DOI: 10.1146/annurev-bioeng-110220-030247] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/02/2023]
Abstract
Artificial intelligence (AI) and machine learning (ML) methods are currently widely employed in medicine and healthcare. A PubMed search returns more than 100,000 articles on these topics published between 2018 and 2022 alone. Notwithstanding several recent reviews in various subfields of AI and ML in medicine, we have yet to see a comprehensive review around the methods' use in longitudinal analysis and prediction of an individual patient's health status within a personalized disease pathway. This review seeks to fill that gap. After an overview of the AI and ML methods employed in this field and of specific medical applications of models of this type, the review discusses the strengths and limitations of current studies and looks ahead to future strands of research in this field. We aim to enable interested readers to gain a detailed impression of the research currently available and accordingly plan future work around predictive models for deterioration in health status.
Affiliation(s)
- Bjoern M Eskofier
  - Machine Learning and Data Analytics Lab, Department of Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
- Jochen Klucken
  - Digital Medicine Group, Luxembourg Centre for Systems Biomedicine, Université du Luxembourg, Belvaux, Luxembourg
  - Digital Medicine Group, Department of Precision Health, Luxembourg Institute of Health, Strassen, Luxembourg
  - Centre Hospitalier de Luxembourg, Luxembourg City, Luxembourg

49
La Cava WG, Lee PC, Ajmal I, Ding X, Solanki P, Cohen JB, Moore JH, Herman DS. A flexible symbolic regression method for constructing interpretable clinical prediction models. NPJ Digit Med 2023; 6:107. [PMID: 37277550 PMCID: PMC10241925 DOI: 10.1038/s41746-023-00833-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2021] [Accepted: 05/05/2023] [Indexed: 06/07/2023] Open
Abstract
Machine learning (ML) models trained for triggering clinical decision support (CDS) are typically either accurate or interpretable but not both. Scaling CDS to the panoply of clinical use cases while mitigating risks to patients will require many ML models be intuitively interpretable for clinicians. To this end, we adapted a symbolic regression method, coined the feature engineering automation tool (FEAT), to train concise and accurate models from high-dimensional electronic health record (EHR) data. We first present an in-depth application of FEAT to classify hypertension, hypertension with unexplained hypokalemia, and apparent treatment-resistant hypertension (aTRH) using EHR data for 1200 subjects receiving longitudinal care in a large healthcare system. FEAT models trained to predict phenotypes adjudicated by chart review had equivalent or higher discriminative performance (p < 0.001) and were at least three times smaller (p < 1 × 10⁻⁶) than other potentially interpretable models. For aTRH, FEAT generated a six-feature, highly discriminative (positive predictive value = 0.70, sensitivity = 0.62), and clinically intuitive model. To assess the generalizability of the approach, we tested FEAT on 25 benchmark clinical phenotyping tasks using the MIMIC-III critical care database. Under comparable dimensionality constraints, FEAT's models exhibited higher area under the receiver-operating curve scores than penalized linear models across tasks (p < 6 × 10⁻⁶). In summary, FEAT can train EHR prediction models that are both intuitively interpretable and accurate, which should facilitate safe and effective scaling of ML-triggered CDS to the panoply of potential clinical use cases and healthcare practices.
Affiliation(s)
- William G La Cava
  - Computational Health Informatics Program, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
- Paul C Lee
  - Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Imran Ajmal
  - Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Xiruo Ding
  - Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Priyanka Solanki
  - Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Jordana B Cohen
  - Division of Renal-Electrolyte and Hypertension, Department of Medicine, University of Pennsylvania, Philadelphia, PA, USA
  - Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, PA, USA
- Jason H Moore
  - Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, PA, USA
- Daniel S Herman
  - Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA, USA
|
50
|
Hou J, Zhao R, Gronsbell J, Lin Y, Bonzel CL, Zeng Q, Zhang S, Beaulieu-Jones BK, Weber GM, Jemielita T, Wan SS, Hong C, Cai T, Wen J, Ayakulangara Panickan V, Liaw KL, Liao K, Cai T. Generate Analysis-Ready Data for Real-world Evidence: Tutorial for Harnessing Electronic Health Records With Advanced Informatic Technologies. J Med Internet Res 2023; 25:e45662. [PMID: 37227772 PMCID: PMC10251230 DOI: 10.2196/45662] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 01/11/2023] [Revised: 03/31/2023] [Accepted: 04/05/2023] [Indexed: 05/26/2023]
Abstract
Although randomized controlled trials (RCTs) are the gold standard for establishing the efficacy and safety of a medical treatment, real-world evidence (RWE) generated from real-world data has been vital in postapproval monitoring and is being promoted for the regulatory process of experimental therapies. An emerging source of real-world data is electronic health records (EHRs), which contain detailed information on patient care in both structured (eg, diagnosis codes) and unstructured (eg, clinical notes and images) forms. Despite the granularity of the data available in EHRs, the critical variables required to reliably assess the relationship between a treatment and clinical outcome are challenging to extract. To address this fundamental challenge and accelerate the reliable use of EHRs for RWE, we introduce an integrated data curation and modeling pipeline consisting of 4 modules that leverage recent advances in natural language processing, computational phenotyping, and causal modeling techniques with noisy data. Module 1 consists of techniques for data harmonization. We use natural language processing to recognize clinical variables from RCT design documents and map the extracted variables to EHR features with description matching and knowledge networks. Module 2 then develops techniques for cohort construction using advanced phenotyping algorithms to both identify patients with diseases of interest and define the treatment arms. Module 3 introduces methods for variable curation, including a list of existing tools to extract baseline variables from different sources (eg, codified, free text, and medical imaging) and end points of various types (eg, death, binary, temporal, and numerical). Finally, module 4 presents validation and robust modeling methods, and we propose a strategy to create gold-standard labels for EHR variables of interest to validate data curation quality and perform subsequent causal modeling for RWE. 
In addition to the workflow proposed in our pipeline, we also develop a reporting guideline for RWE that covers the necessary information to facilitate transparent reporting and reproducibility of results. Moreover, our pipeline is highly data driven, enhancing study data with a rich variety of publicly available information and knowledge sources. We also showcase our pipeline and provide guidance on the deployment of relevant tools by revisiting the emulation of the Clinical Outcomes of Surgical Therapy Study Group Trial on laparoscopy-assisted colectomy versus open colectomy in patients with early-stage colon cancer. We also draw on existing literature on EHR emulation of RCTs together with our own studies with the Mass General Brigham EHR.
Affiliation(s)
- Jue Hou
  - Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN, United States
- Rachel Zhao
  - Department of Medicine, University of British Columbia, Vancouver, BC, Canada
- Jessica Gronsbell
  - Department of Statistical Sciences, University of Toronto, Toronto, ON, Canada
- Yucong Lin
  - Institute of Engineering Medicine, Beijing Institute of Technology, Beijing, China
- Clara-Lea Bonzel
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, United States
- Qingyi Zeng
  - Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN, United States
- Sinian Zhang
  - School of Statistics, Renmin University of China, Beijing, China
- Griffin M Weber
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, United States
- Chuan Hong
  - Department of Biostatistics & Bioinformatics, Duke University, Durham, NC, United States
- Tianrun Cai
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, United States
- Jun Wen
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, United States
- Katherine Liao
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, United States
  - Division of Rheumatology, Inflammation, and Immunity, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, United States
- Tianxi Cai
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, United States
  - Department of Biostatistics, Harvard TH Chan School of Public Health, Boston, MA, United States
|