1
Yin Y, Shao Y, Ma P, Zeng-Treitler Q, Nelson SJ. Machine-Learned Codes from EHR Data Predict Hard Outcomes Better than Human-Assigned ICD Codes. Machine Learning and Knowledge Extraction 2025; 7:36. [PMID: 40406594 PMCID: PMC12093355 DOI: 10.3390/make7020036] [Indexed: 05/25/2025]
Abstract
We used machine learning (ML) to characterize 894,154 medical records of outpatient visits from the Veterans Administration Central Data Warehouse (VA CDW) by the likelihood of assignment of 200 International Classification of Diseases (ICD) code blocks. Using four different predictive models, we found the ML-derived predictions for the code blocks were consistently more effective in predicting death or 90-day rehospitalization than the assigned code block in the record. We reviewed records of ICD chapter assignments. The review revealed that the ML-predicted chapter assignments were consistently better than those humanly assigned. Impact factor analysis, a method of explanation of AI findings that was developed in our group, demonstrated little effect on any one assigned ICD code block but a marked impact on the ML-derived code blocks of kidney disease as well as several other morbidities. In this study, machine learning was much better than human code assignment at predicting the relatively rare outcomes of death or rehospitalization. Future work will address generalizability using other datasets, as well as addressing coding that is more nuanced than that of the categorization provided by code blocks.
Affiliation(s)
- Ying Yin
  - Biomedical Informatics Center, George Washington University, Washington, DC 20052, USA
  - Veterans Administration Hospital, Washington, DC 20422, USA
- Yijun Shao
  - Biomedical Informatics Center, George Washington University, Washington, DC 20052, USA
  - Veterans Administration Hospital, Washington, DC 20422, USA
- Phillip Ma
  - Biomedical Informatics Center, George Washington University, Washington, DC 20052, USA
  - Veterans Administration Hospital, Washington, DC 20422, USA
- Qing Zeng-Treitler
  - Biomedical Informatics Center, George Washington University, Washington, DC 20052, USA
  - Veterans Administration Hospital, Washington, DC 20422, USA
- Stuart J. Nelson
  - Biomedical Informatics Center, George Washington University, Washington, DC 20052, USA
2
Williams N. Automating assignment of HIV+ patients into phenogroups from demography bound phenotype attack rates. AMIA ... Annual Symposium Proceedings. AMIA Symposium 2025; 2024:1235-1244. [PMID: 40417533 PMCID: PMC12099429] [Indexed: 05/27/2025]
Abstract
Evidence-based medicine and health data for policy should update statistical data modeling methods to take advantage of at-scale data. One challenge with at-scale data is information segmentation for clinical presentation discovery and cohort assignment. We use a gradient boosting machine (GBM) to segment 94,379,175,015 diagnostic clinical events attributable to 283,632,789 Centers for Medicare and Medicaid Services beneficiaries over 22 observation years. Diagnostic events were aggregated into attack rates by demography and phenome-wide association study (PheWAS) codes. The resulting attack rates were then segmented within a user-defined clinical status of interest, in this case HIV status. A total of 1,753,647 HIV+ beneficiaries were considered. The GBM model assigned 19,651,408 PheWAS attack rates from 69,133,296 ICD attack rates into phenogroups/nodes.
Affiliation(s)
- Nick Williams
  - The Lister Hill National Center for Biomedical Communications, National Library of Medicine, USA
3
Liu T, Krentz AJ, Huo Z, Ćurčin V. Opportunities and Challenges of Cardiovascular Disease Risk Prediction for Primary Prevention Using Machine Learning and Electronic Health Records: A Systematic Review. Rev Cardiovasc Med 2025; 26:37443. [PMID: 40351688 PMCID: PMC12059770 DOI: 10.31083/rcm37443] [Received: 01/29/2025] [Revised: 03/13/2025] [Accepted: 03/20/2025] [Indexed: 05/14/2025]
Abstract
Background Cardiovascular disease (CVD) remains the foremost cause of morbidity and mortality worldwide. Recent advancements in machine learning (ML) have demonstrated substantial potential in augmenting risk stratification for primary prevention, surpassing conventional statistical models in predictive performance. Thus, integrating ML with Electronic Health Records (EHRs) enables refined risk estimation by leveraging the granularity and breadth of longitudinal individual patient data. However, fundamental barriers persist, including limited generalizability, challenges in interpretability, and the absence of rigorous external validation, all of which impede widespread clinical deployment. Methods This review adheres to the methodological rigor of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) and Scale for the Assessment of Narrative Review Articles (SANRA) guidelines. A systematic literature search was performed in March 2024, encompassing the Medline and Embase databases, to identify studies published since 2010. Supplementary references were retrieved from the Institute for Scientific Information (ISI) Web of Science, and manual searches were curated. The selection process, conducted via Rayyan, focused on systematic and narrative reviews evaluating ML-driven models for long-term CVD risk prediction within primary prevention contexts utilizing EHR data. Studies investigating short-term prognostication, highly specific comorbid cohorts, or conventional models devoid of ML components were excluded. Results Following an exhaustive screening of 1757 records, 22 studies met the inclusion criteria. Of these, 10 were systematic reviews (four incorporating meta-analyses), while 12 constituted narrative reviews, with the majority published post-2020. The synthesis underscores the superiority of ML in modeling intricate EHR-derived risk factors, facilitating precision-driven cardiovascular risk assessment. 
Nonetheless, salient challenges endure: heterogeneity in CVD outcome definitions undermines comparability, data incompleteness and inconsistency compromise model robustness, and a dearth of external validation constrains clinical translatability. Moreover, ethical and regulatory considerations, including algorithmic opacity, equity in predictive performance, and the absence of standardized evaluation frameworks, pose formidable obstacles to seamless integration into clinical workflows. Conclusions Despite the transformative potential of ML-based CVD risk prediction, it remains encumbered by methodological, technical, and regulatory impediments that hinder its full-scale adoption into real-world healthcare settings. This review underscores the imperative for standardized validation protocols, stringent regulatory oversight, and interdisciplinary collaboration to bridge the translational divide. Our findings established an integrative framework for developing, validating, and applying ML-based CVD risk prediction algorithms, addressing both clinical and technical dimensions. To further advance this field, we propose a standardized, transparent, and regulated EHR platform that facilitates fair model evaluation, reproducibility, and clinical translation by providing a high-quality, representative dataset with structured governance and benchmarking mechanisms. Meanwhile, future endeavors must prioritize enhancing model transparency, mitigating biases, and ensuring adaptability to heterogeneous clinical populations, fostering equitable and evidence-based implementation of ML-driven predictive analytics in cardiovascular medicine.
Affiliation(s)
- Tianyi Liu
  - School of Life Course & Population Sciences, King’s College London, SE1 1UL London, UK
- Andrew J. Krentz
  - School of Life Course & Population Sciences, King’s College London, SE1 1UL London, UK
  - Metadvice, 1025 St-Sulpice, Switzerland
- Zhiqiang Huo
  - School of Life Course & Population Sciences, King’s College London, SE1 1UL London, UK
- Vasa Ćurčin
  - School of Life Course & Population Sciences, King’s College London, SE1 1UL London, UK
4
Al-Sultani Z, Inglis TJ, McFadden B, Thomas E, Reynolds M. Sepsis in silico: definition, development and application of an electronic phenotype for sepsis. J Med Microbiol 2025; 74. [PMID: 40153307 DOI: 10.1099/jmm.0.001986] [Indexed: 03/30/2025]
Abstract
Repurposing electronic health record (EHR) or electronic medical record (EMR) data holds significant promise for evidence-based epidemic intelligence and research. Key challenges include sepsis recognition by physicians and issues with EHR and EMR data. Recent advances in data-driven techniques, alongside initiatives like the Surviving Sepsis Campaign and the Severe Sepsis and Septic Shock Management Bundle (SEP-1), have improved sepsis definition, early detection, subtype characterization, prognostication and personalized treatment. This includes identifying potential biomarkers or digital signatures to enhance diagnosis, guide therapy and optimize clinical management. Machine learning applications play a crucial role in identifying biomarkers and digital signatures associated with sepsis and its sub-phenotypes. Additionally, electronic phenotyping, leveraging EHR and EMR data, has emerged as a valuable tool for evidence-based sepsis identification and management. This review examines methods for identifying sepsis cohorts, focusing on two main approaches: utilizing health administrative data with standardized diagnostic coding via the International Classification of Diseases and integrating clinical data. This overview provides a comprehensive analysis of current cohort identification and electronic phenotyping strategies for sepsis, highlighting their potential applications and challenges. The accuracy of an electronic phenotype or signature is pivotal for precision medicine, enabling a shift from subjective clinical descriptions to data-driven insights.
Affiliation(s)
- Zahraa Al-Sultani
  - School of Physics, Maths and Computing, Computer Science and Software Engineering, University of Western Australia, Crawley, WA 6009, Australia
- Timothy JJ Inglis
  - Division of Pathology and Laboratory Medicine, School of Medicine, University of Western Australia, Crawley, WA 6009, Australia
  - PathWest Laboratory Medicine WA, QEII Medical Centre, Nedlands, WA 6009, Australia
- Benjamin McFadden
  - School of Physics, Maths and Computing, Computer Science and Software Engineering, University of Western Australia, Crawley, WA 6009, Australia
- Elizabeth Thomas
  - Curtin School of Population Health, Curtin University, Bentley, WA 6845, Australia
- Mark Reynolds
  - School of Physics, Maths and Computing, Computer Science and Software Engineering, University of Western Australia, Crawley, WA 6009, Australia
5
Pruinelli L, Balakrishnan K, Ma S, Li Z, Wall A, Lai JC, Schold JD, Pruett T, Simon G. Transforming liver transplant allocation with artificial intelligence and machine learning: a systematic review. BMC Med Inform Decis Mak 2025; 25:98. [PMID: 39994720 PMCID: PMC11852809 DOI: 10.1186/s12911-025-02890-3] [Received: 08/09/2024] [Accepted: 01/22/2025] [Indexed: 02/26/2025]
Abstract
BACKGROUND The principles of urgency, utility, and benefit are fundamental concepts guiding the ethical and practical decision-making process for organ allocation; however, liver transplant (LT) allocation still follows an urgency model. AIM To identify and analyze data elements used in Machine Learning (ML) and Artificial Intelligence (AI) methods, data sources, and their focus on urgency, utility, or benefit in LT. METHODS A comprehensive search across Ovid Medline and Scopus was conducted for studies published from 2002 to June 2023. Inclusion criteria targeted quantitative studies using ML/AI for candidates, donors, or recipients. Two reviewers assessed eligibility and extracted data, following PRISMA guidelines. RESULTS A total of 20 papers were included, synthesizing results into five major categories. Eight studies were led by a Spanish team, focusing on donor-recipient matching and proposing machine learning models to predict post-LT survival. Other international studies addressed organ supply-demand issues and developed predictive models to optimize LT outcomes. The studies highlight the potential of ML/AI to enhance LT allocation and outcomes. Despite advancements, limitations included the lack of robust transplant-related benefit models and the lack of improvement in urgency models over the Model for End-Stage Liver Disease (MELD). DISCUSSION This review highlighted the potential of AI and ML to enhance liver transplant allocation and outcomes. Significant advancements were noted, but limitations such as the need for better urgency models and the absence of a transplant-related benefit model remain. Most studies emphasized utility, focusing on survival outcomes. Future research should address the interpretability and generalizability of these models to improve organ allocation and post-LT survival predictions.
Affiliation(s)
- Lisiane Pruinelli
  - Department of Family, Community and Health Systems Science, University of Florida, Gainesville, Florida, US
  - Department of Surgery, University of Florida, Gainesville, Florida, US
- Kiruthika Balakrishnan
  - Department of Family, Community and Health Systems Science, University of Florida, Gainesville, Florida, US
- Sisi Ma
  - Institute for Health Informatics, University of Minnesota, Minneapolis, MN, USA
  - Division of General Internal Medicine, University of Minnesota, Minneapolis, Minnesota, USA
- Zhigang Li
  - Department of Biostatistics, University of Florida, Gainesville, Florida, USA
- Anji Wall
  - Baylor University Medical Center in Dallas, Dallas, Texas, USA
- Jennifer C Lai
  - Department of Medicine, University of California, San Francisco, California, USA
- Jesse D Schold
  - Departments of Surgery and Epidemiology, University of Colorado Anschutz Medical Campus, Aurora, Colorado, USA
- Timothy Pruett
  - Department of Surgery, University of Minnesota, Minneapolis, Minnesota, US
- Gyorgy Simon
  - Institute for Health Informatics, University of Minnesota, Minneapolis, MN, USA
6
Abulibdeh R, Tu K, Butt DA, Train A, Crampton N, Sejdić E. Assessing the capture of sociodemographic information in electronic medical records to inform clinical decision making. PLoS One 2025; 20:e0317599. [PMID: 39823404 PMCID: PMC11741650 DOI: 10.1371/journal.pone.0317599] [Received: 04/04/2024] [Accepted: 01/01/2025] [Indexed: 01/19/2025]
Abstract
There is a growing need to document sociodemographic factors in electronic medical records to produce representative cohorts for medical research and to perform focused research for potentially vulnerable populations. The objective of this work was to assess the content of family physicians' electronic medical records and characterize the quality of the documentation of sociodemographic characteristics. Descriptive statistics were reported for each sociodemographic characteristic. The association between the completeness rates of the sociodemographic data and the various clinics, electronic medical record vendors, and physician characteristics was analyzed. Supervised machine learning models were used to determine the absence or presence of each characteristic for all adult patients over the age of 18 in the database. Documentation rates for marital status (51.0%) and occupation (47.2%) were significantly higher than those for the rest of the variables. Race (1.4%), sexual orientation (2.5%), and gender identity (0.8%) had the lowest documentation rates, with a 97.5% missingness rate or higher. The correlation analysis for vendor type demonstrated that there was significant variation in the availability of marital and occupation information between vendors (χ² > 6.0, P < 0.05). Variability in documentation between clinics indicated that the majority of characteristics exhibited high variation in completeness rates, with the highest variation for occupation (median: 47.2, interquartile range: 60.6%) and marital status (median: 45.6, interquartile range: 59.7%). Finally, physician sex, years since a physician graduated, and whether a physician was a foreign vs a Canadian medical graduate were significantly associated with documentation rates of place of birth, citizenship status, occupation, and education in the electronic medical records. Our findings suggest a crucial need to implement better documentation strategies for sociodemographic information in the healthcare setting. 
To improve completeness rates, healthcare systems should monitor, encourage, enforce, or incentivize sociodemographic data collection standards.
Affiliation(s)
- Rawan Abulibdeh
  - Department of Electrical and Computer Engineering, University of Toronto, Toronto, Ontario, Canada
- Karen Tu
  - Department of Family and Community Medicine, University of Toronto, Toronto, Ontario, Canada
  - North York General Hospital, Toronto, Ontario, Canada
  - Toronto Western Hospital Family Health Team, University Health Network, Toronto, Ontario, Canada
- Debra A. Butt
  - Department of Family and Community Medicine, University of Toronto, Toronto, Ontario, Canada
  - Department of Family and Community Medicine, Scarborough Health Network, Scarborough, Ontario, Canada
- Anthony Train
  - Department of Family Medicine, Queen’s University, Kingston, Ontario, Canada
- Noah Crampton
  - Department of Family and Community Medicine, University of Toronto, Toronto, Ontario, Canada
- Ervin Sejdić
  - Department of Electrical and Computer Engineering, University of Toronto, Toronto, Ontario, Canada
  - North York General Hospital, Toronto, Ontario, Canada
7
Tamuhla T, Coussens AK, Abrahams M, Blumenthal MJ, Lakay F, Wilkinson RJ, Riou C, Raubenheimer P, Dave JA, Tiffin N. Implementation of a genotyped African population cohort, with virtual follow-up: A feasibility study in the Western Cape Province, South Africa. Wellcome Open Res 2025; 9:620. [PMID: 39925651 PMCID: PMC11806245 DOI: 10.12688/wellcomeopenres.23009.2] [Accepted: 01/10/2025] [Indexed: 02/11/2025]
Abstract
Background There is limited knowledge regarding African genetic drivers of disease due to prohibitive costs of large-scale genomic research in Africa. Methods We piloted a cost-effective, scalable virtual genotyped cohort in South Africa that was affordable in this resource-limited context, with participant recruitment using a tiered informed consent model and DNA collection by buccal swab. Genotype data was generated using the H3Africa Illumina micro-array, and phenotype data was derived from routine health data of participants. We demonstrated feasibility of nested case control genome wide association studies using these data for the phenotypes type 2 diabetes mellitus (T2DM) and severe COVID-19. Results 2,267,346 variants were analysed in 459 participant samples, of which 229 (66.8%) are female. 78.6% of SNPs and 74% of samples passed quality control (QC). Principal component analysis showed extensive ancestry admixture in study participants. Of the 343 samples that passed QC, 93 participants had T2DM and 63 had severe COVID-19. For 1780 previously published COVID-19-associated variants, 3 SNPs in the pre-imputation data and 23 SNPs in the imputed data were significantly associated with severe COVID-19 cases compared to controls (p<0.05). For 2755 published T2DM associated variants, 69 SNPs in the pre-imputation data and 419 SNPs in the imputed data were significantly associated with T2DM cases when compared to controls (p<0.05). Conclusions The results shown here are illustrative of what will be possible as the cohort expands in the future. 
Here we demonstrate the feasibility of this approach, recognising that the findings presented here are preliminary and require further validation once we have a sufficient sample size to improve statistical significance of findings. We implemented a genotyped population cohort with virtual follow up data in a resource-constrained African environment, demonstrating feasibility for scale up and novel health discoveries through nested case-control studies.
Affiliation(s)
- Tsaone Tamuhla
  - South African Medical Research Council Bioinformatics Unit, South African National Bioinformatics Institute, University of the Western Cape, Cape Town, South Africa
- Anna K Coussens
  - Wellcome CIDRI-Africa, Faculty of Health Sciences, University of Cape Town, Rondebosch, Western Cape, South Africa
  - Infectious Diseases and Immune Defence Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia
  - Department of Medical Biology, The University of Melbourne, Melbourne, Victoria, Australia
- Maleeka Abrahams
  - Division of Endocrinology, Department of Medicine, University of Cape Town, Rondebosch, Cape Town, South Africa
- Melissa J Blumenthal
  - International Centre for Genetic Engineering and Biotechnology, Cape Town, South Africa
  - Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Rondebosch, Western Cape, South Africa
- Francisco Lakay
  - Wellcome CIDRI-Africa, Faculty of Health Sciences, University of Cape Town, Rondebosch, Western Cape, South Africa
- Robert J Wilkinson
  - Wellcome CIDRI-Africa, Faculty of Health Sciences, University of Cape Town, Rondebosch, Western Cape, South Africa
  - The Francis Crick Institute, London, England, UK
  - Department of Infectious Diseases, Imperial College London, London, England, UK
- Catherine Riou
  - Wellcome CIDRI-Africa, Faculty of Health Sciences, University of Cape Town, Rondebosch, Western Cape, South Africa
  - Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Rondebosch, Western Cape, South Africa
  - Department of Pathology, University of Cape Town, Rondebosch, Western Cape, South Africa
- Peter Raubenheimer
  - Division of Endocrinology, Department of Medicine, University of Cape Town, Rondebosch, Cape Town, South Africa
- Joel A Dave
  - Division of Endocrinology, Department of Medicine, University of Cape Town, Rondebosch, Cape Town, South Africa
- Nicki Tiffin
  - South African Medical Research Council Bioinformatics Unit, South African National Bioinformatics Institute, University of the Western Cape, Cape Town, South Africa
  - Wellcome CIDRI-Africa, Faculty of Health Sciences, University of Cape Town, Rondebosch, Western Cape, South Africa
8
Forrest N, Tran S, Nandoliya K, Houskamp E, Gruchala T, Guggilla V, Sun Z, Lukas R, Wainwright D, Furmanchuk A, Johnson J, Roy I, Walunas T. A Dynamic Time Warping Extension to Consensus Weight-Based Cachexia Criteria Improves Prediction of Cancer Patient Outcomes. JCSM Communications 2025; 8:e107. [PMID: 40151817 PMCID: PMC11949122 DOI: 10.1002/rco2.107] [Received: 02/23/2024] [Revised: 07/22/2024] [Accepted: 08/30/2024] [Indexed: 03/29/2025]
Abstract
Background Cachexia is a complex syndrome that impacts up to half of patients with cancer. Criteria systems have been developed for the purpose of diagnosing and grading cachexia severity in clinical settings. One of the most widely known is the set developed by Fearon et al. in 2011, which utilizes body mass loss and body mass index (BMI) to determine the presence and extent of cachexia. One limitation of this system and other clinical cachexia scales is the lack of systematic methods for assessing cachexia severity longitudinally. We sought to develop an extension to the 2011 consensus criteria that categorizes cancer patients with respect to their temporal cachexia progression and to assess its predictive capacity relative to the current time-agnostic system. Methods Two cancer cohorts were identified in electronic health record data: lung cancer and glioblastoma. We extracted weight and BMI measures from the time of cancer diagnosis until death or loss to follow-up and computed cachexia severity according to the consensus criteria. Subgroups of cachexia progression were uncovered using dynamic time warping (DTW) followed by unsupervised clustering. This system and baseline consensus criteria measurements were each assessed for their ability to stratify patient outcomes utilizing Kaplan-Meier curves and Cox proportional hazards and subsequently compared with model concordance and inverse probability of censoring weighting (IPCW). Results Significant differences were observed in overall survival Kaplan-Meier curves of 1023 patients with lung cancer when stratified by baseline cachexia classification (p = 0.0002, N events = 592) but not in a cohort of 545 patients with glioblastoma (p = 0.16, N events = 353). DTW uncovered three patterns of cachexia progression in each subgroup with features described as 'smouldering', 'rapid with recovery' or 'persistent/recurrent'. 
Significant differences were observed in Kaplan-Meier curves when stratified by cachexia longitudinal patterns in lung cancer (p < 0.0001) and glioblastoma (p < 0.0001). Adjusted hazard ratios comparing the 'persistent/recurrent' cluster to referent subgroups in Cox models were 4.8 (4.1-5.8, p < 0.05) and 1.9 (1.4-2.4, p < 0.05) among patients with lung cancer and glioblastoma, respectively. Areas under the curve at multiple time points and Cox model concordances were greater when patients were stratified by progression pattern compared with baseline consensus criteria. Conclusions Our results suggest that accounting for cachexia's longitudinal progression in a systematic way can improve upon the prognostic capacity of a widely used consensus criteria set. These findings are important for the future development of systems that recognize concerning patterns of cachexia progression in clinical settings and aid clinicians in cachexia-related decision making.
Affiliation(s)
- Noah Forrest
  - Center for Health Information Partnerships, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
- Steven Tran
  - Center for Health Information Partnerships, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
- Khizar R. Nandoliya
  - Department of Neurological Surgery, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
- Ethan J. Houskamp
  - Department of Neurological Surgery, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
- Tomasz Gruchala
  - Department of Physical Medicine and Rehabilitation, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
- Vijeeth Guggilla
  - Center for Health Information Partnerships, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
- Zequn Sun
  - Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, Illinois, USA
- Rimas Lukas
  - Department of Neurology, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
- Derek Wainwright
  - Department of Cancer Biology, Stritch School of Medicine, Loyola University Chicago, Maywood, Illinois, USA
  - Department of Neurological Surgery, Loyola University Medical Center, Maywood, Illinois, USA
- Al'ona Furmanchuk
  - Center for Health Information Partnerships, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
  - Department of Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
- Jodi L. Johnson
  - Department of Medical Social Sciences, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
  - Department of Pathology, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
  - Department of Dermatology, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
  - Robert H. Lurie Comprehensive Cancer Center, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
- Ishan Roy
  - Department of Physical Medicine and Rehabilitation, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
  - Robert H. Lurie Comprehensive Cancer Center, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
  - Department of Physical Medicine and Rehabilitation, Shirley Ryan AbilityLab, Chicago, Illinois, USA
- Theresa L. Walunas
  - Center for Health Information Partnerships, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
  - Department of Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
  - Department of Medical Social Sciences, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
  - Robert H. Lurie Comprehensive Cancer Center, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
9
Empey PE, Karnes JH, Johnson JA. Pharmacogenetics: Opportunities for the All of Us Research Program and Other Large Data Sets to Advance the Field. Annu Rev Pharmacol Toxicol 2025; 65:111-130. [PMID: 39847465 DOI: 10.1146/annurev-pharmtox-061724-080718] [Indexed: 01/25/2025]
Abstract
Pharmacogenetic variation is common and an established driver of response for many drugs. There has been tremendous progress in pharmacogenetics knowledge over the last 30 years and in clinical implementation of that knowledge over the last 15 years. But there have also been many examples where translation has stalled because of the lack of available data sets for discovery or validation research. The recent availability of data from very large cohorts with linked genetic, electronic health record, and other data promises new opportunities to advance pharmacogenetics research. This review presents the stages from pharmacogenetics discovery to widespread clinical adoption using prominent gene-drug pairs that have been implemented into clinical practice as examples. We discuss the opportunities that the All of Us Research Program and other large biorepositories with genomic and linked electronic health record data present in advancing and accelerating the translation of pharmacogenetics into clinical practice.
Collapse
Affiliation(s)
- Philip E Empey
- Center for Clinical Pharmaceutical Sciences, School of Pharmacy; and Institute for Precision Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania, USA;
| | - Jason H Karnes
- Department of Pharmacy Practice and Science, R. Ken Coit College of Pharmacy, University of Arizona, Tucson, Arizona, USA
| | - Julie A Johnson
- Clinical and Translational Science Institute, Colleges of Medicine and Pharmacy, The Ohio State University, Columbus, Ohio, USA
|
10
|
Olaker VR, Fry S, Terebuh P, Davis PB, Tisch DJ, Xu R, Miller MG, Dorney I, Palchuk MB, Kaelber DC. With big data comes big responsibility: Strategies for utilizing aggregated, standardized, de-identified electronic health record data for research. Clin Transl Sci 2025; 18:e70093. [PMID: 39740190 DOI: 10.1111/cts.70093] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2024] [Revised: 10/31/2024] [Accepted: 11/04/2024] [Indexed: 01/02/2025] Open
Abstract
Electronic health records (EHRs), though they are maintained and utilized for clinical and billing purposes, may provide a wealth of information for research. Currently, sources are available that offer insight into the health histories of well over a quarter of a billion people. Their use, however, is fraught with hazards, including introduction or reinforcement of biases, clarity of disease definitions, protection of patient privacy, definitions of covariates or confounders, accuracy of medication usage compared with prescriptions, the need to introduce other data sources such as vaccination or death records and the ensuing potential for inaccuracy, duplicative records, and understanding and interpreting the outcomes of data queries. On the other hand, the possibility of studying rare disorders and the ability to link apparently disparate events are extremely valuable. Strategies for avoiding the worst pitfalls and hewing to conservative interpretations are essential. This article summarizes many of the approaches that have been used to avoid the most common pitfalls and extract the maximum information from aggregated, standardized, and de-identified EHR data. This article describes 26 topics broken into three major areas: (1) 14 topics related to design issues for observational study using EHR data, (2) 7 topics related to analysis issues when analyzing EHR data, and (3) 5 topics related to reporting studies using EHR data.
Affiliation(s)
- Veronica R Olaker
- Center for Artificial Intelligence in Drug Discovery, Case Western Reserve University School of Medicine, Cleveland, Ohio, USA
| | - Sarah Fry
- Center for Artificial Intelligence in Drug Discovery, Case Western Reserve University School of Medicine, Cleveland, Ohio, USA
| | - Pauline Terebuh
- Center for Artificial Intelligence in Drug Discovery, Case Western Reserve University School of Medicine, Cleveland, Ohio, USA
| | - Pamela B Davis
- Center for Community Health Integration, Case Western Reserve University School of Medicine, Cleveland, Ohio, USA
| | - Daniel J Tisch
- Department of Population and Quantitative Health Sciences, Case Western Reserve University School of Medicine, Cleveland, Ohio, USA
| | - Rong Xu
- Center for Artificial Intelligence in Drug Discovery, Case Western Reserve University School of Medicine, Cleveland, Ohio, USA
| | - Margaret G Miller
- Center for Artificial Intelligence in Drug Discovery, Case Western Reserve University School of Medicine, Cleveland, Ohio, USA
| | - Ian Dorney
- The Center for Clinical Informatics Research and Education, The MetroHealth System, Cleveland, Ohio, USA
| | | | - David C Kaelber
- The Center for Clinical Informatics Research and Education, The MetroHealth System, Cleveland, Ohio, USA
- The Departments of Internal Medicine, Pediatrics and Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, Ohio, USA
|
11
|
Li M, Li X, Pan K, Geva A, Yang D, Sweet SM, Bonzel CL, Ayakulangara Panickan V, Xiong X, Mandl K, Cai T. Multisource representation learning for pediatric knowledge extraction from electronic health records. NPJ Digit Med 2024; 7:319. [PMID: 39533050 PMCID: PMC11558010 DOI: 10.1038/s41746-024-01320-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2023] [Accepted: 10/28/2024] [Indexed: 11/16/2024] Open
Abstract
Electronic Health Record (EHR) systems are particularly valuable in pediatrics due to high barriers in clinical studies, but pediatric EHR data often suffer from low content density. Existing EHR code embeddings tailored for the general patient population fail to address the unique needs of pediatric patients. To bridge this gap, we introduce a transfer learning approach, MUltisource Graph Synthesis (MUGS), aimed at accurate knowledge extraction and relation detection in pediatric contexts. MUGS integrates graphical data from both pediatric and general EHR systems, along with hierarchical medical ontologies, to create embeddings that adaptively capture both the homogeneity and heterogeneity between hospital systems. These embeddings enable refined EHR feature engineering and nuanced patient profiling, proving particularly effective in identifying pediatric patients similar to specific profiles, with a focus on pulmonary hypertension (PH). MUGS embeddings, resistant to negative transfer, outperform other benchmark methods in multiple applications, advancing evidence-based pediatric research.
Affiliation(s)
- Mengyan Li
- Department of Mathematical Sciences, Bentley University, Waltham, MA, USA
| | - Xiaoou Li
- School of Statistics, University of Minnesota, Minneapolis, MN, USA
| | - Kevin Pan
- Mission San Jose High School, Fremont, CA, USA
| | - Alon Geva
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, USA
- Department of Anesthesiology, Critical Care, and Pain Medicine, Boston Children's Hospital, Boston, MA, USA
- Department of Anaesthesia, Harvard Medical School, Boston, MA, USA
| | - Doris Yang
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Sara Morini Sweet
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Clara-Lea Bonzel
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | | | - Xin Xiong
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Kenneth Mandl
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Tianxi Cai
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
|
12
|
Merritt VC, Chen AW, Bonzel CL, Hong C, Sangar R, Morini Sweet S, Sorg SF, Chanfreau-Coffinier C. Development and validation of an electronic health record-based algorithm for identifying TBI in the VA: A VA Million Veteran Program study. Brain Inj 2024; 38:1084-1092. [PMID: 39004925 DOI: 10.1080/02699052.2024.2373920] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2023] [Revised: 06/12/2024] [Accepted: 06/24/2024] [Indexed: 07/16/2024]
Abstract
The purpose of this study was to develop and validate an algorithm for identifying Veterans with a history of traumatic brain injury (TBI) in the Veterans Affairs (VA) electronic health record using VA Million Veteran Program (MVP) data. Manual chart review (n = 200) was first used to establish 'gold standard' diagnosis labels for TBI ('Yes TBI' vs. 'No TBI'). To develop our algorithm, we used PheCAP, a semi-supervised pipeline that relied on the chart review diagnosis labels to train and create a prediction model for TBI. Cross-validation was used to train and evaluate the proposed algorithm, 'TBI-PheCAP.' TBI-PheCAP performance was compared to existing TBI algorithms and phenotyping methods, and the final algorithm was run on all MVP participants (n = 702,740) to assign a predicted probability for TBI and a binary classification status choosing specificity = 90%. The TBI-PheCAP algorithm had an area under the receiver operating characteristic curve of 0.92, sensitivity of 84%, and positive predictive value (PPV) of 98% at specificity = 90%. TBI-PheCAP generally performed better than other classification methods, with equivalent or higher sensitivity and PPV than existing rules-based TBI algorithms and MVP TBI-related survey data. Given its strong classification metrics, the TBI-PheCAP algorithm is recommended for use in future population-based TBI research.
Affiliation(s)
- Victoria C Merritt
- VA San Diego Healthcare System (VASDHS), San Diego, CA, USA
- Department of Psychiatry, University of California San Diego, La Jolla, CA, USA
- Center of Excellence for Stress and Mental Health, VASDHS, San Diego, CA, USA
| | | | | | - Chuan Hong
- Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, USA
| | | | | | - Scott F Sorg
- Home Base, A Red Sox Foundation and Massachusetts General Hospital Program, Boston, MA, USA
|
13
|
Gabel E, Gal J, Grogan T, Hofer I. A retrospective analysis using comorbidity detecting algorithmic software to determine the incidence of International Classification of Diseases (ICD) code omissions and appropriateness of Diagnosis-Related Group (DRG) code modifiers. BMC Med Inform Decis Mak 2024; 24:309. [PMID: 39443922 PMCID: PMC11520144 DOI: 10.1186/s12911-024-02724-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2023] [Accepted: 10/15/2024] [Indexed: 10/25/2024] Open
Abstract
BACKGROUND The mechanism for recording International Classification of Diseases (ICD) and diagnosis related groups (DRG) codes in a patient's chart is through a certified medical coder who manually reviews the medical record at the completion of an admission. High-acuity ICD codes justify DRG modifiers, indicating the need for escalated hospital resources. In this manuscript, we demonstrate the value of rules-based computer algorithms that audit for omission of administrative codes and quantifying the downstream effects with regard to financial impacts and demographic findings did not indicate significant disparities. METHODS All study data were acquired via the UCLA Department of Anesthesiology and Perioperative Medicine's Perioperative Data Warehouse. The DataMart is a structured reporting schema that contains all the relevant clinical data entered into the EPIC (EPIC Systems, Verona, WI) electronic health record. Computer algorithms were created for eighteen disease states that met criteria for DRG modifiers. Each algorithm was run against all hospital admissions with completed billing from 2019. The algorithms scanned for the existence of disease, appropriate ICD coding, and DRG modifier appropriateness. Secondarily, the potential financial impact of ICD omissions was estimated by payor class and an analysis of ICD miscoding was done by ethnicity, sex, age, and financial class. RESULTS Data from 34,104 hospital admissions were analyzed from January 1, 2019, to December 31, 2019. 11,520 (32.9%) hospital admissions were algorithm positive for a disease state with no corresponding ICD code. 1,990 (5.8%) admissions were potentially eligible for DRG modification/upgrade with an estimated lost revenue of $22,680,584.50. ICD code omission rates compared against reference groups (private payors, Caucasians, middle-aged patients) demonstrated significant p-values < 0.05; similarly significant p-values were demonstrated when comparing patients of opposite sexes.
CONCLUSIONS We successfully used rules-based algorithms and raw structured EHR data to identify omitted ICD codes from inpatient medical record claims. These missing ICD codes often had downstream effects such as inaccurate DRG modifiers and missed reimbursement. Embedding augmented intelligence into this problematic workflow has the potential for improvements in administrative data, but more importantly, improvements in administrative data accuracy and financial outcomes.
Affiliation(s)
- Eilon Gabel
- University of California at Los Angeles David Geffen School of Medicine, Los Angeles, CA, USA.
| | - Jonathan Gal
- Icahn School of Medicine at Mount Sinai, New York City, NY, USA
| | - Tristan Grogan
- University of California at Los Angeles David Geffen School of Medicine, Los Angeles, CA, USA
| | - Ira Hofer
- Icahn School of Medicine at Mount Sinai, New York City, NY, USA
|
14
|
Jacobs BM, Stow D, Hodgson S, Zöllner J, Samuel M, Kanoni S, Bidi S, Walter K, Langenberg C, Dobson R, Finer S, Morton C, Siddiqui MK, Martin HC, Pietzner M, Mathur R, van Heel DA. Genetic architecture of routinely acquired blood tests in a British South Asian cohort. Nat Commun 2024; 15:8929. [PMID: 39414775 PMCID: PMC11484750 DOI: 10.1038/s41467-024-53091-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2023] [Accepted: 09/30/2024] [Indexed: 10/18/2024] Open
Abstract
Understanding the genetic basis of routinely-acquired blood tests can provide insights into several aspects of human physiology. We report a genome-wide association study of 42 quantitative blood test traits defined using Electronic Healthcare Records (EHRs) of ~50,000 British Bangladeshi and British Pakistani adults. We demonstrate a causal variant within the PIEZO1 locus which was associated with alterations in red cell traits and glycated haemoglobin. Conditional analysis and within-ancestry fine mapping confirmed that this signal is driven by a missense variant - chr16-88716656-G-TT - which is common in South Asian ancestries (MAF 3.9%) but ultra-rare in other ancestries. Carriers of the T allele had lower mean HbA1c values, lower HbA1c values for a given level of random or fasting glucose, and delayed diagnosis of Type 2 Diabetes Mellitus. Our results shed light on the genetic basis of clinically-relevant traits in an under-represented population, and emphasise the importance of ancestral diversity in genetic studies.
Grants
- 210561/Z/18/Z Wellcome Trust
- WT102627 Wellcome Trust (Wellcome)
- MR/V028766/1 RCUK | Medical Research Council (MRC)
- Wellcome Trust
- M009017 RCUK | Medical Research Council (MRC)
- Genes & Health is/has recently been core-funded by Wellcome (WT102627, WT210561), the Medical Research Council (UK) (M009017, MR/X009777/1, MR/X009920/1), Higher Education Funding Council for England Catalyst, Barts Charity (845/1796), Health Data Research UK (for London substantive site), and research delivery support from the NHS National Institute for Health Research Clinical Research Network (North Thames). Genes & Health is/has recently been funded by Alnylam Pharmaceuticals, Genomics PLC; and a Life Sciences Industry Consortium of Astra Zeneca PLC, Bristol-Myers Squibb Company, GlaxoSmithKline Research and Development Limited, Maze Therapeutics Inc, Merck Sharp & Dohme LLC, Novo Nordisk A/S, Pfizer Inc, Takeda Development Centre Americas Inc.
Affiliation(s)
- Benjamin M Jacobs
- Wolfson Institute of Population Health, Queen Mary University of London, London, UK
- Department of Neurology, Royal London Hospital, Barts Health NHS Trust, London, UK
| | - Daniel Stow
- Wolfson Institute of Population Health, Queen Mary University of London, London, UK
| | - Sam Hodgson
- Wolfson Institute of Population Health, Queen Mary University of London, London, UK
| | - Julia Zöllner
- Wolfson Institute of Population Health, Queen Mary University of London, London, UK
- University College London, London, UK
| | - Miriam Samuel
- Wolfson Institute of Population Health, Queen Mary University of London, London, UK
| | - Stavroula Kanoni
- William Harvey Research Institute, Queen Mary University of London, London, UK
| | - Saeed Bidi
- Wolfson Institute of Population Health, Queen Mary University of London, London, UK
| | - Klaudia Walter
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
| | - Claudia Langenberg
- Precision Healthcare University Research Institute, Queen Mary University of London, London, UK
- Computational Medicine, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
| | - Ruth Dobson
- Wolfson Institute of Population Health, Queen Mary University of London, London, UK
- Department of Neurology, Royal London Hospital, Barts Health NHS Trust, London, UK
| | - Sarah Finer
- Wolfson Institute of Population Health, Queen Mary University of London, London, UK
- Blizard Institute, Queen Mary University of London, London, UK
| | - Caroline Morton
- Wolfson Institute of Population Health, Queen Mary University of London, London, UK
| | - Moneeza K Siddiqui
- Wolfson Institute of Population Health, Queen Mary University of London, London, UK
| | - Hilary C Martin
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
| | - Maik Pietzner
- Precision Healthcare University Research Institute, Queen Mary University of London, London, UK
- Computational Medicine, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
| | - Rohini Mathur
- Wolfson Institute of Population Health, Queen Mary University of London, London, UK
| | - David A van Heel
- Wolfson Institute of Population Health, Queen Mary University of London, London, UK.
- Blizard Institute, Queen Mary University of London, London, UK.
|
15
|
Guo J, Guo Q, Zhong T, Xu C, Xia Z, Fang H, Chen Q, Zhou Y, Xie J, Jin D, Yang Y, Wu X, Zhu H, Hour A, Jin X, Zhou Y, Li Q. Phenome-wide association study in 25,639 pregnant Chinese women reveals loci associated with maternal comorbidities and child health. CELL GENOMICS 2024; 4:100632. [PMID: 39389020 PMCID: PMC11602594 DOI: 10.1016/j.xgen.2024.100632] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/09/2023] [Revised: 12/02/2023] [Accepted: 07/19/2024] [Indexed: 10/12/2024]
Abstract
Phenome-wide association studies (PheWAS) have been less focused on maternal diseases and maternal-newborn comorbidities, especially in the Chinese population. To enhance our understanding of the genetic basis of these related diseases, we conducted a PheWAS on 25,639 pregnant women and 14,151 newborns in the Chinese Han population using ultra-low-coverage whole-genome sequence (ulcWGS). We identified 2,883 maternal trait-associated SNPs associated with 26 phenotypes, among which 99.5% were near established genome-wide association study (GWAS) loci. Further refinement delineated these SNPs to 442 unique trait-associated loci (TALs) predicated on linkage disequilibrium R² > 0.8, revealing that 75.6% demonstrated pleiotropy and 50.9% were located in genes implicated in analogous phenotypes. Notably, we discovered 21 maternal SNPs associated with 35 neonatal phenotypes, including two SNPs associated with identical complications in both mothers and children. These findings underscore the importance of integrating ulcWGS data to enrich the discoveries derived from traditional PheWAS approaches.
Affiliation(s)
- Jintao Guo
- United Diagnostic and Research Center for Clinical Genetics, Women and Children's Hospital, School of Medicine and School of Public Health, Xiamen University, Xiamen 361102, China; National Institute for Data Science in Health and Medicine, School of Medicine, Xiamen University, Xiamen 361102, China; Department of Hematology, School of Medicine, Xiamen University, Xiamen 361102, China; Weifang People's Hospital, Shandong Second Medical University, Shandong 261041, China
| | - Qiwei Guo
- United Diagnostic and Research Center for Clinical Genetics, Women and Children's Hospital, School of Medicine and School of Public Health, Xiamen University, Xiamen 361102, China
| | - Taoling Zhong
- National Institute for Data Science in Health and Medicine, School of Medicine, Xiamen University, Xiamen 361102, China
| | - Chaoqun Xu
- National Institute for Data Science in Health and Medicine, School of Medicine, Xiamen University, Xiamen 361102, China
| | - Zhongmin Xia
- United Diagnostic and Research Center for Clinical Genetics, Women and Children's Hospital, School of Medicine and School of Public Health, Xiamen University, Xiamen 361102, China
| | - Hongkun Fang
- National Institute for Data Science in Health and Medicine, School of Medicine, Xiamen University, Xiamen 361102, China; Weifang People's Hospital, Shandong Second Medical University, Shandong 261041, China
| | - Qinwei Chen
- National Institute for Data Science in Health and Medicine, School of Medicine, Xiamen University, Xiamen 361102, China; Department of Hematology, School of Medicine, Xiamen University, Xiamen 361102, China
| | - Ying Zhou
- National Institute for Data Science in Health and Medicine, School of Medicine, Xiamen University, Xiamen 361102, China
| | - Jieqiong Xie
- United Diagnostic and Research Center for Clinical Genetics, Women and Children's Hospital, School of Medicine and School of Public Health, Xiamen University, Xiamen 361102, China
| | - Dandan Jin
- United Diagnostic and Research Center for Clinical Genetics, Women and Children's Hospital, School of Medicine and School of Public Health, Xiamen University, Xiamen 361102, China
| | - You Yang
- BGI-Shenzhen, Shenzhen 518103, China
| | - Xin Wu
- BGI-Shenzhen, Shenzhen 518103, China
| | | | - Ailing Hour
- Department of Life Science, Fu-Jen Catholic University, Xinzhuang Dist., New Taipei City 242, Taiwan
| | - Xin Jin
- BGI-Shenzhen, Shenzhen 518103, China
| | - Yulin Zhou
- United Diagnostic and Research Center for Clinical Genetics, Women and Children's Hospital, School of Medicine and School of Public Health, Xiamen University, Xiamen 361102, China.
| | - Qiyuan Li
- Department of Pediatrics, School of Medicine, Xiamen University, Xiamen 361102, China; National Institute for Data Science in Health and Medicine, School of Medicine, Xiamen University, Xiamen 361102, China; Department of Hematology, School of Medicine, Xiamen University, Xiamen 361102, China.
|
16
|
Choudhary T, Upadhyaya P, Davis CM, Yang P, Tallowin S, Lisboa FA, Schobel SA, Coopersmith CM, Elster EA, Buchman TG, Dente CJ, Kamaleswaran R. Derivation and validation of generalized sepsis-induced acute respiratory failure phenotypes among critically ill patients: a retrospective study. Crit Care 2024; 28:321. [PMID: 39354616 PMCID: PMC11445942 DOI: 10.1186/s13054-024-05061-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/22/2024] [Accepted: 08/07/2024] [Indexed: 10/03/2024] Open
Abstract
BACKGROUND Septic patients who develop acute respiratory failure (ARF) requiring mechanical ventilation represent a heterogeneous subgroup of critically ill patients with widely variable clinical characteristics. Identifying distinct phenotypes of these patients may reveal insights about the broader heterogeneity in the clinical course of sepsis, considering multi-organ dynamics. We aimed to derive novel phenotypes of sepsis-induced ARF using observational clinical data and investigate the generalizability of the derived phenotypes. METHODS We performed a multi-center retrospective study of ICU patients with sepsis who required mechanical ventilation for ≥ 24 h. Data from two different high-volume academic hospital centers were used, where all phenotypes were derived in the MICU of Hospital-I (N = 3225). The derived phenotypes were validated in the MICU of Hospital-II (N = 848), the SICU of Hospital-I (N = 1112), and the SICU of Hospital-II (N = 465). Clinical data from the 24 h preceding intubation were used to derive distinct phenotypes using an explainable machine learning-based clustering model interpreted by clinical experts. RESULTS Four distinct ARF phenotypes were identified: A (severe multi-organ dysfunction (MOD) with a high likelihood of kidney injury and heart failure), B (severe hypoxemic respiratory failure [median P/F = 123]), C (mild hypoxia [median P/F = 240]), and D (severe MOD with a high likelihood of hepatic injury, coagulopathy, and lactic acidosis). Patients in each phenotype showed differences in clinical course and mortality rates despite similarities in demographics and admission co-morbidities. The phenotypes were reproduced in external validation utilizing the MICU of Hospital-II and SICUs from Hospital-I and -II. Kaplan-Meier analysis showed a significant difference in 28-day mortality across the phenotypes (p < 0.01), and this difference was consistent across the MICU and SICU of both Hospital-I and -II. The phenotypes demonstrated differences in treatment effects associated with a high positive end-expiratory pressure (PEEP) strategy. CONCLUSION The phenotypes demonstrated unique patterns of organ injury and differences in clinical outcomes, which may help inform future research and clinical trial design for tailored management strategies.
Affiliation(s)
- Tilendra Choudhary
- Department of Surgery, Duke University School of Medicine, Durham, NC, 27707, USA.
| | - Pulakesh Upadhyaya
- Department of Surgery, Duke University School of Medicine, Durham, NC, 27707, USA
| | - Carolyn M Davis
- Department of Surgery, Emory University School of Medicine, Atlanta, GA, 30332, USA
- Emory Critical Care Center and Department of Surgery, Emory University School of Medicine, Atlanta, GA, USA
| | - Philip Yang
- Division of Pulmonary, Allergy, Critical Care, and Sleep Medicine, Emory University, Atlanta, GA, 30322, USA
- Emory Critical Care Center and Department of Surgery, Emory University School of Medicine, Atlanta, GA, USA
| | - Simon Tallowin
- Surgical Critical Care Initiative (SC2i), Uniformed Services University of the Health Sciences, Bethesda, MD, 20814, USA
- Academic Department of Military Surgery and Trauma, Royal Centre for Defence Medicine, Birmingham, UK
| | - Felipe A Lisboa
- Surgical Critical Care Initiative (SC2i), Uniformed Services University of the Health Sciences, Bethesda, MD, 20814, USA
- Department of Surgery, Uniformed Services University of the Health Sciences and Walter Reed National Military Medical Center, Bethesda, MD, 20814, USA
- Henry M. Jackson Foundation for the Advancement of Military Medicine, Inc., Bethesda, MD, 20817, USA
| | - Seth A Schobel
- Surgical Critical Care Initiative (SC2i), Uniformed Services University of the Health Sciences, Bethesda, MD, 20814, USA
- Department of Surgery, Uniformed Services University of the Health Sciences and Walter Reed National Military Medical Center, Bethesda, MD, 20814, USA
- Henry M. Jackson Foundation for the Advancement of Military Medicine, Inc., Bethesda, MD, 20817, USA
| | - Craig M Coopersmith
- Department of Surgery, Emory University School of Medicine, Atlanta, GA, 30332, USA
- Emory Critical Care Center and Department of Surgery, Emory University School of Medicine, Atlanta, GA, USA
| | - Eric A Elster
- Surgical Critical Care Initiative (SC2i), Uniformed Services University of the Health Sciences, Bethesda, MD, 20814, USA
- Department of Surgery, Uniformed Services University of the Health Sciences and Walter Reed National Military Medical Center, Bethesda, MD, 20814, USA
| | - Timothy G Buchman
- Department of Surgery, Emory University School of Medicine, Atlanta, GA, 30332, USA
- Emory Critical Care Center and Department of Surgery, Emory University School of Medicine, Atlanta, GA, USA
| | - Christopher J Dente
- Department of Surgery, Emory University School of Medicine, Atlanta, GA, 30332, USA
- Emory Critical Care Center and Department of Surgery, Emory University School of Medicine, Atlanta, GA, USA
| | - Rishikesan Kamaleswaran
- Department of Surgery, Duke University School of Medicine, Durham, NC, 27707, USA.
- Emory Critical Care Center and Department of Surgery, Emory University School of Medicine, Atlanta, GA, USA.
|
17
|
Acharya A, Shrestha S, Chen A, Conte J, Avramovic S, Sikdar S, Anastasopoulos A, Das S. Clinical risk prediction using language models: benefits and considerations. J Am Med Inform Assoc 2024; 31:1856-1864. [PMID: 38412328 PMCID: PMC11339498 DOI: 10.1093/jamia/ocae030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/02/2023] [Revised: 01/11/2024] [Accepted: 02/03/2024] [Indexed: 02/29/2024] Open
Abstract
OBJECTIVE The use of electronic health records (EHRs) for clinical risk prediction is on the rise. However, in many practical settings, the limited availability of task-specific EHR data can restrict the application of standard machine learning pipelines. In this study, we investigate the potential of leveraging language models (LMs) as a means to incorporate supplementary domain knowledge for improving the performance of various EHR-based risk prediction tasks. METHODS We propose two novel LM-based methods, namely "LLaMA2-EHR" and "Sent-e-Med." Our focus is on utilizing the textual descriptions within structured EHRs to make risk predictions about future diagnoses. We conduct a comprehensive comparison with previous approaches across various data types and sizes. RESULTS Experiments across 6 different methods and 3 separate risk prediction tasks reveal that employing LMs to represent structured EHRs, such as diagnostic histories, results in significant performance improvements when evaluated using standard metrics such as area under the receiver operating characteristic (ROC) curve and precision-recall (PR) curve. Additionally, they offer benefits such as few-shot learning, the ability to handle previously unseen medical concepts, and adaptability to various medical vocabularies. However, it is noteworthy that outcomes may exhibit sensitivity to a specific prompt. CONCLUSION LMs encompass extensive embedded knowledge, making them valuable for the analysis of EHRs in the context of risk prediction. Nevertheless, it is important to exercise caution in their application, as ongoing safety concerns related to LMs persist and require continuous consideration.
Affiliation(s)
| | | | - Anyi Chen
- Staten Island Performing Provider System, Staten Island, NY, United States
| | - Joseph Conte
- Staten Island Performing Provider System, Staten Island, NY, United States
| | | | | | | | - Sanmay Das
- George Mason University, Fairfax, VA, United States
|
18
|
Ghadi YY, Mazhar T, Shahzad T, Amir Khan M, Abd-Alrazaq A, Ahmed A, Hamam H. The role of blockchain to secure internet of medical things. Sci Rep 2024; 14:18422. [PMID: 39117650 PMCID: PMC11310483 DOI: 10.1038/s41598-024-68529-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2024] [Accepted: 07/24/2024] [Indexed: 08/10/2024] Open
Abstract
This study explores integrating blockchain technology into the Internet of Medical Things (IoMT) to address security and privacy challenges. Blockchain's transparency, confidentiality, and decentralization offer significant potential benefits in the healthcare domain. The research examines various blockchain components, layers, and protocols, highlighting their role in IoMT. It also explores IoMT applications, security challenges, and methods for integrating blockchain to enhance security. Blockchain integration can be vital in securing and managing this data while preserving patient privacy. It also opens up new possibilities in healthcare, medical research, and data management. The results provide a practical approach to handling a large amount of data from IoMT devices. This strategy makes effective use of data resource fragmentation and encryption techniques. It is essential to have well-defined standards and norms, especially in the healthcare sector, where upholding safety and protecting the confidentiality of information are critical. These results illustrate that it is essential to follow standards like HIPAA, and blockchain technology can help ensure these criteria are met. Furthermore, the study explores the potential benefits of blockchain technology for enhancing inter-system communication in the healthcare industry while maintaining patient privacy protection. The results highlight the effectiveness of blockchain's consistency and cryptographic techniques in combining identity management and healthcare data protection, protecting patient privacy and data integrity. Blockchain is an unchangeable distributed ledger system. In short, the paper provides important insights into how blockchain technology may transform the healthcare industry by effectively addressing significant challenges and generating legal, safe, and interoperable solutions. Researchers, doctors, and graduate students are the audience for our paper.
Affiliation(s)
- Yazeed Yasin Ghadi
- Department of Computer Science and Software Engineering, Al Ain University, Abu Dhabi, 15322, UAE
| | - Tehseen Mazhar
- Department of Computer Science, Virtual University of Pakistan, Lahore, 55150, Pakistan.
| | - Tariq Shahzad
- Department of Computer Science, COMSATS University Islamabad, Sahiwal Campus, Sahiwal, 57000, Pakistan
| | - Muhammad Amir Khan
- School of Computing Sciences, College of Computing, Informatics and Mathematics, Universiti Teknologi MARA, 40450, Shah Alam, Selangor, Malaysia
| | - Alaa Abd-Alrazaq
- AI Center for Precision Health, Weill Cornell Medicine-Qatar, Doha, Qatar
| | - Arfan Ahmed
- AI Center for Precision Health, Weill Cornell Medicine-Qatar, Doha, Qatar.
| | - Habib Hamam
- Faculty of Engineering, Université de Moncton, Moncton, NB, E1A3E9, Canada
- School of Electrical Engineering, Department of Electrical and Electronic Engineering Science, University of Johannesburg, Johannesburg, 2006, South Africa
- Hodmas University College, Taleh Area, Mogadishu, Somalia
- Bridges for Academic Excellence, Tunis, Tunisia
|
19
|
Jafari E, Blackman MH, Karnes JH, Van Driest SL, Crawford DC, Choi L, McDonough CW. Using electronic health records for clinical pharmacology research: Challenges and considerations. Clin Transl Sci 2024; 17:e13871. [PMID: 38943244 PMCID: PMC11213823 DOI: 10.1111/cts.13871] [Received: 03/19/2024] [Revised: 05/21/2024] [Accepted: 05/24/2024] [Indexed: 07/01/2024]
Abstract
Electronic health records (EHRs) contain a vast array of phenotypic data on large numbers of individuals, often collected over decades. Due to the wealth of information, EHR data have emerged as a powerful resource to make first discoveries and identify disparities in our healthcare system. While the number of EHR-based studies has exploded in recent years, most of these studies are directed at associations with disease rather than pharmacotherapeutic outcomes, such as drug response or adverse drug reactions. This is largely due to challenges specific to deriving drug-related phenotypes from the EHR. There is great potential for EHR-based discovery in clinical pharmacology research, and there is a critical need to address specific challenges related to accurate and reproducible derivation of drug-related phenotypes from the EHR. This review provides a detailed evaluation of challenges and considerations for deriving drug-related data from EHRs. We provide an examination of EHR-based computable phenotypes and discuss cutting-edge approaches to map medication information for clinical pharmacology research, including medication-based computable phenotypes and natural language processing. We also discuss additional considerations such as data structure, heterogeneity and missing data, rare phenotypes, and diversity within the EHR. By further understanding the complexities associated with conducting clinical pharmacology research using EHR-based data, investigators will be better equipped to design thoughtful studies with more reproducible results. Progress in utilizing EHRs for clinical pharmacology research should lead to significant advances in our ability to understand differential drug response and predict adverse drug reactions.
Affiliation(s)
- Eissa Jafari
- Department of Pharmacotherapy and Translational Research, Center for Pharmacogenomics and Precision Medicine, College of Pharmacy, University of Florida, Gainesville, Florida, USA
- Department of Pharmacy Practice, College of Pharmacy, Jazan University, Jazan, Saudi Arabia
| | - Marisa H. Blackman
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
| | - Jason H. Karnes
- Department of Pharmacy Practice and Science, University of Arizona R. Ken Coit College of Pharmacy, Tucson, Arizona, USA
| | - Sara L. Van Driest
- Department of Pediatrics, Vanderbilt University Medical Center (VUMC), Nashville, Tennessee, USA
- Present address: All of Us Research Program, National Institutes of Health, Bethesda, Maryland, USA
| | - Dana C. Crawford
- Department of Population and Quantitative Health Sciences, Cleveland Institute for Computational Biology, Case Western Reserve University, Cleveland, Ohio, USA
- Department of Genetics and Genome Sciences, Cleveland Institute for Computational Biology, Case Western Reserve University, Cleveland, Ohio, USA
| | - Leena Choi
- Department of Biostatistics and Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
| | - Caitrin W. McDonough
- Department of Pharmacotherapy and Translational Research, Center for Pharmacogenomics and Precision Medicine, College of Pharmacy, University of Florida, Gainesville, Florida, USA
|
20
|
Chen JS, Copado IA, Vallejos C, Kalaw FGP, Soe P, Cai CX, Toy BC, Borkar D, Sun CQ, Shantha JG, Baxter SL. Variations in Electronic Health Record-Based Definitions of Diabetic Retinopathy Cohorts: A Literature Review and Quantitative Analysis. OPHTHALMOLOGY SCIENCE 2024; 4:100468. [PMID: 38560278 PMCID: PMC10973665 DOI: 10.1016/j.xops.2024.100468] [Received: 07/11/2023] [Revised: 01/04/2024] [Accepted: 01/11/2024] [Indexed: 04/04/2024]
Abstract
Purpose: Use of the electronic health record (EHR) has motivated the need for data standardization. A gap in knowledge exists regarding variations in existing terminologies for defining diabetic retinopathy (DR) cohorts. This study aimed to review the literature and analyze variations regarding codified definitions of DR. Design: Literature review and quantitative analysis. Subjects: Published manuscripts. Methods: Four graders reviewed PubMed and Google Scholar for peer-reviewed studies. Studies were included if they used codified definitions of DR (e.g., billing codes). Data elements such as author names, publication year, purpose, data set type, and DR definitions were manually extracted. Each study was reviewed by ≥ 2 authors to validate inclusion eligibility. Quantitative analyses of the codified definitions were then performed to characterize the variation between DR cohort definitions. Main Outcome Measures: Number of studies included and numeric counts of billing codes used to define codified cohorts. Results: In total, 43 studies met the inclusion criteria. Half of the included studies used datasets based on structured EHR data (i.e., data registries, institutional EHR review), and half used claims data. All but 1 of the studies used billing codes such as the International Classification of Diseases 9th or 10th edition (ICD-9 or ICD-10), either alone or in addition to another terminology for defining disease. Of the 27 included studies that used ICD-9 and the 20 studies that used ICD-10 codes, the most common codes used pertained to the full spectrum of DR severity. Diabetic retinopathy complications (e.g., vitreous hemorrhage) were also used to define some DR cohorts. Conclusions: Substantial variations exist among codified definitions for DR cohorts within retrospective studies. Variable definitions may limit generalizability and reproducibility of retrospective studies. More work is needed to standardize disease cohorts.
Financial Disclosures: Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.
Affiliation(s)
- Jimmy S. Chen
- Division of Ophthalmology Informatics and Data Science, Viterbi Family Department of Ophthalmology and Shiley Eye Institute, University of California San Diego, La Jolla, California
- UCSD Health Department of Biomedical Informatics, University of California San Diego, La Jolla, California
| | - Ivan A. Copado
- Division of Ophthalmology Informatics and Data Science, Viterbi Family Department of Ophthalmology and Shiley Eye Institute, University of California San Diego, La Jolla, California
- UCSD Health Department of Biomedical Informatics, University of California San Diego, La Jolla, California
| | - Cecilia Vallejos
- Division of Ophthalmology Informatics and Data Science, Viterbi Family Department of Ophthalmology and Shiley Eye Institute, University of California San Diego, La Jolla, California
- UCSD Health Department of Biomedical Informatics, University of California San Diego, La Jolla, California
| | - Fritz Gerald P. Kalaw
- Division of Ophthalmology Informatics and Data Science, Viterbi Family Department of Ophthalmology and Shiley Eye Institute, University of California San Diego, La Jolla, California
- UCSD Health Department of Biomedical Informatics, University of California San Diego, La Jolla, California
| | - Priyanka Soe
- Division of Ophthalmology Informatics and Data Science, Viterbi Family Department of Ophthalmology and Shiley Eye Institute, University of California San Diego, La Jolla, California
- UCSD Health Department of Biomedical Informatics, University of California San Diego, La Jolla, California
| | - Cindy X. Cai
- Wilmer Eye Institute, Johns Hopkins School of Medicine, Baltimore, Maryland
| | - Brian C. Toy
- Department of Ophthalmology, Roski Eye Institute, Keck School of Medicine, University of Southern California, Los Angeles, California
| | - Durga Borkar
- Department of Ophthalmology, Duke Eye Center, Duke University, Durham, North Carolina
| | - Catherine Q. Sun
- F.I. Proctor Foundation, University of California San Francisco, San Francisco, California
- Department of Ophthalmology, University of California San Francisco, San Francisco, California
| | - Jessica G. Shantha
- F.I. Proctor Foundation, University of California San Francisco, San Francisco, California
- Department of Ophthalmology, University of California San Francisco, San Francisco, California
| | - Sally L. Baxter
- Division of Ophthalmology Informatics and Data Science, Viterbi Family Department of Ophthalmology and Shiley Eye Institute, University of California San Diego, La Jolla, California
- UCSD Health Department of Biomedical Informatics, University of California San Diego, La Jolla, California
|
21
|
Miller M, Jorm L, Partyka C, Burns B, Habig K, Oh C, Immens S, Ballard N, Gallego B. Identifying prehospital trauma patients from ambulance patient care records; comparing two methods using linked data in New South Wales, Australia. Injury 2024; 55:111570. [PMID: 38664086 DOI: 10.1016/j.injury.2024.111570] [Received: 01/18/2024] [Revised: 04/11/2024] [Accepted: 04/14/2024] [Indexed: 06/16/2024]
Abstract
BACKGROUND: Linked datasets for trauma system monitoring should ideally follow patients from the prehospital scene to hospital admission and post-discharge. Having a well-defined cohort when using administrative datasets is essential because they must capture the representative population. Unlike hospital electronic health records (EHR), ambulance patient-care records lack access to sources beyond immediate clinical notes. Relying on a limited set of variables to define a study population might result in missed patient inclusion. We aimed to compare two methods of identifying prehospital trauma patients: one using only those documented under a trauma protocol and another incorporating additional data elements from ambulance patient care records. METHODS: We analyzed data from six routinely collected administrative datasets from 2015 to 2018, including ambulance patient-care records, aeromedical data, emergency department visits, hospitalizations, rehabilitation outcomes, and death records. Three prehospital trauma cohorts were created: an Extended-T-protocol cohort (patients transported under a trauma protocol and/or patients with prespecified criteria from structured data fields), a T-protocol cohort (only patients documented as transported under a trauma protocol), and a non-T-protocol cohort (the extended-T-protocol population not in the T-protocol cohort). Patient-encounter characteristics, mortality, clinical and post-hospital discharge outcomes were compared. A conservative p-value of 0.01 was considered significant. RESULTS: Of 1 038 263 patient-encounters included in the extended-T-population, 814 729 (78.5%) were transported, with 438 893 (53.9%) documented as a T-protocol patient. Half (49.6%) of the non-T-protocol sub-cohort had an International Classification of Disease 10th edition injury or external cause code, indicating 79 644 missed patients when a T-protocol-only definition was used. The non-T-protocol sub-cohort also identified additional patients with intubation, prehospital blood transfusion and positive eFAST. A higher proportion of non-T-protocol patients than T-protocol patients were admitted to the ICU (4.6% vs 3.6%), ventilated (1.8% vs 1.3%), received in-hospital transfusion (7.9% vs 6.8%) or died (1.8% vs 1.3%). Urgent trauma surgery was similar between groups (1.3% vs 1.4%). CONCLUSION: The extended-T-population definition identified 50% more admitted patients with an ICD-10-AM code consistent with an injury, including patients with severe trauma. Developing an EHR phenotype incorporating multiple data fields of ambulance-transported trauma patients for use with linked data may avoid missing these patients.
Affiliation(s)
- Matthew Miller
- Aeromedical Operations, New South Wales Ambulance, Rozelle, NSW 2039, Australia; Department of Anesthesia, St George Hospital, Kogarah, NSW 2217 Australia; Centre for Big Data Research in Health at UNSW Sydney, Kensington, NSW 2052, Australia.
| | - Louisa Jorm
- Foundation Director of the Centre for Big Data Research in Health at UNSW Sydney, Kensington 2052, Australia
| | - Chris Partyka
- Aeromedical Operations, New South Wales Ambulance, Rozelle, NSW 2039, Australia; Department of Emergency Medicine, Royal North Shore Hospital, St Leonards, NSW 2065, Australia
| | - Brian Burns
- Aeromedical Operations, New South Wales Ambulance, Rozelle, NSW 2039, Australia; Royal North Shore Hospital, St Leonards, NSW 2065, Australia; Faculty of Medicine & Health, University of Sydney, Camperdown, NSW 2050, Australia
| | - Karel Habig
- Aeromedical Operations, New South Wales Ambulance, Rozelle, NSW 2039, Australia
| | - Carissa Oh
- Aeromedical Operations, New South Wales Ambulance, Rozelle, NSW 2039, Australia; Department of Emergency Medicine, St George Hospital, Kogarah, NSW 2217 Australia
| | - Sam Immens
- Aeromedical Operations, New South Wales Ambulance, Rozelle, NSW 2039, Australia
| | - Neil Ballard
- Aeromedical Operations, New South Wales Ambulance, Rozelle, NSW 2039, Australia; Department of Paediatric Emergency Medicine, Sydney Children's Hospital, Randwick, NSW 2031, Australia; Department of Emergency Medicine, Royal Prince Alfred Hospital, Camperdown, NSW 2050, Australia
| | - Blanca Gallego
- Clinical analytics and machine learning unit, Centre for Big Data Research in Health at UNSW Sydney, Kensington 2052, Australia
|
22
|
Newby D, Taylor N, Joyce DW, Winchester LM. Optimising the use of electronic medical records for large scale research in psychiatry. Transl Psychiatry 2024; 14:232. [PMID: 38824136 PMCID: PMC11144247 DOI: 10.1038/s41398-024-02911-1] [Received: 02/17/2023] [Revised: 04/13/2024] [Accepted: 04/15/2024] [Indexed: 06/03/2024]
Abstract
The explosion and abundance of digital data could facilitate large-scale research for psychiatry and mental health. Research using so-called "real world data" (such as electronic medical/health records) can be resource-efficient, facilitate rapid hypothesis generation and testing, complement existing evidence (e.g. from trials and evidence-synthesis) and may enable a route to translate evidence into clinically effective, outcomes-driven care for patient populations that may be under-represented. However, the interpretation and processing of real-world data sources is complex because the clinically important "signal" is often contained in both structured and unstructured (narrative or "free-text") data. Techniques for extracting meaningful information (signal) from unstructured text exist and have advanced the re-use of routinely collected clinical data, but these techniques require cautious evaluation. In this paper, we survey the opportunities, risks and progress made in the use of electronic medical record (real-world) data for psychiatric research.
Affiliation(s)
- Danielle Newby
- Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, Centre for Statistics in Medicine, University of Oxford, Oxford, UK
| | - Niall Taylor
- Department of Psychiatry, University of Oxford, Oxford, UK
| | - Dan W Joyce
- Department of Primary Care and Mental Health and Civic Health, Innovation Labs, Institute of Population Health, University of Liverpool, Liverpool, UK
|
23
|
Bazemore K, Joo J, Hwang WT, Himes BE. Clarifying Chronic Obstructive Pulmonary Disease Genetic Associations Observed in Biobanks via Mediation Analysis of Smoking. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE PROCEEDINGS 2024; 2024:499-508. [PMID: 38827081 PMCID: PMC11141825] [Indexed: 06/04/2024]
Abstract
Varying case definitions of COPD have heterogeneous genetic risk profiles, potentially reflective of disease subtypes or classification bias (e.g., smokers more likely to be diagnosed with COPD). To better understand differences in genetic loci associated with ICD-defined versus spirometry-defined COPD, we contrasted their GWAS results with those for heavy smoking among 337,138 UK Biobank participants. Overlapping risk loci were found in/near the genes ZEB2, FAM136B, CHRNA3, and CHRNA4, with the CHRNA3 locus shared across all three traits. Mediation analysis to estimate the effects of lead genotyped variants mediated by smoking found significant indirect effects for the FAM136B, CHRNA3, and CHRNA4 loci for both COPD definitions. Adjustment for mediator-outcome confounders modestly attenuated indirect effects, though in the CHRNA4 locus for spirometry-defined COPD the proportion mediated increased an additional 8.47%. Our results suggest that differences between ICD-defined and spirometry-defined COPD-associated genetic loci are not a result of smoking biasing classification.
Affiliation(s)
- Katrina Bazemore
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Jaehyun Joo
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Wei-Ting Hwang
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Blanca E Himes
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
|
24
|
Mathis M, Steffner KR, Subramanian H, Gill GP, Girardi NI, Bansal S, Bartels K, Khanna AK, Huang J. Overview and Clinical Applications of Artificial Intelligence and Machine Learning in Cardiac Anesthesiology. J Cardiothorac Vasc Anesth 2024; 38:1211-1220. [PMID: 38453558 PMCID: PMC10999327 DOI: 10.1053/j.jvca.2024.02.004] [Received: 12/25/2023] [Revised: 01/30/2024] [Accepted: 02/05/2024] [Indexed: 03/09/2024]
Abstract
Artificial intelligence- (AI) and machine learning (ML)-based applications are becoming increasingly pervasive in the healthcare setting. This has in turn challenged clinicians, hospital administrators, and health policymakers to understand such technologies and develop frameworks for safe and sustained clinical implementation. Within cardiac anesthesiology, challenges and opportunities for AI/ML to support patient care are presented by the vast amounts of electronic health data, which are collected rapidly, interpreted, and acted upon within the periprocedural area. To address such challenges and opportunities, in this article, the authors review 3 recent applications relevant to cardiac anesthesiology, including depth of anesthesia monitoring, operating room resource optimization, and transthoracic/transesophageal echocardiography, as conceptual examples to explore strengths and limitations of AI/ML within healthcare, and characterize this evolving landscape. Through reviewing such applications, the authors introduce basic AI/ML concepts and methodologies, as well as practical considerations and ethical concerns for initiating and maintaining safe clinical implementation of AI/ML-based algorithms for cardiac anesthesia patient care.
Affiliation(s)
- Michael Mathis
- Department of Anesthesiology, University of Michigan Medicine, Ann Arbor, MI
| | - Kirsten R Steffner
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, Stanford, CA
| | - Harikesh Subramanian
- Department of Anesthesiology and Perioperative Medicine, University of Pittsburgh, Pittsburgh, PA
| | - George P Gill
- Department of Anesthesiology, Cedars Sinai, Los Angeles, CA
| | | | - Sagar Bansal
- Department of Anesthesiology and Perioperative Medicine, University of Missouri School of Medicine, Columbia, MO
| | - Karsten Bartels
- Department of Anesthesiology, University of Nebraska Medical Center, Omaha, NE
| | - Ashish K Khanna
- Department of Anesthesiology, Section on Critical Care Medicine, School of Medicine, Wake Forest University, Atrium Health Wake Forest Baptist Medical Center, Winston-Salem, NC
| | - Jiapeng Huang
- Department of Anesthesiology and Perioperative Medicine, University of Louisville, Louisville, KY.
|
25
|
Cao X, Zhang S, Sha Q. A novel method for multiple phenotype association studies based on genotype and phenotype network. PLoS Genet 2024; 20:e1011245. [PMID: 38728360 PMCID: PMC11111089 DOI: 10.1371/journal.pgen.1011245] [Received: 08/18/2023] [Revised: 05/22/2024] [Accepted: 03/29/2024] [Indexed: 05/12/2024]
Abstract
Joint analysis of multiple correlated phenotypes for genome-wide association studies (GWAS) can identify and interpret pleiotropic loci which are essential to understand pleiotropy in diseases and complex traits. Meanwhile, constructing a network based on associations between phenotypes and genotypes provides a new insight to analyze multiple phenotypes, which can explore whether phenotypes and genotypes might be related to each other at a higher level of cellular and organismal organization. In this paper, we first develop a bipartite signed network by linking phenotypes and genotypes into a Genotype and Phenotype Network (GPN). The GPN can be constructed by a mixture of quantitative and qualitative phenotypes and is applicable to binary phenotypes with extremely unbalanced case-control ratios in large-scale biobank datasets. We then apply a powerful community detection method to partition phenotypes into disjoint network modules based on GPN. Finally, we jointly test the association between multiple phenotypes in a network module and a single nucleotide polymorphism (SNP). Simulations and analyses of 72 complex traits in the UK Biobank show that multiple phenotype association tests based on network modules detected by GPN are much more powerful than those without considering network modules. The newly proposed GPN provides a new insight to investigate the genetic architecture among different types of phenotypes. Multiple phenotypes association studies based on GPN are improved by incorporating the genetic information into the phenotype clustering. Notably, it might broaden the understanding of genetic architecture that exists between diagnoses, genes, and pleiotropy.
Affiliation(s)
- Xuewei Cao
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, United States of America
| | - Shuanglin Zhang
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, United States of America
| | - Qiuying Sha
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, United States of America
|
26
|
Choudhary T, Upadhyaya P, Davis CM, Yang P, Tallowin S, Lisboa FA, Schobel SA, Coopersmith CM, Elster EA, Buchman TG, Dente CJ, Kamaleswaran R. Derivation and Validation of Generalized Sepsis-induced Acute Respiratory Failure Phenotypes Among Critically Ill Patients: A Retrospective Study. RESEARCH SQUARE 2024:rs.3.rs-4307475. [PMID: 38746442 PMCID: PMC11092838 DOI: 10.21203/rs.3.rs-4307475/v1] [Indexed: 05/16/2024]
Abstract
Background: Septic patients who develop acute respiratory failure (ARF) requiring mechanical ventilation represent a heterogeneous subgroup of critically ill patients with widely variable clinical characteristics. Identifying distinct phenotypes of these patients may reveal insights about the broader heterogeneity in the clinical course of sepsis. We aimed to derive novel phenotypes of sepsis-induced ARF using observational clinical data and investigate their generalizability across multi-ICU specialties, considering multi-organ dynamics. Methods: We performed a multi-center retrospective study of ICU patients with sepsis who required mechanical ventilation for ≥24 hours. Data from two different high-volume academic hospital systems were used as a derivation set with N=3,225 medical ICU (MICU) patients and a validation set with N=848 MICU patients. For the multi-ICU validation, we utilized retrospective data from two surgical ICUs (SICUs) at the same hospitals (N=1,577). Clinical data from the 24 hours preceding intubation were used to derive distinct phenotypes using an explainable machine learning-based clustering model interpreted by clinical experts. Results: Four distinct ARF phenotypes were identified: A (severe multi-organ dysfunction (MOD) with a high likelihood of kidney injury and heart failure), B (severe hypoxemic respiratory failure [median P/F=123]), C (mild hypoxia [median P/F=240]), and D (severe MOD with a high likelihood of hepatic injury, coagulopathy, and lactic acidosis). Patients in each phenotype showed differences in clinical course and mortality rates despite similarities in demographics and admission co-morbidities. The phenotypes were reproduced in external validation utilizing an external MICU from the second hospital and SICUs from both centers. Kaplan-Meier analysis showed a significant difference in 28-day mortality across the phenotypes (p<0.01) that was consistent across both centers. The phenotypes demonstrated differences in treatment effects associated with a high positive end-expiratory pressure (PEEP) strategy. Conclusion: The phenotypes demonstrated unique patterns of organ injury and differences in clinical outcomes, which may help inform future research and clinical trial design for tailored management strategies.
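The Kaplan-Meier analysis mentioned in the abstract rests on the product-limit estimator, S(t) = ∏ (1 − d_i / n_i) over event times t_i ≤ t, where d_i is the number of deaths at t_i and n_i the number still at risk. A minimal sketch follows; the function name `kaplan_meier` and the follow-up data are hypothetical and are not taken from the study, which would also need a log-rank or similar test to obtain the reported p-value.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times  : follow-up time per patient (e.g. days since intubation)
    events : 1 if death was observed at that time, 0 if censored
    Returns a list of (time, survival probability) at each death time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing this follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            # Patients censored at t still count as at risk at t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Hypothetical 28-day follow-up for seven patients (illustration only).
example_times = [5, 8, 8, 12, 20, 28, 28]
example_events = [1, 1, 0, 1, 0, 0, 0]
example_curve = kaplan_meier(example_times, example_events)
```

Computing one such curve per phenotype and comparing them is the usual way a 28-day mortality difference between groups is visualized.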
Affiliation(s)
| | | | | | | | | | | | | | | | - Eric A Elster
- Uniformed Services University of the Health Sciences
|
27
|
Lemas DJ, Du X, Rouhizadeh M, Lewis B, Frank S, Wright L, Spirache A, Gonzalez L, Cheves R, Magalhães M, Zapata R, Reddy R, Xu K, Parker L, Harle C, Young B, Louis-Jaques A, Zhang B, Thompson L, Hogan WR, Modave F. Classifying early infant feeding status from clinical notes using natural language processing and machine learning. Sci Rep 2024; 14:7831. [PMID: 38570569 PMCID: PMC10991582 DOI: 10.1038/s41598-024-58299-x] [Received: 09/20/2023] [Accepted: 03/27/2024] [Indexed: 04/05/2024]
Abstract
The objective of this study is to develop and evaluate natural language processing (NLP) and machine learning models to predict infant feeding status from clinical notes in the Epic electronic health records system. The primary outcome was the classification of infant feeding status from clinical notes using Medical Subject Headings (MeSH) terms. Annotation of notes was completed using TeamTat to uniquely classify clinical notes according to infant feeding status. We trained 6 machine learning models to classify infant feeding status: logistic regression, random forest, XGBoost gradient descent, k-nearest neighbors, and support-vector classifier. Model comparison was evaluated based on overall accuracy, precision, recall, and F1 score. Our modeling corpus included an even number of clinical notes that was a balanced sample across each class. We manually reviewed 999 notes that represented 746 mother-infant dyads with a mean gestational age of 38.9 weeks and a mean maternal age of 26.6 years. The most frequent feeding status classification present for this study was exclusive breastfeeding [n = 183 (18.3%)], followed by exclusive formula bottle feeding [n = 146 (14.6%)], and exclusive feeding of expressed mother's milk [n = 102 (10.2%)], with mixed feeding being the least frequent [n = 23 (2.3%)]. Our final analysis evaluated the classification of clinical notes as breast, formula/bottle, and missing. The machine learning models were trained on these three classes after performing balancing and down sampling. The XGBoost model outperformed all others by achieving an accuracy of 90.1%, a macro-averaged precision of 90.3%, a macro-averaged recall of 90.1%, and a macro-averaged F1 score of 90.1%. Our results demonstrate that natural language processing can be applied to clinical notes stored in the electronic health records to classify infant feeding status. 
Early identification of breastfeeding status using NLP on unstructured electronic health records data can be used to inform precision public health interventions focused on improving lactation support for postpartum patients.
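The macro-averaged precision, recall, and F1 scores reported in the abstract are computed per class and then averaged with equal weight. A minimal sketch is given below; the function name `macro_metrics` and the confusion-matrix counts are hypothetical and do not reproduce the study's data.

```python
def macro_metrics(confusion):
    """Accuracy and macro-averaged precision, recall, and F1.

    confusion[i][j] = number of notes with true class i predicted as class j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    precisions, recalls, f1s = [], [], []
    for k in range(n):
        tp = confusion[k][k]
        predicted_k = sum(confusion[i][k] for i in range(n))  # column sum
        actual_k = sum(confusion[k])                          # row sum
        p = tp / predicted_k if predicted_k else 0.0
        r = tp / actual_k if actual_k else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)
    return {
        "accuracy": sum(confusion[k][k] for k in range(n)) / total,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical balanced 3-class matrix (breast, formula/bottle, missing).
example_confusion = [
    [9, 1, 0],
    [1, 8, 1],
    [0, 1, 9],
]
example_scores = macro_metrics(example_confusion)
```

When every class has the same number of notes, macro-averaged recall equals overall accuracy, which is consistent with the abstract reporting 90.1% for both after balancing.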
Affiliation(s)
- Dominick J Lemas
- Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, 2004 Mowry Road, Clinical and Translational Research Building, Gainesville, FL, 32610, USA.
- Department of Obstetrics and Gynecology, University of Florida College of Medicine, Gainesville, FL, 32610, USA.
| | - Xinsong Du
- Division of General Internal Medicine, Department of Medicine, Brigham and Women's Hospital, Boston, MA, 02115, USA
- Department of Medicine, Harvard Medical School, Boston, MA, 02115, USA
| | - Masoud Rouhizadeh
- Department of Pharmaceutical Outcomes and Policy, University of Florida College of Medicine, Gainesville, FL, 32610, USA
- Biomedical Informatics and Data Science Section, Division of General Internal Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, 21205, USA
| | - Braeden Lewis
- Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, 2004 Mowry Road, Clinical and Translational Research Building, Gainesville, FL, 32610, USA
| | - Simon Frank
- Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, 2004 Mowry Road, Clinical and Translational Research Building, Gainesville, FL, 32610, USA
| | - Lauren Wright
- Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, 2004 Mowry Road, Clinical and Translational Research Building, Gainesville, FL, 32610, USA
| | - Alex Spirache
- Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, 2004 Mowry Road, Clinical and Translational Research Building, Gainesville, FL, 32610, USA
| | - Lisa Gonzalez
- Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, 2004 Mowry Road, Clinical and Translational Research Building, Gainesville, FL, 32610, USA
| | - Ryan Cheves
- Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, 2004 Mowry Road, Clinical and Translational Research Building, Gainesville, FL, 32610, USA
| | - Marina Magalhães
- Division of Neonatal and Developmental Medicine, Department of Pediatrics, Stanford University School of Medicine, Palo Alto, CA, 94305, USA
| | - Ruben Zapata
- Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, 2004 Mowry Road, Clinical and Translational Research Building, Gainesville, FL, 32610, USA
| | - Rahul Reddy
- Department of Computer and Information Science, Herbert Wertheim College of Engineering, University of Florida, Gainesville, FL, 32611, USA
| | - Ke Xu
- Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, 2004 Mowry Road, Clinical and Translational Research Building, Gainesville, FL, 32610, USA
| | - Leslie Parker
- Department of Biobehavioral Nursing Science, University of Florida College of Nursing, Gainesville, FL, 32603, USA
| | - Chris Harle
- Health Policy and Management Department, Richard M. Fairbanks School of Public Health, Indiana University-Purdue University Indianapolis, Indianapolis, IN, 46202, USA
| | - Bridget Young
- Division of Breastfeeding and Lactation Medicine, University of Rochester Medical Center, Rochester, NY, 14642, USA
| | - Adetola Louis-Jaques
- Department of Obstetrics and Gynecology, University of Florida College of Medicine, Gainesville, FL, 32610, USA
| | - Bouri Zhang
- Health Science Center Libraries, University of Florida, Gainesville, FL, 32610, USA
| | - Lindsay Thompson
- Department of Pediatrics, Wake Forest School of Medicine, Winston-Salem, NC, 27101, USA
| | - William R Hogan
- Data Science Institute, Medical College of Wisconsin, Milwaukee, WI, 53226, USA
| | - François Modave
- Department of Anesthesiology, University of Florida College of Medicine, Gainesville, FL, 32610, USA
|
28
|
Levites Strekalova YA, Wang X, Sanchez O, Midence S. Trends in publication and levels of social determinants of health reporting in Journal of Clinical and Translational Science from 2017 to 2023. J Clin Transl Sci 2024; 8:e58. [PMID: 38655458 PMCID: PMC11036436 DOI: 10.1017/cts.2024.508] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/30/2023] [Revised: 03/13/2024] [Accepted: 03/19/2024] [Indexed: 04/26/2024]
Abstract
Social determinants of health affect clinical and translational research processes and outcomes but remain underreported in empirical studies. This scoping review examined the rate and types of social determinants of health (SDoH) variables included in the JCTS translational research studies published between 2017 and 2023 and included 129 studies. Most papers (91.7%) reported at least one SDoH variable with age, race and ethnicity, and sex included most often. Future studies to inform the role of SDoH data in translational research and science are recommended, and a draft SDoH data checklist is provided.
Affiliation(s)
- Yulia A. Levites Strekalova
- Department of Health Services Research, Management and Policy, College of Public Health and Health Professions, University of Florida, Gainesville, FL, USA
- Clinical and Translational Science Institute, University of Florida, Gainesville, FL, USA
| | - Xiangren Wang
- Department of Health Services Research, Management and Policy, College of Public Health and Health Professions, University of Florida, Gainesville, FL, USA
| | - Orlando Sanchez
- Clinical and Translational Science Institute, University of Florida, Gainesville, FL, USA
| | - Sara Midence
- Department of Health Services Research, Management and Policy, College of Public Health and Health Professions, University of Florida, Gainesville, FL, USA
|
29
|
Clarke H, Fitzcharles MA. Are Electronic Health Records Sufficiently Accurate to Phenotype Rheumatology Patients With Chronic Pain? J Rheumatol 2024; 51:218-220. [PMID: 38224990 DOI: 10.3899/jrheum.2023-1227] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/17/2024]
Affiliation(s)
- Hance Clarke
- H. Clarke, MD, PhD, Department of Anesthesiology and Pain Medicine, University of Toronto, Department of Anesthesia and Pain Management, Pain Research Unit, Toronto General Hospital, and Transitional Pain Service, Toronto General Hospital, Toronto, Ontario
| | - Mary-Ann Fitzcharles
- M.A. Fitzcharles, MB ChB, Department of Rheumatology, McGill University, Montreal, and Alan Edwards Pain Management Unit, McGill University, Montreal, Canada.
|
30
|
Al-Sahab B, Leviton A, Loddenkemper T, Paneth N, Zhang B. Biases in Electronic Health Records Data for Generating Real-World Evidence: An Overview. JOURNAL OF HEALTHCARE INFORMATICS RESEARCH 2024; 8:121-139. [PMID: 38273982 PMCID: PMC10805748 DOI: 10.1007/s41666-023-00153-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2023] [Revised: 09/05/2023] [Accepted: 11/07/2023] [Indexed: 01/27/2024]
Abstract
Electronic Health Records (EHR) are increasingly being perceived as a unique source of data for clinical research as they provide unprecedentedly large volumes of real-time data from real-world settings. In this review of the secondary uses of EHR, we identify the anticipated breadth of opportunities, pointing out the data deficiencies and potential biases that are likely to limit the search for true causal relationships. This paper provides a comprehensive overview of the types of biases that arise along the pathways that generate real-world evidence and the sources of these biases. We distinguish between two levels in the production of EHR data where biases are likely to arise: (i) at the healthcare system level, where the principal source of bias resides in access to, and provision of, medical care, and in the acquisition and documentation of medical and administrative data; and (ii) at the research level, where biases arise from the processes of extracting, analyzing, and interpreting these data. Due to the plethora of biases, mainly in the form of selection and information bias, we conclude with advising extreme caution about making causal inferences based on secondary uses of EHRs.
Affiliation(s)
- Ban Al-Sahab
- Department of Family Medicine, College of Human Medicine, Michigan State University, B100 Clinical Center, 788 Service Road, East Lansing, MI USA
| | - Alan Leviton
- Department of Neurology, Harvard Medical School, Boston, MA USA
- Department of Neurology, Boston Children’s Hospital, Boston, MA USA
| | - Tobias Loddenkemper
- Department of Neurology, Harvard Medical School, Boston, MA USA
- Department of Neurology, Boston Children’s Hospital, Boston, MA USA
| | - Nigel Paneth
- Department of Epidemiology and Biostatistics, College of Human Medicine, Michigan State University, East Lansing, MI USA
- Department of Pediatrics and Human Development, College of Human Medicine, Michigan State University, East Lansing, MI USA
| | - Bo Zhang
- Department of Neurology, Boston Children’s Hospital, Boston, MA USA
- Biostatistics and Research Design, Institutional Centers of Clinical and Translational Research, Boston Children’s Hospital, Boston, MA USA
- Harvard Medical School, Boston, MA USA
|
31
|
Kashkoush J, Gupta M, Meissner MA, Nielsen ME, Kirchner HL, Garg T. Performance Characteristics of a Rule-Based Electronic Health Record Algorithm to Identify Patients with Gross and Microscopic Hematuria. Methods Inf Med 2023; 62:183-192. [PMID: 37666279 DOI: 10.1055/a-2165-5552] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/06/2023]
Abstract
BACKGROUND: Two million patients per year are referred to urologists for hematuria, or blood in the urine. The American Urological Association recently adopted a risk-stratified hematuria evaluation guideline to limit multi-phase computed tomography to individuals at highest risk of occult malignancy. OBJECTIVES: To understand population-level hematuria evaluations, we developed an algorithm to accurately identify hematuria cases from electronic health records (EHRs). METHODS: We used International Classification of Diseases (ICD)-9/ICD-10 diagnosis codes, urine color, and urine microscopy values to identify hematuria cases and to differentiate between gross and microscopic hematuria. Using an iterative process, we refined the ICD-9 algorithm on a gold standard, chart-reviewed cohort of 3,094 hematuria cases, and the ICD-10 algorithm on a 300-patient cohort. We applied the algorithm to Geisinger patients ≥35 years (n = 539,516) and determined performance by conducting chart review (n = 500). RESULTS: After applying the hematuria algorithm, we identified 51,500 hematuria cases and 488,016 clean controls. Of the hematuria cases, 11,435 were categorized as gross, 26,658 as microscopic, 12,562 as indeterminate, and 845 were uncategorized. The positive predictive value (PPV) of identifying hematuria cases using the algorithm was 100% and the negative predictive value (NPV) was 99%. The gross hematuria algorithm had a PPV of 100% and NPV of 99%. The microscopic hematuria algorithm had a lower PPV of 78% and an NPV of 100%. CONCLUSION: We developed an algorithm utilizing diagnosis codes and urine laboratory values to accurately identify hematuria and categorize it as gross or microscopic in EHRs. Applying the algorithm will help researchers to understand patterns of care for this common condition.
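As a small worked check of the figures in this abstract, the sketch below confirms that the reported counts add up and shows how PPV and NPV are defined. The cohort counts are taken from the abstract; the chart-review split (250 algorithm-positive and 250 algorithm-negative records) is an assumption made for illustration, since the abstract reports only the total of 500 reviewed charts.

```python
def ppv(true_pos: int, false_pos: int) -> float:
    """Positive predictive value: share of algorithm-positive records confirmed by review."""
    return true_pos / (true_pos + false_pos)


def npv(true_neg: int, false_neg: int) -> float:
    """Negative predictive value: share of algorithm-negative records confirmed by review."""
    return true_neg / (true_neg + false_neg)


# Counts reported in the abstract.
cohort = 539_516
cases = 51_500
controls = 488_016
subtypes = {
    "gross": 11_435,
    "microscopic": 26_658,
    "indeterminate": 12_562,
    "uncategorized": 845,
}

counts_consistent = (cases + controls == cohort) and (sum(subtypes.values()) == cases)

# Hypothetical chart-review outcome (NOT from the study): 250 flagged records,
# all confirmed; 250 unflagged records, 2 of which turn out to be cases.
example_ppv = ppv(true_pos=250, false_pos=0)
example_npv = npv(true_neg=248, false_neg=2)
```

Under these assumed review counts the sketch yields a PPV of 100% and an NPV of about 99%, the same order as the reported values; the actual review breakdown may differ.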
Affiliation(s)
- Jasmine Kashkoush
- Department of Urology, Geisinger, Danville, Pennsylvania, United States
| | - Mudit Gupta
- Phenomic Analytics and Clinical Data Core, Geisinger, Danville, Pennsylvania, United States
| | | | - Matthew E Nielsen
- Department of Urology, University of North Carolina at Chapel Hill School of Medicine, Chapel Hill, North Carolina, United States
- Department of Epidemiology, University of North Carolina at Chapel Hill, Gillings School of Global Public Health, Chapel Hill, North Carolina, United States
- Department of Health Policy & Management, University of North Carolina at Chapel Hill, Gillings School of Global Public Health, Chapel Hill, North Carolina, United States
| | - H Lester Kirchner
- Department of Population Health Sciences, Geisinger, Danville, Pennsylvania, United States
| | - Tullika Garg
- Department of Population Health Sciences, Geisinger, Danville, Pennsylvania, United States
- Department of Urology, Penn State Health Milton S. Hershey Medical Center, Hershey, Pennsylvania, United States
|
32
|
Chen Q, Dwaraka VB, Carreras-Gallo N, Mendez K, Chen Y, Begum S, Kachroo P, Prince N, Went H, Mendez T, Lin A, Turner L, Moqri M, Chu SH, Kelly RS, Weiss ST, Rattray NJ, Gladyshev VN, Karlson E, Wheelock C, Mathé EA, Dahlin A, McGeachie MJ, Smith R, Lasky-Su JA. OMICmAge: An integrative multi-omics approach to quantify biological age with electronic medical records. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.10.16.562114. [PMID: 37904959 PMCID: PMC10614756 DOI: 10.1101/2023.10.16.562114] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/02/2023]
Abstract
Biological aging is a multifactorial process involving complex interactions of cellular and biochemical processes that is reflected in omic profiles. Using common clinical laboratory measures in ~30,000 individuals from the MGB-Biobank, we developed a robust, predictive biological aging phenotype, EMRAge, that balances clinical biomarkers with overall mortality risk and can be broadly recapitulated across EMRs. We then applied elastic-net regression to model EMRAge with DNA-methylation (DNAm) and multiple omics, generating DNAmEMRAge and OMICmAge, respectively. Both biomarkers demonstrated strong associations with chronic diseases and mortality that outperform current biomarkers across our discovery (MGB-ABC, n=3,451) and validation (TruDiagnostic, n=12,666) cohorts. Through the use of epigenetic biomarker proxies, OMICmAge has the unique advantage of expanding the predictive search space to include epigenomic, proteomic, metabolomic, and clinical data while distilling this in a measure with DNAm alone, providing opportunities to identify clinically-relevant interconnections central to the aging process.
Affiliation(s)
- Qingwen Chen
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
| | | | | | - Kevin Mendez
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
| | - Yulu Chen
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
| | - Sofina Begum
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
| | - Priyadarshini Kachroo
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
| | - Nicole Prince
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
| | | | | | - Aaron Lin
- TruDiagnostic, Inc., Lexington, KY USA
| | | | - Mahdi Moqri
- Division of Genetics, Dept. of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA
- Department of Genetics, School of Medicine, Stanford University, Stanford, CA, USA
| | - Su H. Chu
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
| | - Rachel S. Kelly
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
| | - Scott T. Weiss
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
| | - Nicholas J.W. Rattray
- Strathclyde Institute of Pharmacy and Biomedical Sciences, University of Strathclyde, Glasgow, UK
- Strathclyde Centre for Molecular Bioscience, University of Strathclyde, Glasgow, UK
| | - Vadim N. Gladyshev
- Division of Genetics, Dept. of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA
| | - Elizabeth Karlson
- Department of Personalized Medicine, Mass General Brigham and Harvard Medical School, Boston, MA, USA
| | - Craig Wheelock
- Division of Physiological Chemistry 2, Dept of Medical Biochemistry and Biophysics, Karolinska Institute, Stockholm, Sweden
| | - Ewy A. Mathé
- Division of Preclinical Innovation, National Center for Advancing Translational Science, National Institutes of Health, Rockville, MD, USA
| | - Amber Dahlin
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
| | - Michael J. McGeachie
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
| | | | - Jessica A. Lasky-Su
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
|
33
|
Nealon CL, Halladay CW, Gorman BR, Simpson P, Roncone DP, Canania RL, Anthony SA, Rogers LRS, Leber JN, Dougherty JM, Bailey JNC, Crawford DC, Sullivan JM, Galor A, Wu WC, Greenberg PB, Million Veteran Program, Lass JH, Iyengar SK, Peachey NS. Association Between Fuchs Endothelial Corneal Dystrophy, Diabetes Mellitus, and Multimorbidity. Cornea 2023; 42:1140-1149. [PMID: 37170406 PMCID: PMC10523841 DOI: 10.1097/ico.0000000000003311] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 12/09/2022] [Accepted: 04/11/2023] [Indexed: 05/13/2023]
Abstract
PURPOSE: The aim of this study was to assess risk for demographic variables and other health conditions that are associated with Fuchs endothelial corneal dystrophy (FECD). METHODS: We developed a FECD case-control algorithm based on structured electronic health record data and confirmed accuracy by individual review of charts at 3 Veterans Affairs (VA) Medical Centers. This algorithm was applied to the Department of VA Million Veteran Program cohort from whom sex, genetic ancestry, comorbidities, diagnostic phecodes, and laboratory values were extracted. Single-variable and multiple variable logistic regression models were used to determine the association of these risk factors with FECD diagnosis. RESULTS: Being a FECD case was associated with female sex, European genetic ancestry, and a greater number of comorbidities. Of 1417 diagnostic phecodes evaluated, 213 had a significant association with FECD, falling in both ocular and nonocular conditions, including diabetes mellitus (DM). Five of 69 laboratory values were associated with FECD, with the direction of change for 4 being consistent with DM. Insulin dependency and type 1 DM raised risk to a greater degree than type 2 DM, like other microvascular diabetic complications. CONCLUSIONS: Female sex, European ancestry, and multimorbidity increased FECD risk. Endocrine/metabolic clinic encounter codes and altered patterns of laboratory values support DM increasing FECD risk. Our results evoke a threshold model in which the FECD phenotype is intensified by DM and potentially other health conditions that alter corneal physiology. Further studies to better understand the relationship between FECD and DM are indicated and may help identify opportunities for slowing FECD progression.
Affiliation(s)
- Cari L. Nealon
- Eye Clinic, VA Northeast Ohio Healthcare System, Cleveland, Ohio, USA
| | - Christopher W. Halladay
- Center of Innovation in Long Term Services and Supports, Providence VA Medical Center, Providence, Rhode Island, USA
| | - Bryan R. Gorman
- VA Cooperative Studies Program, VA Boston Healthcare System, Boston, Massachusetts, USA
- Booz Allen Hamilton, McLean, Virginia, USA
| | - Piana Simpson
- Eye Clinic, VA Northeast Ohio Healthcare System, Cleveland, Ohio, USA
| | - David P. Roncone
- Eye Clinic, VA Northeast Ohio Healthcare System, Cleveland, Ohio, USA
| | | | - Scott A. Anthony
- Eye Clinic, VA Northeast Ohio Healthcare System, Cleveland, Ohio, USA
| | | | - Jenna N. Leber
- Ophthalmology Section, VA Western NY Health Care System, Buffalo, New York, USA
| | | | - Jessica N. Cooke Bailey
- Cleveland Institute for Computational Biology, Case Western Reserve University School of Medicine, Cleveland, Ohio, USA
- Department of Population & Quantitative Health Sciences, Case Western Reserve University School of Medicine, Cleveland, Ohio, USA
- Research Service, VA Northeast Ohio Healthcare System, Cleveland, Ohio, USA
| | - Dana C. Crawford
- Cleveland Institute for Computational Biology, Case Western Reserve University School of Medicine, Cleveland, Ohio, USA
- Department of Population & Quantitative Health Sciences, Case Western Reserve University School of Medicine, Cleveland, Ohio, USA
- Research Service, VA Northeast Ohio Healthcare System, Cleveland, Ohio, USA
| | - Jack M. Sullivan
- Ophthalmology Section, VA Western NY Health Care System, Buffalo, New York, USA
- Research Service, VA Western NY Health Care System, Buffalo, New York, USA
- Department of Ophthalmology (Ross Eye Institute), University at Buffalo-SUNY, Buffalo, New York, USA
| | - Anat Galor
- Miami Veterans Affairs Medical Center, Miami, Florida, USA
- Bascom Palmer Eye Institute, University of Miami, Miami, Florida, USA
| | - Wen-Chih Wu
- Cardiology Section, Medical Service, Providence VA Medical Center, Providence, Rhode Island, USA
| | - Paul B. Greenberg
- Ophthalmology Section, Providence VA Medical Center, Providence, Rhode Island, USA
- Division of Ophthalmology, Alpert Medical School, Brown University, Providence, Rhode Island, USA
| | | | - Jonathan H. Lass
- Department of Ophthalmology & Visual Sciences, Case Western Reserve University, Cleveland, Ohio, USA
- University Hospitals Eye Institute, Cleveland, Ohio, USA
| | - Sudha K. Iyengar
- Cleveland Institute for Computational Biology, Case Western Reserve University School of Medicine, Cleveland, Ohio, USA
- Department of Population & Quantitative Health Sciences, Case Western Reserve University School of Medicine, Cleveland, Ohio, USA
- Research Service, VA Northeast Ohio Healthcare System, Cleveland, Ohio, USA
| | - Neal S. Peachey
- Research Service, VA Northeast Ohio Healthcare System, Cleveland, Ohio, USA
- Cole Eye Institute, Cleveland Clinic Foundation, Cleveland, Ohio, USA
- Department of Ophthalmology, Cleveland Clinic Lerner College of Medicine of Case Western Reserve University, Cleveland, Ohio, USA
|
34
|
Sathe NA, Xian S, Mabrey FL, Crosslin DR, Mooney SD, Morrell ED, Lybarger K, Yetisgen M, Jarvik GP, Bhatraju PK, Wurfel MM. Evaluating construct validity of computable acute respiratory distress syndrome definitions in adults hospitalized with COVID-19: an electronic health records based approach. BMC Pulm Med 2023; 23:292. [PMID: 37559024 PMCID: PMC10413524 DOI: 10.1186/s12890-023-02560-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2023] [Accepted: 07/11/2023] [Indexed: 08/11/2023]
Abstract
BACKGROUND: Evolving ARDS epidemiology and management during COVID-19 have prompted calls to reexamine the construct validity of Berlin criteria, which have been rarely evaluated in real-world data. We developed a Berlin ARDS definition (EHR-Berlin) computable in electronic health records (EHR) to (1) assess its construct validity, and (2) assess how expanding its criteria affected validity. METHODS: We performed a retrospective cohort study at two tertiary care hospitals with one EHR, among adults hospitalized with COVID-19 February 2020-March 2021. We assessed five candidate definitions for ARDS: the EHR-Berlin definition modeled on Berlin criteria, and four alternatives informed by recent proposals to expand criteria and include patients on high-flow oxygen (EHR-Alternative 1), relax imaging criteria (EHR-Alternatives 2-3), and extend timing windows (EHR-Alternative 4). We evaluated two aspects of construct validity for the EHR-Berlin definition: (1) criterion validity: agreement with manual ARDS classification by experts, available in 175 patients; (2) predictive validity: relationships with hospital mortality, assessed by Pearson r and by area under the receiver operating characteristic curve (AUROC). We assessed predictive validity and timing of identification of EHR-Berlin definition compared to alternative definitions. RESULTS: Among 765 patients, mean (SD) age was 57 (18) years and 471 (62%) were male. The EHR-Berlin definition classified 171 (22%) patients as ARDS, which had high agreement with manual classification (kappa 0.85), and was associated with mortality (Pearson r = 0.39; AUROC 0.72, 95% CI 0.68, 0.77). In comparison, EHR-Alternative 1 classified 219 (29%) patients as ARDS, maintained similar relationships to mortality (r = 0.40; AUROC 0.74, 95% CI 0.70, 0.79, Delong test P = 0.14), and identified patients earlier in their hospitalization (median 13 vs. 15 h from admission, Wilcoxon signed-rank test P < 0.001). EHR-Alternative 3, which removed imaging criteria, had similar correlation (r = 0.41) but better discrimination for mortality (AUROC 0.76, 95% CI 0.72, 0.80; P = 0.036), and identified patients a median of 2 h (P < 0.001) from admission. CONCLUSIONS: The EHR-Berlin definition can enable ARDS identification with high criterion validity, supporting large-scale study and surveillance. There are opportunities to expand the Berlin criteria that preserve predictive validity and facilitate earlier identification.
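The agreement statistic reported here (kappa 0.85 between the EHR-Berlin definition and expert classification) can be illustrated with a minimal sketch of Cohen's kappa for two binary raters. The 2x2 counts below are invented to total 175 patients and are not the study's data; the function name `cohen_kappa` is hypothetical.

```python
def cohen_kappa(both_yes: int, a_only: int, b_only: int, both_no: int) -> float:
    """Cohen's kappa for two binary raters.

    both_yes: rater A yes, rater B yes
    a_only:   rater A yes, rater B no
    b_only:   rater A no,  rater B yes
    both_no:  rater A no,  rater B no
    """
    n = both_yes + a_only + b_only + both_no
    observed = (both_yes + both_no) / n
    a_yes = (both_yes + a_only) / n
    b_yes = (both_yes + b_only) / n
    # Agreement expected by chance if the two raters were independent.
    expected = a_yes * b_yes + (1 - a_yes) * (1 - b_yes)
    return (observed - expected) / (1 - expected)


# Hypothetical counts (NOT from the study): rater A is the computable
# definition, rater B is the expert review, 175 patients in total.
kappa = cohen_kappa(both_yes=60, a_only=5, b_only=6, both_no=104)
```

Kappa corrects raw agreement for the agreement expected by chance, so it is lower than the simple proportion of matching classifications (here 164 of 175).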
Affiliation(s)
- Neha A Sathe
- Division of Pulmonary, Critical Care and Sleep Medicine, Department of Medicine, University of Washington, 325 9th Avenue HMC #359640, Seattle, WA, 98104-2499, USA.
| | - Su Xian
- Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, USA
| | - F Linzee Mabrey
- Division of Pulmonary, Critical Care and Sleep Medicine, Department of Medicine, University of Washington, 325 9th Avenue HMC #359640, Seattle, WA, 98104-2499, USA
| | - David R Crosslin
- Division of Biomedical Informatics and Genomics, John W. Deming Department of Medicine, Tulane University School of Medicine, New Orleans, LA, USA
| | - Sean D Mooney
- Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, USA
| | - Eric D Morrell
- Division of Pulmonary, Critical Care and Sleep Medicine, Department of Medicine, University of Washington, 325 9th Avenue HMC #359640, Seattle, WA, 98104-2499, USA
| | - Kevin Lybarger
- Department of Information Sciences and Technology, George Mason University, Fairfax, VA, USA
| | - Meliha Yetisgen
- Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, USA
| | - Gail P Jarvik
- Department of Genome Sciences and Division of Medical Genetics, Department of Medicine, University of Washington Medical Center, Seattle, WA, USA
| | - Pavan K Bhatraju
- Division of Pulmonary, Critical Care and Sleep Medicine, Department of Medicine, University of Washington, 325 9th Avenue HMC #359640, Seattle, WA, 98104-2499, USA
| | - Mark M Wurfel
- Division of Pulmonary, Critical Care and Sleep Medicine, Department of Medicine, University of Washington, 325 9th Avenue HMC #359640, Seattle, WA, 98104-2499, USA
|
35
|
Penrod N, Okeh C, Velez Edwards DR, Barnhart K, Senapati S, Verma SS. Leveraging electronic health record data for endometriosis research. Front Digit Health 2023; 5:1150687. [PMID: 37342866 PMCID: PMC10278662 DOI: 10.3389/fdgth.2023.1150687] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 01/24/2023] [Accepted: 05/10/2023] [Indexed: 06/23/2023]
Abstract
Endometriosis is a chronic, complex disease for which there are vast disparities in diagnosis and treatment between sociodemographic groups. Clinical presentation of endometriosis can vary from asymptomatic disease, often identified during (in)fertility consultations, to dysmenorrhea and debilitating pelvic pain. Because of this complexity, delayed diagnosis (mean time to diagnosis is 1.7-3.6 years) and misdiagnosis are common. Early and accurate diagnosis of endometriosis remains a research priority for patient advocates and healthcare providers. Electronic health records (EHRs) have been widely adopted as a data source in biomedical research. However, they remain a largely untapped source of data for endometriosis research. EHRs capture diverse, real-world patient populations and care trajectories and can be used to learn patterns of underlying risk factors for endometriosis. These patterns, in turn, can inform screening guidelines that help clinicians efficiently and effectively recognize and diagnose the disease in all patient populations, reducing inequities in care. Here, we provide an overview of the advantages and limitations of using EHR data to study endometriosis. We describe the prevalence of endometriosis observed in diverse populations from multiple healthcare institutions, examples of variables that can be extracted from EHRs to enhance the accuracy of endometriosis prediction, and opportunities to leverage longitudinal EHR data to improve our understanding of long-term health consequences for all patients.
Affiliation(s)
- Nadia Penrod
- College of Agriculture and Life Sciences, Texas A&M University, College Station, TX, United States
| | - Chelsea Okeh
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, Philadelphia, PA, United States
| | - Digna R. Velez Edwards
- Department of Obstetrics and Gynecology, Vanderbilt University, Nashville, TN, United States
| | - Kurt Barnhart
- Department of Obstetrics and Gynecology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
| | - Suneeta Senapati
- Department of Obstetrics and Gynecology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
| | - Shefali S. Verma
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, Philadelphia, PA, United States
|
36
|
Deutsch AJ, Stalbow L, Majarian TD, Mercader JM, Manning AK, Florez JC, Loos RJ, Udler MS. Polygenic Scores Help Reduce Racial Disparities in Predictive Accuracy of Automated Type 1 Diabetes Classification Algorithms. Diabetes Care 2023; 46:794-800. [PMID: 36745605 PMCID: PMC10090893 DOI: 10.2337/dc22-1833] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2022] [Accepted: 01/10/2023] [Indexed: 02/07/2023]
Abstract
OBJECTIVE: Automated algorithms to identify individuals with type 1 diabetes using electronic health records are increasingly used in biomedical research. It is not known whether the accuracy of these algorithms differs by self-reported race. We investigated whether polygenic scores improve identification of individuals with type 1 diabetes. RESEARCH DESIGN AND METHODS: We investigated two large hospital-based biobanks (Mass General Brigham [MGB] and BioMe) and identified individuals with type 1 diabetes using an established automated algorithm. We performed medical record reviews to validate the diagnosis of type 1 diabetes. We implemented two published polygenic scores for type 1 diabetes (developed in individuals of European or African ancestry). We assessed the classification algorithm before and after incorporating polygenic scores. RESULTS: The automated algorithm was more likely to incorrectly assign a diagnosis of type 1 diabetes in self-reported non-White individuals than in self-reported White individuals (odds ratio 3.45; 95% CI 1.54-7.69; P = 0.0026). After incorporating polygenic scores into the MGB Biobank, the positive predictive value of the type 1 diabetes algorithm increased from 70 to 97% for self-reported White individuals (meaning that 97% of those predicted to have type 1 diabetes indeed had type 1 diabetes) and from 53 to 100% for self-reported non-White individuals. Similar results were found in BioMe. CONCLUSIONS: Automated phenotyping algorithms may exacerbate health disparities because of an increased risk of misclassification of individuals from underrepresented populations. Polygenic scores may be used to improve the performance of phenotyping algorithms and potentially reduce this disparity.
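To make the reported odds ratio and confidence interval easier to read, here is a minimal sketch of an odds ratio with a Wald 95% confidence interval from a 2x2 table. The abstract does not state how its interval was computed, so the Wald method is an assumption, and the counts are invented for illustration.

```python
import math
from typing import Tuple


def odds_ratio_wald_ci(a: int, b: int, c: int, d: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Odds ratio and Wald confidence interval for a 2x2 table.

    a, b: misclassified / correctly classified in group 1
    c, d: misclassified / correctly classified in group 2
    All four cells must be non-zero.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts (NOT from the study).
or_value, ci_lower, ci_upper = odds_ratio_wald_ci(a=20, b=10, c=10, d=20)

# A Wald interval is symmetric on the log scale, so the geometric mean of its
# bounds equals the point estimate. The published bounds behave similarly.
published_geometric_mean = math.sqrt(1.54 * 7.69)
```

The geometric mean of the published bounds (1.54 and 7.69) is about 3.44, close to the reported odds ratio of 3.45, which is what one would expect from an interval built on the log scale.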
Affiliation(s)
- Aaron J. Deutsch
- Diabetes Unit and Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA
- Programs in Metabolism and Medical & Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA
- Department of Medicine, Harvard Medical School, Boston, MA
| | - Lauren Stalbow
- Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY
| | - Timothy D. Majarian
- Programs in Metabolism and Medical & Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA
| | - Josep M. Mercader
- Diabetes Unit and Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA
- Programs in Metabolism and Medical & Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA
- Department of Medicine, Harvard Medical School, Boston, MA
| | - Alisa K. Manning
- Programs in Metabolism and Medical & Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA
- Department of Medicine, Harvard Medical School, Boston, MA
- Clinical and Translational Epidemiology Unit, Mongan Institute, Massachusetts General Hospital, Boston, MA
| | - Jose C. Florez
- Diabetes Unit and Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA
- Programs in Metabolism and Medical & Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA
- Department of Medicine, Harvard Medical School, Boston, MA
| | - Ruth J.F. Loos
- Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY
- Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Miriam S. Udler
- Diabetes Unit and Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA
- Programs in Metabolism and Medical & Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA
- Department of Medicine, Harvard Medical School, Boston, MA
| |
|
37
|
Master H, Annis J, Huang S, Beckman JA, Ratsimbazafy F, Marginean K, Carroll R, Natarajan K, Harrell FE, Roden DM, Harris P, Brittain EL. Association of step counts over time with the risk of chronic disease in the All of Us Research Program. Nat Med 2022; 28:2301-2308. [PMID: 36216933 PMCID: PMC9671804 DOI: 10.1038/s41591-022-02012-w] [Citation(s) in RCA: 90] [Impact Index Per Article: 30.0] [Received: 03/28/2022] [Accepted: 08/15/2022] [Indexed: 01/14/2023]
Abstract
The association between physical activity and human disease has not been examined using commercial devices linked to electronic health records. Using the electronic health records data from the All of Us Research Program, we show that step count volumes as captured by participants' own Fitbit devices were associated with risk of chronic disease across the entire human phenome. Of the 6,042 participants included in the study, 73% were female, 84% were white and 71% had a college degree, and participants had a median age of 56.7 (interquartile range 41.5-67.6) years and body mass index of 28.1 (24.3-32.9) kg/m². Participants walked a median of 7,731.3 (5,866.8-9,826.8) steps per day over the median activity monitoring period of 4.0 (2.2-5.6) years with a total of 5.9 million person-days of monitoring. The relationship between steps per day and incident disease was inverse and linear for obesity (n = 368), sleep apnea (n = 348), gastroesophageal reflux disease (n = 432) and major depressive disorder (n = 467), with values above 8,200 daily steps associated with protection from incident disease. The relationships with incident diabetes (n = 156) and hypertension (n = 482) were nonlinear with no further risk reduction above 8,000-9,000 steps. Although validation in a more diverse sample is needed, these findings provide a real-world evidence-base for clinical guidance regarding activity levels that are necessary to reduce disease risk.
Affiliation(s)
- Hiral Master
- Vanderbilt Institute of Clinical and Translational Research, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Jeffrey Annis
- Vanderbilt Institute of Clinical and Translational Research, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Shi Huang
- Department of Biostatistics, Vanderbilt University School of Medicine, Nashville, TN, USA
| | - Joshua A Beckman
- Division of Cardiovascular Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Francis Ratsimbazafy
- Vanderbilt Institute of Clinical and Translational Research, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Kayla Marginean
- Vanderbilt Institute of Clinical and Translational Research, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Robert Carroll
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Karthik Natarajan
- Department of Biomedical Informatics, Columbia University, New York, NY, USA
| | - Frank E Harrell
- Department of Biostatistics, Vanderbilt University School of Medicine, Nashville, TN, USA
| | - Dan M Roden
- Department of Medicine and Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Paul Harris
- Department of Biomedical Informatics, Biomedical Engineering and Biostatistics, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Evan L Brittain
- Division of Cardiovascular Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
| |
|
38
|
Roy S, Bruehl S, Feng X, Shotwell MS, Van De Ven T, Shaw AD, Kertai MD. Developing a risk stratification tool for predicting opioid-related respiratory depression after non-cardiac surgery: a retrospective study. BMJ Open 2022; 12:e064089. [PMID: 36219738 PMCID: PMC9445779 DOI: 10.1136/bmjopen-2022-064089] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Abstract
OBJECTIVES Accurately assessing the probability of significant respiratory depression following opioid administration can potentially enhance perioperative risk assessment and pain management. We developed and validated a risk prediction tool to estimate the probability of significant respiratory depression (indexed by naloxone administration) in patients undergoing noncardiac surgery. DESIGN Retrospective cohort study. SETTING Single academic centre. PARTICIPANTS We studied n=63 084 patients (mean age 47.1±18.2 years; 50% men) who underwent emergency or elective non-cardiac surgery between 1 January 2007 and 30 October 2017. INTERVENTIONS A derivation subsample reflecting two-thirds of available patients (n=42 082) was randomly selected for model development, and associations were identified between predictor variables and naloxone administration occurring within 5 days following surgery. The resulting probability model for predicting naloxone administration was then cross-validated in a separate validation cohort reflecting the remaining one-third of patients (n=21 002). RESULTS The rate of naloxone administration was identical in the derivation (n=2720 (6.5%)) and validation (n=1360 (6.5%)) cohorts. The risk prediction model identified female sex (OR: 3.01; 95% CI: 2.73 to 3.32), high-risk surgical procedures (OR: 4.16; 95% CI: 3.78 to 4.58), history of drug abuse (OR: 1.81; 95% CI: 1.52 to 2.16) and any opioids being administered on a scheduled rather than as-needed basis (OR: 8.31; 95% CI: 7.26 to 9.51) as risk factors for naloxone administration. Advanced age (OR: 0.971; 95% CI: 0.968 to 0.973), opioids administered via patient-controlled analgesia pump (OR: 0.55; 95% CI: 0.49 to 0.62) and any scheduled non-opioids (OR: 0.63; 95% CI: 0.58 to 0.69) were associated with decreased risk of naloxone administration. 
An overall risk prediction model incorporating the common clinically available variables above displayed excellent discriminative ability in both the derivation and validation cohorts (c-index=0.820 and 0.814, respectively). CONCLUSION Our cross-validated clinical predictive model accurately estimates the risk of serious opioid-related respiratory depression requiring naloxone administration in postoperative patients.
Affiliation(s)
- Sounak Roy
- Department of Anesthesiology, Vanderbilt University Medical Center, Nashville, Tennessee, USA
| | - Stephen Bruehl
- Department of Anesthesiology, Vanderbilt University Medical Center, Nashville, Tennessee, USA
| | - Xiaoke Feng
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
| | - Matthew S Shotwell
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
| | - Thomas Van De Ven
- Department of Anesthesiology, Duke University Medical Center, Durham, North Carolina, USA
| | - Andrew D Shaw
- Department of Intensive Care and Resuscitation, Cleveland Clinic, Cleveland, Ohio, USA
| | - Miklos D Kertai
- Department of Anesthesiology, Vanderbilt University Medical Center, Nashville, Tennessee, USA
| |
|
39
|
Sulieman L, Cronin RM, Carroll RJ, Natarajan K, Marginean K, Mapes B, Roden D, Harris P, Ramirez A. Comparing medical history data derived from electronic health records and survey answers in the All of Us Research Program. J Am Med Inform Assoc 2022; 29:1131-1141. [PMID: 35396991 PMCID: PMC9196700 DOI: 10.1093/jamia/ocac046] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/14/2021] [Revised: 02/18/2022] [Accepted: 03/23/2022] [Indexed: 11/13/2022]
Abstract
OBJECTIVE A participant's medical history is important in clinical research and can be captured from electronic health records (EHRs) and self-reported surveys. Both can be incomplete: EHRs due to documentation gaps or lack of interoperability, and surveys due to recall bias or limited health literacy. This analysis compares medical history collected in the All of Us Research Program through both surveys and EHRs. MATERIALS AND METHODS The All of Us medical history survey includes a self-report questionnaire that asks about diagnoses of over 150 medical conditions organized into 12 disease categories. In each category, we identified the 3 most and least frequent self-reported diagnoses and retrieved their analogues from EHRs. We calculated agreement scores and extracted participant demographic characteristics for each comparison set. RESULTS The 4th All of Us dataset release includes data from 314 994 participants, 28.3% of whom completed medical history surveys and 65.5% of whom had EHR data. The hearing and vision category within the survey had the highest number of responses but the second lowest positive agreement with the EHR (0.21). The infectious disease category had the lowest positive agreement (0.12). Cancer conditions had the highest positive agreement (0.45) between the 2 data sources. DISCUSSION AND CONCLUSION Our study quantified the agreement of medical history between 2 sources: EHRs and self-reported surveys. Conditions that are usually undocumented in EHRs had low agreement scores, demonstrating that survey data can supplement EHR data. Disagreement between EHR and survey can help identify possible missing records and guide researchers to adjust for biases.
Affiliation(s)
- Lina Sulieman
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
| | - Robert M Cronin
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Department of Medicine, The Ohio State University, Columbus, Ohio, USA
| | - Robert J Carroll
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
| | - Karthik Natarajan
- Department of Biomedical Informatics, Columbia University, New York, New York, USA
| | - Kayla Marginean
- Vanderbilt Institute of Clinical and Translational Research, Vanderbilt University Medical Center, Nashville, Tennessee, USA
| | - Brandy Mapes
- Vanderbilt Institute of Clinical and Translational Research, Vanderbilt University Medical Center, Nashville, Tennessee, USA
| | - Dan Roden
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA
| | - Paul Harris
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Vanderbilt Institute of Clinical and Translational Research, Vanderbilt University Medical Center, Nashville, Tennessee, USA
| | - Andrea Ramirez
- Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Office of Data and Analytics, All of Us Research Program, National Institutes of Health, Bethesda, Maryland, USA
| |
|
40
|
Avery CL, Howard AG, Ballou AF, Buchanan VL, Collins JM, Downie CG, Engel SM, Graff M, Highland HM, Lee MP, Lilly AG, Lu K, Rager JE, Staley BS, North KE, Gordon-Larsen P. Strengthening Causal Inference in Exposomics Research: Application of Genetic Data and Methods. ENVIRONMENTAL HEALTH PERSPECTIVES 2022; 130:55001. [PMID: 35533073 PMCID: PMC9084332 DOI: 10.1289/ehp9098] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 02/04/2021] [Revised: 04/08/2022] [Accepted: 04/12/2022] [Indexed: 05/11/2023]
Abstract
Advances in technologies to measure a broad set of exposures have led to a range of exposome research efforts. Yet, these efforts have insufficiently integrated methods that incorporate genetic data to strengthen causal inference, despite evidence that many exposome-associated phenotypes are heritable. Objective: We demonstrate how integration of methods and study designs that incorporate genetic data can strengthen causal inference in exposomics research by helping address six challenges: reverse causation and unmeasured confounding, comprehensive examination of phenotypic effects, low efficiency, replication, multilevel data integration, and characterization of tissue-specific effects. Examples are drawn from studies of biomarkers and health behaviors, exposure domains where the causal inference methods we describe are most often applied. Discussion: Technological, computational, and statistical advances in genotyping, imputation, and analysis, combined with broad data sharing and cross-study collaborations, offer multiple opportunities to strengthen causal inference in exposomics research. Full application of these opportunities will require an expanded understanding of genetic variants that predict exposome phenotypes as well as an appreciation that the utility of genetic variants for causal inference will vary by exposure and may depend on large sample sizes. However, several of these challenges can be addressed through international scientific collaborations that prioritize data sharing. Ultimately, we anticipate that efforts to better integrate methods that incorporate genetic data will extend the reach of exposomics research by helping address the challenges of comprehensively measuring the exposome and its health effects across studies, the life course, and in varied contexts and diverse populations. https://doi.org/10.1289/EHP9098.
Affiliation(s)
- Christy L Avery
- Department of Epidemiology, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Carolina Population Center, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - Annie Green Howard
- Department of Biostatistics, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Carolina Population Center, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - Anna F Ballou
- Department of Epidemiology, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - Victoria L Buchanan
- Department of Epidemiology, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - Jason M Collins
- Department of Epidemiology, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - Carolina G Downie
- Department of Epidemiology, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - Stephanie M Engel
- Department of Epidemiology, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - Mariaelisa Graff
- Department of Epidemiology, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - Heather M Highland
- Department of Epidemiology, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - Moa P Lee
- Department of Epidemiology, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - Adam G Lilly
- Carolina Population Center, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Department of Sociology, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - Kun Lu
- Department of Environmental Sciences and Engineering, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - Julia E Rager
- Department of Environmental Sciences and Engineering, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - Brooke S Staley
- Department of Epidemiology, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - Kari E North
- Department of Epidemiology, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - Penny Gordon-Larsen
- Department of Nutrition, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Carolina Population Center, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| |
|
41
|
Almowil Z, Zhou SM, Brophy S, Croxall J. Concept Libraries for Repeatable and Reusable Research: Qualitative Study Exploring the Needs of Users. JMIR Hum Factors 2022; 9:e31021. [PMID: 35289755 PMCID: PMC8965669 DOI: 10.2196/31021] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/07/2021] [Revised: 11/17/2021] [Accepted: 12/05/2021] [Indexed: 12/05/2022]
Abstract
Background Big data research in the field of health sciences is hindered by a lack of agreement on how to identify and define different conditions and their medications. This means that researchers and health professionals often have different phenotype definitions for the same condition. This lack of agreement makes it difficult to compare different study findings and hinders the ability to conduct repeatable and reusable research. Objective This study aims to examine the requirements of various users, such as researchers, clinicians, machine learning experts, and managers, in the development of a data portal for phenotypes (a concept library). Methods This was a qualitative study using interviews and focus group discussion. One-to-one interviews were conducted with researchers, clinicians, machine learning experts, and senior research managers in health data science (N=6) to explore their specific needs in the development of a concept library. In addition, a focus group discussion with researchers (N=14) working with the Secured Anonymized Information Linkage databank, a national eHealth data linkage infrastructure, was held to perform a SWOT (strengths, weaknesses, opportunities, and threats) analysis for the phenotyping system and the proposed concept library. The interviews and focus group discussion were transcribed verbatim, and 2 thematic analyses were performed. Results Most of the participants thought that the prototype concept library would be a very helpful resource for conducting repeatable research, but they specified that many requirements are needed before its development. Although all the participants stated that they were aware of some existing concept libraries, most of them expressed negative perceptions about them. 
The participants mentioned several facilitators that would stimulate them to share their work and reuse the work of others, and they pointed out several barriers that could inhibit them from sharing their work and reusing the work of others. The participants suggested some developments that they would like to see to improve reproducible research output using routine data. Conclusions The study indicated that most interviewees valued a concept library for phenotypes. However, only half of the participants felt that they would contribute by providing definitions for the concept library, and they reported many barriers regarding sharing their work on a publicly accessible platform. Analysis of interviews and the focus group discussion revealed that different stakeholders have different requirements, facilitators, barriers, and concerns about a prototype concept library.
Affiliation(s)
- Zahra Almowil
- Data Science Building, Medical School, Swansea University, Swansea, Wales, United Kingdom
| | - Shang-Ming Zhou
- Centre For Health Technology, Faculty of Health, University of Plymouth, Plymouth, United Kingdom
| | - Sinead Brophy
- Data Science Building, Medical School, Swansea University, Swansea, Wales, United Kingdom
| | - Jodie Croxall
- Data Science Building, Medical School, Swansea University, Swansea, Wales, United Kingdom
| |
|
42
|
Seedahmed MI, Mogilnicka I, Zeng S, Luo G, Whooley MA, McCulloch CE, Koth L, Arjomandi M. Performance of a Computational Phenotyping Algorithm for Sarcoidosis Using Diagnostic Codes in Electronic Medical Records: Case Validation Study From 2 Veterans Affairs Medical Centers. JMIR Form Res 2022; 6:e31615. [PMID: 35081036 PMCID: PMC8928044 DOI: 10.2196/31615] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 06/29/2021] [Revised: 01/24/2022] [Accepted: 01/24/2022] [Indexed: 11/29/2022]
Abstract
BACKGROUND Electronic medical records (EMRs) offer the promise of computationally identifying sarcoidosis cases. However, the accuracy of identifying these cases in the EMR is unknown. OBJECTIVE The aim of this study is to determine the statistical performance of using the International Classification of Diseases (ICD) diagnostic codes to identify patients with sarcoidosis in the EMR. METHODS We used the ICD diagnostic codes to identify sarcoidosis cases by searching the EMRs of the San Francisco and Palo Alto Veterans Affairs medical centers and randomly selecting 200 patients. To improve the diagnostic accuracy of the computational algorithm in cases where histopathological data are unavailable, we developed an index of suspicion to identify cases with a high index of suspicion for sarcoidosis (confirmed and probable) based on clinical and radiographic features alone using the American Thoracic Society practice guideline. Through medical record review, we determined the positive predictive value (PPV) of diagnosing sarcoidosis by two computational methods: using ICD codes alone and using ICD codes plus the high index of suspicion. RESULTS Among the 200 patients, 158 (79%) had a high index of suspicion for sarcoidosis. Of these 158 patients, 142 (89.9%) had documentation of nonnecrotizing granuloma, confirming biopsy-proven sarcoidosis. The PPV of using ICD codes alone was 79% (95% CI 78.6%-80.5%) for identifying sarcoidosis cases and 71% (95% CI 64.7%-77.3%) for identifying histopathologically confirmed sarcoidosis in the EMRs. The inclusion of the generated high index of suspicion to identify confirmed sarcoidosis cases increased the PPV significantly to 100% (95% CI 96.5%-100%). Histopathology documentation alone was 90% sensitive compared with high index of suspicion. CONCLUSIONS ICD codes are reasonable classifiers for identifying sarcoidosis cases within EMRs with a PPV of 79%. 
Using a computational algorithm to capture index of suspicion data elements could significantly improve the case-identification accuracy.
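The two ICD-only positive predictive values reported above follow directly from the counts in the abstract. A minimal sketch of that arithmetic, assuming all 200 randomly selected patients were flagged by ICD codes (the function and variable names are illustrative, not the authors'):

```python
def ppv(confirmed: int, flagged: int) -> float:
    """Positive predictive value: share of flagged cases that are true cases."""
    if flagged <= 0:
        raise ValueError("flagged must be positive")
    return confirmed / flagged

# Counts taken from the abstract.
flagged_by_icd = 200      # patients selected via ICD codes
high_suspicion = 158      # confirmed or probable sarcoidosis on record review
biopsy_confirmed = 142    # documented nonnecrotizing granuloma

ppv_any = ppv(high_suspicion, flagged_by_icd)          # sarcoidosis cases
ppv_confirmed = ppv(biopsy_confirmed, flagged_by_icd)  # histopathologically confirmed
share_biopsied = biopsy_confirmed / high_suspicion     # within the high-suspicion group

print(f"PPV, any sarcoidosis: {ppv_any:.0%}")
print(f"PPV, biopsy-confirmed: {ppv_confirmed:.0%}")
print(f"Biopsy-confirmed among high suspicion: {share_biopsied:.1%}")
```

The confidence intervals and the 100% PPV for the combined method depend on analysis details not given in the abstract, so they are not reproduced here.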
Affiliation(s)
- Mohamed I Seedahmed
- Division of Pulmonary, Critical Care, Allergy and Immunology, and Sleep, Department of Medicine, University of California San Francisco, San Francisco, CA, United States
- San Francisco Veterans Affairs Medical Center, San Francisco, CA, United States
| | - Izabella Mogilnicka
- San Francisco Veterans Affairs Medical Center, San Francisco, CA, United States
- Department of Experimental Physiology and Pathophysiology, Laboratory of the Centre for Preclinical Research, Medical University of Warsaw, Warsaw, Poland
| | - Siyang Zeng
- San Francisco Veterans Affairs Medical Center, San Francisco, CA, United States
- Department of Biomedical Informatics and Medical Education, School of Medicine, University of Washington, Seattle, WA, United States
| | - Gang Luo
- Department of Biomedical Informatics and Medical Education, School of Medicine, University of Washington, Seattle, WA, United States
| | - Mary A Whooley
- San Francisco Veterans Affairs Medical Center, San Francisco, CA, United States
- Department of Medicine, University of California San Francisco, San Francisco, CA, United States
- Measurement Science Quality Enhancement Research Initiative, San Francisco Veterans Affairs Healthcare System, San Francisco, CA, United States
| | - Charles E McCulloch
- Department of Epidemiology & Biostatistics, University of California San Francisco, San Francisco, CA, United States
| | - Laura Koth
- Division of Pulmonary, Critical Care, Allergy and Immunology, and Sleep, Department of Medicine, University of California San Francisco, San Francisco, CA, United States
| | - Mehrdad Arjomandi
- Division of Pulmonary, Critical Care, Allergy and Immunology, and Sleep, Department of Medicine, University of California San Francisco, San Francisco, CA, United States
- San Francisco Veterans Affairs Medical Center, San Francisco, CA, United States
| |
|
43
|
Cereceda K, Jorquera R, Villarroel-Espíndola F. Advances in mass cytometry and its applicability to digital pathology in clinical-translational cancer research. ADVANCES IN LABORATORY MEDICINE 2022; 3:5-29. [PMID: 37359436 PMCID: PMC10197474 DOI: 10.1515/almed-2021-0075] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/03/2021] [Accepted: 07/16/2021] [Indexed: 06/28/2023]
Abstract
The development and subsequent adaptation of mass cytometry for the histological analysis of tissue sections has allowed the simultaneous spatial characterization of multiple components. This is useful to find the correlation between the genotypic and phenotypic profile of tumor cells and their environment in clinical-translational studies. In this review, we provide an overview of the most relevant hallmarks in the development, implementation and application of multiplexed imaging in the study of cancer and other conditions. A special focus is placed on studies based on imaging mass cytometry (IMC) and multiplexed ion beam imaging (MIBI). The purpose of this review is to help our readers become familiar with the verification techniques employed on this tool and outline the multiple applications reported in the literature. This review will also provide guidance on the use of IMC or MIBI in any field of biomedical research.
Affiliation(s)
- Karina Cereceda
- Laboratorio de Medicina Traslacional, Instituto Oncológico Fundación Arturo López Pérez, Santiago, Chile
| | - Roddy Jorquera
- Laboratorio de Medicina Traslacional, Instituto Oncológico Fundación Arturo López Pérez, Santiago, Chile
| | - Franz Villarroel-Espíndola
- Laboratorio de Medicina Traslacional, Instituto Oncológico Fundación Arturo López Pérez, Santiago, Chile
| |
|
44
|
Barajas R, Hair B, Lai G, Rotunno M, Shams-White MM, Gillanders EM, Mechanic LE. Facilitating cancer systems epidemiology research. PLoS One 2022; 16:e0255328. [PMID: 34972102 PMCID: PMC8719747 DOI: 10.1371/journal.pone.0255328] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/19/2022]
Abstract
Systems epidemiology offers a more comprehensive and holistic approach to studies of cancer in populations by considering high dimensionality measures from multiple domains, assessing the inter-relationships among risk factors, and considering changes over time. These approaches offer a framework to account for the complexity of cancer and contribute to a broader understanding of the disease. Therefore, NCI sponsored a workshop in February 2019 to facilitate discussion about the opportunities and challenges of the application of systems epidemiology approaches for cancer research. Eight key themes emerged from the discussion: transdisciplinary collaboration and a problem-based approach; methods and modeling considerations; interpretation, validation, and evaluation of models; data needs and opportunities; sharing of data and models; enhanced training practices; dissemination of systems models; and building a systems epidemiology community. This manuscript summarizes these themes, highlights opportunities for cancer systems epidemiology research, outlines ways to foster this research area, and introduces a collection of papers, "Cancer System Epidemiology Insights and Future Opportunities" that highlight findings based on systems epidemiology approaches.
Affiliation(s)
- Rolando Barajas
- Epidemiology and Genomics Research Program, Division of Cancer Control and Population Sciences (DCCPS), National Cancer Institute (NCI), National Institutes of Health (NIH), Bethesda, Maryland, United States of America
| | - Brionna Hair
- DCCPS, NCI, NIH, Bethesda, Maryland, United States of America
| | - Gabriel Lai
- Epidemiology and Genomics Research Program, Division of Cancer Control and Population Sciences (DCCPS), National Cancer Institute (NCI), National Institutes of Health (NIH), Bethesda, Maryland, United States of America
| | - Melissa Rotunno
- Epidemiology and Genomics Research Program, Division of Cancer Control and Population Sciences (DCCPS), National Cancer Institute (NCI), National Institutes of Health (NIH), Bethesda, Maryland, United States of America
| | - Marissa M. Shams-White
- Epidemiology and Genomics Research Program, Division of Cancer Control and Population Sciences (DCCPS), National Cancer Institute (NCI), National Institutes of Health (NIH), Bethesda, Maryland, United States of America
| | - Elizabeth M. Gillanders
- Epidemiology and Genomics Research Program, Division of Cancer Control and Population Sciences (DCCPS), National Cancer Institute (NCI), National Institutes of Health (NIH), Bethesda, Maryland, United States of America
| | - Leah E. Mechanic
- Epidemiology and Genomics Research Program, Division of Cancer Control and Population Sciences (DCCPS), National Cancer Institute (NCI), National Institutes of Health (NIH), Bethesda, Maryland, United States of America
|
45
|
Greer ML, Davis K, Stack BC. Machine learning can identify patients at risk of hyperparathyroidism without known calcium and intact parathyroid hormone. Head Neck 2021; 44:817-822. [PMID: 34953008 DOI: 10.1002/hed.26970] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 06/29/2021] [Revised: 11/01/2021] [Accepted: 12/16/2021] [Indexed: 01/16/2023] Open
Abstract
BACKGROUND To prove the concept of diagnosing primary hyperparathyroidism (pHPT) without calcium and parathyroid hormone (PTH) values and identifying potential risk factors for pHPT. METHODS Data were extracted from the clinical data warehouse (CDW) at the University of Arkansas for Medical Sciences (UAMS) Epic EHR (2014-2019). RESULTS 1737 patients with over 185 000 rows of clinical data were provided in a relational structure and processed/flattened to facilitate modeling. Phenotype elements were identified for pHPT without advance knowledge of calcium and PTH levels. The area under the curve (AUC) for the prediction of pHPT using our model was 0.86 with sensitivity and specificity of 0.8953 and 0.6686, respectively, using a 0.45 probability threshold. CONCLUSION Primary hyperparathyroidism was predicted from a dataset excluding calcium and PTH data with 86% accuracy. This approach needs to be validated/refined on larger samples of data and plans are in place to do this with other regional/national datasets.
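The abstract reports sensitivity and specificity at a 0.45 probability threshold. As a minimal sketch of how such figures are derived from predicted probabilities, the following uses invented toy labels and scores (not the study's data); the function name and the choice of `>=` for the cut-off are assumptions for illustration only.

```python
def sensitivity_specificity(y_true, y_prob, threshold=0.45):
    """Return (sensitivity, specificity) after thresholding probabilities."""
    tp = fn = tn = fp = 0
    for label, prob in zip(y_true, y_prob):
        predicted_positive = prob >= threshold  # assumed inclusive cut-off
        if label == 1:
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_positive:
                fp += 1
            else:
                tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy data, invented purely for illustration.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_prob = [0.90, 0.50, 0.40, 0.30, 0.60, 0.10, 0.44, 0.47]
print(sensitivity_specificity(y_true, y_prob))
```

Lowering the threshold typically trades specificity for sensitivity, which is consistent with the high-sensitivity, lower-specificity pair the abstract reports.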
Affiliation(s)
- Melody L Greer
- Department of Health Informatics, University of Arkansas for Medical Sciences, Little Rock, Arkansas, USA
| | - Kyle Davis
- Department of Otolaryngology - Head and Neck Surgery, University of Arkansas for Medical Sciences, Little Rock, Arkansas, USA
| | - Brendan C Stack
- Department of Otolaryngology - Head and Neck Surgery, Southern Illinois University School of Medicine, Springfield, Illinois, USA
|
46
|
Wyatt B, Perumalswami PV, Mageras A, Miller M, Harty A, Ma N, Bowman CA, Collado F, Jeon J, Paulino L, Dinani A, Dieterich D, Li L, Vandromme M, Branch AD. A Digital Case-Finding Algorithm for Diagnosed but Untreated Hepatitis C: A Tool for Increasing Linkage to Treatment and Cure. Hepatology 2021; 74:2974-2987. [PMID: 34333777 PMCID: PMC9299620 DOI: 10.1002/hep.32086] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 04/03/2021] [Revised: 06/29/2021] [Accepted: 07/22/2021] [Indexed: 12/20/2022]
Abstract
BACKGROUND AND AIMS Although chronic HCV infection increases mortality, thousands of patients remain diagnosed-but-untreated (DBU). We aimed to (1) develop a DBU phenotyping algorithm, (2) use it to facilitate case finding and linkage to care, and (3) identify barriers to successful treatment. APPROACH AND RESULTS We developed a phenotyping algorithm using Java and SQL and applied it to ~2.5 million EPIC electronic medical records (EMRs; data entered January 2003 to December 2017). Approximately 72,000 EMRs contained an HCV International Classification of Diseases code and/or diagnostic test. The algorithm classified 10,614 cases as DBU (HCV-RNA positive and alive). Its positive and negative predictive values were 88% and 97%, respectively, as determined by manual review of 500 EMRs randomly selected from the ~72,000. Navigators reviewed the charts of 6,187 algorithm-defined DBUs and they attempted to contact potential treatment candidates by phone. By June 2020, 30% (n = 1,862) had completed an HCV-related appointment. Outcomes analysis revealed that DBU patients enrolled in our care coordination program were more likely to complete treatment (72% [n = 219] vs. 54% [n = 256]; P < 0.001) and to have a verified sustained virological response (67% vs. 46%; P < 0.001) than other patients. Forty-eight percent (n = 2,992) of DBU patients could not be reached by phone, which was a major barrier to engagement. Nearly half of these patients had Fibrosis-4 scores ≥ 2.67, indicating significant fibrosis. Multivariable logistic regression showed that DBUs who could not be contacted were less likely to have private insurance than those who could (18% vs. 50%; P < 0.001). CONCLUSIONS The digital DBU case-finding algorithm efficiently identified potential HCV treatment candidates, freeing resources for navigation and coordination. The algorithm is portable and accelerated HCV elimination when incorporated in our comprehensive program.
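The percentages in this abstract can be checked against the counts it reports. The sketch below assumes that the denominator for both the 30% and the 48% figures is the 6,187 charts reviewed by navigators; the abstract implies this but does not state it outright. The helper name `percent` is hypothetical.

```python
def percent(part, whole):
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


reviewed_charts = 6187        # algorithm-defined DBUs reviewed by navigators
completed_appointment = 1862  # completed an HCV-related appointment by June 2020
unreachable_by_phone = 2992   # could not be reached by phone

print(percent(completed_appointment, reviewed_charts))
print(percent(unreachable_by_phone, reviewed_charts))
```

Under that assumption, both reported percentages are reproduced by the counts given in the abstract.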
Affiliation(s)
- Brooke Wyatt
- Division of Liver Diseases, Icahn School of Medicine Mount Sinai, New York, NY
| | - Ponni V. Perumalswami
- Division of Liver Diseases, Icahn School of Medicine Mount Sinai, New York, NY; Division of Gastroenterology and Hepatology, University of Michigan, Ann Arbor, MI; Gastroenterology Section, Veterans Affairs Ann Arbor Healthcare System, Ann Arbor, MI
| | - Anna Mageras
- Division of Liver Diseases, Icahn School of Medicine Mount Sinai, New York, NY
| | - Mark Miller
- Division of Liver Diseases, Icahn School of Medicine Mount Sinai, New York, NY
| | - Alyson Harty
- Division of Liver Diseases, Icahn School of Medicine Mount Sinai, New York, NY
| | - Ning Ma
- Division of Liver Diseases, Icahn School of Medicine Mount Sinai, New York, NY
| | - Chip A. Bowman
- Department of Medicine, Icahn School of Medicine Mount Sinai, New York, NY
| | - Francina Collado
- Division of Liver Diseases, Icahn School of Medicine Mount Sinai, New York, NY
| | - Jihae Jeon
- Division of Liver Diseases, Icahn School of Medicine Mount Sinai, New York, NY
| | - Lismeiry Paulino
- Division of Liver Diseases, Icahn School of Medicine Mount Sinai, New York, NY
| | - Amreen Dinani
- Division of Liver Diseases, Icahn School of Medicine Mount Sinai, New York, NY
| | - Douglas Dieterich
- Division of Liver Diseases, Icahn School of Medicine Mount Sinai, New York, NY
| | - Li Li
- Division of Liver Diseases, Icahn School of Medicine Mount Sinai, New York, NY
| | - Maxence Vandromme
- Division of Liver Diseases, Icahn School of Medicine Mount Sinai, New York, NY
| | - Andrea D. Branch
- Division of Liver Diseases, Icahn School of Medicine Mount Sinai, New York, NY
|
47
|
Daniels H, Jones KH, Heys S, Ford DV. Exploring the Use of Genomic and Routinely Collected Data: Narrative Literature Review and Interview Study. J Med Internet Res 2021; 23:e15739. [PMID: 34559060 PMCID: PMC8501405 DOI: 10.2196/15739] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2019] [Revised: 10/01/2020] [Accepted: 07/15/2021] [Indexed: 11/13/2022] Open
Abstract
Background Advancing the use of genomic data with routinely collected health data holds great promise for health care and research. Increasing the use of these data is a high priority to understand and address the causes of disease. Objective This study aims to provide an outline of the use of genomic data alongside routinely collected data in health research to date. As this field prepares to move forward, it is important to take stock of the current state of play in order to highlight new avenues for development, identify challenges, and ensure that adequate data governance models are in place for safe and socially acceptable progress. Methods We conducted a literature review to draw information from past studies that have used genomic and routinely collected data and conducted interviews with individuals who use these data for health research. We collected data on the following: the rationale of using genomic data in conjunction with routinely collected data, types of genomic and routinely collected data used, data sources, project approvals, governance and access models, and challenges encountered. Results The main purpose of using genomic and routinely collected data was to conduct genome-wide and phenome-wide association studies. Routine data sources included electronic health records, disease and death registries, health insurance systems, and deprivation indices. The types of genomic data included polygenic risk scores, single nucleotide polymorphisms, and measures of genetic activity, and biobanks generally provided these data. Although the literature search showed that biobanks released data to researchers, the case studies revealed a growing tendency for use within a data safe haven. Challenges of working with these data revolved around data collection, data storage, technical, and data privacy issues. Conclusions Using genomic and routinely collected data holds great promise for progressing health research. 
Several challenges are involved, particularly in terms of privacy. Overcoming these barriers will ensure that the use of these data to progress health research can be exploited to its full potential.
Affiliation(s)
- Helen Daniels
- Population Data Science, Swansea University, Swansea, United Kingdom
| | | | - Sharon Heys
- Population Data Science, Swansea University, Swansea, United Kingdom
|
48
|
Almowil ZA, Zhou SM, Brophy S. Concept libraries for automatic electronic health record based phenotyping: A review. Int J Popul Data Sci 2021; 6:1362. [PMID: 34189274 PMCID: PMC8210840 DOI: 10.23889/ijpds.v5i1.1362] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/01/2022] Open
Abstract
Introduction Electronic health records (EHR) are linked together to examine disease history and to undertake research into the causes and outcomes of disease. However, the process of constructing algorithms for phenotyping (e.g., identifying disease characteristics) or health characteristics (e.g., smoker) is very time consuming and resource costly. In addition, results can vary greatly between researchers. Reusing or building on algorithms that others have created is a compelling solution to these problems. However, sharing algorithms is not a common practice and many published studies do not detail the clinical code lists used by the researchers in the disease/characteristic definition. To address these challenges, a number of centres across the world have developed health data portals which contain concept libraries (e.g., algorithms for defining concepts such as disease and characteristics) in order to facilitate disease phenotyping and health studies. Objectives This study aims to review the literature of existing concept libraries, examine their utilities, identify the current gaps, and suggest future developments. Methods The five-stage framework of Arksey and O'Malley was used for the literature search. This approach included defining the research questions, identifying relevant studies through literature review, selecting eligible studies, charting and extracting data, and summarising and reporting the findings. Results This review identified seven publicly accessible Electronic Health data concept libraries which were developed in different countries including UK, USA, and Canada. The concept libraries (n = 7) investigated were either general libraries that hold phenotypes of multiple specialties (n = 4) or specialized libraries that manage only certain specialities such as rare diseases (n = 3). 
There were some clear differences between the general libraries such as archiving data from different electronic sources, and using a range of different types of coding systems. However, they share some clear similarities such as enabling users to upload their own code lists, and allowing users to use/download the publicly accessible code. In addition, there were some differences between the specialized libraries such as difference in ability to search, and if it was possible to use different searching queries such as simple or complex searches. Conversely, there were some similarities between the specialized libraries such as enabling users to upload their own concepts into the libraries and to show where they were published, which facilitates assessing the validity of the concepts. All the specialized libraries aimed to encourage the reuse of research methods such as lists of clinical code and/or metadata. Conclusion The seven libraries identified have been developed independently and appear to replicate similar concepts but in different ways. Collaboration between similar libraries would greatly facilitate the use of these libraries for the user. The process of building code lists takes time and effort. Access to existing code lists increases consistency and accuracy of definitions across studies. Concept library developers should collaborate with each other to raise awareness of their existence and of their various functions, which could increase users’ contributions to those libraries and promote their wide-ranging adoption.
Affiliation(s)
| | - Shang-Ming Zhou
- Centre for Health Technology, Faculty of Health, University of Plymouth, Plymouth, PL4 8AA, UK
|
49
|
Tam CS, Gullick J, Saavedra A, Vernon ST, Figtree GA, Chow CK, Cretikos M, Morris RW, William M, Morris J, Brieger D. Combining structured and unstructured data in EMRs to create clinically-defined EMR-derived cohorts. BMC Med Inform Decis Mak 2021; 21:91. [PMID: 33685456 PMCID: PMC7938556 DOI: 10.1186/s12911-021-01441-w] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 10/08/2020] [Accepted: 02/15/2021] [Indexed: 11/29/2022] Open
Abstract
Background There have been few studies describing how production EMR systems can be systematically queried to identify clinically-defined populations and limited studies utilising free-text in this process. The aim of this study is to provide a generalisable methodology for constructing clinically-defined EMR-derived patient cohorts using structured and unstructured data in EMRs. Methods Patients with possible acute coronary syndrome (ACS) were used as an exemplar. Cardiologists defined clinical criteria for patients presenting with possible ACS. These were mapped to data tables within the production EMR system creating seven inclusion criteria comprised of structured data fields (orders and investigations, procedures, scanned electrocardiogram (ECG) images, and diagnostic codes) and unstructured clinical documentation. Data were extracted from two local health districts (LHD) in Sydney, Australia. Outcome measures included examination of the relative contribution of individual inclusion criteria to the identification of eligible encounters, comparisons between inclusion criterion and evaluation of consistency of data extracts across years and LHDs. Results Among 802,742 encounters in a 5 year dataset (1/1/13–30/12/17), the presence of an ECG image (54.8% of encounters) and symptoms and keywords in clinical documentation (41.4–64.0%) were used most often to identify presentations of possible ACS. Orders and investigations (27.3%) and procedures (1.4%), were less often present for identified presentations. Relevant ICD-10/SNOMED CT codes were present for 3.7% of identified encounters. Similar trends were seen when the two LHDs were examined separately, and across years. Conclusions Clinically-defined EMR-derived cohorts combining structured and unstructured data during cohort identification is a necessary prerequisite for critical validation work required for development of real-time clinical decision support and learning health systems.
Affiliation(s)
- Charmaine S Tam
- Centre for Translational Data Science, The University of Sydney, Sydney, Australia; Northern Clinical School, The University of Sydney, Sydney, Australia
| | - Janice Gullick
- Susan Wakil School of Nursing and Midwifery, The University of Sydney, Sydney, Australia
| | - Aldo Saavedra
- Centre for Translational Data Science, The University of Sydney, Sydney, Australia; Faculty of Health Sciences, The University of Sydney, Sydney, Australia
| | - Stephen T Vernon
- Cardiothoracic and Vascular Health, Kolling Institute of Medical Research and Department of Cardiology, Royal North Shore Hospital, Northern Sydney Local Health District, Sydney, Australia
| | - Gemma A Figtree
- Northern Clinical School, The University of Sydney, Sydney, Australia; Cardiothoracic and Vascular Health, Kolling Institute of Medical Research and Department of Cardiology, Royal North Shore Hospital, Northern Sydney Local Health District, Sydney, Australia
| | - Clara K Chow
- Westmead Applied Research Centre, The University of Sydney, Sydney, Australia; Department of Cardiology, Westmead Hospital, Sydney, Australia
| | - Michelle Cretikos
- Centre for Population Health, NSW Ministry of Health, Sydney, Australia
| | - Richard W Morris
- Centre for Translational Data Science, The University of Sydney, Sydney, Australia; Northern Clinical School, The University of Sydney, Sydney, Australia
| | - Maged William
- Department of Cardiology, Central Coast Local Health District and University of Newcastle, Sydney, Australia
| | - Jonathan Morris
- Northern Clinical School, The University of Sydney, Sydney, Australia; Clinical and Population Perinatal Health, Northern Sydney Local Health District, Sydney, Australia
| | - David Brieger
- Department of Cardiology, Concord Hospital, Sydney, Australia
|
50
|
Walters CE, Nitin R, Margulis K, Boorom O, Gustavson DE, Bush CT, Davis LK, Below JE, Cox NJ, Camarata SM, Gordon RL. Automated Phenotyping Tool for Identifying Developmental Language Disorder Cases in Health Systems Data (APT-DLD): A New Research Algorithm for Deployment in Large-Scale Electronic Health Record Systems. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2020; 63:3019-3035. [PMID: 32791019 PMCID: PMC7890229 DOI: 10.1044/2020_jslhr-19-00397] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 12/13/2019] [Revised: 04/23/2020] [Accepted: 05/19/2020] [Indexed: 05/13/2023]
Abstract
Purpose Data mining algorithms using electronic health records (EHRs) are useful in large-scale population-wide studies to classify etiology and comorbidities (Casey et al., 2016). Here, we apply this approach to developmental language disorder (DLD), a prevalent communication disorder whose risk factors and epidemiology remain largely undiscovered. Method We first created a reliable system for manually identifying DLD in EHRs based on speech-language pathologist (SLP) diagnostic expertise. We then developed and validated an automated algorithmic procedure, called, Automated Phenotyping Tool for identifying DLD cases in health systems data (APT-DLD), that classifies a DLD status for patients within EHRs on the basis of ICD (International Statistical Classification of Diseases and Related Health Problems) codes. APT-DLD was validated in a discovery sample (N = 973) using expert SLP manual phenotype coding as a gold-standard comparison and then applied and further validated in a replication sample of N = 13,652 EHRs. Results In the discovery sample, the APT-DLD algorithm correctly classified 98% (concordance) of DLD cases in concordance with manually coded records in the training set, indicating that APT-DLD successfully mimics a comprehensive chart review. The output of APT-DLD was also validated in relation to independently conducted SLP clinician coding in a subset of records, with a positive predictive value of 95% of cases correctly classified as DLD. We also applied APT-DLD to the replication sample, where it achieved a positive predictive value of 90% in relation to SLP clinician classification of DLD. Conclusions APT-DLD is a reliable, valid, and scalable tool for identifying DLD cohorts in EHRs. This new method has promising public health implications for future large-scale epidemiological investigations of DLD and may inform EHR data mining algorithms for other communication disorders. Supplemental Material https://doi.org/10.23641/asha.12753578.
Affiliation(s)
- Courtney E. Walters
- Department of Otolaryngology, Vanderbilt University Medical Center, Nashville, TN
- Neuroscience Program, College of Arts and Science, Vanderbilt University, Nashville, TN
| | - Rachana Nitin
- Department of Otolaryngology, Vanderbilt University Medical Center, Nashville, TN
- Vanderbilt Brain Institute, Vanderbilt University, Nashville, TN
| | - Katherine Margulis
- Department of Hearing and Speech Sciences, Vanderbilt University Medical Center, Nashville, TN
- Kennedy Krieger Institute, Baltimore, MD
| | - Olivia Boorom
- Department of Hearing and Speech Sciences, Vanderbilt University Medical Center, Nashville, TN
| | - Daniel E. Gustavson
- Department of Otolaryngology, Vanderbilt University Medical Center, Nashville, TN
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN
| | - Catherine T. Bush
- Department of Hearing and Speech Sciences, Vanderbilt University Medical Center, Nashville, TN
| | - Lea K. Davis
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN
- Department of Medicine, Vanderbilt University Medical Center, Nashville, TN
| | - Jennifer E. Below
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN
- Department of Medicine, Vanderbilt University Medical Center, Nashville, TN
| | - Nancy J. Cox
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN
- Department of Medicine, Vanderbilt University Medical Center, Nashville, TN
| | - Stephen M. Camarata
- Department of Hearing and Speech Sciences, Vanderbilt University Medical Center, Nashville, TN
| | - Reyna L. Gordon
- Department of Otolaryngology, Vanderbilt University Medical Center, Nashville, TN
- Vanderbilt Brain Institute, Vanderbilt University, Nashville, TN
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN
|