1. Shahab O, El Kurdi B, Shaukat A, Nadkarni G, Soroush A. Large language models: a primer and gastroenterology applications. Therap Adv Gastroenterol 2024; 17:17562848241227031. [PMID: 38390029 PMCID: PMC10883116 DOI: 10.1177/17562848241227031] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2023] [Accepted: 01/02/2024] [Indexed: 02/24/2024] [Open Access]
Abstract
Over the past year, the emergence of state-of-the-art large language models (LLMs) in tools like ChatGPT has ushered in a rapid acceleration in artificial intelligence (AI) innovation. These powerful AI models can generate tailored and high-quality text responses to instructions and questions without the need for labor-intensive task-specific training data or complex software engineering. As the technology continues to mature, LLMs hold immense potential for transforming clinical workflows, enhancing patient outcomes, improving medical education, and optimizing medical research. In this review, we provide a practical discussion of LLMs, tailored to gastroenterologists. We highlight the technical foundations of LLMs, emphasizing their key strengths and limitations as well as how to interact with them safely and effectively. We discuss some potential LLM use cases for clinical gastroenterology practice, education, and research. Finally, we review critical barriers to implementation and ongoing work to address these issues. This review aims to equip gastroenterologists with a foundational understanding of LLMs to facilitate a more active clinician role in the development and implementation of this rapidly emerging technology.
Affiliation(s)
- Omer Shahab
  - Division of Gastroenterology, Department of Medicine, VHC Health, Arlington, VA, USA
- Bara El Kurdi
  - Division of Gastroenterology and Hepatology, Department of Medicine, Virginia Tech Carilion School of Medicine, Roanoke, VA, USA
- Aasma Shaukat
  - Division of Gastroenterology, Department of Medicine, NYU Grossman School of Medicine, New York, NY, USA
  - New York Harbor Veterans Affairs Healthcare System New York City, New York, NY, USA
- Girish Nadkarni
  - Division of Data-Driven and Digital Medicine, Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - The Charles Bronfman Institute of Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ali Soroush
  - Division of Data-Driven and Digital Medicine, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Place, New York, NY 10029-6574, USA
  - The Charles Bronfman Institute of Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Henry D. Janowitz Division of Gastroenterology, Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
2. Guo LL, Morse KE, Aftandilian C, Steinberg E, Fries J, Posada J, Fleming SL, Lemmon J, Jessa K, Shah N, Sung L. Characterizing the limitations of using diagnosis codes in the context of machine learning for healthcare. BMC Med Inform Decis Mak 2024; 24:51. [PMID: 38355486 PMCID: PMC10868117 DOI: 10.1186/s12911-024-02449-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2023] [Accepted: 01/30/2024] [Indexed: 02/16/2024] [Open Access]
Abstract
BACKGROUND: Diagnostic codes are commonly used as inputs for clinical prediction models, to create labels for prediction tasks, and to identify cohorts for multicenter network studies. However, the coverage rates of diagnostic codes and their variability across institutions are underexplored. The primary objective was to describe lab- and diagnosis-based labels for 7 selected outcomes at three institutions. Secondary objectives were to describe agreement, sensitivity, and specificity of diagnosis-based labels against lab-based labels.
METHODS: This study included three cohorts: SickKids from The Hospital for Sick Children, and StanfordPeds and StanfordAdults from Stanford Medicine. We included seven clinical outcomes with lab-based definitions: acute kidney injury, hyperkalemia, hypoglycemia, hyponatremia, anemia, neutropenia and thrombocytopenia. For each outcome, we created four lab-based labels (abnormal, mild, moderate and severe) based on test result and one diagnosis-based label. Proportion of admissions with a positive label were presented for each outcome stratified by cohort. Using lab-based labels as the gold standard, agreement using Cohen's Kappa, sensitivity and specificity were calculated for each lab-based severity level.
RESULTS: The number of admissions included were: SickKids (n = 59,298), StanfordPeds (n = 24,639) and StanfordAdults (n = 159,985). The proportion of admissions with a positive diagnosis-based label was significantly higher for StanfordPeds compared to SickKids across all outcomes, with odds ratio (99.9% confidence interval) for abnormal diagnosis-based label ranging from 2.2 (1.7-2.7) for neutropenia to 18.4 (10.1-33.4) for hyperkalemia. Lab-based labels were more similar by institution. When using lab-based labels as the gold standard, Cohen's Kappa and sensitivity were lower at SickKids for all severity levels compared to StanfordPeds.
CONCLUSIONS: Across multiple outcomes, diagnosis codes were consistently different between the two pediatric institutions. This difference was not explained by differences in test results. These results may have implications for machine learning model development and deployment.
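The agreement statistics named in this abstract (Cohen's Kappa, sensitivity, and specificity of a diagnosis-based label against a lab-based gold standard) can be illustrated with a minimal Python sketch. This is not the authors' code: the function name `agreement_metrics` and the toy label vectors are hypothetical and chosen only for illustration.

```python
from typing import Dict, Sequence


def agreement_metrics(gold: Sequence[int], test: Sequence[int]) -> Dict[str, float]:
    """Compare binary test labels (e.g. diagnosis-based) with binary
    gold-standard labels (e.g. lab-based). Labels are 0 or 1."""
    if len(gold) != len(test) or len(gold) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")

    # Counts of the 2x2 confusion matrix.
    tp = sum(1 for g, t in zip(gold, test) if g == 1 and t == 1)
    tn = sum(1 for g, t in zip(gold, test) if g == 0 and t == 0)
    fp = sum(1 for g, t in zip(gold, test) if g == 0 and t == 1)
    fn = sum(1 for g, t in zip(gold, test) if g == 1 and t == 0)
    n = tp + tn + fp + fn

    # Sensitivity: share of gold-standard positives that the test label flags.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    # Specificity: share of gold-standard negatives that the test label clears.
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_observed = (tp + tn) / n
    p_expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (
        (p_observed - p_expected) / (1 - p_expected)
        if p_expected != 1
        else float("nan")
    )

    return {"sensitivity": sensitivity, "specificity": specificity, "kappa": kappa}


# Hypothetical toy data: 10 admissions, 4 positive by the lab-based label.
lab_based = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
diagnosis_based = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
print(agreement_metrics(lab_based, diagnosis_based))
```

In a design like the one described, such a computation would presumably be repeated per outcome, per cohort, and per lab-based severity level, with the gold-standard vector swapped each time.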
Affiliation(s)
- Lin Lawrence Guo
  - Program in Child Health Evaluative Sciences, The Hospital for Sick Children, Toronto, ON, Canada
- Keith E Morse
  - Division of Pediatric Hospital Medicine, Department of Pediatrics, Stanford University, Palo Alto, CA, USA
- Catherine Aftandilian
  - Division of Hematology/Oncology, Department of Pediatrics, Stanford University, Palo Alto, CA, USA
- Ethan Steinberg
  - Stanford Center for Biomedical Informatics Research, Stanford University, Palo Alto, CA, USA
- Jason Fries
  - Stanford Center for Biomedical Informatics Research, Stanford University, Palo Alto, CA, USA
- Jose Posada
  - Universidad del Norte, Barranquilla, Colombia
- Scott Lanyon Fleming
  - Stanford Center for Biomedical Informatics Research, Stanford University, Palo Alto, CA, USA
- Joshua Lemmon
  - Program in Child Health Evaluative Sciences, The Hospital for Sick Children, Toronto, ON, Canada
- Karim Jessa
  - Information Services, The Hospital for Sick Children, Toronto, ON, Canada
- Nigam Shah
  - Stanford Center for Biomedical Informatics Research, Stanford University, Palo Alto, CA, USA
- Lillian Sung
  - Program in Child Health Evaluative Sciences, The Hospital for Sick Children, Toronto, ON, Canada
  - Division of Haematology/Oncology, The Hospital for Sick Children, 555 University Avenue, M5G1X8, Toronto, ON, Canada
3. Barr PB, Bigdeli TB, Meyers JL, Peterson RE, Sanchez-Roige S, Mallard TT, Dick DM, Harden KP, Wilkinson A, Graham DP, Nielsen DA, Swann AC, Lipsky RK, Kosten TR, Aslan M, Harvey PD, Kimbrel NA, Beckham JC. Correlates of Risk for Disinhibited Behaviors in the Million Veteran Program Cohort. JAMA Psychiatry 2024; 81:188-197. [PMID: 37938835 PMCID: PMC10633411 DOI: 10.1001/jamapsychiatry.2023.4141] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2023] [Accepted: 09/01/2023] [Indexed: 11/10/2023]
Abstract
Importance: Many psychiatric outcomes share a common etiologic pathway reflecting behavioral disinhibition, generally referred to as externalizing (EXT) disorders. Recent genome-wide association studies (GWASs) have demonstrated the overlap between EXT disorders and important aspects of veterans' health, such as suicide-related behaviors and substance use disorders (SUDs).
Objective: To explore correlates of risk for EXT disorders within the Veterans Health Administration (VA) Million Veteran Program (MVP).
Design, Setting, and Participants: A series of phenome-wide association studies (PheWASs) of polygenic risk scores (PGSs) for EXT disorders was conducted using electronic health records. First, ancestry-specific PheWASs of EXT PGSs were conducted in the African, European, and Hispanic or Latin American ancestries. Next, a conditional PheWAS, covarying for PGSs of comorbid psychiatric problems (depression, schizophrenia, and suicide attempt; European ancestries only), was performed. Lastly, to adjust for unmeasured confounders, a within-family analysis of significant associations from the main PheWAS was performed in full siblings (European ancestries only). This study included the electronic health record data from US veterans from VA health care centers enrolled in MVP. Analyses took place from February 2022 to August 2023 covering a period from October 1999 to January 2020.
Exposures: PGSs for EXT, depression, schizophrenia, and suicide attempt.
Main Outcomes and Measures: Phecodes for diagnoses derived from the International Statistical Classification of Diseases, Ninth and Tenth Revisions, Clinical Modification, codes from electronic health records.
Results: Within the MVP (560 824 patients; mean [SD] age, 67.9 [14.3] years; 512 593 male [91.4%]), the EXT PGS was associated with 619 outcomes, of which 188 were independent of risk for comorbid problems or PGSs (from odds ratio [OR], 1.02; 95% CI, 1.01-1.03 for overweight/obesity to OR, 1.44; 95% CI, 1.42-1.47 for viral hepatitis C). Of the significant outcomes, 73 (11.9%) were significant in the African results and 26 (4.5%) were significant in the Hispanic or Latin American results. Within-family analyses uncovered robust associations between EXT PGS and consequences of SUDs, including liver disease, chronic airway obstruction, and viral hepatitis C.
Conclusions and Relevance: Results of this cohort study suggest a shared polygenic basis of EXT disorders, independent of risk for other psychiatric problems. In addition, this study found associations between EXT PGS and diagnoses related to SUDs and their sequelae. Overall, this study highlighted the potential negative consequences of EXT disorders for health and functioning in the US veteran population.
Affiliation(s)
- Peter B. Barr
  - VA New York Harbor Healthcare System, Brooklyn
  - Department of Psychiatry and Behavioral Sciences, SUNY Downstate Health Sciences University, Brooklyn, New York
  - Institute for Genomics in Health, SUNY Downstate Health Sciences University, Brooklyn, New York
  - Department of Epidemiology and Biostatistics, School of Public Health, SUNY Downstate Health Sciences University, Brooklyn, New York
- Tim B. Bigdeli
  - VA New York Harbor Healthcare System, Brooklyn
  - Department of Psychiatry and Behavioral Sciences, SUNY Downstate Health Sciences University, Brooklyn, New York
  - Institute for Genomics in Health, SUNY Downstate Health Sciences University, Brooklyn, New York
  - Department of Epidemiology and Biostatistics, School of Public Health, SUNY Downstate Health Sciences University, Brooklyn, New York
- Jacquelyn L. Meyers
  - VA New York Harbor Healthcare System, Brooklyn
  - Department of Psychiatry and Behavioral Sciences, SUNY Downstate Health Sciences University, Brooklyn, New York
  - Institute for Genomics in Health, SUNY Downstate Health Sciences University, Brooklyn, New York
  - Department of Epidemiology and Biostatistics, School of Public Health, SUNY Downstate Health Sciences University, Brooklyn, New York
- Roseann E. Peterson
  - VA New York Harbor Healthcare System, Brooklyn
  - Department of Psychiatry and Behavioral Sciences, SUNY Downstate Health Sciences University, Brooklyn, New York
  - Institute for Genomics in Health, SUNY Downstate Health Sciences University, Brooklyn, New York
- Sandra Sanchez-Roige
  - Department of Psychiatry, University of California San Diego, La Jolla
  - Division of Genetic Medicine, Vanderbilt University Medical Center, Nashville, Tennessee
- Travis T. Mallard
  - Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston
  - Department of Psychiatry, Harvard Medical School, Boston, Massachusetts
- Danielle M. Dick
  - Department of Psychiatry, Robert Wood Johnson Medical School, Rutgers University, Piscataway, New Jersey
  - Rutgers Addiction Research Center, Rutgers University, Piscataway, New Jersey
- K. Paige Harden
  - Department of Psychology, University of Texas at Austin, Austin
  - Population Research Center, University of Texas at Austin, Austin
- Anna Wilkinson
  - Michael E. DeBakey VA Medical Center, Houston, Texas
  - The University of Texas Health Science Center at Houston, UTHealth Houston School of Public Health, Houston
  - Michael and Susan Dell Center for Healthy Living, The University of Texas Health Science Center at Houston, Houston
- David P. Graham
  - Michael E. DeBakey VA Medical Center, Houston, Texas
  - Departments of Psychiatry, Neuroscience, Pharmacology, and Immunology and Rheumatology, Baylor College of Medicine, Houston, Texas
- David A. Nielsen
  - Michael E. DeBakey VA Medical Center, Houston, Texas
  - Departments of Psychiatry, Neuroscience, Pharmacology, and Immunology and Rheumatology, Baylor College of Medicine, Houston, Texas
- Alan C. Swann
  - Michael E. DeBakey VA Medical Center, Houston, Texas
  - Departments of Psychiatry, Neuroscience, Pharmacology, and Immunology and Rheumatology, Baylor College of Medicine, Houston, Texas
- Rachele K. Lipsky
  - Michael E. DeBakey VA Medical Center, Houston, Texas
  - Departments of Psychiatry, Neuroscience, Pharmacology, and Immunology and Rheumatology, Baylor College of Medicine, Houston, Texas
- Thomas R. Kosten
  - Michael E. DeBakey VA Medical Center, Houston, Texas
  - Departments of Psychiatry, Neuroscience, Pharmacology, and Immunology and Rheumatology, Baylor College of Medicine, Houston, Texas
- Mihaela Aslan
  - Clinical Epidemiology Research Center, VA Connecticut Healthcare System, West Haven, Connecticut
  - Yale University School of Medicine, New Haven, Connecticut
- Philip D. Harvey
  - Research Service, Bruce W. Carter Miami Veterans Affairs Medical Center, Miami, Florida
  - University of Miami Miller School of Medicine, Miami, Florida
- Nathan A. Kimbrel
  - Durham VA Health Care System, Durham, North Carolina
  - VA Mid-Atlantic Mental Illness Research, Education and Clinical Center, Durham, North Carolina
  - Department of Psychiatry and Behavioral Sciences, Duke University School of Medicine, Durham, North Carolina
- Jean C. Beckham
  - Durham VA Health Care System, Durham, North Carolina
  - VA Mid-Atlantic Mental Illness Research, Education and Clinical Center, Durham, North Carolina
  - Department of Psychiatry and Behavioral Sciences, Duke University School of Medicine, Durham, North Carolina
4. Andresen K, Hinojosa-Campos M, Podmore B, Drysdale M, Qizilbash N, Cunnington M. Validity of Routine Health Data To Identify Safety Outcomes of Interest For Covid-19 Vaccines and Therapeutics in the Context of the Emerging Pandemic: A Comprehensive Literature Review. Drug Healthc Patient Saf 2024; 16:1-17. [PMID: 38192299 PMCID: PMC10771726 DOI: 10.2147/dhps.s415292] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/01/2023] [Accepted: 08/15/2023] [Indexed: 01/10/2024] [Open Access]
Abstract
Introduction: Regulatory guidance encourages transparent reporting of information on the quality and validity of electronic health record data being used to generate real-world benefit-risk evidence for vaccines and therapeutics. We aimed to provide an overview of the availability of validated diagnostic algorithms for selected safety endpoints for Coronavirus disease 2019 (COVID-19) vaccines and therapeutics in the context of the emerging pandemic prior to December 2020.
Methods: We reviewed the literature up to December 2020 to identify validation studies for various safety events of interest, including myocardial infarction, arrhythmia, myocarditis, acute cardiac injury, vasculitis/vasculopathy, venous thromboembolism, stroke, respiratory distress syndrome (RDS), pneumonitis, cytokine release syndrome (CRS), multiple organ dysfunction syndrome, and renal failure. We included studies published between 2015 and 2020 that were considered high quality assessed with QUADAS and that reported positive predictive values (PPVs).
Results: Out of 43 identified studies, we found that diagnostic algorithms for cardiovascular outcomes were supported by the highest number of validation studies (n=17). Accurate algorithms are available for myocardial infarction (median PPV 80%; IQR 22%), arrhythmia (PPV range >70%), venous thromboembolism (median PPV: 73%) and ischaemic stroke (PPV range ≥85%). We found a lack of validation studies for less common respiratory and cardiac safety outcomes of interest (eg, pneumonitis and myocarditis), as well as for COVID-specific complications (CRS, RDS).
Conclusion: There is a need for better understanding of barriers to conducting validation studies, including data governance restrictions. Regulatory guidance should promote embedding validation within real-world EHR research used for decision-making.
Affiliation(s)
- Kirsty Andresen
  - OXON Epidemiology, London, UK
  - London School of Hygiene and Tropical Medicine, London, UK
- Bélène Podmore
  - OXON Epidemiology, London, UK
  - London School of Hygiene and Tropical Medicine, London, UK
  - OXON Epidemiology, Madrid, Spain
- Nawab Qizilbash
  - OXON Epidemiology, London, UK
  - London School of Hygiene and Tropical Medicine, London, UK
  - OXON Epidemiology, Madrid, Spain
5. Guazzo A, Longato E, Fadini GP, Morieri ML, Sparacino G, Di Camillo B. Deep-learning-based natural-language-processing models to identify cardiovascular disease hospitalisations of patients with diabetes from routine visits' text. Sci Rep 2023; 13:19132. [PMID: 37926737 PMCID: PMC10625981 DOI: 10.1038/s41598-023-45115-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2023] [Accepted: 10/16/2023] [Indexed: 11/07/2023] [Open Access]
Abstract
Writing notes is the most widespread method to report clinical events. Therefore, most of the information about the disease history of a patient remains locked behind free-form text. Natural language processing (NLP) provides a solution to automatically transform free-form text into structured data. In the present work, electronic healthcare records data of patients with diabetes were used to develop deep-learning based NLP models to automatically identify, within free-form text describing routine visits, the occurrence of hospitalisations related to cardiovascular disease (CVDs), an outcome of diabetes. Four possible time windows of increasing level of expected difficulty were considered: infinite, 24 months, 12 months, and 6 months. Model performance was evaluated by means of the area under the precision recall curve, as well as precision, recall, and F1-score after thresholding. Results showed that the proposed NLP approach was successful for both the infinite and 24-month windows, while, as expected, performance deteriorated with shorter time windows. Possible clinical applications of tools based on the proposed NLP approach include the retrospective filling of medical records with respect to a patient's CVD history for epidemiological and research purposes as well as for clinical decision making.
Affiliation(s)
- Alessandro Guazzo
  - Department of Information Engineering, University of Padova, 35131, Padua, Italy
- Enrico Longato
  - Department of Information Engineering, University of Padova, 35131, Padua, Italy
- Giovanni Sparacino
  - Department of Information Engineering, University of Padova, 35131, Padua, Italy
- Barbara Di Camillo
  - Department of Information Engineering, University of Padova, 35131, Padua, Italy
  - Department of Comparative Biomedicine and Food Science, University of Padova, Legnaro, Italy
6. Hill EJ, Sharma J, Wissel B, Sawyer RP, Jiang M, Marsili L, Duque K, Botsford V, Wood C, DeLano K, Sun Q, Kissela B, Espay AJ. Parkinson's disease diagnosis codes are insufficiently accurate for electronic health record research and differ by race. Parkinsonism Relat Disord 2023; 114:105764. [PMID: 37517108 DOI: 10.1016/j.parkreldis.2023.105764] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2023] [Revised: 06/21/2023] [Accepted: 07/10/2023] [Indexed: 08/01/2023]
Abstract
BACKGROUND: There are no evidence-based guidelines for data cleaning of electronic health record (EHR) databases in Parkinson's disease (PD). Previous filtering criteria have primarily used the 9th International Statistical Classification of Diseases and Related Health Problems (ICD) with variable accuracy for true PD cases. Prior studies have not excluded atypical or drug-induced parkinsonism, and little is known about differences in accuracy by race.
OBJECTIVE: To determine if excluding parkinsonism diagnoses improves accuracy of ICD-9 and -10 PD diagnosis codes.
METHODS: We included ≥2 instances of an ICD-9 and/or -10 code for PD. We removed any records with at least one code indicating atypical or drug-induced parkinsonism first in all races, and then in Non-Hispanic White and Black patients. We manually reviewed 100 randomly selected charts per group before and after filtering, and performed a test of proportion (null hypothesis 0.5) for confirmed PD.
RESULTS: 5633 records had ≥2 instances of a PD code. 2833 remained after filtering. The rate of true PD cases was low before and after filtering to remove parkinsonism codes (0.55 vs. 0.51, p = 0.84). Accuracy was lowest in Black patients before filtering (0.48, p = 0.69), but filtering had a greater (though modest) impact on accuracy (0.68, p < 0.001).
CONCLUSIONS: There was inadequate accuracy of PD diagnosis codes in the largest study of ICD-9 and -10 codes. Accuracy was lowest in Black patients but improved the most with removing other parkinsonism codes. This highlights the limitations of using current real-world EHR data in PD research and need for further study.
Affiliation(s)
- Emily J Hill
  - Department of Neurology and Rehabilitation Medicine, University of Cincinnati, Cincinnati, OH, USA
- Jennifer Sharma
  - Department of Neurology and Rehabilitation Medicine, University of Cincinnati, Cincinnati, OH, USA
- Benjamin Wissel
  - Department of Neurology and Rehabilitation Medicine, University of Cincinnati, Cincinnati, OH, USA
- Russell P Sawyer
  - Department of Neurology and Rehabilitation Medicine, University of Cincinnati, Cincinnati, OH, USA
- Megan Jiang
  - Department of Neurology and Rehabilitation Medicine, University of Cincinnati, Cincinnati, OH, USA
- Luca Marsili
  - Department of Neurology and Rehabilitation Medicine, University of Cincinnati, Cincinnati, OH, USA
- Kevin Duque
  - Department of Neurology and Rehabilitation Medicine, University of Cincinnati, Cincinnati, OH, USA
- Vanesa Botsford
  - Department of Neurology and Rehabilitation Medicine, University of Cincinnati, Cincinnati, OH, USA
- Christopher Wood
  - Department of Neurology and Rehabilitation Medicine, University of Cincinnati, Cincinnati, OH, USA
- Kelly DeLano
  - Department of Neurology and Rehabilitation Medicine, University of Cincinnati, Cincinnati, OH, USA
- Qin Sun
  - Department of Biostatistics and Epidemiology, Cincinnati Children's Hospital Medical Center, University of Cincinnati, OH, USA
- Brett Kissela
  - Department of Neurology and Rehabilitation Medicine, University of Cincinnati, Cincinnati, OH, USA
- Alberto J Espay
  - Department of Neurology and Rehabilitation Medicine, University of Cincinnati, Cincinnati, OH, USA
7. Wornow M, Xu Y, Thapa R, Patel B, Steinberg E, Fleming S, Pfeffer MA, Fries J, Shah NH. The shaky foundations of large language models and foundation models for electronic health records. NPJ Digit Med 2023; 6:135. [PMID: 37516790 PMCID: PMC10387101 DOI: 10.1038/s41746-023-00879-8] [Citation(s) in RCA: 29] [Impact Index Per Article: 29.0] [Received: 03/21/2023] [Accepted: 07/13/2023] [Indexed: 07/31/2023] [Open Access]
Abstract
The success of foundation models such as ChatGPT and AlphaFold has spurred significant interest in building similar models for electronic medical records (EMRs) to improve patient care and hospital operations. However, recent hype has obscured critical gaps in our understanding of these models' capabilities. In this narrative review, we examine 84 foundation models trained on non-imaging EMR data (i.e., clinical text and/or structured data) and create a taxonomy delineating their architectures, training data, and potential use cases. We find that most models are trained on small, narrowly-scoped clinical datasets (e.g., MIMIC-III) or broad, public biomedical corpora (e.g., PubMed) and are evaluated on tasks that do not provide meaningful insights on their usefulness to health systems. Considering these findings, we propose an improved evaluation framework for measuring the benefits of clinical foundation models that is more closely grounded to metrics that matter in healthcare.
Affiliation(s)
- Michael Wornow
  - Department of Computer Science, Stanford University, Stanford, CA, USA
- Yizhe Xu
  - Center for Biomedical Informatics Research, Stanford University School of Medicine, Stanford, CA, USA
- Rahul Thapa
  - Center for Biomedical Informatics Research, Stanford University School of Medicine, Stanford, CA, USA
- Birju Patel
  - Center for Biomedical Informatics Research, Stanford University School of Medicine, Stanford, CA, USA
- Ethan Steinberg
  - Department of Computer Science, Stanford University, Stanford, CA, USA
- Scott Fleming
  - Center for Biomedical Informatics Research, Stanford University School of Medicine, Stanford, CA, USA
- Michael A Pfeffer
  - Center for Biomedical Informatics Research, Stanford University School of Medicine, Stanford, CA, USA
  - Technology and Digital Services, Stanford Health Care, Palo Alto, CA, USA
- Jason Fries
  - Center for Biomedical Informatics Research, Stanford University School of Medicine, Stanford, CA, USA
- Nigam H Shah
  - Center for Biomedical Informatics Research, Stanford University School of Medicine, Stanford, CA, USA
  - Technology and Digital Services, Stanford Health Care, Palo Alto, CA, USA
  - Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
  - Clinical Excellence Research Center, Stanford University School of Medicine, Stanford, CA, USA
8. Barr PB, Bigdeli TB, Meyers JL, Peterson RE, Sanchez-Roige S, Mallard TT, Dick DM, Paige Harden K, Wilkinson A, Graham DP, Nielsen DA, Swann A, Lipsky RK, Kosten T, Aslan M, Harvey PD, Kimbrel NA, Beckham JC. Correlates of Risk for Disinhibited Behaviors in the Million Veteran Program Cohort. medRxiv 2023:2023.03.22.23286865. [PMID: 37034805 PMCID: PMC10081391 DOI: 10.1101/2023.03.22.23286865] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
Background: Many psychiatric outcomes are thought to share a common etiological pathway reflecting behavioral disinhibition, generally referred to as externalizing disorders (EXT). Recent genome-wide association studies (GWAS) have demonstrated the overlap between EXT and important aspects of veterans' health, such as suicide-related behaviors, substance use disorders, and other medical conditions.
Methods: We conducted a series of phenome-wide association studies (PheWAS) of polygenic scores (PGS) for EXT, and comorbid psychiatric problems (depression, schizophrenia, and suicide attempt) in an ancestrally diverse cohort of U.S. veterans (N = 560,824), using diagnostic codes from electronic health records. We conducted ancestry-specific PheWASs of EXT PGS in the European, African, and Hispanic/Latin American ancestries. To determine if associations were driven by risk for other comorbid problems, we performed a conditional PheWAS, covarying for comorbid psychiatric problems (European ancestries only). Lastly, to adjust for unmeasured confounders we performed a within-family analysis of significant associations from the main PheWAS in full-siblings (N = 12,127, European ancestries only).
Results: The EXT PGS was associated with 619 outcomes across all bodily systems, of which, 188 were independent of risk for comorbid problems of PGS. Effect sizes ranged from OR = 1.02 (95% CI = 1.01, 1.03) for overweight/obesity to OR = 1.44 (95% CI = 1.42, 1.47) for viral hepatitis C. Of the significant outcomes 73 (11.9%) and 26 (4.5%) were significant in the African and Hispanic/Latin American results, respectively. Within-family analyses uncovered robust associations between EXT and consequences of substance use disorders, including liver disease, chronic airway obstruction, and viral hepatitis C.
Conclusion: Our results demonstrate a shared polygenic basis of EXT across populations of diverse ancestries and independent of risk for other psychiatric problems. The strongest associations with EXT were for diagnoses related to substance use disorders and their sequelae. Overall, we highlight the potential negative consequences of EXT for health and functioning in the US veteran population.
Affiliation(s)
- Peter B. Barr
  - VA New York Harbor Healthcare System, Brooklyn, NY
  - Department of Psychiatry and Behavioral Sciences, SUNY Downstate Health Sciences University, Brooklyn, NY
  - Institute for Genomics in Health (IGH), SUNY Downstate Health Sciences University, Brooklyn, NY
  - Department of Epidemiology and Biostatistics, School of Public Health, SUNY Downstate Health Sciences University, Brooklyn, NY
- Tim B. Bigdeli
  - VA New York Harbor Healthcare System, Brooklyn, NY
  - Department of Psychiatry and Behavioral Sciences, SUNY Downstate Health Sciences University, Brooklyn, NY
  - Institute for Genomics in Health (IGH), SUNY Downstate Health Sciences University, Brooklyn, NY
  - Department of Epidemiology and Biostatistics, School of Public Health, SUNY Downstate Health Sciences University, Brooklyn, NY
- Jacquelyn L. Meyers
  - VA New York Harbor Healthcare System, Brooklyn, NY
  - Department of Psychiatry and Behavioral Sciences, SUNY Downstate Health Sciences University, Brooklyn, NY
  - Institute for Genomics in Health (IGH), SUNY Downstate Health Sciences University, Brooklyn, NY
  - Department of Epidemiology and Biostatistics, School of Public Health, SUNY Downstate Health Sciences University, Brooklyn, NY
- Roseann E. Peterson
  - VA New York Harbor Healthcare System, Brooklyn, NY
  - Department of Psychiatry and Behavioral Sciences, SUNY Downstate Health Sciences University, Brooklyn, NY
  - Institute for Genomics in Health (IGH), SUNY Downstate Health Sciences University, Brooklyn, NY
- Sandra Sanchez-Roige
  - Department of Psychiatry, University of California San Diego, La Jolla, CA, USA
  - Division of Genetic Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Travis T. Mallard
  - Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
  - Department of Psychiatry, Harvard Medical School, Boston, MA, USA
- Danielle M. Dick
  - Department of Psychiatry, Robert Wood Johnson Medical School, Rutgers University, Piscataway, NJ
  - Rutgers Addiction Research Center, Rutgers University, Piscataway, NJ
- K. Paige Harden
  - Department of Psychology, University of Texas at Austin, Austin, TX
  - Population Research Center, University of Texas at Austin, Austin, TX
- Anna Wilkinson
  - Michael E. DeBakey VA Medical Center, Houston, TX
  - UTHealth Houston School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX
  - Michael and Susan Dell Center for Healthy Living, The University of Texas Health Science Center at Houston, Houston, TX
- David P. Graham
  - Michael E. DeBakey VA Medical Center, Houston, TX
  - Department of Psychiatry, Neuroscience, Pharmacology, and Immunology and Rheumatology, Baylor College of Medicine, Houston, TX
- David A. Nielsen
  - Michael E. DeBakey VA Medical Center, Houston, TX
  - Department of Psychiatry, Neuroscience, Pharmacology, and Immunology and Rheumatology, Baylor College of Medicine, Houston, TX
- Alan Swann
  - Michael E. DeBakey VA Medical Center, Houston, TX
  - Department of Psychiatry, Neuroscience, Pharmacology, and Immunology and Rheumatology, Baylor College of Medicine, Houston, TX
- Rachele K. Lipsky
  - Michael E. DeBakey VA Medical Center, Houston, TX
  - Department of Psychiatry, Neuroscience, Pharmacology, and Immunology and Rheumatology, Baylor College of Medicine, Houston, TX
- Thomas Kosten
  - Michael E. DeBakey VA Medical Center, Houston, TX
  - Department of Psychiatry, Neuroscience, Pharmacology, and Immunology and Rheumatology, Baylor College of Medicine, Houston, TX
| | - Mihaela Aslan
- Clinical Epidemiology Research Center (CERC), VA Connecticut Healthcare System, West Haven, CT
- Yale University School of Medicine, New Haven, CT
| | - Philip D. Harvey
- Research Service, Bruce W. Carter Miami Veterans Affairs (VA) Medical Center, Miami, FL
- University of Miami Miller School of Medicine, Miami, FL
| | - Nathan A. Kimbrel
- Durham VA Health Care System, Durham, NC
- VA Mid-Atlantic Mental Illness Research, Education and Clinical Center, Durham, NC
- Department of Psychiatry and Behavioral Sciences, Duke University School of Medicine, Durham, NC
| | - Jean C. Beckham
- Durham VA Health Care System, Durham, NC
- VA Mid-Atlantic Mental Illness Research, Education and Clinical Center, Durham, NC
- Department of Psychiatry and Behavioral Sciences, Duke University School of Medicine, Durham, NC
| |
|
9
|
Strasser ZH, Dagliati A, Shakeri Hossein Abad Z, Klann JG, Wagholikar KB, Mesa R, Visweswaran S, Morris M, Luo Y, Henderson DW, Samayamuthu MJ, Omenn GS, Xia Z, Holmes JH, Estiri H, Murphy SN. A retrospective cohort analysis leveraging augmented intelligence to characterize long COVID in the electronic health record: A precision medicine framework. PLOS DIGITAL HEALTH 2023; 2:e0000301. [PMID: 37490472 PMCID: PMC10368277 DOI: 10.1371/journal.pdig.0000301] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2022] [Accepted: 06/16/2023] [Indexed: 07/27/2023]
Abstract
Physical and psychological symptoms lasting months following an acute COVID-19 infection are now recognized as post-acute sequelae of COVID-19 (PASC). Accurate tools for identifying such patients could enhance screening capabilities for the recruitment for clinical trials, improve the reliability of disease estimates, and allow for more accurate downstream cohort analysis. In this retrospective cohort study, we analyzed the EHR of hospitalized COVID-19 patients across three healthcare systems to develop a pipeline for better identifying patients with persistent PASC symptoms (dyspnea, fatigue, or joint pain) after their SARS-CoV-2 infection. We implemented distributed representation learning powered by the Machine Learning for modeling Health Outcomes (MLHO) to identify novel EHR features that could suggest PASC symptoms outside of typical diagnosis codes. MLHO applies an entropy-based feature selection and boosting algorithms for representation mining. These improved definitions were then used for estimating PASC among hospitalized patients. 30,422 hospitalized patients were diagnosed with COVID-19 across three healthcare systems between March 13, 2020 and February 28, 2021. The mean age of the population was 62.3 years (SD, 21.0 years) and 15,124 (49.7%) were female. We implemented the distributed representation learning technique to augment PASC definitions. These definitions were found to have positive predictive values of 0.73, 0.74, and 0.91 for dyspnea, fatigue, and joint pain, respectively. We estimated that 25 percent (CI 95%: 6-48), 11 percent (CI 95%: 6-15), and 13 percent (CI 95%: 8-17) of hospitalized COVID-19 patients will have dyspnea, fatigue, and joint pain, respectively, 3 months or longer after a COVID-19 diagnosis. We present a validated framework for screening and identifying patients with PASC in the EHR and then use the tool to estimate its prevalence among hospitalized COVID-19 patients.
Affiliation(s)
- Zachary H. Strasser
- Department of Medicine, Massachusetts General Hospital, Boston, Massachusetts, United States of America
| | - Arianna Dagliati
- Department of Electrical Computer and Biomedical Engineering, University of Pavia, Pavia, Italy
| | - Zahra Shakeri Hossein Abad
- Institute of Health Policy, Management and Evaluation, Dalla Lana School of Public Health, University of Toronto, Toronto, Canada
| | - Jeffrey G. Klann
- Department of Medicine, Massachusetts General Hospital, Boston, Massachusetts, United States of America
| | - Kavishwar B. Wagholikar
- Department of Medicine, Massachusetts General Hospital, Boston, Massachusetts, United States of America
| | - Rebecca Mesa
- Department of Electrical Computer and Biomedical Engineering, University of Pavia, Pavia, Italy
| | - Shyam Visweswaran
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, United States
| | - Michele Morris
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, United States
| | - Yuan Luo
- Department of Preventive Medicine, Northwestern University, Chicago, Illinois, United States of America
| | - Darren W. Henderson
- Center for Clinical and Translation Science, University of Kentucky, Lexington, Kentucky, United States of America
| | | | | | - Gilbert S. Omenn
- Dept of Computational Medicine & Bioinformatics, Internal Medicine, Human Genetics, and School of Public Health, University of Michigan, Ann Arbor, Michigan, United States of America
| | - Zongqi Xia
- Department of Neurology, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
| | - John H. Holmes
- Department of Biostatistics, Epidemiology, and Informatics; Institute for Biomedical Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, United States of America
| | - Hossein Estiri
- Department of Medicine, Massachusetts General Hospital, Boston, Massachusetts, United States of America
| | - Shawn N. Murphy
- Department of Neurology, Massachusetts General Hospital, Boston, Massachusetts, United States of America
| |
|
10
|
Thomas RD, Kosowan L, Rabey M, Bell A, Connelly KA, Hawkins NM, Casey CG, Singer AG. Validation of a Case Definition to Identify Patients Diagnosed With Cardiovascular Disease in Canadian Primary Care Practices. CJC Open 2023; 5:567-576. [PMID: 37496780 PMCID: PMC10366639 DOI: 10.1016/j.cjco.2023.04.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2022] [Accepted: 04/17/2023] [Indexed: 07/28/2023] Open
Abstract
Background Cardiovascular disease (CVD) is a leading cause of death globally. This study validates a primary care-based electronic medical record case definition for CVD. Methods This retrospective, cross-sectional study explores electronic medical record data from 1574 primary care providers participating in the Canadian Primary Care Sentinel Surveillance Network. A reference standard was created by reviewing medical records of a subset of patients in this network (n = 2017) for coronary artery disease (CAD), cerebrovascular disease (CeVD), and peripheral vascular disease (PVD). Together, these data produced a CVD reference. We applied validated case definitions to an active patient population (≥ 1 visit between January 1, 2018 and December 31, 2019) to estimate prevalence using the exact binomial test (N = 689,301). Descriptive statistics, χ2 tests, and t tests characterized patients with vs without CVD. Results The optimal CVD Case Definition 2 had a sensitivity of 68.5% (95% Confidence Interval [CI]: 61.6%-74.8%), a specificity of 97.8% (95% CI: 97.0%-98.4%), a positive predictive value of 77.7% (95% CI: 71.6%-82.7%), and a negative predictive value of 96.5% (95% CI: 95.8%-97.1%). Included in this CVD definition was a strong CAD case definition with sensitivity of 91.6% (95% CI: 84.6%-96.1%), specificity of 98.3% (95% CI: 97.6%-98.8%), a PPV of 74.8% (95% CI: 67.8%-80.7%), and an NPV of 99.5% (95% CI: 99.1%-99.7%). This CVD definition also included CeVD and PVD case definitions with low sensitivity (77.6% and 36.6%) but high specificity (98.6% and 99.0%). The estimated prevalence of CVD among primary care patients is 11.2% (95% CI, 11.1%-11.3%; n = 77,064); the majority had CAD (6.4%). Conclusions This study validated a definition of CVD and its component parts-CAD, CeVD, and PVD. Understanding the prevalence and disease burden for patients with CVD within primary care settings can improve prevention and disease management.
Affiliation(s)
| | - Leanne Kosowan
- Department of Family Medicine, Rady Faculty of Health Sciences University of Manitoba, Winnipeg, Manitoba, Canada
| | - Mary Rabey
- Faculty of Medicine, University of Limerick, Limerick, Ireland
| | - Alan Bell
- Department of Family and Community Medicine, University of Toronto, Toronto, Ontario, Canada
| | - Kim A. Connelly
- Keenan Research Centre for Biomedical Science, Unity Health, St. Michael’s Hospital, University of Toronto, Toronto, Ontario, Canada
| | - Nathaniel M. Hawkins
- Centre for Cardiovascular Innovation, Division of Cardiology, University of British Columbia, Vancouver, British Columbia, Canada
| | | | - Alexander G. Singer
- Department of Family Medicine, Rady Faculty of Health Sciences University of Manitoba, Winnipeg, Manitoba, Canada
| |
|
11
|
Vaitinadin NS, Stein CM, Mosley JD, Kawai VK. Genetic susceptibility for autoimmune diseases and white blood cell count. Sci Rep 2023; 13:5852. [PMID: 37041293 PMCID: PMC10090175 DOI: 10.1038/s41598-023-32799-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/12/2022] [Accepted: 04/03/2023] [Indexed: 04/13/2023] Open
Abstract
Some autoimmune (AI) conditions affect white blood cell (WBC) counts. Whether a genetic predisposition to AI disease associates with WBC counts in populations expected to have low numbers of AI cases is not known. We developed genetic instruments for 7 AI diseases using genome-wide association study summary statistics. Two-sample inverse variance weighted regression (IVWR) was used to determine associations between each instrument and WBC counts. Effect size represents change in transformed WBC counts per change in log odds-ratio of the disease. For AI diseases with significant associations by IVWR, polygenic risk scores (PRS) were used to test for associations with measured WBC counts in individuals of European ancestry in a community-based cohort (ARIC, n = 8926) and a medical-center-derived cohort (BioVU, n = 40,461). The IVWR analyses revealed significant associations between 3 AI diseases and WBC counts: systemic lupus erythematosus (Beta = -0.05 [95% CI, -0.06, -0.03]), multiple sclerosis (Beta = -0.06 [-0.10, -0.03]), and rheumatoid arthritis (Beta = 0.02 [0.01, 0.03]). PRS for these diseases showed associations with measured WBC counts in ARIC and BioVU. Effect sizes tended to be larger among females, consistent with the known higher prevalence of these diseases among this group. This study shows that genetic predisposition to systemic lupus erythematosus, rheumatoid arthritis, and multiple sclerosis was associated with WBC counts, even in populations expected to have very low numbers of disease cases.
Affiliation(s)
| | - C Michael Stein
- Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Department of Pharmacology, Vanderbilt University, Nashville, TN, USA
| | - Jonathan D Mosley
- Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Vivian K Kawai
- Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Division of Clinical Pharmacology, 536 RRB, Vanderbilt University School of Medicine, Nashville, TN, 37232, USA
| |
|
12
|
Deutsch AJ, Stalbow L, Majarian TD, Mercader JM, Manning AK, Florez JC, Loos RJ, Udler MS. Polygenic Scores Help Reduce Racial Disparities in Predictive Accuracy of Automated Type 1 Diabetes Classification Algorithms. Diabetes Care 2023; 46:794-800. [PMID: 36745605 PMCID: PMC10090893 DOI: 10.2337/dc22-1833] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2022] [Accepted: 01/10/2023] [Indexed: 02/07/2023]
Abstract
OBJECTIVE Automated algorithms to identify individuals with type 1 diabetes using electronic health records are increasingly used in biomedical research. It is not known whether the accuracy of these algorithms differs by self-reported race. We investigated whether polygenic scores improve identification of individuals with type 1 diabetes. RESEARCH DESIGN AND METHODS We investigated two large hospital-based biobanks (Mass General Brigham [MGB] and BioMe) and identified individuals with type 1 diabetes using an established automated algorithm. We performed medical record reviews to validate the diagnosis of type 1 diabetes. We implemented two published polygenic scores for type 1 diabetes (developed in individuals of European or African ancestry). We assessed the classification algorithm before and after incorporating polygenic scores. RESULTS The automated algorithm was more likely to incorrectly assign a diagnosis of type 1 diabetes in self-reported non-White individuals than in self-reported White individuals (odds ratio 3.45; 95% CI 1.54-7.69; P = 0.0026). After incorporating polygenic scores into the MGB Biobank, the positive predictive value of the type 1 diabetes algorithm increased from 70 to 97% for self-reported White individuals (meaning that 97% of those predicted to have type 1 diabetes indeed had type 1 diabetes) and from 53 to 100% for self-reported non-White individuals. Similar results were found in BioMe. CONCLUSIONS Automated phenotyping algorithms may exacerbate health disparities because of an increased risk of misclassification of individuals from underrepresented populations. Polygenic scores may be used to improve the performance of phenotyping algorithms and potentially reduce this disparity.
Affiliation(s)
- Aaron J. Deutsch
- Diabetes Unit and Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA
- Programs in Metabolism and Medical & Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA
- Department of Medicine, Harvard Medical School, Boston, MA
| | - Lauren Stalbow
- Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY
| | - Timothy D. Majarian
- Programs in Metabolism and Medical & Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA
| | - Josep M. Mercader
- Diabetes Unit and Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA
- Programs in Metabolism and Medical & Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA
- Department of Medicine, Harvard Medical School, Boston, MA
| | - Alisa K. Manning
- Programs in Metabolism and Medical & Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA
- Department of Medicine, Harvard Medical School, Boston, MA
- Clinical and Translational Epidemiology Unit, Mongan Institute, Massachusetts General Hospital, Boston, MA
| | - Jose C. Florez
- Diabetes Unit and Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA
- Programs in Metabolism and Medical & Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA
- Department of Medicine, Harvard Medical School, Boston, MA
| | - Ruth J.F. Loos
- Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY
- Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Miriam S. Udler
- Diabetes Unit and Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA
- Programs in Metabolism and Medical & Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA
- Department of Medicine, Harvard Medical School, Boston, MA
| |
|
13
|
Fashina TA, Miller CM, Paintsil E, Niccolai LM, Brandt C, Oliveira CR. Computable Clinical Phenotyping of Postacute Sequelae of COVID-19 in Pediatrics Using Real-World Data. J Pediatric Infect Dis Soc 2023; 12:113-116. [PMID: 36548966 PMCID: PMC9969330 DOI: 10.1093/jpids/piac132] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2022] [Accepted: 12/09/2022] [Indexed: 12/24/2022]
Abstract
Many questions remain unanswered concerning the long-term effects of COVID-19 on children. In this report, we describe a computable phenotyping algorithm for identifying children and adolescents with postacute sequelae of COVID-19 (PASC) and pilot this tool to characterize the clinical epidemiology of pediatric PASC in a large healthcare delivery network.
Affiliation(s)
| | - Christine M Miller
- Department of Pediatrics, Section of Infectious Diseases and Global Health, Yale University School of Medicine, New Haven, Connecticut, USA
| | - Elijah Paintsil
- Department of Pediatrics, Section of Infectious Diseases and Global Health, Yale University School of Medicine, New Haven, Connecticut, USA
- Department of Pharmacology, Yale University School of Medicine, New Haven, Connecticut, USA
- Department of Epidemiology of Microbial Diseases, Yale University School of Public Health, New Haven, Connecticut, USA
| | - Linda M Niccolai
- Department of Epidemiology of Microbial Diseases, Yale University School of Public Health, New Haven, Connecticut, USA
| | - Cynthia Brandt
- Department of Biostatistics, Section of Health Informatics, Yale University School of Public Health, New Haven, Connecticut, USA
| | - Carlos R Oliveira
- Department of Pediatrics, Section of Infectious Diseases and Global Health, Yale University School of Medicine, New Haven, Connecticut, USA
- Department of Biostatistics, Section of Health Informatics, Yale University School of Public Health, New Haven, Connecticut, USA
| |
|
14
|
Rivera AS, Al-Heeti O, Petito LC, Feinstein MJ, Achenbach CJ, Williams J, Taiwo B. Association of statin use with outcomes of patients admitted with COVID-19: an analysis of electronic health records using superlearner. BMC Infect Dis 2023; 23:115. [PMID: 36829115 PMCID: PMC9951166 DOI: 10.1186/s12879-023-08026-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 07/15/2022] [Accepted: 01/23/2023] [Indexed: 02/26/2023] Open
Abstract
IMPORTANCE Statin use prior to hospitalization for Coronavirus Disease 2019 (COVID-19) is hypothesized to improve inpatient outcomes including mortality, but prior findings from large observational studies have been inconsistent, due in part to confounding. Recent advances in statistics, including incorporation of machine learning techniques into augmented inverse probability weighting with targeted maximum likelihood estimation, address baseline covariate imbalance while maximizing statistical efficiency. OBJECTIVE To estimate the association of antecedent statin use with progression to severe inpatient outcomes among patients admitted for COVID-19. DESIGN, SETTING AND PARTICIPANTS We retrospectively analyzed electronic health records (EHR) from individuals ≥ 40 years old who were admitted between March 2020 and September 2022 for ≥ 24 h and tested positive for SARS-CoV-2 infection in the 30 days before to 7 days after admission. EXPOSURE Antecedent statin use: statin prescription ≥ 30 days prior to COVID-19 admission. MAIN OUTCOME Composite end point of in-hospital death, intubation, and intensive care unit (ICU) admission. RESULTS Of 15,524 eligible COVID-19 patients, 4412 (20%) were antecedent statin users. Compared with non-users, statin users were older (72.9 (SD: 12.6) versus 65.6 (SD: 14.5) years) and more likely to be male (54% vs. 51%), White (76% vs. 71%), and have ≥ 1 medical comorbidity (99% vs. 86%). Unadjusted analysis demonstrated that a lower proportion of antecedent users experienced the composite outcome (14.8% vs 19.3%), ICU admission (13.9% vs 18.3%), intubation (5.1% vs 8.3%) and inpatient deaths (4.4% vs 5.2%) compared with non-users. Risk differences adjusted for labs and demographics were estimated using augmented inverse probability weighting with targeted maximum likelihood estimation using Super Learner. 
Statin users still had lower rates of the composite outcome (adjusted risk difference: -3.4%; 95% CI: -4.6% to -2.1%), ICU admissions (-3.3%; -4.5% to -2.1%), and intubation (-1.9%; -2.8% to -1.0%) but comparable inpatient deaths (-0.6%; -1.3% to 0.1%). CONCLUSIONS AND RELEVANCE After controlling for confounding using doubly robust methods, antecedent statin use was associated with minimally lower risk of severe COVID-19-related outcomes, ICU admission, and intubation; however, we were not able to corroborate a statin-associated mortality benefit.
Affiliation(s)
- Adovich S Rivera
- Institute for Public Health and Management, Feinberg School of Medicine, Chicago, IL, 60611, USA
- Department of Research and Evaluation, Kaiser Permanente Southern California, Pasadena, CA, 91101, USA
| | - Omar Al-Heeti
- Division of Infectious Diseases, Department of Medicine, Northwestern University Feinberg School of Medicine, 645 N. Michigan Ave, Suite 900, Chicago, IL, 60611, USA
| | - Lucia C Petito
- Division of Biostatistics, Department of Preventive Medicine, Feinberg School of Medicine, Chicago, IL, 60611, USA
| | - Mathew J Feinstein
- Division of Cardiology, Department of Medicine, Feinberg School of Medicine, Chicago, IL, 60611, USA
- Division of Epidemiology, Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
| | - Chad J Achenbach
- Division of Infectious Diseases, Department of Medicine, Northwestern University Feinberg School of Medicine, 645 N. Michigan Ave, Suite 900, Chicago, IL, 60611, USA
- Havey Institute for Global Health, Northwestern University Feinberg School of Medicine, Chicago, IL, 60611, USA
- Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
| | - Janna Williams
- Division of Infectious Diseases, Department of Medicine, Northwestern University Feinberg School of Medicine, 645 N. Michigan Ave, Suite 900, Chicago, IL, 60611, USA
| | - Babafemi Taiwo
- Division of Infectious Diseases, Department of Medicine, Northwestern University Feinberg School of Medicine, 645 N. Michigan Ave, Suite 900, Chicago, IL, 60611, USA
- Havey Institute for Global Health, Northwestern University Feinberg School of Medicine, Chicago, IL, 60611, USA
| |
|
15
|
Wan NC, Yaqoob AA, Ong HH, Zhao J, Wei WQ. Evaluating resources composing the PheMAP knowledge base to enhance high-throughput phenotyping. J Am Med Inform Assoc 2023; 30:456-465. [PMID: 36451277 PMCID: PMC9933070 DOI: 10.1093/jamia/ocac234] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 07/24/2022] [Revised: 10/28/2022] [Accepted: 11/23/2022] [Indexed: 12/02/2022] Open
Abstract
OBJECTIVE A previous study, PheMAP, combined independent, online resources to enable high-throughput phenotyping (HTP) using electronic health records (EHRs). However, online resources offer distinct quality descriptions of diseases which may affect phenotyping performance. We aimed to evaluate the phenotyping performance of single resource-based PheMAPs and investigate an optimized strategy for HTP. MATERIALS AND METHODS We compared how each resource produced top-ranked concept unique identifiers (CUIs) by term frequency-inverse document frequency with Jaccard matrices comparing single resources and the original PheMAP. We correlated top-ranked concepts from each resource to features used in established Phenotype KnowledgeBase (PheKB) algorithms for hypothyroidism, type II diabetes mellitus (T2DM), and dementias. Using resources separately, we calculated multiple phenotype risk scores for individuals from Vanderbilt University Medical Center's BioVU DNA Biobank and compared phenotyping performance against rule-based eMERGE algorithms. Lastly, we implemented an ensemble strategy which classified patient case/control status based upon PheMAP resource agreement. RESULTS Jaccard similarity matrices indicate that the similarity of CUIs comprising single resource-based PheMAPs varies. Single resource-based PheMAPs generated from MedlinePlus and MedicineNet outperformed others but only encompass 81.6% of overall disease phenotypes. We propose the PheMAP-Ensemble which provides higher average accuracy and precision than the combined average accuracy and precision of single resource-based PheMAPs. While offering complete phenotype coverage, PheMAP-Ensemble significantly increases phenotyping recall compared to the original iteration. CONCLUSIONS Resources comprising the PheMAP produce different phenotyping performance when implemented individually. 
The ensemble method significantly improves the quality of PheMAP by fully utilizing dissimilar resources to capture accurate phenotyping data from EHRs.
Affiliation(s)
- Nicholas C Wan
- Department of Biomedical Engineering, Vanderbilt University, Nashville, Tennessee, USA
| | - Ali A Yaqoob
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
| | - Henry H Ong
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
| | - Juan Zhao
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
| | - Wei-Qi Wei
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
| |
|
16
|
Landstrom AP, Yang Q, Sun B, Perelli RM, Bidzimou MT, Zhang Z, Aguilar-Sanchez Y, Alsina KM, Cao S, Reynolds JO, Word TA, van der Sangen NM, Wells Q, Kannankeril PJ, Ludwig A, Kim JJ, Wehrens XH. Reduction in Junctophilin 2 Expression in Cardiac Nodal Tissue Results in Intracellular Calcium-Driven Increase in Nodal Cell Automaticity. Circ Arrhythm Electrophysiol 2023; 16:e010858. [PMID: 36706317 PMCID: PMC9974897 DOI: 10.1161/circep.122.010858] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 04/30/2020] [Accepted: 01/06/2023] [Indexed: 01/29/2023]
Abstract
BACKGROUND Spontaneously depolarizing nodal cells comprise the pacemaker of the heart. Intracellular calcium (Ca2+) plays a critical role in mediating nodal cell automaticity and understanding this so-called Ca2+ clock is critical to understanding nodal arrhythmias. We previously demonstrated a role for Jph2 (junctophilin 2) in regulating Ca2+-signaling through inhibition of RyR2 (ryanodine receptor 2) Ca2+ leak in cardiac myocytes; however, its role in pacemaker function and nodal arrhythmias remains unknown. We sought to determine whether nodal Jph2 expression silencing causes increased sinoatrial and atrioventricular nodal cell automaticity due to aberrant RyR2 Ca2+ leak. METHODS A tamoxifen-inducible, nodal tissue-specific, knockdown mouse of Jph2 was achieved using a Cre-recombinase-triggered short RNA hairpin directed against Jph2 (Hcn4:shJph2). In vivo cardiac rhythm was monitored by surface ECG, implantable cardiac telemetry, and intracardiac electrophysiology studies. Intracellular Ca2+ imaging was performed using confocal-based line scans of isolated nodal cells loaded with fluorescent Ca2+ reporter Cal-520. Whole cell patch clamp was conducted on isolated nodal cells to determine action potential kinetics and sodium-calcium exchanger function. RESULTS Hcn4:shJph2 mice demonstrated a 40% reduction in nodal Jph2 expression, resting sinus tachycardia, and impaired heart rate response to pharmacologic stress. In vivo intracardiac electrophysiology studies and ex vivo optical mapping demonstrated accelerated junctional rhythm originating from the atrioventricular node. Hcn4:shJph2 nodal cells demonstrated increased and irregular Ca2+ transient generation with increased Ca2+ spark frequency and Ca2+ leak from the sarcoplasmic reticulum. This was associated with increased nodal cell AP firing rate, faster diastolic repolarization rate, and reduced sodium-calcium exchanger activity during repolarized states compared to control. 
Phenome-wide association studies of the JPH2 locus identified an association with sinoatrial nodal disease and atrioventricular nodal block. CONCLUSIONS Nodal-specific Jph2 knockdown causes increased nodal automaticity through increased Ca2+ leak from intracellular stores. Dysregulated intracellular Ca2+ underlies nodal arrhythmogenesis in this mouse model.
Affiliation(s)
- Andrew P. Landstrom
  - Dept of Pediatrics, Division of Cardiology, Duke Univ School of Medicine, Durham, NC
  - Dept of Cell Biology, Duke Univ School of Medicine, Durham, NC
- Qixin Yang
  - Dept of Pediatrics, Division of Cardiology, Duke Univ School of Medicine, Durham, NC
  - Dept of Cardiology, The First Affiliated Hospital, College of Medicine, Zhejiang Univ, Hangzhou, China
- Bo Sun
  - Dept of Pediatrics, Division of Cardiology, Duke Univ School of Medicine, Durham, NC
- Zhushan Zhang
  - Dept of Cell Biology, Duke Univ School of Medicine, Durham, NC
- Yuriana Aguilar-Sanchez
  - Integrative Molecular & Biomedical Sciences Program, Baylor College of Medicine, Houston, TX
- Katherina M. Alsina
  - Integrative Molecular & Biomedical Sciences Program, Baylor College of Medicine, Houston, TX
- Shuyi Cao
  - Dept of Molecular Physiology & Biophysics, Baylor College of Medicine, Houston, TX
- Julia O. Reynolds
  - Dept of Molecular Physiology & Biophysics, Baylor College of Medicine, Houston, TX
- Tarah A. Word
  - Dept of Molecular Physiology & Biophysics, Baylor College of Medicine, Houston, TX
- Quinn Wells
  - Depts of Medicine, Pharmacology, and Biomedical Informatics, Vanderbilt Univ School of Medicine, Nashville, TN
- Prince J. Kannankeril
  - Center for Pediatric Precision Medicine, Dept of Pediatrics, Vanderbilt Univ School of Medicine, Nashville, TN
- Andreas Ludwig
  - Institut für Experimentelle und Klinische Pharmakologie und Toxikologie, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
- Jeffrey J. Kim
  - Dept of Pediatrics, Section of Cardiology, Baylor College of Medicine, Houston, TX
- Xander H.T. Wehrens
  - Dept of Molecular Physiology & Biophysics, Baylor College of Medicine, Houston, TX
  - Dept of Pediatrics, Section of Cardiology, Baylor College of Medicine, Houston, TX
  - Depts of Neuroscience & Center for Space Medicine and the Cardiovascular Research Institute, Baylor College of Medicine, Houston, TX
17
Tedeschi SK, Yoshida K, Huang W, Solomon DH. Confirming Prior and Identifying Novel Correlates of Acute Calcium Pyrophosphate Crystal Arthritis. Arthritis Care Res (Hoboken) 2023; 75:283-288. [PMID: 34397174 PMCID: PMC8847549 DOI: 10.1002/acr.24770] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/24/2021] [Revised: 06/30/2021] [Accepted: 08/12/2021] [Indexed: 12/29/2022]
Abstract
OBJECTIVE To investigate previously identified and novel correlates of acute calcium pyrophosphate (CPP) crystal arthritis among well-characterized cases. METHODS In this case-control study, we identified cases of acute CPP crystal arthritis using a validated algorithm (positive predictive value 81%) applied in the Partners HealthCare electronic health record (EHR). Cases were matched to general patient controls on the year of first EHR encounter and index date. Prespecified potential correlates included sex, race, and comorbidities and medications previously associated with CPP deposition/acute CPP crystal arthritis in the literature. We estimated odds ratios (ORs) and 95% confidence intervals using conditional logistic regression models adjusted for demographic characteristics, comorbidities, medications prescribed in the past 90 days, health care utilization, and multimorbidity score. RESULTS We identified 1,697 cases matched to 6,503 controls. Mean ± SD age was 73.7 ± 11.8 years, 56.7% were female, 80.8% were White, and 10.3% were Black. All prespecified covariates were more common in cases than controls. Osteoarthritis (OR 3.08), male sex (OR 1.35), rheumatoid arthritis (OR 2.09), gout (OR 2.83), proton pump inhibitors (OR 1.94), loop diuretics (OR 1.60), and thiazides (OR 1.46) were significantly associated with acute CPP crystal arthritis after full adjustment. Black race was associated with lower odds for acute CPP crystal arthritis compared to White race (OR 0.47). CONCLUSION Using a validated algorithm to identify nearly 1,700 patients with acute CPP crystal arthritis, we confirmed important correlates of this acute manifestation of CPP deposition. This is the first study to report higher odds for acute CPP crystal arthritis among males.
Affiliation(s)
- Sara K. Tedeschi
  - Division of Rheumatology, Inflammation, and Immunity, Brigham and Women’s Hospital, Boston, MA, USA
  - Department of Medicine, Harvard Medical School, Boston, MA, USA
- Kazuki Yoshida
  - Division of Rheumatology, Inflammation, and Immunity, Brigham and Women’s Hospital, Boston, MA, USA
  - Department of Medicine, Harvard Medical School, Boston, MA, USA
- Weixing Huang
  - Division of Rheumatology, Inflammation, and Immunity, Brigham and Women’s Hospital, Boston, MA, USA
- Daniel H. Solomon
  - Division of Rheumatology, Inflammation, and Immunity, Brigham and Women’s Hospital, Boston, MA, USA
  - Department of Medicine, Harvard Medical School, Boston, MA, USA
18
A validated artificial intelligence-based pipeline for population-wide primary immunodeficiency screening. J Allergy Clin Immunol 2023; 151:272-279. [PMID: 36243223 DOI: 10.1016/j.jaci.2022.10.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2022] [Revised: 09/21/2022] [Accepted: 10/05/2022] [Indexed: 11/07/2022]
Abstract
BACKGROUND Identification of patients with underlying inborn errors of immunity and inherent susceptibility to infection remains challenging. The ensuing protracted diagnostic odyssey for such patients often results in greater morbidity and suboptimal outcomes, underscoring a need to develop systematic methods for improving diagnostic rates. OBJECTIVE The principal aim of this study is to build and validate a generalizable analytical pipeline for population-wide detection of infection susceptibility and risk of primary immunodeficiency. METHODS This prospective, longitudinal cohort study coupled weighted rules with a machine learning classifier for risk stratification. Claims data were analyzed from a diverse population (n = 427,110) iteratively over 30 months. Cohort outcomes were enumerated for new diagnoses, hospitalizations, and acute care visits. This study followed TRIPOD (Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis) standards. RESULTS Cohort members initially identified as high risk were proportionally more likely to receive a diagnosis of primary immunodeficiency compared to those at low-medium risk or those without claims of interest respectively (9% vs 1.5% vs 0.2%; P < .001, chi-square test). Subsequent machine learning stratification enabled an annualized individual snapshot of complexity for triaging referrals. This study's top-performing machine learning model for visit-level prediction used a single dense layer neural network architecture (area under the receiver-operator characteristic curve = 0.98; F1 score = 0.98). CONCLUSIONS A 2-step analytical pipeline can facilitate identification of individuals with primary immunodeficiency and accurately quantify clinical risk.
19
Wu AD, Wilson AM. Parkinson's disease population-wide registries in the United States: Current and future opportunities. Front Digit Health 2023; 5:1149154. [PMID: 37035478 PMCID: PMC10073707 DOI: 10.3389/fdgth.2023.1149154] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/21/2023] [Accepted: 03/01/2023] [Indexed: 04/11/2023] Open
Abstract
Parkinson's disease (PD) is a neurodegenerative disease with both genetic and environmental risk factors. Efforts to understand the growing incidence and prevalence of PD have led to several state PD registry initiatives in the United States. The California PD Registry (CPDR) is the largest state-wide PD registry and requires electronic reporting of all eligible cases by all medical providers. We borrow from our experience with the CPDR to highlight 4 gaps to population-based PD registries. Specifically we address (1) who should be included in PD registries; (2) what data should be collected in PD case reports; (3) how to ensure the validity of case reports; and (4) how can state PD registries exchange and aggregate information. We propose a set of recommendations that addresses these and other gaps toward achieving a promise of a practical, interoperable, and scalable PD registry in the U.S., which can serve as a key health information resource to support epidemiology, health equity, quality improvement, and research.
Affiliation(s)
- Allan D. Wu
  - Division of Movement Disorders, Department of Neurology, Feinberg School of Medicine, Northwestern University, Chicago, IL, United States
  - Department of Neurology, David Geffen School of Medicine, University of California Los Angeles (UCLA), Los Angeles, CA, United States
  - Stanley Manne Children’s Research Institute, Ann & Robert H. Lurie Children’s Hospital, Chicago, IL, United States
  - Correspondence: Allan D. Wu
- Andrew M. Wilson
  - Department of Neurology, David Geffen School of Medicine, University of California Los Angeles (UCLA), Los Angeles, CA, United States
  - Department of Neurology, Greater Los Angeles VA, Los Angeles, CA, United States
20
Charpignon ML, Vakulenko-Lagun B, Zheng B, Magdamo C, Su B, Evans K, Rodriguez S, Sokolov A, Boswell S, Sheu YH, Somai M, Middleton L, Hyman BT, Betensky RA, Finkelstein SN, Welsch RE, Tzoulaki I, Blacker D, Das S, Albers MW. Causal inference in medical records and complementary systems pharmacology for metformin drug repurposing towards dementia. Nat Commun 2022; 13:7652. [PMID: 36496454 PMCID: PMC9741618 DOI: 10.1038/s41467-022-35157-w] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 06/13/2021] [Accepted: 11/21/2022] [Indexed: 12/13/2022] Open
Abstract
Metformin, a diabetes drug with anti-aging cellular responses, has complex actions that may alter dementia onset. Mixed results are emerging from prior observational studies. To address this complexity, we deploy a causal inference approach accounting for the competing risk of death in emulated clinical trials using two distinct electronic health record systems. In intention-to-treat analyses, metformin use associates with lower hazard of all-cause mortality and lower cause-specific hazard of dementia onset, after accounting for prolonged survival, relative to sulfonylureas. In parallel systems pharmacology studies, the expression of two AD-related proteins, APOE and SPP1, was suppressed by pharmacologic concentrations of metformin in differentiated human neural cells, relative to a sulfonylurea. Together, our findings suggest that metformin might reduce the risk of dementia in diabetes patients through mechanisms beyond glycemic control, and that SPP1 is a candidate biomarker for metformin's action in the brain.
Affiliation(s)
- Marie-Laure Charpignon
  - Institute for Data, Systems, and Society, Massachusetts Institute of Technology, Cambridge, MA, USA
- Bang Zheng
  - Ageing Epidemiology Research Unit, School of Public Health, Imperial College London, London, UK
- Colin Magdamo
  - Department of Neurology, Massachusetts General Hospital/Harvard Medical School, Boston, MA, USA
- Bowen Su
  - Department of Epidemiology and Biostatistics, School of Public Health, Imperial College London, London, UK
- Kyle Evans
  - Department of Neurology, Massachusetts General Hospital/Harvard Medical School, Boston, MA, USA
  - Laboratory of Systems Pharmacology, Harvard Program in Therapeutic Science, Harvard Medical School, Boston, MA, USA
- Steve Rodriguez
  - Department of Neurology, Massachusetts General Hospital/Harvard Medical School, Boston, MA, USA
  - Laboratory of Systems Pharmacology, Harvard Program in Therapeutic Science, Harvard Medical School, Boston, MA, USA
- Artem Sokolov
  - Laboratory of Systems Pharmacology, Harvard Program in Therapeutic Science, Harvard Medical School, Boston, MA, USA
- Sarah Boswell
  - Laboratory of Systems Pharmacology, Harvard Program in Therapeutic Science, Harvard Medical School, Boston, MA, USA
- Yi-Han Sheu
  - Department of Psychiatry, Massachusetts General Hospital/Harvard Medical School, Boston, MA, USA
- Melek Somai
  - Inception Labs, Collaborative for Health Delivery Sciences, Medical College of Wisconsin, Wauwatosa, WI, USA
- Lefkos Middleton
  - Ageing Epidemiology Research Unit, School of Public Health, Imperial College London, London, UK
  - Public Health Directorate, Imperial College London NHS Healthcare Trust, London, UK
- Bradley T Hyman
  - Department of Neurology, Massachusetts General Hospital/Harvard Medical School, Boston, MA, USA
- Rebecca A Betensky
  - Department of Biostatistics, School of Global Public Health, New York University, New York, NY, USA
- Stan N Finkelstein
  - Institute for Data, Systems, and Society, Massachusetts Institute of Technology, Cambridge, MA, USA
  - Division of Clinical Informatics, Beth Israel Deaconess Medical Center, Boston, MA, USA
- Roy E Welsch
  - Institute for Data, Systems, and Society, Massachusetts Institute of Technology, Cambridge, MA, USA
  - Sloan School of Management, Massachusetts Institute of Technology, Cambridge, MA, USA
- Ioanna Tzoulaki
  - Department of Epidemiology and Biostatistics, School of Public Health, Imperial College London, London, UK
  - Dementia Research Institute, Imperial College London, London, UK
  - Department of Hygiene and Epidemiology, University of Ioannina, Ioannina, Greece
- Deborah Blacker
  - Department of Psychiatry, Massachusetts General Hospital/Harvard Medical School, Boston, MA, USA
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Sudeshna Das
  - Department of Neurology, Massachusetts General Hospital/Harvard Medical School, Boston, MA, USA
- Mark W Albers
  - Department of Neurology, Massachusetts General Hospital/Harvard Medical School, Boston, MA, USA
  - Laboratory of Systems Pharmacology, Harvard Program in Therapeutic Science, Harvard Medical School, Boston, MA, USA
21
Nagamine T, Gillette B, Kahoun J, Burghaus R, Lippert J, Saxena M. Data-driven identification of heart failure disease states and progression pathways using electronic health records. Sci Rep 2022; 12:17871. [PMID: 36284167 PMCID: PMC9596465 DOI: 10.1038/s41598-022-22398-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/18/2021] [Accepted: 10/13/2022] [Indexed: 01/20/2023] Open
Abstract
Heart failure (HF) is a leading cause of morbidity, healthcare costs, and mortality. Guideline based segmentation of HF into distinct subtypes is coarse and unlikely to reflect the heterogeneity of etiologies and disease trajectories of patients. While analyses of electronic health records show promise in expanding our understanding of complex syndromes like HF in an evidence-driven way, limitations in data quality have presented challenges for large-scale EHR-based insight generation and decision-making. We present a hypothesis-free approach to generating real-world characteristics and progression patterns of HF. Patient disease state snapshots are extracted from the complaints mentioned in unstructured clinical notes. Typical disease states are generated by clustering and characterized in terms of their distinguishing features, temporal relationships, and risk of important clinical events. Our analysis generates a comprehensive "disease phenome" of real-world patients computed from large, noisy, secondary-use EHR datasets created in a routine clinical setting.
Affiliation(s)
- Brian Gillette
  - Department of Surgery, NYU Langone Long Island, Mineola, NY, USA (GRID grid.137628.9; ISNI 0000 0004 1936 8753)
  - Department of Foundations of Medicine, NYU Long Island School of Medicine, Mineola, NY, USA
- John Kahoun
  - Droice Research, New York, NY, USA
  - Clinical Informatics, CityMD, New York, NY, USA
- Rolf Burghaus
  - Bayer AG, Wuppertal, Germany (GRID grid.420044.6; ISNI 0000 0004 0374 4101)
- Jörg Lippert
  - Bayer AG, Wuppertal, Germany (GRID grid.420044.6; ISNI 0000 0004 0374 4101)
22
Schliep KC, Ju S, Foster NL, Smith KR, Varner MM, Østbye T, Tschanz J. How good are medical and death records for identifying dementia? Alzheimers Dement 2022; 18:1812-1823. [PMID: 34873816 PMCID: PMC9170837 DOI: 10.1002/alz.12526] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/16/2021] [Revised: 10/06/2021] [Accepted: 10/12/2021] [Indexed: 01/28/2023]
Abstract
INTRODUCTION Retrospective studies using administrative data may be an efficient way to assess risk factors for dementia if diagnostic accuracy is known. METHODS Within-individual clinical diagnoses of Alzheimer's disease (AD) and all-cause dementia in ambulatory (outpatient) surgery, inpatient, Medicare administrative records and death certificates were compared with research diagnoses among participants of Cache County Study on Memory, Health, and Aging (CCSMHA) (1995-2008, N = 5092). RESULTS Combining all sources of clinical health data increased sensitivity for identifying all-cause dementia (71%) and AD (48%), while maintaining relatively high specificity (81% and 93%, respectively). Medicare claims had the highest sensitivity for case identification (57% and 40%, respectively). DISCUSSION Administrative health data may provide a less accurate method than a research evaluation for identifying individuals with dementing disease, but accuracy is improved by combining health data sources. Assessing all-cause dementia versus a specific cause of dementia such as AD will result in increased sensitivity, but at a cost to specificity.
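The trade-off described in this abstract (pooling record sources raises sensitivity but can lower specificity) can be illustrated with a minimal Python sketch. The data below are invented toy values, not study data, and the "flag a person if any source records dementia" pooling rule is an assumption made for illustration; the abstract does not spell out how sources were combined.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    reference: Sequence[bool], flagged: Sequence[bool]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) of `flagged` against `reference`."""
    pairs = list(zip(reference, flagged))
    tp = sum(1 for r, f in pairs if r and f)          # true positives
    fn = sum(1 for r, f in pairs if r and not f)      # missed cases
    tn = sum(1 for r, f in pairs if not r and not f)  # true negatives
    fp = sum(1 for r, f in pairs if not r and f)      # false alarms
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy cohort: research diagnosis is the reference standard.
research_dx = [True, True, True, True, False, False, False, False]
medicare = [True, True, False, False, False, False, False, True]
death_cert = [False, True, True, False, False, False, True, False]

# Assumed pooling rule: flagged if ANY source records dementia.
combined = [m or d for m, d in zip(medicare, death_cert)]

medicare_sens, medicare_spec = sensitivity_specificity(research_dx, medicare)
combined_sens, combined_spec = sensitivity_specificity(research_dx, combined)
print(medicare_sens, medicare_spec)  # single source
print(combined_sens, combined_spec)  # pooled sources
```

In this toy cohort, pooling catches one additional true case but also adds one false alarm, which mirrors the direction of the effect the abstract reports.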
Affiliation(s)
- Karen C. Schliep
  - Division of Public Health, Department of Family and Preventive Medicine, University of Utah, Salt Lake City, Utah, USA
- Shinyoung Ju
  - Division of Public Health, Department of Family and Preventive Medicine, University of Utah, Salt Lake City, Utah, USA
- Norman L. Foster
  - Center for Alzheimer’s Care, Imaging & Research, Department of Neurology, University of Utah, Salt Lake City, Utah, USA
- Ken R. Smith
  - Department of Family and Consumer Studies and Population Sciences/Huntsman Cancer Institute, University of Utah, Salt Lake City, Utah, USA
- Michael M. Varner
  - Division of Maternal-Fetal Medicine, Department of Obstetrics and Gynecology, University of Utah, Salt Lake City, Utah, USA
- Truls Østbye
  - Department of Family Medicine and Community Health, Duke University Medical Center, Durham, North Carolina, USA
- JoAnn Tschanz
  - Department of Psychology, Utah State University, Logan, Utah, USA
23
Jiang L, Kerchberger VE, Shaffer C, Dickson AL, Ormseth MJ, Daniel LL, Leon BGC, Cox NJ, Chung CP, Wei WQ, Stein CM, Feng Q. Genome-wide association analyses of common infections in a large practice-based biobank. BMC Genomics 2022; 23:672. [PMID: 36167494 PMCID: PMC9512962 DOI: 10.1186/s12864-022-08888-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/21/2022] [Accepted: 08/26/2022] [Indexed: 12/15/2022] Open
Abstract
INTRODUCTION Infectious diseases are common causes of morbidity and mortality worldwide. Susceptibility to infection is highly heritable; however, little has been done to identify the genetic determinants underlying common infectious diseases. One GWAS was performed using 23andMe information about self-reported infections; we set out to confirm previous loci and identify new ones using medically diagnosed infections. METHODS We used the electronic health record (EHR)-based biobank at Vanderbilt and diagnosis codes to identify cases of 12 infectious diseases in white patients: urinary tract infection, pneumonia, chronic sinus infections, otitis media, candidiasis, streptococcal pharyngitis, herpes zoster, herpes labialis, hepatitis B, infectious mononucleosis, tuberculosis (TB) or a positive TB test, and hepatitis C. We selected controls from patients with no diagnosis code for the candidate disease and matched by year of birth, sex, and calendar year at first and last EHR visits. We conducted GWAS using SAIGE and transcriptome-wide analysis (TWAS) using S-PrediXcan. We also conducted a phenome-wide association study to understand associations between identified genetic variants and clinical phenotypes. RESULTS We replicated three 23andMe loci (p ≤ 0.05): herpes zoster and rs7047299-A (p = 2.6 × 10⁻³) and rs2808290-C (p = 9.6 × 10⁻³); otitis media and rs114947103-C (p = 0.04). We also identified 2 novel regions (p ≤ 5 × 10⁻⁸): rs113235453-G for otitis media (p = 3.04 × 10⁻⁸), and rs10422015-T for candidiasis (p = 3.11 × 10⁻⁸). In TWAS, four gene-disease associations were significant: SLC30A9 for otitis media (p = 8.06 × 10⁻⁷); LRP3 and WDR88 for candidiasis (p = 3.91 × 10⁻⁷ and p = 1.95 × 10⁻⁶); and AAMDC for hepatitis B (p = 1.51 × 10⁻⁶). CONCLUSION We conducted GWAS and TWAS for 12 infectious diseases and identified novel genetic contributors to the susceptibility of infectious diseases.
Affiliation(s)
- Lan Jiang
  - Division of Clinical Pharmacology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- V Eric Kerchberger
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- Christian Shaffer
  - Division of Clinical Pharmacology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Alyson L Dickson
  - Division of Rheumatology and Immunology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Michelle J Ormseth
  - Division of Clinical Pharmacology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
  - Division of Rheumatology and Immunology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
  - Division of Research and Development, Tennessee Valley Healthcare System, U.S. Department of Veterans Affairs, Nashville, TN, USA
- Laura L Daniel
  - Division of Rheumatology and Immunology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Barbara G Carranza Leon
  - Division of Diabetes, Endocrinology and Metabolism, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Nancy J Cox
  - Department of Medicine, Vanderbilt Genetic Institute, Vanderbilt University Medical Center, Nashville, TN, USA
- Cecilia P Chung
  - Division of Clinical Pharmacology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
  - Division of Rheumatology and Immunology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
  - Department of Medicine, Vanderbilt Genetic Institute, Vanderbilt University Medical Center, Nashville, TN, USA
- Wei-Qi Wei
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- C Michael Stein
  - Division of Clinical Pharmacology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
  - Department of Pharmacology, Vanderbilt University, Nashville, TN, USA
- QiPing Feng
  - Division of Clinical Pharmacology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
  - Department of Medicine, Vanderbilt Genetic Institute, Vanderbilt University Medical Center, Nashville, TN, USA
24
Culié D, Schiappa R, Contu S, Scheller B, Villarme A, Dassonville O, Poissonnet G, Bozec A, Chamorey E. Validation and Improvement of a Convolutional Neural Network to Predict the Involved Pathology in a Head and Neck Surgery Cohort. Int J Environ Res Public Health 2022; 19:12200. [PMID: 36231500 PMCID: PMC9564535 DOI: 10.3390/ijerph191912200] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2022] [Revised: 09/19/2022] [Accepted: 09/22/2022] [Indexed: 06/16/2023]
Abstract
The selection of patients for the constitution of a cohort is a major issue for clinical research (prospective studies and retrospective studies in real life). Our objective was to validate in real life conditions the use of a Deep Learning process based on a neural network, for the classification of patients according to the pathology involved in a head and neck surgery department. 24,434 Electronic Health Records (EHR) from the first visit between 2000 and 2020 were extracted. More than 6000 EHR were manually classified in ten groups of interest according to the reason for consultation with a clinical relevance. A convolutional neural network (TensorFlow, previously reported by Hsu et al.) was then used to predict the group of patients based on their pathology, using two levels of classification based on clinically relevant criteria. On the first and second level of classification, macro-average performances were: 0.95, 0.83, 0.85, 0.97, 0.84 and 0.93, 0.76, 0.83, 0.96, 0.79 for accuracy, recall, precision, specificity and F1-score versus accuracy, recall and precision of 0.580, 0.580 and 0.582 for Hsu et al., respectively. We validated this model to predict the pathology involved and to constitute clinically relevant cohorts in a tertiary hospital. This model did not require a preprocessing stage, was used in French and showed equivalent or better performances than other already published techniques.
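For readers unfamiliar with macro-averaged metrics such as those reported in this abstract, the following is a minimal Python sketch of how they are commonly computed: each class is scored one-vs-rest, and the per-class values are averaged with equal weight. The class labels and predictions are invented toy values, and the one-vs-rest definition of per-class accuracy and the "mean of per-class F1" convention are assumptions; the abstract does not state which conventions the authors used.

```python
from typing import Dict, List


def macro_metrics(y_true: List[str], y_pred: List[str]) -> Dict[str, float]:
    """Macro-average one-vs-rest metrics over all classes seen in the data."""
    classes = sorted(set(y_true) | set(y_pred))
    n = len(y_true)
    totals = {"accuracy": 0.0, "recall": 0.0, "precision": 0.0,
              "specificity": 0.0, "f1": 0.0}
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        tn = n - tp - fp - fn
        recall = tp / (tp + fn) if tp + fn else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        specificity = tn / (tn + fp) if tn + fp else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        totals["accuracy"] += (tp + tn) / n  # one-vs-rest accuracy
        totals["recall"] += recall
        totals["precision"] += precision
        totals["specificity"] += specificity
        totals["f1"] += f1
    # Equal weight per class, regardless of class size.
    return {name: value / len(classes) for name, value in totals.items()}


# Hypothetical toy labels (three consultation groups, six records).
y_true = ["thyroid", "thyroid", "larynx", "larynx", "skin", "skin"]
y_pred = ["thyroid", "larynx", "larynx", "larynx", "skin", "thyroid"]
scores = macro_metrics(y_true, y_pred)
print(scores)
```

Because every class counts equally, macro averages can differ noticeably from overall accuracy when class sizes are unbalanced, which is one reason papers often report several of these metrics side by side.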
Affiliation(s)
- Dorian Culié
  - Head and Neck Surgery Department, Antoine Laccassagne Center, 06100 Nice, France
  - Epidemiology, Biostatistics and Health Data Department, Antoine Laccassagne Center, 06100 Nice, France
- Renaud Schiappa
  - Epidemiology, Biostatistics and Health Data Department, Antoine Laccassagne Center, 06100 Nice, France
- Sara Contu
  - Epidemiology, Biostatistics and Health Data Department, Antoine Laccassagne Center, 06100 Nice, France
- Boris Scheller
  - Head and Neck Surgery Department, Antoine Laccassagne Center, 06100 Nice, France
  - Epidemiology, Biostatistics and Health Data Department, Antoine Laccassagne Center, 06100 Nice, France
- Agathe Villarme
  - Head and Neck Surgery Department, Antoine Laccassagne Center, 06100 Nice, France
- Olivier Dassonville
  - Head and Neck Surgery Department, Antoine Laccassagne Center, 06100 Nice, France
- Gilles Poissonnet
  - Head and Neck Surgery Department, Antoine Laccassagne Center, 06100 Nice, France
- Alexandre Bozec
  - Head and Neck Surgery Department, Antoine Laccassagne Center, 06100 Nice, France
  - Epidemiology, Biostatistics and Health Data Department, Antoine Laccassagne Center, 06100 Nice, France
- Emmanuel Chamorey
  - Epidemiology, Biostatistics and Health Data Department, Antoine Laccassagne Center, 06100 Nice, France
25
Shinohara E, Shibata D, Kawazoe Y. Development of comprehensive annotation criteria for patients' states from clinical texts. J Biomed Inform 2022; 134:104200. [PMID: 36089198 DOI: 10.1016/j.jbi.2022.104200] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2022] [Revised: 08/17/2022] [Accepted: 09/04/2022] [Indexed: 11/18/2022]
Abstract
In clinical records, much of the clinical information is recorded as free text, thus necessitating the use of advanced automatic information extraction technology. The development of practical technologies requires a corpus with finer-granularity annotations that describe the information in the corpus, but such annotation criteria have not been researched enough thus far. This study aimed to develop fine-grained annotation criteria that exhaustively cover patients' states in case reports. We collected 362 case reports (written in Japanese) of intractable diseases that were expected to contain a broad range of patients' states. Criteria were developed by repeatedly revising and annotating the clinical case report text. A set of annotation criteria for patients' states, consisting of 46 entity types, 9 attributes, and 36 relations, was obtained. It allows more detailed information to be expressed than in previous studies through a broader range of concept types, including treatment, and captures clinical information based on a combination of causality and judgment, which could not be expressed before.
Affiliation(s)
- Emiko Shinohara
  - Artificial Intelligence in Healthcare, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Daisaku Shibata
  - Artificial Intelligence in Healthcare, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Yoshimasa Kawazoe
  - Artificial Intelligence in Healthcare, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
26
Noori A, Magdamo C, Liu X, Tyagi T, Li Z, Kondepudi A, Alabsi H, Rudmann E, Wilcox D, Brenner L, Robbins GK, Moura L, Zafar S, Benson NM, Hsu J, R Dickson J, Serrano-Pozo A, Hyman BT, Blacker D, Westover MB, Mukerji SS, Das S. Development and Evaluation of a Natural Language Processing Annotation Tool to Facilitate Phenotyping of Cognitive Status in Electronic Health Records: Diagnostic Study. J Med Internet Res 2022; 24:e40384. [PMID: 36040790 PMCID: PMC9472045 DOI: 10.2196/40384] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 06/17/2022] [Revised: 07/29/2022] [Accepted: 07/31/2022] [Indexed: 11/13/2022] Open
Abstract
BACKGROUND Electronic health records (EHRs) with large sample sizes and rich information offer great potential for dementia research, but current methods of phenotyping cognitive status are not scalable. OBJECTIVE The aim of this study was to evaluate whether natural language processing (NLP)-powered semiautomated annotation can improve the speed and interrater reliability of chart reviews for phenotyping cognitive status. METHODS In this diagnostic study, we developed and evaluated a semiautomated NLP-powered annotation tool (NAT) to facilitate phenotyping of cognitive status. Clinical experts adjudicated the cognitive status of 627 patients at Mass General Brigham (MGB) health care, using NAT or traditional chart reviews. Patient charts contained EHR data from two data sets: (1) records from January 1, 2017, to December 31, 2018, for 100 Medicare beneficiaries from the MGB Accountable Care Organization and (2) records from 2 years prior to COVID-19 diagnosis to the date of COVID-19 diagnosis for 527 MGB patients. All EHR data from the relevant period were extracted; diagnosis codes, medications, and laboratory test values were processed and summarized; clinical notes were processed through an NLP pipeline; and a web tool was developed to present an integrated view of all data. Cognitive status was rated as cognitively normal, cognitively impaired, or undetermined. Assessment time and interrater agreement of NAT compared to manual chart reviews for cognitive status phenotyping was evaluated. RESULTS NAT adjudication provided higher interrater agreement (Cohen κ=0.89 vs κ=0.80) and significant speed up (time difference mean 1.4, SD 1.3 minutes; P<.001; ratio median 2.2, min-max 0.4-20) over manual chart reviews. There was moderate agreement with manual chart reviews (Cohen κ=0.67). 
In the cases that exhibited disagreement with manual chart reviews, NAT adjudication was able to produce assessments that had broader clinical consensus due to its integrated view of highlighted relevant information and semiautomated NLP features. CONCLUSIONS NAT adjudication improves the speed and interrater reliability for phenotyping cognitive status compared to manual chart reviews. This study underscores the potential of an NLP-based clinically adjudicated method to build large-scale dementia research cohorts from EHRs.
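Interrater agreement in this abstract is reported as Cohen's κ, which corrects raw agreement for the agreement two raters would reach by chance. A minimal Python sketch of the standard two-rater formula, κ = (p_o − p_e) / (1 − p_e), follows. The ratings are invented toy values that reuse the abstract's three categories; they are not study data, and the function name is hypothetical.

```python
from collections import Counter
from typing import Sequence


def cohen_kappa(rater_a: Sequence[str], rater_b: Sequence[str]) -> float:
    """Cohen's kappa for two raters labelling the same items."""
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(1 for a, b in zip(rater_a, rater_b) if a == b) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = set(rater_a) | set(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in labels) / n**2
    # Note: undefined (division by zero) if expected == 1, i.e. both raters
    # always use one identical label.
    return (observed - expected) / (1 - expected)


# Hypothetical toy ratings for ten charts.
rater_a = ["normal"] * 4 + ["impaired"] * 4 + ["undetermined"] * 2
rater_b = (["normal"] * 3 + ["impaired"] * 4 + ["undetermined"] * 3)
kappa = cohen_kappa(rater_a, rater_b)
print(round(kappa, 3))
```

Here the raters agree on 8 of 10 charts, but because some of that agreement is expected by chance, κ comes out lower than the raw 0.8 agreement rate.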
Affiliation(s)
- Ayush Noori
- Department of Neurology, Massachusetts General Hospital, Boston, MA, United States
| | - Colin Magdamo
- Department of Neurology, Massachusetts General Hospital, Boston, MA, United States
| | - Xiao Liu
- Department of Neurology, Massachusetts General Hospital, Boston, MA, United States
| | - Tanish Tyagi
- Department of Neurology, Massachusetts General Hospital, Boston, MA, United States
| | - Zhaozhi Li
- Department of Neurology, Massachusetts General Hospital, Boston, MA, United States
| | - Akhil Kondepudi
- Department of Neurology, Massachusetts General Hospital, Boston, MA, United States
| | - Haitham Alabsi
- Department of Neurology, Massachusetts General Hospital, Boston, MA, United States
- Harvard Medical School, Boston, MA, United States
| | - Emily Rudmann
- Department of Neurology, Massachusetts General Hospital, Boston, MA, United States
- Vaccine and Immunotherapy Center, Division of Infectious Disease, Boston, MA, United States
| | - Douglas Wilcox
- Department of Neurology, Massachusetts General Hospital, Boston, MA, United States
- Harvard Medical School, Boston, MA, United States
| | - Laura Brenner
- Harvard Medical School, Boston, MA, United States
- Division of Pulmonary and Critical Care Medicine, Massachusetts General Hospital, Boston, MA, United States
| | - Gregory K Robbins
- Harvard Medical School, Boston, MA, United States
- Division of Infectious Diseases, Massachusetts General Hospital, Boston, MA, United States
| | - Lidia Moura
- Department of Neurology, Massachusetts General Hospital, Boston, MA, United States
- Harvard Medical School, Boston, MA, United States
| | - Sahar Zafar
- Department of Neurology, Massachusetts General Hospital, Boston, MA, United States
- Harvard Medical School, Boston, MA, United States
| | - Nicole M Benson
- Harvard Medical School, Boston, MA, United States
- Mongan Institute, Massachusetts General Hospital, Boston, MA, United States
- McLean Hospital, Belmont, MA, United States
| | - John Hsu
- Harvard Medical School, Boston, MA, United States
- Mongan Institute, Massachusetts General Hospital, Boston, MA, United States
| | - John R Dickson
- Department of Neurology, Massachusetts General Hospital, Boston, MA, United States
- Harvard Medical School, Boston, MA, United States
| | - Alberto Serrano-Pozo
- Department of Neurology, Massachusetts General Hospital, Boston, MA, United States
- Harvard Medical School, Boston, MA, United States
| | - Bradley T Hyman
- Department of Neurology, Massachusetts General Hospital, Boston, MA, United States
- Harvard Medical School, Boston, MA, United States
| | - Deborah Blacker
- Harvard Medical School, Boston, MA, United States
- Department of Psychiatry, Massachusetts General Hospital, Boston, MA, United States
| | - M Brandon Westover
- Department of Neurology, Massachusetts General Hospital, Boston, MA, United States
- Harvard Medical School, Boston, MA, United States
| | - Shibani S Mukerji
- Department of Neurology, Massachusetts General Hospital, Boston, MA, United States
- Harvard Medical School, Boston, MA, United States
- Division of Infectious Diseases, Massachusetts General Hospital, Boston, MA, United States
| | - Sudeshna Das
- Department of Neurology, Massachusetts General Hospital, Boston, MA, United States
- Harvard Medical School, Boston, MA, United States
|
27
|
Tedeschi SK, Huang W, Yoshida K, Solomon DH. Risk of cardiovascular events in patients having had acute calcium pyrophosphate crystal arthritis. Ann Rheum Dis 2022; 81:1323-1329. [PMID: 35613842 PMCID: PMC10043830 DOI: 10.1136/annrheumdis-2022-222387] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 02/21/2022] [Accepted: 05/16/2022] [Indexed: 11/03/2022]
Abstract
OBJECTIVES Calcium pyrophosphate deposition (CPPD) disease, broadly defined, has been associated with increased risk of cardiovascular (CV) events. We investigated risk of CV events in patients with acute CPP crystal arthritis, the acute manifestation of CPPD. METHODS Cohort study using Mass General Brigham electronic health record (EHR) data, 1991-2017. Patients with acute CPP crystal arthritis were identified using a published machine learning algorithm with positive predictive value 81%. Comparators were matched on year of EHR entry and index date of patients with acute CPP crystal arthritis (first positive synovial fluid CPP result or mention of 'pseudogout', or matched encounter). Major adverse cardiovascular event (MACE) was a composite of non-fatal CV event (myocardial infarction, acute coronary syndrome, coronary revascularisation, stroke) and death. We estimated incidence rates (IRs) and adjusted hazard ratios for MACE, non-fatal CV event and death, allowing for differential estimates during years 0-2 and 2-10. Sensitivity analyses included: (1) patients with acute CPP crystal arthritis diagnosed during outpatient visits, (2) patients with linked Medicare data, 2007-2016 and (3) patients matched on number of CV risk factors. RESULTS We matched 1200 acute CPP crystal arthritis patients to 3810 comparators. IR for MACE in years 0-2 was 91/1000 person-years (p-y) in acute CPP crystal arthritis and 59/1000 p-y in comparators. In years 2-10, IR for MACE was 58/1000 p-y in acute CPP crystal arthritis and 53/1000 p-y in comparators. Acute CPP crystal arthritis was significantly associated with increased risk for MACE in years 0-2 (HR 1.32, 95% CI 1.01 to 1.73) and non-fatal CV event in years 0-2 (HR 1.92, 95% CI 1.12 to 3.28) and years 2-10 (HR 2.18, 95% CI 1.27 to 3.75), but not death.
Results of sensitivity analyses were similar to the primary analysis; in the outpatient-only analysis, risk of non-fatal CVE was significantly elevated in years 2-10 but not in years 0-2. CONCLUSIONS Acute CPP crystal arthritis was significantly associated with elevated short and long-term risk for non-fatal CV event.
Affiliation(s)
- Sara K Tedeschi
- Division of Rheumatology, Inflammation, and Immunity, Brigham and Women's Hospital, Boston, Massachusetts, USA
- Department of Medicine, Harvard Medical School, Boston, Massachusetts, USA
| | - Weixing Huang
- Division of Rheumatology, Inflammation, and Immunity, Brigham and Women's Hospital, Boston, Massachusetts, USA
| | - Kazuki Yoshida
- Division of Rheumatology, Inflammation, and Immunity, Brigham and Women's Hospital, Boston, Massachusetts, USA
| | - Daniel H Solomon
- Division of Rheumatology, Inflammation, and Immunity, Brigham and Women's Hospital, Boston, Massachusetts, USA
- Department of Medicine, Harvard Medical School, Boston, Massachusetts, USA
|
28
|
Krantz MS, Kerchberger VE, Wei WQ. Novel Analysis Methods to Mine Immune-Mediated Phenotypes and Find Genetic Variation Within the Electronic Health Record (Roadmap for Phenotype to Genotype: Immunogenomics). THE JOURNAL OF ALLERGY AND CLINICAL IMMUNOLOGY. IN PRACTICE 2022; 10:1757-1762. [PMID: 35487368 PMCID: PMC9624141 DOI: 10.1016/j.jaip.2022.04.016] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/27/2022] [Revised: 04/13/2022] [Accepted: 04/18/2022] [Indexed: 06/14/2023]
Abstract
The field of immunogenomics has the opportunity for accelerated genetic discovery aided by the maturation of electronic health records (EHRs) linked to DNA biobanks. Novel analysis methods in deep phenotyping of EHR data allow the full realization of the paired and increasingly dense genetic/phenotypic information available. This enables researchers to uncover genetic risk factors for the prevention and optimal treatment of immune-mediated diseases and immune-mediated adverse drug reactions. This article reviews the background of EHRs linked to DNA biobanks, potential applications to immunogenomic discovery, and current and emerging techniques in EHR-based deep phenotyping.
Affiliation(s)
- Matthew S Krantz
- Division of Allergy, Pulmonary and Critical Care Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, Tenn.
| | - V Eric Kerchberger
- Division of Allergy, Pulmonary and Critical Care Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, Tenn; Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tenn
| | - Wei-Qi Wei
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tenn
|
29
|
Li S, Wang Z, Vieira LA, Zheutlin AB, Ru B, Schadt E, Wang P, Copperman AB, Stone JL, Gross SJ, Kao YH, Lau YK, Dolan SM, Schadt EE, Li L. Improving preeclampsia risk prediction by modeling pregnancy trajectories from routinely collected electronic medical record data. NPJ Digit Med 2022; 5:68. [PMID: 35668134 PMCID: PMC9170686 DOI: 10.1038/s41746-022-00612-x] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 10/24/2020] [Accepted: 05/19/2022] [Indexed: 11/15/2022] Open
Abstract
Preeclampsia is a heterogeneous and complex disease associated with rising morbidity and mortality in pregnant women and newborns in the US. Early recognition of patients at risk is a pressing clinical need to reduce the risk of adverse outcomes. We assessed whether information routinely collected in electronic medical records (EMR) could enhance the prediction of preeclampsia risk beyond what is achieved in standard of care assessments. We developed a digital phenotyping algorithm to curate 108,557 pregnancies from EMRs across the Mount Sinai Health System, accurately reconstructing pregnancy journeys and normalizing these journeys across different hospital EMR systems. We then applied machine learning approaches to a training dataset (N = 60,879) to construct predictive models of preeclampsia across three major pregnancy time periods (ante-, intra-, and postpartum). The resulting models predicted preeclampsia with high accuracy across the different pregnancy periods, with areas under the receiver operating characteristic curves (AUC) of 0.92, 0.82, and 0.89 at 37 gestational weeks, intrapartum and postpartum, respectively. We observed comparable performance in two independent patient cohorts. While our machine learning approach identified known risk factors of preeclampsia (such as blood pressure, weight, and maternal age), it also identified other potential risk factors, such as complete blood count related characteristics for the antepartum period. Our model not only has utility for earlier identification of patients at risk for preeclampsia, but given the prediction accuracy exceeds what is currently achieved in clinical practice, our model provides a path for promoting personalized precision therapeutic strategies for patients at risk.
Affiliation(s)
- Luciana A Vieira
- Department of Obstetrics, Gynecology, and Reproductive Science, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Pei Wang
- Department of Genetics and Genomic Sciences, The Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Alan B Copperman
- Sema4, Stamford, CT, USA
- Department of Obstetrics, Gynecology, and Reproductive Science, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Reproductive Endocrinology and Infertility, Reproductive Medicine Associates of New York, New York, NY, USA
| | - Joanne L Stone
- Department of Obstetrics, Gynecology, and Reproductive Science, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Susan J Gross
- Sema4, Stamford, CT, USA
- Department of Genetics and Genomic Sciences, The Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Siobhan M Dolan
- Department of Obstetrics, Gynecology, and Reproductive Science, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Sciences, The Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Eric E Schadt
- Sema4, Stamford, CT, USA
- Department of Genetics and Genomic Sciences, The Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Li Li
- Sema4, Stamford, CT, USA
- Department of Genetics and Genomic Sciences, The Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
|
30
|
Integration of Omics and Phenotypic Data for Precision Medicine. METHODS IN MOLECULAR BIOLOGY (CLIFTON, N.J.) 2022; 2486:19-35. [PMID: 35437716 DOI: 10.1007/978-1-0716-2265-0_2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
Abstract
Over the past two decades, biomedical research has been moving toward a big-data-driven approach. The underlying causes of this transition include the ability to gather genetic or molecular profiles of humans faster, the increasing adoption of electronic health record (EHR) systems, and the growing interest in linking omics and phenotypic data for analysis. The integration of individuals' biological data (e.g., genomics, proteomics, metabolomics) and health-care data has created unprecedented opportunities for precision medicine, that is, a medical model that uses a patient's unique information, mainly genetic, to prevent, diagnose, or treat disease. This chapter reviewed the research opportunities and applications of integrating omics and phenotypic data for precision medicine, such as understanding the relationship between genotype and phenotype, disease subtyping, and diagnosis or prediction of adverse outcomes. We reviewed recent advanced methods, particularly the machine learning and deep learning-based approaches used for harnessing and harmonizing multiomics and phenotypic data to address these applications. We finally discussed the challenges and future directions.
|
31
|
Zhu VJ, Lenert LA, Barth KS, Simpson KN, Li H, Kopscik M, Brady KT. Automatically identifying opioid use disorder in non-cancer patients on chronic opioid therapy. Health Informatics J 2022; 28:14604582221107808. [PMID: 35726687 PMCID: PMC10826411 DOI: 10.1177/14604582221107808] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/15/2022]
Abstract
Background: Using the International Classification of Diseases (ICD) codes alone to record opioid use disorder (OUD) may not completely document OUD in the electronic health record (EHR). We developed and evaluated natural language processing (NLP) approaches to identify OUD from the clinical note. We explored the concordance between ICD-coded and NLP-identified OUD. Methods: We studied EHRs from 13,654 (female: 8223; male: 5431) adult non-cancer patients who received chronic opioid therapy (COT) and had at least one clinical note between 2013 and 2018. Of eligible patients, we randomly selected 10,218 (75%) patients as the training set and the remaining 3436 patients (25%) as the test dataset for NLP approaches. Results: We generated 539 terms representing OUD mentions in clinical notes (e.g., "opioid use disorder," "opioid abuse," "opioid dependence," "opioid overdose") and 73 terms representing OUD medication treatments. By domain expert manual review for the test dataset, our NLP approach yielded high performance: 98.5% for precision, 100% for recall, and 99.2% for F-measure. The concordance between NLP-identified and ICD-identified OUD was modest (Kappa = 0.63). Conclusions: Our NLP approach can accurately identify OUD patients from clinical notes. The combined use of ICD diagnostic code and NLP approach can improve OUD identification.
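As a rough consistency check on the performance figures quoted in the abstract above, the F-measure can be recomputed from the reported precision and recall. The sketch below assumes that "F-measure" means the balanced F1 score (the harmonic mean of precision and recall), which the abstract does not state explicitly; the function name `f_measure` and the `beta` parameter are illustrative only.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score; beta=1 gives the harmonic mean of precision and recall.

    Assumption: the abstract's "F-measure" is the balanced F1 score.
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when both inputs are zero
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Values reported in the abstract: precision 98.5%, recall 100%.
f1 = f_measure(0.985, 1.0)
print(f"F-measure: {f1:.1%}")
```

Under that assumption the recomputed value rounds to the 99.2% reported in the abstract.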
Affiliation(s)
- Vivienne J Zhu
- Biomedical Informatics Center, Department of Public Health Science, College of Medicine, Medical University of South Carolina, Charleston, SC, USA
| | - Leslie A Lenert
- Biomedical Informatics Center, Department of Public Health Science, College of Medicine, Medical University of South Carolina, Charleston, SC, USA
| | - Kelly S Barth
- Department of Psychiatry and Behavioral Science, College of Medicine, Medical University of South Carolina, Charleston, SC, USA
| | - Kit N Simpson
- Department of Healthcare Leadership and Management, College of Health Professions, Medical University of South Carolina, Charleston, SC, USA
| | - Hong Li
- Department of Public Health Science, College of Medicine, Medical University of South Carolina, Charleston, SC, USA
| | - Michael Kopscik
- College of Medicine, Medical University of South Carolina, Charleston, SC, USA
| | - Kathleen T Brady
- Department of Psychiatry and Behavioral Science, College of Medicine, Medical University of South Carolina, Charleston, SC, USA
|
32
|
Almowil Z, Zhou SM, Brophy S, Croxall J. Concept Libraries for Repeatable and Reusable Research: Qualitative Study Exploring the Needs of Users. JMIR Hum Factors 2022; 9:e31021. [PMID: 35289755 PMCID: PMC8965669 DOI: 10.2196/31021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2021] [Revised: 11/17/2021] [Accepted: 12/05/2021] [Indexed: 12/05/2022] Open
Abstract
Background Big data research in the field of health sciences is hindered by a lack of agreement on how to identify and define different conditions and their medications. This means that researchers and health professionals often have different phenotype definitions for the same condition. This lack of agreement makes it difficult to compare different study findings and hinders the ability to conduct repeatable and reusable research. Objective This study aims to examine the requirements of various users, such as researchers, clinicians, machine learning experts, and managers, in the development of a data portal for phenotypes (a concept library). Methods This was a qualitative study using interviews and focus group discussion. One-to-one interviews were conducted with researchers, clinicians, machine learning experts, and senior research managers in health data science (N=6) to explore their specific needs in the development of a concept library. In addition, a focus group discussion with researchers (N=14) working with the Secured Anonymized Information Linkage databank, a national eHealth data linkage infrastructure, was held to perform a SWOT (strengths, weaknesses, opportunities, and threats) analysis for the phenotyping system and the proposed concept library. The interviews and focus group discussion were transcribed verbatim, and 2 thematic analyses were performed. Results Most of the participants thought that the prototype concept library would be a very helpful resource for conducting repeatable research, but they specified that many requirements are needed before its development. Although all the participants stated that they were aware of some existing concept libraries, most of them expressed negative perceptions about them. 
The participants mentioned several facilitators that would stimulate them to share their work and reuse the work of others, and they pointed out several barriers that could inhibit them from sharing their work and reusing the work of others. The participants suggested some developments that they would like to see to improve reproducible research output using routine data. Conclusions The study indicated that most interviewees valued a concept library for phenotypes. However, only half of the participants felt that they would contribute by providing definitions for the concept library, and they reported many barriers regarding sharing their work on a publicly accessible platform. Analysis of interviews and the focus group discussion revealed that different stakeholders have different requirements, facilitators, barriers, and concerns about a prototype concept library.
Affiliation(s)
- Zahra Almowil
- Data Science Building, Medical School, Swansea University, Swansea, Wales, United Kingdom
| | - Shang-Ming Zhou
- Centre For Health Technology, Faculty of Health, University of Plymouth, Plymouth, United Kingdom
| | - Sinead Brophy
- Data Science Building, Medical School, Swansea University, Swansea, Wales, United Kingdom
| | - Jodie Croxall
- Data Science Building, Medical School, Swansea University, Swansea, Wales, United Kingdom
|
33
|
Zhang J, Drawz PE, Zhu Y, Hultman G, Simon G, Melton GB. Validation of Administrative Coding and Clinical Notes for Hospital-Acquired Acute Kidney Injury in Adults. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2022; 2021:1234-1243. [PMID: 35308921 PMCID: PMC8861756] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Acute kidney injury (AKI) is potentially catastrophic and commonly seen among inpatients. In the United States, the quality of administrative coding data for capturing AKI accurately is questionable and needs to be updated. This retrospective study validated the quality of administrative coding for hospital-acquired AKI and explored the opportunities to improve the phenotyping performance by utilizing additional data sources from the electronic health record (EHR). A total of 34,570 patients were included, and overall prevalence of AKI based on the KDIGO reference standard was 10.13%. We obtained significantly different quality measures (sensitivity: 0.486, specificity: 0.947, PPV: 0.509, NPV: 0.942 in the full cohort) of administrative coding from the previously reported ones in the U.S. Additional use of clinical notes by incorporating automatic NLP data extraction has been found to increase the AUC in phenotyping AKI, and AKI was better recognized in patients with heart failure, indicating disparities in the coding and management of AKI.
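The quality measures quoted in the abstract above are linked by Bayes' rule: given sensitivity, specificity, and prevalence, the positive and negative predictive values follow directly. The sketch below is only an illustrative check using the rounded figures from the abstract (sensitivity 0.486, specificity 0.947, prevalence 10.13%); the function name `predictive_values` is hypothetical and this is not the authors' analysis code.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float):
    """Return (PPV, NPV) for a binary classifier via Bayes' rule.

    Assumes 0 < prevalence < 1 and that the classifier flags at least
    some cases, so neither denominator is zero.
    """
    tp = sensitivity * prevalence              # true-positive fraction
    fn = (1 - sensitivity) * prevalence        # false-negative fraction
    tn = specificity * (1 - prevalence)        # true-negative fraction
    fp = (1 - specificity) * (1 - prevalence)  # false-positive fraction
    return tp / (tp + fp), tn / (tn + fn)


# Rounded inputs taken from the abstract's full-cohort figures.
ppv, npv = predictive_values(0.486, 0.947, 0.1013)
print(f"PPV ~ {ppv:.3f}, NPV ~ {npv:.3f}")
```

With these rounded inputs the recomputed values land within rounding distance of the reported PPV (0.509) and NPV (0.942), which supports reading the garbled sensitivity figure as 0.486.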
Affiliation(s)
- Ying Zhu
- Institute for Health Informatics
| | - Genevieve B Melton
- Institute for Health Informatics
- Department of Surgery, University of Minnesota Minneapolis, MN, USA
|
34
|
Dashti HS, Miranda N, Cade BE, Huang T, Redline S, Karlson EW, Saxena R. Interaction of obesity polygenic score with lifestyle risk factors in an electronic health record biobank. BMC Med 2022; 20:5. [PMID: 35016652 PMCID: PMC8753909 DOI: 10.1186/s12916-021-02198-9] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 08/30/2021] [Accepted: 11/23/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Genetic and lifestyle factors have considerable effects on obesity and related diseases, yet their effects in a clinical cohort are unknown. This study in a patient biobank examined associations of a BMI polygenic risk score (PRS), and its interactions with lifestyle risk factors, with clinically measured BMI and clinical phenotypes. METHODS The Mass General Brigham (MGB) Biobank is a hospital-based cohort with electronic health record, genetic, and lifestyle data. A PRS for obesity was generated using 97 genetic variants for BMI. An obesity lifestyle risk index using survey responses to obesogenic lifestyle risk factors (alcohol, education, exercise, sleep, smoking, and shift work) was used to dichotomize the cohort into high and low obesogenic index based on the population median. Height and weight were measured at a clinical visit. Multivariable linear cross-sectional associations of the PRS with BMI and interactions with the obesity lifestyle risk index were conducted. In phenome-wide association analyses (PheWAS), similar logistic models were conducted for 675 disease outcomes derived from billing codes. RESULTS Thirty-three thousand five hundred eleven patients were analyzed (53.1% female; age 60.0 years; BMI 28.3 kg/m²), of which 17,040 completed the lifestyle survey (57.5% female; age: 60.2; BMI: 28.1 (6.2) kg/m²). Each standard deviation increment in the PRS was associated with 0.83 kg/m² unit increase in BMI (95% confidence interval (CI) = 0.76, 0.90). There was an interaction between the obesity PRS and obesity lifestyle risk index on BMI. The difference in BMI between those with a high and low obesogenic index was 3.18 kg/m² in patients in the highest decile of PRS, whereas that difference was only 1.55 kg/m² in patients in the lowest decile of PRS. In PheWAS, the obesity PRS was associated with 40 diseases spanning endocrine/metabolic, circulatory, and 8 other disease groups.
No interactions were evident between the PRS and the index on disease outcomes. CONCLUSIONS In this hospital-based clinical biobank, obesity risk conferred by common genetic variants was associated with elevated BMI and this risk was attenuated by a healthier patient lifestyle. Continued consideration of the role of lifestyle in the context of genetic predisposition in healthcare settings is necessary to quantify the extent to which modifiable lifestyle risk factors may moderate genetic predisposition and inform clinical action to achieve personalized medicine.
Affiliation(s)
- Hassan S Dashti
- Center for Genomic Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
- Broad Institute, Cambridge, MA, USA
- Department of Anesthesia, Critical Care and Pain Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
| | - Nicole Miranda
- Center for Genomic Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
| | - Brian E Cade
- Broad Institute, Cambridge, MA, USA
- Division of Sleep and Circadian Disorders, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Division of Sleep Medicine, Harvard Medical School, Boston, MA, USA
| | - Tianyi Huang
- Division of Sleep Medicine, Harvard Medical School, Boston, MA, USA
- Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
| | - Susan Redline
- Division of Sleep and Circadian Disorders, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Division of Sleep Medicine, Harvard Medical School, Boston, MA, USA
| | - Elizabeth W Karlson
- Mass General Brigham Personalized Medicine, Mass General Brigham HealthCare, Boston, MA, USA
- Department of Medicines, Brigham and Women's Hospital and Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA, USA
- Division of Rheumatology, Inflammation and Immunity, Brigham and Women's Hospital, Boston, MA, USA
| | - Richa Saxena
- Center for Genomic Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
- Broad Institute, Cambridge, MA, USA
- Department of Anesthesia, Critical Care and Pain Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
- Division of Sleep Medicine, Harvard Medical School, Boston, MA, USA
|
35
|
Yang X, Liu N, Qiao J, Yuan H, Ma T, Xu Y, Cui L. Clinical Phenotyping Prediction via Auxiliary Task Selection and Adaptive Shared-Space Correction. ARTIF INTELL 2022. [DOI: 10.1007/978-3-031-20500-2_36] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/02/2023]
|
36
|
Kim HN, Gupta A, Lan K, Stewart J, Dhanireddy S, Corcorran MA. Diagnostic accuracy of ICD code versus discharge summary-based query for endocarditis cohort identification. Medicine (Baltimore) 2021; 100:e28354. [PMID: 34941148 PMCID: PMC8702270 DOI: 10.1097/md.0000000000028354] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/28/2021] [Accepted: 12/01/2021] [Indexed: 02/06/2023] Open
Abstract
Studies of infective endocarditis (IE) have relied on International Classification of Disease (ICD) codes to identify cases, a method vulnerable to misclassification. Clinical narrative data could offer greater accuracy and richness to cohort identification. We evaluated two algorithms: (1) a standard query of ICD-9/10 billing codes, with or without procedure codes for echocardiogram, and (2) a text query of discharge summaries (DS) that selected on the term “endocarditis” in fields headed by “Discharge Diagnosis” or “Admission Diagnosis” or similar. Further coding extracted valve involved and organism responsible if present. All cases were chart reviewed using pre-specified criteria. Positive predictive value (PPV), sensitivity and specificity were calculated. The ICD-based query identified 612 individuals from July 2015 to July 2019 who had a hospital billing code for infective endocarditis; of these, 534 had an echocardiogram. The DS query identified 387 cases. PPV for the DS query was 84.5% (95% CI 80.6%, 87.8%) compared with 72.4% (95% CI 68.7%, 75.8%) for ICD only (P < .001) and 75.8% (95% CI 72.0%, 79.3%) for ICD + echo queries (P = .002). Sensitivity was 75.9% for DS query and 86.8% to 93.4% for ICD queries (P < .02 for these comparisons). Specificity was high (>94%) for all queries. The DS query also yielded valve data (prosthetic, tricuspid, aortic, etc) in 60% and microbiologic agent in 73% of identified cases, with an accuracy of 94% and 90%, respectively, when assessed by chart review. Compared with ICD-based queries, text-based queries of discharge summaries have the potential to improve precision of IE case ascertainment and extract key clinical variables.
|
37
|
Shaw DM, Polikowsky HP, Pruett DG, Chen HH, Petty LE, Viljoen KZ, Beilby JM, Jones RM, Kraft SJ, Below JE. Phenome risk classification enables phenotypic imputation and gene discovery in developmental stuttering. Am J Hum Genet 2021; 108:2271-2283. [PMID: 34861174 PMCID: PMC8715184 DOI: 10.1016/j.ajhg.2021.11.004] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 06/22/2021] [Accepted: 10/27/2021] [Indexed: 11/30/2022] Open
Abstract
Developmental stuttering is a speech disorder characterized by disruption in the forward movement of speech. This disruption includes part-word and single-syllable repetitions, prolongations, and involuntary tension that blocks syllables and words, and the disorder has a life-time prevalence of 6-12%. Within Vanderbilt's electronic health record (EHR)-linked biorepository (BioVU), only 142 individuals out of 92,762 participants (0.15%) are identified with diagnostic ICD9/10 codes, suggesting a large portion of people who stutter do not have a record of diagnosis within the EHR. To identify individuals affected by stuttering within our EHR, we built a PheCode-driven Gini impurity-based classification and regression tree model, PheML, by using comorbidities enriched in individuals affected by stuttering as predicting features and imputing stuttering status as the outcome variable. Applying PheML in BioVU identified 9,239 genotyped affected individuals (a clinical prevalence of ∼10%) for downstream genetic analysis. Ancestry-stratified GWAS of PheML-imputed affected individuals and matched control individuals identified rs12613255, a variant near CYRIA on chromosome 2 (B = 0.323; p value = 1.31 × 10⁻⁸) in European-ancestry analysis and rs7837758 (B = 0.518; p value = 5.07 × 10⁻⁸), an intronic variant found within the ZMAT4 gene on chromosome 8, in African-ancestry analysis. Polygenic-risk prediction and concordance analysis in an independent clinically ascertained sample of developmental stuttering cases validate our GWAS findings in PheML-imputed affected and control individuals and demonstrate the clinical relevance of our population-based analysis for stuttering risk.
Affiliation(s)
- Douglas M Shaw
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN 37203, USA
| | - Hannah P Polikowsky
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN 37203, USA
| | - Dillon G Pruett
- Hearing and Speech Sciences, Vanderbilt University, Nashville, TN 37203, USA
| | - Hung-Hsin Chen
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN 37203, USA
| | - Lauren E Petty
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN 37203, USA
| | - Kathryn Z Viljoen
- Curtin School of Allied Health, Curtin University, Perth 6845, Australia
| | - Janet M Beilby
- Curtin School of Allied Health, Curtin University, Perth 6845, Australia
| | - Robin M Jones
- Hearing and Speech Sciences, Vanderbilt University, Nashville, TN 37203, USA
| | - Shelly Jo Kraft
- Communication Sciences and Disorders, Wayne State University, Detroit, MI 48202, USA
| | - Jennifer E Below
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN 37203, USA.
|
38
|
Bai P, Barkmeier AJ, Hodge DO, Mohney BG. Ocular Sequelae in a Population-Based Cohort of Youth Diagnosed With Diabetes During a 50-Year Period. JAMA Ophthalmol 2021; 140:51-57. [PMID: 34854892 DOI: 10.1001/jamaophthalmol.2021.5052] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Indexed: 12/12/2022]
Abstract
Importance Despite the increasing prevalence of type 2 diabetes (T2D) diagnosed in childhood, little is known about the natural history of ocular sequelae in youth-onset T2D compared with type 1 diabetes (T1D). Objective To assess the risk of developing diabetes-associated ocular complications among youth diagnosed with diabetes. Design, Setting, and Participants This retrospective, population-based medical record review included all residents of Olmsted County, Minnesota (95.7% White in 1990), diagnosed with diabetes at younger than 22 years (hereinafter referred to as children) from January 1, 1970, through December 31, 2019. Main Outcomes and Measures Risk of developing ocular complications over time. Results Among 1362 individuals with a diagnostic code of diabetes, medical record reviews confirmed a diagnosis of T1D or T2D in 606 children, of whom 525 (86.6%) underwent at least 1 eye examination (mean [SD] age at diabetes diagnosis, 12.1 [5.4] years; 264 [50.3%] male). Diabetes-associated ocular complications occurred in 147 of the 461 children (31.9%) with T1D and in 17 of the 64 children (26.6%) with T2D. The hazard ratio illustrating the risk between T2D and T1D rates was 1.88 (95% CI, 1.13-3.12; P = .02) for developing any diabetic retinopathy (nonproliferative or greater), 2.33 (95% CI, 0.99-5.50; P = .048) for proliferative diabetic retinopathy, 1.49 (95% CI, 0.46-4.89; P = .50) for diabetic macular edema, 2.43 (95% CI, 0.54-11.07; P = .24) for a visually significant cataract, and 4.06 (95% CI, 1.34-12.33; P = .007) for requiring pars plana vitrectomy by 15 years after the diagnosis of diabetes. Conclusions and Relevance Diabetic retinopathy, proliferative diabetic retinopathy, and the need for pars plana vitrectomy occurred within a shorter diabetes duration for children with T2D compared with T1D in this population-based cohort. Children with T2D had almost twice the risk of developing retinopathy compared with those with T1D. These findings suggest that to prevent serious ocular complications, children with T2D may require ophthalmoscopic evaluations at least as frequently as or more frequently than children with T1D.
Affiliation(s)
- Patricia Bai
- Alix School of Medicine, Mayo Clinic, Phoenix, Arizona
| | | | - David O Hodge
- Department of Quantitative Health Sciences, Mayo Clinic, Jacksonville, Florida
| | - Brian G Mohney
- Department of Ophthalmology, Mayo Clinic, Rochester, Minnesota
|
39
|
d'Errico A, Strippoli E, Vasta R, Ferrante G, Spila Alegiani S, Ricceri F. Use of antipsychotics and long-term risk of parkinsonism. Neurol Sci 2021; 43:2545-2553. [PMID: 34652577 PMCID: PMC8918175 DOI: 10.1007/s10072-021-05650-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/16/2021] [Accepted: 08/01/2021] [Indexed: 11/29/2022]
Abstract
INTRODUCTION Few epidemiological studies have assessed the risk of parkinsonisms after prolonged use of neuroleptics. We aimed to examine the long-term risk of degenerative parkinsonisms (DP) associated with previous use of neuroleptics. METHODS All residents in Piedmont, north-western Italy, older than 39 years (2,526,319 subjects), were retrospectively followed up from 2013 to 2017. Exposure to neuroleptics was assessed through the regional archive of drug prescriptions. The development of DP was assessed using the regional archives of both drug prescriptions and hospital admissions. We excluded prevalent DP cases at baseline as well as those that occurred in the first 18 months (short-term risk). The risk of DP associated with previous use of neuroleptics was examined through Cox regression, using a matched cohort design. RESULTS The risk of DP was compared between 63,356 exposed and 316,779 unexposed subjects. A more than threefold higher risk of DP was observed among subjects exposed to antipsychotics, compared to those unexposed (HR = 3.27, 95% CI 3.00-3.57), and was higher for exposure to atypical than typical antipsychotics. The risk decreased 2 years after therapy cessation but remained significantly elevated (HR = 2.38, 95% CI 1.76-3.21). CONCLUSIONS These results indicate a high risk of developing DP long after the start of use and after cessation for both typical and atypical neuroleptics, suggesting the need to monitor treated patients even after long-term use and cessation.
Affiliation(s)
- Angelo d'Errico
- Epidemiology Unit, Piedmont Region, ASL TO3, Grugliasco, Italy
| | - Elena Strippoli
- Epidemiology Unit, Piedmont Region, ASL TO3, Grugliasco, Italy
| | - Rosario Vasta
- ALS Center, 'Rita Levi Montalcini' Department of Neuroscience, University of Turin, Via Cherasco, 15, 10126, Turin, Italy.
| | - Gianluigi Ferrante
- National Centre for Drug Research and Evaluation, National Institute of Health (ISS), Rome, Italy.,Center for Oncology Prevention Piemonte, Città della Salute e della Scienza, Turin, Italy
| | - Stefania Spila Alegiani
- National Centre for Drug Research and Evaluation, National Institute of Health (ISS), Rome, Italy
| | - Fulvio Ricceri
- Epidemiology Unit, Piedmont Region, ASL TO3, Grugliasco, Italy.,Department of Clinical and Biological Sciences, University of Turin, Turin, Italy
|
40
|
An independently validated, portable algorithm for the rapid identification of COPD patients using electronic health records. Sci Rep 2021; 11:19959. [PMID: 34620889 PMCID: PMC8497529 DOI: 10.1038/s41598-021-98719-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/24/2021] [Accepted: 08/25/2021] [Indexed: 11/24/2022] Open
Abstract
Electronic health records (EHR) provide an unprecedented opportunity to conduct large, cost-efficient, population-based studies. However, the studies of heterogeneous diseases, such as chronic obstructive pulmonary disease (COPD), often require labor-intensive clinical review and testing, limiting widespread use of these important resources. To develop a generalizable and efficient method for accurate identification of large COPD cohorts in EHRs, a COPD datamart was developed from 3420 participants meeting inclusion criteria in the Mass General Brigham Biobank (MGBB). Training and test sets were selected and labeled with gold-standard COPD classifications obtained from chart review by pulmonologists. Multiple classes of algorithms were built utilizing both structured (e.g. ICD codes) and unstructured (e.g. medical notes) data via elastic net regression. Models explicitly including and excluding spirometry features were compared. External validation of the final algorithm was conducted in an independent biobank with a different EHR system. The final COPD classification model demonstrated excellent positive predictive value (PPV; 91.7%), sensitivity (71.7%), and specificity (94.4%). This algorithm performed well not only within the MGBB, but also demonstrated similar or improved classification performance in an independent biobank (PPV 93.5%, sensitivity 61.4%, specificity 90%). Ancillary comparisons showed that the classification model built including a binary feature for FEV1/FVC produced substantially higher sensitivity than those excluding it. This study fills a gap in COPD research involving population-based EHRs, providing an important resource for the rapid, automated classification of COPD cases that is both cost-efficient and requires minimal information from unstructured medical records.
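The three performance measures quoted in this abstract all derive from a chart-review confusion matrix. The sketch below shows the arithmetic only; the counts are hypothetical, chosen to give values of similar magnitude, and are not the study's data.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return PPV, sensitivity, and specificity from confusion-matrix counts."""
    return {
        "ppv": tp / (tp + fp),          # of predicted cases, fraction truly affected
        "sensitivity": tp / (tp + fn),  # of true cases, fraction detected
        "specificity": tn / (tn + fp),  # of true non-cases, fraction correctly excluded
    }


# Hypothetical counts (assumption for illustration, not taken from the paper).
metrics = classification_metrics(tp=66, fp=6, tn=102, fn=26)
rounded = {name: round(value, 3) for name, value in metrics.items()}
```

A classifier tuned for cohort building often trades sensitivity for PPV, as in this pattern: few false positives enter the cohort, at the cost of missing some true cases.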
|
41
|
De Freitas JK, Johnson KW, Golden E, Nadkarni GN, Dudley JT, Bottinger EP, Glicksberg BS, Miotto R. Phe2vec: Automated disease phenotyping based on unsupervised embeddings from electronic health records. PATTERNS (NEW YORK, N.Y.) 2021; 2:100337. [PMID: 34553174 PMCID: PMC8441576 DOI: 10.1016/j.patter.2021.100337] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 03/29/2021] [Revised: 06/30/2021] [Accepted: 08/05/2021] [Indexed: 11/23/2022]
Abstract
Robust phenotyping of patients from electronic health records (EHRs) at scale is a challenge in clinical informatics. Here, we introduce Phe2vec, an automated framework for disease phenotyping from EHRs based on unsupervised learning and assess its effectiveness against standard rule-based algorithms from Phenotype KnowledgeBase (PheKB). Phe2vec is based on pre-computing embeddings of medical concepts and patients' clinical history. Disease phenotypes are then derived from a seed concept and its neighbors in the embedding space. Patients are linked to a disease if their embedded representation is close to the disease phenotype. Comparing Phe2vec and PheKB cohorts head-to-head using chart review, Phe2vec performed on par or better in nine out of ten diseases. Unlike other approaches, it can scale to any condition and was validated against widely adopted expert-based standards. Phe2vec aims to optimize clinical informatics research by augmenting current frameworks to characterize patients by condition and derive reliable disease cohorts.
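The idea of deriving a phenotype from a seed concept and its neighbors in an embedding space can be sketched with cosine similarity. This is a toy illustration, not the Phe2vec code: the concept names, the two-dimensional vectors, and the function names are all assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def nearest_concepts(seed, embeddings, k):
    """Return the k concepts whose embeddings are closest to the seed concept."""
    scores = [
        (name, cosine(embeddings[seed], vec))
        for name, vec in embeddings.items()
        if name != seed
    ]
    scores.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in scores[:k]]


# Hypothetical 2-D embeddings; real concept embeddings have many more dimensions.
toy_embeddings = {
    "t2d": [1.0, 0.1],
    "metformin": [0.9, 0.2],
    "hba1c": [0.8, 0.3],
    "fracture": [0.0, 1.0],
}
phenotype = nearest_concepts("t2d", toy_embeddings, k=2)
```

In the same spirit, a patient vector could be compared against the phenotype's concepts and linked to the disease when the similarity exceeds a chosen threshold.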
Affiliation(s)
- Jessica K. De Freitas
- Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Pl, New York, NY 10029, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Pl, New York, NY 10029, USA
| | - Kipp W. Johnson
- Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Pl, New York, NY 10029, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Pl, New York, NY 10029, USA
| | - Eddye Golden
- Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Pl, New York, NY 10029, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Pl, New York, NY 10029, USA
| | - Girish N. Nadkarni
- Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Pl, New York, NY 10029, USA
- Department of Medicine, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Pl, New York, NY 10029, USA
| | - Joel T. Dudley
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Pl, New York, NY 10029, USA
| | - Erwin P. Bottinger
- Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Pl, New York, NY 10029, USA
- Department of Medicine, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Pl, New York, NY 10029, USA
- Digital Health Center at Hasso Plattner Institute, University of Potsdam, Professor-Dr.-Helmert-Str 2–3, 14482 Potsdam, Germany
| | - Benjamin S. Glicksberg
- Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Pl, New York, NY 10029, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Pl, New York, NY 10029, USA
| | - Riccardo Miotto
- Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Pl, New York, NY 10029, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Pl, New York, NY 10029, USA
|
42
|
Somani S, Yoffie S, Teng S, Havaldar S, Nadkarni GN, Zhao S, Glicksberg BS. Development and validation of techniques for phenotyping ST-elevation myocardial infarction encounters from electronic health records. JAMIA Open 2021; 4:ooab068. [PMID: 34423260 PMCID: PMC8374370 DOI: 10.1093/jamiaopen/ooab068] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/25/2021] [Revised: 06/07/2021] [Accepted: 07/29/2021] [Indexed: 11/12/2022] Open
Abstract
Objectives Classifying hospital admissions into various acute myocardial infarction phenotypes in electronic health records (EHRs) is a challenging task with strong research implications that remains unsolved. To our knowledge, this is the first study to design and validate phenotyping algorithms using cardiac catheterizations to identify not only patients with an ST-elevation myocardial infarction (STEMI), but also the specific encounter when it occurred. Materials and Methods We design and validate multi-modal algorithms to phenotype STEMI on a multicenter EHR containing 5.1 million patients and 115 million patient encounters by using discharge summaries, diagnosis codes, electrocardiography readings, and the presence of cardiac catheterizations on the encounter. Results We demonstrate that robustly phenotyping STEMIs by selecting discharge summaries containing “STEM” has the potential to capture the largest number of STEMIs (positive predictive value [PPV] = 0.36, N = 2110), but that addition of a STEMI-related International Classification of Disease (ICD) code and cardiac catheterizations to these summaries yields the highest precision (PPV = 0.94, N = 952). Discussion and Conclusion In this study, we demonstrate that the incorporation of percutaneous coronary intervention increases the PPV for detecting STEMI-related patient encounters from the EHR.
Affiliation(s)
- Sulaiman Somani
- The Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, New York, New York, USA
| | - Stephen Yoffie
- The Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, New York, New York, USA
| | - Shelly Teng
- The Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, New York, New York, USA
| | - Shreyas Havaldar
- The Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, New York, New York, USA
| | - Girish N Nadkarni
- The Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, New York, New York, USA
| | - Shan Zhao
- The Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, New York, New York, USA
| | - Benjamin S Glicksberg
- The Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, New York, New York, USA
|
43
|
Meng Y, Speier W, Ong MK, Arnold CW. Bidirectional Representation Learning From Transformers Using Multimodal Electronic Health Record Data to Predict Depression. IEEE J Biomed Health Inform 2021; 25:3121-3129. [PMID: 33661740 DOI: 10.1109/jbhi.2021.3063721] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Indexed: 11/06/2022]
Abstract
Advancements in machine learning algorithms have had a beneficial impact on representation learning, classification, and prediction models built using electronic health record (EHR) data. Effort has been put into both increasing models' overall performance and improving their interpretability, particularly regarding the decision-making process. In this study, we present a temporal deep learning model to perform bidirectional representation learning on EHR sequences with a transformer architecture to predict future diagnosis of depression. This model is able to aggregate five heterogeneous and high-dimensional data sources from the EHR and process them in a temporal manner for chronic disease prediction at various prediction windows. We applied the current trend of pretraining and fine-tuning on EHR data to outperform the current state-of-the-art in chronic disease prediction, and to demonstrate the underlying relation between EHR codes in the sequence. The model generated the highest increases of precision-recall area under the curve (PRAUC) from 0.70 to 0.76 in depression prediction compared to the best baseline model. Furthermore, the self-attention weights in each sequence quantitatively demonstrated the inner relationship between various codes, which improved the model's interpretability. These results demonstrate the model's ability to utilize heterogeneous EHR data to predict depression while achieving high accuracy and interpretability, which may facilitate constructing clinical decision support systems in the future for chronic disease screening and early detection.
|
44
|
Abstract
Electronic health records (EHRs) are becoming a vital source of data for healthcare quality improvement, research, and operations. However, much of the most valuable information contained in EHRs remains buried in unstructured text. The field of clinical text mining has advanced rapidly in recent years, transitioning from rule-based approaches to machine learning and, more recently, deep learning. With new methods come new challenges, however, especially for those new to the field. This review provides an overview of clinical text mining for those who are encountering it for the first time (e.g., physician researchers, operational analytics teams, machine learning scientists from other domains). While not a comprehensive survey, this review describes the state of the art, with a particular focus on new tasks and methods developed over the past few years. It also identifies key barriers between these remarkable technical advances and the practical realities of implementation in health systems and in industry.
Affiliation(s)
- Bethany Percha
- Department of Medicine and Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10025, USA;
|
45
|
Holmes JH, Beinlich J, Boland MR, Bowles KH, Chen Y, Cook TS, Demiris G, Draugelis M, Fluharty L, Gabriel PE, Grundmeier R, Hanson CW, Herman DS, Himes BE, Hubbard RA, Kahn CE, Kim D, Koppel R, Long Q, Mirkovic N, Morris JS, Mowery DL, Ritchie MD, Urbanowicz R, Moore JH. Why Is the Electronic Health Record So Challenging for Research and Clinical Care? Methods Inf Med 2021; 60:32-48. [PMID: 34282602 DOI: 10.1055/s-0041-1731784] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 10/20/2022]
Abstract
BACKGROUND The electronic health record (EHR) has become increasingly ubiquitous. At the same time, health professionals have been turning to this resource for access to data that is needed for the delivery of health care and for clinical research. There is little doubt that the EHR has made both of these functions easier than in earlier days when we relied on paper-based clinical records. Coupled with modern database and data warehouse systems, high-speed networks, and the ability to share clinical data with others are a large number of challenges that arguably limit the optimal use of the EHR. OBJECTIVES Our goal was to provide an exhaustive reference for those who use the EHR in clinical and research contexts, but also for health information systems professionals as they design, implement, and maintain EHR systems. METHODS This study includes a panel of 24 biomedical informatics researchers, information technology professionals, and clinicians, all of whom have extensive experience in design, implementation, and maintenance of EHR systems, or in using the EHR as clinicians or researchers. All members of the panel are affiliated with Penn Medicine at the University of Pennsylvania and have experience with a variety of different EHR platforms and systems and how they have evolved over time. RESULTS Each of the authors has shared their knowledge and experience in using the EHR in a suite of 20 short essays, each representing a specific challenge and classified according to a functional hierarchy of interlocking facets such as usability and usefulness, data quality, standards, governance, data integration, clinical care, and clinical research. CONCLUSION We provide here a set of perspectives on the challenges posed by the EHR to clinical and research users.
Affiliation(s)
- John H Holmes
- Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, United States
| | - James Beinlich
- Information Technology Entity Services and Corporate Information Services, University of Pennsylvania Health System, Philadelphia, Pennsylvania, United States
| | - Mary R Boland
- Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, United States
| | - Kathryn H Bowles
- Department of Biobehavioral Health Sciences, University of Pennsylvania School of Nursing, Philadelphia, Pennsylvania, United States
| | - Yong Chen
- Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, United States
| | - Tessa S Cook
- Department of Radiology, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, United States
| | - George Demiris
- Department of Biobehavioral Health Sciences, University of Pennsylvania School of Nursing, Philadelphia, Pennsylvania, United States
| | - Michael Draugelis
- Department of Predictive Health Care, University of Pennsylvania Health System, Philadelphia, Pennsylvania, United States
| | - Laura Fluharty
- Clinical Research Operations, University of Pennsylvania Health System, Philadelphia, Pennsylvania, United States
| | - Peter E Gabriel
- Department of Radiology, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, United States
| | - Robert Grundmeier
- Department of Pediatrics, The Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, United States
| | - C William Hanson
- Department of Anesthesiology and Critical Care, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, United States
| | - Daniel S Herman
- Department of Pathology and Laboratory Medicine, University of Pennsylvania Perelman School of Medicine Philadelphia, Pennsylvania, United States
| | - Blanca E Himes
- Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, United States
| | - Rebecca A Hubbard
- Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, United States
| | - Charles E Kahn
- Department of Radiology, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, United States
| | - Dokyoon Kim
- Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, United States
| | - Ross Koppel
- Department of Sociology, University of Pennsylvania, Philadelphia, Pennsylvania, United States
| | - Qi Long
- Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, United States
| | - Nebojsa Mirkovic
- Department of Research Analytics, University of Pennsylvania Health System, Philadelphia, Pennsylvania, United States
| | - Jeffrey S Morris
- Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, United States
| | - Danielle L Mowery
- Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, United States
| | - Marylyn D Ritchie
- Department of Genetics, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, United States
| | - Ryan Urbanowicz
- Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, United States
| | - Jason H Moore
- Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, United States
|
46
|
Chandler PD, Clark CR, Zhou G, Noel NL, Achilike C, Mendez L, Ramirez AH, Loperena-Cortes R, Mayo K, Cohn E, Ohno-Machado L, Boerwinkle E, Cicek M, Qian J, Schully S, Ratsimbazafy F, Mockrin S, Gebo K, Dedier JJ, Murphy SN, Smoller JW, Karlson EW. Hypertension prevalence in the All of Us Research Program among groups traditionally underrepresented in medical research. Sci Rep 2021; 11:12849. [PMID: 34158555 PMCID: PMC8219813 DOI: 10.1038/s41598-021-92143-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2020] [Accepted: 06/04/2021] [Indexed: 11/18/2022] Open
Abstract
The All of Us Research Program was designed to enable broad-based precision medicine research in a cohort of unprecedented scale and diversity. Hypertension (HTN) is a major public health concern. The validity of HTN data and definition of hypertension cases in the All of Us (AoU) Research Program for use in rule-based algorithms is unknown. In this cross-sectional, population-based study, we compare HTN prevalence in the AoU Research Program to HTN prevalence in the 2015-2016 National Health and Nutrition Examination Survey (NHANES). We used AoU baseline data from patient (age ≥ 18) measurements (PM), surveys, and electronic health record (EHR) blood pressure measurements. We retrospectively examined the prevalence of HTN in the EHR cohort using Systematized Nomenclature of Medicine (SNOMED) codes and blood pressure medications recorded in the EHR. We defined HTN as the participant having at least 2 HTN diagnosis/billing codes on separate dates in the EHR data AND at least one HTN medication. We calculated an age-standardized HTN prevalence according to the age distribution of the U.S. Census, using 3 groups (18-39, 40-59, and ≥ 60). Among the 185,770 participants enrolled in the AoU Cohort (mean age at enrollment = 51.2 years) available in a Researcher Workbench as of October 2019, EHR data was available for at least one SNOMED code from 112,805 participants, medications for 104,230 participants, and 103,490 participants had both medication and SNOMED data. The total number of persons with SNOMED codes on at least two distinct dates and at least one antihypertensive medication was 33,310 for a crude prevalence of HTN of 32.2%. AoU age-adjusted HTN prevalence was 27.9% using 3 groups compared to 29.6% in NHANES. The AoU cohort is a growing source of diverse longitudinal data to study hypertension nationwide and develop precision rule-based algorithms for use in hypertension treatment and prevention research. The prevalence of hypertension in this cohort is similar to that in prior population-based surveys.
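The crude and age-standardized prevalence figures in this abstract follow from simple arithmetic, sketched below. The crude calculation uses the counts stated in the abstract; the stratum-specific prevalences and standard-population weights are hypothetical placeholders, since the abstract does not report them.

```python
def crude_prevalence(cases, population):
    """Cases divided by the population at risk."""
    return cases / population


def age_standardized_prevalence(stratum_prevalence, standard_weights):
    """Direct standardization: weight each age stratum by a standard population share."""
    total_weight = sum(standard_weights.values())
    weighted = sum(stratum_prevalence[g] * standard_weights[g] for g in standard_weights)
    return weighted / total_weight


# Counts reported in the abstract.
crude_pct = round(100 * crude_prevalence(33_310, 103_490), 1)

# Hypothetical inputs (assumptions, not study data) for the three age groups.
prevalence_by_age = {"18-39": 0.10, "40-59": 0.30, "60+": 0.60}
census_weights = {"18-39": 0.40, "40-59": 0.35, "60+": 0.25}
standardized = age_standardized_prevalence(prevalence_by_age, census_weights)
```

Standardization lowers the estimate when the cohort is older than the reference population, which is consistent with the direction of the crude-to-adjusted change reported above.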
Affiliation(s)
- Paulette D Chandler
- Brigham and Women's Hospital, Harvard Medical School, Boston, MA, 02115, USA.
| | - Cheryl R Clark
- Brigham and Women's Hospital, Harvard Medical School, Boston, MA, 02115, USA
| | - Guohai Zhou
- Brigham and Women's Hospital, Harvard Medical School, Boston, MA, 02115, USA
| | - Nyia L Noel
- Boston Medical Center, Boston University School of Medicine, Boston, MA, USA
| | - Confidence Achilike
- Boston Medical Center, Boston University School of Medicine, Boston, MA, USA
| | - Lizette Mendez
- Boston Medical Center, Boston University School of Medicine, Boston, MA, USA
| | | | | | - Kelsey Mayo
- Vanderbilt University Medical Center, Nashville, TN, USA
| | | | | | - Eric Boerwinkle
- University of Texas Health Science Center School of Public Health, Houston, TX, USA
| | | | - Jun Qian
- Vanderbilt University Medical Center, Nashville, TN, USA
| | | | | | | | - Kelly Gebo
- Johns Hopkins University, Baltimore, MD, USA
| | - Julien J Dedier
- Boston Medical Center, Boston University School of Medicine, Boston, MA, USA
| | - Shawn N Murphy
- Research Information Science and Computing, Mass General Brigham, Boston, MA, USA
| | - Jordan W Smoller
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
| | - Elizabeth W Karlson
- Brigham and Women's Hospital, Harvard Medical School, Boston, MA, 02115, USA
|
47
|
A Large-Scale Observational Study on the Temporal Trends and Risk Factors of Opioid Overdose: Real-World Evidence for Better Opioids. Drugs Real World Outcomes 2021; 8:393-406. [PMID: 34037960 PMCID: PMC8324607 DOI: 10.1007/s40801-021-00253-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 04/22/2021] [Indexed: 11/25/2022] Open
Abstract
Background The USA is in the midst of an opioid overdose epidemic. To address the epidemic, we conducted a large-scale population study on opioid overdose. Objectives The primary objective of this study was to evaluate the temporal trends and risk factors of inpatient opioid overdose. Based on its patterns, the secondary objective was to examine the innate properties of opioid analgesics underlying reduced overdose effects. Methods A retrospective cross-sectional study was conducted based on a large-scale inpatient electronic health records database, Cerner Health Facts®, with (1) inclusion criteria for participants as patients admitted between 1 January, 2009 and 31 December, 2017 and (2) measurements as opioid overdose prevalence by year, demographics, and prescription opioid exposures. Results A total of 4,720,041 patients with 7,339,480 inpatient encounters were retrieved from Cerner Health Facts®. Among them, 30.2% of patients were aged 65+ years, 57.0% female, 70.1% Caucasian, 42.3% single, 32.0% from the South, and 80.8% in an urban area. From 2009 to 2017, annual opioid overdose prevalence per 1000 patients significantly increased from 3.7 to 11.9 with an adjusted odds ratio (aOR): 1.16, 95% confidence interval (CI) 1.15–1.16. Compared to the major demographic counterparts, being in (1) age group: 41–50 years (overall aOR 1.36, 95% CI 1.31–1.40) or 51–64 years (overall aOR 1.35, 95% CI 1.32–1.39), (2) marital status: divorced (overall aOR 1.19, 95% CI 1.15–1.23), and (3) census region: West (overall aOR 1.32, 95% CI 1.28–1.36) were significantly associated with higher odds of opioid overdose. Prescription opioid exposures were also associated with increased odds of opioid overdose, such as meperidine (overall aOR 1.09, 95% CI 1.06–1.13) and tramadol (overall aOR 2.20, 95% CI 2.14–2.27). Examination of the relationships between opioid analgesic properties and their association strengths, aORs, and opioid overdose showed that lower aOR values were significantly associated with (1) high molecular weight, (2) non-interaction with multi-drug resistance protein 1 or interaction with cytochrome P450 3A4, and (3) non-interaction with the delta opioid receptor or kappa opioid receptor. Conclusions The significant increasing trends of opioid overdose in the inpatient care setting from 2009 to 2017 suggested an ongoing need for efforts to combat the opioid overdose epidemic in the USA. Risk factors associated with opioid overdose included patient demographics and prescription opioid exposures. Moreover, there are physicochemical, pharmacokinetic, and pharmacodynamic properties underlying reduced overdose effects, which can be utilized to develop better opioids. Supplementary Information The online version contains supplementary material available at 10.1007/s40801-021-00253-8.
48
Moldwin A, Demner-Fushman D, Goodwin TR. Empirical Findings on the Role of Structured Data, Unstructured Data, and their Combination for Automatic Clinical Phenotyping. AMIA Jt Summits Transl Sci Proc 2021; 2021:445-454. [PMID: 34457160 PMCID: PMC8378600] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
The objective of this study is to explore the role of structured and unstructured data for clinical phenotyping by determining which types of clinical phenotypes are best identified using unstructured data (e.g., clinical notes), structured data (e.g., laboratory values, vital signs), or their combination across 172 clinical phenotypes. Specifically, we used laboratory and chart measurements as well as clinical notes from the MIMIC-III critical care database and trained an LSTM using features extracted from each type of data to determine which categories of phenotypes were best identified by structured data, unstructured data, or both. We observed that textual features on their own outperformed structured features for 145 (84%) of phenotypes, and that Doc2Vec was the most effective representation of unstructured data for all phenotypes. When evaluating the impact of adding textual features to systems previously relying only on structured features, we found a statistically significant (p < 0.05) increase in phenotyping performance for 51 phenotypes (primarily involving the circulatory system, injury, and poisoning), degraded performance for one phenotype (diabetes without complications), and no statistically significant change in performance for the remaining 120 phenotypes. We provide analysis on which phenotypes are best identified by each type of data and guidance on which data sources to consider for future research on phenotype identification.
Affiliation(s)
- Asher Moldwin
- U.S. National Library of Medicine, Bethesda, MD, USA
49
DeLozier S, Bland HT, McPheeters M, Wells Q, Farber-Eger E, Bejan CA, Fabbri D, Rosenbloom T, Roden D, Johnson KB, Wei WQ, Peterson J, Bastarache L. Phenotyping coronavirus disease 2019 during a global health pandemic: Lessons learned from the characterization of an early cohort. J Biomed Inform 2021; 117:103777. [PMID: 33838341 PMCID: PMC8026248 DOI: 10.1016/j.jbi.2021.103777] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 10/16/2020] [Revised: 02/09/2021] [Accepted: 04/03/2021] [Indexed: 01/08/2023]
Abstract
From the start of the coronavirus disease 2019 (COVID-19) pandemic, researchers have looked to electronic health record (EHR) data as a way to study possible risk factors and outcomes. To ensure the validity and accuracy of research using these data, investigators need to be confident that the phenotypes they construct are reliable and accurate, reflecting the healthcare settings from which they are ascertained. We developed a COVID-19 registry at a single academic medical center and used data from March 1 to June 5, 2020 to assess differences in population-level characteristics in pandemic and non-pandemic years respectively. Median EHR length, previously shown to impact phenotype performance in type 2 diabetes, was significantly shorter in the SARS-CoV-2 positive group relative to a 2019 influenza tested group (median 3.1 years vs 8.7; Wilcoxon rank sum P = 1.3e-52). Using three phenotyping methods of increasing complexity (billing codes alone and domain-specific algorithms provided by an EHR vendor and clinical experts), common medical comorbidities were abstracted from COVID-19 EHRs, defined by the presence of a positive laboratory test (positive predictive value 100%, recall 93%). After combining performance data across phenotyping methods, we observed significantly lower false negative rates for those records billed for a comprehensive care visit (p = 4e-11) and those with complete demographics data recorded (p = 7e-5). In an early COVID-19 cohort, we found that phenotyping performance of nine common comorbidities was influenced by median EHR length, consistent with previous studies, as well as by data density, which can be measured using portable metrics including CPT codes. Here we present those challenges and potential solutions to creating deeply phenotyped, acute COVID-19 cohorts.
Affiliation(s)
- Sarah DeLozier
- Department of Biomedical Informatics, Vanderbilt University Medical Center, West End Ave, Suite 1475, Nashville, TN 37203, USA
- Harris T Bland
- Department of Biomedical Informatics, Vanderbilt University Medical Center, West End Ave, Suite 1475, Nashville, TN 37203, USA
- Melissa McPheeters
- Department of Biomedical Informatics, Vanderbilt University Medical Center, West End Ave, Suite 1475, Nashville, TN 37203, USA
- Quinn Wells
- Division of Cardiovascular Medicine, Vanderbilt University Medical Center, Pierce Avenue, 383 Preston Research Building, Nashville, TN 37232, USA
- Eric Farber-Eger
- Division of Cardiovascular Medicine, Vanderbilt University Medical Center, Pierce Avenue, 383 Preston Research Building, Nashville, TN 37232, USA
- Cosmin A Bejan
- Department of Biomedical Informatics, Vanderbilt University Medical Center, West End Ave, Suite 1475, Nashville, TN 37203, USA
- Daniel Fabbri
- Department of Biomedical Informatics, Vanderbilt University Medical Center, West End Ave, Suite 1475, Nashville, TN 37203, USA
- Trent Rosenbloom
- Department of Biomedical Informatics, Vanderbilt University Medical Center, West End Ave, Suite 1475, Nashville, TN 37203, USA
- Dan Roden
- Department of Biomedical Informatics, Vanderbilt University Medical Center, West End Ave, Suite 1475, Nashville, TN 37203, USA; Division of Cardiovascular Medicine, Vanderbilt University Medical Center, Pierce Avenue, 383 Preston Research Building, Nashville, TN 37232, USA
- Kevin B Johnson
- Department of Biomedical Informatics, Vanderbilt University Medical Center, West End Ave, Suite 1475, Nashville, TN 37203, USA
- Wei-Qi Wei
- Department of Biomedical Informatics, Vanderbilt University Medical Center, West End Ave, Suite 1475, Nashville, TN 37203, USA
- Josh Peterson
- Department of Biomedical Informatics, Vanderbilt University Medical Center, West End Ave, Suite 1475, Nashville, TN 37203, USA
- Lisa Bastarache
- Department of Biomedical Informatics, Vanderbilt University Medical Center, West End Ave, Suite 1475, Nashville, TN 37203, USA
50
Dashti HS, Cade BE, Stutaite G, Saxena R, Redline S, Karlson EW. Sleep health, diseases, and pain syndromes: findings from an electronic health record biobank. Sleep 2021; 44:5909423. [PMID: 32954408 DOI: 10.1093/sleep/zsaa189] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 03/17/2020] [Revised: 08/28/2020] [Indexed: 02/02/2023] Open
Abstract
STUDY OBJECTIVES: Implementation of electronic health record biobanks has facilitated linkage between clinical and questionnaire data and enabled assessments of relationships between sleep health and diseases in phenome-wide association studies (PheWAS). In the Mass General Brigham Biobank, a large health system-based study, we aimed to systematically catalog associations between time in bed, sleep timing, and weekly variability with clinical phenotypes derived from ICD-9/10 codes. METHODS: Self-reported habitual bed and wake times were used to derive variables: short (<7 hours) and long (≥9 hours) time in bed, sleep midpoint, social jetlag, and sleep debt. Logistic regression and Cox proportional hazards models were used to test cross-sectional and prospective associations, respectively, adjusted for age, gender, race/ethnicity, and employment status and further adjusted for body mass index. RESULTS: In cross-sectional analysis (n = 34,651), sleep variable associations were most notable for circulatory system, mental disorders, and endocrine/metabolic phenotypes. We observed the strongest associations for short time in bed with obesity, for long time in bed and sleep midpoint with major depressive disorder, for social jetlag with hypercholesterolemia, and for sleep debt with acne. In prospective analysis (n = 24,065), we observed associations of short time in bed with higher incidence of acute pain, and of later sleep midpoint and higher sleep debt and social jetlag with higher incidence of major depressive disorder. CONCLUSIONS: Our analysis reinforced that sleep health is a multidimensional construct, corroborated robust known findings from traditional cohort studies, and supported the application of PheWAS as a promising tool for advancing sleep research. Considering the exploratory nature of PheWAS, careful interrogation of novel findings is imperative.
Affiliation(s)
- Hassan S Dashti
- Center for Genomic Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, MA; Broad Institute, Cambridge, MA; Department of Anesthesia, Critical Care and Pain Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, MA
- Brian E Cade
- Broad Institute, Cambridge, MA; Division of Sleep and Circadian Disorders, Department of Medicine, Brigham and Women's Hospital, Boston, MA; Division of Sleep Medicine, Harvard Medical School, Boston, MA
- Gerda Stutaite
- Mass General Brigham Personalized Medicine, Mass General Brigham, Boston, MA
- Richa Saxena
- Center for Genomic Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, MA; Broad Institute, Cambridge, MA; Department of Anesthesia, Critical Care and Pain Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, MA; Division of Sleep Medicine, Harvard Medical School, Boston, MA
- Susan Redline
- Division of Sleep and Circadian Disorders, Department of Medicine, Brigham and Women's Hospital, Boston, MA; Division of Sleep Medicine, Harvard Medical School, Boston, MA; Department of Medicine, Brigham and Women's Hospital and Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA
- Elizabeth W Karlson
- Mass General Brigham Personalized Medicine, Mass General Brigham, Boston, MA; Division of Rheumatology, Inflammation and Immunity, Brigham and Women's Hospital, Boston, MA