1. Waxse BJ, Bustos Carrillo FA, Tran TC, Mo H, Ricotta EE, Denny JC. Computable phenotypes to identify respiratory viral infections in the All of Us research program. Sci Rep 2025; 15:18680. [PMID: 40437102 PMCID: PMC12120013 DOI: 10.1038/s41598-025-02183-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/28/2025] [Accepted: 05/12/2025] [Indexed: 06/01/2025] Open
Abstract
Electronic health records (EHRs) contain rich temporal data about respiratory viral infections, but methods to identify these infections from EHR data vary widely and lack robust validation. We developed computable phenotypes by integrating virus-specific International Classification of Diseases (ICD) billing codes, prescriptions, and laboratory results within 90-day episodes. Analysis of 265,222 participants with EHR data from the All of Us Research Program yielded national cohorts of varied size: large cohorts for SARS-CoV-2 (n = 28,729) and influenza (n = 19,784); medium cohorts for rhinovirus, human coronavirus, and respiratory syncytial virus (n = 1,161-1,620); and smaller cohorts for the other viruses (n = 238-486). Using laboratory results as a reference standard, phenotypes using virus-specific ICD codes and medications had variable sensitivity (8-67%) but high positive predictive value (PPV, 90-97%) for most viruses, while influenza virus and SARS-CoV-2 phenotypes had lower PPV (69-70%) that improved with the inclusion of additional ICD codes. Identified infections exhibited expected seasonal patterns matching CDC data. This integrated approach identified infections more effectively than individual components alone and demonstrated utility for severe infections in hospital settings. This method enables large-scale studies of host genetics, health disparities, and clinical outcomes across episodic diseases, with flexibility to optimize sensitivity or PPV depending on the specific research question.
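The 90-day episode idea in this abstract can be illustrated with a minimal sketch. The function name, the `(date, source)` event format, and the rule that an episode opens at its first event and absorbs later events inside the window are all assumptions made for illustration; the abstract does not specify the authors' implementation.

```python
from datetime import date, timedelta


def group_into_episodes(events, window_days=90):
    """Group (date, source) evidence events into episodes.

    Assumption: an episode opens at its first event and absorbs every
    later event dated fewer than `window_days` after that opening event.
    """
    episodes = []
    for when, source in sorted(events):
        if episodes and when - episodes[-1]["start"] < timedelta(days=window_days):
            # Same episode: record which evidence type contributed.
            episodes[-1]["sources"].add(source)
        else:
            # First event, or outside the window: open a new episode.
            episodes.append({"start": when, "sources": {source}})
    return episodes


# Hypothetical evidence for one participant (not study data).
events = [
    (date(2022, 1, 5), "icd"),
    (date(2022, 1, 7), "lab"),
    (date(2022, 2, 1), "rx"),
    (date(2022, 11, 20), "icd"),
]
episodes = group_into_episodes(events)
```

In this made-up example the three winter events collapse into one episode supported by all three evidence types, while the November code opens a second episode.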
Affiliation(s)
- Bennett J Waxse
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
  - National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, USA
- Tam C Tran
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Huan Mo
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Emily E Ricotta
  - Department of Preventive Medicine and Biostatistics, Uniformed Services University of the Health Sciences, Bethesda, MD, USA
- Joshua C Denny
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
  - All of Us Research Program, National Institutes of Health, Bethesda, MD, USA
2. Mandel HL, Shah SN, Bailey LC, Carton T, Chen Y, Esquenazi-Karonika S, Haendel M, Hornig M, Kaushal R, Oliveira CR, Perlowski AA, Pfaff E, Rao S, Razzaghi H, Seibert E, Thomas GL, Weiner MG, Thorpe LE, Divers J. Opportunities and Challenges in Using Electronic Health Record Systems to Study Postacute Sequelae of SARS-CoV-2 Infection: Insights From the NIH RECOVER Initiative. J Med Internet Res 2025; 27:e59217. [PMID: 40053748 PMCID: PMC11923460 DOI: 10.2196/59217] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/02/2024] [Revised: 10/31/2024] [Accepted: 11/20/2024] [Indexed: 03/09/2025] Open
Abstract
The benefits and challenges of electronic health records (EHRs) as data sources for clinical and epidemiologic research have been well described. However, several factors are important to consider when using EHR data to study novel, emerging, and multifaceted conditions such as postacute sequelae of SARS-CoV-2 infection or long COVID. In this article, we present opportunities and challenges of using EHR data to improve our understanding of long COVID, based on lessons learned from the National Institutes of Health (NIH)-funded RECOVER (REsearching COVID to Enhance Recovery) Initiative, and suggest steps to maximize the usefulness of EHR data when performing long COVID research.
Affiliation(s)
- Hannah L Mandel
  - Department of Population Health, New York University Grossman School of Medicine, New York, NY, United States
- Shruti N Shah
  - Department of Population Health, New York University Grossman School of Medicine, New York, NY, United States
- L Charles Bailey
  - Applied Clinical Research Center, The Children's Hospital of Philadelphia, Philadelphia, PA, United States
- Thomas Carton
  - Louisiana Public Health Institute, New Orleans, LA, United States
- Yu Chen
  - Department of Population Health, New York University Grossman School of Medicine, New York, NY, United States
- Shari Esquenazi-Karonika
  - Department of Population Health, New York University Grossman School of Medicine, New York, NY, United States
- Melissa Haendel
  - Department of Genetics, The University of North Carolina at Chapel Hill School of Medicine, Chapel Hill, NC, United States
- Mady Hornig
  - Department of Epidemiology, Columbia University Mailman School of Public Health, New York, NY, United States
- Rainu Kaushal
  - Department of Population Health Sciences, Weill Cornell Medicine, New York, NY, United States
- Carlos R Oliveira
  - Division of Infectious Diseases, Department of Pediatrics, Yale University School of Medicine, New Haven, CT, United States
  - Division of Health Informatics, Department of Biostatistics, Yale University School of Public Health, New Haven, CT, United States
- Emily Pfaff
  - Department of Medicine, The University of North Carolina at Chapel Hill School of Medicine, Chapel Hill, NC, United States
- Suchitra Rao
  - Department of Pediatrics, University of Colorado School of Medicine and Children's Hospital Colorado, Aurora, CO, United States
- Hanieh Razzaghi
  - Applied Clinical Research Center, The Children's Hospital of Philadelphia, Philadelphia, PA, United States
- Elle Seibert
  - Department of Neuroscience, USC Dornsife College of Letters, Arts and Sciences, Los Angeles, CA, United States
- Gelise L Thomas
  - Clinical and Translational Science Collaborative of Northern Ohio, Case Western Reserve University, Cleveland, OH, United States
- Mark G Weiner
  - Department of Population Health Sciences, Weill Cornell Medicine, New York, NY, United States
- Lorna E Thorpe
  - Department of Population Health, New York University Grossman School of Medicine, New York, NY, United States
- Jasmin Divers
  - Department of Foundations of Medicine, New York University Long Island School of Medicine, Mineola, NY, United States
3. Waxse BJ, Carrillo FAB, Tran TC, Mo H, Ricotta EE, Denny JC. Computable Phenotypes for Respiratory Viral Infections in the All of Us Research Program. medRxiv 2025:2025.01.17.25320744. [PMID: 39867363 PMCID: PMC11759596 DOI: 10.1101/2025.01.17.25320744] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/28/2025]
Abstract
Electronic health records (EHRs) contain rich temporal data about infectious diseases, but an optimal approach to identify infections remains undefined. Using the All of Us Research Program, we developed computable phenotypes for respiratory viruses by integrating billing codes, prescriptions, and laboratory results within 90-day episodes. Phenotypes computed from 265,222 participants yielded cohorts ranging from 238 (adenovirus) to 28,729 (SARS-CoV-2) cases. Virus-specific billing codes showed varied sensitivity (8-67%) and high positive predictive value (90-97%), except for influenza virus and SARS-CoV-2 where lower PPV (69-70%) improved with increasing billing codes. Identified infections exhibited expected seasonal patterns and virus proportions when compared with CDC data. This integrated approach identified episodic disease more effectively than individual components alone and demonstrated utility in identifying severe infections. The method enables large-scale studies of host genetics, health disparities, and clinical outcomes across episodic diseases.
Affiliation(s)
- Bennett J Waxse
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
  - National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, USA
- Tam C Tran
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
  - National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, USA
  - Department of Preventive Medicine and Biostatistics, Uniformed Services University of the Health Sciences, Bethesda, MD, USA
  - All of Us Research Program, National Institutes of Health, Bethesda, MD, USA
- Huan Mo
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Emily E Ricotta
  - Department of Preventive Medicine and Biostatistics, Uniformed Services University of the Health Sciences, Bethesda, MD, USA
- Joshua C Denny
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
  - All of Us Research Program, National Institutes of Health, Bethesda, MD, USA
4. Ammar S, Borghoff K, El Mikati IK, Mustafa RA, Noureddine L. Using ICD9/10 codes for identifying ADPKD patients, a validation study. J Nephrol 2024; 37:523-525. [PMID: 37907678 DOI: 10.1007/s40620-023-01780-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2023] [Accepted: 09/03/2023] [Indexed: 11/02/2023]
Affiliation(s)
- Shahed Ammar
  - Division of Nephrology and Hypertension, Department of Internal Medicine, University of Iowa, Iowa City, IA, USA
  - University of Iowa Carver College of Medicine, Campus Box C44-K, 200 Hawkins Dr., Iowa City, IA, 52242, USA
- Kathleen Borghoff
  - Division of Nephrology and Hypertension, Department of Internal Medicine, University of Iowa, Iowa City, IA, USA
  - Division of Nephrology and Hypertension, Department of Internal Medicine, University of Nebraska Medical Center, Omaha, NE, USA
- Ibrahim K El Mikati
  - Outcomes and Implementation Research Unit, Department of Internal Medicine, University of Kansas Medical Center, Kansas City, KS, USA
- Reem A Mustafa
  - Division of Nephrology and Hypertension, Department of Internal Medicine, University of Kansas Medical Center, Kansas City, KS, USA
  - Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, ON, Canada
- Lama Noureddine
  - Division of Nephrology and Hypertension, Department of Internal Medicine, University of Iowa, Iowa City, IA, USA
5. Wang L, Zipursky AR, Geva A, McMurry AJ, Mandl KD, Miller TA. A computable case definition for patients with SARS-CoV2 testing that occurred outside the hospital. JAMIA Open 2023; 6:ooad047. [PMID: 37425487 PMCID: PMC10322650 DOI: 10.1093/jamiaopen/ooad047] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/01/2023] [Revised: 06/13/2023] [Accepted: 06/30/2023] [Indexed: 07/11/2023] Open
Abstract
Objective To identify a cohort of COVID-19 cases, including when evidence of virus positivity was only mentioned in the clinical text, not in structured laboratory data in the electronic health record (EHR). Materials and Methods Statistical classifiers were trained on feature representations derived from unstructured text in patient EHRs. We used a proxy dataset of patients with COVID-19 polymerase chain reaction (PCR) tests for training. We selected a model based on performance on our proxy dataset and applied it to instances without COVID-19 PCR tests. A physician reviewed a sample of these instances to validate the classifier. Results On the test split of the proxy dataset, our best classifier obtained 0.56 F1, 0.6 precision, and 0.52 recall scores for SARS-CoV2 positive cases. In an expert validation, the classifier correctly identified 97.6% (81/84) as COVID-19 positive and 97.8% (91/93) as not SARS-CoV2 positive. The classifier labeled an additional 960 cases as not having SARS-CoV2 lab tests in hospital, and only 177 of those cases had the ICD-10 code for COVID-19. Discussion Proxy dataset performance may be worse because these instances sometimes include discussion of pending lab tests. The most predictive features are meaningful and interpretable. The type of external test that was performed is rarely mentioned. Conclusion COVID-19 cases that had testing done outside of the hospital can be reliably detected from the text in EHRs. Training on a proxy dataset was a suitable method for developing a highly performant classifier without labor-intensive labeling efforts.
Affiliation(s)
- Lijing Wang
  - Department of Data Science, New Jersey Institute of Technology, Newark, New Jersey, USA
- Amy R Zipursky
  - Computational Health Informatics Program and Department of Emergency Medicine, Boston Children's Hospital, Department of Pediatrics, Harvard Medical School, Boston, Massachusetts, USA
- Alon Geva
  - Computational Health Informatics Program and Division of Critical Care Medicine, Department of Anesthesiology, Critical Care, and Pain Medicine, Boston Children’s Hospital, Harvard Medical School, Boston, Massachusetts, USA
- Andrew J McMurry
  - Computational Health Informatics Program, Boston Children’s Hospital, Department of Pediatrics, Harvard Medical School, Boston, Massachusetts, USA
- Kenneth D Mandl
  - Computational Health Informatics Program, Boston Children’s Hospital, Department of Pediatrics, Harvard Medical School, Boston, Massachusetts, USA
- Timothy A Miller
  - Computational Health Informatics Program, Boston Children’s Hospital, Department of Pediatrics, Harvard Medical School, Boston, Massachusetts, USA
6. Dhingra LS, Shen M, Mangla A, Khera R. Cardiovascular Care Innovation through Data-Driven Discoveries in the Electronic Health Record. Am J Cardiol 2023; 203:136-148. [PMID: 37499593 PMCID: PMC10865722 DOI: 10.1016/j.amjcard.2023.06.104] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 02/28/2023] [Revised: 05/24/2023] [Accepted: 06/29/2023] [Indexed: 07/29/2023]
Abstract
The electronic health record (EHR) represents a rich source of patient information, increasingly being leveraged for cardiovascular research. Although its primary use remains the seamless delivery of health care, the various longitudinally aggregated structured and unstructured data elements for each patient within the EHR can define the computational phenotypes of disease and care signatures and their association with outcomes. Although structured data elements, such as demographic characteristics, laboratory measurements, problem lists, and medications, are easily extracted, unstructured data are underused. The latter include free text in clinical narratives, documentation of procedures, and reports of imaging and pathology. Rapid scaling up of data storage and rapid innovation in natural language processing and computer vision can power insights from unstructured data streams. However, despite an array of opportunities for research using the EHR, specific expertise is necessary to adequately address confidentiality, accuracy, completeness, and heterogeneity challenges in EHR-based research. These often require methodological innovation and best practices to design and conduct successful research studies. Our review discusses these challenges and their proposed solutions. In addition, we highlight the ongoing innovations in federated learning in the EHR through a greater focus on common data models and discuss ongoing work that defines such an approach to large-scale, multicenter, federated studies. Such parallel improvements in technology and research methods enable innovative care and optimization of patient outcomes.
Affiliation(s)
- Miles Shen
  - Section of Cardiovascular Medicine, Department of Internal Medicine; Department of Internal Medicine
- Anjali Mangla
  - Section of Cardiovascular Medicine, Department of Internal Medicine; Department of Neuroscience, Yale School of Medicine, New Haven, Connecticut
- Rohan Khera
  - Section of Cardiovascular Medicine, Department of Internal Medicine; Center for Outcomes Research and Evaluation (CORE), Yale New Haven Hospital, New Haven, Connecticut; Section of Health Informatics, Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut; Section of Biomedical Informatics and Data Science, Yale School of Medicine, New Haven, Connecticut
7. Shappell CN, Klompas M, Chan C, Chen T, Rhee C. Impact of changing case definitions for coronavirus disease 2019 (COVID-19) hospitalization on pandemic metrics. Infect Control Hosp Epidemiol 2023; 44:1458-1466. [PMID: 36912323 PMCID: PMC11253109 DOI: 10.1017/ice.2022.300] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/14/2023]
Abstract
OBJECTIVE To examine the impact of commonly used case definitions for coronavirus disease 2019 (COVID-19) hospitalizations on case counts and outcomes. DESIGN, PATIENTS, AND SETTING Retrospective analysis of all adults hospitalized between March 1, 2020, and March 1, 2022, at 5 Massachusetts acute-care hospitals. INTERVENTIONS We applied 6 commonly used definitions of COVID-19 hospitalization: positive severe acute respiratory coronavirus virus 2 (SARS-CoV-2) polymerase chain reaction (PCR) assay within 14 days of admission, PCR plus dexamethasone administration, PCR plus remdesivir, PCR plus hypoxemia, institutional COVID-19 flag, or COVID-19 International Classification of Disease, Tenth Revision (ICD-10) codes. Outcomes included case counts and in-hospital mortality. Overall, 100 PCR-positive cases were reviewed to determine each definition's accuracy for distinguishing primary or contributing versus incidental COVID-19 hospitalizations. RESULTS Of 306,387 hospital encounters, 15,436 (5.0%) met the PCR-based definition. COVID-19 hospitalization counts varied substantially between definitions: 4,628 (1.5% of all encounters) for PCR plus dexamethasone, 5,757 (1.9%) for PCR plus remdesivir, 11,801 (3.9%) for PCR plus hypoxemia, 15,673 (5.1%) for institutional flags, and 15,868 (5.2%) for ICD-10 codes. Definitions requiring dexamethasone, hypoxemia, or remdesivir selected sicker patients compared to PCR alone (mortality rates 12.2%, 10.7%, and 8.8% vs 8.3%, respectively). Definitions requiring PCR plus remdesivir or dexamethasone did not detect a reduction in in-hospital mortality associated with the SARS-CoV-2 Omicron variant. ICD-10 codes had the highest sensitivity (98.4%) but low specificity (39.5%) for distinguishing primary or contributing versus incidental COVID-19 hospitalizations. PCR plus dexamethasone had the highest specificity (92.1%) but low sensitivity (35.5%). 
CONCLUSIONS Commonly used definitions for COVID-19 hospitalizations generate variable case counts and outcomes and differentiate poorly between primary or contributing versus incidental COVID-19 hospitalizations. Surveillance definitions that better capture and delineate COVID-19-associated hospitalizations are needed.
Affiliation(s)
- Claire N. Shappell
  - Department of Population Medicine, Harvard Medical School/Harvard Pilgrim Health Care Institute, Boston, MA, USA
  - Division of Pulmonary and Critical Care Medicine, Department of Medicine, Brigham and Women’s Hospital, Boston, MA, USA
- Michael Klompas
  - Department of Population Medicine, Harvard Medical School/Harvard Pilgrim Health Care Institute, Boston, MA, USA
  - Division of Infectious Diseases, Department of Medicine, Brigham and Women’s Hospital, Boston, MA, USA
- Christina Chan
  - Department of Population Medicine, Harvard Medical School/Harvard Pilgrim Health Care Institute, Boston, MA, USA
- Tom Chen
  - Department of Population Medicine, Harvard Medical School/Harvard Pilgrim Health Care Institute, Boston, MA, USA
- Chanu Rhee
  - Department of Population Medicine, Harvard Medical School/Harvard Pilgrim Health Care Institute, Boston, MA, USA
  - Division of Infectious Diseases, Department of Medicine, Brigham and Women’s Hospital, Boston, MA, USA
8. Fashina TA, Miller CM, Paintsil E, Niccolai LM, Brandt C, Oliveira CR. Computable Clinical Phenotyping of Postacute Sequelae of COVID-19 in Pediatrics Using Real-World Data. J Pediatric Infect Dis Soc 2023; 12:113-116. [PMID: 36548966 PMCID: PMC9969330 DOI: 10.1093/jpids/piac132] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2022] [Accepted: 12/09/2022] [Indexed: 12/24/2022]
Abstract
Many questions remain unanswered concerning the long-term effects of COVID-19 on children. In this report, we describe a computable phenotyping algorithm for identifying children and adolescents with postacute sequelae of COVID-19 (PASC) and pilot this tool to characterize the clinical epidemiology of pediatric PASC in a large healthcare delivery network.
Affiliation(s)
- Christine M Miller
  - Department of Pediatrics, Section of Infectious Diseases and Global Health, Yale University School of Medicine, New Haven, Connecticut, USA
- Elijah Paintsil
  - Department of Pediatrics, Section of Infectious Diseases and Global Health, Yale University School of Medicine, New Haven, Connecticut, USA
  - Department of Pharmacology, Yale University School of Medicine, New Haven, Connecticut, USA
  - Department of Epidemiology of Microbial Diseases, Yale University School of Public Health, New Haven, Connecticut, USA
- Linda M Niccolai
  - Department of Epidemiology of Microbial Diseases, Yale University School of Public Health, New Haven, Connecticut, USA
- Cynthia Brandt
  - Department of Biostatistics, Section of Health Informatics, Yale University School of Public Health, New Haven, Connecticut, USA
- Carlos R Oliveira
  - Department of Pediatrics, Section of Infectious Diseases and Global Health, Yale University School of Medicine, New Haven, Connecticut, USA
  - Department of Biostatistics, Section of Health Informatics, Yale University School of Public Health, New Haven, Connecticut, USA
9. Rao S, Bozio C, Butterfield K, Reynolds S, Reese SE, Ball S, Steffens A, Demarco M, McEvoy C, Thompson M, Rowley E, Porter RM, Fink RV, Irving SA, Naleway A. Accuracy of COVID-19-Like Illness Diagnoses in Electronic Health Record Data: Retrospective Cohort Study. JMIR Form Res 2023; 7:e39231. [PMID: 36383633 PMCID: PMC9848441 DOI: 10.2196/39231] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/03/2022] [Revised: 07/13/2022] [Accepted: 09/30/2022] [Indexed: 01/21/2023] Open
Abstract
BACKGROUND Electronic health record (EHR) data provide a unique opportunity to study the epidemiology of COVID-19, clinical outcomes of the infection, comparative effectiveness of therapies, and vaccine effectiveness but require a well-defined computable phenotype of COVID-19-like illness (CLI). OBJECTIVE The objective of this study was to evaluate the performance of pathogen-specific and other acute respiratory illness (ARI) International Statistical Classification of Diseases-9 and -10 codes in identifying COVID-19 cases in emergency department (ED) or urgent care (UC) and inpatient settings. METHODS We conducted a retrospective observational cohort study using EHR, claims, and laboratory information system data of ED or UC and inpatient encounters from 4 health systems in the United States. Patients who were aged ≥18 years, had an ED or UC or inpatient encounter for an ARI, and underwent a SARS-CoV-2 polymerase chain reaction test between March 1, 2020, and March 31, 2021, were included. We evaluated various CLI definitions using combinations of International Statistical Classification of Diseases-10 codes as follows: COVID-19-specific codes; CLI definition used in VISION network studies; ARI signs, symptoms, and diagnosis codes only; signs and symptoms of ARI only; and random forest model definitions. We evaluated the sensitivity, specificity, positive predictive value, and negative predictive value of each CLI definition using a positive SARS-CoV-2 polymerase chain reaction test as the reference standard. We evaluated the performance of each CLI definition for distinct hospitalization and ED or UC cohorts. RESULTS Among 90,952 hospitalizations and 137,067 ED or UC visits, 5627 (6.19%) and 9866 (7.20%) were positive for SARS-CoV-2, respectively. COVID-19-specific codes had high sensitivity (91.6%) and specificity (99.6%) in identifying patients with SARS-CoV-2 positivity among hospitalized patients. 
The VISION CLI definition maintained high sensitivity (95.8%) but lowered specificity (45.5%). By contrast, signs and symptoms of ARI had low sensitivity and positive predictive value (28.9% and 11.8%, respectively) but higher specificity and negative predictive value (85.3% and 94.7%, respectively). ARI diagnoses, signs, and symptoms alone had low predictive performance. All CLI definitions had lower sensitivity for ED or UC encounters. Random forest approaches identified distinct CLI definitions with high performance for hospital encounters and moderate performance for ED or UC encounters. CONCLUSIONS COVID-19-specific codes have high sensitivity and specificity in identifying adults with positive SARS-CoV-2 test results. Separate combinations of COVID-19-specific codes and ARI codes enhance the utility of CLI definitions in studies using EHR data in hospital and ED or UC settings.
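The four accuracy measures reported in this abstract all derive from a 2x2 table that compares a code-based definition with the PCR reference standard. The sketch below shows the arithmetic; the function name is hypothetical and the counts are invented for illustration, not taken from the study.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Return sensitivity, specificity, PPV, and NPV from 2x2 counts.

    tp/fn: reference-positive encounters that the definition did / did not flag.
    fp/tn: reference-negative encounters that the definition did / did not flag.
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Hypothetical counts for illustration only; not study data.
metrics = diagnostic_metrics(tp=90, fp=5, fn=10, tn=895)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Sensitivity and specificity depend only on how the definition behaves within reference-positive or reference-negative encounters, whereas PPV and NPV also shift with how common the infection is in the cohort being studied.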
Affiliation(s)
- Suchitra Rao
  - Department of Pediatrics, Hospital Medicine and Infectious Diseases, University of Colorado School of Medicine, Aurora, CO, United States
- Catherine Bozio
  - Centers for Disease Control and Prevention, Atlanta, GA, United States
- Sue Reynolds
  - Centers for Disease Control and Prevention, Atlanta, GA, United States
- Andrea Steffens
  - Centers for Disease Control and Prevention, Atlanta, GA, United States
- Mark Thompson
  - Centers for Disease Control and Prevention, Atlanta, GA, United States
- Rachael M Porter
  - Centers for Disease Control and Prevention, Atlanta, GA, United States
- Stephanie A Irving
  - Science Programs Department, Kaiser Permanente Center for Health Research, Portland, OR, United States
- Allison Naleway
  - Science Programs Department, Kaiser Permanente Center for Health Research, Portland, OR, United States