1
Hill EL, Mehta HB, Sharma S, Mane K, Singh SK, Xie C, Cathey E, Loomba J, Russell S, Spratt H, DeWitt PE, Ammar N, Madlock-Brown C, Brown D, McMurry JA, Chute CG, Haendel MA, Moffitt R, Pfaff ER, Bennett TD. Risk factors associated with post-acute sequelae of SARS-CoV-2: an N3C and NIH RECOVER study. BMC Public Health 2023; 23:2103. [PMID: 37880596 PMCID: PMC10601201 DOI: 10.1186/s12889-023-16916-w] [Received: 10/19/2022] [Accepted: 10/05/2023] [Indexed: 10/27/2023]
Abstract
BACKGROUND More than one-third of individuals experience post-acute sequelae of SARS-CoV-2 infection (PASC, which includes long-COVID). The objective is to identify risk factors associated with PASC/long-COVID diagnosis. METHODS This was a retrospective case-control study including 31 health systems in the United States from the National COVID Cohort Collaborative (N3C). A total of 8,325 individuals with PASC (defined by the presence of the International Classification of Diseases, version 10 code U09.9 or a long-COVID clinic visit) were matched to 41,625 controls within the same health system, with a COVID index date within ± 45 days of the corresponding case's earliest COVID index date. Measurements of risk factors included demographics, comorbidities, treatment and acute characteristics related to COVID-19. Multivariable logistic regression, random forest, and XGBoost were used to determine the associations between risk factors and PASC. RESULTS Among 8,325 individuals with PASC, the majority were > 50 years of age (56.6%), female (62.8%), and non-Hispanic White (68.6%). In logistic regression, middle-age categories (40 to 69 years; OR ranging from 2.32 to 2.58), female sex (OR 1.4, 95% CI 1.33-1.48), hospitalization associated with COVID-19 (OR 3.8, 95% CI 3.05-4.73), long (8-30 days, OR 1.69, 95% CI 1.31-2.17) or extended hospital stay (30+ days, OR 3.38, 95% CI 2.45-4.67), receipt of mechanical ventilation (OR 1.44, 95% CI 1.18-1.74), and several comorbidities including depression (OR 1.50, 95% CI 1.40-1.60), chronic lung disease (OR 1.63, 95% CI 1.53-1.74), and obesity (OR 1.23, 95% CI 1.16-1.3) were associated with increased likelihood of PASC diagnosis or care at a long-COVID clinic. Characteristics associated with a lower likelihood of PASC diagnosis or care at a long-COVID clinic included younger age (18 to 29 years), male sex, non-Hispanic Black race, and comorbidities such as substance abuse, cardiomyopathy, psychosis, and dementia. More doctors per capita in the county of residence was associated with an increased likelihood of PASC diagnosis or care at a long-COVID clinic. Our findings were consistent in sensitivity analyses using a variety of analytic techniques and approaches to select controls. CONCLUSIONS This national study identified important risk factors for PASC diagnosis such as middle age, severe COVID-19 disease, and specific comorbidities. Further clinical and epidemiological research is needed to better understand underlying mechanisms and the potential role of vaccines and therapeutics in altering PASC course.
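The odds ratios in this abstract come from multivariable logistic regression on matched cases and controls. As a simplified illustration of how to read an "OR, 95% CI" pair, the sketch below computes an unadjusted odds ratio with a Wald interval from a single 2x2 table. The counts and the function name `odds_ratio_wald` are hypothetical, chosen only for illustration; this is not the study's adjusted model or its data.

```python
import math


def odds_ratio_wald(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval.

    a: exposed cases      b: unexposed cases
    c: exposed controls   d: unexposed controls
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) for a 2x2 table.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, NOT taken from the study.
or_, lo, hi = odds_ratio_wald(a=60, b=40, c=150, d=350)
print(f"OR {or_:.2f}, 95% CI {lo:.2f}-{hi:.2f}")
```

An interval that excludes 1 corresponds to an association that is statistically significant at roughly the 5% level. The adjusted estimates reported in the abstract additionally account for the other covariates in the model, which this sketch does not.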
Affiliation(s)
- Elaine L Hill: Department of Public Health Sciences, University of Rochester Medical Center, 265 Crittenden Boulevard Box 420644, Rochester, NY, 14642, USA
- Hemalkumar B Mehta: Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, 615 N Wolfe St, Baltimore, MD, 21205, USA
- Suchetha Sharma: School of Data Science, University of Virginia, 3 Elliewood Ave, Charlottesville, VA, 22903, USA
- Klint Mane: Department of Economics, University of Rochester, 1232 Mount Hope Ave, Rochester, NY, 14620, USA
- Sharad Kumar Singh: Goergen Institute for Data Science, University of Rochester, 1209 Wegmans Hall, Rochester, NY, 14627, USA
- Catherine Xie: CMC BOX 275184, University of Rochester, 500 Joseph C. Wilson Blvd, Rochester, NY, 14627-5184, USA
- Emily Cathey: Ivy Foundations Building, Integrated Translational Health Research Institute of Virginia (iTHRIV), University of Virginia, 560 Ray C Hunt Drive RM 2153, Charlottesville, VA, 22903, USA
- Johanna Loomba: Ivy Foundations Building, Integrated Translational Health Research Institute of Virginia (iTHRIV), University of Virginia, 560 Ray C Hunt Drive RM 2153, Charlottesville, VA, 22903, USA
- Seth Russell: Department of Pediatrics, University of Colorado School of Medicine, 1890 N. Revere Court, Mail Stop 600, Aurora, CO, 80045, USA
- Heidi Spratt: Department of Biostatistics and Data Science, Medical Branch, University of Texas, 301 University Blvd, Galveston, TX, 77555-1148, USA
- Peter E DeWitt: Department of Pediatrics, University of Colorado School of Medicine, 1890 N. Revere Court, Mail Stop 600, Aurora, CO, 80045, USA
- Nariman Ammar: Department of Diagnostic and Health Sciences, University of Tennessee Health Science Center, 50 N Dunlap St., Memphis, TN, 38103, USA
- Charisse Madlock-Brown: Department of Diagnostic and Health Sciences, University of Tennessee Health Science Center, 930 Madison Avenue 6th Floor, Memphis, TN, 38163, USA
- Donald Brown: Integrated Translational Health Research Institute of Virginia (iTHRIV), University of Virginia, 151 Engineer's Way Olsson Hall Rm. 102E, PO Box 400747, Charlottesville, VA, USA
- Julie A McMurry: Center for Health AI, University of Colorado School of Medicine, 12800 East 19th Avenue, Aurora, CO, 80045, USA
- Christopher G Chute: Schools of Medicine, Public Health, and Nursing, Johns Hopkins University, 2024 E Monument St., Baltimore, MD, 21287, USA
- Melissa A Haendel: Center for Health AI, University of Colorado School of Medicine, 13001 East 17th Place, Campus Box C290, Aurora, CO, 80045, USA
- Richard Moffitt: Department of Biomedical Informatics, Stony Brook University, and Stony Brook Cancer Center, MART L7 0810, Stony Brook, NY, 11794, USA
- Emily R Pfaff: Department of Medicine, North Carolina Translational and Clinical Sciences Institute, University of North Carolina at Chapel Hill, 160 N Medical Drive, Chapel Hill, NC, 27599, USA
- Tellen D Bennett: Department of Biomedical Informatics, University of Colorado School of Medicine, 1890 N. Revere Court, Mail Stop 600, Aurora, CO, 80045, USA
2
Yadaw AS, Sahner DK, Sidky H, Afzali B, Hotaling N, Pfaff ER, Mathé EA. Preexisting Autoimmunity Is Associated With Increased Severity of Coronavirus Disease 2019: A Retrospective Cohort Study Using Data From the National COVID Cohort Collaborative (N3C). Clin Infect Dis 2023; 77:816-826. [PMID: 37207367 PMCID: PMC10506777 DOI: 10.1093/cid/ciad294] [Received: 02/03/2023] [Revised: 05/05/2023] [Accepted: 05/12/2023] [Indexed: 05/21/2023]
Abstract
BACKGROUND Identifying individuals with a higher risk of developing severe coronavirus disease 2019 (COVID-19) outcomes will inform targeted and more intensive clinical monitoring and management. To date, there is mixed evidence regarding the impact of preexisting autoimmune disease (AID) diagnosis and/or immunosuppressant (IS) exposure on developing severe COVID-19 outcomes. METHODS A retrospective cohort of adults diagnosed with COVID-19 was created in the National COVID Cohort Collaborative enclave. Two outcomes, life-threatening disease and hospitalization, were evaluated by using logistic regression models with and without adjustment for demographics and comorbidities. RESULTS Of the 2 453 799 adults diagnosed with COVID-19, 191 520 (7.81%) had a preexisting AID diagnosis and 278 095 (11.33%) had a preexisting IS exposure. Logistic regression models adjusted for demographics and comorbidities demonstrated that individuals with a preexisting AID (odds ratio [OR], 1.13; 95% confidence interval [CI]: 1.09-1.17; P < .001), IS exposure (OR, 1.27; 95% CI: 1.24-1.30; P < .001), or both (OR, 1.35; 95% CI: 1.29-1.40; P < .001) were more likely to have a life-threatening disease. These results were consistent when hospitalization was evaluated. A sensitivity analysis evaluating specific IS revealed that tumor necrosis factor inhibitors were protective against life-threatening disease (OR, 0.80; 95% CI: .66-.96; P = .017) and hospitalization (OR, 0.80; 95% CI: .73-.89; P < .001). CONCLUSIONS Patients with preexisting AID, IS exposure, or both are more likely to have a life-threatening disease or hospitalization. These patients may thus require tailored monitoring and preventative measures to minimize negative consequences of COVID-19.
Affiliation(s)
- Arjun S Yadaw: National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland, USA
- David K Sahner: National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland, USA
- Hythem Sidky: National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland, USA
- Behdad Afzali: Immunoregulation Section, Kidney Diseases Branch, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland, USA
- Nathan Hotaling: National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland, USA
- Emily R Pfaff: North Carolina Translational and Clinical Sciences Institute, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Ewy A Mathé: National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland, USA
3
Pfaff ER, Girvin AT, Crosskey M, Gangireddy S, Master H, Wei WQ, Kerchberger VE, Weiner M, Harris PA, Basford M, Lunt C, Chute CG, Moffitt RA, Haendel M. De-black-boxing health AI: demonstrating reproducible machine learning computable phenotypes using the N3C-RECOVER Long COVID model in the All of Us data repository. J Am Med Inform Assoc 2023; 30:1305-1312. [PMID: 37218289 PMCID: PMC10280348 DOI: 10.1093/jamia/ocad077] [Received: 01/18/2023] [Revised: 03/28/2023] [Accepted: 04/24/2023] [Indexed: 05/24/2023]
Abstract
Machine learning (ML)-driven computable phenotypes are among the most challenging to share and reproduce. Despite this difficulty, the urgent public health considerations around Long COVID make it especially important to ensure the rigor and reproducibility of Long COVID phenotyping algorithms such that they can be made available to a broad audience of researchers. As part of the NIH Researching COVID to Enhance Recovery (RECOVER) Initiative, researchers with the National COVID Cohort Collaborative (N3C) devised and trained an ML-based phenotype to identify patients highly probable to have Long COVID. Supported by RECOVER, N3C and NIH's All of Us study partnered to reproduce the output of N3C's trained model in the All of Us data enclave, demonstrating model extensibility in multiple environments. This case study in ML-based phenotype reuse illustrates how open-source software best practices and cross-site collaboration can de-black-box phenotyping algorithms, prevent unnecessary rework, and promote open science in informatics.
Affiliation(s)
- Emily R Pfaff: Department of Medicine, University of North Carolina at Chapel Hill School of Medicine, Chapel Hill, North Carolina, USA
- Srushti Gangireddy: Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Hiral Master: Vanderbilt Institute for Clinical and Translational Research, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Wei-Qi Wei: Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- V Eric Kerchberger: Department of Medicine, Division of Allergy, Pulmonary & Critical Care Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Mark Weiner: Department of Medicine, Weill Cornell Medicine, New York, USA
- Paul A Harris: Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Melissa Basford: Vanderbilt Institute for Clinical and Translational Research, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Chris Lunt: National Institutes of Health, Bethesda, Maryland, USA
- Christopher G Chute: Johns Hopkins Schools of Medicine, Public Health, and Nursing, Baltimore, Maryland, USA
- Richard A Moffitt: Departments of Hematology and Medical Oncology and Biomedical Informatics, Emory University, Atlanta, Georgia, USA
- Melissa Haendel: Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Denver, Colorado, USA
4
Brannock MD, Chew RF, Preiss AJ, Hadley EC, Redfield S, McMurry JA, Leese PJ, Girvin AT, Crosskey M, Zhou AG, Moffitt RA, Funk MJ, Pfaff ER, Haendel MA, Chute CG. Long COVID risk and pre-COVID vaccination in an EHR-based cohort study from the RECOVER program. Nat Commun 2023; 14:2914. [PMID: 37217471 PMCID: PMC10201472 DOI: 10.1038/s41467-023-38388-7] [Received: 11/04/2022] [Accepted: 04/28/2023] [Indexed: 05/24/2023]
Abstract
Long COVID, or complications arising from COVID-19 weeks after infection, has become a central concern for public health experts. The United States National Institutes of Health founded the RECOVER initiative to better understand long COVID. We used electronic health records available through the National COVID Cohort Collaborative to characterize the association between SARS-CoV-2 vaccination and long COVID diagnosis. Among patients with a COVID-19 infection between August 1, 2021 and January 31, 2022, we defined two cohorts using distinct definitions of long COVID, a clinical diagnosis (n = 47,404) or a previously described computational phenotype (n = 198,514), to compare unvaccinated individuals to those with a complete vaccine series prior to infection. Evidence of long COVID was monitored through June or July of 2022, depending on patients' data availability. We found that vaccination was consistently associated with lower odds and rates of long COVID clinical diagnosis and high-confidence computationally derived diagnosis after adjusting for sex, demographics, and medical history.
Affiliation(s)
- Julie A McMurry: University of Colorado Anschutz Medical Campus, Denver, CO, USA
- Peter J Leese: University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Andrea G Zhou: iTHRIV, University of Virginia, Charlottesville, VA, USA
- Richard A Moffitt: Department of Biomedical Informatics, Stony Brook University, Stony Brook, NY, USA; Departments of Biomedical Informatics and Hematology and Medical Oncology, Emory University, Atlanta, GA, USA
- Emily R Pfaff: University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Christopher G Chute: Schools of Medicine, Public Health, and Nursing, Johns Hopkins University, Baltimore, MD, USA
5
Bhatia A, Preiss AJ, Xiao X, Brannock MD, Alexander GC, Chew RF, Fitzgerald M, Hill E, Kelly EP, Mehta HB, Madlock-Brown C, Wilkins KJ, Chute CG, Haendel M, Moffitt R, Pfaff ER. Effect of Nirmatrelvir/Ritonavir (Paxlovid) on Hospitalization among Adults with COVID-19: an EHR-based Target Trial Emulation from N3C. medRxiv 2023:2023.05.03.23289084. [PMID: 37205340 PMCID: PMC10187454 DOI: 10.1101/2023.05.03.23289084] [Indexed: 05/21/2023]
Abstract
This study leverages electronic health record data in the National COVID Cohort Collaborative's (N3C) repository to investigate disparities in Paxlovid treatment and to emulate a target trial assessing its effectiveness in reducing COVID-19 hospitalization rates. From an eligible population of 632,822 COVID-19 patients seen at 33 clinical sites across the United States between December 23, 2021 and December 31, 2022, patients were matched across observed treatment groups, yielding an analytical sample of 410,642 patients. We estimate 65% lower odds of hospitalization among Paxlovid-treated patients within a 28-day follow-up period, and this effect did not vary by patient vaccination status. Notably, we observe disparities in Paxlovid treatment, with lower rates among Black and Hispanic or Latino patients, and within socially vulnerable communities. Ours is the largest study of Paxlovid's real-world effectiveness to date, and our primary findings are consistent with previous randomized controlled trials and real-world studies.
Affiliation(s)
- Abhishek Bhatia: University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Xuya Xiao: School of Medicine, Public Health, and Nursing, Johns Hopkins University, Baltimore, MD, USA
- G Caleb Alexander: School of Medicine, Public Health, and Nursing, Johns Hopkins University, Baltimore, MD, USA
- Elaine Hill: University of Rochester, Department of Public Health Sciences and Department of Economics, Rochester, NY, USA
- Hemalkumar B Mehta: School of Medicine, Public Health, and Nursing, Johns Hopkins University, Baltimore, MD, USA
- Kenneth J Wilkins: National Institute of Diabetes & Digestive & Kidney Diseases, Office of the Director, National Institutes of Health, Bethesda, MD, USA; F. Edward Hébert School of Medicine, Department of Preventive Medicine & Biostatistics, Uniformed Services University of the Health Sciences, Bethesda, MD, USA
- Christopher G Chute: School of Medicine, Public Health, and Nursing, Johns Hopkins University, Baltimore, MD, USA
- Melissa Haendel: University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Emily R Pfaff: University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
6
Pfaff ER, Madlock-Brown C, Baratta JM, Bhatia A, Davis H, Girvin A, Hill E, Kelly E, Kostka K, Loomba J, McMurry JA, Wong R, Bennett TD, Moffitt R, Chute CG, Haendel M. Coding long COVID: characterizing a new disease through an ICD-10 lens. BMC Med 2023; 21:58. [PMID: 36793086 PMCID: PMC9931566 DOI: 10.1186/s12916-023-02737-6] [Received: 07/28/2022] [Accepted: 01/13/2023] [Indexed: 02/17/2023]
Abstract
BACKGROUND Naming a newly discovered disease is a difficult process; in the context of the COVID-19 pandemic and the existence of post-acute sequelae of SARS-CoV-2 infection (PASC), which includes long COVID, it has proven especially challenging. Disease definitions and assignment of a diagnosis code are often asynchronous and iterative. The clinical definition and our understanding of the underlying mechanisms of long COVID are still in flux, and the deployment of an ICD-10-CM code for long COVID in the USA took nearly 2 years after patients had begun to describe their condition. Here, we leverage the largest publicly available HIPAA-limited dataset about patients with COVID-19 in the US to examine the heterogeneity of adoption and use of U09.9, the ICD-10-CM code for "Post COVID-19 condition, unspecified." METHODS We undertook a number of analyses to characterize the N3C population with a U09.9 diagnosis code (n = 33,782), including assessing person-level demographics and a number of area-level social determinants of health; diagnoses commonly co-occurring with U09.9, clustered using the Louvain algorithm; and quantifying medications and procedures recorded within 60 days of U09.9 diagnosis. We stratified all analyses by age group in order to discern differing patterns of care across the lifespan. RESULTS We established the diagnoses most commonly co-occurring with U09.9 and algorithmically clustered them into four major categories: cardiopulmonary, neurological, gastrointestinal, and comorbid conditions. Importantly, we discovered that the population of patients diagnosed with U09.9 is demographically skewed toward female, White, non-Hispanic individuals, as well as individuals living in areas with low poverty and low unemployment. Our results also include a characterization of common procedures and medications associated with U09.9-coded patients. 
CONCLUSIONS This work offers insight into potential subtypes and current practice patterns around long COVID and speaks to the existence of disparities in the diagnosis of patients with long COVID. This latter finding in particular requires further research and urgent remediation.
Affiliation(s)
- Emily R Pfaff: University of North Carolina at Chapel Hill, Chapel Hill, USA
- John M Baratta: University of North Carolina at Chapel Hill, Chapel Hill, USA
- Abhishek Bhatia: University of North Carolina at Chapel Hill, Chapel Hill, USA
- Hannah Davis: Patient-Led Research Collaborative, New York, USA
- Elizabeth Kelly: University of North Carolina at Chapel Hill, Chapel Hill, USA
7
Yadaw AS, Afzali B, Hotaling N, Sidky H, Pfaff ER, Sahner DK, Mathé EA. Pre-existing autoimmunity is associated with increased severity of COVID-19: A retrospective cohort study using data from the National COVID Cohort Collaborative (N3C). medRxiv 2023:2023.02.02.23285353. [PMID: 36778264 PMCID: PMC9915827 DOI: 10.1101/2023.02.02.23285353] [Indexed: 02/07/2023]
Abstract
Importance Identifying individuals with a higher risk of developing severe COVID-19 outcomes will inform targeted or more intensive clinical monitoring and management. Objective To examine, using data from the National COVID Cohort Collaborative (N3C), whether patients with pre-existing autoimmune disease (AID) diagnosis and/or immunosuppressant (IS) exposure are at a higher risk of developing severe COVID-19 outcomes. Design setting and participants A retrospective cohort of 2,453,799 individuals diagnosed with COVID-19 between January 1st, 2020, and June 30th, 2022, was created from the N3C data enclave, which comprises data of 15,231,849 patients from 75 USA data partners. Patients were stratified as those with/without a pre-existing diagnosis of AID and/or those with/without exposure to IS prior to COVID-19. Main outcomes and measures Two outcomes of COVID-19 severity, derived from the World Health Organization severity score, were defined, namely life-threatening disease and hospitalization. Odds ratios (ORs) with 95% confidence intervals (CIs) were calculated using logistic regression models with and without adjustment for demographics (age, BMI, gender, race, ethnicity, smoking status), and comorbidities (cardiovascular disease, dementia, pulmonary disease, liver disease, type 2 diabetes mellitus, kidney disease, cancer, and HIV infection). Results In total, 2,453,799 (16.11% of the N3C cohort) adults (age > 18 years) were diagnosed with COVID-19, of which 191,520 (7.81%) had a prior AID diagnosis, and 278,095 (11.33%) had a prior IS exposure. Logistic regression models adjusted for demographic factors and comorbidities demonstrated that individuals with a prior AID (OR = 1.13, 95% CI 1.09-1.17; p = 2.43E-13), prior exposure to IS (OR = 1.27, 95% CI 1.24-1.30; p = 3.66E-74), or both (OR = 1.35, 95% CI 1.29-1.40; p = 7.50E-49) were more likely to have a life-threatening COVID-19 disease.
These results were confirmed after adjusting for exposure to antivirals and vaccination in a cohort subset with COVID-19 diagnosis dates after December 2021 (AID OR = 1.18, 95% CI 1.02-1.36; p = 2.46E-02; IS OR = 1.60, 95% CI 1.41-1.80; p = 5.11E-14; AID+IS OR = 1.93, 95% CI 1.62-2.30; p = 1.68E-13). These results were consistent when evaluating hospitalization as the outcome and also when stratifying by race and sex. Finally, a sensitivity analysis evaluating specific IS revealed that TNF inhibitors were protective against life-threatening disease (OR = 0.80, 95% CI 0.66-0.96; p = 1.66E-2) and hospitalization (OR = 0.80, 95% CI 0.73-0.89; p = 1.06E-05). Conclusions and Relevance Patients with pre-existing AID, exposure to IS, or both are more likely to have a life-threatening disease or hospitalization. These patients may thus require tailored monitoring and preventative measures to minimize negative consequences of COVID-19.
Affiliation(s)
- Arjun S. Yadaw: National Center for Advancing Translational Sciences (NCATS), NIH, Rockville, MD, USA
- Behdad Afzali: Immunoregulation Section, Kidney Diseases Branch, National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK), NIH, Bethesda, MD, USA
- Nathan Hotaling: National Center for Advancing Translational Sciences (NCATS), NIH, Rockville, MD, USA
- Hythem Sidky: National Center for Advancing Translational Sciences (NCATS), NIH, Rockville, MD, USA
- Emily R Pfaff: North Carolina Translational and Clinical Sciences Institute, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
- David K. Sahner: National Center for Advancing Translational Sciences (NCATS), NIH, Rockville, MD, USA
- Ewy A. Mathé: National Center for Advancing Translational Sciences (NCATS), NIH, Rockville, MD, USA
8
Hadley E, Yoo YJ, Patel S, Zhou A, Laraway B, Wong R, Preiss A, Chew R, Davis H, Chute CG, Pfaff ER, Loomba J, Haendel M, Hill E, Moffitt R. SARS-CoV-2 Reinfection is Preceded by Unique Biomarkers and Related to Initial Infection Timing and Severity: an N3C RECOVER EHR-Based Cohort Study. medRxiv 2023:2023.01.03.22284042. [PMID: 36656776 PMCID: PMC9844020 DOI: 10.1101/2023.01.03.22284042] [Indexed: 01/07/2023]
Abstract
Although the COVID-19 pandemic has persisted for over 2 years, reinfections with SARS-CoV-2 are not well understood. We use the electronic health record (EHR)-based study cohort from the National COVID Cohort Collaborative (N3C) as part of the NIH Researching COVID to Enhance Recovery (RECOVER) Initiative to characterize reinfection, understand development of Long COVID after reinfection, and compare severity of reinfection with initial infection. We validate previous findings of reinfection incidence (5.9%), the occurrence of most reinfections during the Omicron epoch, and evidence of multiple reinfections. We present novel findings that Long COVID diagnoses occur closer to the index date for infection or reinfection in the Omicron BA epoch. We report lower albumin levels leading up to reinfection and a statistically significant association of severity between first infection and reinfection (chi-squared value: 9446.2, p-value: 0) with a medium effect size (Cramer's V: 0.18, DoF = 4).
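The effect size quoted above, Cramér's V, rescales a chi-squared statistic by the sample size and table shape: V = sqrt(chi² / (n · min(r − 1, c − 1))). The sketch below shows that calculation on a small contingency table. The abstract does not report the table dimensions or n, so the 3x3 layout (one of the shapes consistent with 4 degrees of freedom), the counts, and the function names are all hypothetical.

```python
import math


def chi_squared(table):
    """Pearson chi-squared statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def cramers_v(table):
    """Cramér's V: sqrt(chi2 / (n * min(rows - 1, cols - 1)))."""
    n = sum(sum(row) for row in table)
    k = min(len(table) - 1, len(table[0]) - 1)
    return math.sqrt(chi_squared(table) / (n * k))


# Hypothetical severity cross-tabulation (rows: first infection,
# columns: reinfection). NOT data from the study.
toy_table = [
    [50, 20, 5],
    [20, 30, 10],
    [5, 10, 15],
]
print(f"chi2 = {chi_squared(toy_table):.2f}, V = {cramers_v(toy_table):.2f}")
```

V ranges from 0 (no association) to 1 (perfect association), which is why a very large chi-squared value on a large cohort can still correspond to a modest effect size.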
Affiliation(s)
- Andrea Zhou: University of Virginia, Charlottesville, VA, US
- Rob Chew: RTI International, Durham, NC, US
- Hannah Davis: RECOVER Patient Led Research Collaborative (PLRC), US
- Melissa Haendel: University of Colorado Anschutz Medical Campus, Denver, CO, US
- Elaine Hill: University of Rochester Medical Center, Rochester, NY, US
9
Brannock MD, Chew RF, Preiss AJ, Hadley EC, McMurry JA, Leese PJ, Girvin AT, Crosskey M, Zhou AG, Moffitt RA, Funk MJ, Pfaff ER, Haendel MA, Chute CG. Long COVID Risk and Pre-COVID Vaccination: An EHR-Based Cohort Study from the RECOVER Program. medRxiv 2022:2022.10.06.22280795. [PMID: 36238713 PMCID: PMC9558440 DOI: 10.1101/2022.10.06.22280795] [Indexed: 11/07/2022]
Abstract
Importance Characterizing the effect of vaccination on long COVID allows for better healthcare recommendations. Objective To determine if, and to what degree, vaccination prior to COVID-19 is associated with eventual long COVID onset, among those with a documented COVID-19 infection. Design Settings and Participants Retrospective cohort study of adults with evidence of COVID-19 between August 1, 2021 and January 31, 2022 based on electronic health records from eleven healthcare institutions taking part in the NIH Researching COVID to Enhance Recovery (RECOVER) Initiative, a project of the National COVID Cohort Collaborative (N3C). Exposures Pre-COVID-19 receipt of a complete vaccine series versus no pre-COVID-19 vaccination. Main Outcomes and Measures Two approaches to the identification of long COVID were used. In the clinical diagnosis cohort (n=47,752), ICD-10 diagnosis codes or evidence of a healthcare encounter at a long COVID clinic were used. In the model-based cohort (n=199,498), a computable phenotype was used. The association between pre-COVID vaccination and long COVID was estimated using IPTW-adjusted logistic regression and Cox proportional hazards. Results In both cohorts, when adjusting for demographics and medical history, pre-COVID vaccination was associated with a reduced risk of long COVID (clinic-based cohort: HR, 0.66; 95% CI, 0.55-0.80; OR, 0.69; 95% CI, 0.59-0.82; model-based cohort: HR, 0.62; 95% CI, 0.56-0.69; OR, 0.70; 95% CI, 0.65-0.75). Conclusions and Relevance Long COVID has become a central concern for public health experts. Prior studies have considered the effect of vaccination on the prevalence of future long COVID symptoms, but ours is the first to thoroughly characterize the association between vaccination and clinically diagnosed or computationally derived long COVID. Our results bolster the growing consensus that vaccines retain protective effects against long COVID even in breakthrough infections.
Key Points Question: Does vaccination prior to COVID-19 onset change the risk of long COVID diagnosis? Findings: Four observational analyses of EHRs showed a statistically significant reduction in long COVID risk associated with pre-COVID vaccination (first cohort: HR, 0.66; 95% CI, 0.55-0.80; OR, 0.69; 95% CI, 0.59-0.82; second cohort: HR, 0.62; 95% CI, 0.56-0.69; OR, 0.70; 95% CI, 0.65-0.75). Meaning: Vaccination prior to COVID onset has a protective association with long COVID even in the case of breakthrough infections.
Affiliation(s)
- Julie A McMurry
  - University of Colorado Anschutz Medical Campus, Denver, CO, US
- Peter J Leese
  - University of North Carolina at Chapel Hill, Chapel Hill, NC, US
- Emily R Pfaff
  - University of North Carolina at Chapel Hill, Chapel Hill, NC, US

10
Pfaff ER, Madlock-Brown C, Baratta JM, Bhatia A, Davis H, Girvin A, Hill E, Kelly L, Kostka K, Loomba J, McMurry JA, Wong R, Bennett TD, Moffitt R, Chute CG, Haendel M. Coding Long COVID: Characterizing a new disease through an ICD-10 lens. medRxiv 2022:2022.04.18.22273968. [PMID: 36093345 PMCID: PMC9460974 DOI: 10.1101/2022.04.18.22273968] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 11/24/2022]
Abstract
Background Naming a newly discovered disease is a difficult process; in the context of the COVID-19 pandemic and the existence of post-acute sequelae of SARS-CoV-2 infection (PASC), which includes Long COVID, it has proven especially challenging. Disease definitions and assignment of a diagnosis code are often asynchronous and iterative. The clinical definition and our understanding of the underlying mechanisms of Long COVID are still in flux, and the deployment of an ICD-10-CM code for Long COVID in the US took nearly two years after patients had begun to describe their condition. Here we leverage the largest publicly available HIPAA-limited dataset about patients with COVID-19 in the US to examine the heterogeneity of adoption and use of U09.9, the ICD-10-CM code for "Post COVID-19 condition, unspecified." Methods We undertook a number of analyses to characterize the N3C population with a U09.9 diagnosis code (n = 21,072), including assessing person-level demographics and a number of area-level social determinants of health; diagnoses commonly co-occurring with U09.9, clustered using the Louvain algorithm; and quantifying medications and procedures recorded within 60 days of U09.9 diagnosis. We stratified all analyses by age group in order to discern differing patterns of care across the lifespan. Results We established the diagnoses most commonly co-occurring with U09.9, and algorithmically clustered them into four major categories: cardiopulmonary, neurological, gastrointestinal, and comorbid conditions. Importantly, we discovered that the population of patients diagnosed with U09.9 is demographically skewed toward female, White, non-Hispanic individuals, as well as individuals living in areas with low poverty, high education, and high access to medical care. Our results also include a characterization of common procedures and medications associated with U09.9-coded patients.
Conclusions This work offers insight into potential subtypes and current practice patterns around Long COVID, and speaks to the existence of disparities in the diagnosis of patients with Long COVID. This latter finding in particular requires further research and urgent remediation.
Affiliation(s)
- Liz Kelly
  - University of North Carolina at Chapel Hill

11
|
Hill E, Mehta H, Sharma S, Mane K, Xie C, Cathey E, Loomba J, Russell S, Spratt H, DeWitt PE, Ammar N, Madlock-Brown C, Brown D, McMurry JA, Chute CG, Haendel MA, Moffitt R, Pfaff ER, Bennett TD. Risk Factors Associated with Post-Acute Sequelae of SARS-CoV-2 in an EHR Cohort: A National COVID Cohort Collaborative (N3C) Analysis as part of the NIH RECOVER program. medRxiv 2022:2022.08.15.22278603. [PMID: 36032983 PMCID: PMC9413724 DOI: 10.1101/2022.08.15.22278603] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
Abstract
Background More than one-third of individuals experience post-acute sequelae of SARS-CoV-2 infection (PASC, which includes long-COVID). Objective To identify risk factors associated with PASC/long-COVID. Design Retrospective case-control study. Setting 31 health systems in the United States from the National COVID Cohort Collaborative (N3C). Patients 8,325 individuals with PASC (defined by the presence of the International Classification of Diseases, version 10 code U09.9 or a long-COVID clinic visit) matched to 41,625 controls within the same health system. Measurements Risk factors included demographics, comorbidities, and treatment and acute characteristics related to COVID-19. Multivariable logistic regression, random forest, and XGBoost were used to determine the associations between risk factors and PASC. Results Among 8,325 individuals with PASC, the majority were >50 years of age (56.6%), female (62.8%), and non-Hispanic White (68.6%). In logistic regression, middle-age categories (40 to 69 years; OR ranging from 2.32 to 2.58), female sex (OR 1.4, 95% CI 1.33-1.48), hospitalization associated with COVID-19 (OR 3.8, 95% CI 3.05-4.73), long (8-30 days, OR 1.69, 95% CI 1.31-2.17) or extended hospital stay (30+ days, OR 3.38, 95% CI 2.45-4.67), receipt of mechanical ventilation (OR 1.44, 95% CI 1.18-1.74), and several comorbidities including depression (OR 1.50, 95% CI 1.40-1.60), chronic lung disease (OR 1.63, 95% CI 1.53-1.74), and obesity (OR 1.23, 95% CI 1.16-1.3) were associated with increased likelihood of PASC diagnosis or care at a long-COVID clinic. Characteristics associated with a lower likelihood of PASC diagnosis or care at a long-COVID clinic included younger age (18 to 29 years), male sex, non-Hispanic Black race, and comorbidities such as substance abuse, cardiomyopathy, psychosis, and dementia. More doctors per capita in the county of residence was associated with an increased likelihood of PASC diagnosis or care at a long-COVID clinic. 
Our findings were consistent in sensitivity analyses using a variety of analytic techniques and approaches to select controls. Conclusions This national study identified important risk factors for PASC such as middle age, severe COVID-19 disease, and specific comorbidities. Further clinical and epidemiological research is needed to better understand underlying mechanisms and the potential role of vaccines and therapeutics in altering PASC course.
Affiliation(s)
- Elaine Hill
  - Department of Public Health Sciences, University of Rochester Medical Center, Rochester, NY, USA
- Hemal Mehta
  - Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Suchetha Sharma
  - School of Data Science, University of Virginia, Charlottesville, VA, USA
- Klint Mane
  - Department of Public Health Sciences, University of Rochester Medical Center, Rochester, NY, USA
- Catherine Xie
  - Department of Public Health Sciences, University of Rochester Medical Center, Rochester, NY, USA
- Emily Cathey
  - integrated Translational Health Research Institute of Virginia (iTHRIV), University of Virginia, Charlottesville, VA, USA
- Johanna Loomba
  - integrated Translational Health Research Institute of Virginia (iTHRIV), University of Virginia, Charlottesville, VA, USA
- Seth Russell
  - Department of Pediatrics, University of Colorado School of Medicine, Aurora, CO, USA
- Heidi Spratt
  - Department of Biostatistics and Data Science, University of Texas Medical Branch, Galveston, TX, USA
- Peter E DeWitt
  - Department of Pediatrics, University of Colorado School of Medicine, Aurora, CO, USA
- Nariman Ammar
  - Department of Diagnostic and Health Sciences, University of Tennessee Health Science Center, Memphis, TN, USA
- Charisse Madlock-Brown
  - Department of Diagnostic and Health Sciences, University of Tennessee Health Science Center, Memphis, TN, USA
- Donald Brown
  - integrated Translational Health Research Institute of Virginia (iTHRIV), University of Virginia, Charlottesville, VA, USA
- Julie A McMurry
  - Center for Health AI, University of Colorado School of Medicine, Aurora, CO, USA
- Christopher G Chute
  - Schools of Medicine, Public Health, and Nursing, Johns Hopkins University, Baltimore, MD, USA
- Melissa A Haendel
  - Center for Health AI, University of Colorado School of Medicine, Aurora, CO, USA
- Richard Moffitt
  - Department of Biomedical Informatics, Stony Brook University, and Stony Brook Cancer Center, Stony Brook, NY, USA
- Emily R Pfaff
  - Department of Medicine, North Carolina Translational and Clinical Sciences Institute, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Tellen D Bennett
  - Department of Pediatrics, University of Colorado School of Medicine, Aurora, CO, USA
  - Center for Health AI, University of Colorado School of Medicine, Aurora, CO, USA

12
|
Pfaff ER, Girvin AT, Bennett TD, Bhatia A, Brooks IM, Deer RR, Dekermanjian JP, Jolley SE, Kahn MG, Kostka K, McMurry JA, Moffitt R, Walden A, Chute CG, Haendel MA. Identifying who has long COVID in the USA: a machine learning approach using N3C data. Lancet Digit Health 2022; 4:e532-e541. [PMID: 35589549 PMCID: PMC9110014 DOI: 10.1016/s2589-7500(22)00048-6] [Citation(s) in RCA: 60] [Impact Index Per Article: 30.0] [Received: 11/19/2021] [Revised: 01/19/2022] [Accepted: 03/08/2022] [Indexed: 11/05/2022]
Abstract
BACKGROUND Post-acute sequelae of SARS-CoV-2 infection, known as long COVID, have severely affected recovery from the COVID-19 pandemic for patients and society alike. Long COVID is characterised by evolving, heterogeneous symptoms, making it challenging to derive an unambiguous definition. Studies of electronic health records are a crucial element of the US National Institutes of Health's RECOVER Initiative, which is addressing the urgent need to understand long COVID, identify treatments, and accurately identify who has it; the latter is the aim of this study. METHODS Using the National COVID Cohort Collaborative's (N3C) electronic health record repository, we developed XGBoost machine learning models to identify potential patients with long COVID. We defined our base population (n=1 793 604) as any non-deceased adult patient (age ≥18 years) with either an International Classification of Diseases-10-Clinical Modification COVID-19 diagnosis code (U07.1) from an inpatient or emergency visit, or a positive SARS-CoV-2 PCR or antigen test, and for whom at least 90 days have passed since COVID-19 index date. We examined demographics, health-care utilisation, diagnoses, and medications for 97 995 adults with COVID-19. We used data on these features and 597 patients from a long COVID clinic to train three machine learning models to identify potential long COVID among all patients with COVID-19, patients hospitalised with COVID-19, and patients who had COVID-19 but were not hospitalised. Feature importance was determined via Shapley values. We further validated the models on data from a fourth site. FINDINGS Our models identified, with high accuracy, patients who potentially have long COVID, achieving areas under the receiver operator characteristic curve of 0·92 (all patients), 0·90 (hospitalised), and 0·85 (non-hospitalised).
Important features, as defined by Shapley values, include rate of health-care utilisation, patient age, dyspnoea, and other diagnosis and medication information available within the electronic health record. INTERPRETATION Patients identified by our models as potentially having long COVID can be interpreted as patients warranting care at a specialty clinic for long COVID, which is an essential proxy for long COVID diagnosis as its definition continues to evolve. We also achieve the urgent goal of identifying potential long COVID in patients for clinical trials. As more data sources are identified, our models can be retrained and tuned based on the needs of individual studies. FUNDING US National Institutes of Health and National Center for Advancing Translational Sciences through the RECOVER Initiative.
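The findings above are summarised as areas under the receiver operator characteristic curve. As a small sketch of what that number measures, the function below computes it directly from its pairwise definition: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The labels and scores are hypothetical, and this is not the study's evaluation code.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical model scores: 1 = seen at a long COVID clinic, 0 = not.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.8, 0.3, 0.2, 0.4]
print(auroc(labels, scores))
```

The pairwise form is quadratic in the number of patients, so it is only practical for small illustrations; rank-based formulas give the same value far more cheaply on cohorts of the size described in the abstract.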
Affiliation(s)
- Emily R Pfaff
  - Department of Medicine, UNC Chapel Hill School of Medicine, Chapel Hill, NC, USA
- Tellen D Bennett
  - Section of Informatics and Data Science, Department of Pediatrics, University of Colorado Anschutz Medical Campus, Aurora, CO, USA; Section of Critical Care Medicine, Department of Pediatrics, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Abhishek Bhatia
  - Carolina Health Informatics Program, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Ian M Brooks
  - Colorado Center for Personalised Medicine, Division of Biomedical Informatics & Personalized Medicine, Department of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Rachel R Deer
  - Department of Nutrition, Metabolism, and Rehabilitation Sciences, University of Texas Medical Branch, Galveston, TX, USA
- Jonathan P Dekermanjian
  - Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Sarah Elizabeth Jolley
  - Division of Pulmonary and Critical Care Medicine, Department of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Michael G Kahn
  - Section of Informatics and Data Science, Department of Pediatrics, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Kristin Kostka
  - The OHDSI Center at the Roux Institute, Northeastern University, Portland, ME, USA
- Julie A McMurry
  - Center for Health AI, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Richard Moffitt
  - Department of Biomedical Informatics, Stony Brook Cancer Center, Stony Brook University, Stony Brook, NY, USA
- Anita Walden
  - Center for Health AI, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Christopher G Chute
  - Section of Biomedical Informatics and Data Science, Johns Hopkins University, Baltimore, MD, USA
- Melissa A Haendel
  - Center for Health AI, University of Colorado Anschutz Medical Campus, Aurora, CO, USA

13
|
Bradwell KR, Wooldridge JT, Amor B, Bennett TD, Anand A, Bremer C, Yoo YJ, Qian Z, Johnson SG, Pfaff ER, Girvin AT, Manna A, Niehaus EA, Hong SS, Zhang XT, Zhu RL, Bissell M, Qureshi N, Saltz J, Haendel MA, Chute CG, Lehmann HP, Moffitt RA. Harmonizing units and values of quantitative data elements in a very large nationally pooled electronic health record (EHR) dataset. J Am Med Inform Assoc 2022; 29:1172-1182. [PMID: 35435957 PMCID: PMC9196692 DOI: 10.1093/jamia/ocac054] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/21/2021] [Revised: 03/25/2022] [Accepted: 04/08/2022] [Indexed: 11/24/2022] Open
Abstract
Objective The goals of this study were to harmonize data from electronic health records (EHRs) into common units, and impute units that were missing. Materials and Methods We used the National COVID Cohort Collaborative (N3C) table of laboratory measurement data, which held over 3.1 billion patient records and over 19 000 unique measurement concepts in the Observational Medical Outcomes Partnership (OMOP) common-data-model format from 55 data partners. We grouped ontologically similar OMOP concepts together for 52 variables relevant to COVID-19 research, and developed a unit-harmonization pipeline comprising (1) selecting a canonical unit for each measurement variable, (2) arriving at a formula for conversion, (3) obtaining clinical review of each formula, (4) applying the formula to convert data values in each unit into the target canonical unit, and (5) removing any harmonized value that fell outside of accepted value ranges for the variable. For data with missing units for all the results within a lab test for a data partner, we compared values with pooled values of all data partners, using the Kolmogorov-Smirnov test. Results Of the concepts without missing values, we harmonized 88.1% of the values, and imputed units for 78.2% of records where units were absent (41% of contributors’ records lacked units). Discussion The harmonization and inference methods developed herein can serve as a resource for initiatives aiming to extract insight from heterogeneous EHR collections. Unique properties of centralized data are harnessed to enable unit inference. Conclusion The pipeline we developed for the pooled N3C data enables use of measurements that would otherwise be unavailable for analysis.
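The pipeline steps listed in this abstract (canonical unit, conversion formula, conversion, range filter) and the Kolmogorov-Smirnov comparison for missing units can be sketched in a few lines. Everything specific below is an assumption made for illustration: the body-weight variable, the conversion table, the plausibility range, and the rule of picking the candidate unit with the smallest KS statistic are hypothetical and are not described in the abstract.

```python
from bisect import bisect_right

# Hypothetical conversion factors into an assumed canonical unit (kg).
TO_KG = {"kg": 1.0, "g": 0.001, "lb": 0.45359237}
VALID_RANGE_KG = (0.1, 500.0)  # assumed plausibility bounds, not from the paper


def harmonize(records):
    """Convert (value, unit) pairs to kg; drop unknown units and out-of-range values."""
    low, high = VALID_RANGE_KG
    harmonized = []
    for value, unit in records:
        factor = TO_KG.get(unit)
        if factor is None:
            continue  # no reviewed conversion formula for this unit
        converted = value * factor
        if low <= converted <= high:
            harmonized.append(converted)
    return harmonized


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    gap = 0.0
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def infer_unit(unitless_values, pooled_kg):
    """Assumed rule: choose the unit whose converted values best match the pooled data."""
    return min(
        TO_KG,
        key=lambda unit: ks_statistic([v * TO_KG[unit] for v in unitless_values], pooled_kg),
    )


records = [(70.0, "kg"), (154.0, "lb"), (80000.0, "g"), (9999.0, "kg"), (65.0, "stone")]
print(harmonize(records))  # the 9999 kg and "stone" rows are dropped

pooled_kg = [55.0, 62.0, 70.0, 78.0, 85.0, 93.0]  # hypothetical pooled reference
print(infer_unit([130.0, 150.0, 170.0, 190.0], pooled_kg))
```

In the toy data the unitless values only overlap the pooled distribution when read as pounds, so that candidate yields the smallest KS statistic. The abstract does not say how the test result was turned into a unit assignment, so the selection rule here should be read as one plausible option.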
Affiliation(s)
- Jacob T Wooldridge
  - Department of Biomedical Informatics, Stony Brook University, Stony Brook, New York, USA
- Tellen D Bennett
  - Section of Informatics and Data Science, Department of Pediatrics, University of Colorado School of Medicine, University of Colorado, Aurora, Colorado, USA
- Adit Anand
  - Department of Biomedical Informatics, Stony Brook University, Stony Brook, New York, USA
- Carolyn Bremer
  - Department of Biomedical Informatics, Stony Brook University, Stony Brook, New York, USA
- Yun Jae Yoo
  - Department of Biomedical Informatics, Stony Brook University, Stony Brook, New York, USA
- Zhenglong Qian
  - Department of Biomedical Informatics, Stony Brook University, Stony Brook, New York, USA
- Steven G Johnson
  - Institute for Health Informatics, University of Minnesota, Minneapolis, Minnesota, USA
- Emily R Pfaff
  - Department of Medicine, North Carolina Translational and Clinical Sciences Institute, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Amin Manna
  - Palantir Technologies, Denver, Colorado, USA
- Stephanie S Hong
  - School of Medicine, Section of Biomedical Informatics and Data Science, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
- Richard L Zhu
  - Department of Medicine, Johns Hopkins, Baltimore, Maryland, USA
- Joel Saltz
  - Department of Biomedical Informatics, Stony Brook University, Stony Brook, New York, USA
- Christopher G Chute
  - Schools of Medicine, Public Health, and Nursing, Johns Hopkins University, Baltimore, Maryland, USA
- Richard A Moffitt
  - Department of Biomedical Informatics, Stony Brook University, Stony Brook, New York, USA

14
|
Pfaff ER, Girvin AT, Gabriel DL, Kostka K, Morris M, Palchuk MB, Lehmann HP, Amor B, Bissell M, Bradwell KR, Gold S, Hong SS, Loomba J, Manna A, McMurry JA, Niehaus E, Qureshi N, Walden A, Zhang XT, Zhu RL, Moffitt RA, Haendel MA, Chute CG, Adams WG, Al-Shukri S, Anzalone A, Baghal A, Bennett TD, Bernstam EV, Bernstam EV, Bissell MM, Bush B, Campion TR, Castro V, Chang J, Chaudhari DD, Chen W, Chu S, Cimino JJ, Crandall KA, Crooks M, Davies SJD, DiPalazzo J, Dorr D, Eckrich D, Eltinge SE, Fort DG, Golovko G, Gupta S, Haendel MA, Hajagos JG, Hanauer DA, Harnett BM, Horswell R, Huang N, Johnson SG, Kahn M, Khanipov K, Kieler C, Luzuriaga KRD, Maidlow S, Martinez A, Mathew J, McClay JC, McMahan G, Melancon B, Meystre S, Miele L, Morizono H, Pablo R, Patel L, Phuong J, Popham DJ, Pulgarin C, Santos C, Sarkar IN, Sazo N, Setoguchi S, Soby S, Surampalli S, Suver C, Vangala UMR, Visweswaran S, von Oehsen J, Walters KM, Wiley L, Williams DA, Zai A. Synergies between centralized and federated approaches to data quality: a report from the national COVID cohort collaborative. J Am Med Inform Assoc 2022; 29:609-618. [PMID: 34590684 PMCID: PMC8500110 DOI: 10.1093/jamia/ocab217] [Citation(s) in RCA: 29] [Impact Index Per Article: 14.5] [Received: 07/15/2021] [Revised: 08/19/2021] [Accepted: 09/23/2021] [Indexed: 02/01/2023] Open
Abstract
OBJECTIVE In response to COVID-19, the informatics community united to aggregate as much clinical data as possible to characterize this new disease and reduce its impact through collaborative analytics. The National COVID Cohort Collaborative (N3C) is now the largest publicly available HIPAA limited dataset in US history with over 6.4 million patients and is a testament to a partnership of over 100 organizations. MATERIALS AND METHODS We developed a pipeline for ingesting, harmonizing, and centralizing data from 56 contributing data partners using 4 federated Common Data Models. N3C data quality (DQ) review involves both automated and manual procedures. In the process, several DQ heuristics were discovered in our centralized context, both within the pipeline and during downstream project-based analysis. Feedback to the sites led to many local and centralized DQ improvements. RESULTS Beyond well-recognized DQ findings, we discovered 15 heuristics relating to source Common Data Model conformance, demographics, COVID tests, conditions, encounters, measurements, observations, coding completeness, and fitness for use. Of 56 sites, 37 sites (66%) demonstrated issues through these heuristics. These 37 sites demonstrated improvement after receiving feedback. DISCUSSION We encountered site-to-site differences in DQ which would have been challenging to discover using federated checks alone. We have demonstrated that centralized DQ benchmarking reveals unique opportunities for DQ improvement that will support improved research analytics locally and in aggregate. CONCLUSION By combining rapid, continual assessment of DQ with a large volume of multisite data, it is possible to support more nuanced scientific questions with the scale and rigor that they require.
Affiliation(s)
- Emily R Pfaff
  - Department of Medicine, UNC Chapel Hill School of Medicine, Chapel Hill, North Carolina, USA
- Davera L Gabriel
  - Section of Biomedical Informatics and Data Science, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
- Kristin Kostka
  - The OHDSI Center at the Roux Institute, Northeastern University, Portland, Maine, USA
- Michele Morris
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Harold P Lehmann
  - Department of Medicine, Johns Hopkins School of Medicine, Baltimore, Maryland, USA
- Sigfried Gold
  - Section of Biomedical Informatics and Data Science, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
- Stephanie S Hong
  - Section of Biomedical Informatics and Data Science, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
- Amin Manna
  - Palantir Technologies, Denver, Colorado, USA
- Julie A McMurry
  - Center for Health AI, University of Colorado Anschutz Medical Campus, Aurora, Colorado, USA
- Anita Walden
  - Department of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University, Portland, Oregon, USA
- Richard L Zhu
  - Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
- Richard A Moffitt
  - Department of Biomedical Informatics, Stony Brook University, Stony Brook, New York, USA
- Melissa A Haendel
  - University of Colorado Anschutz Medical Campus, Aurora, Colorado, USA
- Christopher G Chute
  - Schools of Medicine, Public Health, and Nursing, Johns Hopkins University, Baltimore, Maryland, USA

15
|
Gehtland LM, Paquin RS, Andrews SM, Lee AM, Gwaltney A, Duparc M, Pfaff ER, Bailey DB. Using a Patient Portal to Increase Enrollment in a Newborn Screening Research Study: Observational Study. JMIR Pediatr Parent 2022; 5:e30941. [PMID: 35142618 PMCID: PMC8874929 DOI: 10.2196/30941] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/03/2021] [Revised: 08/12/2021] [Accepted: 12/11/2021] [Indexed: 11/13/2022] Open
Abstract
BACKGROUND Many research studies fail to enroll enough research participants. Patient-facing electronic health record applications, known as patient portals, may be used to send research invitations to eligible patients. OBJECTIVE The first aim was to determine if receipt of a patient portal research recruitment invitation was associated with enrollment in a large ongoing study of newborns (Early Check). The second aim was to determine if there were differences in opening the patient portal research recruitment invitation and study enrollment by race and ethnicity, age, or rural/urban home address. METHODS We used a computable phenotype and queried the health care system's clinical data warehouse to identify women whose newborns would likely be eligible. Research recruitment invitations were sent through the women's patient portals. We conducted logistic regressions to test whether women enrolled their newborns after receipt of a patient portal invitation and whether there were differences by race and ethnicity, age, and rural/urban home address. RESULTS Research recruitment invitations were sent through their patient portals to 4510 women not yet enrolled, between November 22, 2019, and March 5, 2020. Among women who received a patient portal invitation, 3.6% (161/4510) enrolled their newborns within 27 days. The odds of enrolling among women who opened the invitation were nearly 9 times the odds of enrolling among women who did not open their invitation (SE 3.24, OR 8.86, 95% CI 4.33-18.13; P<.001). On average, it took 3.92 days for women to enroll their newborn in the study, with 64% (97/161) enrolling their newborn within 1 day of opening the invitation. There were disparities by race and urbanicity in enrollment in the study after receipt of a patient portal research invitation but not by age.
Black women were less likely to enroll their newborns than White women (SE 0.09, OR 0.29, 95% CI 0.16-0.55; P<.001), and women in urban zip codes were more likely to enroll their newborns than women in rural zip codes (SE 0.97, OR 3.03, 95% CI 1.62-5.67; P=.001). Black women (SE 0.05, OR 0.67, 95% CI 0.57-0.78; P<.001) and Hispanic women (SE 0.07, OR 0.73, 95% CI 0.60-0.89; P=.002) were less likely to open the research invitation compared to White women. CONCLUSIONS Patient portals are an effective way to recruit participants for research studies, but there are substantial racial and ethnic disparities and disparities by urban/rural status in the use of patient portals, the opening of a patient portal invitation, and enrollment in the study. TRIAL REGISTRATION ClinicalTrials.gov NCT03655223; https://clinicaltrials.gov/ct2/show/NCT03655223.
Affiliation(s)
- Lisa M Gehtland
  - RTI International, Research Triangle Park, NC, United States
- Ryan S Paquin
  - RTI International, Research Triangle Park, NC, United States
- Sara M Andrews
  - RTI International, Research Triangle Park, NC, United States
- Adam M Lee
  - Department of Medicine, University of North Carolina Chapel Hill, Chapel Hill, NC, United States
- Angela Gwaltney
  - RTI International, Research Triangle Park, NC, United States
- Martin Duparc
  - RTI International, Research Triangle Park, NC, United States
- Emily R Pfaff
  - Department of Medicine, University of North Carolina Chapel Hill, Chapel Hill, NC, United States
- Donald B Bailey
  - RTI International, Research Triangle Park, NC, United States

16
|
Martin B, DeWitt PE, Russell S, Anand A, Bradwell KR, Bremer C, Gabriel D, Girvin AT, Hajagos JG, McMurry JA, Neumann AJ, Pfaff ER, Walden A, Wooldridge JT, Yoo YJ, Saltz J, Gersing KR, Chute CG, Haendel MA, Moffitt R, Bennett TD. Characteristics, Outcomes, and Severity Risk Factors Associated With SARS-CoV-2 Infection Among Children in the US National COVID Cohort Collaborative. JAMA Netw Open 2022; 5:e2143151. [PMID: 35133437 PMCID: PMC8826172 DOI: 10.1001/jamanetworkopen.2021.43151] [Citation(s) in RCA: 85] [Impact Index Per Article: 42.5] [Received: 08/17/2021] [Accepted: 11/15/2021] [Indexed: 01/20/2023] Open
Abstract
Importance Understanding of SARS-CoV-2 infection in US children has been limited by the lack of large, multicenter studies with granular data. Objective To examine the characteristics, changes over time, outcomes, and severity risk factors of children with SARS-CoV-2 within the National COVID Cohort Collaborative (N3C). Design, Setting, and Participants A prospective cohort study of encounters with end dates before September 24, 2021, was conducted at 56 N3C facilities throughout the US. Participants included children younger than 19 years at initial SARS-CoV-2 testing. Main Outcomes and Measures Case incidence and severity over time, demographic and comorbidity severity risk factors, vital sign and laboratory trajectories, clinical outcomes, and acute COVID-19 vs multisystem inflammatory syndrome in children (MIS-C), and Delta vs pre-Delta variant differences for children with SARS-CoV-2. Results A total of 1 068 410 children were tested for SARS-CoV-2 and 167 262 test results (15.6%) were positive (82 882 [49.6%] girls; median age, 11.9 [IQR, 6.0-16.1] years). Among the 10 245 children (6.1%) who were hospitalized, 1423 (13.9%) met the criteria for severe disease: mechanical ventilation (796 [7.8%]), vasopressor-inotropic support (868 [8.5%]), extracorporeal membrane oxygenation (42 [0.4%]), or death (131 [1.3%]). Male sex (odds ratio [OR], 1.37; 95% CI, 1.21-1.56), Black/African American race (OR, 1.25; 95% CI, 1.06-1.47), obesity (OR, 1.19; 95% CI, 1.01-1.41), and several pediatric complex chronic condition (PCCC) subcategories were associated with higher severity disease. Vital signs and many laboratory test values from the day of admission were predictive of peak disease severity. 
Variables associated with increased odds for MIS-C vs acute COVID-19 included male sex (OR, 1.59; 95% CI, 1.33-1.90), Black/African American race (OR, 1.44; 95% CI, 1.17-1.77), younger than 12 years (OR, 1.81; 95% CI, 1.51-2.18), obesity (OR, 1.76; 95% CI, 1.40-2.22), and not having a pediatric complex chronic condition (OR, 0.72; 95% CI, 0.65-0.80). The children with MIS-C had a more inflammatory laboratory profile and severe clinical phenotype, with higher rates of invasive ventilation (117 of 707 [16.5%] vs 514 of 8241 [6.2%]; P < .001) and need for vasoactive-inotropic support (191 of 707 [27.0%] vs 426 of 8241 [5.2%]; P < .001) compared with those who had acute COVID-19. Comparing children during the Delta vs pre-Delta eras, there was no significant change in hospitalization rate (1738 [6.0%] vs 8507 [6.2%]; P = .18) and lower odds for severe disease (179 [10.3%] vs 1242 [14.6%]) (decreased by a factor of 0.67; 95% CI, 0.57-0.79; P < .001). Conclusions and Relevance In this cohort study of US children with SARS-CoV-2, there were observed differences in demographic characteristics, preexisting comorbidities, and initial vital sign and laboratory values between severity subgroups. Taken together, these results suggest that early identification of children likely to progress to severe disease could be achieved using readily available data elements from the day of admission. Further work is needed to translate this knowledge into improved outcomes.
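The Delta versus pre-Delta comparison in this abstract gives enough counts to work through an odds ratio by hand. The sketch below computes an unadjusted odds ratio with a Wald 95% confidence interval from a 2x2 table built from the reported counts (179 of 1738 versus 1242 of 8507 hospitalized children with severe disease). The function name is hypothetical, and the abstract does not state which model produced its published estimate, so this is a crude illustration and not a reconstruction of the authors' analysis.

```python
import math
from statistics import NormalDist


def odds_ratio_with_ci(events_a, total_a, events_b, total_b, level=0.95):
    """Unadjusted odds ratio (group A vs group B) with a Wald confidence interval."""
    a, b = events_a, total_a - events_a   # group A: events, non-events
    c, d = events_b, total_b - events_b   # group B: events, non-events
    if min(a, b, c, d) <= 0:
        raise ValueError("every cell of the 2x2 table must be positive")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of the log odds ratio
    z = NormalDist().inv_cdf(0.5 + level / 2)       # about 1.96 for a 95% interval
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


# Counts reported in the abstract: severe disease among hospitalized children.
estimate, lower, upper = odds_ratio_with_ci(179, 1738, 1242, 8507)
print(f"OR {estimate:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

With these counts the crude calculation rounds to an odds ratio of 0.67 with an interval of 0.57 to 0.79, in line with the factor quoted in the abstract. The interval is built on the log scale because the sampling distribution of the log odds ratio is closer to normal than that of the ratio itself.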
Affiliation(s)
- Blake Martin
  - Section of Critical Care Medicine, Department of Pediatrics, University of Colorado School of Medicine, University of Colorado, Aurora
- Peter E. DeWitt
  - Section of Informatics and Data Science, Department of Pediatrics, University of Colorado School of Medicine, University of Colorado, Aurora
- Seth Russell
  - Section of Informatics and Data Science, Department of Pediatrics, University of Colorado School of Medicine, University of Colorado, Aurora
- Adit Anand
  - Department of Biomedical Informatics, Stony Brook University, Stony Brook, New York
- Carolyn Bremer
  - Department of Biomedical Informatics, Stony Brook University, Stony Brook, New York
- Davera Gabriel
  - Johns Hopkins University School of Medicine, Baltimore, Maryland
- Janos G. Hajagos
  - Department of Biomedical Informatics, Stony Brook University, Stony Brook, New York
- Julie A. McMurry
  - Translational and Integrative Sciences Center, University of Colorado, Aurora
  - Center for Health AI, University of Colorado, Aurora
- Andrew J. Neumann
  - Translational and Integrative Sciences Center, University of Colorado, Aurora
  - Center for Health AI, University of Colorado, Aurora
- Emily R. Pfaff
  - North Carolina Translational and Clinical Sciences Institute (NC TraCS), University of North Carolina at Chapel Hill, Chapel Hill
- Anita Walden
  - Center for Health AI, University of Colorado, Aurora
- Jacob T. Wooldridge
  - Department of Biomedical Informatics, Stony Brook University, Stony Brook, New York
- Yun Jae Yoo
  - Department of Biomedical Informatics, Stony Brook University, Stony Brook, New York
- Joel Saltz
  - Department of Biomedical Informatics, Stony Brook University, Stony Brook, New York
- Ken R. Gersing
  - National Center for Advancing Translational Sciences, National Institutes of Health, Bethesda, Maryland
- Christopher G. Chute
  - Johns Hopkins University School of Medicine, Baltimore, Maryland
  - Schools of Public Health and Nursing, Johns Hopkins University, Baltimore, Maryland
- Richard Moffitt
  - Department of Biomedical Informatics, Stony Brook University, Stony Brook, New York
- Tellen D. Bennett
  - Section of Critical Care Medicine, Department of Pediatrics, University of Colorado School of Medicine, University of Colorado, Aurora
  - Section of Informatics and Data Science, Department of Pediatrics, University of Colorado School of Medicine, University of Colorado, Aurora
17
Walters KM, Jojic A, Pfaff ER, Rape M, Spencer DC, Shaheen NJ, Lamm B, Carey TS. Supporting research, protecting data: one institution's approach to clinical data warehouse governance. J Am Med Inform Assoc 2021; 29:707-712. [PMID: 34871428 PMCID: PMC8922173 DOI: 10.1093/jamia/ocab259] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/09/2021] [Revised: 09/21/2021] [Accepted: 11/11/2021] [Indexed: 12/17/2022]
Abstract
Institutions must decide how to manage the use of clinical data to support research while ensuring appropriate protections are in place. Questions about data use and sharing often go beyond what the Health Insurance Portability and Accountability Act of 1996 (HIPAA) considers. In this article, we describe our institution’s governance model and approach. Common questions we consider include (1) Is a request limited to the minimum data necessary to carry the research forward? (2) What plans are there for sharing data externally?, and (3) What impact will the proposed use of data have on patients and the institution? In 2020, 302 of the 319 requests reviewed were approved. The majority of requests were approved in less than 2 weeks, with few or no stipulations. For the remaining requests, the governance committee works with researchers to find solutions to meet their needs while also addressing our collective goal of protecting patients.
Affiliation(s)
- Kellie M Walters
  - North Carolina Translational and Clinical Sciences Institute, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Anna Jojic
  - North Carolina Translational and Clinical Sciences Institute, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Emily R Pfaff
  - Department of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Marie Rape
  - North Carolina Translational and Clinical Sciences Institute, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Donald C Spencer
  - Information Services Division, UNC Health, Morrisville, North Carolina, USA
- Nicholas J Shaheen
  - Division of Gastroenterology and Hepatology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Brent Lamm
  - Information Services Division, UNC Health, Morrisville, North Carolina, USA
- Timothy S Carey
  - Division of General Medicine and Clinical Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
18
Martin B, DeWitt PE, Russell S, Anand A, Bradwell KR, Bremer C, Gabriel D, Girvin AT, Hajagos JG, McMurry JA, Neumann AJ, Pfaff ER, Walden A, Wooldridge JT, Yoo YJ, Saltz J, Gersing KR, Chute CG, Haendel MA, Moffitt R, Bennett TD. Children with SARS-CoV-2 in the National COVID Cohort Collaborative (N3C). medRxiv 2021:2021.07.19.21260767. [PMID: 34341796 PMCID: PMC8328064 DOI: 10.1101/2021.07.19.21260767] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 01/20/2023]
Abstract
IMPORTANCE SARS-CoV-2. OBJECTIVE To determine the characteristics, changes over time, outcomes, and severity risk factors of SARS-CoV-2 affected children within the National COVID Cohort Collaborative (N3C). DESIGN Prospective cohort study of patient encounters with end dates before May 27th, 2021. SETTING 45 N3C institutions. PARTICIPANTS Children <19-years-old at initial SARS-CoV-2 testing. MAIN OUTCOMES AND MEASURES Case incidence and severity over time, demographic and comorbidity severity risk factors, vital sign and laboratory trajectories, clinical outcomes, and acute COVID-19 vs MIS-C contrasts for children infected with SARS-CoV-2. RESULTS 728,047 children in the N3C were tested for SARS-CoV-2; of these, 91,865 (12.6%) were positive. Among the 5,213 (6%) hospitalized children, 685 (13%) met criteria for severe disease: mechanical ventilation (7%), vasopressor/inotropic support (7%), ECMO (0.6%), or death/discharge to hospice (1.1%). Male gender, African American race, older age, and several pediatric complex chronic condition (PCCC) subcategories were associated with higher clinical severity (p ≤ 0.05). Vital signs (all p≤0.002) and many laboratory tests from the first day of hospitalization were predictive of peak disease severity. Children with severe (vs moderate) disease were more likely to receive antimicrobials (71% vs 32%, p<0.001) and immunomodulatory medications (53% vs 16%, p<0.001). Compared to those with acute COVID-19, children with MIS-C were more likely to be male, Black/African American, 1-to-12-years-old, and less likely to have asthma, diabetes, or a PCCC (p < 0.04). MIS-C cases demonstrated a more inflammatory laboratory profile and more severe clinical phenotype with higher rates of invasive ventilation (12% vs 6%) and need for vasoactive-inotropic support (31% vs 6%) compared to acute COVID-19 cases, respectively (p<0.03). CONCLUSIONS In the largest U.S. SARS-CoV-2-positive pediatric cohort to date, we observed differences in demographics, pre-existing comorbidities, and initial vital sign and laboratory test values between severity subgroups. Taken together, these results suggest that early identification of children likely to progress to severe disease could be achieved using readily available data elements from the day of admission. Further work is needed to translate this knowledge into improved outcomes.
Affiliation(s)
- Blake Martin
  - Section of Critical Care Medicine, Department of Pediatrics, University of Colorado School of Medicine, University of Colorado, Aurora, CO, USA
- Peter E. DeWitt
  - Section of Informatics and Data Science, Department of Pediatrics, University of Colorado School of Medicine, University of Colorado, Aurora, CO, USA
- Seth Russell
  - Section of Informatics and Data Science, Department of Pediatrics, University of Colorado School of Medicine, University of Colorado, Aurora, CO, USA
- Adit Anand
  - Department of Biomedical Informatics, Stony Brook University, Stony Brook, NY, USA
- Carolyn Bremer
  - Department of Biomedical Informatics, Stony Brook University, Stony Brook, NY, USA
- Davera Gabriel
  - Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Janos G. Hajagos
  - Department of Biomedical Informatics, Stony Brook University, Stony Brook, NY, USA
- Julie A. McMurry
  - Translational and Integrative Sciences Center, University of Colorado, Aurora, CO, USA
  - Center for Health AI, University of Colorado, Aurora, CO, USA
- Andrew J. Neumann
  - Translational and Integrative Sciences Center, University of Colorado, Aurora, CO, USA
  - Center for Health AI, University of Colorado, Aurora, CO, USA
- Emily R. Pfaff
  - North Carolina Translational and Clinical Sciences Institute (NC TraCS), University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Anita Walden
  - Center for Health AI, University of Colorado, Aurora, CO, USA
- Jacob T. Wooldridge
  - Department of Biomedical Informatics, Stony Brook University, Stony Brook, NY, USA
- Yun Jae Yoo
  - Department of Biomedical Informatics, Stony Brook University, Stony Brook, NY, USA
- Joel Saltz
  - Department of Biomedical Informatics, Stony Brook University, Stony Brook, NY, USA
- Ken R. Gersing
  - National Center for Advancing Translational Sciences, National Institutes of Health, Bethesda, MD, USA
- Christopher G. Chute
  - Johns Hopkins University School of Medicine, Baltimore, MD, USA
  - Schools of Public Health, and Nursing, Johns Hopkins University, Baltimore, MD, USA
- Richard Moffitt
  - Department of Biomedical Informatics, Stony Brook University, Stony Brook, NY, USA
- Tellen D. Bennett
  - Section of Critical Care Medicine, Department of Pediatrics, University of Colorado School of Medicine, University of Colorado, Aurora, CO, USA
  - Section of Informatics and Data Science, Department of Pediatrics, University of Colorado School of Medicine, University of Colorado, Aurora, CO, USA
19
Bennett TD, Moffitt RA, Hajagos JG, Amor B, Anand A, Bissell MM, Bradwell KR, Bremer C, Byrd JB, Denham A, DeWitt PE, Gabriel D, Garibaldi BT, Girvin AT, Guinney J, Hill EL, Hong SS, Jimenez H, Kavuluru R, Kostka K, Lehmann HP, Levitt E, Mallipattu SK, Manna A, McMurry JA, Morris M, Muschelli J, Neumann AJ, Palchuk MB, Pfaff ER, Qian Z, Qureshi N, Russell S, Spratt H, Walden A, Williams AE, Wooldridge JT, Yoo YJ, Zhang XT, Zhu RL, Austin CP, Saltz JH, Gersing KR, Haendel MA, Chute CG. Clinical Characterization and Prediction of Clinical Severity of SARS-CoV-2 Infection Among US Adults Using Data From the US National COVID Cohort Collaborative. JAMA Netw Open 2021; 4:e2116901. [PMID: 34255046 PMCID: PMC8278272 DOI: 10.1001/jamanetworkopen.2021.16901] [Citation(s) in RCA: 146] [Impact Index Per Article: 48.7] [Received: 01/26/2021] [Accepted: 05/03/2021] [Indexed: 12/15/2022]
Abstract
Importance The National COVID Cohort Collaborative (N3C) is a centralized, harmonized, high-granularity electronic health record repository that is the largest, most representative COVID-19 cohort to date. This multicenter data set can support robust evidence-based development of predictive and diagnostic tools and inform clinical care and policy. Objectives To evaluate COVID-19 severity and risk factors over time and assess the use of machine learning to predict clinical severity. Design, Setting, and Participants In a retrospective cohort study of 1 926 526 US adults with SARS-CoV-2 infection (polymerase chain reaction >99% or antigen <1%) and adult patients without SARS-CoV-2 infection who served as controls from 34 medical centers nationwide between January 1, 2020, and December 7, 2020, patients were stratified using a World Health Organization COVID-19 severity scale and demographic characteristics. Differences between groups over time were evaluated using multivariable logistic regression. Random forest and XGBoost models were used to predict severe clinical course (death, discharge to hospice, invasive ventilatory support, or extracorporeal membrane oxygenation). Main Outcomes and Measures Patient demographic characteristics and COVID-19 severity using the World Health Organization COVID-19 severity scale and differences between groups over time using multivariable logistic regression. Results The cohort included 174 568 adults who tested positive for SARS-CoV-2 (mean [SD] age, 44.4 [18.6] years; 53.2% female) and 1 133 848 adult controls who tested negative for SARS-CoV-2 (mean [SD] age, 49.5 [19.2] years; 57.1% female). Of the 174 568 adults with SARS-CoV-2, 32 472 (18.6%) were hospitalized, and 6565 (20.2%) of those had a severe clinical course (invasive ventilatory support, extracorporeal membrane oxygenation, death, or discharge to hospice). 
Of the hospitalized patients, mortality was 11.6% overall and decreased from 16.4% in March to April 2020 to 8.6% in September to October 2020 (P = .002 for monthly trend). Using 64 inputs available on the first hospital day, this study predicted a severe clinical course using random forest and XGBoost models (area under the receiver operating curve = 0.87 for both) that were stable over time. The factor most strongly associated with clinical severity was pH; this result was consistent across machine learning methods. In a separate multivariable logistic regression model built for inference, age (odds ratio [OR], 1.03 per year; 95% CI, 1.03-1.04), male sex (OR, 1.60; 95% CI, 1.51-1.69), liver disease (OR, 1.20; 95% CI, 1.08-1.34), dementia (OR, 1.26; 95% CI, 1.13-1.41), African American (OR, 1.12; 95% CI, 1.05-1.20) and Asian (OR, 1.33; 95% CI, 1.12-1.57) race, and obesity (OR, 1.36; 95% CI, 1.27-1.46) were independently associated with higher clinical severity. Conclusions and Relevance This cohort study found that COVID-19 mortality decreased over time during 2020 and that patient demographic characteristics and comorbidities were associated with higher clinical severity. The machine learning models accurately predicted ultimate clinical severity using commonly collected clinical data from the first 24 hours of a hospital admission.
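The age effect above is reported per year (OR, 1.03), which can look small until it is scaled to a clinically meaningful age gap. The sketch below assumes age entered the logistic regression as a single linear term on the log-odds scale, so that the odds ratio for a k-year difference is the per-year odds ratio raised to the power k. That modeling detail is not stated in the abstract, and the function name `scale_odds_ratio` is hypothetical. Because the published 1.03 is itself rounded, the scaled value is only approximate.

```python
def scale_odds_ratio(or_per_unit, units):
    """Odds ratio for a difference of `units`, assuming a linear log-odds effect."""
    return or_per_unit ** units


# Reported per-year odds ratio for clinical severity (from the abstract).
or_per_year = 1.03
or_decade = scale_odds_ratio(or_per_year, 10)
print(f"Approximate odds ratio per 10 years of age: {or_decade:.2f}")
```

On these assumptions, a 10-year age difference corresponds to an odds ratio of roughly 1.34, comparable in size to the obesity association reported in the same model.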
Affiliation(s)
- Tellen D. Bennett
  - Section of Informatics and Data Science, Department of Pediatrics, University of Colorado School of Medicine, University of Colorado, Aurora
- Richard A. Moffitt
  - Department of Biomedical Informatics, Stony Brook University, Stony Brook, New York
- Adit Anand
  - Stony Brook University, Stony Brook, New York
- James Brian Byrd
  - Department of Internal Medicine, The University of Michigan at Ann Arbor, Ann Arbor
- Alina Denham
  - Department of Public Health Sciences, University of Rochester Medical Center, Rochester, New York
- Peter E. DeWitt
  - Section of Informatics and Data Science, Department of Pediatrics, University of Colorado School of Medicine, University of Colorado, Aurora
- Davera Gabriel
  - Institute for Clinical and Translational Research, Johns Hopkins University School of Medicine, Baltimore, Maryland
- Brian T. Garibaldi
  - Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland
- Elaine L. Hill
  - Department of Public Health Sciences, University of Rochester Medical Center, Rochester, New York
- Stephanie S. Hong
  - Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland
- Ramakanth Kavuluru
  - Division of Biomedical Informatics, Department of Internal Medicine, University of Kentucky, Lexington
- Kristin Kostka
  - Real World Solutions, IQVIA, Cambridge, Massachusetts
  - Observational Health Data Sciences and Informatics, New York, New York
- Harold P. Lehmann
  - Division of Health Science Informatics, Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland
- Eli Levitt
  - Department of Orthopaedic Surgery, University of Alabama at Birmingham, Birmingham
- Julie A. McMurry
  - Translational and Integrative Sciences Center, Oregon State University, Corvallis
- Michele Morris
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania
- John Muschelli
  - Department of Biostatistics, Johns Hopkins University School of Medicine, Baltimore, Maryland
- Andrew J. Neumann
  - Translational and Integrative Sciences Center, Oregon State University, Corvallis
- Emily R. Pfaff
  - North Carolina Translational and Clinical Sciences Institute, University of North Carolina at Chapel Hill, Chapel Hill
- Zhenglong Qian
  - Department of Biomedical Informatics, Stony Brook University, Stony Brook, New York
- Seth Russell
  - Section of Informatics and Data Science, Department of Pediatrics, University of Colorado School of Medicine, University of Colorado, Aurora
- Heidi Spratt
  - Department of Preventive Medicine and Public Health, University of Texas Medical Branch, Galveston
- Anita Walden
  - Sage Bionetworks, Seattle, Washington
  - Oregon Clinical and Translational Research Institute, Oregon Health & Science University, Portland
- Andrew E. Williams
  - Tufts Medical Center Clinical and Translational Science Institute, Tufts Medical Center, Boston, Massachusetts
- Yun Jae Yoo
  - Stony Brook University, Stony Brook, New York
- Xiaohan Tanner Zhang
  - Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland
- Richard L. Zhu
  - Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland
- Christopher P. Austin
  - National Center for Advancing Translational Sciences, National Institutes of Health, Bethesda, Maryland
- Joel H. Saltz
  - Department of Biomedical Informatics, Stony Brook University, Stony Brook, New York
- Ken R. Gersing
  - National Center for Advancing Translational Sciences, National Institutes of Health, Bethesda, Maryland
- Melissa A. Haendel
  - TriNetX, Cambridge, Massachusetts
  - Center for Health AI, University of Colorado, Aurora
- Christopher G. Chute
  - Department of Health Policy and Management, Johns Hopkins University School of Medicine, Baltimore, Maryland
  - Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland
  - Department of Nursing, Johns Hopkins University School of Medicine, Baltimore, Maryland
20
Rando HM, Bennett TD, Byrd JB, Bramante C, Callahan TJ, Chute CG, Davis HE, Deer R, Gagnier J, Koraishy FM, Liu F, McMurry JA, Moffitt RA, Pfaff ER, Reese JT, Relevo R, Robinson PN, Saltz JH, Solomonides A, Sule A, Topaloglu U, Haendel MA. Challenges in defining Long COVID: Striking differences across literature, Electronic Health Records, and patient-reported information. medRxiv 2021:2021.03.20.21253896. [PMID: 33791733 PMCID: PMC8010765 DOI: 10.1101/2021.03.20.21253896] [Citation(s) in RCA: 53] [Impact Index Per Article: 17.7] [Indexed: 12/11/2022]
Abstract
Since late 2019, the novel coronavirus SARS-CoV-2 has introduced a wide array of health challenges globally. In addition to a complex acute presentation that can affect multiple organ systems, increasing evidence points to long-term sequelae being common and impactful. The worldwide scientific community is forging ahead to characterize a wide range of outcomes associated with SARS-CoV-2 infection; however, the underlying assumptions in these studies have varied so widely that the resulting data are difficult to compare. Formal definitions are needed in order to design robust and consistent studies of Long COVID that consistently capture variation in long-term outcomes. Even the condition itself goes by three terms, most widely "Long COVID", but also "post-acute COVID-19 syndrome (PACS)" or "post-acute sequelae of SARS-CoV-2 infection (PASC)". In the present study, we investigate the definitions used in the literature published to date and compare them against data available from electronic health records and patient-reported information collected via surveys. Long COVID holds the potential to produce a second public health crisis on the heels of the pandemic itself. Proactive efforts to identify the characteristics of this heterogeneous condition are imperative for a rigorous scientific effort to investigate and mitigate this threat.
Affiliation(s)
- Halie M. Rando
  - Center for Health AI, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
  - Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Aurora, CO, USA
- Tellen D. Bennett
  - Center for Health AI and Section of Informatics and Data Science, Department of Pediatrics, University of Colorado School of Medicine, University of Colorado, Aurora, CO, USA
- Tiffany J. Callahan
  - Center for Health AI, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
  - Computational Bioscience, University of Colorado Anschutz Medical Campus, Boulder, CO, USA
- Christopher G. Chute
  - Schools of Medicine, Public Health, and Nursing, Johns Hopkins University, Baltimore, MD, USA
- Rachel Deer
  - The University of Texas Medical Branch at Galveston, Galveston, TX, USA
- Joel Gagnier
  - Computational Bioscience, University of Colorado Anschutz Medical Campus, Boulder, CO, USA
- Feifan Liu
  - University of Massachusetts Medical School Worcester, Worcester, MA, USA
- Julie A. McMurry
  - Center for Health AI, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Richard A. Moffitt
  - Department of Biomedical Informatics, Stony Brook University, Stony Brook, NY, USA
- Emily R. Pfaff
  - Department of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Justin T. Reese
  - Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Rose Relevo
  - Oregon Health & Science University, Portland, OR, USA
- Peter N. Robinson
  - The Jackson Laboratory For Genomic Medicine, Farmington, CT, USA
  - Institute for Systems Genomics, University of Connecticut, Farmington, CT, USA
- Joel H. Saltz
  - Department of Biomedical Informatics, Stony Brook University, Stony Brook, NY, USA
- Anupam Sule
  - Saint Joseph Mercy Health System, Ypsilanti, MI, USA
- Umit Topaloglu
  - School of Medicine, Wake Forest University, Winston Salem, NC, USA
- Melissa A. Haendel
  - Center for Health AI, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
21
Haendel MA, Chute CG, Bennett TD, Eichmann DA, Guinney J, Kibbe WA, Payne PRO, Pfaff ER, Robinson PN, Saltz JH, Spratt H, Suver C, Wilbanks J, Wilcox AB, Williams AE, Wu C, Blacketer C, Bradford RL, Cimino JJ, Clark M, Colmenares EW, Francis PA, Gabriel D, Graves A, Hemadri R, Hong SS, Hripscak G, Jiao D, Klann JG, Kostka K, Lee AM, Lehmann HP, Lingrey L, Miller RT, Morris M, Murphy SN, Natarajan K, Palchuk MB, Sheikh U, Solbrig H, Visweswaran S, Walden A, Walters KM, Weber GM, Zhang XT, Zhu RL, Amor B, Girvin AT, Manna A, Qureshi N, Kurilla MG, Michael SG, Portilla LM, Rutter JL, Austin CP, Gersing KR. The National COVID Cohort Collaborative (N3C): Rationale, design, infrastructure, and deployment. J Am Med Inform Assoc 2021; 28:427-443. [PMID: 32805036 PMCID: PMC7454687 DOI: 10.1093/jamia/ocaa196] [Citation(s) in RCA: 280] [Impact Index Per Article: 93.3] [Received: 07/14/2020] [Accepted: 08/14/2020] [Indexed: 01/12/2023]
Abstract
Objective Coronavirus disease 2019 (COVID-19) poses societal challenges that require expeditious data and knowledge sharing. Though organizational clinical data are abundant, these are largely inaccessible to outside researchers. Statistical, machine learning, and causal analyses are most successful with large-scale data beyond what is available in any given organization. Here, we introduce the National COVID Cohort Collaborative (N3C), an open science community focused on analyzing patient-level data from many centers. Materials and Methods The Clinical and Translational Science Award Program and scientific community created N3C to overcome technical, regulatory, policy, and governance barriers to sharing and harmonizing individual-level clinical data. We developed solutions to extract, aggregate, and harmonize data across organizations and data models, and created a secure data enclave to enable efficient, transparent, and reproducible collaborative analytics. Results Organized in inclusive workstreams, we created legal agreements and governance for organizations and researchers; data extraction scripts to identify and ingest positive, negative, and possible COVID-19 cases; a data quality assurance and harmonization pipeline to create a single harmonized dataset; population of the secure data enclave with data, machine learning, and statistical analytics tools; dissemination mechanisms; and a synthetic data pilot to democratize data access. Conclusions The N3C has demonstrated that a multisite collaborative learning health network can overcome barriers to rapidly build a scalable infrastructure incorporating multiorganizational clinical data for COVID-19 analytics. We expect this effort to save lives by enabling rapid collaboration among clinicians, researchers, and data scientists to identify treatments and specialized care and thereby reduce the immediate and long-term impacts of COVID-19.
Affiliation(s)
- Melissa A Haendel
  - Oregon Clinical and Translational Research Institute, Oregon Health and Science University, Portland, Oregon, USA
  - Translational and Integrative Sciences Center, Department of Molecular Toxicology, Oregon State University, Corvallis, Oregon, USA
- Christopher G Chute
  - Schools of Medicine, Public Health, and Nursing, Johns Hopkins University, Baltimore, Maryland, USA
- Tellen D Bennett
  - Section of Informatics and Data Science, Department of Pediatrics, University of Colorado School of Medicine, University of Colorado, Aurora, Colorado, USA
- David A Eichmann
  - School of Library and Information Science, The University of Iowa, Iowa City, Iowa, USA
- Philip R O Payne
  - Institute for Informatics, Washington University in St. Louis, Saint Louis, Missouri, USA
- Emily R Pfaff
  - North Carolina Translational and Clinical Sciences Institute (NC TraCS), University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Joel H Saltz
  - Department of Biomedical Informatics, Stony Brook University, Stony Brook, New York, USA
- Heidi Spratt
  - University of Texas Medical Branch, Galveston, Texas, USA
- Andrew E Williams
  - Tufts Medical Center Clinical and Translational Science Institute, Tufts Medical Center, Boston, Massachusetts, USA
- Chunlei Wu
  - Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, California, USA
- Clair Blacketer
  - Janssen Research and Development, LLC, Raritan, New Jersey, USA
- Robert L Bradford
  - North Carolina Translational and Clinical Sciences Institute (NC TraCS), University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- James J Cimino
  - University of Alabama-Birmingham, Birmingham, Alabama, USA
- Marshall Clark
  - North Carolina Translational and Clinical Sciences Institute (NC TraCS), University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Evan W Colmenares
  - Department of Pharmaceutical Outcomes and Policy, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Davera Gabriel
  - Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
- Alexis Graves
  - University of Iowa Institute for Clinical and Translational Science, The University of Iowa, Iowa City, Iowa, USA
- Raju Hemadri
  - National Center for Advancing Translational Science, Bethesda, Maryland, USA
- Stephanie S Hong
  - Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
- George Hripscak
  - Department of Biomedical Informatics, Columbia University, New York, New York, USA
- Dazhi Jiao
  - Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
- Adam M Lee
  - University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Harold P Lehmann
  - Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
- Robert T Miller
  - Tufts Clinical and Translational Science Institute, Tufts University, Boston, Massachusetts, USA
- Michele Morris
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Usman Sheikh
  - National Center for Advancing Translational Science, Bethesda, Maryland, USA
- Harold Solbrig
  - Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
- Shyam Visweswaran
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Anita Walden
  - Oregon Clinical and Translational Research Institute, Oregon Health and Science University, Portland, Oregon, USA
  - Sage Bionetworks, Seattle, Washington, USA
- Kellie M Walters
  - North Carolina Translational and Clinical Sciences Institute (NC TraCS), University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Griffin M Weber
  - Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
- Richard L Zhu
  - Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
- Amin Manna
  - Palantir Technologies, Palo Alto, California, USA
- Michael G Kurilla
  - Division of Clinical Innovation, National Center for Advancing Translational Science, Bethesda, Maryland, USA
- Sam G Michael
  - National Center for Advancing Translational Sciences, National Institutes of Health, Bethesda, Maryland, USA
- Lili M Portilla
  - Office of Strategic Alliances, National Center for Advancing Translational Sciences, National Institutes of Health, Bethesda, Maryland, USA
- Joni L Rutter
  - Office of the Director, National Center for Advancing Translational Science, Bethesda, Maryland, USA
- Christopher P Austin
  - National Center for Advancing Translational Sciences, National Institutes of Health, Bethesda, Maryland, USA
- Ken R Gersing
  - National Center for Advancing Translational Science, Bethesda, Maryland, USA
22
Bennett TD, Moffitt RA, Hajagos JG, Amor B, Anand A, Bissell MM, Bradwell KR, Bremer C, Byrd JB, Denham A, DeWitt PE, Gabriel D, Garibaldi BT, Girvin AT, Guinney J, Hill EL, Hong SS, Jimenez H, Kavuluru R, Kostka K, Lehmann HP, Levitt E, Mallipattu SK, Manna A, McMurry JA, Morris M, Muschelli J, Neumann AJ, Palchuk MB, Pfaff ER, Qian Z, Qureshi N, Russell S, Spratt H, Walden A, Williams AE, Wooldridge JT, Yoo YJ, Zhang XT, Zhu RL, Austin CP, Saltz JH, Gersing KR, Haendel MA, Chute CG. The National COVID Cohort Collaborative: Clinical Characterization and Early Severity Prediction. medRxiv 2021. [PMID: 33469592 PMCID: PMC7814838 DOI: 10.1101/2021.01.12.21249511] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Indexed: 12/15/2022]
Abstract
Background: The majority of U.S. reports of COVID-19 clinical characteristics, disease course, and treatments are from single health systems or focused on one domain. Here we report the creation of the National COVID Cohort Collaborative (N3C), a centralized, harmonized, high-granularity electronic health record repository that is the largest, most representative U.S. cohort of COVID-19 cases and controls to date. This multi-center dataset supports robust evidence-based development of predictive and diagnostic tools and informs critical care and policy. Methods and Findings: In a retrospective cohort study of 1,926,526 patients from 34 medical centers nationwide, we stratified patients using a World Health Organization COVID-19 severity scale and demographics; we then evaluated differences between groups over time using multivariable logistic regression. We established vital signs and laboratory values among COVID-19 patients with different severities, providing the foundation for predictive analytics. The cohort included 174,568 adults with severe acute respiratory syndrome associated with SARS-CoV-2 (PCR >99% or antigen <1%) as well as 1,133,848 adult patients that served as lab-negative controls. Among 32,472 hospitalized patients, mortality was 11.6% overall and decreased from 16.4% in March/April 2020 to 8.6% in September/October 2020 (p = 0.002 monthly trend). In a multivariable logistic regression model, age, male sex, liver disease, dementia, African-American and Asian race, and obesity were independently associated with higher clinical severity. To demonstrate the utility of the N3C cohort for analytics, we used machine learning (ML) to predict clinical severity and risk factors over time. 
Using 64 inputs available on the first hospital day, we predicted a severe clinical course (death, discharge to hospice, invasive ventilation, or extracorporeal membrane oxygenation) using random forest and XGBoost models (AUROC 0.86 and 0.87 respectively) that were stable over time. The most powerful predictors in these models are patient age and widely available vital sign and laboratory values. The established expected trajectories for many vital signs and laboratory values among patients with different clinical severities validates observations from smaller studies, and provides comprehensive insight into COVID-19 characterization in U.S. patients. Conclusions: This is the first description of an ongoing longitudinal observational study of patients seen in diverse clinical settings and geographical regions and is the largest COVID-19 cohort in the United States. Such data are the foundation for ML models that can be the basis for generalizable clinical decision support tools. The N3C Data Enclave is unique in providing transparent, reproducible, easily shared, versioned, and fully auditable data and analytic provenance for national-scale patient-level EHR data. The N3C is built for intensive ML analyses by academic, industry, and citizen scientists internationally. Many observational correlations can inform trial designs and care guidelines for this new disease.
23
Ward-Caviness CK, Weaver AM, Buranosky M, Pfaff ER, Neas LM, Devlin RB, Schwartz J, Di Q, Cascio WE, Diaz-Sanchez D. Associations Between Long-Term Fine Particulate Matter Exposure and Mortality in Heart Failure Patients. J Am Heart Assoc 2020; 9:e012517. [PMID: 32172639 PMCID: PMC7335509 DOI: 10.1161/jaha.119.012517] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Indexed: 12/21/2022]
Abstract
Background Environmental health risks for individuals with heart failure (HF) have been inadequately studied, as these individuals are not well represented in traditional cohort studies. To address this we studied associations between long-term air pollution exposure and mortality in HF patients. Methods and Results The study population was a hospital-based cohort of individuals diagnosed with HF between July 1, 2004 and December 31, 2016 compiled using electronic health records. Individuals were followed from 1 year after initial diagnosis until death or the end of the observation period (December 31, 2016). We used Cox proportional hazards models to evaluate the association of annual average fine particulate matter (PM2.5) exposure at the time of initial HF diagnosis with all-cause mortality, adjusted for age, race, sex, distance to the nearest air pollution monitor, and socioeconomic status indicators. Among 23 302 HF patients, a 1 μg/m3 increase in annual average PM2.5 was associated with an elevated risk of all-cause mortality (hazard ratio 1.13; 95% CI, 1.10-1.15). As compared with people with exposures below the current national PM2.5 exposure standard (12 μg/m3), those with elevated exposures experienced 0.84 (95% CI, 0.73-0.95) years of life lost over a 5-year period, an observation that persisted even for those residing in areas with PM2.5 concentrations below current standards. Conclusions Residential exposure to elevated concentrations of PM2.5 is a significant mortality risk factor for HF patients. Elevated PM2.5 exposures result in substantial years of life lost even at concentrations below current national standards.
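As a rough illustration of how the per-unit hazard ratio reported above scales to larger exposure differences, the sketch below assumes PM2.5 enters the Cox model as a single linear term on the log-hazard scale, so that the ratio for an increase of Δ μg/m3 is the per-unit ratio raised to the power Δ. The abstract does not state the functional form; the function name and the example increments are hypothetical.

```python
import math


def hazard_ratio_for_increase(hr_per_unit: float, delta: float) -> float:
    """Scale a per-unit hazard ratio to a `delta`-unit exposure increase.

    Assumption (not stated in the abstract): the exposure is a linear term
    in the log-hazard, so log(HR) grows proportionally with the increase.
    """
    if hr_per_unit <= 0:
        raise ValueError("hazard ratio must be positive")
    return math.exp(delta * math.log(hr_per_unit))


# Reported point estimate: HR 1.13 per 1 ug/m3 increase in annual average PM2.5.
for delta in (1, 2, 5):  # hypothetical increments, for illustration only
    print(delta, round(hazard_ratio_for_increase(1.13, delta), 3))
```

The same scaling can be applied to each confidence limit separately, under the same log-linear assumption.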
Affiliation(s)
- Cavin K Ward-Caviness
- Center for Public Health and Environmental Assessment US Environmental Protection Agency Chapel Hill NC
- Anne M Weaver
- Center for Public Health and Environmental Assessment US Environmental Protection Agency Chapel Hill NC
- Matthew Buranosky
- Center for Public Health and Environmental Assessment US Environmental Protection Agency Chapel Hill NC
- Emily R Pfaff
- NC Translational and Clinical Sciences Institute University of North Carolina-Chapel Hill Chapel Hill NC
- Lucas M Neas
- Center for Public Health and Environmental Assessment US Environmental Protection Agency Chapel Hill NC
- Robert B Devlin
- Center for Public Health and Environmental Assessment US Environmental Protection Agency Chapel Hill NC
- Joel Schwartz
- Department of Environmental Health Harvard T. H. Chan School of Public Health Boston MA
- Department of Epidemiology Harvard T. H. Chan School of Public Health Boston MA
- Qian Di
- Research Center for Public Health School of Medicine Tsinghua University Beijing China
- Wayne E Cascio
- Center for Public Health and Environmental Assessment US Environmental Protection Agency Chapel Hill NC
- David Diaz-Sanchez
- Center for Public Health and Environmental Assessment US Environmental Protection Agency Chapel Hill NC
24
Pfaff ER, Crosskey M, Morton K, Krishnamurthy A. Clinical Annotation Research Kit (CLARK): Computable Phenotyping Using Machine Learning. JMIR Med Inform 2020; 8:e16042. [PMID: 32012059 PMCID: PMC7007592 DOI: 10.2196/16042] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 08/28/2019] [Revised: 10/30/2019] [Accepted: 12/16/2019] [Indexed: 01/02/2023] Open
Abstract
Computable phenotypes are algorithms that translate clinical features into code that can be run against electronic health record (EHR) data to define patient cohorts. However, computable phenotypes that only make use of structured EHR data do not capture the full richness of a patient’s medical record. While natural language processing (NLP) methods have shown success in extracting clinical features from text, the use of such tools has generally been limited to research groups with substantial NLP expertise. Our goal was to develop an open-source phenotyping software, Clinical Annotation Research Kit (CLARK), that would enable clinical and translational researchers to use machine learning–based NLP for computable phenotyping without requiring deep informatics expertise. CLARK enables nonexpert users to mine text using machine learning classifiers by specifying features for the software to match in clinical notes. Once the features are defined, the user-friendly CLARK interface allows the user to choose from a variety of standard machine learning algorithms (linear support vector machine, Gaussian Naïve Bayes, decision tree, and random forest), cross-validation methods, and the number of folds (cross-validation splits) to be used in evaluation of the classifier. Example phenotypes where CLARK has been applied include pediatric diabetes (sensitivity=0.91; specificity=0.98), symptomatic uterine fibroids (positive predictive value=0.81; negative predictive value=0.54), nonalcoholic fatty liver disease (sensitivity=0.90; specificity=0.94), and primary ciliary dyskinesia (sensitivity=0.88; specificity=1.0). In each of these use cases, CLARK allowed investigators to incorporate variables into their phenotype algorithm that would not be available as structured data. Moreover, the fact that nonexpert users can get started with machine learning–based NLP with limited informatics involvement is a significant improvement over the status quo. 
We hope to disseminate CLARK to other organizations that may not have NLP or machine learning specialists available, enabling wider use of these methods.
Affiliation(s)
- Emily R Pfaff
- North Carolina Translational and Clinical Sciences Institute, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
- Ashok Krishnamurthy
- Renaissance Computing Institute, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
25
Ahalt SC, Chute CG, Fecho K, Glusman G, Hadlock J, Taylor CO, Pfaff ER, Robinson PN, Solbrig H, Ta C, Tatonetti N, Weng C. Clinical Data: Sources and Types, Regulatory Constraints, Applications. Clin Transl Sci 2019; 12:329-333. [PMID: 31074176 PMCID: PMC6617834 DOI: 10.1111/cts.12638] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 01/24/2019] [Accepted: 03/27/2019] [Indexed: 12/30/2022] Open
Affiliation(s)
- Stanley C Ahalt
- Renaissance Computing Institute, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Karamarie Fecho
- Renaissance Computing Institute, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Emily R Pfaff
- North Carolina Translational and Clinical Sciences Institute, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Casey Ta
- Columbia University, New York, New York, USA
26
Zhang XA, Yates A, Vasilevsky N, Gourdine JP, Callahan TJ, Carmody LC, Danis D, Joachimiak MP, Ravanmehr V, Pfaff ER, Champion J, Robasky K, Xu H, Fecho K, Walton NA, Zhu RL, Ramsdill J, Mungall CJ, Köhler S, Haendel MA, McDonald CJ, Vreeman DJ, Peden DB, Bennett TD, Feinstein JA, Martin B, Stefanski AL, Hunter LE, Chute CG, Robinson PN. Semantic integration of clinical laboratory tests from electronic health records for deep phenotyping and biomarker discovery. NPJ Digit Med 2019; 2:32. [PMID: 31119199 PMCID: PMC6527418 DOI: 10.1038/s41746-019-0110-4] [Citation(s) in RCA: 31] [Impact Index Per Article: 6.2] [Received: 11/26/2018] [Accepted: 04/18/2019] [Indexed: 12/22/2022] Open
Abstract
Electronic Health Record (EHR) systems typically define laboratory test results using the Laboratory Observation Identifier Names and Codes (LOINC) and can transmit them using Fast Healthcare Interoperability Resource (FHIR) standards. LOINC has not yet been semantically integrated with computational resources for phenotype analysis. Here, we provide a method for mapping LOINC-encoded laboratory test results transmitted in FHIR standards to Human Phenotype Ontology (HPO) terms. We annotated the medical implications of 2923 commonly used laboratory tests with HPO terms. Using these annotations, our software assesses laboratory test results and converts each result into an HPO term. We validated our approach with EHR data from 15,681 patients with respiratory complaints and identified known biomarkers for asthma. Finally, we provide a freely available SMART on FHIR application that can be used within EHR systems. Our approach allows readily available laboratory tests in EHR to be reused for deep phenotyping and exploits the hierarchical structure of HPO to integrate distinct tests that have comparable medical interpretations for association studies.
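The conversion step described above, in which a LOINC-coded result is interpreted and then mapped to an HPO term, might be sketched as follows. This is not the authors' software: the annotation table, function names, and reference range are hypothetical, and the identifiers are shown for illustration only and should be verified against the current LOINC and HPO releases.

```python
from typing import Optional

# Hypothetical annotation table: (LOINC code, interpretation) -> HPO term ID.
# The paper describes annotations for 2923 tests; only one test is sketched here.
ANNOTATIONS = {
    ("2345-7", "H"): "HP:0003074",  # serum/plasma glucose, high -> hyperglycemia
    ("2345-7", "L"): "HP:0001943",  # serum/plasma glucose, low -> hypoglycemia
}


def interpret(value: float, low: float, high: float) -> str:
    """Reduce a numeric result to L (low), N (normal), or H (high)."""
    if value < low:
        return "L"
    if value > high:
        return "H"
    return "N"


def to_hpo(loinc_code: str, value: float, low: float, high: float) -> Optional[str]:
    """Return the HPO term for an abnormal result, or None if unannotated or normal."""
    return ANNOTATIONS.get((loinc_code, interpret(value, low, high)))


# Example with an assumed reference range of 70-99 mg/dL.
print(to_hpo("2345-7", 180.0, 70.0, 99.0))
```

Keying the table on the interpretation rather than the raw value is what lets distinct tests with comparable medical meaning converge on the same HPO term.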
Affiliation(s)
- Amy Yates
- Oregon Clinical and Translational Research Institute, Oregon Health and Science University, Portland, OR 97239 USA
- Nicole Vasilevsky
- Oregon Clinical and Translational Research Institute, Oregon Health and Science University, Portland, OR 97239 USA
- Department of Medical Informatics and Clinical Epidemiology, Oregon Health and Science University, Portland, OR 97239 USA
- J. P. Gourdine
- Oregon Clinical and Translational Research Institute, Oregon Health and Science University, Portland, OR 97239 USA
- Library, Oregon Health and Science University, Portland, OR 97239 USA
- Tiffany J. Callahan
- Computational Bioscience Program, Department of Pharmacology, University of Colorado Anschutz School of Medicine, Aurora, CO 80045 USA
- Leigh C. Carmody
- The Jackson Laboratory for Genomic Medicine, Farmington CT, 06032 USA
- Daniel Danis
- The Jackson Laboratory for Genomic Medicine, Farmington CT, 06032 USA
- Marcin P. Joachimiak
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720 USA
- Vida Ravanmehr
- The Jackson Laboratory for Genomic Medicine, Farmington CT, 06032 USA
- Emily R. Pfaff
- North Carolina Translational and Clinical Sciences Institute (NC TraCS), University of North Carolina at Chapel Hill, Chapel Hill, NC 27599 USA
- James Champion
- North Carolina Translational and Clinical Sciences Institute (NC TraCS), University of North Carolina at Chapel Hill, Chapel Hill, NC 27599 USA
- Kimberly Robasky
- North Carolina Translational and Clinical Sciences Institute (NC TraCS), University of North Carolina at Chapel Hill, Chapel Hill, NC 27599 USA
- Genetics Department, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599 USA
- School of Information and Library Sciences, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599 USA
- Hao Xu
- Renaissance Computing Institute, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599 USA
- Karamarie Fecho
- Renaissance Computing Institute, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599 USA
- Nephi A. Walton
- Genomic Medicine Institute, Geisinger Health System, Danville, PA 17822 USA
- Richard L. Zhu
- Institute for Clinical and Translational Research, Johns Hopkins University, Baltimore, MD 21202 USA
- Justin Ramsdill
- Oregon Clinical and Translational Research Institute, Oregon Health and Science University, Portland, OR 97239 USA
- Christopher J. Mungall
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720 USA
- Sebastian Köhler
- Charité Centrum für Therapieforschung, Charité - Universitätsmedizin Berlin Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Berlin, 10117 Germany
- Einstein Center Digital Future, Berlin, 10117 Germany
- Melissa A. Haendel
- Oregon Clinical and Translational Research Institute, Oregon Health and Science University, Portland, OR 97239 USA
- Department of Medical Informatics and Clinical Epidemiology, Oregon Health and Science University, Portland, OR 97239 USA
- Linus Pauling Institute and Center for Genome Research and Biocomputing, Oregon State University, Corvallis, OR 97331 USA
- Clement J. McDonald
- Lister Hill National Center for Biomedical Communications, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894 USA
- Daniel J. Vreeman
- Department of Medicine, Indiana University School of Medicine, Indianapolis, IN 46202 USA
- Center for Biomedical Informatics, Regenstrief Institute, Inc., Indianapolis, IN 46202 USA
- David B. Peden
- North Carolina Translational and Clinical Sciences Institute (NC TraCS), University of North Carolina at Chapel Hill, Chapel Hill, NC 27599 USA
- Division of Allergy, Immunology and Rheumatology, Department of Pediatrics, University of North Carolina, Chapel Hill, NC 27599 USA
- University of North Carolina Center for Environmental Medicine, Asthma and Lung Biology, University of North Carolina, Chapel Hill, NC 27599 USA
- Tellen D. Bennett
- Department of Pediatrics, Section of Pediatric Critical Care, University of Colorado School of Medicine, Aurora, CO 80045 USA
- James A. Feinstein
- Adult and Child Consortium for Health Outcomes Research and Delivery Science (ACCORDS), University of Colorado School of Medicine, Aurora, CO 80045 USA
- Blake Martin
- Department of Pediatrics, Section of Pediatric Critical Care, University of Colorado School of Medicine, Aurora, CO 80045 USA
- Adrianne L. Stefanski
- Computational Bioscience Program, Department of Pharmacology, University of Colorado Anschutz School of Medicine, Aurora, CO 80045 USA
- Lawrence E. Hunter
- Computational Bioscience Program, Department of Pharmacology, University of Colorado Anschutz School of Medicine, Aurora, CO 80045 USA
- Christopher G. Chute
- Institute for Clinical and Translational Research, Johns Hopkins University, Baltimore, MD 21202 USA
- Peter N. Robinson
- The Jackson Laboratory for Genomic Medicine, Farmington CT, 06032 USA
- Institute for Systems Genomics, University of Connecticut, Farmington, CT 06032 USA
27
Bailey SC, Oramasionwu CU, Infanzon AC, Pfaff ER, Annis IE, Reuland DS. An Electronic Health Record-Based Strategy to Systematically Assess Medication Use Among Primary Care Patients With Multidrug Regimens: Feasibility Study. JMIR Res Protoc 2017; 6:e157. [PMID: 28798013 PMCID: PMC5571232 DOI: 10.2196/resprot.7986] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2017] [Revised: 06/08/2017] [Accepted: 06/09/2017] [Indexed: 11/13/2022] Open
Abstract
Background Medication nonadherence and misuse are public health and patient safety concerns. With the increased adoption of electronic health records (EHRs), greater opportunities exist to communicate directly with, and collect data from, patients through secure portals linked to EHRs. Objective The study objectives were to develop and pilot test a method of monitoring patient medication use in outpatient settings and determine the feasibility and acceptability of this approach. Methods Adult primary care patients on multidrug regimens were recruited from an academic internal medicine clinic by a trained research assistant. After completing a baseline, in-person interview, patients were sent a link to a questionnaire about medication use via the patient portal. One week later, the RA contacted patients to complete a follow-up telephone interview assessing patient satisfaction and experience with the questionnaire. Patient EHRs were also reviewed to determine the questionnaire completion rate. Results Of 100 patients enrolled, 89 completed the follow-up interview and 82 completed the portal questionnaire. The mean age of the sample was 61.8 (range 31-88) years. Approximately half (54/100, 54%) of the sample was male, two-thirds were white (67/100, 67%) and 26% (26/100) African-American. A total of 44% reported an annual household income of <$50,000 per year, and 17% (17/100) reported a high school or less level of education. No significant differences were found in questionnaire completion rates by sociodemographic characteristics or prior portal use. Most (68/73, 93%) found the questionnaire easy to access, easy to complete (72/73, 99%), and valuable (73/89, 82%). Time constraints and log-in difficulties were the main reasons for noncompletion. Conclusions The portal questionnaire was well received by a socioeconomically diverse group of patients with high completion rates achieved. 
Routine use of a portal-based questionnaire could provide a valuable signal to providers and care teams about patient medication use and identify patients needing additional support.
Affiliation(s)
- Stacy Cooper Bailey
- Division of Pharmaceutical Outcomes and Policy, Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
- Christine U Oramasionwu
- Division of Pharmaceutical Outcomes and Policy, Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
- Alexandra C Infanzon
- Division of Pharmaceutical Outcomes and Policy, Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
- Emily R Pfaff
- North Carolina Translational and Clinical Sciences Institute, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
- Izabela E Annis
- Division of Pharmaceutical Outcomes and Policy, Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
- Daniel S Reuland
- Division of General Medicine and Clinical Epidemiology, Department of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
28
Zhong VW, Obeid JS, Craig JB, Pfaff ER, Thomas J, Jaacks LM, Beavers DP, Carey TS, Lawrence JM, Dabelea D, Hamman RF, Bowlby DA, Pihoker C, Saydah SH, Mayer-Davis EJ. An efficient approach for surveillance of childhood diabetes by type derived from electronic health record data: the SEARCH for Diabetes in Youth Study. J Am Med Inform Assoc 2016; 23:1060-1067. [PMID: 27107449 DOI: 10.1093/jamia/ocv207] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Received: 10/02/2015] [Revised: 12/02/2015] [Accepted: 12/08/2015] [Indexed: 12/16/2022] Open
Abstract
OBJECTIVE To develop an efficient surveillance approach for childhood diabetes by type across 2 large US health care systems, using phenotyping algorithms derived from electronic health record (EHR) data. MATERIALS AND METHODS Presumptive diabetes cases <20 years of age from 2 large independent health care systems were identified as those having ≥1 of the 5 indicators in the past 3.5 years, including elevated HbA1c, elevated blood glucose, diabetes-related billing codes, patient problem list, and outpatient anti-diabetic medications. EHRs of all the presumptive cases were manually reviewed, and true diabetes status and diabetes type were determined. Algorithms for identifying diabetes cases overall and classifying diabetes type were either prespecified or derived from classification and regression tree analysis. Surveillance approach was developed based on the best algorithms identified. RESULTS We developed a stepwise surveillance approach using billing code-based prespecified algorithms and targeted manual EHR review, which efficiently and accurately ascertained and classified diabetes cases by type, in both health care systems. The sensitivity and positive predictive values in both systems were approximately ≥90% for ascertaining diabetes cases overall and classifying cases with type 1 or type 2 diabetes. About 80% of the cases with "other" type were also correctly classified. This stepwise surveillance approach resulted in a >70% reduction in the number of cases requiring manual validation compared to traditional surveillance methods. CONCLUSION EHR data may be used to establish an efficient approach for large-scale surveillance for childhood diabetes by type, although some manual effort is still needed.
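The presumptive-case screen described above (age <20 years and at least one of five indicators) could be expressed as the following minimal sketch. The field names are hypothetical, the thresholds that define "elevated" are not given in the abstract, and the 3.5-year lookback is assumed to have been applied when the flags were computed.

```python
from dataclasses import astuple, dataclass


@dataclass(frozen=True)
class Indicators:
    """One flag per indicator named in the abstract (hypothetical field names)."""
    elevated_hba1c: bool = False
    elevated_blood_glucose: bool = False
    diabetes_billing_code: bool = False
    on_problem_list: bool = False
    outpatient_antidiabetic_med: bool = False


def is_presumptive_case(age_years: float, indicators: Indicators) -> bool:
    """Flag a patient under 20 who has at least one of the five indicators."""
    return age_years < 20 and any(astuple(indicators))


# A 12-year-old with only a diabetes-related billing code is flagged for review.
print(is_presumptive_case(12, Indicators(diabetes_billing_code=True)))
```

A screen like this is deliberately permissive; per the abstract, presumptive cases were then confirmed by manual EHR review.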
Affiliation(s)
- Victor W Zhong
- Department of Nutrition, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC, USA
- Jihad S Obeid
- Biomedical Informatics Center, Medical University of South Carolina, Charleston, SC, USA
- Jean B Craig
- Biomedical Informatics Center, Medical University of South Carolina, Charleston, SC, USA
- Emily R Pfaff
- North Carolina TraCS Institute, University of North Carolina, Chapel Hill, NC, USA
- Joan Thomas
- Department of Nutrition, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC, USA
- Lindsay M Jaacks
- Hubert Department of Global Health, Rollins School of Public Health, Emory University, Atlanta, GA, USA
- Daniel P Beavers
- Department of Biostatistical Sciences, School of Medicine, Wake Forest University, Winston-Salem, NC, USA
- Timothy S Carey
- Cecil G. Sheps Center for Health Services Research, University of North Carolina, Chapel Hill, NC, USA
- Jean M Lawrence
- Department of Research and Evaluation, Kaiser Permanente Southern California, Pasadena, CA, USA
- Dana Dabelea
- Department of Epidemiology, Colorado School of Public Health, University of Colorado, Aurora, CO, USA
- Richard F Hamman
- Department of Epidemiology, Colorado School of Public Health, University of Colorado, Aurora, CO, USA
- Deborah A Bowlby
- Division of Pediatric Endocrinology, Medical University of South Carolina, Charleston, SC, USA
- Catherine Pihoker
- Department of Washington, University of Washington, Seattle, WA, USA
- Sharon H Saydah
- Centers for Disease Control and Prevention, Division of Diabetes Translation, Atlanta, GA, USA
- Elizabeth J Mayer-Davis
- Department of Nutrition, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC, USA
- Department of Medicine, School of Medicine, University of North Carolina, Chapel Hill, NC, USA
29
Desai PC, Deal AM, Pfaff ER, Qaqish B, Hebden LM, Park YA, Ataga KI. Alloimmunization is associated with older age of transfused red blood cells in sickle cell disease. Am J Hematol 2015; 90:691-5. [PMID: 25963831 DOI: 10.1002/ajh.24051] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.1] [Received: 12/04/2014] [Revised: 05/06/2015] [Accepted: 05/06/2015] [Indexed: 01/31/2023]
Abstract
Red blood cell (RBC) alloimmunization is a significant clinical complication of sickle cell disease (SCD). It can lead to difficulty with cross-matching for future transfusions and may sometimes trigger life-threatening delayed hemolytic transfusion reactions. We conducted a retrospective study to explore the association of clinical complications and age of RBC with alloimmunization in patients with SCD followed at a single institution from 2005 to 2012. One hundred and sixty six patients with a total of 488 RBC transfusions were evaluated. Nineteen patients (11%) developed new alloantibodies following blood transfusions during the period of review. The median age of RBC units was 20 days (interquartile range: 14-27 days). RBC antibody formation was significantly associated with the age of RBC units (P = 0.002), with a hazard ratio of 3.5 (95% CI: 1.71-7.11) for a RBC unit that was 7 days old and 9.8 (95% CI: 2.66-35.97) for a unit that was 35 days old, 28 days after the blood transfusion. No association was observed between RBC alloimmunization and acute vaso-occlusive complications. Although increased echocardiography-derived tricuspid regurgitant jet velocity (TRV) was associated with the presence of RBC alloantibodies (P = 0.02), TRV was not significantly associated with alloimmunization when adjusted for patient age and number of transfused RBC units. Our study suggests that RBC antibody formation is significantly associated with older age of RBCs at the time of transfusion. Prospective studies in patients with SCD are required to confirm this finding.
Affiliation(s)
- Payal C. Desai
- Division of Hematology; The Ohio State University; Columbus Ohio
- Allison M. Deal
- Lineberger Comprehensive Cancer Center Biostatistics Core Facility; University of North Carolina at Chapel Hill
- Emily R. Pfaff
- NC TraCS Institute, University of North Carolina at Chapel Hill
- Bahjat Qaqish
- Lineberger Comprehensive Cancer Center Biostatistics Core Facility; University of North Carolina at Chapel Hill
- Leyna M. Hebden
- Department of Hospital Labs; Transfusion Medicine Services, University of North Carolina Healthcare
- Yara A. Park
- Department of Pathology and Laboratory Medicine; University of North Carolina at Chapel Hill
- Kenneth I. Ataga
- Division of Hematology/Oncology; University of North Carolina at Chapel Hill
30
Zhong VW, Pfaff ER, Beavers DP, Thomas J, Jaacks LM, Bowlby DA, Carey TS, Lawrence JM, Dabelea D, Hamman RF, Pihoker C, Saydah SH, Mayer-Davis EJ. Use of administrative and electronic health record data for development of automated algorithms for childhood diabetes case ascertainment and type classification: the SEARCH for Diabetes in Youth Study. Pediatr Diabetes 2014; 15:573-84. [PMID: 24913103 PMCID: PMC4229415 DOI: 10.1111/pedi.12152] [Citation(s) in RCA: 45] [Impact Index Per Article: 4.5] [Received: 02/11/2014] [Revised: 03/31/2014] [Accepted: 04/18/2014] [Indexed: 12/01/2022] Open
Abstract
BACKGROUND The performance of automated algorithms for childhood diabetes case ascertainment and type classification may differ by demographic characteristics. OBJECTIVE This study evaluated the potential of administrative and electronic health record (EHR) data from a large academic care delivery system to conduct diabetes case ascertainment in youth according to type, age, and race/ethnicity. SUBJECTS A total of 57 767 children aged <20 yr as of 31 December 2011 who were seen at the University of North Carolina Health Care System in 2011 were included. METHODS Using an initial algorithm including billing data, patient problem lists, laboratory test results, and diabetes-related medications between 1 July 2008 and 31 December 2011, presumptive cases were identified and validated by chart review. More refined algorithms were evaluated by type (type 1 vs. type 2), age (<10 vs. ≥10 yr) and race/ethnicity (non-Hispanic White vs. 'other'). Sensitivity, specificity, and positive predictive value were calculated and compared. RESULTS The best algorithm for ascertainment of overall diabetes cases was billing data. The best type 1 algorithm was the ratio of the number of type 1 billing codes to the sum of type 1 and type 2 billing codes ≥0.5. A useful algorithm to ascertain youth with type 2 diabetes with 'other' race/ethnicity was identified. Considerable age and racial/ethnic differences were present in type-non-specific and type 2 algorithms. CONCLUSIONS Administrative and EHR data may be used to identify cases of childhood diabetes (any type), and to identify type 1 cases. The performance of type 2 case ascertainment algorithms differed substantially by race/ethnicity.
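The type 1 rule reported above (type 1 billing codes divided by the sum of type 1 and type 2 billing codes, with a cutoff of 0.5) can be sketched as below. The function name is hypothetical, and the handling of patients with no typed billing codes is an assumption, since the abstract does not say how that case was treated.

```python
def is_type1_by_billing_ratio(
    n_type1_codes: int, n_type2_codes: int, threshold: float = 0.5
) -> bool:
    """Classify as type 1 when type1 / (type1 + type2) billing codes >= threshold."""
    if n_type1_codes < 0 or n_type2_codes < 0:
        raise ValueError("code counts must be non-negative")
    total = n_type1_codes + n_type2_codes
    if total == 0:
        # Assumption: with no typed billing codes the rule cannot fire.
        return False
    return n_type1_codes / total >= threshold


# Six type 1 codes and two type 2 codes give a ratio of 0.75, which meets the cutoff.
print(is_type1_by_billing_ratio(6, 2))
```

Because the comparison is inclusive, an exactly even split of type 1 and type 2 codes is classified as type 1 under this reading of the rule.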
Affiliation(s)
- Victor W. Zhong
  - Department of Nutrition, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, North Carolina, USA
- Emily R. Pfaff
  - North Carolina TraCS Institute, University of North Carolina, Chapel Hill, North Carolina, USA
- Daniel P. Beavers
  - Department of Biostatistical Sciences, Wake Forest School of Medicine, Winston-Salem, North Carolina, USA
- Joan Thomas
  - Department of Nutrition, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, North Carolina, USA
- Lindsay M. Jaacks
  - Department of Nutrition, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, North Carolina, USA
- Deborah A. Bowlby
  - Division of Pediatric Endocrinology, Medical University of South Carolina, Charleston, South Carolina, USA
- Timothy S. Carey
  - Cecil G. Sheps Center for Health Services Research, University of North Carolina, Chapel Hill, NC, USA
- Jean M. Lawrence
  - Department of Research and Evaluation, Kaiser Permanente Southern California, Pasadena, California, USA
- Dana Dabelea
  - Department of Epidemiology, Colorado School of Public Health, University of Colorado, Denver, Aurora, Colorado, USA
- Richard F. Hamman
  - Department of Epidemiology, Colorado School of Public Health, University of Colorado, Denver, Aurora, Colorado, USA
- Catherine Pihoker
  - Department of Pediatrics, University of Washington, Seattle, Washington, USA
- Sharon H. Saydah
  - Centers for Disease Control and Prevention, Division of Diabetes Translation, Atlanta, Georgia, USA
- Elizabeth J. Mayer-Davis
  - Department of Nutrition, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, North Carolina, USA; Department of Medicine, School of Medicine, University of North Carolina, Chapel Hill, North Carolina, USA
31
Rybnicek DA, Hathorn KE, Pfaff ER, Bulsiewicz WJ, Shaheen NJ, Dellon ES. Administrative coding is specific, but not sensitive, for identifying eosinophilic esophagitis. Dis Esophagus 2014; 27:703-8. [PMID: 24215617 PMCID: PMC4018425 DOI: 10.1111/dote.12141] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.0] [Indexed: 12/11/2022]
Abstract
The use of administrative databases to conduct population-based studies of eosinophilic esophagitis (EoE) in the United States is limited because it is unknown whether the International Classification of Diseases, Ninth Revision (ICD-9) code for EoE, 530.13, accurately identifies those who truly have the disease. The aim of this retrospective study was to validate the ICD-9 code for identifying cases of EoE in administrative data. Confirmed cases of EoE as per consensus guidelines (symptoms of esophageal dysfunction and ≥15 eosinophils per high-power field on biopsy after 8 weeks of twice-daily proton pump inhibitor therapy) were identified in the University of North Carolina (UNC) EoE Clinicopathologic Database from 2008 to 2010; 2008 was the first year in which the 530.13 code was approved. Using the Carolina Data Warehouse, the administrative database for patients seen in the UNC system, all diagnostic and procedure codes were obtained for these cases. Then, with the EoE cases as the reference standard, we re-queried the Carolina Data Warehouse over the same time frame for all patients seen in the system (n=308,372) and calculated the sensitivity and specificity of the ICD-9 code 530.13 as a case definition of EoE. To attempt to refine the case definition, we added procedural codes in an iterative fashion to optimize sensitivity and specificity, and restricted our analysis to privately insured patients. We also conducted a sensitivity analysis with 2011 data to identify trends in the operating parameters of the code. We identified 226 cases of EoE at UNC to serve as the reference standard. The ICD-9 code 530.13 yielded a sensitivity of 37% (83/226; 95% confidence interval: 31-43%) and specificity of 99% (308,111/308,146; 95% confidence interval: 98-100%). These operating parameters were not substantially altered if the case definition required a procedure code for endoscopy or if cases were limited to those with commercial insurance. However, in 2011, the sensitivity of the code had increased to 61%, while the specificity remained at 99%. The ICD-9 code for EoE, 530.13, had excellent specificity for identifying cases of EoE in administrative data, although this high specificity was achieved at an academic center. Additionally, the sensitivity of the code appears to be increasing over time, and the threshold at which it will stabilize is not known. While use of this administrative code will still miss a number of cases, those identified in this manner are highly likely to have the disease.
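The operating parameters quoted above follow directly from the reported counts (83 of 226 reference-standard cases carried the code; 308,111 of 308,146 non-cases did not). The Python sketch below recomputes them. The function names are hypothetical, and the Wilson score interval is an assumption chosen for illustration, since the abstract does not say which confidence-interval method the authors used.

```python
import math


def proportion(successes: int, total: int) -> float:
    """Plain proportion, e.g. sensitivity = true positives / all true cases."""
    if total <= 0:
        raise ValueError("total must be positive")
    return successes / total


def wilson_interval(successes: int, total: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (assumed method;
    the abstract does not name the interval actually used)."""
    p = proportion(successes, total)
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half, center + half


# Counts taken from the abstract.
true_pos, n_cases = 83, 226               # coded cases / reference-standard cases
true_neg, n_noncases = 308_111, 308_146   # uncoded non-cases / all non-cases

sensitivity = proportion(true_pos, n_cases)
specificity = proportion(true_neg, n_noncases)
sens_lo, sens_hi = wilson_interval(true_pos, n_cases)

if __name__ == "__main__":
    print(f"sensitivity = {sensitivity:.3f} ({sens_lo:.3f} to {sens_hi:.3f})")
    print(f"specificity = {specificity:.5f}")
```

The non-case denominator is consistent with the other figures in the abstract: 308,372 patients minus 226 confirmed cases gives 308,146.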
Affiliation(s)
- David A. Rybnicek
  - Center for Esophageal Diseases and Swallowing, University of North Carolina School of Medicine, Chapel Hill, NC; Center for Gastrointestinal Biology and Disease, Division of Gastroenterology and Hepatology, Department of Medicine, University of North Carolina School of Medicine, Chapel Hill, NC
- Kelly E. Hathorn
  - Center for Esophageal Diseases and Swallowing, University of North Carolina School of Medicine, Chapel Hill, NC; Center for Gastrointestinal Biology and Disease, Division of Gastroenterology and Hepatology, Department of Medicine, University of North Carolina School of Medicine, Chapel Hill, NC
- Emily R. Pfaff
  - Carolina Data Warehouse, North Carolina Translational and Clinical Sciences Institute, University of North Carolina, Chapel Hill
- William J. Bulsiewicz
  - Center for Esophageal Diseases and Swallowing, University of North Carolina School of Medicine, Chapel Hill, NC; Center for Gastrointestinal Biology and Disease, Division of Gastroenterology and Hepatology, Department of Medicine, University of North Carolina School of Medicine, Chapel Hill, NC
- Nicholas J. Shaheen
  - Center for Esophageal Diseases and Swallowing, University of North Carolina School of Medicine, Chapel Hill, NC; Center for Gastrointestinal Biology and Disease, Division of Gastroenterology and Hepatology, Department of Medicine, University of North Carolina School of Medicine, Chapel Hill, NC
- Evan S. Dellon
  - Center for Esophageal Diseases and Swallowing, University of North Carolina School of Medicine, Chapel Hill, NC; Center for Gastrointestinal Biology and Disease, Division of Gastroenterology and Hepatology, Department of Medicine, University of North Carolina School of Medicine, Chapel Hill, NC