1
Jones SE, Bradwell KR, Chan LE, McMurry JA, Olson-Chen C, Tarleton J, Wilkins KJ, Ly V, Ljazouli S, Qin Q, Faherty EG, Lau YK, Xie C, Kao YH, Liebman MN, Mariona F, Challa AP, Li L, Ratcliffe SJ, Haendel MA, Patel RC, Hill EL. Who is pregnant? Defining real-world data-based pregnancy episodes in the National COVID Cohort Collaborative (N3C). JAMIA Open 2023; 6:ooad067. [PMID: 37600074 PMCID: PMC10432357 DOI: 10.1093/jamiaopen/ooad067] [Received: 03/03/2023] [Revised: 05/12/2023] [Accepted: 08/08/2023] [Indexed: 08/22/2023]
Abstract
Objectives: To define pregnancy episodes and estimate gestational age within electronic health record (EHR) data from the National COVID Cohort Collaborative (N3C).
Materials and Methods: We developed a comprehensive approach, named Hierarchy and rule-based pregnancy episode Inference integrated with Pregnancy Progression Signatures (HIPPS), and applied it to EHR data in the N3C (January 1, 2018 to April 7, 2022). HIPPS combines: (1) an extension of a previously published pregnancy episode algorithm, (2) a novel algorithm to detect gestational age-specific signatures of a progressing pregnancy for further episode support, and (3) pregnancy start date inference. Clinicians performed validation of HIPPS on a subset of episodes. We then generated pregnancy cohorts based on gestational age precision and pregnancy outcomes for assessment of accuracy and comparison of COVID-19 and other characteristics.
Results: We identified 628 165 pregnant persons with 816 471 pregnancy episodes, of which 52.3% were live births, 24.4% were other outcomes (stillbirth, ectopic pregnancy, abortions), and 23.3% had unknown outcomes. Clinician validation agreed 98.8% with HIPPS-identified episodes. We were able to estimate start dates within 1 week of precision for 475 433 (58.2%) episodes. 62 540 (7.7%) episodes had incident COVID-19 during pregnancy.
Discussion: HIPPS provides measures of support for pregnancy-related variables such as gestational age and pregnancy outcomes based on N3C data. Gestational age precision allows researchers to find time to events with reasonable confidence.
Conclusion: We have developed a novel and robust approach for inferring pregnancy episodes and gestational age that addresses data inconsistency and missingness in EHR data.
Affiliation(s)
- Sara E Jones
- Office of Data Science and Emerging Technologies, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Rockville, MD 20852, United States
- Lauren E Chan
- College of Public Health and Human Sciences, Oregon State University, Corvallis, OR 97331, United States
- Julie A McMurry
- Department of Biomedical Informatics, University of Colorado, Anschutz Medical Campus, Aurora, CO 80045, United States
- Courtney Olson-Chen
- Department of Obstetrics and Gynecology, University of Rochester Medical Center, Rochester, NY 14620, United States
- Jessica Tarleton
- Department of Obstetrics and Gynecology, Medical University of South Carolina, Charleston, SC 29425, United States
- Kenneth J Wilkins
- Biostatistics Program, Office of the Director, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD 20892, United States
- Victoria Ly
- Department of Obstetrics and Gynecology, University of Rochester Medical Center, Rochester, NY 14620, United States
- Saad Ljazouli
- Palantir Technologies, Denver, CO 80202, United States
- Qiuyuan Qin
- Department of Public Health Sciences, University of Rochester Medical Center, Rochester, NY 14618, United States
- Emily Groene Faherty
- School of Public Health, University of Minnesota, Minneapolis, MN 55455, United States
- Catherine Xie
- Department of Public Health Sciences, University of Rochester Medical Center, Rochester, NY 14618, United States
- Yu-Han Kao
- Sema4, Stamford, CT 06902, United States
- Federico Mariona
- Beaumont Hospital, Dearborn, MI 48124, United States
- Wayne State University, Detroit, MI 48202, United States
- Anup P Challa
- Department of Chemical and Biomolecular Engineering, Vanderbilt University, Nashville, TN 37212, United States
- Li Li
- Sema4, Stamford, CT 06902, United States
- Sarah J Ratcliffe
- Department of Public Health Sciences, University of Virginia, Charlottesville, VA 22903, United States
- Melissa A Haendel
- College of Public Health and Human Sciences, Oregon State University, Corvallis, OR 97331, United States
- Rena C Patel
- Department of Medicine and Global Health, University of Washington, Seattle, WA 98105, United States
- Elaine L Hill
- Department of Obstetrics and Gynecology, University of Rochester Medical Center, Rochester, NY 14620, United States
- Department of Public Health Sciences, University of Rochester Medical Center, Rochester, NY 14618, United States
2
Jones S, Bradwell KR, Chan LE, Olson-Chen C, Tarleton J, Wilkins KJ, Qin Q, Faherty EG, Lau YK, Xie C, Kao YH, Liebman MN, Mariona F, Challa A, Li L, Ratcliffe SJ, McMurry JA, Haendel MA, Patel RC, Hill EL. Who is pregnant? Defining real-world data-based pregnancy episodes in the National COVID Cohort Collaborative (N3C). medRxiv 2022:2022.08.04.22278439. [PMID: 35982668 PMCID: PMC9387155 DOI: 10.1101/2022.08.04.22278439] [Indexed: 11/24/2022]
Abstract
Objective: To define pregnancy episodes and estimate gestational aging within electronic health record (EHR) data from the National COVID Cohort Collaborative (N3C).
Materials and Methods: We developed a comprehensive approach, named Hierarchy and rule-based pregnancy episode Inference integrated with Pregnancy Progression Signatures (HIPPS), and applied it to EHR data in the N3C from 1 January 2018 to 7 April 2022. HIPPS combines: 1) an extension of a previously published pregnancy episode algorithm, 2) a novel algorithm to detect gestational aging-specific signatures of a progressing pregnancy for further episode support, and 3) pregnancy start date inference. Clinicians performed validation of HIPPS on a subset of episodes. We then generated three types of pregnancy cohorts based on the level of precision for gestational aging and pregnancy outcomes for comparison of COVID-19 and other characteristics.
Results: We identified 628,165 pregnant persons with 816,471 pregnancy episodes, of which 52.3% were live births, 24.4% were other outcomes (stillbirth, ectopic pregnancy, spontaneous abortions), and 23.3% had unknown outcomes. We were able to estimate start dates within one week of precision for 431,173 (52.8%) episodes. 66,019 (8.1%) episodes had incident COVID-19 during pregnancy. Across varying COVID-19 cohorts, patient characteristics were generally similar, though pregnancy outcomes differed.
Discussion: HIPPS provides support for pregnancy-related variables based on EHR data for researchers to define pregnancy cohorts. Our approach performed well based on clinician validation.
Conclusion: We have developed a novel and robust approach for inferring pregnancy episodes and gestational aging that addresses data inconsistency and missingness in EHR data.
Affiliation(s)
- Sara Jones
- Office of Data Science and Emerging Technologies, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Rockville, MD
- Lauren E Chan
- College of Public Health and Human Sciences, Oregon State University, Corvallis, OR
- Courtney Olson-Chen
- Department of Obstetrics and Gynecology, University of Rochester Medical Center, Rochester, NY
- Jessica Tarleton
- Department of Obstetrics and Gynecology, Medical University of South Carolina, Charleston, SC
- Kenneth J Wilkins
- Biostatistics Program, Office of the Director, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD
- Qiuyuan Qin
- Department of Public Health Sciences, University of Rochester Medical Center, Rochester, NY
- Catherine Xie
- Department of Public Health Sciences, University of Rochester Medical Center, Rochester, NY
- Federico Mariona
- Beaumont Hospital, Dearborn, MI
- Wayne State University, Detroit, MI
- Anup Challa
- Department of Chemical and Biomolecular Engineering, Vanderbilt University, Nashville, TN
- Sarah J Ratcliffe
- Department of Public Health Sciences, University of Virginia, Charlottesville, VA
- Julie A McMurry
- Department of Biomedical Informatics, University of Colorado, Anschutz Medical Campus, Aurora, CO
- Melissa A Haendel
- Department of Biomedical Informatics, University of Colorado, Anschutz Medical Campus, Aurora, CO
- Rena C Patel
- Department of Medicine and Global Health, University of Washington, Seattle, WA
- Elaine L Hill
- Department of Obstetrics and Gynecology, University of Rochester Medical Center, Rochester, NY
- Department of Public Health Sciences, University of Rochester Medical Center, Rochester, NY