1
Wan NC, Grabowska ME, Kerchberger VE, Wei WQ. Exploring beyond diagnoses in electronic health records to improve discovery: a review of the phenome-wide association study. JAMIA Open 2025; 8:ooaf006. [PMID: 40041255 PMCID: PMC11879097 DOI: 10.1093/jamiaopen/ooaf006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2024] [Revised: 12/30/2024] [Accepted: 01/24/2025] [Indexed: 03/06/2025]
Abstract
Objective The phenome-wide association study (PheWAS) systematically examines the phenotypic spectrum extracted from electronic health records (EHRs) to uncover correlations between phenotypes and exposures. This review explores methodologies, highlights challenges, and outlines future directions for EHR-driven PheWAS. Materials and Methods We searched the PubMed database for articles spanning from 2010 to 2023, and we collected data regarding exposures, phenotypes, cohorts, terminologies, replication, and ancestry. Results Our search yielded 690 articles. Following exclusion criteria, we identified 291 articles published between January 1, 2010, and December 31, 2023. A total number of 162 (55.6%) articles defined phenomes using phecodes, indicating that research is reliant on the organization of billing codes. Moreover, 72.8% of articles utilized exposures consisting of genetic data, and the majority (69.4%) of PheWAS lacked replication analyses. Discussion Existing literature underscores the need for deeper phenotyping, variability in PheWAS exposure variables, and absence of replication in PheWAS. Current applications of PheWAS mainly focus on cardiovascular, metabolic, and endocrine phenotypes; thus, applications of PheWAS in uncommon diseases, which may lack structured data, remain largely understudied. Conclusions With modern EHRs, future PheWAS should extend beyond diagnosis codes and consider additional data like clinical notes or medications to create comprehensive phenotype profiles that consider severity, temporality, risk, and ancestry. Furthermore, data interoperability initiatives may help mitigate the paucity of PheWAS replication analyses. With the growing availability of data in EHR, PheWAS will remain a powerful tool in precision medicine.
Affiliation(s)
- Nicholas C Wan
  - Department of Biomedical Engineering, Vanderbilt University, Nashville, TN 37240, United States
- Monika E Grabowska
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37302, United States
- Vern Eric Kerchberger
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37302, United States
  - Department of Medicine, Vanderbilt University Medical Center, Nashville, TN 37232, United States
- Wei-Qi Wei
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37302, United States
2
Guralnik E. US public health surveillance, reimagined. Learn Health Syst 2024; 8:e10445. [PMID: 39444500 PMCID: PMC11493541 DOI: 10.1002/lrh2.10445] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/09/2024] [Revised: 06/24/2024] [Accepted: 07/25/2024] [Indexed: 10/25/2024]
Abstract
Introduction This study presents two novel concepts for standardizing electronic health record (EHR)-based public health surveillance through utilization of existing informatics methods and data platforms. Methods Drawing from collective experience in applied epidemiology, health services research, and health informatics, the author presents a vision for an alternative path to public health surveillance by repurposing existing tools and resources, such as (1) computable phenotypes which have already been created and validated for a variety of chronic diseases of interest to public health and (2) large data platforms/collaboratives, such as the All of Us Research Program and the National COVID Cohort Collaborative. Opportunities and challenges are discussed regarding EHR-based chronic disease surveillance, as well as the concept of reusing phenotype definitions and large data platforms for public health needs. Results/Framework Reusing computable phenotypes for EHR-based public health surveillance would require secure data platforms and nationally representative data. Standardization metrics for reuse of previously developed and validated computable phenotypes are also necessary and are currently being developed by the author. This study presents a reimagined Learning Health System framework by incorporating Public Health and two novel concept sets of solutions into the healthcare ecosystem. Conclusion/Next Steps Alternative approaches to the limited resources and current infrastructure of the US Public Health System, especially as applied to disease surveillance, are needed and may be possible when repurposing resources and methodologies across the Learning Health System.
Affiliation(s)
- Elina Guralnik
  - Department of Health Administration and Policy, College of Public Health, George Mason University, Fairfax, VA, USA
3
Grabowska ME, Van Driest SL, Robinson JR, Patrick AE, Guardo C, Gangireddy S, Ong HH, Feng Q, Carroll R, Kannankeril PJ, Wei WQ. Developing and evaluating pediatric phecodes (Peds-Phecodes) for high-throughput phenotyping using electronic health records. J Am Med Inform Assoc 2024; 31:386-395. [PMID: 38041473 PMCID: PMC10797257 DOI: 10.1093/jamia/ocad233] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 08/22/2023] [Revised: 10/04/2023] [Accepted: 11/20/2023] [Indexed: 12/03/2023]
Abstract
OBJECTIVE Pediatric patients have different diseases and outcomes than adults; however, existing phecodes do not capture the distinctive pediatric spectrum of disease. We aim to develop specialized pediatric phecodes (Peds-Phecodes) to enable efficient, large-scale phenotypic analyses of pediatric patients. MATERIALS AND METHODS We adopted a hybrid data- and knowledge-driven approach leveraging electronic health records (EHRs) and genetic data from Vanderbilt University Medical Center to modify the most recent version of phecodes to better capture pediatric phenotypes. First, we compared the prevalence of patient diagnoses in pediatric and adult populations to identify disease phenotypes differentially affecting children and adults. We then used clinical domain knowledge to remove phecodes representing phenotypes unlikely to affect pediatric patients and create new phecodes for phenotypes relevant to the pediatric population. We further compared phenome-wide association study (PheWAS) outcomes replicating known pediatric genotype-phenotype associations between Peds-Phecodes and phecodes. RESULTS The Peds-Phecodes aggregate 15 533 ICD-9-CM codes and 82 949 ICD-10-CM codes into 2051 distinct phecodes. Peds-Phecodes replicated more known pediatric genotype-phenotype associations than phecodes (248 vs 192 out of 687 SNPs, P < .001). DISCUSSION We introduce Peds-Phecodes, a high-throughput EHR phenotyping tool tailored for use in pediatric populations. We successfully validated the Peds-Phecodes using genetic replication studies. Our findings also reveal the potential use of Peds-Phecodes in detecting novel genotype-phenotype associations for pediatric conditions. We expect that Peds-Phecodes will facilitate large-scale phenomic and genomic analyses in pediatric populations. CONCLUSION Peds-Phecodes capture higher-quality pediatric phenotypes and deliver superior PheWAS outcomes compared to phecodes.
Affiliation(s)
- Monika E Grabowska
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
- Sara L Van Driest
  - Department of Pediatrics and the Center for Pediatric Precision Medicine, Vanderbilt University Medical Center, Nashville, TN 37232, United States
- Jamie R Robinson
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
  - Department of Pediatric Surgery, Vanderbilt University Medical Center, Nashville, TN 37232, United States
- Anna E Patrick
  - Department of Pediatrics and the Center for Pediatric Precision Medicine, Vanderbilt University Medical Center, Nashville, TN 37232, United States
- Chris Guardo
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
- Srushti Gangireddy
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
- Henry H Ong
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
- QiPing Feng
  - Department of Medicine, Division of Clinical Pharmacology, Vanderbilt University Medical Center, Nashville, TN 37232, United States
- Robert Carroll
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
- Prince J Kannankeril
  - Department of Pediatrics and the Center for Pediatric Precision Medicine, Vanderbilt University Medical Center, Nashville, TN 37232, United States
- Wei-Qi Wei
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
4
Grabowska ME, Van Driest SL, Robinson JR, Patrick AE, Guardo C, Gangireddy S, Ong H, Feng Q, Carroll R, Kannankeril PJ, Wei WQ. Developing and Evaluating Pediatric Phecodes (Peds-Phecodes) for High-Throughput Phenotyping Using Electronic Health Records. medRxiv: the preprint server for health sciences 2023:2023.08.22.23294435. [PMID: 37662278 PMCID: PMC10473796 DOI: 10.1101/2023.08.22.23294435] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 09/05/2023]
Abstract
Objective Pediatric patients have different diseases and outcomes than adults; however, existing phecodes do not capture the distinctive pediatric spectrum of disease. We aim to develop specialized pediatric phecodes (Peds-Phecodes) to enable efficient, large-scale phenotypic analyses of pediatric patients. Materials and Methods We adopted a hybrid data- and knowledge-driven approach leveraging electronic health records (EHRs) and genetic data from Vanderbilt University Medical Center to modify the most recent version of phecodes to better capture pediatric phenotypes. First, we compared the prevalence of patient diagnoses in pediatric and adult populations to identify disease phenotypes differentially affecting children and adults. We then used clinical domain knowledge to remove phecodes representing phenotypes unlikely to affect pediatric patients and create new phecodes for phenotypes relevant to the pediatric population. We further compared phenome-wide association study (PheWAS) outcomes replicating known pediatric genotype-phenotype associations between Peds-Phecodes and phecodes. Results The Peds-Phecodes aggregate 15,533 ICD-9-CM codes and 82,949 ICD-10-CM codes into 2,051 distinct phecodes. Peds-Phecodes replicated more known pediatric genotype-phenotype associations than phecodes (248 versus 192 out of 687 SNPs, p<0.001). Discussion We introduce Peds-Phecodes, a high-throughput EHR phenotyping tool tailored for use in pediatric populations. We successfully validated the Peds-Phecodes using genetic replication studies. Our findings also reveal the potential use of Peds-Phecodes in detecting novel genotype-phenotype associations for pediatric conditions. We expect that Peds-Phecodes will facilitate large-scale phenomic and genomic analyses in pediatric populations. Conclusion Peds-Phecodes capture higher-quality pediatric phenotypes and deliver superior PheWAS outcomes compared to phecodes.
5
Bakken S. Advancing phenotyping through informatics innovation. J Am Med Inform Assoc 2023; 30:211-212. [PMID: 36651578 PMCID: PMC9846669 DOI: 10.1093/jamia/ocac247] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2022] [Accepted: 12/09/2022] [Indexed: 01/19/2023]
Affiliation(s)
- Suzanne Bakken
  - Corresponding Author: Suzanne Bakken, PhD, RN, FAAN, FACMI, FIAHSI, Department of Biomedical Informatics, Data Science Institute, School of Nursing, Columbia University, 630 W. 168th Street, New York, NY 10032, USA
6
Haupert SR, Shi X, Chen C, Fritsche LG, Mukherjee B. A Case-Crossover Phenome-wide association study (PheWAS) for understanding Post-COVID-19 diagnosis patterns. J Biomed Inform 2022; 136:104237. [PMID: 36283580 PMCID: PMC9595430 DOI: 10.1016/j.jbi.2022.104237] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/19/2022] [Revised: 09/30/2022] [Accepted: 10/19/2022] [Indexed: 11/08/2022]
Abstract
BACKGROUND Post COVID-19 condition (PCC) is known to affect a large proportion of COVID-19 survivors. Robust study design and methods are needed to understand post-COVID-19 diagnosis patterns in all survivors, not just those clinically diagnosed with PCC. METHODS We applied a case-crossover Phenome-Wide Association Study (PheWAS) in a retrospective cohort of COVID-19 survivors, comparing the occurrences of 1,671 diagnosis-based phenotype codes (PheCodes) in the pre- and post-COVID-19 infection periods in the same individual using conditional logistic regression. We studied how this pattern varied by COVID-19 severity and vaccination status, and we compared it to test-negative and test-negative but flu-positive controls. RESULTS In 44,198 SARS-CoV-2-positive patients, we found enrichment in respiratory, circulatory, and mental health disorders post-COVID-19 infection. Top hits included anxiety disorder (p = 2.8e-109, OR = 1.7 [95% CI: 1.6-1.8]), cardiac dysrhythmias (p = 4.9e-87, OR = 1.7 [95% CI: 1.6-1.8]), and respiratory failure, insufficiency, arrest (p = 5.2e-75, OR = 2.9 [95% CI: 2.6-3.3]). In severe patients, we found stronger associations with respiratory and circulatory disorders compared to mild/moderate patients. Fully vaccinated patients had mental health and chronic circulatory diseases rise to the top of the association list, similar to the mild/moderate cohort. Both control groups (test negative; test negative and flu positive) showed a different pattern of hits from SARS-CoV-2 positives. CONCLUSIONS Patients experience myriad symptoms more than 28 days after SARS-CoV-2 infection, especially respiratory, circulatory, and mental health disorders. Our case-crossover PheWAS approach controls for within-person confounders that are time-invariant. Comparison to test-negative and test-negative but flu-positive patients with a similar design helped identify enrichment specific to COVID-19. This design may be applied to other emerging diseases with long-lasting effects other than a SARS-CoV-2 infection. Given the potential for bias from observational data, these results should be considered exploratory. As we look into the future, we must be aware of COVID-19 survivors' healthcare needs.
Affiliation(s)
- Spencer R Haupert
  - Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI 48109, USA
- Xu Shi
  - Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI 48109, USA
- Chen Chen
  - Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI 48109, USA
- Lars G Fritsche
  - Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI 48109, USA
  - Center for Precision Health Data Science, University of Michigan, Ann Arbor, MI 48109, USA
  - Rogel Cancer Center, University of Michigan, Ann Arbor, MI 48109, USA
  - Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI 48109, USA
- Bhramar Mukherjee
  - Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI 48109, USA
  - Center for Precision Health Data Science, University of Michigan, Ann Arbor, MI 48109, USA
  - Rogel Cancer Center, University of Michigan, Ann Arbor, MI 48109, USA
  - Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI 48109, USA
  - Department of Epidemiology, University of Michigan School of Public Health, Ann Arbor, MI 48109, USA