1
Wan NC, Grabowska ME, Kerchberger VE, Wei WQ. Exploring beyond diagnoses in electronic health records to improve discovery: a review of the phenome-wide association study. JAMIA Open 2025; 8:ooaf006. [PMID: 40041255 PMCID: PMC11879097 DOI: 10.1093/jamiaopen/ooaf006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2024] [Revised: 12/30/2024] [Accepted: 01/24/2025] [Indexed: 03/06/2025] Open
Abstract
Objective The phenome-wide association study (PheWAS) systematically examines the phenotypic spectrum extracted from electronic health records (EHRs) to uncover correlations between phenotypes and exposures. This review explores methodologies, highlights challenges, and outlines future directions for EHR-driven PheWAS. Materials and Methods We searched the PubMed database for articles spanning from 2010 to 2023, and we collected data regarding exposures, phenotypes, cohorts, terminologies, replication, and ancestry. Results Our search yielded 690 articles. Following exclusion criteria, we identified 291 articles published between January 1, 2010, and December 31, 2023. A total number of 162 (55.6%) articles defined phenomes using phecodes, indicating that research is reliant on the organization of billing codes. Moreover, 72.8% of articles utilized exposures consisting of genetic data, and the majority (69.4%) of PheWAS lacked replication analyses. Discussion Existing literature underscores the need for deeper phenotyping, variability in PheWAS exposure variables, and absence of replication in PheWAS. Current applications of PheWAS mainly focus on cardiovascular, metabolic, and endocrine phenotypes; thus, applications of PheWAS in uncommon diseases, which may lack structured data, remain largely understudied. Conclusions With modern EHRs, future PheWAS should extend beyond diagnosis codes and consider additional data like clinical notes or medications to create comprehensive phenotype profiles that consider severity, temporality, risk, and ancestry. Furthermore, data interoperability initiatives may help mitigate the paucity of PheWAS replication analyses. With the growing availability of data in EHR, PheWAS will remain a powerful tool in precision medicine.
Affiliation(s)
- Nicholas C Wan
  - Department of Biomedical Engineering, Vanderbilt University, Nashville, TN 37240, United States
- Monika E Grabowska
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37302, United States
- Vern Eric Kerchberger
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37302, United States
  - Department of Medicine, Vanderbilt University Medical Center, Nashville, TN 37232, United States
- Wei-Qi Wei
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37302, United States
2
Chow NKN, Tsang CYW, Chan YH, Telaga SA, Ng LYA, Chung CM, Yip YM, Cheung PPH. The effect of pre-COVID and post-COVID vaccination on long COVID: A systematic review and meta-analysis. J Infect 2024; 89:106358. [PMID: 39580033 DOI: 10.1016/j.jinf.2024.106358] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2024] [Revised: 11/10/2024] [Accepted: 11/15/2024] [Indexed: 11/25/2024]
Abstract
BACKGROUND Long COVID affects millions of people and results in a substantial decrease in quality of life. Previous primary studies and reviews attempted to study the effect of vaccination against long COVID, but these studies varied in the cut-off time of long COVID. We adhered to the WHO's definition of long COVID and conducted a systematic review and meta-analysis on the effect of pre-COVID and post-COVID vaccination on long COVID. METHODS We obtained data from 13 databases up to 18 February 2024, including peer reviewed and preprint studies. Our inclusion criteria were: (1) long COVID definition as 3 months or beyond, (2) comparing long COVID symptoms between vaccinated and unvaccinated groups, (3) subjects received vaccinations either before or after infected with COVID, (4) the number of doses received by participants was specified. We extracted study characteristics and data and computed the summary odds ratio (OR) with the DerSimonian and Laird random effects model. We then performed subgroup analyses based on the main vaccine brand and long COVID assessment method. ROBINS-I framework was used for assessment of risk of bias and the GRADE approach was used for evaluating the certainty of evidence. FINDINGS We included data from 25 observational studies (n = 14,128,260) with no randomised controlled trials. One-dose pre-COVID vaccination did not have an effect on long COVID (number of studies = 10, summary OR = 1.01, 95% CI = 0.88-1.15, p-value = 0.896). Two-dose pre-COVID vaccination was associated with a 24% reduced odds of long COVID (number of studies = 15, summary OR = 0.76, 95% CI = 0.65-0.89, p-value = 0.001) and 4 symptoms (fatigue, headache, loss of smell, muscle pain) out of 10 symptoms analysed. The OR of three-dose pre-COVID vaccination against overall long COVID was statistically insignificant but was far away from 1 (number of studies = 3, summary OR = 0.31, 95% CI = 0.05-1.84, p-value = 0.198). 
One-dose post-COVID vaccination was associated with a 15% reduced odds of long COVID (number of studies = 5, summary OR = 0.85, 95% CI = 0.73-0.98, p-value = 0.024). The OR of two-dose post-COVID vaccination against long COVID was statistically insignificant but was far away from 1 (number of studies = 3, summary OR = 0.63, 95% CI = 0.38-1.03, p-value = 0.066). INTERPRETATION Our study suggests that 2-dose pre-COVID vaccination and 1-dose post-COVID vaccination are associated with a lower risk of long COVID. Since long COVID reduces quality of life substantially, vaccination could be a possible measure to maintain quality of life by partially protecting against long COVID.
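The abstract above names the DerSimonian and Laird random effects model for the summary odds ratio. As a rough illustration of how that pooling works, the sketch below implements the standard DerSimonian-Laird estimator on the log-odds scale. It is not the authors' code: the function names and the three example studies are hypothetical, and the per-study standard errors are assumed to be recoverable from 95% confidence intervals.

```python
import math


def se_from_ci(lower, upper, z=1.96):
    """Standard error of log(OR), recovered from a 95% CI given on the OR scale."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def dersimonian_laird(log_ors, ses):
    """Pool per-study log odds ratios with the DerSimonian-Laird estimator.

    Returns (pooled log OR, its standard error, between-study variance tau^2).
    """
    variances = [se ** 2 for se in ses]
    w = [1.0 / v for v in variances]  # inverse-variance (fixed-effect) weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, log_ors)) / sw
    # Cochran's Q measures heterogeneity around the fixed-effect estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_ors))
    df = len(log_ors) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # Random-effects weights add tau^2 to each within-study variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, log_ors)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, tau2


# Hypothetical studies as (OR, CI lower, CI upper); not data from the review.
studies = [(0.70, 0.55, 0.89), (0.85, 0.70, 1.03), (0.60, 0.40, 0.90)]
log_ors = [math.log(o) for o, _, _ in studies]
ses = [se_from_ci(lo, hi) for _, lo, hi in studies]
pooled, se, tau2 = dersimonian_laird(log_ors, ses)
print(f"summary OR = {math.exp(pooled):.2f}, "
      f"95% CI = {math.exp(pooled - 1.96 * se):.2f}-{math.exp(pooled + 1.96 * se):.2f}, "
      f"tau^2 = {tau2:.3f}")
```

When the studies agree closely, tau^2 is truncated to zero and the result reduces to the fixed-effect inverse-variance estimate.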
Affiliation(s)
- Nick King Ngai Chow
  - Li Ka Shing Institute of Health Sciences, The Chinese University of Hong Kong, Hong Kong; Department of Chemical Pathology, The Chinese University of Hong Kong, Hong Kong
- Charmaine Yuk Wah Tsang
  - Li Ka Shing Institute of Health Sciences, The Chinese University of Hong Kong, Hong Kong; Department of Chemical Pathology, The Chinese University of Hong Kong, Hong Kong
- Yan Hei Chan
  - Li Ka Shing Institute of Health Sciences, The Chinese University of Hong Kong, Hong Kong; Department of Chemical Pathology, The Chinese University of Hong Kong, Hong Kong
- Shalina Alisha Telaga
  - Li Ka Shing Institute of Health Sciences, The Chinese University of Hong Kong, Hong Kong; Department of Chemical Pathology, The Chinese University of Hong Kong, Hong Kong
- Lok Yan Andes Ng
  - Li Ka Shing Institute of Health Sciences, The Chinese University of Hong Kong, Hong Kong; Department of Chemical Pathology, The Chinese University of Hong Kong, Hong Kong
- Chit Ming Chung
  - Li Ka Shing Institute of Health Sciences, The Chinese University of Hong Kong, Hong Kong; Department of Chemical Pathology, The Chinese University of Hong Kong, Hong Kong
- Yan Ming Yip
  - Li Ka Shing Institute of Health Sciences, The Chinese University of Hong Kong, Hong Kong; Department of Chemical Pathology, The Chinese University of Hong Kong, Hong Kong
- Peter Pak-Hang Cheung
  - Li Ka Shing Institute of Health Sciences, The Chinese University of Hong Kong, Hong Kong; Department of Chemical Pathology, The Chinese University of Hong Kong, Hong Kong
3
Allaire P, Elsayed NS, Berg RL, Rose W, Shukla SK. Phenome-wide association study identifies new clinical phenotypes associated with Staphylococcus aureus infections. PLoS One 2024; 19:e0303395. [PMID: 38968223 PMCID: PMC11226111 DOI: 10.1371/journal.pone.0303395] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/2023] [Accepted: 04/23/2024] [Indexed: 07/07/2024] Open
Abstract
BACKGROUND Phenome-Wide Association study (PheWAS) is a powerful tool designed to systematically screen clinical observations derived from medical records (phenotypes) for association with a variable of interest. Despite their usefulness, no systematic screening of phenotypes associated with Staphylococcus aureus infections (SAIs) has been done leaving potential novel risk factors or complications undiscovered. METHOD AND COHORTS We tailored the PheWAS approach into a two-stage screening procedure to identify novel phenotypes correlating with SAIs. The first stage screened for co-occurrence of SAIs with other phenotypes within medical records. In the second stage, significant findings were examined for the correlations between their age of onset with that of SAIs. The PheWAS was implemented using the medical records of 754,401 patients from the Marshfield Clinic Health System. Any novel associations discovered were subsequently validated using datasets from TriNetX and All of Us, encompassing 109,884,571 and 118,538 patients respectively. RESULTS Forty-one phenotypes met the significance criteria of a p-value < 3.64e-5 and odds ratios of > 5. Out of these, we classified 23 associations either as risk factors or as complications of SAIs. Three novel associations were discovered and classified either as a risk (long-term use of aspirin) or complications (iron deficiency anemia and anemia of chronic disease). All novel associations were replicated in the TriNetX cohort. In the All of Us cohort, anemia of chronic disease was replicated according to our significance criteria. CONCLUSIONS The PheWAS of SAIs expands our understanding of SAIs interacting phenotypes. Additionally, the novel two-stage PheWAS approach developed in this study can be applied to examine other disease-disease interactions of interest. Due to the possibility of bias inherent in observational data, the findings of this study require further investigation.
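The first-stage significance criteria quoted above (p-value < 3.64e-5 and odds ratio > 5) amount to a two-condition filter over per-phenotype results. The sketch below shows one way such a filter could look. The `Association` record, the function name, and the example rows are hypothetical and do not come from the study; only the two thresholds are taken from the abstract.

```python
from dataclasses import dataclass

# Thresholds as stated in the abstract.
P_THRESHOLD = 3.64e-5
OR_THRESHOLD = 5.0


@dataclass
class Association:
    """One phenotype-level result from a hypothetical PheWAS results table."""
    phenotype: str
    odds_ratio: float
    p_value: float


def passes_screen(a: Association) -> bool:
    """Keep a phenotype only if it meets both criteria at once."""
    return a.p_value < P_THRESHOLD and a.odds_ratio > OR_THRESHOLD


# Made-up rows for illustration only.
results = [
    Association("phenotype A", odds_ratio=7.2, p_value=1.0e-8),   # passes both
    Association("phenotype B", odds_ratio=2.1, p_value=1.0e-10),  # significant, weak effect
    Association("phenotype C", odds_ratio=9.5, p_value=2.0e-3),   # strong effect, not significant
]

hits = [a.phenotype for a in results if passes_screen(a)]
print(hits)
```

Requiring a large effect size in addition to a small p-value keeps very large cohorts from flagging associations that are statistically significant but weak.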
Affiliation(s)
- Patrick Allaire
  - Center for Precision Medicine Research, Marshfield Clinic Research Institute, Marshfield, Wisconsin, United States of America
- Noha S. Elsayed
  - Center for Precision Medicine Research, Marshfield Clinic Research Institute, Marshfield, Wisconsin, United States of America
- Richard L. Berg
  - Research Computing and Analytics, Marshfield Clinic Research Institute, Marshfield, Wisconsin, United States of America
- Warren Rose
  - School of Pharmacy, University of Wisconsin, Madison, Wisconsin, United States of America
- Sanjay K. Shukla
  - Center for Precision Medicine Research, Marshfield Clinic Research Institute, Marshfield, Wisconsin, United States of America
  - Computational and Informatics in Biology and Medicine Program, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
4
Huang SY, Johnathan R, Shah N, Srivastava P, Huang AA, Gress F. Technical Report: Protocol for Characterizing Phenotype Variants Using Phenome-Wide Association Study (PheWAS) Utilizing the Nationwide Inpatient Sample 2020 in Individuals With Pancreatic Cysts and Lung Cancer. Cureus 2023; 15:e50982. [PMID: 38259398 PMCID: PMC10801675 DOI: 10.7759/cureus.50982] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 12/22/2023] [Indexed: 01/24/2024] Open
Abstract
This technical report serves as a comprehensive guide for conducting a phenome-wide association study (PheWAS) utilizing data extracted from the Nationwide Inpatient Sample 2020. Specifically tailored to individuals diagnosed with pancreatic cysts and lung cancer, the report establishes a step-by-step workflow designed to assist researchers in uncovering potential associations within this specific cohort. The methodology outlined in the report ensures clarity and reproducibility by employing a curated cohort sourced from the GitHub repository and executed using R for robust data analysis. The code encompasses pivotal steps, including the utilization of a QQ plot as a crucial diagnostic tool aimed at identifying systematic biases or associations. Additionally, the report incorporates the creation of a Manhattan plot, delving into essential mathematical considerations to enhance the interpretability of the results. Notably, the report elucidates the handling of the International Classification of Disease version 10 (ICD-10) codes, providing a sample approach for their segmentation to analyze associations by diagnostic categories. The segmentation aligns with the guidelines outlined in the American Medical Association's ICD-10-CM 2022, the Complete Official Codebook with Guidelines (American Medical Association Press, 2021), ensuring a standardized and rigorous analytical process. This comprehensive guide equips researchers with the tools and insights needed to navigate the complexities of PheWAS within the context of pancreatic cysts and lung cancer, fostering transparency, reproducibility, and meaningful scientific exploration.
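The abstract above describes segmenting ICD-10 codes by diagnostic category and plotting results on a Manhattan plot. The report itself uses R; the Python sketch below only illustrates the general idea and is not the report's code. The chapter table is a deliberately partial subset of ICD-10-CM chapter ranges, and the function names are hypothetical.

```python
import math

# Partial, illustrative subset of ICD-10-CM chapter ranges (not the full table).
ICD10_CHAPTERS = [
    ("A00", "B99", "Certain infectious and parasitic diseases"),
    ("C00", "D49", "Neoplasms"),
    ("I00", "I99", "Diseases of the circulatory system"),
    ("J00", "J99", "Diseases of the respiratory system"),
    ("K00", "K95", "Diseases of the digestive system"),
]


def icd10_chapter(code: str) -> str:
    """Map an ICD-10-CM code to a chapter using its three-character category."""
    category = code.replace(".", "").upper()[:3]
    for start, end, name in ICD10_CHAPTERS:
        # Categories are a letter plus two digits here, so string order works.
        if start <= category <= end:
            return name
    return "Other (not mapped in this sketch)"


def manhattan_y(p_value: float) -> float:
    """Manhattan plots show -log10(p), so smaller p-values plot higher."""
    return -math.log10(p_value)


# K86.2 (cyst of pancreas) and C34.90 (lung cancer, unspecified site).
for code in ("K86.2", "C34.90"):
    print(code, "->", icd10_chapter(code))
print(manhattan_y(1e-5))
```

Grouping by chapter gives the x-axis blocks of a Manhattan plot, while `manhattan_y` gives the height of each point.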
Affiliation(s)
- Samuel Y Huang
  - Internal Medicine, Icahn School of Medicine at Mount Sinai South Nassau, Oceanside, USA
- Reyes Johnathan
  - Gastroenterology, Icahn School of Medicine at Mount Sinai, New York, USA
- Neal Shah
  - Gastroenterology, Icahn School of Medicine at Mount Sinai, New York, USA
- Pranay Srivastava
  - Gastroenterology, Icahn School of Medicine at Mount Sinai, New York, USA
- Alexander A Huang
  - General Surgery, Northwestern University Feinberg School of Medicine, Chicago, USA
- Frank Gress
  - Gastroenterology, Icahn School of Medicine at Mount Sinai, New York, USA
5
Fritsche LG, Nam K, Du J, Kundu R, Salvatore M, Shi X, Lee S, Burgess S, Mukherjee B. Uncovering associations between pre-existing conditions and COVID-19 Severity: A polygenic risk score approach across three large biobanks. PLoS Genet 2023; 19:e1010907. [PMID: 38113267 PMCID: PMC10763941 DOI: 10.1371/journal.pgen.1010907] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/09/2023] [Revised: 01/03/2024] [Accepted: 12/05/2023] [Indexed: 12/21/2023] Open
Abstract
OBJECTIVE To overcome the limitations associated with the collection and curation of COVID-19 outcome data in biobanks, this study proposes the use of polygenic risk scores (PRS) as reliable proxies of COVID-19 severity across three large biobanks: the Michigan Genomics Initiative (MGI), UK Biobank (UKB), and NIH All of Us. The goal is to identify associations between pre-existing conditions and COVID-19 severity. METHODS Drawing on a sample of more than 500,000 individuals from the three biobanks, we conducted a phenome-wide association study (PheWAS) to identify associations between a PRS for COVID-19 severity, derived from a genome-wide association study on COVID-19 hospitalization, and clinical pre-existing, pre-pandemic phenotypes. We performed cohort-specific PRS PheWAS and a subsequent fixed-effects meta-analysis. RESULTS The current study uncovered 23 pre-existing conditions significantly associated with the COVID-19 severity PRS in cohort-specific analyses, of which 21 were observed in the UKB cohort and two in the MGI cohort. The meta-analysis yielded 27 significant phenotypes predominantly related to obesity, metabolic disorders, and cardiovascular conditions. After adjusting for body mass index, several clinical phenotypes, such as hypercholesterolemia and gastrointestinal disorders, remained associated with an increased risk of hospitalization following COVID-19 infection. CONCLUSION By employing PRS as a proxy for COVID-19 severity, we corroborated known risk factors and identified novel associations between pre-existing clinical phenotypes and COVID-19 severity. Our study highlights the potential value of using PRS when actual outcome data may be limited or inadequate for robust analyses.
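The abstract above mentions cohort-specific PRS PheWAS followed by a fixed-effects meta-analysis. A common form of fixed-effects pooling is the inverse-variance weighted average, sketched below for a single phenotype. This is an assumption about the general technique, not the authors' pipeline. The function name and the per-cohort effect sizes are hypothetical.

```python
import math


def fixed_effects_meta(betas, ses):
    """Inverse-variance weighted fixed-effects meta-analysis for one phenotype.

    betas: per-cohort effect estimates (e.g. log odds ratio per PRS unit).
    ses:   matching standard errors.
    Returns (pooled beta, pooled standard error, two-sided p-value).
    """
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    # Two-sided p-value under a standard normal: 2 * (1 - Phi(|z|)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, p


# Made-up estimates standing in for three cohorts; not results from the study.
cohort_betas = [0.12, 0.08, 0.15]
cohort_ses = [0.03, 0.05, 0.06]
beta, se, p = fixed_effects_meta(cohort_betas, cohort_ses)
print(f"pooled beta = {beta:.3f}, SE = {se:.3f}, p = {p:.2e}")
```

Because the weights are inverse variances, the largest and most precise cohort dominates the pooled estimate. This is one way a phenotype can reach significance in the meta-analysis without doing so in any single cohort.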
Affiliation(s)
- Lars G. Fritsche
  - Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, Michigan, United States of America
  - Center for Precision Health Data Science, University of Michigan School of Public Health, Ann Arbor, Michigan, United States of America
- Kisung Nam
  - Graduate School of Data Science, Seoul National University, Seoul, South Korea
- Jiacong Du
  - Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, Michigan, United States of America
  - Center for Precision Health Data Science, University of Michigan School of Public Health, Ann Arbor, Michigan, United States of America
- Ritoban Kundu
  - Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, Michigan, United States of America
  - Center for Precision Health Data Science, University of Michigan School of Public Health, Ann Arbor, Michigan, United States of America
- Maxwell Salvatore
  - Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, Michigan, United States of America
  - Center for Precision Health Data Science, University of Michigan School of Public Health, Ann Arbor, Michigan, United States of America
- Xu Shi
  - Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, Michigan, United States of America
- Seunggeun Lee
  - Graduate School of Data Science, Seoul National University, Seoul, South Korea
- Stephen Burgess
  - MRC Biostatistics Unit, University of Cambridge, Cambridge, United Kingdom
  - Cardiovascular Epidemiology Unit, University of Cambridge, Cambridge, United Kingdom
- Bhramar Mukherjee
  - Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, Michigan, United States of America
  - Center for Precision Health Data Science, University of Michigan School of Public Health, Ann Arbor, Michigan, United States of America
  - Michigan Institute for Data Science, University of Michigan, Ann Arbor, Michigan, United States of America