1. Jian X, Zhang D, Yu Z, Xu H, Bian J, Wu Y, Tong J, Chen Y. Leveraging undecided cases in chart-reviewed phenotypes to enhance EHR-based association studies. J Biomed Inform 2025; 166:104839. [PMID: 40316004 DOI: 10.1016/j.jbi.2025.104839] [Received: 08/30/2024] [Revised: 03/25/2025] [Accepted: 04/23/2025] [Indexed: 05/04/2025]
Abstract
OBJECTIVES In electronic health record (EHR)-based association studies, phenotyping algorithms efficiently classify patient clinical outcomes into binary categories but are susceptible to misclassification errors. The gold standard, manual chart review, involves clinicians determining the true disease status based on their assessment of health records. These clinician-labeled phenotypes are labor-intensive to obtain and are typically limited to a small subset of patients; they may also introduce a third "undecided" category when phenotypes are indeterminate. We aim to effectively integrate the algorithm-derived and chart-reviewed outcomes when both are available in EHR-based association studies. MATERIAL AND METHODS We propose an augmented estimation method that combines the binary algorithm-derived phenotypes for the entire cohort with the trinary chart-reviewed phenotypes for a small, selected subset. Additionally, a cost-effective outcome-dependent sampling strategy is used to address rare-disease scenarios. The proposed trinary chart-reviewed phenotype integrated cost-effective augmented estimation (TriCA) was evaluated across a wide range of simulation settings and in real-world applications, using EHR data on Alzheimer's disease and related dementias (ADRD) from the OneFlorida+ Clinical Research Network and cohort data on second breast cancer events (SBCE) from Kaiser Permanente Washington. RESULTS Compared to estimation based on random sampling, our augmented method reduced mean squared error by up to 28.3% in simulation studies; compared to estimation using only trinary chart-reviewed phenotypes, our method improved efficiency by up to 33.3% in ADRD data and 50.8% in SBCE data. DISCUSSION Our simulation studies and real-world applications demonstrate that, compared to existing methods, the proposed method provides unbiased estimates with higher statistical efficiency.
CONCLUSION The proposed method effectively combined binary algorithm-derived phenotypes for the whole cohort with trinary chart-reviewed outcomes for a limited validation set, making it applicable to a broader range of applications and enhancing risk factor identification in EHR-based association studies.
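The abstract names an outcome-dependent sampling strategy for rare diseases but does not describe its design. A minimal sketch, assuming the chart-review subset is drawn by stratifying on the binary algorithm-derived label and oversampling algorithm positives, is given here. The function name, its parameters, and the two-stratum design are illustrative assumptions; this is not the TriCA estimator, only the sampling step and the inverse-probability weights such a design implies.

```python
import random


def surrogate_stratified_sample(patients, n_pos, n_neg, seed=0):
    """Hypothetical helper: draw a chart-review subset stratified on the
    algorithm-derived phenotype (1 = algorithm positive, 0 = negative).

    patients: list of (patient_id, algorithm_label) pairs.
    Returns the selected IDs and a per-stratum inverse-probability weight.
    """
    rng = random.Random(seed)
    pos = [pid for pid, label in patients if label == 1]
    neg = [pid for pid, label in patients if label == 0]
    if not (0 < n_pos <= len(pos) and 0 < n_neg <= len(neg)):
        raise ValueError("each stratum needs between 1 and stratum-size draws")
    selected = rng.sample(pos, n_pos) + rng.sample(neg, n_neg)
    # Sampling without replacement within a stratum gives every member the
    # same inclusion probability n / N; its inverse is the design weight.
    weights = {1: len(pos) / n_pos, 0: len(neg) / n_neg}
    return selected, weights


# Hypothetical rare-disease cohort: 50 algorithm positives among 1,000 patients.
cohort = [(i, 1 if i < 50 else 0) for i in range(1000)]
ids, w = surrogate_stratified_sample(cohort, n_pos=25, n_neg=25)
print(len(ids), w)
```

A simple random sample of 50 charts from this hypothetical cohort would be expected to contain only about 2.5 algorithm positives; stratifying guarantees 25, and the unequal weights (2.0 versus 38.0 here) let a downstream estimator account for the unequal selection.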
Affiliation(s)
- Xinyao Jian
- The Center for Health Analytics and Synthesis of Evidence (CHASE), University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA; Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, The University of Pennsylvania, Philadelphia, PA, USA
- Dazheng Zhang
- The Center for Health Analytics and Synthesis of Evidence (CHASE), University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA; Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, The University of Pennsylvania, Philadelphia, PA, USA
- Zehao Yu
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA
- Hua Xu
- Department of Biomedical Informatics and Data Science, School of Medicine, Yale University, New Haven, CT, USA
- Jiang Bian
- Department of Biostatistics and Health Data Science, School of Medicine, Indiana University, Indianapolis, IN, USA; Regenstrief Institute, Indianapolis, IN, USA
- Yonghui Wu
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA; Cancer Informatics Shared Resource, University of Florida Health Cancer Center, Gainesville, FL, USA
- Jiayi Tong
- The Center for Health Analytics and Synthesis of Evidence (CHASE), University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA; Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, The University of Pennsylvania, Philadelphia, PA, USA; Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Yong Chen
- The Center for Health Analytics and Synthesis of Evidence (CHASE), University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA; Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, The University of Pennsylvania, Philadelphia, PA, USA; The Graduate Group in Applied Mathematics and Computational Science, School of Arts and Sciences, University of Pennsylvania, Philadelphia, PA, USA; Leonard Davis Institute of Health Economics, Philadelphia, PA, USA; Penn Medicine Center for Evidence-based Practice (CEP), Philadelphia, PA, USA; Penn Institute for Biomedical Informatics (IBI), Philadelphia, PA, USA.
2. Liu S, Liu Y, Li M, Shang S, Cao Y, Shen X, Huang C. Artificial intelligence in autoimmune diseases: a bibliometric exploration of the past two decades. Front Immunol 2025; 16:1525462. [PMID: 40330462 PMCID: PMC12052778 DOI: 10.3389/fimmu.2025.1525462] [Received: 11/09/2024] [Accepted: 03/27/2025] [Indexed: 05/08/2025]
Abstract
Objective Autoimmune diseases have long been recognized for their intricate nature and elusive mechanisms, which present significant challenges in both diagnosis and treatment. The advent of artificial intelligence (AI) technology has opened up new possibilities for understanding, diagnosing, predicting, and managing autoimmune disorders. This study aims to explore the current state and emerging trends in the field through bibliometric analysis, providing guidance for future research directions. Methods The study used the Web of Science Core Collection database for data acquisition and performed bibliometric analysis using CiteSpace, HistCite Pro, and VOSviewer. Results Over the past two decades, 1,695 publications appeared in this research field, including 1,409 research articles and 286 reviews. This investigation reveals a global development landscape predominantly led by the United States and China. The research identifies key institutions, such as Brigham & Women's Hospital; influential journals, such as the Annals of the Rheumatic Diseases; distinguished authors, including Katherine P. Liao; and pivotal articles. It visually maps the evolutionary path of the research clusters over time and explores their applications in patient identification, risk factors, prognosis assessment, diagnosis, classification of disease subtypes, monitoring and decision support, and drug discovery. Conclusion AI is increasingly recognized for its potential in the field of autoimmune diseases, yet it continues to face numerous challenges, including insufficient model validation, difficulties in data integration, and limited computational power. Significant advances are needed to enhance diagnostic precision, improve treatment methodologies, and establish robust frameworks for data protection, thereby facilitating more effective management of these complex conditions.
Affiliation(s)
- Sidi Liu
- Department of Rheumatology and Immunology, The First Affiliated Hospital of Anhui University of Traditional Chinese Medicine, Hefei, Anhui, China
- Center for Xin’an Medicine and Modernization of Traditional Chinese Medicine of Institute of Health and Medicine (IHM), The First Affiliated Hospital of Anhui University of Traditional Chinese Medicine, Hefei, Anhui, China
- Yang Liu
- Department of Orthopedics, The First Affiliated Hospital of Anhui University of Traditional Chinese Medicine, Hefei, Anhui, China
- Ming Li
- Department of Rheumatology and Immunology, The First Affiliated Hospital of Anhui University of Traditional Chinese Medicine, Hefei, Anhui, China
- Center for Xin’an Medicine and Modernization of Traditional Chinese Medicine of Institute of Health and Medicine (IHM), The First Affiliated Hospital of Anhui University of Traditional Chinese Medicine, Hefei, Anhui, China
- Shuangshuang Shang
- Department of Rheumatology and Immunology, The First Affiliated Hospital of Anhui University of Traditional Chinese Medicine, Hefei, Anhui, China
- Center for Xin’an Medicine and Modernization of Traditional Chinese Medicine of Institute of Health and Medicine (IHM), The First Affiliated Hospital of Anhui University of Traditional Chinese Medicine, Hefei, Anhui, China
- Yunxiang Cao
- Department of Rheumatology and Immunology, The First Affiliated Hospital of Anhui University of Traditional Chinese Medicine, Hefei, Anhui, China
- Center for Xin’an Medicine and Modernization of Traditional Chinese Medicine of Institute of Health and Medicine (IHM), The First Affiliated Hospital of Anhui University of Traditional Chinese Medicine, Hefei, Anhui, China
- Xi Shen
- Department of Rheumatology and Immunology, The First Affiliated Hospital of Anhui University of Traditional Chinese Medicine, Hefei, Anhui, China
- Center for Xin’an Medicine and Modernization of Traditional Chinese Medicine of Institute of Health and Medicine (IHM), The First Affiliated Hospital of Anhui University of Traditional Chinese Medicine, Hefei, Anhui, China
- Chuanbing Huang
- Department of Rheumatology and Immunology, The First Affiliated Hospital of Anhui University of Traditional Chinese Medicine, Hefei, Anhui, China
- Center for Xin’an Medicine and Modernization of Traditional Chinese Medicine of Institute of Health and Medicine (IHM), The First Affiliated Hospital of Anhui University of Traditional Chinese Medicine, Hefei, Anhui, China
3. Morley TJ, Willimitis D, Ripperger M, Lee H, Zhou Y, Han L, Kang J, Meyerson WU, Smoller JW, Choi KW, Walsh CG, Ruderfer DM. Evaluating the impact of modeling choices on the performance of integrated genetic and clinical models. Genet Med 2025; 27:101353. [PMID: 39733260 DOI: 10.1016/j.gim.2024.101353] [Received: 04/24/2024] [Revised: 12/18/2024] [Accepted: 12/18/2024] [Indexed: 12/30/2024]
Abstract
PURPOSE Studies of the value of genetic information for improving the performance of clinical risk prediction models have reached variable conclusions. Many methodological decisions have the potential to contribute to these differing results. We performed multiple modeling experiments integrating clinical and demographic data from electronic health records with genetic data to understand which decisions may affect performance. METHODS Clinical data in the form of structured diagnostic codes, medications, procedural codes, and demographics were extracted from 2 large independent health systems, and polygenic risk scores (PRS) were generated for all patients of European ancestry with genetic data in the corresponding biobanks. Crohn's disease was studied because of its substantial genetic component, established electronic health record-based definition, and sufficient prevalence for training and testing. We investigated the impact of choices regarding the PRS integration method, training sample, model complexity, and performance metrics. RESULTS Overall, including PRS resulted in higher performance, but this gain was robust only in situations with limited clinical information. We found consistent performance increases from more compute-intensive models, such as random forest, but the impact of other decisions varied by site. CONCLUSION This work highlights the importance of considering methodological decision points when interpreting the impact of PRS on prediction performance in clinical models.
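The abstract treats the performance metric as one of the decision points but does not name the metrics used. As a minimal sketch, assuming discrimination is summarized by the area under the ROC curve (AUC), the AUC can be computed without any library as the probability that a randomly chosen case scores higher than a randomly chosen control, counting ties as one half (the Mann-Whitney formulation). The risk scores in the usage lines are hypothetical, not results from the study.

```python
def auc(scores, labels):
    """AUC via the Mann-Whitney U statistic; case-control ties count 1/2.

    Uses an O(n_cases * n_controls) pairwise comparison: fine for
    illustration, too slow for biobank-scale data, where a rank-based
    version is preferable.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical predicted risks for the same six patients from two models.
labels = [1, 1, 1, 0, 0, 0]
clinical_only = [0.70, 0.40, 0.35, 0.50, 0.30, 0.20]
clinical_plus_prs = [0.75, 0.55, 0.40, 0.45, 0.30, 0.20]
print(auc(clinical_only, labels), auc(clinical_plus_prs, labels))
```

Here the hypothetical PRS-augmented scores move one case above one control, raising the AUC from 7/9 to 8/9; on real data the comparison would be made on a held-out test set, which is itself one of the design choices the study examines.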
Affiliation(s)
- Theodore J Morley
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN; Center for Digital Genomic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN
- Drew Willimitis
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN
- Michael Ripperger
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN
- Hyunjoon Lee
- Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA; Center for Precision Psychiatry, Department of Psychiatry, Massachusetts General Hospital, Boston, MA
- Yu Zhou
- Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA; Center for Precision Psychiatry, Department of Psychiatry, Massachusetts General Hospital, Boston, MA
- Lide Han
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN; Center for Digital Genomic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN
- Jooeun Kang
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN
- William U Meyerson
- Center for Precision Psychiatry, Department of Psychiatry, Massachusetts General Hospital, Boston, MA; Department of Biomedical Informatics, Harvard Medical School, Boston, MA
- Jordan W Smoller
- Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA; Center for Precision Psychiatry, Department of Psychiatry, Massachusetts General Hospital, Boston, MA; Stanley Center for Psychiatric Research, Broad Institute, Cambridge, MA
- Karmel W Choi
- Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA; Center for Precision Psychiatry, Department of Psychiatry, Massachusetts General Hospital, Boston, MA
- Colin G Walsh
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN; Center for Digital Genomic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN; Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN; Department of Psychiatry and Behavioral Sciences, Vanderbilt University Medical Center, Nashville, TN
- Douglas M Ruderfer
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN; Center for Digital Genomic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN; Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN; Department of Psychiatry and Behavioral Sciences, Vanderbilt University Medical Center, Nashville, TN
4. Park J, Levin MG, Zhang D, Reza N, Mead JO, Carruth ED, Kelly MA, Winters A, Kripke CM, Judy RL, Damrauer SM, Owens AT, Bastarache L, Verma A, Kinnamon DD, Hershberger RE, Ritchie MD, Rader DJ. Bidirectional Risk Modulator and Modifier Variant of Dilated and Hypertrophic Cardiomyopathy in BAG3. JAMA Cardiol 2024; 9:1124-1133. [PMID: 39535783 PMCID: PMC11561727 DOI: 10.1001/jamacardio.2024.3547] [Received: 05/14/2024] [Accepted: 08/23/2024] [Indexed: 11/16/2024]
Abstract
Importance The genetic factors that modulate the reduced penetrance and variable expressivity of heritable dilated cardiomyopathy (DCM) are largely unknown. BAG3 genetic variants have been implicated in both DCM and hypertrophic cardiomyopathy (HCM), nominating BAG3 as a gene that harbors potential modifier variants in DCM. Objective To interrogate the clinical traits and diseases associated with BAG3 coding variation. Design, Setting, and Participants This was a cross-sectional study in the Penn Medicine BioBank (PMBB) enrolling patients of the University of Pennsylvania Health System's clinical practice sites from 2014 to 2023. Whole-exome sequencing (WES) was linked to electronic health record (EHR) data to associate BAG3 coding variants with EHR phenotypes. This was a health care population-based study including individuals of European and African genetic ancestry in the PMBB with WES linked to EHR phenotypes, with replication studies in BioVU, UK Biobank, MyCode, and DCM Precision Medicine Study. Exposures Carrier status for BAG3 coding variants. Main Outcomes and Measures Association of BAG3 coding variation with clinical diagnoses, echocardiographic traits, and longitudinal outcomes. Results In PMBB (n = 43 731; median [IQR] age, 65 [50-76] years; 21 907 female [50.1%]), among 30 324 European and 11 198 African individuals, the common C151R variant was associated with decreased risk for DCM (odds ratio [OR], 0.85; 95% CI, 0.78-0.92) and simultaneous increased risk for HCM (OR, 1.59; 95% CI, 1.25-2.02), which was confirmed in the replication cohorts. C151R carriers exhibited improved longitudinal outcomes compared with noncarriers as assessed by age at death (hazard ratio [HR], 0.85; 95% CI, 0.74-0.96; median [IQR] age, 71.8 [63.1-80.7] in carriers and 70.3 [61.6-79.2] in noncarriers) and heart transplant (HR, 0.81; 95% CI, 0.66-0.99; median [IQR] age, 56.7 [46.1-63.1] in carriers and 55.6 [45.2-62.9] in noncarriers). 
C151R was associated with reduced risk of DCM (OR, 0.42; 95% CI, 0.24-0.74) and heart failure (OR, 0.27; 95% CI, 0.14-0.50) among individuals harboring truncating TTN variants in exons with high cardiac expression (n = 358). Conclusions and Relevance BAG3 C151R was identified as a bidirectional modulator of risk along the DCM-HCM spectrum, as well as an important genetic modifier variant in TTN-mediated DCM. This work expands on the understanding of the etiology and penetrance of DCM, suggesting that BAG3 C151R is an important genetic modifier variant contributing to the variable expressivity of DCM, warranting further exploration of its mechanisms and of genetic modifiers in DCM more broadly.
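The ORs above come with 95% CIs that, in biobank association studies, are usually estimated by regression models with covariate adjustment, which the abstract does not detail. For the unadjusted case only, a minimal sketch of the odds ratio and its Woolf (log-scale Wald) 95% CI from a 2 × 2 table of carriers versus noncarriers follows; the counts in the usage line are hypothetical.

```python
import math

Z_975 = 1.959963984540054  # 97.5th percentile of the standard normal


def odds_ratio_wald(a, b, c, d, z=Z_975):
    """Unadjusted odds ratio with a Woolf (log-scale Wald) confidence interval.

    a: carriers with the outcome      b: carriers without the outcome
    c: noncarriers with the outcome   d: noncarriers without the outcome
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive; consider a continuity correction")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical counts, not PMBB data.
print(odds_ratio_wald(20, 10, 10, 20))
```

Because the interval is built on the log scale, it is symmetric around log(OR) rather than around the OR itself; an OR below 1 whose CI excludes 1, such as 0.85 (0.78-0.92) for DCM above, corresponds to a log-OR interval lying entirely below 0.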
Affiliation(s)
- Joseph Park
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia
- Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia
- Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia
- Department of Medicine, Weill Cornell Medicine, NewYork-Presbyterian Hospital, New York
- Michael G. Levin
- Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia
- David Zhang
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia
- Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia
- Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia
- Nosheen Reza
- Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia
- Jonathan O. Mead
- Division of Human Genetics, Department of Internal Medicine, The Ohio State University, Columbus
- Eric D. Carruth
- Department of Genomic Health, Geisinger, Danville, Pennsylvania
- Alex Winters
- Autism and Developmental Medicine Institute, Geisinger, Danville, Pennsylvania
- Colleen M. Kripke
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia
- Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia
- Renae L. Judy
- Department of Surgery, Corporal Michael Crescenz VA Medical Center, Philadelphia, Pennsylvania
- Scott M. Damrauer
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia
- Department of Surgery, Corporal Michael Crescenz VA Medical Center, Philadelphia, Pennsylvania
- Department of Surgery, Perelman School of Medicine, University of Pennsylvania, Philadelphia
- Anjali T. Owens
- Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia
- Lisa Bastarache
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee
- Anurag Verma
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia
- Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia
- Daniel D. Kinnamon
- Division of Human Genetics, Department of Internal Medicine, The Ohio State University, Columbus
- Ray E. Hershberger
- Division of Human Genetics, Department of Internal Medicine, The Ohio State University, Columbus
- Division of Cardiovascular Medicine, Department of Internal Medicine, and the Dorothy M. Davis Heart and Lung Research Institute, The Ohio State University, Columbus
- Marylyn D. Ritchie
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia
- Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia
- Daniel J. Rader
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia
- Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia
- Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia
5. Kerchberger VE, McNeil JB, Zheng N, Chang D, Rosenberger C, Rogers AJ, Bastarache JA, Feng Q, Wei WQ, Ware LB. Electronic health record biobank cohort recapitulates an association between the MUC5B promoter polymorphism and ARDS in critically ill adults. medRxiv: The Preprint Server for Health Sciences 2024:2024.08.26.24312498. [PMID: 39252926 PMCID: PMC11383515 DOI: 10.1101/2024.08.26.24312498] [Indexed: 09/11/2024]
Abstract
Background Large population-based DNA biobanks linked to electronic health records (EHRs) may provide novel opportunities to identify genetic drivers of acute respiratory distress syndrome (ARDS). Research Question Can we develop an EHR-based algorithm to identify ARDS in a biobank database, and can this algorithm validate a previously reported ARDS genetic risk factor? Study Design and Methods We analyzed two parallel genotyped cohorts: a prospective biomarker cohort of critically ill adults (VALID) and a retrospective cohort of hospitalized participants enrolled in a de-identified EHR biobank (BioVU). ARDS was identified by clinician-investigator review in VALID and by an EHR algorithm in BioVU (EHR-ARDS). We tested the association between the MUC5B promoter polymorphism rs35705950 and development of ARDS, and assessed whether age modified this genetic association in each cohort. Results In VALID, 2,795 patients were included; median [IQR] age was 55 [43, 66] years, and 718 (25.7%) developed ARDS. In BioVU, 9,025 hospitalized participants were included; median [IQR] age was 60 [48, 70] years, and 1,056 (11.7%) developed EHR-ARDS. We observed a significant age-related interaction effect on ARDS in VALID: among older patients, rs35705950 was associated with increased ARDS risk (OR: 1.44; 95% CI: 1.08-1.92; p=0.012), whereas among younger patients this effect was absent (OR: 0.84; 95% CI: 0.62-1.14; p=0.26). In BioVU, rs35705950 was associated with increased risk of EHR-ARDS among all participants (OR: 1.20; 95% CI: 1.00-1.43; p=0.043), and this association did not vary by age. The polymorphism was also associated with worse oxygenation in mechanically ventilated BioVU participants but had no association with oxygenation in VALID. Interpretation The MUC5B promoter polymorphism was associated with ARDS in two cohorts of at-risk adults. Although age-related effect modification was observed only in VALID, BioVU identified a consistent association between MUC5B and ARDS risk regardless of age, as well as a novel association with oxygenation impairment.
Our study highlights the potential for EHR biobanks to enable precision-medicine ARDS studies.
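The VALID finding rests on an age-by-genotype interaction term whose model details the abstract does not give. A rougher, summary-level check that is sometimes used when only stratum-specific ORs and CIs are published is to compare the two log ORs directly, recovering each standard error from its 95% CI. The sketch assumes the two strata are independent and that each CI is symmetric on the log scale; it is a different method from the authors' interaction model, and the strata in the usage line are hypothetical.

```python
import math

Z_975 = 1.959963984540054  # 97.5th percentile of the standard normal


def se_from_ci(lower, upper, z=Z_975):
    """SE of log(OR) recovered from a 95% CI assumed symmetric on the log scale."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def compare_odds_ratios(or1, ci1, or2, ci2, z=Z_975):
    """Ratio of two ORs from independent strata, with a normal-approximation test.

    Returns (ratio_of_ors, z_statistic, two_sided_p).
    """
    diff = math.log(or1) - math.log(or2)
    # SEs of independent log ORs combine in quadrature.
    se_diff = math.hypot(se_from_ci(*ci1, z=z), se_from_ci(*ci2, z=z))
    z_stat = diff / se_diff
    p_two_sided = math.erfc(abs(z_stat) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return math.exp(diff), z_stat, p_two_sided


# Hypothetical strata: OR 2.0 (1.2-3.3) in older and 1.0 (0.6-1.7) in younger patients.
print(compare_odds_ratios(2.0, (1.2, 3.3), 1.0, (0.6, 1.7)))
```

Published CIs are rounded, so standard errors recovered this way are approximate; a model with an explicit interaction term fitted to individual-level data, as in the study, remains the more direct analysis.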
6. Tan MCB, Isom CA, Liu Y, Trégouët DA, Wu L, Zhou D, Gamazon ER. Transcriptome-wide association study and Mendelian randomization in pancreatic cancer identifies susceptibility genes and causal relationships with type 2 diabetes and venous thromboembolism. EBioMedicine 2024; 106:105233. [PMID: 39002386 PMCID: PMC11284564 DOI: 10.1016/j.ebiom.2024.105233] [Received: 01/18/2024] [Revised: 06/19/2024] [Accepted: 06/25/2024] [Indexed: 07/15/2024]
Abstract
BACKGROUND Two important questions regarding the genetics of pancreatic adenocarcinoma (PDAC) are (1) which germline genetic variants influence the incidence of this cancer, and (2) whether PDAC causally predisposes to associated non-malignant phenotypes, such as type 2 diabetes (T2D) and venous thromboembolism (VTE). METHODS In this study of 8,803 patients with PDAC and 67,523 controls, we first performed a large-scale transcriptome-wide association study to investigate the association between genetically determined gene expression in normal pancreas tissue and PDAC risk. Second, we used Mendelian randomization (MR) to analyse the causal relationships among PDAC, T2D (74,124 cases and 824,006 controls), and VTE (30,234 cases and 172,122 controls). FINDINGS Sixteen genes showed an association with PDAC risk (FDR <0.10), including six genes not previously reported for PDAC risk (PPIP5K2, TFR2, HNF4G, LRRC10B, PRC1, and FBXL20) and ten previously reported genes (INHBA, SMC2, ABO, PDX1, MTMR6, ACOT2, PGAP3, STARD3, GSDMB, and ADAM33). MR supported a causal effect of PDAC on T2D using genetic instruments in the HNF4G and PDX1 loci, and unidirectional causality of VTE on PDAC involving the ABO locus (OR 2.12, P < 1e-7). No evidence of a causal effect of PDAC on VTE was found. INTERPRETATION These analyses identified candidate susceptibility genes and disease relationships for PDAC that warrant further investigation. HNF4G and PDX1 may induce PDAC-associated diabetes, whereas ABO may underlie the causal effect of VTE on PDAC. FUNDING National Institutes of Health (USA).
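The abstract does not state which MR estimator was used. For orientation, a minimal sketch of the fixed-effect inverse-variance weighted (IVW) estimator, one standard way to combine per-variant Wald ratios (beta_outcome / beta_exposure) from GWAS summary statistics, is given here. It assumes independent, valid instruments and uses first-order weights that ignore uncertainty in the exposure effects; the variant effect sizes in the usage lines are hypothetical. With a binary outcome analysed on the log-odds scale, the estimate can be exponentiated to an OR.

```python
import math


def ivw_mr(beta_x, beta_y, se_y):
    """Fixed-effect inverse-variance weighted MR estimate from summary statistics.

    beta_x: per-variant effects on the exposure
    beta_y: per-variant effects on the outcome
    se_y:   standard errors of beta_y (first-order weights)
    Returns (estimate, standard_error).
    """
    if not beta_x or not (len(beta_x) == len(beta_y) == len(se_y)):
        raise ValueError("inputs must be non-empty and of equal length")
    # Equivalent to a weighted mean of the Wald ratios by/bx with
    # weights bx^2 / se_y^2, written so that no division by bx occurs.
    num = sum(bx * by / s ** 2 for bx, by, s in zip(beta_x, beta_y, se_y))
    den = sum(bx ** 2 / s ** 2 for bx, s in zip(beta_x, se_y))
    return num / den, math.sqrt(1.0 / den)


# Hypothetical three-variant instrument, not data from the study.
est, se = ivw_mr([0.10, 0.20, 0.15], [0.05, 0.09, 0.08], [0.02, 0.03, 0.02])
print(est, se, math.exp(est))
```

With a single variant the IVW estimate reduces to that variant's Wald ratio, with standard error se_y / |beta_x|; analyses built on one locus, such as the ABO instrument above, are in that regime.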
Affiliation(s)
- Marcus C B Tan
- Division of Surgical Oncology and Endocrine Surgery, Section of Surgical Sciences, Vanderbilt University Medical Center, Nashville, TN, USA; Department of Cell and Developmental Biology, Vanderbilt University, Nashville, TN, USA; Vanderbilt-Ingram Cancer Center, Nashville, TN, USA
- Chelsea A Isom
- Herbert Wertheim School of Public Health & Human Longevity Science, University of California, San Diego, San Diego, CA, USA
- Yangzi Liu
- Vanderbilt University School of Medicine, Nashville, TN, USA
- Lang Wu
- Cancer Epidemiology Division, Population Sciences in the Pacific Program, University of Hawai'i Cancer Center, University of Hawai'i at Mānoa, Honolulu, HI, USA
- Dan Zhou
- School of Public Health and the Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China; The Key Laboratory of Intelligent Preventive Medicine of Zhejiang Province, Hangzhou, China.
- Eric R Gamazon
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA; Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA; Clare Hall, University of Cambridge, Cambridge, United Kingdom.
7. Li L, Wang C, Ye Z, Van Spall HGC, Zhang J, Lip GYH, Li G. Association Between Remnant Cholesterol and Risk of Incident Atrial Fibrillation: Population-Based Evidence From a Large-Scale Prospective Cohort Study. J Am Heart Assoc 2024; 13:e033840. [PMID: 38761084 PMCID: PMC11179806 DOI: 10.1161/jaha.123.033840] [Received: 12/04/2023] [Accepted: 04/15/2024] [Indexed: 05/20/2024]
Abstract
BACKGROUND Evidence on the relationship between remnant cholesterol (RC) and risk of incident atrial fibrillation (AF) remains limited. METHODS AND RESULTS Participants were enrolled between 2006 and 2010 and followed up to 2021. A multivariable Cox proportional hazards model was used to examine the relationship between RC quartiles and risk of incident AF. Subgroup and sensitivity analyses were performed to explore potential modification of the association and the robustness of the main findings. A total of 422 316 participants (mean age, 56 years; 54% women) were included in the analyses. During a median follow-up of 11.9 years (first quartile-third quartile, 11.6-13.2 years), 24 774 AF events were documented, an incidence of 4.92 events per 1000 person-years (95% CI, 4.86-4.98). Participants in higher RC quartiles had a lower risk of incident AF than those in the lowest (first) quartile: hazard ratio (HR)=0.96 (95% CI, 0.91-1.00) for the second quartile; HR=0.92 (95% CI, 0.88-0.96) for the third quartile; and HR=0.85 (95% CI, 0.81-0.89) for the fourth quartile (P for trend <0.001). The association between RC quartiles and risk of incident AF was stronger in participants aged ≥65 years, in men, and in participants without a history of diabetes than in the corresponding comparison groups (P<0.001 for interaction). CONCLUSIONS In this large-scale prospective cohort study, elevated RC was associated with a lower risk of incident AF.
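The incidence figure above is the number of events divided by accumulated person-time, scaled to 1000 person-years. A minimal sketch, assuming a Poisson model for the event count so that the standard error of the log rate is 1/sqrt(events), is given here; the abstract does not say which interval method the authors used, and the person-time in the usage line is hypothetical.

```python
import math

Z_975 = 1.959963984540054  # 97.5th percentile of the standard normal


def incidence_rate(events, person_years, per=1000.0, z=Z_975):
    """Crude incidence rate per `per` person-years with a log-scale Poisson CI.

    Returns (rate, lower, upper).
    """
    if events <= 0 or person_years <= 0:
        raise ValueError("events and person_years must be positive")
    rate = events / person_years * per
    half_width = z / math.sqrt(events)  # z * SE(log rate), with SE = 1/sqrt(events)
    return rate, rate * math.exp(-half_width), rate * math.exp(half_width)


# Hypothetical cohort: 100 events over 10,000 person-years.
print(incidence_rate(100, 10_000))
```

Because the log-scale half-width depends only on the event count, 24 774 events give 1.96/√24 774 ≈ 0.0125, and 4.92 × exp(±0.0125) spans roughly 4.86 to 4.98, in line with the interval reported above.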
Affiliation(s)
- Likang Li
- Center for Clinical Epidemiology and Methodology, Guangdong Second Provincial General Hospital, Guangzhou, China
- Chuangshi Wang
- National Clinical Research Center for Cardiovascular Diseases, Fuwai Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Zebing Ye
- Department of Cardiology, Guangdong Second Provincial General Hospital, Guangzhou, China
- Harriette G C Van Spall
- Department of Medicine, McMaster University, Hamilton, ON, Canada
- Population Health Research Institute, McMaster University, Hamilton, ON, Canada
- Jingyi Zhang
- Center for Clinical Epidemiology and Methodology, Guangdong Second Provincial General Hospital, Guangzhou, China
- Gregory Y H Lip
- Liverpool Centre for Cardiovascular Science at University of Liverpool, Liverpool John Moores University and Liverpool Heart and Chest Hospital, Liverpool, UK
- Danish Center for Health Services Research, Department of Clinical Medicine, Aalborg University, Aalborg, Denmark
- Guowei Li
- Center for Clinical Epidemiology and Methodology, Guangdong Second Provincial General Hospital, Guangzhou, China
- Father Sean O'Sullivan Research Centre, St. Joseph's Healthcare Hamilton, Hamilton, ON, Canada
8. Kresge HA, Blostein F, Goleva S, Albiñana C, Revez JA, Wray NR, Vilhjálmsson BJ, Zhu Z, McGrath JJ, Davis LK. Phenomewide Association Study of Health Outcomes Associated With the Genetic Correlates of 25 Hydroxyvitamin D Concentration and Vitamin D Binding Protein Concentration. Twin Res Hum Genet 2024; 27:69-79. [PMID: 38644690 PMCID: PMC11138239 DOI: 10.1017/thg.2024.19] [Indexed: 04/23/2024]
Abstract
While it is known that vitamin D deficiency is associated with adverse bone outcomes, it remains unclear whether low vitamin D status may increase the risk of a wider range of health outcomes. We had the opportunity to explore the association between common genetic variants associated with both 25 hydroxyvitamin D (25OHD) and the vitamin D binding protein (DBP, encoded by the GC gene) with a comprehensive range of health disorders and laboratory tests in a large academic medical center. We used summary statistics for 25OHD and DBP to generate polygenic scores (PGS) for 66,482 participants with primarily European ancestry and 13,285 participants with primarily African ancestry from the Vanderbilt University Medical Center Biobank (BioVU). We examined the predictive properties of PGS25OHD and two scores related to DBP concentration with respect to 1322 health-related phenotypes and 315 laboratory-measured phenotypes from electronic health records. In those with European ancestry: (a) the PGS25OHD and PGSDBP scores, and individual SNPs rs4588 and rs7041, were associated with both 25OHD concentration and 1,25 dihydroxyvitamin D concentrations; (b) higher PGS25OHD was associated with decreased concentrations of triglycerides and cholesterol, and reduced risks of vitamin D deficiency, disorders of lipid metabolism, and diabetes. In general, the findings for the African ancestry group were consistent with findings from the European ancestry analyses. Our study confirms the utility of PGS and two key variants within the GC gene (rs4588 and rs7041) to predict the risk of vitamin D deficiency in clinical settings and highlights the shared biology between vitamin D-related genetic pathways and a range of health outcomes.
Affiliation(s)
- Hailey A. Kresge
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
- Freida Blostein
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
- Slavina Goleva
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
- Clara Albiñana
- National Centre for Register-Based Research, Aarhus University, Aarhus V, Denmark
- Department of Psychiatry, University of Oxford, Oxford, UK
- Joana A. Revez
- Institute for Molecular Bioscience, University of Queensland, Brisbane, QLD, Australia
- Naomi R. Wray
- Department of Psychiatry, University of Oxford, Oxford, UK
- Institute for Molecular Bioscience, University of Queensland, Brisbane, QLD, Australia
- Queensland Brain Institute, University of Queensland, Brisbane, QLD, Australia
- Bjarni J. Vilhjálmsson
- National Centre for Register-Based Research, Aarhus University, Aarhus V, Denmark
- Bioinformatics Research Centre, Aarhus University, Aarhus C, Denmark
- Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute, Cambridge, MA, USA
- Zhihong Zhu
- National Centre for Register-Based Research, Aarhus University, Aarhus V, Denmark
- John J. McGrath
- National Centre for Register-Based Research, Aarhus University, Aarhus V, Denmark
- Queensland Brain Institute, University of Queensland, Brisbane, QLD, Australia
- Queensland Centre for Mental Health Research, The Park Centre for Mental Health, Wacol, QLD, Australia
- Lea K. Davis
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
- Division of Genetic Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Division of Neurology, Pharmacology and Special Education, Vanderbilt Kennedy Center, Vanderbilt University Medical Center, Nashville, TN, USA
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- Department of Psychiatry and Behavioral Sciences, Vanderbilt University Medical Center, Nashville, TN, USA
- Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, TN, USA
9
Morley TJ, Willimitis D, Ripperger M, Lee H, Han L, Zhou Y, Kang J, Davis LK, Smoller JW, Choi KW, Walsh CG, Ruderfer DM. Evaluating the impact of modeling choices on the performance of integrated genetic and clinical models. medRxiv: The Preprint Server for Health Sciences 2023:2023.11.01.23297927. [PMID: 37961557 PMCID: PMC10635256 DOI: 10.1101/2023.11.01.23297927] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2023]
Abstract
Studies of the value of genetic information for improving the performance of clinical risk prediction models have yielded variable conclusions. Many methodological decisions have the potential to contribute to differential results across studies. Here, we performed multiple modeling experiments integrating clinical and demographic data from electronic health records (EHR) and genetic data to understand which decision points may affect performance. Clinical data in the form of structured diagnostic codes, medications, procedural codes, and demographics were extracted from two large independent health systems, and polygenic risk scores (PRS) were generated across all patients with genetic data in the corresponding biobanks. Crohn's disease was used as the model phenotype based on its substantial genetic component, established EHR-based definition, and sufficient prevalence for model training and testing. We investigated the impact of PRS integration method, as well as choices regarding training sample, model complexity, and performance metrics. Overall, our results show that including PRS resulted in higher performance by some metrics, but the gain in performance was only robust when combined with demographic data alone. Improvements were inconsistent or negligible after including additional clinical information. The impact of genetic information on performance also varied by PRS integration method, with a small improvement in some cases from combining PRS with the output of a clinical model (late-fusion) compared to its inclusion as an additional feature (early-fusion). The effects of other modeling decisions varied between institutions, though performance increased with more compute-intensive models such as random forest. This work highlights the importance of considering methodological decision points in interpreting the impact on prediction performance when including PRS information in clinical models.
Affiliation(s)
- Theodore J. Morley
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville TN
- Center for Digital Genomic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville TN
- Drew Willimitis
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville TN
- Michael Ripperger
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville TN
- Hyunjoon Lee
- Psychiatric & Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston MA
- Center for Precision Psychiatry, Department of Psychiatry, Massachusetts General Hospital, Boston MA
- Lide Han
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville TN
- Center for Digital Genomic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville TN
- Yu Zhou
- Psychiatric & Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston MA
- Center for Precision Psychiatry, Department of Psychiatry, Massachusetts General Hospital, Boston MA
- Jooeun Kang
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville TN
- Lea K. Davis
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville TN
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville TN
- Department of Psychiatry and Behavioral Sciences, Vanderbilt University Medical Center, Nashville, TN
- Jordan W. Smoller
- Psychiatric & Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston MA
- Center for Precision Psychiatry, Department of Psychiatry, Massachusetts General Hospital, Boston MA
- Stanley Center for Psychiatric Research, Broad Institute, Cambridge, MA
- Karmel W. Choi
- Psychiatric & Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston MA
- Center for Precision Psychiatry, Department of Psychiatry, Massachusetts General Hospital, Boston MA
- Colin G. Walsh
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville TN
- Center for Digital Genomic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville TN
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville TN
- Department of Psychiatry and Behavioral Sciences, Vanderbilt University Medical Center, Nashville, TN
- Douglas M. Ruderfer
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville TN
- Center for Digital Genomic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville TN
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville TN
- Department of Psychiatry and Behavioral Sciences, Vanderbilt University Medical Center, Nashville, TN
10
Robinson-Cohen C, Triozzi JL, Rowan B, He J, Chen HC, Zheng NS, Wei WQ, Wilson OD, Hellwege JN, Tsao PS, Gaziano JM, Bick A, Matheny ME, Chung CP, Lipworth L, Siew ED, Ikizler TA, Tao R, Hung AM. Genome-Wide Association Study of CKD Progression. J Am Soc Nephrol 2023; 34:1547-1559. [PMID: 37261792 PMCID: PMC10482057 DOI: 10.1681/asn.0000000000000170] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2023] [Accepted: 05/25/2023] [Indexed: 06/02/2023]
Abstract
SIGNIFICANCE STATEMENT Rapid progression of CKD is associated with poor clinical outcomes. Most previous studies looking for genetic factors associated with low eGFR have used cross-sectional data. The authors conducted a meta-analysis of genome-wide association studies of eGFR decline among 116,870 participants with CKD, focusing on longitudinal data. They identified three loci (two of them novel) associated with longitudinal eGFR decline. In addition to the known UMOD/PDILT locus, variants within BICC1 were associated with significant differences in longitudinal eGFR slope. Variants within HEATR4 also were associated with differences in eGFR decline, but only among Black/African American individuals without diabetes. These findings help characterize molecular mechanisms of eGFR decline in CKD and may inform new therapeutic approaches for progressive kidney disease. BACKGROUND Rapid progression of CKD is associated with poor clinical outcomes. Despite extensive study of the genetics of cross-sectional eGFR, only a few loci associated with eGFR decline over time have been identified. METHODS We performed a meta-analysis of genome-wide association studies of eGFR decline among 116,870 participants with CKD (defined by two outpatient eGFR measurements of <60 ml/min per 1.73 m², obtained 90-365 days apart) from the Million Veteran Program and Vanderbilt University Medical Center's DNA biobank. The primary outcome was the annualized relative slope in outpatient eGFR. Analyses were stratified by ethnicity and diabetes status and meta-analyzed thereafter. RESULTS In cross-ancestry meta-analysis, the strongest association was rs77924615, near UMOD/PDILT; each copy of the G allele was associated with a 0.30%/yr faster eGFR decline (P = 4.9×10⁻²⁷). We also observed an association within BICC1 (rs11592748), where every additional minor allele was associated with a 0.13%/yr slower eGFR decline (P = 5.6×10⁻⁹). Among participants without diabetes, the strongest association was the UMOD/PDILT variant rs36060036, associated with a 0.27%/yr faster eGFR decline per copy of the C allele (P = 1.9×10⁻¹⁷). Among Black participants, a significantly faster eGFR decline was associated with variant rs16996674 near APOL1 (R²=0.29 with the G1 high-risk genotype); among Black participants with diabetes, lead variant rs11624911 near HEATR4 also was associated with a significantly faster eGFR decline. We also nominally replicated loci with known associations with eGFR decline, near PRKAG2, FGF5, and C15ORF54. CONCLUSIONS Three loci were significantly associated with longitudinal eGFR change at genome-wide significance. These findings help characterize molecular mechanisms of eGFR decline and may contribute to the development of new therapeutic approaches for progressive CKD.
Affiliation(s)
- Cassianne Robinson-Cohen
- Division of Nephrology and Hypertension, Vanderbilt Center for Kidney Disease, Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee
- Jefferson L Triozzi
- Division of Nephrology and Hypertension, Vanderbilt Center for Kidney Disease, Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee
- Bryce Rowan
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee
- Jing He
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee
- Hua C Chen
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee
- Neil S Zheng
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee
- Wei-Qi Wei
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee
- Otis D Wilson
- Division of Nephrology and Hypertension, Vanderbilt Center for Kidney Disease, Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee
- VA Tennessee Valley Healthcare System, Clinical Sciences Research and Development, Nashville, Tennessee
- Jacklyn N Hellwege
- VA Tennessee Valley Healthcare System, Clinical Sciences Research and Development, Nashville, Tennessee
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, Tennessee
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee
- Philip S Tsao
- Department of Medicine, Division of Cardiovascular Medicine, VA Palo Alto Health Care System, Palo Alto, California
- Department of Medicine, Stanford University School of Medicine, Stanford, California
- J Michael Gaziano
- Massachusetts Veterans Epidemiology Research and Information Center, VA Boston Healthcare System, Boston, Massachusetts
- Department of Medicine, Brigham and Women's Hospital and Harvard School of Medicine, Boston, Massachusetts
- Alexander Bick
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee
- Michael E Matheny
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee
- Geriatrics Research Education and Clinical Care Service, VA Tennessee Valley Healthcare System, Nashville, Tennessee
- Cecilia P Chung
- VA Tennessee Valley Healthcare System, Clinical Sciences Research and Development, Nashville, Tennessee
- Division of Rheumatology and Immunology, Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee
- Loren Lipworth
- Division of Epidemiology, Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee
- Edward D Siew
- Division of Nephrology and Hypertension, Vanderbilt Center for Kidney Disease, Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee
- T Alp Ikizler
- Division of Nephrology and Hypertension, Vanderbilt Center for Kidney Disease, Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee
- Ran Tao
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, Tennessee
- Adriana M Hung
- Division of Nephrology and Hypertension, Vanderbilt Center for Kidney Disease, Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee
- VA Tennessee Valley Healthcare System, Clinical Sciences Research and Development, Nashville, Tennessee
11
La Cava WG, Lee PC, Ajmal I, Ding X, Solanki P, Cohen JB, Moore JH, Herman DS. A flexible symbolic regression method for constructing interpretable clinical prediction models. NPJ Digit Med 2023; 6:107. [PMID: 37277550 PMCID: PMC10241925 DOI: 10.1038/s41746-023-00833-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2021] [Accepted: 05/05/2023] [Indexed: 06/07/2023]
Abstract
Machine learning (ML) models trained for triggering clinical decision support (CDS) are typically either accurate or interpretable but not both. Scaling CDS to the panoply of clinical use cases while mitigating risks to patients will require that many ML models be intuitively interpretable for clinicians. To this end, we adapted a symbolic regression method, coined the feature engineering automation tool (FEAT), to train concise and accurate models from high-dimensional electronic health record (EHR) data. We first present an in-depth application of FEAT to classify hypertension, hypertension with unexplained hypokalemia, and apparent treatment-resistant hypertension (aTRH) using EHR data for 1200 subjects receiving longitudinal care in a large healthcare system. FEAT models trained to predict phenotypes adjudicated by chart review had equivalent or higher discriminative performance (p < 0.001) and were at least three times smaller (p < 1 × 10⁻⁶) than other potentially interpretable models. For aTRH, FEAT generated a six-feature, highly discriminative (positive predictive value = 0.70, sensitivity = 0.62), and clinically intuitive model. To assess the generalizability of the approach, we tested FEAT on 25 benchmark clinical phenotyping tasks using the MIMIC-III critical care database. Under comparable dimensionality constraints, FEAT's models exhibited higher area under the receiver operating characteristic curve scores than penalized linear models across tasks (p < 6 × 10⁻⁶). In summary, FEAT can train EHR prediction models that are both intuitively interpretable and accurate, which should facilitate safe and effective scaling of ML-triggered CDS to the panoply of potential clinical use cases and healthcare practices.
Affiliation(s)
- William G La Cava
- Computational Health Informatics Program, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
- Paul C Lee
- Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Imran Ajmal
- Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Xiruo Ding
- Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Priyanka Solanki
- Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Jordana B Cohen
- Division of Renal-Electrolyte and Hypertension, Department of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, PA, USA
- Jason H Moore
- Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, PA, USA
- Daniel S Herman
- Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA, USA
12
Schirle L, Samuels DC, Faucon A, Cox NJ, Bruehl S. Polygenic Contributions to Chronic Overlapping Pain Conditions in a Large Electronic Health Record Sample. J Pain 2023; 24:1056-1068. [PMID: 36736868 PMCID: PMC10257768 DOI: 10.1016/j.jpain.2023.01.018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/02/2022] [Revised: 01/11/2023] [Accepted: 01/23/2023] [Indexed: 02/05/2023]
Abstract
Chronic overlapping pain conditions (COPCs) are believed to share common etiological mechanisms involving central sensitization. Genetic and environmental factors putatively combine to influence susceptibility to central sensitization and COPCs. This study employed a genome-wide polygenic risk score approach to evaluate genetic influences on 8 common COPCs. COPCs were identified by International Classification of Disease codes in Vanderbilt's deidentified clinical biorepository (BioVU), with each COPC condition empirically weighted for the level of central sensitization based on prior work. A centralized pain score (CPS) was calculated for 55,340 individuals by summing the weighted number of COPCs. Overall, 12,502 individuals (22.6%) were diagnosed with at least 1 COPC, with females exhibiting nearly twice the mean CPS of males. To assess the genetic influence on centralized pain in COPCs, 6 pain polygenic risk scores (PRSs) were developed using UK Biobank data to predict 6 pain criteria (no pain, neck/shoulder, abdomen, hip, knee, low back pain). These PRSs were then deployed in the BioVU cohort to test for association with CPS. In regression models adjusted for age, sex, and BMI, all pain PRSs except hip pain were significantly associated with CPS. Our findings support a shared polygenic influence across COPCs potentially involving central sensitization mechanisms. PERSPECTIVE: This study used a polygenic risk score approach to investigate genetic influences on chronic overlapping pain conditions. Significant findings in this study provide evidence supporting previous hypotheses that a shared polygenic influence involving central sensitization may underlie chronic overlapping pain conditions and can guide future biomarker and risk assessment research.
Affiliation(s)
- Lori Schirle
- Vanderbilt University School of Nursing, Nashville, Tennessee
- David C Samuels
- Department of Molecular Physiology and Biophysics, Vanderbilt University School of Medicine, Nashville, Tennessee; Vanderbilt Genetics Institute, Nashville, Tennessee
- Nancy J Cox
- Vanderbilt Genetics Institute, Nashville, Tennessee; Department of Medicine, Division of Genetic Medicine, Vanderbilt University Medical Center, Nashville, Tennessee
- Stephen Bruehl
- Department of Anesthesiology, Vanderbilt University Medical Center, Nashville, Tennessee
13
Adi M, Ghanbari F, Downie ML, Hung A, Robinson-Cohen C, Manousaki D. Effects of 25-Hydroxyvitamin D Levels on Renal Function: A Bidirectional Mendelian Randomization Study. J Clin Endocrinol Metab 2023; 108:1442-1451. [PMID: 36510827 PMCID: PMC10413421 DOI: 10.1210/clinem/dgac724] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2022] [Revised: 12/07/2022] [Accepted: 12/08/2022] [Indexed: 12/15/2022]
Abstract
CONTEXT Observational studies investigating the role of vitamin D in renal function have yielded inconsistent results. OBJECTIVE We tested whether 25-hydroxyvitamin D (25[OH]D) serum levels are associated with renal function, and inversely, whether altered renal function causes changes in 25(OH)D, using Mendelian randomization (MR). METHODS In this two-sample MR study, we used single nucleotide polymorphisms (SNP) associated with 25(OH)D in 443 734 Europeans and evaluated their effects on estimated glomerular filtration rate (eGFR), blood urea nitrogen (BUN), chronic kidney disease (CKD) risk and progression in genome-wide association studies totaling over 1 million Europeans. To control for pleiotropy, we also used SNPs solely in DHCR7, CYP2R1, and GC, all genes with known roles in vitamin D metabolism. We performed a reverse MR, using SNPs for the above indices of renal function to study causal effects on 25(OH)D levels. RESULTS We did not find robust evidence supporting effects of 25(OH)D on eGFR, BUN, and CKD or its progression. Our inverse variance weighted MR demonstrated a 0.56 decrease in standardized log-transformed 25(OH)D (95% CI -0.73, -0.41; P = 2.89 × 10⁻¹²) per unit increase in log-transformed eGFR. Increased BUN was associated with increased 25(OH)D (β = 0.25, 95% CI 0.15, 0.36; P = 4.12 × 10⁻⁶ per unit increase in log-transformed BUN). Finally, genetically predicted CKD conferred a 0.05 increase in standardized log-transformed 25(OH)D level (95% CI 0.04, 0.06; P = 1.06 × 10⁻¹³). Other MR methods confirmed the findings of the main analyses. CONCLUSION Genetically predicted CKD, increased BUN, and decreased eGFR are associated with increased 25(OH)D levels, but we found no causal effect of 25(OH)D on renal function in Europeans.
Affiliation(s)
- Manel Adi
- Department of Biochemistry and Molecular Medicine, University of Montreal, Montreal, QC H3T1J4, Canada
- Faegheh Ghanbari
- Research Center of the Sainte-Justine University Hospital, University of Montreal, Montreal, QC H3T1C5, Canada
- Mallory L Downie
- Department of Renal Medicine, University College London, London NW32PF, UK
- Adriana Hung
- Department of Medicine, Vanderbilt University Medical Center, Veterans Administration Tennessee Valley Healthcare System, Nashville, TN 37212, USA
- Despoina Manousaki
- Department of Biochemistry and Molecular Medicine, University of Montreal, Montreal, QC H3T1J4, Canada
- Research Center of the Sainte-Justine University Hospital, University of Montreal, Montreal, QC H3T1C5, Canada
- Department of Pediatrics, University of Montreal, Montreal, QC H3T1C5, Canada
14
Sochacki AL, Bejan CA, Zhao S, Patel A, Kishtagari A, Spaulding TP, Silver AJ, Stockton SS, Pugh K, Dorand RD, Bhatta M, Strayer N, Zhang S, Snider CA, Stricker T, Nazha A, Bick AG, Xu Y, Savona MR. Patient-specific comorbidities as prognostic variables for survival in myelofibrosis. Blood Adv 2023; 7:756-767. [PMID: 35420683 PMCID: PMC9989522 DOI: 10.1182/bloodadvances.2021006318] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/08/2021] [Revised: 02/23/2022] [Accepted: 03/19/2022] [Indexed: 11/20/2022]
Abstract
Treatment decisions in primary myelofibrosis (PMF) are guided by numerous prognostic systems. Patient-specific comorbidities influence treatment-related survival and are considered in clinical contexts but have not been routinely incorporated into current prognostic models. We hypothesized that patient-specific comorbidities would inform prognosis and could be incorporated into a quantitative score. All patients with PMF or secondary myelofibrosis with available DNA and comprehensive electronic health record (EHR) data treated at Vanderbilt University Medical Center between 1995 and 2016 were identified within Vanderbilt's Synthetic Derivative and BioVU Biobank. We recapitulated established PMF risk scores (eg, Dynamic International Prognostic Scoring System [DIPSS], DIPSS plus, Genetics-Based Prognostic Scoring System, Mutation-Enhanced International Prognostic Scoring System 70+) and comorbidities through EHR chart extraction and next-generation sequencing on biobanked peripheral blood DNA. The impact of comorbidities was assessed via DIPSS-adjusted overall survival using Bonferroni correction. Comorbidities associated with inferior survival included renal failure/dysfunction (hazard ratio [HR], 4.3; 95% confidence interval [95% CI], 2.1-8.9; P = .0001), intracranial hemorrhage (HR, 28.7; 95% CI, 7.0-116.8; P = 2.83e-06), invasive fungal infection (HR, 41.2; 95% CI, 7.2-235.2; P = 2.90e-05), and chronic encephalopathy (HR, 15.1; 95% CI, 3.8-59.4; P = .0001). The extended DIPSS model including all 4 significant comorbidities showed a significantly higher discriminating power (C-index 0.81; 95% CI, 0.78-0.84) than the original DIPSS model (C-index 0.73; 95% CI, 0.70-0.77). In summary, we repurposed an institutional biobank to identify and risk-classify an uncommon hematologic malignancy by established (eg, DIPSS) and other clinical and pathologic factors (eg, comorbidities) in an unbiased fashion. The inclusion of comorbidities into risk evaluation may augment the prognostic capability of future genetics-based scoring systems.
Affiliation(s)
- Andrew L. Sochacki
- Department of Medicine, Vanderbilt University School of Medicine, Nashville, TN
- Cosmin Adrian Bejan
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN
- Shilin Zhao
- Department of Biostatistics, Vanderbilt University School of Medicine, Nashville, TN
- Center for Quantitative Sciences, Vanderbilt University School of Medicine, Nashville, TN
- Ameet Patel
- Department of Medicine, Vanderbilt University School of Medicine, Nashville, TN
- Ashwin Kishtagari
- Department of Medicine, Vanderbilt University School of Medicine, Nashville, TN
- Travis P. Spaulding
- Department of Medicine, Vanderbilt University School of Medicine, Nashville, TN
- Alexander J. Silver
- Department of Medicine, Vanderbilt University School of Medicine, Nashville, TN
- Program in Cancer Biology, Vanderbilt University School of Medicine, Nashville, TN
- Shannon S. Stockton
- Department of Medicine, Vanderbilt University School of Medicine, Nashville, TN
- Kelly Pugh
- Department of Medicine, Vanderbilt University School of Medicine, Nashville, TN
- R. Dixon Dorand
- Department of Medicine, Vanderbilt University School of Medicine, Nashville, TN
- Manasa Bhatta
- Department of Medicine, Vanderbilt University School of Medicine, Nashville, TN
- Nicholas Strayer
- Department of Biostatistics, Vanderbilt University School of Medicine, Nashville, TN
- Siwei Zhang
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN
- Department of Biostatistics, Vanderbilt University School of Medicine, Nashville, TN
- Center for Quantitative Sciences, Vanderbilt University School of Medicine, Nashville, TN
- Christina A. Snider
- Department of Medicine, Vanderbilt University School of Medicine, Nashville, TN
- Thomas Stricker
- Department of Pathology, Microbiology and Immunology, Vanderbilt University School of Medicine, Nashville, TN
- Aziz Nazha
- Leukemia Program, Department of Hematology and Medical Oncology, Cleveland Clinic, Taussig Cancer Center, Cleveland, OH
- Alexander G. Bick
- Department of Medicine, Vanderbilt University School of Medicine, Nashville, TN
- Program in Cancer Biology, Vanderbilt University School of Medicine, Nashville, TN
- Center for Immunobiology, Vanderbilt University School of Medicine, Nashville, TN
- Vanderbilt-Ingram Cancer Center, Vanderbilt University School of Medicine, Nashville, TN
- Yaomin Xu
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN
- Department of Biostatistics, Vanderbilt University School of Medicine, Nashville, TN
- Center for Quantitative Sciences, Vanderbilt University School of Medicine, Nashville, TN
- Michael R. Savona
- Department of Medicine, Vanderbilt University School of Medicine, Nashville, TN
- Program in Cancer Biology, Vanderbilt University School of Medicine, Nashville, TN
- Center for Immunobiology, Vanderbilt University School of Medicine, Nashville, TN
- Vanderbilt-Ingram Cancer Center, Vanderbilt University School of Medicine, Nashville, TN
15
Stefanski KM, Li GC, Marinko JT, Carter BD, Samuels DC, Sanders CR. How T118M peripheral myelin protein 22 predisposes humans to Charcot-Marie-Tooth disease. J Biol Chem 2023; 299:102839. [PMID: 36581210 PMCID: PMC9860121 DOI: 10.1016/j.jbc.2022.102839] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 09/08/2022] [Revised: 12/19/2022] [Accepted: 12/21/2022] [Indexed: 12/27/2022]
Abstract
Data from gnomAD indicate that a missense mutation encoding the T118M variation in human peripheral myelin protein 22 (PMP22) is found in roughly one of every 75 genomes of western European lineage (1:120 in the overall human population). It is unusual among PMP22 variants that cause Charcot-Marie-Tooth (CMT) disease in that it is not 100% penetrant. Here, we conducted cellular and biophysical studies to determine why T118M PMP22 predisposes humans to CMT, but with only incomplete penetrance. We found that T118M PMP22 is prone to mistraffic but differs even from the WT protein in that increased expression levels do not result in a reduction in trafficking efficiency. Moreover, the T118M mutant exhibits a reduced tendency to form large intracellular aggregates relative to other disease mutants and even WT PMP22. NMR spectroscopy revealed that the structure and dynamics of T118M PMP22 resembled those of WT. These results show that the main consequence of T118M PMP22 in WT/T118M heterozygous individuals is a reduction in surface-trafficked PMP22, unaccompanied by formation of toxic intracellular aggregates. This explains the incomplete disease penetrance and the mild neuropathy observed for WT/T118M CMT cases. We also analyzed BioVU, a biobank linked to deidentified electronic medical records, and found a statistically robust association of the T118M mutation with the occurrence of long and/or repeated episodes of carpal tunnel syndrome. Collectively, our results illuminate the cellular effects of the T118M PMP22 variation leading to CMT disease and indicate a second disorder for which it is a risk factor.
Affiliation(s)
- Katherine M Stefanski
- Department of Biochemistry, Vanderbilt University School of Medicine, Nashville, Tennessee, USA; Center for Structural Biology, Vanderbilt University School of Medicine, Nashville, Tennessee, USA
- Geoffrey C Li
- Department of Biochemistry, Vanderbilt University School of Medicine, Nashville, Tennessee, USA; Center for Structural Biology, Vanderbilt University School of Medicine, Nashville, Tennessee, USA
- Justin T Marinko
- Department of Biochemistry, Vanderbilt University School of Medicine, Nashville, Tennessee, USA; Center for Structural Biology, Vanderbilt University School of Medicine, Nashville, Tennessee, USA
- Bruce D Carter
- Department of Biochemistry, Vanderbilt University School of Medicine, Nashville, Tennessee, USA
- David C Samuels
- Department of Molecular Physiology and Biophysics, Vanderbilt University School of Medicine, Nashville, Tennessee, USA
- Charles R Sanders
- Department of Biochemistry, Vanderbilt University School of Medicine, Nashville, Tennessee, USA; Center for Structural Biology, Vanderbilt University School of Medicine, Nashville, Tennessee, USA; Department of Medicine, Vanderbilt University School of Medicine, Nashville, Tennessee, USA
16
Cisterna A, González-Vidal A, Ruiz D, Ortiz J, Gómez-Pascual A, Chen Z, Nalls M, Faghri F, Hardy J, Díez I, Maietta P, Álvarez S, Ryten M, Botía JA. PhenoExam: gene set analyses through integration of different phenotype databases. BMC Bioinformatics 2022; 23:567. [PMID: 36587217 PMCID: PMC9805686 DOI: 10.1186/s12859-022-05122-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/02/2022] [Accepted: 12/22/2022] [Indexed: 01/01/2023]
Abstract
BACKGROUND Gene set enrichment analysis (detecting phenotypic terms that emerge as significant in a set of genes) plays an important role in bioinformatics focused on diseases of genetic basis. To facilitate phenotype-oriented gene set analysis, we developed PhenoExam, a freely available R package for tool developers and a web interface for users, which (1) performs phenotype and disease enrichment analysis on a gene set; (2) measures statistically significant phenotype similarities between gene sets; and (3) detects significant differential phenotypes or disease terms across different databases. RESULTS PhenoExam generates sensitive and accurate phenotype enrichment analyses. It is also effective in segregating gene sets or Mendelian diseases with very similar phenotypes. We tested the tool with two similar diseases (Parkinson's disease and dystonia) to show phenotype-level similarities but also potentially interesting differences. Moreover, we used PhenoExam to validate computationally predicted new genes potentially associated with epilepsy. CONCLUSIONS We developed PhenoExam, a freely available R package and web application, which performs phenotype enrichment and disease enrichment analysis on a gene set G, measures statistically significant phenotype similarities between pairs of gene sets G and G', and detects statistically significant exclusive phenotypes or disease terms across different databases. We showed with simulations and real cases that it is useful for distinguishing between gene sets or diseases with very similar phenotypes. The R package is available on GitHub at https://github.com/alexcis95/PhenoExam . The Shiny app is available at https://alejandrocisterna.shinyapps.io/phenoexamweb/ .
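Phenotype enrichment on a gene set is commonly scored with a one-sided hypergeometric (Fisher-type) test. The sketch below assumes that test and is only a generic illustration, not PhenoExam's implementation. The function name and argument names are hypothetical.

```python
from math import comb


def enrichment_p_value(universe_size: int, annotated: int,
                       set_size: int, overlap: int) -> float:
    """Upper-tail hypergeometric p-value, P(X >= overlap).

    universe_size: N background genes
    annotated:     K background genes carrying the phenotype term
    set_size:      n genes in the query gene set
    overlap:       k query genes carrying the term
    """
    max_overlap = min(annotated, set_size)
    if not 0 <= overlap <= max_overlap:
        raise ValueError("overlap must lie between 0 and min(annotated, set_size)")
    # Sum the probabilities of drawing `overlap` or more annotated genes
    # in a random draw of `set_size` genes without replacement.
    tail = sum(
        comb(annotated, i) * comb(universe_size - annotated, set_size - i)
        for i in range(overlap, max_overlap + 1)
    )
    return tail / comb(universe_size, set_size)


# Hypothetical numbers: 20,000 background genes, 200 annotated with the term,
# and a 50-gene query set in which 5 genes carry the term (0.5 expected by chance).
print(enrichment_p_value(20000, 200, 50, 5))
```

A real analysis would compute one such p-value per phenotype term and then apply a multiple-testing correction, for example Benjamini-Hochberg.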
Affiliation(s)
- Alejandro Cisterna
- Departamento de Ingeniería de la Información y las Comunicaciones, Universidad de Murcia, Murcia, Spain
- Aurora González-Vidal
- Departamento de Ingeniería de la Información y las Comunicaciones, Universidad de Murcia, Murcia, Spain
- Daniel Ruiz
- Departamento de Ingeniería de la Información y las Comunicaciones, Universidad de Murcia, Murcia, Spain
- Jordi Ortiz
- Departamento de Ingeniería de la Información y las Comunicaciones, Universidad de Murcia, Murcia, Spain
- Alicia Gómez-Pascual
- Departamento de Ingeniería de la Información y las Comunicaciones, Universidad de Murcia, Murcia, Spain
- Zhongbo Chen
- Department of Neurodegenerative Disease, UCL, Institute of Neurology, London, UK
- Mike Nalls
- Data Tecnica International LLC, Glen Echo, MD, USA
- Laboratory of Neurogenetics, NIA/NIH, Bethesda, MD, USA
- Center for Alzheimer's and Related Dementias, NIH, Bethesda, MD, USA
- Faraz Faghri
- Data Tecnica International LLC, Glen Echo, MD, USA
- Laboratory of Neurogenetics, NIA/NIH, Bethesda, MD, USA
- Center for Alzheimer's and Related Dementias, NIH, Bethesda, MD, USA
- John Hardy
- Department of Neurodegenerative Disease, UCL, Institute of Neurology, London, UK
- Reta Lila Weston Institute, UCL Queen Square Institute of Neurology, London, UK
- UCL Movement Disorders Centre, University College London, London, UK
- Institute for Advanced Study, The Hong Kong University of Science and Technology, Hong Kong, China
- Irene Díez
- NIMGenetics Genómica y Medicina S.L, Madrid, Spain
- Sara Álvarez
- NIMGenetics Genómica y Medicina S.L, Madrid, Spain
- Mina Ryten
- Department of Neurodegenerative Disease, UCL, Institute of Neurology, London, UK
- NIHR Great Ormond Street Hospital Biomedical Research Centre, University College London, London, UK
- Genetics and Genomic Medicine, Great Ormond Street Institute of Child Health, University College London, London, WC1E 6BT, UK
- Juan A Botía
- Departamento de Ingeniería de la Información y las Comunicaciones, Universidad de Murcia, Murcia, Spain
- Department of Neurodegenerative Disease, UCL, Institute of Neurology, London, UK
17
Robinson JR, Carroll RJ, Bastarache L, Chen Q, Pirruccello J, Mou Z, Wei WQ, Connolly J, Mentch F, Crane PK, Hebbring SJ, Crosslin DR, Gordon AS, Rosenthal EA, Stanaway IB, Hayes MG, Wei W, Petukhova L, Namjou-Khales B, Zhang G, Safarova MS, Walton NA, Still C, Bottinger EP, Loos RJF, Murphy SN, Jackson GP, Abumrad N, Kullo IJ, Jarvik GP, Larson EB, Weng C, Roden D, Khera AV, Denny JC. Quantifying the phenome-wide disease burden of obesity using electronic health records and genomics. Obesity (Silver Spring) 2022; 30:2477-2488. [PMID: 36372681 PMCID: PMC9691570 DOI: 10.1002/oby.23561] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2022] [Revised: 07/08/2022] [Accepted: 07/11/2022] [Indexed: 11/15/2022]
Abstract
OBJECTIVE High BMI is associated with many comorbidities and mortality. This study aimed to elucidate the overall clinical risk of obesity using a genome- and phenome-wide approach. METHODS This study performed a phenome-wide association study of BMI using a clinical cohort of 736,726 adults. This was followed by genetic association studies using two separate cohorts: one consisting of 65,174 adults in the Electronic Medical Records and Genomics (eMERGE) Network and another with 405,432 participants in the UK Biobank. RESULTS Class 3 obesity was associated with 433 phenotypes, representing 59.3% of all billing codes in individuals with severe obesity. A genome-wide polygenic risk score for BMI, accounting for 7.5% of variance in BMI, was associated with 296 clinical diseases, including strong associations with type 2 diabetes, sleep apnea, hypertension, and chronic liver disease. In all three cohorts, 199 phenotypes were associated with class 3 obesity and polygenic risk for obesity, including novel associations such as increased risk of renal failure, venous insufficiency, and gastroesophageal reflux. CONCLUSIONS This combined genomic and phenomic systematic approach demonstrated that obesity has a strong genetic predisposition and is associated with a considerable burden of disease across all disease classes.
Affiliation(s)
- Jamie R. Robinson
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- Department of Surgery, Vanderbilt University Medical Center, Nashville, TN, USA
- Robert J. Carroll
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- Lisa Bastarache
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- Qingxia Chen
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, USA
- James Pirruccello
- Center for Genomics Medicine, Massachusetts General Hospital, Boston, MA, USA
- Zongyang Mou
- Department of Surgery, University of California, San Diego, CA, USA
- Wei-Qi Wei
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- John Connolly
- The Center for Applied Genomics, The Children’s Hospital of Philadelphia, Philadelphia, PA, USA
- Frank Mentch
- The Center for Applied Genomics, The Children’s Hospital of Philadelphia, Philadelphia, PA, USA
- Paul K. Crane
- Department of Medicine, University of Washington, Seattle, WA, USA
- Scott J. Hebbring
- Center for Human Genetics, Marshfield Clinic Research Institute, Marshfield, WI, USA
- David R. Crosslin
- Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, USA
- Adam S. Gordon
- Department of Pharmacology, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
- Elisabeth A. Rosenthal
- Departments of Medicine (Medical Genetics) and Genome Sciences, University of Washington Medical Center, Seattle, WA, USA
- Ian B. Stanaway
- Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, USA
- M. Geoffrey Hayes
- Division of Endocrinology, Metabolism, and Molecular Medicine, Department of Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
- Wei Wei
- University of Pittsburgh Medical Center, Pittsburgh, PA, USA
- Lynn Petukhova
- Department of Epidemiology, Columbia University, New York, NY, USA
- Bahram Namjou-Khales
- Center for Autoimmune Genomics and Etiology, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, USA
- Ge Zhang
- Center for Autoimmune Genomics and Etiology, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, USA
- Mayya S. Safarova
- Department of Cardiovascular Diseases, Mayo Clinic, Rochester, MN, USA
- Nephi A. Walton
- Department of Biomedical and Translational Informatics, Geisinger Health System, Danville, PA, USA
- Christopher Still
- Department of Biomedical and Translational Informatics, Geisinger Health System, Danville, PA, USA
- Erwin P. Bottinger
- The Charles Bronfman Institute for Personalized Medicine at Mount Sinai, The Mindich Child Health and Development Institute, New York, NY, USA
- Ruth J. F. Loos
- The Charles Bronfman Institute for Personalized Medicine at Mount Sinai, The Mindich Child Health and Development Institute, New York, NY, USA
- Shawn N. Murphy
- Department of Neurology, Partners Healthcare, Boston, MA, USA
- Gretchen P. Jackson
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- Department of Surgery, Vanderbilt University Medical Center, Nashville, TN, USA
- Naji Abumrad
- Department of Surgery, Vanderbilt University Medical Center, Nashville, TN, USA
- Iftikhar J. Kullo
- Department of Cardiovascular Diseases, Mayo Clinic, Rochester, MN, USA
- Gail P. Jarvik
- Departments of Medicine (Medical Genetics) and Genome Sciences, University of Washington Medical Center, Seattle, WA, USA
- Eric B. Larson
- Kaiser Permanente Washington Health Research Institute, Seattle, WA, USA
- Chunhua Weng
- Department of Biomedical Informatics, Columbia University, New York, NY, USA
- Dan Roden
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- Amit V. Khera
- Center for Genomics Medicine, Massachusetts General Hospital, Boston, MA, USA
- Cardiovascular Disease Initiative, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Joshua C. Denny
- All of Us Research Program, National Institutes of Health, Bethesda, MD
18
Liang X, Cao X, Sha Q, Zhang S. HCLC-FC: A novel statistical method for phenome-wide association studies. PLoS One 2022; 17:e0276646. [PMID: 36350801 PMCID: PMC9645610 DOI: 10.1371/journal.pone.0276646] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/30/2022] [Accepted: 10/11/2022] [Indexed: 11/11/2022]
Abstract
The emergence of genetic data coupled to longitudinal electronic medical records (EMRs) offers the possibility of phenome-wide association studies (PheWAS). In PheWAS, the whole phenome can be divided into numerous phenotypic categories according to the genetic architecture across phenotypes. Currently, statistical analyses for PheWAS are mainly univariate, testing the association between one genetic variant and one phenotype at a time. In this article, we derived a novel and powerful multivariate method for PheWAS. The proposed method involves three steps. In the first step, we apply bottom-up hierarchical clustering to partition a large number of phenotypes into disjoint clusters within each phenotypic category. In the second step, the clustering linear combination method is used to combine test statistics within each category based on the phenotypic clusters and to obtain a p-value for each phenotypic category. In the third step, we propose a new false discovery rate (FDR) control approach. We performed extensive simulation studies to compare the performance of our method with that of other existing methods. The results show that our proposed method controls FDR well and outperforms the other methods compared. We also applied the proposed approach to a set of EMR-based phenotypes across more than 300,000 samples from the UK Biobank. We found that the proposed approach not only controls FDR well at the nominal level but also successfully identifies 1,244 significant SNPs that are reported to be associated with some phenotypes in the GWAS catalog. Our open-access tools and instructions on how to implement HCLC-FC are available at https://github.com/XiaoyuLiang/HCLCFC.
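The first HCLC-FC step, bottom-up hierarchical clustering of phenotypes into disjoint clusters, can be illustrated with a small average-linkage routine. This is a sketch under assumptions the abstract does not specify. The 1 - |r| distance between phenotype correlations, average linkage, and a fixed merge threshold are all illustrative choices, and the function name is hypothetical. The clustering linear combination and FDR steps are not shown.

```python
from typing import List


def cluster_phenotypes(corr: List[List[float]], threshold: float) -> List[List[int]]:
    """Agglomerative (bottom-up) clustering with average linkage.

    The distance between phenotypes i and j is 1 - |corr[i][j]|. Clusters are
    merged until the closest pair is farther apart than `threshold`.
    Returns disjoint clusters as sorted lists of phenotype indices.
    """
    n = len(corr)
    dist = [[1.0 - abs(corr[i][j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]

    def linkage(a: List[int], b: List[int]) -> float:
        # Average pairwise distance between members of two clusters.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        if d > threshold:
            break
        merged = sorted(clusters[x] + clusters[y])
        del clusters[y]  # y > x, so index x is still valid afterwards
        clusters[x] = merged
    return sorted(clusters)


# Hypothetical correlations among four phenotypes: two tightly linked pairs.
corr = [
    [1.0, 0.9, 0.1, 0.0],
    [0.9, 1.0, 0.05, 0.1],
    [0.1, 0.05, 1.0, 0.8],
    [0.0, 0.1, 0.8, 1.0],
]
print(cluster_phenotypes(corr, threshold=0.5))  # → [[0, 1], [2, 3]]
```

The routine recomputes every pairwise linkage on each merge, which is fine for illustration. With hundreds of phenotypes per category, a library implementation would be the practical choice.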
Affiliation(s)
- Xiaoyu Liang
- Department of Preventive Medicine, Division of Biostatistics, University of Tennessee Health Science Center, Memphis, Tennessee, United States of America
- Xuewei Cao
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, United States of America
- Qiuying Sha
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, United States of America
- Shuanglin Zhang
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, United States of America
19
Dashti HS, Miranda N, Cade BE, Huang T, Redline S, Karlson EW, Saxena R. Interaction of obesity polygenic score with lifestyle risk factors in an electronic health record biobank. BMC Med 2022; 20:5. [PMID: 35016652 PMCID: PMC8753909 DOI: 10.1186/s12916-021-02198-9] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 08/30/2021] [Accepted: 11/23/2021] [Indexed: 11/10/2022]
Abstract
BACKGROUND Genetic and lifestyle factors have considerable effects on obesity and related diseases, yet their effects in a clinical cohort are unknown. This study in a patient biobank examined associations of a BMI polygenic risk score (PRS), and its interactions with lifestyle risk factors, with clinically measured BMI and clinical phenotypes. METHODS The Mass General Brigham (MGB) Biobank is a hospital-based cohort with electronic health record, genetic, and lifestyle data. A PRS for obesity was generated using 97 genetic variants for BMI. An obesity lifestyle risk index, built from survey responses on obesogenic lifestyle risk factors (alcohol, education, exercise, sleep, smoking, and shift work), was used to dichotomize the cohort into high and low obesogenic index groups at the population median. Height and weight were measured at a clinical visit. Multivariable linear regression models were used to assess cross-sectional associations of the PRS with BMI and its interactions with the obesity lifestyle risk index. In phenome-wide association analyses (PheWAS), similar logistic models were fitted for 675 disease outcomes derived from billing codes. RESULTS A total of 33,511 patients were analyzed (53.1% female; age 60.0 years; BMI 28.3 kg/m2), of whom 17,040 completed the lifestyle survey (57.5% female; age 60.2 years; BMI 28.1 (6.2) kg/m2). Each standard deviation increment in the PRS was associated with a 0.83 kg/m2 increase in BMI (95% confidence interval (CI) = 0.76, 0.90). There was an interaction between the obesity PRS and the obesity lifestyle risk index on BMI. The difference in BMI between those with a high and a low obesogenic index was 3.18 kg/m2 in patients in the highest decile of PRS, whereas that difference was only 1.55 kg/m2 in patients in the lowest decile of PRS. In PheWAS, the obesity PRS was associated with 40 diseases spanning endocrine/metabolic, circulatory, and 8 other disease groups.
No interactions were evident between the PRS and the index on disease outcomes. CONCLUSIONS In this hospital-based clinical biobank, obesity risk conferred by common genetic variants was associated with elevated BMI and this risk was attenuated by a healthier patient lifestyle. Continued consideration of the role of lifestyle in the context of genetic predisposition in healthcare settings is necessary to quantify the extent to which modifiable lifestyle risk factors may moderate genetic predisposition and inform clinical action to achieve personalized medicine.
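A polygenic risk score of this kind is usually a weighted sum of effect-allele dosages, standardized so that effects can be reported per standard deviation. The sketch below assumes that form. The variant IDs and weights are hypothetical. The abstract does not describe how the survey answers were combined into a single lifestyle index, so the index values are placeholders; only the median split mirrors the abstract.

```python
from statistics import mean, median, pstdev
from typing import Dict, List


def polygenic_score(dosages: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of effect-allele dosages (0 to 2) over the score's variants."""
    return sum(w * dosages[snp] for snp, w in weights.items())


def standardize(values: List[float]) -> List[float]:
    """Rescale to mean 0 and SD 1 so regression effects are per SD of the score."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]


def dichotomize_at_median(values: List[float]) -> List[int]:
    """1 = above the cohort median ('high' index), 0 = at or below it ('low')."""
    cut = median(values)
    return [1 if v > cut else 0 for v in values]


# Hypothetical three-variant score and three patients.
weights = {"rs_a": 0.5, "rs_b": -0.2, "rs_c": 0.1}
patients = [
    {"rs_a": 2, "rs_b": 1, "rs_c": 0},
    {"rs_a": 0, "rs_b": 2, "rs_c": 2},
    {"rs_a": 1, "rs_b": 0, "rs_c": 1},
]
raw = [polygenic_score(p, weights) for p in patients]
print(standardize(raw))
print(dichotomize_at_median([3.0, 1.0, 4.0, 2.0]))  # → [1, 0, 1, 0]
```

To estimate the PRS by lifestyle interaction reported above, one would then fit a multivariable linear model for BMI that includes the standardized score, the binary index, and their product.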
Affiliation(s)
- Hassan S Dashti
- Center for Genomic Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA; Broad Institute, Cambridge, MA, USA; Department of Anesthesia, Critical Care and Pain Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
- Nicole Miranda
- Center for Genomic Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
- Brian E Cade
- Broad Institute, Cambridge, MA, USA; Division of Sleep and Circadian Disorders, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA; Division of Sleep Medicine, Harvard Medical School, Boston, MA, USA
- Tianyi Huang
- Division of Sleep Medicine, Harvard Medical School, Boston, MA, USA; Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Susan Redline
- Division of Sleep and Circadian Disorders, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA; Division of Sleep Medicine, Harvard Medical School, Boston, MA, USA
- Elizabeth W Karlson
- Mass General Brigham Personalized Medicine, Mass General Brigham HealthCare, Boston, MA, USA; Department of Medicine, Brigham and Women's Hospital and Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA, USA; Division of Rheumatology, Inflammation and Immunity, Brigham and Women's Hospital, Boston, MA, USA
- Richa Saxena
- Center for Genomic Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA; Broad Institute, Cambridge, MA, USA; Department of Anesthesia, Critical Care and Pain Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA; Division of Sleep Medicine, Harvard Medical School, Boston, MA, USA
20
Yang Z, Pou-Prom C, Jones A, Banning M, Dai D, Mamdani M, Oh J, Antoniou T. Assessment of Natural Language Processing Methods for Ascertaining the Expanded Disability Status Scale Score From the Electronic Health Records of Patients With Multiple Sclerosis: Algorithm Development and Validation Study. JMIR Med Inform 2022; 10:e25157. [PMID: 35019849 PMCID: PMC8792771 DOI: 10.2196/25157] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 10/20/2020] [Revised: 04/08/2021] [Accepted: 11/19/2021] [Indexed: 01/16/2023]
Abstract
Background The Expanded Disability Status Scale (EDSS) score is a widely used measure to monitor disability progression in people with multiple sclerosis (MS). However, extracting and deriving the EDSS score from unstructured electronic health records can be time-consuming. Objective We aimed to compare rule-based and deep learning natural language processing algorithms for detecting and predicting the total EDSS score and EDSS functional system subscores from the electronic health records of patients with MS. Methods We studied 17,452 electronic health records of 4906 MS patients followed at one of Canada’s largest MS clinics between June 2015 and July 2019. We randomly divided the records into training (80%) and test (20%) data sets and compared the performance characteristics of 3 natural language processing models. First, we applied a rule-based approach, extracting the EDSS score from sentences containing the keyword “EDSS.” Next, we trained a convolutional neural network (CNN) model to predict the 19 half-step increments of the EDSS score. Finally, we used a combined rule-based–CNN model. For each approach, we determined the accuracy, precision, recall, and F-score against the reference standard of manually labeled EDSS scores in the clinic database. Results Overall, the combined keyword-CNN model demonstrated the best performance, with an accuracy, precision, recall, and F-score of 0.90, 0.83, 0.83, and 0.83, respectively. The corresponding values were 0.57, 0.91, 0.65, and 0.70 for the rule-based model alone and 0.86, 0.70, 0.70, and 0.70 for the CNN model alone. Because of missing data, model performance for the EDSS subscores was lower than that for the total EDSS score. Performance improved when only notes with known EDSS subscore values were considered. Conclusions A combined keyword-CNN natural language processing model can extract and accurately predict EDSS scores from patient records.
This approach can be automated for efficient information extraction in clinical and research settings.
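The rule-based step, which pulls an EDSS value out of sentences that contain the keyword "EDSS", might look roughly like the sketch below. The regular expression, the 20-character window between the keyword and the number, and the validity check against the EDSS scale (0, then 1.0 to 10.0 in half steps) are assumptions for illustration, not the authors' rules. The function name is hypothetical.

```python
import re
from typing import Optional

# Valid EDSS values: 0, then 1.0 to 10.0 in half steps.
VALID_EDSS = {0.0} | {1.0 + 0.5 * k for k in range(19)}

# "EDSS" as a whole word, up to 20 non-digit characters, then a number such as 3 or 3.5.
EDSS_PATTERN = re.compile(r"\bEDSS\b[^0-9]{0,20}?(\d{1,2}(?:\.\d)?)", re.IGNORECASE)


def extract_edss(note: str) -> Optional[float]:
    """Return the first valid EDSS value in a sentence mentioning 'EDSS', else None."""
    # Split on sentence-ending punctuation followed by whitespace,
    # so decimals such as "3.5" are not broken apart.
    for sentence in re.split(r"(?<=[.!?])\s+", note):
        for match in EDSS_PATTERN.finditer(sentence):
            value = float(match.group(1))
            if value in VALID_EDSS:
                return value
    return None


print(extract_edss("Seen in clinic today. EDSS score is 3.5. Gait stable."))  # → 3.5
```

A pattern this simple misses scores that are written out in words or phrased without the keyword. This illustrates why keyword rules can have limited recall.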
Affiliation(s)
- Zhen Yang
- Data Science and Advanced Analytics, Unity Health Toronto, Toronto, ON, Canada
- Chloé Pou-Prom
- Data Science and Advanced Analytics, Unity Health Toronto, Toronto, ON, Canada
- Ashley Jones
- Division of Neurology, Department of Medicine, St. Michael's Hospital, Toronto, ON, Canada
- Michaelia Banning
- Data Science and Advanced Analytics, Unity Health Toronto, Toronto, ON, Canada
- David Dai
- Data Science and Advanced Analytics, Unity Health Toronto, Toronto, ON, Canada
- Muhammad Mamdani
- Data Science and Advanced Analytics, Unity Health Toronto, Toronto, ON, Canada; Li Ka Shing Knowledge Institute, Unity Health Toronto, Toronto, ON, Canada; Faculty of Medicine, University of Toronto, Toronto, ON, Canada; Leslie Dan Faculty of Pharmacy, University of Toronto, Toronto, ON, Canada; Institute of Health Policy, Management, and Evaluation, University of Toronto, Toronto, ON, Canada
- Jiwon Oh
- Division of Neurology, Department of Medicine, St. Michael's Hospital, Toronto, ON, Canada; Faculty of Medicine, University of Toronto, Toronto, ON, Canada; Department of Neurology, Johns Hopkins University, Baltimore, MD, United States
- Tony Antoniou
- Data Science and Advanced Analytics, Unity Health Toronto, Toronto, ON, Canada; Li Ka Shing Knowledge Institute, Unity Health Toronto, Toronto, ON, Canada; Department of Family and Community Medicine, Unity Health Toronto, Toronto, ON, Canada; Department of Family and Community Medicine, University of Toronto, Toronto, ON, Canada
21
Liu X, Chubak J, Hubbard RA, Chen Y. SAT: a Surrogate-Assisted Two-wave case boosting sampling method, with application to EHR-based association studies. J Am Med Inform Assoc 2021; 29:918-927. [PMID: 34962283 PMCID: PMC9714591 DOI: 10.1093/jamia/ocab267] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/24/2021] [Revised: 10/16/2021] [Accepted: 11/23/2021] [Indexed: 12/30/2022]
Abstract
OBJECTIVES Electronic health records (EHRs) enable investigation of the association between phenotypes and risk factors. However, studies relying solely on potentially error-prone EHR-derived phenotypes (ie, surrogates) are subject to bias. Analyses of low-prevalence phenotypes may also suffer from poor efficiency. Existing methods typically focus on one of these issues but seldom address both. This study aims to address both issues simultaneously by developing new sampling methods that select an optimal subsample in which to collect gold standard phenotypes, thereby improving the accuracy of association estimation. MATERIALS AND METHODS We develop a surrogate-assisted two-wave (SAT) sampling method, in which a surrogate-guided sampling (SGS) procedure and a modified optimal subsampling procedure motivated by the A-optimality criterion (OSMAC) are applied sequentially to select a subsample for outcome validation through manual chart review, subject to budget constraints. A model is then fitted to the subsample with the true phenotypes. Simulation studies and an application to an EHR dataset of breast cancer survivors are conducted to demonstrate the effectiveness of SAT. RESULTS We found that the subsample selected with the proposed method contains informative observations that effectively reduce the mean squared error of the resulting association estimator. CONCLUSIONS The proposed approach can handle the problems caused by rare cases and surrogate misclassification in phenotype-absent EHR-based association studies. With a well-behaved surrogate, SAT successfully boosts the case prevalence in the subsample and improves the efficiency of estimation.
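The case-boosting idea behind the first wave can be pictured as stratified sampling on the surrogate. Surrogate-positive patients are oversampled for chart review, and each reviewed patient carries an inverse-probability weight. This is a generic sketch under that assumption. The exact SGS rule and the second OSMAC wave are not reproduced here, and the function name and sample sizes are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def surrogate_stratified_sample(surrogate: Sequence[int], n_pos: int, n_neg: int,
                                seed: int = 0) -> Tuple[List[int], List[float]]:
    """Pick patients for chart review, oversampling surrogate-positive patients.

    Returns the selected indices and their inverse-probability weights, which
    undo the oversampling when a weighted model is fitted to the reviewed subsample.
    """
    pos = [i for i, s in enumerate(surrogate) if s == 1]
    neg = [i for i, s in enumerate(surrogate) if s == 0]
    if not (0 < n_pos <= len(pos) and 0 < n_neg <= len(neg)):
        raise ValueError("each stratum needs between 1 and its size sampled patients")
    rng = random.Random(seed)
    chosen = rng.sample(pos, n_pos) + rng.sample(neg, n_neg)
    # Within a stratum every patient has inclusion probability n / N,
    # so the weight is N / n.
    weights = [len(pos) / n_pos] * n_pos + [len(neg) / n_neg] * n_neg
    return chosen, weights


# Hypothetical cohort: 10 surrogate-positive and 90 surrogate-negative patients.
# A 20-chart budget split evenly raises the surrogate-positive share from 10% to 50%.
surrogate = [1] * 10 + [0] * 90
idx, w = surrogate_stratified_sample(surrogate, n_pos=10, n_neg=10, seed=1)
print(len(idx), sum(w))  # → 20 100.0
```

The weights sum to the cohort size. A weighted regression on the reviewed charts therefore targets the full-cohort association rather than the association in the case-enriched subsample.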
Affiliation(s)
- Xiaokang Liu
- Department of Biostatistics, Epidemiology and Informatics, The University of Pennsylvania School of Medicine, Philadelphia, Pennsylvania, USA
- Jessica Chubak
- Kaiser Permanente Washington Health Research Institute, Seattle, Washington, USA; Department of Epidemiology, University of Washington, Seattle, Washington, USA
- Rebecca A Hubbard
- Department of Biostatistics, Epidemiology and Informatics, The University of Pennsylvania School of Medicine, Philadelphia, Pennsylvania, USA
- Yong Chen
- Corresponding Author: Yong Chen, PhD, Department of Biostatistics, Epidemiology and Informatics, The University of Pennsylvania School of Medicine, 423 Guardian Drive, Philadelphia, PA 19104, USA
22
Nelson CA, Bove R, Butte AJ, Baranzini SE. Embedding electronic health records onto a knowledge network recognizes prodromal features of multiple sclerosis and predicts diagnosis. J Am Med Inform Assoc 2021; 29:424-434. [PMID: 34915552 PMCID: PMC8800523 DOI: 10.1093/jamia/ocab270] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Received: 06/15/2021] [Revised: 10/22/2021] [Accepted: 11/26/2021] [Indexed: 11/28/2022]
Abstract
OBJECTIVE Early identification of chronic diseases is a pillar of precision medicine, as it can lead to improved outcomes, reduced disease burden, and lower healthcare costs. Predictions of a patient's health trajectory have been improved through the application of machine learning approaches to electronic health records (EHRs). However, these methods have traditionally relied on "black box" algorithms that can process large amounts of data but are unable to incorporate domain knowledge, thus limiting their predictive and explanatory power. Here, we present a method for incorporating domain knowledge into clinical classifications by embedding individual patient data into a biomedical knowledge graph. MATERIALS AND METHODS A modified version of the PageRank algorithm was implemented to embed millions of deidentified EHRs into a biomedical knowledge graph (SPOKE). This resulted in high-dimensional, knowledge-guided patient health signatures (ie, SPOKEsigs) that were subsequently used as features in a random forest classifier to identify patients at risk of developing a chronic disease. RESULTS Our model predicted the disease status of 5752 subjects 3 years before they were diagnosed with multiple sclerosis (MS) (AUC = 0.83). SPOKEsigs outperformed predictions using EHRs alone, and the biological drivers of the classifiers provided insight into the underpinnings of prodromal MS. CONCLUSION Using EHR data as input, SPOKEsigs describe patients at both the clinical and biological levels. We provide a clinical use case for detecting MS up to 5 years prior to the documented diagnosis in the clinic and illustrate the biological features that distinguish the prodromal MS state.
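Embedding a patient into a knowledge graph with PageRank can be pictured as personalized PageRank. Random walks restart at the graph nodes found in the patient's record, so the stationary scores form a patient-specific signature. The sketch below is plain personalized PageRank computed by power iteration, not the authors' modified algorithm. The toy graph, node names, and function name are hypothetical.

```python
from typing import Dict, List


def personalized_pagerank(graph: Dict[str, List[str]], seeds: List[str],
                          damping: float = 0.85,
                          iterations: int = 100) -> Dict[str, float]:
    """Power iteration for PageRank with restarts to a patient's seed nodes.

    `graph` maps every node to its outgoing neighbours; every neighbour must
    also be a key. The returned scores sum to 1.
    """
    seed_set = set(seeds)
    if not seed_set or not seed_set.issubset(graph):
        raise ValueError("seeds must be a non-empty subset of the graph's nodes")
    restart = {v: (1.0 / len(seed_set) if v in seed_set else 0.0) for v in graph}
    score = dict(restart)
    for _ in range(iterations):
        # Teleport step: with probability 1 - damping, jump back to a seed node.
        new = {v: (1.0 - damping) * restart[v] for v in graph}
        for v, out in graph.items():
            if out:
                share = damping * score[v] / len(out)
                for w in out:
                    new[w] += share
            else:
                # Dangling node: return its mass to the seeds.
                for s in seed_set:
                    new[s] += damping * score[v] / len(seed_set)
        score = new
    return score


# Hypothetical toy graph; the patient's record contains only the MS diagnosis node.
toy_graph = {
    "dx_ms": ["gene_x", "symptom_fatigue"],
    "symptom_fatigue": ["dx_ms"],
    "gene_x": ["pathway_y"],
    "pathway_y": [],              # dangling node
    "unlinked_drug": ["pathway_y"],
}
scores = personalized_pagerank(toy_graph, ["dx_ms"])
print({node: round(s, 3) for node, s in scores.items()})
```

Nodes reachable from the patient's seeds, such as the gene and pathway here, receive nonzero scores even though they never appear in the record. In this toy graph, nothing points to `unlinked_drug`, so its score stays at 0.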
Affiliation(s)
- Charlotte A Nelson
- Integrated Program in Quantitative Biology, University of California San Francisco, San Francisco, California, USA; Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, California, USA
- Riley Bove
- Department of Neurology, UCSF Weill Institute for Neurosciences, University of California San Francisco, San Francisco, California, USA
- Atul J Butte
- Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, California, USA; Department of Pediatrics, University of California San Francisco, San Francisco, California, USA
- Sergio E Baranzini
- Corresponding Author: Sergio E. Baranzini, PhD, Department of Neurology, UCSF Weill Institute for Neurosciences, University of California San Francisco, 675 Nelson Rising Lane, San Francisco, CA 94143, USA
23
Soto-Pedre E, Siddiqui MK, Mordi I, Maroteau C, Soto-Hernaez J, Palmer CN, Pearson ER, Leese GP. Evidence of a Causal Relationship between Serum Thyroid-Stimulating Hormone and Osteoporotic Bone Fractures. Eur Thyroid J 2021; 10:439-446. [PMID: 34950598 PMCID: PMC8647109 DOI: 10.1159/000518058] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 12/02/2019] [Accepted: 06/02/2021] [Indexed: 11/19/2022]
Abstract
OBJECTIVE We aimed to validate the association of genome-wide association study (GWAS)-identified loci and a polygenic risk score with serum thyroid-stimulating hormone (TSH) concentrations and the diagnosis of hypothyroidism. We then tested the causal relationship between serum TSH and osteoporotic bone fracture risk. METHODS A cross-sectional study was done among patients of European Caucasian ethnicity recruited in Tayside (Scotland, UK). Electronic medical records (EMRs) were used to identify patients and their average serum TSH concentrations, and were linked to genetic biobank data. Genetic associations were tested with linear and logistic regression models. One-sample Mendelian randomization (MR) was used to test the causal effect of serum TSH on bone fracture risk. RESULTS Replication in 9,452 euthyroid individuals confirmed previously reported loci. The 58 polymorphisms accounted for 11.08% of the variation in TSH (p < 1e-04). The TSH genetic risk score (TSH-GRS) was directly associated with the risk of hypothyroidism, with an odds ratio (OR) of 1.98 for the highest quartile compared to the first quartile (p = 2.2e-12). MR analysis of 5,599 individuals showed that, compared with those in the lowest tertile of the TSH-GRS, men in the highest tertile had a decreased risk of osteoporotic bone fracture (OR = 0.59, p = 2.4e-03), while no difference was observed in a similar comparison among women (OR = 0.93, p = 0.61). Sensitivity analysis yielded similar results. CONCLUSIONS EMRs linked to genomic data in large populations allow replication of GWAS discoveries without additional genotyping costs. This study suggests that genetically raised serum TSH concentrations are causally associated with decreased bone fracture risk in men.
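The MR results above contrast people in the highest and lowest tertiles of the TSH genetic risk score. A minimal, unadjusted version of that contrast is sketched below: rank people by score, take the bottom and top thirds, and form the 2×2 odds ratio. It is an illustration under stated assumptions (rank-based tertiles, no covariate adjustment or sex stratification, invented toy data), not the study's analysis, whose published estimates may come from adjusted regression models.

```python
def extreme_tertile_odds_ratio(scores, outcomes):
    """Unadjusted odds ratio of the outcome, highest vs lowest tertile of a risk score.

    scores: per-person genetic risk score values.
    outcomes: per-person 0/1 indicator (1 = event, e.g. an osteoporotic fracture).
    Tertiles are formed by rank, so tied scores at a boundary are split arbitrarily.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    k = len(order) // 3
    if k == 0:
        raise ValueError("need at least 3 people to form tertiles")
    low, high = order[:k], order[-k:]
    a = sum(outcomes[i] for i in high)   # events in the highest tertile
    b = len(high) - a                    # non-events in the highest tertile
    c = sum(outcomes[i] for i in low)    # events in the lowest tertile
    d = len(low) - c                     # non-events in the lowest tertile
    if 0 in (a, b, c, d):
        raise ValueError("a cell is empty; the odds ratio is undefined without a correction")
    return (a * d) / (b * c)


# Toy data (invented): nine people with scores 1..9.
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]
outcomes = [1, 1, 0, 0, 1, 0, 1, 0, 0]
odds_ratio = extreme_tertile_odds_ratio(scores, outcomes)
print(odds_ratio)  # 0.25
```

An odds ratio below 1, as reported for men in the abstract, means the outcome is less common in the high-score group than in the low-score group.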
Affiliation(s)
- Enrique Soto-Pedre
- Division of Population Health & Genomics, School of Medicine, Ninewells Hospital & Medical School, University of Dundee, Dundee, United Kingdom
- Moneeza K. Siddiqui
- Centre for Pharmacogenetics and Pharmacogenomics, Ninewells Hospital & Medical School, University of Dundee, Dundee, United Kingdom
- Ify Mordi
- Division of Molecular and Clinical Medicine, School of Medicine, Ninewells Hospital & Medical School, University of Dundee, Dundee, United Kingdom
- Cyrielle Maroteau
- Centre for Pharmacogenetics and Pharmacogenomics, Ninewells Hospital & Medical School, University of Dundee, Dundee, United Kingdom
- Colin N.A. Palmer
- Centre for Pharmacogenetics and Pharmacogenomics, Ninewells Hospital & Medical School, University of Dundee, Dundee, United Kingdom
- Ewan R. Pearson
- Division of Population Health & Genomics, School of Medicine, Ninewells Hospital & Medical School, University of Dundee, Dundee, United Kingdom
- Graham P. Leese
- Division of Population Health & Genomics, School of Medicine, Ninewells Hospital & Medical School, University of Dundee, Dundee, United Kingdom
- Department of Endocrinology and Diabetes, Ninewells Hospital & Medical School, University of Dundee, Dundee, United Kingdom
24
Yin Z, Tong J, Chen Y, Hubbard RA, Tang CY. A cost-effective chart review sampling design to account for phenotyping error in electronic health records (EHR) data. J Am Med Inform Assoc 2021; 29:52-61. [PMID: 34718618 DOI: 10.1093/jamia/ocab222] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 05/07/2021] [Revised: 09/09/2021] [Accepted: 09/28/2021] [Indexed: 11/13/2022]
Abstract
OBJECTIVES Electronic health records (EHR) are commonly used to identify novel risk factors for disease, an analysis often referred to as an association study. A major challenge to EHR-based association studies is phenotyping error in EHR-derived outcomes. Manual chart review of phenotypes is necessary for unbiased evaluation of risk factor associations. However, this process is time-consuming and expensive. The objective of this paper is to develop an outcome-dependent sampling approach for designing manual chart review, in which EHR-derived phenotypes guide the selection of charts to be reviewed in order to maximize statistical efficiency in the subsequent estimation of risk factor associations. MATERIALS AND METHODS After applying outcome-dependent sampling, an augmented estimator can be constructed by optimally combining the chart-reviewed phenotypes from the selected patients with the error-prone EHR-derived phenotype. We conducted simulation studies to evaluate the proposed method and applied it to data on colon cancer recurrence in a cohort of patients treated for primary colon cancer in the Kaiser Permanente Washington (KPW) healthcare system. RESULTS Simulations verify the coverage probability of the proposed method and show that, when disease prevalence is less than 30%, the proposed method has smaller variance than an existing method in which the validation set for chart review is sampled uniformly. In addition, from a design perspective, the proposed method achieves the same statistical power as the uniform sampling method with 50% fewer charts validated, leading to a substantial efficiency gain in chart review. These findings were also confirmed by applying the competing methods to the KPW colon cancer data.
DISCUSSION Our simulation studies and analysis of data from KPW demonstrate that, compared to an existing uniform sampling method, the proposed outcome-dependent method can lead to a more efficient chart review sampling design and unbiased association estimates with higher statistical efficiency. CONCLUSION The proposed method not only optimally combines phenotypes from chart review with EHR-derived phenotypes but also suggests an efficient design for conducting chart review, with the goal of improving the efficiency of estimated risk factor associations using EHR data.
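The design idea, using the cheap EHR-derived label to decide whose charts get reviewed, can be sketched as stratified sampling. The example below assumes a fixed review budget split between EHR-positive and EHR-negative patients (the balanced 50/50 split is an arbitrary illustrative choice, not the allocation derived in the paper) and returns inverse-probability weights for later weighted estimation. The augmented estimator that combines reviewed and unreviewed phenotypes is not implemented here.

```python
import random


def outcome_dependent_sample(ehr_phenotype, n_review, frac_positive=0.5, seed=0):
    """Select charts for manual review, stratified on the error-prone EHR-derived phenotype.

    ehr_phenotype: dict of patient id -> 0/1 EHR-derived label.
    n_review: total number of charts the review budget allows.
    frac_positive: share of the budget spent on EHR-positive patients (0.5 is an
        illustrative balanced design, not an optimized allocation).
    Returns the selected ids and, per stratum, the inverse sampling probability (weight).
    """
    rng = random.Random(seed)
    pos = sorted(p for p, s in ehr_phenotype.items() if s == 1)
    neg = sorted(p for p, s in ehr_phenotype.items() if s == 0)
    n_pos = min(len(pos), round(n_review * frac_positive))
    # any positive slots that cannot be filled are given to the negative stratum
    n_neg = min(len(neg), n_review - n_pos)
    chosen = rng.sample(pos, n_pos) + rng.sample(neg, n_neg)
    weights = {
        1: len(pos) / n_pos if n_pos else 0.0,
        0: len(neg) / n_neg if n_neg else 0.0,
    }
    return chosen, weights


# Hypothetical cohort of 1,000 patients with a 5% EHR-derived prevalence.
ehr = {i: (1 if i < 50 else 0) for i in range(1000)}
chosen, weights = outcome_dependent_sample(ehr, n_review=100)
print(len(chosen), sum(ehr[p] for p in chosen), weights)  # 100 50 {1: 1.0, 0: 19.0}
```

With 5% prevalence, uniformly sampling 100 charts would be expected to yield only about 5 EHR-positive patients; stratifying on the EHR label guarantees 50, which is the intuition behind the efficiency gain for rare outcomes reported above.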
Affiliation(s)
- Ziyan Yin
- Department of Statistical Science, Temple University, Philadelphia, Pennsylvania, USA
- Jiayi Tong
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, The University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Yong Chen
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, The University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Rebecca A Hubbard
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, The University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Cheng Yong Tang
- Department of Statistical Science, Temple University, Philadelphia, Pennsylvania, USA
25
Schirle L, Jeffery A, Yaqoob A, Sanchez-Roige S, Samuels DC. Two data-driven approaches to identifying the spectrum of problematic opioid use: A pilot study within a chronic pain cohort. Int J Med Inform 2021; 156:104621. [PMID: 34673309 PMCID: PMC8609775 DOI: 10.1016/j.ijmedinf.2021.104621] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 08/30/2021] [Revised: 09/29/2021] [Accepted: 10/09/2021] [Indexed: 01/04/2023]
Abstract
BACKGROUND Although electronic health records (EHR) have significant potential for the study of opioid use disorders (OUD), detecting OUD in clinical data is challenging. Models using EHR data to predict OUD often rely on case/control classifications focused on extreme opioid use. There is a need to expand this work to characterize the spectrum of problematic opioid use. METHODS Using a large academic medical center database, we developed 2 data-driven methods of OUD detection: (1) a Comorbidity Score developed from a Phenome-Wide Association Study of phenotypes associated with OUD and (2) a Text-based Score using natural language processing to identify OUD-related concepts in clinical notes. We evaluated the performance of both scores against manual review with correlation coefficients, Wilcoxon rank sum tests, and areas under the receiver operating characteristic curve (AUCs). Records with the highest Comorbidity and Text-based scores were re-evaluated by manual review to explore discrepancies. RESULTS Both the Comorbidity and Text-based OUD risk scores were significantly elevated in patients judged as High Evidence for OUD in the manual review compared with those judged as No Evidence (p = 1.3E-5 and 1.3E-6, respectively). The risk scores were positively correlated with each other (rho = 0.52, p < 0.001). AUCs for the Comorbidity and Text-based scores were high (0.79 and 0.76, respectively). Follow-up manual review of discrepant findings revealed strengths of data-driven methods over manual review, and opportunities for improvement in risk assessment. CONCLUSION Risk scores comprising comorbidities and text offer differing but synergistic insights into characterizing problematic opioid use. This pilot project establishes a foundation for more robust work in the future.
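The abstract reports both Wilcoxon rank sum tests and AUCs; the two are closely linked, because the ROC AUC equals the Mann-Whitney U statistic divided by the number of positive-negative pairs. A minimal sketch, assuming manual review supplies the "High Evidence" and "No Evidence" groups and using made-up scores rather than the study's data:

```python
def auc_mann_whitney(scores_pos, scores_neg):
    """ROC AUC as the probability that a random positive outscores a random negative.

    Ties count as one half. This equals U / (n_pos * n_neg), where U is the
    Mann-Whitney U statistic underlying the Wilcoxon rank sum test.
    """
    wins = 0.0
    for p in scores_pos:
        for q in scores_neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Invented risk scores for patients judged High Evidence vs No Evidence on manual review.
high_evidence = [0.9, 0.8, 0.4]
no_evidence = [0.5, 0.3, 0.4]
auc = auc_mann_whitney(high_evidence, no_evidence)
print(round(auc, 3))  # 0.833
```

The pairwise loop costs O(n_pos × n_neg), which is fine at pilot-study scale; a rank-based computation scales better for large cohorts.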
Affiliation(s)
- Lori Schirle
- Vanderbilt University School of Nursing, 461 21st Avenue South, Nashville, TN 37240, USA.
- Alvin Jeffery
- Vanderbilt University School of Nursing, 461 21st Avenue South, Nashville, TN 37240, USA; Vanderbilt University, Department of Biomedical Informatics, 2525 West End Ave #1475, Nashville, TN 37203, USA.
- Ali Yaqoob
- Vanderbilt University, Department of Biomedical Informatics, 2525 West End Ave #1475, Nashville, TN 37203, USA; Vanderbilt University, Data Science Institute, Sony Building, # 2000, 1400 18th Avenue South, Nashville, TN 37212, USA.
- Sandra Sanchez-Roige
- Vanderbilt University Medical Center, Division of Genetic Medicine, Robinson Research Building #536, 220 Pierce Avenue, Nashville, TN 37232, USA; University of California, Department of Psychiatry, 9500 Gilman Dr., La Jolla, CA 92093, USA.
- David C Samuels
- Vanderbilt University School of Medicine, Light Hall #507B, Department of Molecular Physiology and Biophysics, Vanderbilt Genetics Institute, 2215 Garland Avenue, Nashville, TN 37232, USA.
26
Sakaue S, Kanai M, Tanigawa Y, Karjalainen J, Kurki M, Koshiba S, Narita A, Konuma T, Yamamoto K, Akiyama M, Ishigaki K, Suzuki A, Suzuki K, Obara W, Yamaji K, Takahashi K, Asai S, Takahashi Y, Suzuki T, Shinozaki N, Yamaguchi H, Minami S, Murayama S, Yoshimori K, Nagayama S, Obata D, Higashiyama M, Masumoto A, Koretsune Y, Ito K, Terao C, Yamauchi T, Komuro I, Kadowaki T, Tamiya G, Yamamoto M, Nakamura Y, Kubo M, Murakami Y, Yamamoto K, Kamatani Y, Palotie A, Rivas MA, Daly MJ, Matsuda K, Okada Y. A cross-population atlas of genetic associations for 220 human phenotypes. Nat Genet 2021; 53:1415-1424. [PMID: 34594039 DOI: 10.1038/s41588-021-00931-x] [Citation(s) in RCA: 1094] [Impact Index Per Article: 273.5] [Received: 10/20/2020] [Accepted: 08/04/2021] [Indexed: 02/08/2023]
Abstract
Current genome-wide association studies do not yet capture sufficient diversity in populations and scope of phenotypes. To expand an atlas of genetic associations in non-European populations, we conducted 220 deep-phenotype genome-wide association studies (diseases, biomarkers and medication usage) in BioBank Japan (n = 179,000), by incorporating past medical history and text-mining of electronic medical records. Meta-analyses with the UK Biobank and FinnGen (n_total = 628,000) identified ~5,000 new loci, which improved the resolution of the genomic map of human traits. This atlas elucidated the landscape of pleiotropy as represented by the major histocompatibility complex locus, where we conducted HLA fine-mapping. Finally, we performed statistical decomposition of matrices of phenome-wide summary statistics, and identified latent genetic components, which pinpointed responsible variants and biological mechanisms underlying current disease classifications across populations. The decomposed components enabled genetically informed subtyping of similar diseases (for example, allergic diseases). Our study suggests a potential avenue for hypothesis-free re-investigation of human diseases through genetics.
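One common way to decompose a variants × phenotypes matrix of GWAS summary statistics is a truncated singular value decomposition, in which each component pairs a set of variant loadings with a set of phenotype loadings. The sketch below recovers only the leading component by power iteration, in pure Python, on a tiny rank-one matrix of hypothetical z-scores. The abstract does not specify the exact decomposition or preprocessing, so the choice of SVD here is an assumption.

```python
import math


def leading_component(z, n_iter=500, tol=1e-12):
    """Leading singular value and vectors of a variants x phenotypes z-score matrix.

    Power iteration on Z^T Z. Assumes Z is non-zero and that the all-ones starting
    vector is not orthogonal to the leading right singular vector.
    Returns (singular value, variant loadings, phenotype loadings).
    """
    n_rows, n_cols = len(z), len(z[0])
    v = [1.0 / math.sqrt(n_cols)] * n_cols
    for _ in range(n_iter):
        u = [sum(z[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]  # Z v
        w = [sum(z[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]  # Z^T Z v
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("starting vector lies in the null space of Z")
        w = [x / norm for x in w]
        converged = sum(abs(a - b) for a, b in zip(w, v)) < tol
        v = w
        if converged:
            break
    u = [sum(z[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
    sigma = math.sqrt(sum(x * x for x in u))
    return sigma, [x / sigma for x in u], v


# Hypothetical rank-one matrix: 3 variants (rows) x 3 phenotypes (columns).
z_scores = [
    [3.0, 0.0, 4.0],
    [6.0, 0.0, 8.0],
    [0.0, 0.0, 0.0],
]
sigma, variant_loadings, phenotype_loadings = leading_component(z_scores)
print(round(sigma, 4), [round(x, 3) for x in phenotype_loadings])  # 11.1803 [0.6, 0.0, 0.8]
```

The phenotype loadings show which traits a component is driven by (here the first and third), and the variant loadings show which variants drive it; later components would be found the same way after removing the leading one.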
Affiliation(s)
- Saori Sakaue
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan; Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan; Center for Data Sciences, Harvard Medical School, Boston, MA, USA; Divisions of Genetics and Rheumatology, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA; Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Masahiro Kanai
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan; Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA; Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA, USA; Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA; Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
- Yosuke Tanigawa
- Department of Biomedical Data Science, School of Medicine, Stanford University, Stanford, CA, USA
- Juha Karjalainen
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA; Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA, USA; Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
- Mitja Kurki
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA; Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA, USA; Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
- Seizo Koshiba
- Tohoku Medical Megabank Organization, Tohoku University, Sendai, Japan; Advanced Research Center for Innovations in Next-Generation Medicine (INGEM), Sendai, Japan
- Akira Narita
- Tohoku Medical Megabank Organization, Tohoku University, Sendai, Japan
- Takahiro Konuma
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan
- Kenichi Yamamoto
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan; Department of Pediatrics, Osaka University Graduate School of Medicine, Suita, Japan; Laboratory of Statistical Immunology, Immunology Frontier Research Center (WPI-IFReC), Osaka University, Suita, Japan
- Masato Akiyama
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan; Department of Ocular Pathology and Imaging Science, Graduate School of Medical Sciences, Kyushu University, Fukuoka, Japan
- Kazuyoshi Ishigaki
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan; Center for Data Sciences, Harvard Medical School, Boston, MA, USA; Divisions of Genetics and Rheumatology, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA; Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Akari Suzuki
- Laboratory for Autoimmune Diseases, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Ken Suzuki
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan
- Wataru Obara
- Department of Urology, Iwate Medical University, Iwate, Japan
- Ken Yamaji
- Department of Internal Medicine and Rheumatology, Juntendo University Graduate School of Medicine, Tokyo, Japan
- Kazuhisa Takahashi
- Department of Respiratory Medicine, Juntendo University Graduate School of Medicine, Tokyo, Japan
- Satoshi Asai
- Division of Pharmacology, Department of Biomedical Science, Nihon University School of Medicine, Tokyo, Japan; Division of Genomic Epidemiology and Clinical Trials, Clinical Trials Research Center, Nihon University School of Medicine, Tokyo, Japan
- Yasuo Takahashi
- Division of Genomic Epidemiology and Clinical Trials, Clinical Trials Research Center, Nihon University School of Medicine, Tokyo, Japan
- Shiro Minami
- Department of Bioregulation, Nippon Medical School, Kawasaki, Japan
- Shigeo Murayama
- Tokyo Metropolitan Geriatric Hospital and Institute of Gerontology, Tokyo, Japan
- Kozo Yoshimori
- Fukujuji Hospital, Japan Anti-Tuberculosis Association, Tokyo, Japan
- Satoshi Nagayama
- The Cancer Institute Hospital of the Japanese Foundation for Cancer Research, Tokyo, Japan
- Daisuke Obata
- Center for Clinical Research and Advanced Medicine, Shiga University of Medical Science, Otsu, Japan
- Masahiko Higashiyama
- Department of General Thoracic Surgery, Osaka International Cancer Institute, Osaka, Japan
- Kaoru Ito
- Laboratory for Cardiovascular Genomics and Informatics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Chikashi Terao
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Toshimasa Yamauchi
- Department of Diabetes and Metabolic Diseases, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Issei Komuro
- Department of Cardiovascular Medicine, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Takashi Kadowaki
- Department of Diabetes and Metabolic Diseases, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan; Toranomon Hospital, Tokyo, Japan
- Gen Tamiya
- Tohoku Medical Megabank Organization, Tohoku University, Sendai, Japan; Advanced Research Center for Innovations in Next-Generation Medicine (INGEM), Sendai, Japan; Graduate School of Medicine, Tohoku University, Sendai, Japan; Center for Advanced Intelligence Project, RIKEN, Tokyo, Japan
- Masayuki Yamamoto
- Tohoku Medical Megabank Organization, Tohoku University, Sendai, Japan; Advanced Research Center for Innovations in Next-Generation Medicine (INGEM), Sendai, Japan; Graduate School of Medicine, Tohoku University, Sendai, Japan
- Yusuke Nakamura
- Human Genome Center, Institute of Medical Science, The University of Tokyo, Tokyo, Japan; Cancer Precision Medicine Center, Japanese Foundation for Cancer Research, Tokyo, Japan
- Michiaki Kubo
- RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Yoshinori Murakami
- Division of Molecular Pathology, Institute of Medical Science, The University of Tokyo, Tokyo, Japan
- Kazuhiko Yamamoto
- Laboratory for Autoimmune Diseases, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Yoichiro Kamatani
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan; Laboratory of Complex Trait Genomics, Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Tokyo, Japan
- Aarno Palotie
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA; Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland; Psychiatric & Neurodevelopmental Genetics Unit, Department of Psychiatry, Analytic and Translational Genetics Unit, Department of Medicine, and the Department of Neurology, Massachusetts General Hospital, Boston, MA, USA
- Manuel A Rivas
- Department of Biomedical Data Science, School of Medicine, Stanford University, Stanford, CA, USA
- Mark J Daly
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA; Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA, USA; Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
- Koichi Matsuda
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Tokyo, Japan
- Yukinori Okada
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan; Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan; Laboratory of Statistical Immunology, Immunology Frontier Research Center (WPI-IFReC), Osaka University, Suita, Japan; Psychiatric & Neurodevelopmental Genetics Unit, Department of Psychiatry, Analytic and Translational Genetics Unit, Department of Medicine, and the Department of Neurology, Massachusetts General Hospital, Boston, MA, USA; Integrated Frontier Research for Medical Science Division, Institute for Open and Transdisciplinary Research Initiatives, Osaka University, Suita, Japan
27
Linder JE, Bastarache L, Hughey JJ, Peterson JF. The Role of Electronic Health Records in Advancing Genomic Medicine. Annu Rev Genomics Hum Genet 2021; 22:219-238. [PMID: 34038146 PMCID: PMC9297710 DOI: 10.1146/annurev-genom-121120-125204] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 11/18/2022]
Abstract
Recent advances in genomic technology and widespread adoption of electronic health records (EHRs) have accelerated the development of genomic medicine, bringing promising research findings from genome science into clinical practice. Genomic and phenomic data, accrued across large populations through biobanks linked to EHRs, have enabled the study of genetic variation at a phenome-wide scale. Through new quantitative techniques, pleiotropy can be explored with phenome-wide association studies, the occurrence of common complex diseases can be predicted using the cumulative influence of many genetic variants (polygenic risk scores), and undiagnosed Mendelian syndromes can be identified using EHR-based phenotypic signatures (phenotype risk scores). In this review, we trace the role of EHRs from the development of genome-wide analytic techniques to translational efforts to test these new interventions in the clinic. Throughout, we describe the challenges that remain when combining EHRs with genetics to improve clinical care.
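A polygenic risk score captures the cumulative influence of many variants described above; in its simplest form it is a weighted sum of effect-allele counts. The sketch assumes per-allele weights taken from a GWAS and uses hypothetical variant IDs; real scoring pipelines also involve variant selection, allele harmonization and principled handling of missing genotypes, none of which is shown.

```python
def polygenic_risk_score(dosages, weights):
    """Weighted sum of effect-allele counts: PRS = sum over variants j of beta_j * g_j.

    dosages: variant id -> number of effect alleles (0, 1 or 2) carried by one person.
    weights: variant id -> per-allele effect size, e.g. a log odds ratio from a GWAS.
    Variants absent from the person's genotypes are skipped (a simplifying assumption).
    """
    return sum(beta * dosages[v] for v, beta in weights.items() if v in dosages)


# Hypothetical variants and effect sizes.
effect_sizes = {"variant_1": 0.2, "variant_2": -0.1, "variant_3": 0.05}
person = {"variant_1": 2, "variant_2": 1}
prs = polygenic_risk_score(person, effect_sizes)
print(round(prs, 3))  # 0.3
```

Because the score is linear in allele counts, individuals can then be ranked or binned (for example into quantiles) to compare disease risk across the score distribution.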
Affiliation(s)
- Jodell E Linder
- Vanderbilt Institute for Clinical and Translational Research, Vanderbilt University Medical Center, Nashville, Tennessee 37203, USA
- Lisa Bastarache
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee 37203, USA
- Jacob J Hughey
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee 37203, USA
- Josh F Peterson
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee 37203, USA
- Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee 37203, USA
28
Abstract
Electronic health records (EHRs) are a rich source of data for researchers, but extracting meaningful information out of this highly complex data source is challenging. Phecodes represent one strategy for defining phenotypes for research using EHR data. They are a high-throughput phenotyping tool based on ICD (International Classification of Diseases) codes that can be used to rapidly define the case/control status of thousands of clinically meaningful diseases and conditions. Phecodes were originally developed to conduct phenome-wide association studies to scan for phenotypic associations with common genetic variants. Since then, phecodes have been used to support a wide range of EHR-based phenotyping methods, including the phenotype risk score. This review aims to comprehensively describe the development, validation, and applications of phecodes and suggest some future directions for phecodes and high-throughput phenotyping.
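Phecode-based phenotyping turns raw ICD codes into case/control labels. The sketch below assumes a hypothetical ICD-to-phecode map, a two-occurrence threshold for calling a case, and an exclusion set of related phecodes that keeps partially affected patients out of the control group. The codes are placeholders; the official phecode map and its control-exclusion ranges are not reproduced here.

```python
def phecode_status(icd_codes, icd_to_phecode, target, exclusion, min_count=2):
    """Label one patient 'case', 'control' or 'excluded' for a single phecode.

    icd_codes: ICD codes recorded for the patient, one entry per occurrence.
    icd_to_phecode: ICD code -> phecode (a hypothetical toy map, not the official one).
    target: phecode of interest.
    exclusion: related phecodes whose presence disqualifies a patient as a control.
    min_count: occurrences needed to call a case (an assumed threshold).
    """
    mapped = [icd_to_phecode[c] for c in icd_codes if c in icd_to_phecode]
    n_target = mapped.count(target)
    if n_target >= min_count:
        return "case"
    if n_target > 0 or any(p in exclusion for p in mapped):
        return "excluded"  # some evidence of the condition, so not a clean control
    return "control"


# Placeholder ICD codes and phecodes for illustration only.
toy_map = {"A1": "100.1", "A2": "100.1", "B1": "100.2", "C1": "200"}
for record in (["A1", "A2"], ["A1"], ["B1"], ["C1"], []):
    print(record, phecode_status(record, toy_map, target="100.1", exclusion={"100.2"}))
```

Running the same function over every phecode and every patient yields the case/control matrix that a phenome-wide association study scans.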
Affiliation(s)
- Lisa Bastarache
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee 37232, USA
29
Moore CR, Jain S, Haas S, Yadav H, Whitsel E, Rosamand W, Heiss G, Kucharska-Newton AM. Ascertaining Framingham heart failure phenotype from inpatient electronic health record data using natural language processing: a multicentre Atherosclerosis Risk in Communities (ARIC) validation study. BMJ Open 2021; 11:e047356. [PMID: 34127492 PMCID: PMC8204176 DOI: 10.1136/bmjopen-2020-047356] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 11/28/2020] [Accepted: 05/05/2021] [Indexed: 11/25/2022]
Abstract
OBJECTIVES Using free-text clinical notes and reports from hospitalised patients, to determine the performance of natural language processing (NLP) ascertainment of Framingham heart failure (HF) criteria and phenotype. STUDY DESIGN A retrospective observational study of patients hospitalised in 2015 at four hospitals participating in the Atherosclerosis Risk in Communities (ARIC) study was used to determine NLP performance in the ascertainment of Framingham HF criteria and phenotype. SETTING Four ARIC study hospitals, each representing an ARIC study region in the USA. PARTICIPANTS A stratified random sample of hospitalisations occurring during 2015 and identified using a broad range of International Classification of Diseases, Ninth Revision (ICD-9) diagnostic codes indicative of an HF event was drawn for this study. A randomly selected set of 394 hospitalisations was used as the derivation dataset and 406 hospitalisations as the validation dataset. INTERVENTION Use of NLP on free-text clinical notes and reports to ascertain Framingham HF criteria and phenotype. PRIMARY AND SECONDARY OUTCOME MEASURES NLP performance as measured by sensitivity, specificity, positive-predictive value (PPV) and agreement in ascertainment of Framingham HF criteria and phenotype. Manual medical record review by trained ARIC abstractors was used as the reference standard. RESULTS Overall, performance of NLP ascertainment of Framingham HF phenotype in the validation dataset was good, with sensitivity, specificity, PPV and agreement of 78.8%, 81.7%, 84.4% and 80.0%, respectively. CONCLUSIONS By decreasing the need for manual chart review, our results on the use of NLP to ascertain Framingham HF phenotype from free-text electronic health record data suggest that validated NLP technology holds the potential for significantly improving the feasibility and efficiency of conducting large-scale epidemiologic surveillance of HF prevalence and incidence.
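The four performance measures reported above all come from one 2×2 table of NLP calls against the manual-review reference standard. A minimal sketch; the counts in the example are invented for illustration and are not the ARIC study's data.

```python
def agreement_metrics(tp, fp, fn, tn):
    """Performance of NLP against manual chart review (the reference standard).

    tp: NLP and reviewers both call the phenotype present; fp: NLP only;
    fn: reviewers only; tn: both call it absent.
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "agreement": (tp + tn) / (tp + fp + fn + tn),
    }


# Invented counts, not the study's data.
metrics = agreement_metrics(tp=45, fp=5, fn=15, tn=35)
print(metrics)  # {'sensitivity': 0.75, 'specificity': 0.875, 'ppv': 0.9, 'agreement': 0.8}
```

Unlike sensitivity and specificity, PPV depends on how common the phenotype is in the reviewed sample, so it can shift when the same NLP tool is applied to a population with a different HF prevalence.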
Affiliation(s)
- Carlton R Moore
- Medicine, University of North Carolina at Chapel Hill School of Medicine, Chapel Hill, North Carolina, USA
- Saumya Jain
- Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Stephanie Haas
- School of Information and Library Science, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Harish Yadav
- School of Information and Library Science, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Eric Whitsel
- Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Wayne Rosamand
- Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Gerardo Heiss
- Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
30
Liu G, Shi M, Mosley JD, Weng C, Zhang Y, Lee MTM, Jarvik GP, Hakonarson H, Namjou-Khales B, Sleiman P, Luo Y, Mentch F, Denny JC, Linton MF, Wei WQ, Stein CM, Feng Q. A Mendelian Randomization Approach Using 3-HMG-Coenzyme-A Reductase Gene Variation to Evaluate the Association of Statin-Induced Low-Density Lipoprotein Cholesterol Lowering With Noncardiovascular Disease Phenotypes. JAMA Netw Open 2021; 4:e2112820. [PMID: 34097045 PMCID: PMC8185593 DOI: 10.1001/jamanetworkopen.2021.12820] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 12/26/2022]
Abstract
IMPORTANCE Observational studies suggest that statins, which inhibit 3-hydroxy-3-methylglutaryl coenzyme A (HMG-CoA) reductase, may be associated with beneficial effects in many noncardiovascular diseases. OBJECTIVE To construct a weighted HMG-CoA reductase (HMGCR) gene genetic risk score (GRS) using variants in the HMGCR gene affecting low-density lipoprotein cholesterol as an instrumental variable for mendelian randomization analyses to test associations with candidate noncardiovascular phenotypes previously associated with statin use in observational studies. DESIGN, SETTING, AND PARTICIPANTS This cohort study included 53 385 unrelated adults of European ancestry with genome-wide genotypes available from BioVU (a practice-based biobank, used for discovery) and 30 444 unrelated adults with European ancestry available in the Electronic Medical Records and Genomics (eMERGE; a research consortium that conducts genetic research using electronic medical records, used for replication). The study was conducted from February 6, 2015, through April 31, 2019; data analysis was performed from August 26, 2019, through December 22, 2020. INTERVENTIONS An HMGCR GRS was calculated. MAIN OUTCOMES AND MEASURES The association between the HMGCR GRS and the presence or absence of 22 noncardiovascular phenotypes previously associated with statin use in clinical studies. RESULTS Of the 53 385 individuals in BioVU, 29 958 (56.1%) were women; mean (SD) age was 59.9 (15.6) years. In this cohort, the association between the HMGCR GRS and the noncardiovascular phenotypes of interest was significant only for type 2 diabetes. An HMGCR GRS equivalent to a 10-mg/dL decrease in the low-density lipoprotein cholesterol level was associated with an increased risk of type 2 diabetes (odds ratio [OR], 1.09; 95% CI, 1.04-1.15; P = 5.58 × 10⁻⁴).
The HMGCR GRS was not associated with other phenotypes; the closest were increased risk of Parkinson disease (OR, 1.30; 95% CI, 1.07-1.58; P = .007) and kidney failure (OR, 1.18; 95% CI, 1.05-1.34; P = .008). Of the 30 444 individuals in eMERGE, 16 736 (55.0%) were women; mean (SD) age was 68.7 (15.4) years. The association between the HMGCR GRS and type 2 diabetes was replicated in this cohort (OR, 1.09; 95% CI, 1.01-1.17; P = .02); however, the HMGCR GRS was not associated with Parkinson disease (OR, 0.93; 95% CI, 0.75-1.16; P = .53) and kidney failure (OR, 1.18; 95% CI, 0.98-1.41; P = .08) in the eMERGE cohort. CONCLUSIONS AND RELEVANCE A mendelian randomization approach using variants in the HMGCR gene replicated the association between statin use and increased type 2 diabetes risk but provided no strong evidence for pleiotropic effects of statin-induced decrease of the low-density lipoprotein cholesterol level on other diseases.
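The abstract states that the score is weighted and expressed per 10-mg/dL decrease in LDL cholesterol, but it does not give the formula. Below is a minimal sketch of one common construction. Each variant's per-allele LDL effect is used as its weight, and these weighted allele dosages are summed. The sketch also assumes a log-linear dose-response when scaling the reported OR of 1.09 per unit. The function names, weights and genotypes are hypothetical; they are not the study's variants.

```python
import math
from typing import Sequence


def weighted_grs(dosages: Sequence[float], ldl_effects_mg_dl: Sequence[float]) -> float:
    """Weighted genetic risk score in units of a 10-mg/dL LDL cholesterol decrease.

    dosages: allele counts (0-2, or imputed dosages) of the counted allele per variant.
    ldl_effects_mg_dl: hypothetical per-allele LDL change in mg/dL for each variant,
        negative when the counted allele lowers LDL.
    """
    if len(dosages) != len(ldl_effects_mg_dl):
        raise ValueError("need exactly one weight per variant")
    # Predicted LDL change (mg/dL) implied by this person's genotype
    predicted_change = sum(d * w for d, w in zip(dosages, ldl_effects_mg_dl))
    # Flip the sign and rescale so that +1 unit means a 10-mg/dL *decrease* in LDL
    return -predicted_change / 10.0


def implied_odds_ratio(grs_units: float, or_per_unit: float = 1.09) -> float:
    """OR for a given score, assuming log-odds scale linearly with the GRS."""
    return math.exp(grs_units * math.log(or_per_unit))


# Hypothetical three-variant genotype: predicted LDL change of -6 mg/dL
grs = weighted_grs([2, 1, 0], [-2.5, -1.0, -3.0])
print(round(grs, 2))                      # 0.6 units of a 10-mg/dL decrease
print(round(implied_odds_ratio(grs), 3))  # about 1.053 under the log-linear assumption
```

The default `or_per_unit` is the BioVU type 2 diabetes estimate quoted above. Any other outcome would need its own per-unit OR.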
Affiliation(s)
- Ge Liu
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee
- Mingjian Shi
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee
- Jonathan D. Mosley
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee
  - Division of Clinical Pharmacology, Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee
- Chunhua Weng
  - Department of Biomedical Informatics, Columbia University Medical Center, New York, New York
- Yanfei Zhang
  - Genomic Medicine Institute, Geisinger Health System, Danville, Pennsylvania
  - Musculoskeletal Institute, Geisinger, Danville, Pennsylvania
- Ming Ta Michael Lee
  - Genomic Medicine Institute, Geisinger Health System, Danville, Pennsylvania
  - Musculoskeletal Institute, Geisinger, Danville, Pennsylvania
- Gail P. Jarvik
  - Department of Medicine, Division of Medical Genetics, University of Washington Medical Center, Seattle
  - Department of Genome Sciences, University of Washington, Seattle
- Hakon Hakonarson
  - The Center for Applied Genomics, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania
  - Department of Pediatrics, The Perelman School of Medicine, University of Pennsylvania, Philadelphia
  - Division of Human Genetics, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania
  - Division of Pulmonary Medicine, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania
- Bahram Namjou-Khales
  - UC Department of Pediatrics, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio
- Patrick Sleiman
  - The Center for Applied Genomics, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania
  - Department of Pediatrics, The Perelman School of Medicine, University of Pennsylvania, Philadelphia
  - Division of Human Genetics, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania
- Yuan Luo
  - Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, Illinois
- Frank Mentch
  - The Center for Applied Genomics, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania
- Joshua C. Denny
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee
  - At the time of the study: Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee
  - All of Us Research Program, National Institutes of Health, Bethesda, Maryland
  - Now: National Institutes of Health, Bethesda, Maryland
- MacRae F. Linton
  - Division of Cardiovascular Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee
  - Department of Pharmacology, Vanderbilt University, Nashville, Tennessee
- Wei-Qi Wei
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee
- C. Michael Stein
  - Division of Clinical Pharmacology, Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee
  - Department of Pharmacology, Vanderbilt University, Nashville, Tennessee
- QiPing Feng
  - Division of Clinical Pharmacology, Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee
31
Development of an Algorithm to Identify Cases of Nonalcoholic Steatohepatitis Cirrhosis in the Electronic Health Record. Dig Dis Sci 2021; 66:1452-1460. [PMID: 32535780 DOI: 10.1007/s10620-020-06388-y] [Received: 10/11/2019] [Accepted: 06/03/2020] [Indexed: 01/11/2023]
Abstract
BACKGROUND AND AIMS Current genetic research on nonalcoholic steatohepatitis (NASH) cirrhosis is limited by our ability to accurately identify cases on a large scale. Our objective was to develop and validate an electronic health record (EHR) algorithm to accurately identify cases of NASH cirrhosis in the EHR. METHODS We used Clinical Query 2, a search tool at Beth Israel Deaconess Medical Center, to create a pool of potential NASH cirrhosis cases (n = 5415). We created a training set of 300 randomly selected patients for chart review to confirm cases of NASH cirrhosis. Test characteristics were calculated for different algorithms consisting of diagnosis codes, laboratory values, anthropometric measurements, and medication records. The algorithms with the highest positive predictive value (PPV) and the highest F score with a PPV ≥ 80% were selected for internal validation using a separate random set of 100 patients from the potential NASH cirrhosis pool. These were then externally validated in another random set of 100 individuals using the Research Patient Data Registry tool at Massachusetts General Hospital. RESULTS The algorithm with the highest PPV (100% on internal validation and 92% on external validation) consisted of ≥ 3 counts of "cirrhosis, no mention of alcohol" (ICD-9 571.5; ICD-10 K74.6) and ≥ 3 counts of "nonalcoholic fatty liver" (ICD-9 571.8-571.9; ICD-10 K75.81, K76.0) codes, in the absence of any diagnosis codes for other common causes of chronic liver disease. CONCLUSIONS We developed and validated an EHR algorithm using diagnosis codes that accurately identifies patients with NASH cirrhosis.
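The final rule reduces to counting codes plus an exclusion check. The selection step (highest F score among algorithms with PPV ≥ 80%) is simple arithmetic on chart-review counts. Below is a minimal sketch under several assumptions:

- Each patient's diagnoses arrive as a flat list of code strings.
- "F score" means F1.
- Codes are matched exactly as listed in the abstract.

The function names are hypothetical. The exclusion set is an illustrative placeholder, because the abstract does not enumerate the "other common causes of chronic liver disease" codes.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple

# Code lists as given in the abstract (ICD-9 and ICD-10). Real ICD-10-CM extracts may
# carry extra digits (e.g. K74.69), which would call for prefix matching instead.
CIRRHOSIS_CODES = {"571.5", "K74.6"}
NAFLD_CODES = {"571.8", "571.9", "K75.81", "K76.0"}
# Illustrative placeholder only (alcohol-related and hepatitis C codes), not the
# study's exclusion list, which the abstract does not give.
OTHER_LIVER_DISEASE_CODES = {"571.0", "571.1", "571.2", "571.3", "070.54", "B18.2"}


def is_nash_cirrhosis_case(codes: Iterable[str], min_count: int = 3) -> bool:
    """Apply the >=3 cirrhosis codes + >=3 NAFLD codes rule with exclusions."""
    counts = Counter(codes)
    if any(code in OTHER_LIVER_DISEASE_CODES for code in counts):
        return False
    n_cirrhosis = sum(counts[c] for c in CIRRHOSIS_CODES)  # Counter returns 0 if absent
    n_nafld = sum(counts[c] for c in NAFLD_CODES)
    return n_cirrhosis >= min_count and n_nafld >= min_count


def ppv_and_f_score(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """PPV and F1 (harmonic mean of PPV and sensitivity) from chart-review counts."""
    if tp == 0:
        return 0.0, 0.0
    ppv = tp / (tp + fp)
    sensitivity = tp / (tp + fn)
    return ppv, 2 * ppv * sensitivity / (ppv + sensitivity)


def pick_algorithm(results: Dict[str, Tuple[int, int, int]],
                   min_ppv: float = 0.80) -> Optional[str]:
    """Name of the algorithm with the highest F score among those with PPV >= min_ppv."""
    eligible = {}
    for name, (tp, fp, fn) in results.items():
        ppv, f_score = ppv_and_f_score(tp, fp, fn)
        if ppv >= min_ppv:
            eligible[name] = f_score
    return max(eligible, key=eligible.get) if eligible else None
```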
32
Dashti HS, Cade BE, Stutaite G, Saxena R, Redline S, Karlson EW. Sleep health, diseases, and pain syndromes: findings from an electronic health record biobank. Sleep 2021; 44:5909423. [PMID: 32954408 DOI: 10.1093/sleep/zsaa189] [Received: 03/17/2020] [Revised: 08/28/2020] [Indexed: 02/02/2023]
Abstract
STUDY OBJECTIVES Electronic health record biobanks have facilitated linkage between clinical and questionnaire data, enabling phenome-wide association studies (PheWAS) of the relationships between sleep health and disease. In the Mass General Brigham Biobank, a large health system-based study, we aimed to systematically catalog associations of time in bed, sleep timing, and weekly variability with clinical phenotypes derived from ICD-9/10 codes. METHODS Self-reported habitual bed and wake times were used to derive the following variables: short (<7 hours) and long (≥9 hours) time in bed, sleep midpoint, social jetlag, and sleep debt. Logistic regression and Cox proportional hazards models were used to test cross-sectional and prospective associations, respectively. Models were adjusted for age, gender, race/ethnicity, and employment status, and further adjusted for body mass index. RESULTS In cross-sectional analysis (n = 34,651), sleep variable associations were most notable for circulatory system, mental disorder, and endocrine/metabolic phenotypes. The strongest associations were observed for short time in bed with obesity, for long time in bed and sleep midpoint with major depressive disorder, for social jetlag with hypercholesterolemia, and for sleep debt with acne. In prospective analysis (n = 24,065), short time in bed was associated with a higher incidence of acute pain. Later sleep midpoint, higher sleep debt, and greater social jetlag were associated with a higher incidence of major depressive disorder. CONCLUSIONS Our analysis reinforced that sleep health is a multidimensional construct, corroborated robust known findings from traditional cohort studies, and supported PheWAS as a promising tool for advancing sleep research. Given the exploratory nature of PheWAS, careful interrogation of novel findings is imperative.
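The abstract names the derived sleep variables but not their formulas. Only the <7 h and ≥9 h cutoffs come from the abstract. Below is a minimal sketch, assuming bed and wake times are clock hours (0-24) reported separately for workdays and free days. The following definitions are common conventions assumed here, not taken from the paper:

- a 5:2 weighting of workdays and free days;
- social jetlag as the shift in sleep midpoint between workdays and free days;
- sleep debt as free-day minus workday time in bed.

All names are hypothetical.

```python
from dataclasses import dataclass


def time_in_bed(bed_h: float, wake_h: float) -> float:
    """Hours from bedtime to wake time (clock hours 0-24), wrapping past midnight."""
    return (wake_h - bed_h) % 24


def sleep_midpoint(bed_h: float, wake_h: float) -> float:
    """Clock time halfway between bedtime and wake time."""
    return (bed_h + time_in_bed(bed_h, wake_h) / 2) % 24


def clock_gap(a_h: float, b_h: float) -> float:
    """Smallest absolute difference between two clock times, in hours."""
    d = abs(a_h - b_h) % 24
    return min(d, 24 - d)


@dataclass
class SleepVariables:
    mean_time_in_bed: float
    category: str
    workday_midpoint: float
    free_day_midpoint: float
    social_jetlag: float
    sleep_debt: float


def derive_sleep_variables(work_bed: float, work_wake: float,
                           free_bed: float, free_wake: float) -> SleepVariables:
    tib_work = time_in_bed(work_bed, work_wake)
    tib_free = time_in_bed(free_bed, free_wake)
    # Assumption: 5 workdays and 2 free days per week
    mean_tib = (5 * tib_work + 2 * tib_free) / 7
    # Cutoffs from the abstract: short < 7 h, long >= 9 h
    if mean_tib < 7:
        category = "short"
    elif mean_tib >= 9:
        category = "long"
    else:
        category = "intermediate"
    mid_work = sleep_midpoint(work_bed, work_wake)
    mid_free = sleep_midpoint(free_bed, free_wake)
    return SleepVariables(
        mean_time_in_bed=mean_tib,
        category=category,
        workday_midpoint=mid_work,
        free_day_midpoint=mid_free,
        # Assumed definition: midpoint shift between workdays and free days
        social_jetlag=clock_gap(mid_free, mid_work),
        # Assumed definition: extra time in bed on free days
        sleep_debt=tib_free - tib_work,
    )
```

For example, bed at 23:00 and wake at 06:00 on workdays, and bed at 00:30 and wake at 09:00 on free days, gives a workday midpoint of 02:30. Under the assumed definitions, social jetlag is 2.25 h and sleep debt is 1.5 h.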
Affiliation(s)
- Hassan S Dashti
  - Center for Genomic Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, MA
  - Broad Institute, Cambridge, MA
  - Department of Anesthesia, Critical Care and Pain Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, MA
- Brian E Cade
  - Broad Institute, Cambridge, MA
  - Division of Sleep and Circadian Disorders, Department of Medicine, Brigham and Women's Hospital, Boston, MA
  - Division of Sleep Medicine, Harvard Medical School, Boston, MA
- Gerda Stutaite
  - Mass General Brigham Personalized Medicine, Mass General Brigham, Boston, MA
- Richa Saxena
  - Center for Genomic Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, MA
  - Broad Institute, Cambridge, MA
  - Department of Anesthesia, Critical Care and Pain Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, MA
  - Division of Sleep Medicine, Harvard Medical School, Boston, MA
- Susan Redline
  - Division of Sleep and Circadian Disorders, Department of Medicine, Brigham and Women's Hospital, Boston, MA
  - Division of Sleep Medicine, Harvard Medical School, Boston, MA
  - Department of Medicine, Brigham and Women's Hospital and Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA
- Elizabeth W Karlson
  - Mass General Brigham Personalized Medicine, Mass General Brigham, Boston, MA
  - Division of Rheumatology, Inflammation and Immunity, Brigham and Women's Hospital, Boston, MA
33
Lee S, Doktorchik C, Martin EA, D'Souza AG, Eastwood C, Shaheen AA, Naugler C, Lee J, Quan H. Electronic Medical Record-Based Case Phenotyping for the Charlson Conditions: Scoping Review. JMIR Med Inform 2021; 9:e23934. [PMID: 33522976 PMCID: PMC7884219 DOI: 10.2196/23934] [Received: 08/28/2020] [Revised: 11/20/2020] [Accepted: 12/05/2020] [Indexed: 12/16/2022]
Abstract
BACKGROUND Electronic medical records (EMRs) contain large amounts of rich clinical information. Developing EMR-based case definitions, also known as EMR phenotyping, is an active area of research with implications for epidemiology, clinical care, and health services research. OBJECTIVE This review aims to describe and assess the present landscape of EMR-based case phenotyping for the Charlson conditions. METHODS A scoping review of EMR-based algorithms for defining the Charlson comorbidity index conditions was completed. This study covered articles published from January 2000 through April 2020, inclusive. Embase (Excerpta Medica database) and MEDLINE (Medical Literature Analysis and Retrieval System Online) were searched using keywords in 3 domains: terms related to EMR, terms related to case finding, and disease-specific terms. The manuscript follows the Preferred Reporting Items for Systematic reviews and Meta-analyses extension for Scoping Reviews (PRISMA-ScR) guidelines. RESULTS A total of 274 articles representing 299 algorithms were assessed and summarized. Most studies were undertaken in the United States (181/299, 60.5%), followed by the United Kingdom (42/299, 14.0%) and Canada (15/299, 5.0%). These algorithms were mostly developed in either primary care (103/299, 34.4%) or inpatient (168/299, 56.2%) settings. Diabetes, congestive heart failure, myocardial infarction, and rheumatic disease had the highest number of developed algorithms. Both data-driven and clinical rule-based approaches were identified. EMR-based phenotype and algorithm development reflects the data access allowed by the respective health systems, and algorithms vary in their performance. CONCLUSIONS Recognizing similarities and differences in health systems, data collection strategies, extraction, data release protocols, and existing clinical pathways is critical to algorithm development strategies.
Several strategies to assist with phenotype-based case definitions have been proposed.
Affiliation(s)
- Seungwon Lee
  - Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
  - Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
  - Alberta Health Services, Calgary, AB, Canada
  - Data Intelligence for Health Lab, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Chelsea Doktorchik
  - Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
  - Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Elliot Asher Martin
  - Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
  - Alberta Health Services, Calgary, AB, Canada
- Adam Giles D'Souza
  - Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
  - Alberta Health Services, Calgary, AB, Canada
- Cathy Eastwood
  - Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
  - Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Abdel Aziz Shaheen
  - Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
  - Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
  - Department of Medicine, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Christopher Naugler
  - Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
  - Department of Pathology and Laboratory Medicine, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Joon Lee
  - Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
  - Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
  - Data Intelligence for Health Lab, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
  - Department of Cardiac Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Hude Quan
  - Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
  - Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
34
Brugger SW, Gardner MC, Beales JT, Briggs F, Davis MF. Depression in multiple sclerosis patients associated with risk variant near NEGR1. Mult Scler Relat Disord 2020; 46:102537. [PMID: 33296963 DOI: 10.1016/j.msard.2020.102537] [Received: 07/09/2020] [Revised: 09/09/2020] [Accepted: 09/25/2020] [Indexed: 10/23/2022]
Abstract
BACKGROUND A substantial number of patients diagnosed with multiple sclerosis (MS) suffer from depression in addition to physical symptoms and disability. Recent evidence suggests that the relationship between MS and depression may be stronger than previously thought, and that a diagnosis of depression may be prodromal to the development of MS. METHODS A genome-wide association study (GWAS) was performed to identify genetic variants associated with the development of depression in a cohort of MS patients. The control group (n = 1180) comprised MS patients with no diagnosis of depression, as determined by ICD-9 and ICD-10 billing codes in the electronic health record (EHR). Separate analyses were performed for three case groups: 1) MS patients with a depression diagnosis at any time (n = 182), 2) MS patients with a depression diagnosis one year pre-MS diagnosis (n = 27), and 3) MS patients with a depression diagnosis one year post-MS diagnosis (n = 130). Logistic regression analyses were also performed to test for associations between the development of depression and an APOE tagging variant, as APOE was previously linked to depressive affect in MS. An additional logistic regression analysis tested for associations between depression in MS patients and SNPs associated with depression in the general population. Pathway enrichment analyses were also conducted to identify pathways that link the two diseases. RESULTS GWAS identified no novel associations between variants and a diagnosis of depression relative to a diagnosis of MS. One variant associated with depression in the general population, rs1432639, was significantly associated with the development of depression post-MS diagnosis. The APOE-related SNPs were not associated with depression in this study population. An IGF1 pathway approached statistical significance in patients diagnosed with depression prior to a diagnosis of MS.
CONCLUSION rs1432639 and the IGF1 pathway provide evidence for a genetic link between MS and depression that warrants further research.
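The methods describe logistic regression tests of depression status on individual variants. Below is a minimal sketch of one such per-variant test, with these assumptions:

- The variant is coded additively (0, 1 or 2 copies of the tested allele).
- The model is fitted by Newton-Raphson using numpy.
- The p-value is a two-sided Wald test.

The function and argument names are hypothetical. Genome-wide analyses are normally run with dedicated GWAS software, which would also typically adjust for covariates such as ancestry components.

```python
import math

import numpy as np


def snp_logistic_test(dosage, is_case, covariates=None, max_iter=50, tol=1e-10):
    """Additive-model logistic regression of case status on one variant.

    dosage: (n,) allele counts; is_case: (n,) 0/1 outcome;
    covariates: optional (n,) or (n, k) array of adjustment variables.
    Returns (odds ratio per allele, two-sided Wald p-value).
    """
    y = np.asarray(is_case, dtype=float)
    columns = [np.ones_like(y), np.asarray(dosage, dtype=float)]
    if covariates is not None:
        columns.append(np.asarray(covariates, dtype=float).reshape(len(y), -1))
    X = np.column_stack(columns)  # intercept, variant, then any covariates

    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        mu = 1.0 / (1.0 + np.exp(-X @ beta))            # fitted probabilities
        information = X.T @ (X * (mu * (1.0 - mu))[:, None])  # Fisher information
        step = np.linalg.solve(information, X.T @ (y - mu))   # Newton-Raphson step
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break

    # Recompute the information at the final estimate for the standard error
    mu = 1.0 / (1.0 + np.exp(-X @ beta))
    information = X.T @ (X * (mu * (1.0 - mu))[:, None])
    se = math.sqrt(np.linalg.inv(information)[1, 1])
    z = beta[1] / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return math.exp(beta[1]), p_value
```

With a single binary variant and no covariates, the fitted OR equals the 2×2 cross-product ratio. This makes it a quick sanity check of the implementation.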
Affiliation(s)
- Steven W Brugger
  - Department of Microbiology and Molecular Biology, Brigham Young University, Provo, UT, United States
- M Cannon Gardner
  - Department of Microbiology and Molecular Biology, Brigham Young University, Provo, UT, United States
- Jeremy T Beales
  - Department of Microbiology and Molecular Biology, Brigham Young University, Provo, UT, United States
- Farren Briggs
  - Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, Ohio, United States
- Mary F Davis
  - Department of Microbiology and Molecular Biology, Brigham Young University, Provo, UT, United States
  - Department of Biomedical Informatics, Vanderbilt University, Nashville, Tennessee, United States
35
Khare R, Kappelman MD, Samson C, Pyrzanowski J, Darwar RA, Forrest CB, Bailey CC, Margolis P, Dempsey A. Development and evaluation of an EHR-based computable phenotype for identification of pediatric Crohn's disease patients in a National Pediatric Learning Health System. Learn Health Syst 2020; 4:e10243. [PMID: 33083542 PMCID: PMC7556434 DOI: 10.1002/lrh2.10243] [Received: 01/09/2020] [Revised: 06/16/2020] [Accepted: 07/23/2020] [Indexed: 01/16/2023]
Abstract
Objectives To develop and evaluate the classification accuracy of a computable phenotype for pediatric Crohn's disease using electronic health record data from PEDSnet, a large, multi‐institutional research network and Learning Health System. Study Design Using clinician and informatician input, algorithms were developed using combinations of diagnostic and medication data drawn from the PEDSnet clinical dataset, which comprises 5.6 million children from eight U.S. academic children's health systems. Six test algorithms (four case algorithms and two non‐case algorithms) that combined the use of specific medications for Crohn's disease with the presence of a Crohn's diagnosis were initially tested against the entire PEDSnet dataset. From these, three were selected for performance assessment using manual chart review (primary case algorithm, n = 360; primary non‐case algorithm, n = 360; and alternative case algorithm, n = 80). Non‐cases were patients with gastrointestinal diagnoses other than inflammatory bowel disease. Sensitivity, specificity, and positive predictive value (PPV) were assessed for the primary case and primary non‐case algorithms. Results Of the six algorithms tested, the least restrictive, requiring just ≥1 Crohn's diagnosis code, yielded 11 950 cases across PEDSnet (prevalence 21/10 000). The most restrictive, requiring ≥3 Crohn's disease diagnoses plus at least one medication, yielded 7868 patients (prevalence 14/10 000). The most restrictive algorithm had the highest PPV (95%), as well as high sensitivity (91%) and specificity (94%). False positives were due primarily to a diagnosis reversal (from Crohn's disease to ulcerative colitis) or a diagnosis of “indeterminate colitis.” False negatives were rare. Conclusions Using diagnosis codes and medications available from PEDSnet, we developed a computable phenotype for pediatric Crohn's disease with high specificity, sensitivity, and predictive value.
This process will be of use for developing computable phenotypes for other pediatric diseases, to facilitate cohort identification for retrospective and prospective studies, and to optimize clinical care through the PEDSnet Learning Health System.
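The reported prevalences can be checked from the case counts and the dataset size. The three accuracy measures follow from a 2×2 table against chart review. Below is a minimal sketch. The 5.6 million denominator is the rounded figure from the abstract, and the function names are hypothetical. The sketch computes naive ratios and does not account for how the chart-review samples were drawn.

```python
def prevalence_per_10k(cases: int, population: int) -> float:
    """Cases per 10,000 people."""
    return cases / population * 10_000


def accuracy_from_chart_review(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Sensitivity, specificity and PPV with chart review as the reference standard."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
    }


# Checking the reported prevalences against the ~5.6 million-child denominator
for label, n_cases in [("least restrictive", 11_950), ("most restrictive", 7_868)]:
    print(label, round(prevalence_per_10k(n_cases, 5_600_000)))  # prints 21, then 14
```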
Affiliation(s)
- Michael D Kappelman
  - Division of Pediatric Gastroenterology, Department of Pediatrics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Charles Samson
  - Division of Gastroenterology, Hepatology & Nutrition, Department of Pediatrics, Washington University in St Louis School of Medicine, St. Louis, Missouri, USA
- Jennifer Pyrzanowski
  - Adult and Child Consortium for Outcomes Research and Dissemination Science, University of Colorado Denver, Aurora, Colorado, USA
- Rahul A Darwar
  - Applied Clinical Research Center, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, USA
- Christopher B Forrest
  - Applied Clinical Research Center, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, USA
  - Department of Pediatrics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Charles C Bailey
  - Applied Clinical Research Center, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, USA
  - Department of Pediatrics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Peter Margolis
  - James M. Anderson Center for Health Systems Excellence, Department of Pediatrics, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA
- Amanda Dempsey
  - Department of Pediatrics, University of Colorado Denver, Aurora, Colorado, USA
36
Curtis JR, Xie F, Zhou H, Salchert D, Yun H. Use of ICD-10 diagnosis codes to identify seropositive and seronegative rheumatoid arthritis when lab results are not available. Arthritis Res Ther 2020; 22:242. [PMID: 33059732 PMCID: PMC7560310 DOI: 10.1186/s13075-020-02310-z] [Received: 05/05/2020] [Accepted: 09/03/2020] [Indexed: 11/10/2022]
Abstract
BACKGROUND Rheumatoid factor (RF) and anti-cyclic citrullinated peptide (CCP) antibody tests are often measured at the time of rheumatoid arthritis (RA) diagnosis but may not be repeated, and are therefore not available in electronic health record (EHR) data; lab test results are unavailable in most administrative claims databases. ICD-10 coding allows discrimination between rheumatoid factor-positive ("seropositive", M05) and seronegative (M06) patients, but the validity of these codes has not been examined. METHODS Using the ACR's Rheumatology Informatics System for Effectiveness (RISE) EHR-based registry and U.S. MarketScan data, in which some patients have lab test results, we assembled two cohorts. Seropositive RA was defined as having an M05 diagnosis code at the second rheumatologist encounter, and seronegative RA was identified similarly using M06. RF and anti-CCP lab test results served as the gold standard. We calculated the sensitivity (Se) and positive predictive value (PPV) of the M05/M06 diagnosis codes. RESULTS We identified 43,581 eligible RA patients in RISE and 1185 in MarketScan with RF or anti-CCP lab test results available. Using M05 as the proxy for seropositive RA, Se = 0.76 and PPV = 0.82 in RISE, and Se = 0.73 and PPV = 0.84 in MarketScan. Results for M06 as a proxy for seronegative RA were comparable in RISE, albeit somewhat lower in MarketScan. Over 3 consecutive visits, approximately 90% of RA patients were coded consistently, with either M05 or M06 at each visit. CONCLUSION Under ICD-10, M05 and M06 diagnosis codes are reasonable proxies for identifying seropositive and seronegative RA, with high sensitivity and positive predictive value, when lab test results are not available.
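The case definition, the consistency check and the two accuracy measures can be sketched directly. Below is a minimal sketch, assuming each patient is represented by an ordered list of codes from rheumatologist encounters. It also assumes that lab seropositivity means a positive RF or anti-CCP result; the abstract does not say how the two tests were combined. The representation and all function names are hypothetical. Prefix matching is used because M05 and M06 are ICD-10 categories with more specific subcodes.

```python
from typing import List, Optional, Sequence, Tuple


def serostatus_from_codes(rheum_encounter_codes: Sequence[str]) -> Optional[str]:
    """Classify RA serostatus from the code at the second rheumatologist encounter."""
    if len(rheum_encounter_codes) < 2:
        return None
    code = rheum_encounter_codes[1].upper()
    if code.startswith("M05"):
        return "seropositive"
    if code.startswith("M06"):
        return "seronegative"
    return None


def coded_consistently(codes: Sequence[str]) -> bool:
    """True if every visit uses the same RA category (all M05 or all M06)."""
    categories = {c.upper()[:3] for c in codes}
    return len(categories) == 1 and categories <= {"M05", "M06"}


def se_ppv_for_m05(patients: List[Tuple[Sequence[str], bool]]) -> Tuple[float, float]:
    """Sensitivity and PPV of the M05 definition against lab serostatus.

    patients: (encounter codes, lab_seropositive) pairs; lab_seropositive is assumed
    True when RF or anti-CCP was positive.
    """
    tp = fp = fn = 0
    for codes, lab_positive in patients:
        predicted_positive = serostatus_from_codes(codes) == "seropositive"
        if predicted_positive and lab_positive:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif lab_positive:
            fn += 1
    return tp / (tp + fn), tp / (tp + fp)
```

The same counting applied with "seronegative" and a negative lab result would give the M06 figures.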
Affiliation(s)
- Jeffrey R Curtis
  - Division of Clinical Immunology & Rheumatology, University of Alabama at Birmingham, Birmingham, AL, USA
  - Department of Epidemiology, University of Alabama at Birmingham, Birmingham, AL, USA
  - Department of Computer Science, University of Alabama at Birmingham, Birmingham, AL, USA
- Fenglong Xie
  - Division of Clinical Immunology & Rheumatology, University of Alabama at Birmingham, Birmingham, AL, USA
- Hong Zhou
  - Division of Clinical Immunology & Rheumatology, University of Alabama at Birmingham, Birmingham, AL, USA
- David Salchert
  - Division of Clinical Immunology & Rheumatology, University of Alabama at Birmingham, Birmingham, AL, USA
- Huifeng Yun
  - Division of Clinical Immunology & Rheumatology, University of Alabama at Birmingham, Birmingham, AL, USA
  - Department of Epidemiology, University of Alabama at Birmingham, Birmingham, AL, USA
37
Dueñas HR, Seah C, Johnson JS, Huckins LM. Implicit bias of encoded variables: frameworks for addressing structured bias in EHR-GWAS data. Hum Mol Genet 2020; 29:R33-R41. [PMID: 32879975 PMCID: PMC7530523 DOI: 10.1093/hmg/ddaa192] [Received: 07/01/2020] [Revised: 08/17/2020] [Accepted: 08/18/2020] [Indexed: 12/20/2022]
Abstract
The 'discovery' stage of genome-wide association studies required amassing large, homogeneous cohorts. To attain clinically useful insights, we must now consider the presentation of disease within our clinics and, by extension, within our medical records. Large-scale use of electronic health record (EHR) data can help characterize phenotypes at scale, incorporating lifelong and whole-phenome context. However, extending analyses to EHR and biobank data will require careful consideration of phenotype definition. Judgements and clinical decisions that occur 'outside' the system inevitably contain some degree of bias and become encoded in EHR data. Any algorithmic approach to phenotypic characterization that assumes unbiased variables will compound this bias in its conclusions. Here, we discuss and illustrate potential biases inherent in EHR analyses and how these may be compounded over time. We also suggest frameworks for large-scale phenotypic analysis to minimize and uncover encoded bias.
Affiliation(s)
- Hillary R Dueñas
  - Pamela Sklar Division of Psychiatric Genomics, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Carina Seah
  - Pamela Sklar Division of Psychiatric Genomics, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Jessica S Johnson
  - Pamela Sklar Division of Psychiatric Genomics, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Laura M Huckins
  - Pamela Sklar Division of Psychiatric Genomics, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
  - Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
  - Department of Genetics and Genomics, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
  - Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
  - Seaver Autism Center for Research and Treatment, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
  - Mental Illness Research, Education and Clinical Centers, James J. Peters Department of Veterans Affairs Medical Center, Bronx, NY 10468, USA
38
Nayeri A, Xu M, Farber-Eger E, Blair M, Saini I, Shamsa K, Fonarow G, Horwich T, Wells QS. Initial changes in peak aortic jet velocity and mean gradient predict progression to severe aortic stenosis. Int J Cardiol Heart Vasc 2020; 30:100592. [PMID: 32760781 PMCID: PMC7390852 DOI: 10.1016/j.ijcha.2020.100592] [Received: 05/28/2020] [Revised: 06/24/2020] [Accepted: 07/13/2020] [Indexed: 11/28/2022]
Abstract
Background There is significant interindividual variability in the rate of aortic stenosis (AS) progression that is not accounted for in current surveillance algorithms. We sought to examine the association between changes in peak aortic jet velocity (Vmax) and mean gradient (MG) among patients with mild or moderate AS and the risk of progression to severe disease. Methods Adult subjects referred for echocardiography at a single academic referral center with a diagnosis of mild or moderate AS and ≥2 additional surveillance echocardiograms were included in the study. Changes in Vmax and MG between the first two echocardiograms were indexed to time and tested for association with future progression to severe AS. Results Among 364 subjects, the median time between the first and second echocardiograms was 1.3 years. The initial changes in Vmax and MG indexed to time were +0.16 m/s per year and +1.44 mmHg per year, respectively. Fifty-three (15%) and 56 (15%) subjects progressed to severe AS as defined by Vmax and MG, respectively. In multivariable logistic regression, initial increases in Vmax (OR = 4.19, 95% CI 1.93–9.10, p < 0.001) and MG (OR = 1.12, 95% CI 1.06–1.18, p < 0.001) were associated with progression to severe AS. Conclusions Initial changes in Vmax and MG among patients with mild or moderate AS are strongly associated with the risk of progression to severe AS and may help guide individualized surveillance strategies.
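Indexing a change to time means dividing the difference between the first two echocardiograms by the interval between them. Below is a minimal sketch with these assumptions:

- The function names and the 365.25-day year are choices made for the sketch.
- The severe-AS cutoffs (Vmax ≥ 4 m/s, MG ≥ 40 mmHg) are commonly used guideline thresholds, assumed here because the abstract does not state them.
- Vmax and MG are evaluated separately, since progression was defined by each separately.

```python
from datetime import date
from typing import Tuple

# Assumed cutoffs (commonly used guideline thresholds), not stated in the abstract
SEVERE_VMAX_M_PER_S = 4.0
SEVERE_MG_MMHG = 40.0


def annualized_change(first_value: float, second_value: float,
                      first_date: date, second_date: date) -> float:
    """Change between two echocardiograms per year of elapsed time."""
    years = (second_date - first_date).days / 365.25
    if years <= 0:
        raise ValueError("the second echocardiogram must follow the first")
    return (second_value - first_value) / years


def severe_flags(vmax_m_per_s: float, mean_gradient_mmhg: float) -> Tuple[bool, bool]:
    """Severe AS by Vmax and by MG, evaluated separately."""
    return vmax_m_per_s >= SEVERE_VMAX_M_PER_S, mean_gradient_mmhg >= SEVERE_MG_MMHG


# Hypothetical patient: Vmax rises from 2.8 to 3.0 m/s over 457 days
rate = annualized_change(2.8, 3.0, date(2015, 1, 1), date(2016, 4, 2))
print(f"{rate:+.2f} m/s per year")  # +0.16 m/s per year
```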
Affiliation(s)
- Arash Nayeri
  - University of California, Los Angeles, CA, United States
  - Corresponding author at: University of California, Los Angeles, Department of Cardiovascular Medicine, 757 Westwood Plaza, St. 7501, Los Angeles, CA 90095-7417, United States.
- Meng Xu
  - Vanderbilt University Medical Center, Nashville, TN, United States
- Eric Farber-Eger
  - Vanderbilt University Medical Center, Nashville, TN, United States
  - Vanderbilt Translational and Clinical Cardiovascular Research Center (VTRACC), Nashville, TN, United States
- Marcia Blair
  - Vanderbilt University Medical Center, Nashville, TN, United States
  - Vanderbilt Translational and Clinical Cardiovascular Research Center (VTRACC), Nashville, TN, United States
- Kamran Shamsa
  - University of California, Los Angeles, CA, United States
- Gregg Fonarow
  - University of California, Los Angeles, CA, United States
- Tamara Horwich
  - University of California, Los Angeles, CA, United States
- Quinn S. Wells
  - Vanderbilt University Medical Center, Nashville, TN, United States
  - Vanderbilt Translational and Clinical Cardiovascular Research Center (VTRACC), Nashville, TN, United States
39
Thayer TE, Lino Cardenas CL, Martyn T, Nicholson CJ, Traeger L, Wunderer F, Slocum C, Sigurslid H, Shakartzi HR, O'Rourke C, Shelton G, Buswell MD, Barnes H, Neitzel LR, Ledsky CD, Li JP, Burke MF, Farber-Eger E, Perrien DS, Kumar R, Corey KE, Wells QS, Bloch KD, Hong CC, Bloch DB, Malhotra R. The Role of Bone Morphogenetic Protein Signaling in Non-Alcoholic Fatty Liver Disease. Sci Rep 2020; 10:9831. [PMID: 32561790 PMCID: PMC7305229 DOI: 10.1038/s41598-020-66770-8] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 03/13/2019] [Accepted: 05/05/2020] [Indexed: 12/14/2022]
Abstract
Non-alcoholic fatty liver disease (NAFLD) affects over 30% of adults in the United States. Bone morphogenetic protein (BMP) signaling is known to contribute to hepatic fibrosis, but the role of BMP signaling in the development of NAFLD is unclear. In this study, treatment with either of two BMP inhibitors reduced hepatic triglyceride content in diabetic (db/db) mice. BMP inhibitor-induced decrease in hepatic triglyceride levels was associated with decreased mRNA encoding Dgat2, an enzyme integral to triglyceride synthesis. Treatment of hepatoma cells with BMP2 induced DGAT2 expression and activity via intracellular SMAD signaling. In humans we identified a rare missense single nucleotide polymorphism in the BMP type 1 receptor ALK6 (rs34970181;R371Q) associated with a 2.1-fold increase in the prevalence of NAFLD. In vitro analyses revealed R371Q:ALK6 is a previously unknown constitutively active receptor. These data show that BMP signaling is an important determinant of NAFLD in a murine model and is associated with NAFLD in humans.
Affiliation(s)
- Timothy E Thayer
- Cardiovascular Research Center and Cardiology Division of the Department of Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA, United States; Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, United States
- Christian L Lino Cardenas
- Cardiovascular Research Center and Cardiology Division of the Department of Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA, United States
- Trejeeve Martyn
- Cardiovascular Research Center and Cardiology Division of the Department of Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA, United States
- Christopher J Nicholson
- Cardiovascular Research Center and Cardiology Division of the Department of Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA, United States
- Lisa Traeger
- Anesthesia Center for Critical Care Research of the Department of Anesthesia, Critical Care, and Pain Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA, United States
- Florian Wunderer
- Anesthesia Center for Critical Care Research of the Department of Anesthesia, Critical Care, and Pain Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA, United States
- Charles Slocum
- Cardiovascular Research Center and Cardiology Division of the Department of Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA, United States
- Haakon Sigurslid
- Cardiovascular Research Center and Cardiology Division of the Department of Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA, United States
- Hannah R Shakartzi
- Anesthesia Center for Critical Care Research of the Department of Anesthesia, Critical Care, and Pain Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA, United States
- Caitlin O'Rourke
- Anesthesia Center for Critical Care Research of the Department of Anesthesia, Critical Care, and Pain Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA, United States
- Georgia Shelton
- Cardiovascular Research Center and Cardiology Division of the Department of Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA, United States
- Mary D Buswell
- Cardiovascular Research Center and Cardiology Division of the Department of Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA, United States
- Hanna Barnes
- Cardiovascular Research Center and Cardiology Division of the Department of Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA, United States
- Leif R Neitzel
- Department of Medicine, University of Maryland School of Medicine, Baltimore, MD, United States
- Clara D Ledsky
- Anesthesia Center for Critical Care Research of the Department of Anesthesia, Critical Care, and Pain Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA, United States
- Jason Pingcheng Li
- Anesthesia Center for Critical Care Research of the Department of Anesthesia, Critical Care, and Pain Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA, United States
- Megan F Burke
- Cardiovascular Research Center and Cardiology Division of the Department of Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA, United States
- Eric Farber-Eger
- Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, United States
- Daniel S Perrien
- Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, United States
- Kathleen E Corey
- GI Unit, Massachusetts General Hospital, Harvard Medical School, Boston, MA, United States
- Quinn S Wells
- Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, United States
- Kenneth D Bloch
- Cardiovascular Research Center and Cardiology Division of the Department of Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA, United States; Anesthesia Center for Critical Care Research of the Department of Anesthesia, Critical Care, and Pain Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA, United States
- Charles C Hong
- Department of Medicine, University of Maryland School of Medicine, Baltimore, MD, United States
- Donald B Bloch
- Anesthesia Center for Critical Care Research of the Department of Anesthesia, Critical Care, and Pain Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA, United States; Center for Immunology and Inflammatory Diseases and the Division of Rheumatology, Allergy, and Immunology of the Department of Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA, United States
- Rajeev Malhotra
- Cardiovascular Research Center and Cardiology Division of the Department of Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA, United States
40
Wang L, Olson JE, Bielinski SJ, St Sauver JL, Fu S, He H, Cicek MS, Hathcock MA, Cerhan JR, Liu H. Impact of Diverse Data Sources on Computational Phenotyping. Front Genet 2020; 11:556. [PMID: 32582289 PMCID: PMC7283539 DOI: 10.3389/fgene.2020.00556] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 10/31/2019] [Accepted: 05/07/2020] [Indexed: 11/25/2022]
Abstract
Electronic health records (EHRs) are widely adopted and have great potential to serve as a rich, integrated source of phenotype information. Computational phenotyping extracts phenotypes from EHR data automatically. It can accelerate the adoption and use of phenotype-driven efforts to advance scientific discovery and improve healthcare delivery. A number of computational phenotyping algorithms have been published. However, data fragmentation (incomplete data within a single data source) has been raised as an inherent limitation of computational phenotyping.

In this study, we investigated the impact of diverse data sources on two published computational phenotyping algorithms, for rheumatoid arthritis (RA) and type 2 diabetes mellitus (T2DM). We used Mayo EHRs and the Rochester Epidemiology Project (REP), which links medical records from multiple health care systems. Case selection for both RA (less prevalent) and T2DM (more prevalent) was markedly affected by data fragmentation:
- In Mayo data, the positive predictive value (PPV) was 91.4% and 92.4%, and the false-negative rate (FNR) was 26.6% and 14%, respectively.
- In REP, the PPV was 97.2% and 98.3%, and the FNR was 5.2% and 3.3%.

T2DM control selection was also biased, with a PPV of 91.2% and an FNR of 1.2% in Mayo data. We further examined the underlying reasons affecting performance.
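PPV and FNR are simple ratios over counts from a comparison against a reference standard. The minimal sketch below shows how each is computed. The counts are hypothetical, not the study's data. They illustrate how true cases whose evidence lives in records outside a single source turn into false negatives and raise the FNR.

```python
from dataclasses import dataclass


@dataclass
class PhenotypeValidation:
    """Counts from comparing algorithm-selected cases with a reference standard."""
    true_pos: int   # selected by the algorithm and truly a case
    false_pos: int  # selected by the algorithm but not truly a case
    false_neg: int  # true case the algorithm missed

    def ppv(self) -> float:
        # Share of algorithm-selected cases that are true cases.
        return self.true_pos / (self.true_pos + self.false_pos)

    def fnr(self) -> float:
        # Share of true cases the algorithm misses (1 - sensitivity).
        return self.false_neg / (self.true_pos + self.false_neg)


# Hypothetical counts for the same 100 true cases: one fragmented source vs. linked records.
single_source = PhenotypeValidation(true_pos=75, false_pos=8, false_neg=25)
linked_sources = PhenotypeValidation(true_pos=95, false_pos=3, false_neg=5)
for label, v in [("single source", single_source), ("linked sources", linked_sources)]:
    print(label, round(v.ppv(), 3), round(v.fnr(), 3))
```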
Affiliation(s)
- Liwei Wang
- Division of Digital Health Sciences, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
- Janet E Olson
- Division of Epidemiology, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States; Center for Individualized Medicine, Mayo Clinic, Rochester, MN, United States
- Suzette J Bielinski
- Division of Epidemiology, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
- Jennifer L St Sauver
- Division of Epidemiology, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
- Sunyang Fu
- Division of Digital Health Sciences, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
- Huan He
- Division of Digital Health Sciences, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
- Mine S Cicek
- Division of Experimental Pathology, Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, MN, United States
- Matthew A Hathcock
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
- James R Cerhan
- Division of Epidemiology, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
- Hongfang Liu
- Division of Digital Health Sciences, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
41
Abstract
BACKGROUND Data mining technology in medicine has been widely studied by scholars around the world. However, there is little research on medical data mining (MDM) from the perspectives of bibliometrics and visualization, and the research topics and development trends in this field remain unclear. METHODS This paper applied two bibliometric visualization tools, VOSviewer 1.6.10 and CiteSpace V. It used them to study the citation characteristics, international cooperation, author cooperation, and geographical distribution of MDM research. RESULTS A total of 1575 documents were obtained, and the most frequent document type was the article (1376). SHAN NH was the most productive author, with 12 publications, and Gillies's article (cited 750 times) was the most cited paper. The most productive country was the USA (559), and the most productive institution was the US FDA (35). The most productive journals were the Journal of Biomedical Informatics, Expert Systems with Applications, and the Journal of Medical Systems, reflecting the nature of the research. The keywords "classification (790)" and "system (576)" had the greatest strength. The hot topics in MDM include drug discovery, medical imaging, and vaccine safety. The 3 frontier topics are reporting system, precision medicine, and inflammation, which are likely to be the foci of future research. CONCLUSION The present study provides a panoramic view of data mining methods applied in medicine through visualization and bibliometrics. The analysis of authors, journals, institutions, and countries could serve as a reference for researchers who are new to the field. Researchers may also consider the emerging trends when deciding the direction of their study.
Affiliation(s)
- Yuanzhang Hu
- School of Medical Information Engineering, Chengdu University of Traditional Chinese Medicine, Chengdu, Sichuan, China
- Zeyun Yu
- College of Acupuncture and TuiNa, Chengdu University of Traditional Chinese Medicine, Chengdu, China
- Xiaoen Cheng
- School of Medical Information Engineering, Chengdu University of Traditional Chinese Medicine, Chengdu, Sichuan, China
- Yue Luo
- School of Medical Information Engineering, Chengdu University of Traditional Chinese Medicine, Chengdu, Sichuan, China
- Chuanbiao Wen
- School of Medical Information Engineering, Chengdu University of Traditional Chinese Medicine, Chengdu, Sichuan, China
42
Ahlers MJ, Lowery BD, Farber-Eger E, Wang TJ, Bradham W, Ormseth MJ, Chung CP, Stein CM, Gupta DK. Heart Failure Risk Associated With Rheumatoid Arthritis-Related Chronic Inflammation. J Am Heart Assoc 2020; 9:e014661. [PMID: 32378457 PMCID: PMC7660862 DOI: 10.1161/jaha.119.014661] [Citation(s) in RCA: 58] [Impact Index Per Article: 11.6] [Indexed: 01/06/2023]
Abstract
Background Inflammation may contribute to incident heart failure (HF). Rheumatoid arthritis (RA), a prototypic inflammatory condition, may serve as a model for understanding inflammation‐related HF risk. Methods and Results Using the Vanderbilt University Medical Center electronic health record, we retrospectively identified 9889 patients with RA and 9889 control patients without autoimmune disease matched for age, sex, and race. Prevalent HF at entry into the electronic health record or preceding RA diagnosis was excluded. Incident HF was ascertained using International Classification of Diseases, Ninth Revision (ICD‐9), codes and medications. Over 177 566 person‐years of follow‐up, patients with RA were at 21% greater risk of HF (95% CI, 3–42%) independent of traditional cardiovascular risk factors. Among patients with RA, higher CRP (C‐reactive protein) was associated with greater HF risk (P<0.001), while the anti‐inflammatory drug methotrexate was associated with ≈25% lower HF risk (P=0.021). In a second cohort (n=115) of prospectively enrolled patients with and without RA, we performed proteomics and cardiac magnetic resonance imaging to discover circulating markers of inflammation associated with cardiac structure and function. Artemin levels were higher in patients with RA compared with controls (P=0.009), and higher artemin levels were associated with worse ventricular end‐systolic elastance and ventricular‐vascular coupling ratio (P=0.044 and P=0.031, respectively). Conclusions RA, a prototypic chronic inflammatory condition, is associated with increased risk of HF. Among patients with RA, higher levels of CRP were associated with greater HF risk, while methotrexate was associated with lower risk.
Affiliation(s)
- Michael J Ahlers
- Vanderbilt University School of Medicine, Nashville, TN; Vanderbilt Translational and Clinical Cardiovascular Research Center (VTRACC), Vanderbilt University Medical Center, Nashville, TN
- Brandon D Lowery
- Vanderbilt Translational and Clinical Cardiovascular Research Center (VTRACC), Vanderbilt University Medical Center, Nashville, TN; Vanderbilt Institute for Clinical and Translational Research, Vanderbilt University Medical Center, Nashville, TN
- Eric Farber-Eger
- Vanderbilt Translational and Clinical Cardiovascular Research Center (VTRACC), Vanderbilt University Medical Center, Nashville, TN; Vanderbilt Institute for Clinical and Translational Research, Vanderbilt University Medical Center, Nashville, TN
- Thomas J Wang
- Vanderbilt Translational and Clinical Cardiovascular Research Center (VTRACC), Vanderbilt University Medical Center, Nashville, TN; Division of Cardiovascular Medicine, Vanderbilt University Medical Center, Nashville, TN
- William Bradham
- Division of Cardiovascular Medicine, Vanderbilt University Medical Center, Nashville, TN
- Michelle J Ormseth
- Divisions of Rheumatology and Clinical Pharmacology, Vanderbilt University Medical Center, Nashville, TN; Tennessee Valley Healthcare System, U.S. Department of Veterans Affairs, Nashville, TN
- Cecilia P Chung
- Divisions of Rheumatology and Clinical Pharmacology, Vanderbilt University Medical Center, Nashville, TN; Tennessee Valley Healthcare System, U.S. Department of Veterans Affairs, Nashville, TN
- C Michael Stein
- Divisions of Rheumatology and Clinical Pharmacology, Vanderbilt University Medical Center, Nashville, TN
- Deepak K Gupta
- Vanderbilt Translational and Clinical Cardiovascular Research Center (VTRACC), Vanderbilt University Medical Center, Nashville, TN; Division of Cardiovascular Medicine, Vanderbilt University Medical Center, Nashville, TN
43
Chasman DI, Giulianini F, Demler OV, Udler MS. Pleiotropy-Based Decomposition of Genetic Risk Scores: Association and Interaction Analysis for Type 2 Diabetes and CAD. Am J Hum Genet 2020; 106:646-658. [PMID: 32302534 DOI: 10.1016/j.ajhg.2020.03.011] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 01/02/2020] [Accepted: 03/25/2020] [Indexed: 12/24/2022]
Abstract
Genetic risk for a disease in the population may be represented as a genetic risk score (GRS). The GRS is constructed as the sum of inherited risk alleles, weighted by allelic effects established in an independent population. This formulation captures overall genetic risk, but it typically does not address risk due to specific biological mechanisms or pathways, which may nevertheless be important for interpretation or treatment response. Here, a GRS for disease is resolved into independent or nearly independent components pertaining to biological mechanisms inferred from pleiotropic relationships. The component GRSs' weights are derived from the singular value decomposition (SVD) of the matrix of appropriately scaled genetic effects (beta coefficients) of the disease variants across a panel of disease-related phenotypes. The SVD-based formalism also associates combinations of disease-related phenotypes with inferred disease pathways.

Applied to incident type 2 diabetes (T2D) in the Women's Genome Health Study (WGHS; N = 23,294), component GRSs discriminate glycemic-control and lipid-based genetic risk. They also reveal significant interactions between specific components and BMI or physical activity, which were not observed with a GRS for overall T2D genetic liability. Applied to coronary artery disease (CAD) in both the WGHS and JUPITER (N = 8,749), a randomized trial of rosuvastatin for primary prevention of cardiovascular disease (CVD), component GRSs discriminate genetic risk associated with LDL-C from risk associated with reciprocal genetic effects on triglycerides and HDL-C. They also inform the pharmacogenetics of statin treatment: benefit from rosuvastatin is as strongly related to genetic risk from triglycerides and HDL-C as to genetic risk from LDL-C.
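The abstract describes two constructions. The first is an overall GRS, a weighted sum of risk-allele counts. The second is a set of component weights taken from the SVD of a variant-by-trait effect matrix.

The minimal NumPy sketch below uses random data and makes two illustrative assumptions, not the authors' published formulation:
- each trait column is scaled by its root-mean-square effect;
- each left singular vector is used directly as a per-variant weight vector.

```python
import numpy as np


def overall_grs(genotypes, disease_weights):
    """Classic GRS: weighted sum of risk-allele counts for each person."""
    return genotypes @ disease_weights


def component_grs(genotypes, trait_betas):
    """Sketch of an SVD-based split of genetic risk into components.

    genotypes:   (n_people, n_variants) risk-allele counts (0, 1, 2)
    trait_betas: (n_variants, n_traits) effects of the disease variants on related traits
    """
    # Assumed scaling: divide each trait column by its root-mean-square effect size
    # so that traits measured in different units are comparable.
    rms = np.sqrt(np.mean(trait_betas ** 2, axis=0))
    scaled = trait_betas / rms
    # scaled = u @ diag(s) @ vt. Columns of u are orthonormal weight vectors over
    # variants; rows of vt show how each component loads on each trait.
    u, s, vt = np.linalg.svd(scaled, full_matrices=False)
    # Hypothetical choice: one component score per left singular vector.
    scores = genotypes @ u
    return scores, u, s, vt


rng = np.random.default_rng(0)
n_people, n_variants, n_traits = 200, 30, 5
genotypes = rng.integers(0, 3, size=(n_people, n_variants)).astype(float)
trait_betas = rng.normal(size=(n_variants, n_traits))
disease_weights = np.abs(rng.normal(size=n_variants))

grs = overall_grs(genotypes, disease_weights)
scores, u, s, vt = component_grs(genotypes, trait_betas)
print(grs.shape, scores.shape)  # (200,) (200, 5)
```

The signs of singular vectors are arbitrary, so a component may need to be flipped before it is read as "risk-increasing". Inspecting the rows of `vt` is one way to see which trait combination each component represents.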
Affiliation(s)
- Daniel I Chasman
- Division of Preventive Medicine, Brigham and Women's Hospital, Boston, MA 02215, USA; Division of Genetics, Brigham and Women's Hospital, Boston, MA 02115, USA; Harvard Medical School, Boston, MA 02115, USA; Medical and Population Genetics Program, Broad Institute, Cambridge, MA 02142, USA; Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA.
- Franco Giulianini
- Division of Preventive Medicine, Brigham and Women's Hospital, Boston, MA 02215, USA
- Olga V Demler
- Division of Preventive Medicine, Brigham and Women's Hospital, Boston, MA 02215, USA; Harvard Medical School, Boston, MA 02115, USA
- Miriam S Udler
- Harvard Medical School, Boston, MA 02115, USA; Medical and Population Genetics Program, Broad Institute, Cambridge, MA 02142, USA; Massachusetts General Hospital, Boston, MA 02114, USA
44
Hanson HA, Leiser CL, Madsen MJ, Gardner J, Knight S, Cessna M, Sweeney C, Doherty JA, Smith KR, Bernard PS, Camp NJ. Family Study Designs Informed by Tumor Heterogeneity and Multi-Cancer Pleiotropies: The Power of the Utah Population Database. Cancer Epidemiol Biomarkers Prev 2020; 29:807-815. [PMID: 32098891 PMCID: PMC7168701 DOI: 10.1158/1055-9965.epi-19-0912] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 07/31/2019] [Revised: 01/15/2020] [Accepted: 02/18/2020] [Indexed: 02/01/2023]
Abstract
BACKGROUND Family-based designs and high-risk pedigrees have previously demonstrated value for the discovery of high- and intermediate-risk germline breast cancer susceptibility genes. However, genetic heterogeneity is a major obstacle hindering progress, and new strategies and analytic approaches will be necessary to make further advances. One opportunity to address heterogeneity through improved characterization of disease is the growing availability of multisource databases. For family-based designs, the relevant resources are those that include family structure, such as the Utah Population Database (UPDB). To illustrate the broad utility and potential power of multisource databases, we describe two novel family-based approaches to reduce heterogeneity in the UPDB.

METHODS Our first approach uses pedigree-informed breast tumor phenotypes in gene mapping. Our second approach identifies families with similar pleiotropies, using a novel network-inspired clustering technique to explore multi-cancer signatures for high-risk breast cancer families.

RESULTS Our first approach identifies a genome-wide significant breast cancer locus at 2q13 [P = 1.6 × 10⁻⁸, logarithm of the odds (LOD) equivalent 6.64]. In this region, IL1A and IL1B, key cytokine genes involved in inflammation, are of particular interest. Our second approach identifies five multi-cancer risk patterns. These clusters include expected coaggregations, such as breast cancer with prostate cancer, ovarian cancer, and melanoma. They also reveal novel patterns, including coaggregation with uterine, thyroid, and bladder cancers.

CONCLUSIONS Our results suggest that pedigree-informed tumor phenotypes can map genes for breast cancer, and that various cancer pleiotropies exist for high-risk breast cancer pedigrees.
IMPACT Both methods illustrate the potential of large, population-based multisource databases to decrease etiologic heterogeneity. See all articles in this CEBP Focus section, "Modernizing Population Science."
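The reported "LOD equivalent" can be related to the p-value through the standard large-sample identity LOD = z² / (2 ln 10), where z is the normal quantile of the p-value. The sketch below reproduces 6.64 from P = 1.6 × 10⁻⁸ when P is treated as one-sided. Reading the p-value as one-sided is our inference from the numbers, not something the abstract states; a two-sided reading gives a larger value.

```python
from math import log
from statistics import NormalDist


def lod_equivalent(p_value, one_sided=True):
    """Convert a p-value to a LOD-score equivalent using LOD = z**2 / (2 ln 10)."""
    tail = p_value if one_sided else p_value / 2
    # Upper-tail normal quantile; inv_cdf on the small tail keeps precision.
    z = -NormalDist().inv_cdf(tail)
    return z * z / (2 * log(10))


print(round(lod_equivalent(1.6e-8), 2))  # 6.64
print(round(lod_equivalent(1.6e-8, one_sided=False), 2))
```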
Affiliation(s)
- Heidi A Hanson
- Huntsman Cancer Institute, University of Utah, Salt Lake City, Utah.
- Utah Population Database, University of Utah, Salt Lake City, Utah
- Department of Surgery, University of Utah, Salt Lake City, Utah
- Claire L Leiser
- Huntsman Cancer Institute, University of Utah, Salt Lake City, Utah
- Department of Epidemiology, University of Washington, Seattle, Washington
- Michael J Madsen
- Huntsman Cancer Institute, University of Utah, Salt Lake City, Utah
- John Gardner
- Huntsman Cancer Institute, University of Utah, Salt Lake City, Utah
- Melissa Cessna
- Intermountain Biorepository, Intermountain Healthcare, Salt Lake City, Utah
- Department of Pathology, Intermountain Medical Center, Intermountain Healthcare, Salt Lake City, Utah
- Carol Sweeney
- Huntsman Cancer Institute, University of Utah, Salt Lake City, Utah
- Utah Cancer Registry, University of Utah, Salt Lake City, Utah
- Department of Internal Medicine, University of Utah School of Medicine, Salt Lake City, Utah
- Jennifer A Doherty
- Huntsman Cancer Institute, University of Utah, Salt Lake City, Utah
- Utah Cancer Registry, University of Utah, Salt Lake City, Utah
- Department of Population Sciences, University of Utah School of Medicine, Salt Lake City, Utah
- Ken R Smith
- Huntsman Cancer Institute, University of Utah, Salt Lake City, Utah
- Utah Population Database, University of Utah, Salt Lake City, Utah
- Department of Family and Consumer Studies, University of Utah, Salt Lake City, Utah
- Philip S Bernard
- Huntsman Cancer Institute, University of Utah, Salt Lake City, Utah
- Department of Pathology, University of Utah, Salt Lake City, Utah
- Nicola J Camp
- Huntsman Cancer Institute, University of Utah, Salt Lake City, Utah
- Utah Population Database, University of Utah, Salt Lake City, Utah
- Department of Internal Medicine, University of Utah School of Medicine, Salt Lake City, Utah
45
Beesley LJ, Salvatore M, Fritsche LG, Pandit A, Rao A, Brummett C, Willer CJ, Lisabeth LD, Mukherjee B. The emerging landscape of health research based on biobanks linked to electronic health records: Existing resources, statistical challenges, and potential opportunities. Stat Med 2020; 39:773-800. [PMID: 31859414 PMCID: PMC7983809 DOI: 10.1002/sim.8445] [Citation(s) in RCA: 52] [Impact Index Per Article: 10.4] [Received: 10/10/2018] [Revised: 09/10/2019] [Accepted: 11/16/2019] [Indexed: 01/03/2023]
Abstract
Biobanks linked to electronic health records provide rich resources for health-related research. With improvements in administrative and informatics infrastructure, the availability and utility of data from biobanks have dramatically increased. In this paper, we first aim to characterize the current landscape of available biobanks and to describe specific biobanks, including their place of origin, size, and data types. The development and accessibility of large-scale biorepositories provide the opportunity to accelerate agnostic searches, expedite discoveries, and conduct hypothesis-generating studies of disease-treatment, disease-exposure, and disease-gene associations. Rather than designing and implementing a single study focused on a few targeted hypotheses, researchers can potentially use biobanks' existing resources to answer an expanded selection of exploratory questions as quickly as they can analyze them. However, there are many obvious and subtle challenges with the design and analysis of biobank-based studies. Our second aim is to discuss statistical issues related to biobank research such as study design, sampling strategy, phenotype identification, and missing data. We focus our discussion on biobanks that are linked to electronic health records. Some of the analytic issues are illustrated using data from the Michigan Genomics Initiative and UK Biobank, two biobanks with two different recruitment mechanisms. We summarize the current body of literature for addressing these challenges and discuss some standing open problems. This work complements and extends recent reviews about biobank-based research and serves as a resource catalog with analytical and practical guidance for statisticians, epidemiologists, and other medical researchers pursuing research using biobanks.
Affiliation(s)
- Anita Pandit
- University of Michigan, Department of Biostatistics
- Arvind Rao
- University of Michigan, Department of Computational Medicine and Bioinformatics
- Chad Brummett
- University of Michigan, Department of Anesthesiology
- Cristen J. Willer
- University of Michigan, Department of Computational Medicine and Bioinformatics
46
Tong J, Huang J, Chubak J, Wang X, Moore JH, Hubbard RA, Chen Y. An augmented estimation procedure for EHR-based association studies accounting for differential misclassification. J Am Med Inform Assoc 2020; 27:244-253. [PMID: 31617899 DOI: 10.1093/jamia/ocz180] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 06/07/2019] [Revised: 08/14/2019] [Accepted: 09/15/2019] [Indexed: 11/12/2022]
Abstract
OBJECTIVES The ability to identify novel risk factors for health outcomes is a key strength of electronic health record (EHR)-based research. However, the validity of such studies is limited by error in EHR-derived phenotypes. The objective of this study was to develop a novel procedure for reducing bias in estimated associations between risk factors and phenotypes in EHR data. MATERIALS AND METHODS The proposed method combines the strengths of a gold-standard phenotype obtained through manual chart review for a small validation set of patients and an automatically-derived phenotype that is available for all patients but is potentially error-prone (hereafter referred to as the algorithm-derived phenotype). An augmented estimator of associations is obtained by optimally combining these 2 phenotypes. We conducted simulation studies to evaluate the performance of the augmented estimator and conducted an analysis of risk factors for second breast cancer events using data on a cohort from Kaiser Permanente Washington. RESULTS The proposed method was shown to reduce bias relative to an estimator using only the algorithm-derived phenotype and reduce variance compared to an estimator using only the validation data. DISCUSSION Our simulation studies and real data application demonstrate that, compared to the estimator using validation data only, the augmented estimator has lower variance (ie, higher statistical efficiency). Compared to the estimator using error-prone EHR-derived phenotypes, the augmented estimator has smaller bias. CONCLUSIONS The proposed estimator can effectively combine an error-prone phenotype with gold-standard data from a limited chart review in order to improve analyses of risk factors using EHR data.
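The idea of "optimally combining" two phenotypes can be illustrated with a generic control-variate construction:
- fit the same model to the algorithm-derived phenotype on both the validation set and the full cohort;
- use the gap between those two fits to adjust the gold-standard estimate. When the validation set is a random sample, the gap should be close to zero on average.

This scalar sketch assumes that form and assumes the needed (co)variances are already available. It illustrates the general technique, not necessarily the exact estimator of Tong et al. It ignores differential misclassification and any sampling weights.

```python
def augmented_estimate(beta_val, gamma_val, gamma_full,
                       var_beta, cov_beta_diff, var_diff):
    """Control-variate style augmentation (scalar sketch, assumed form).

    beta_val      : estimate using the chart-reviewed phenotype, validation set only
    gamma_val     : same model with the algorithm-derived phenotype, validation set only
    gamma_full    : same model with the algorithm-derived phenotype, full cohort
    var_beta      : Var(beta_val)
    cov_beta_diff : Cov(beta_val, gamma_val - gamma_full)
    var_diff      : Var(gamma_val - gamma_full)
    """
    # Weight that minimises the variance of beta_val - c * (gamma_val - gamma_full).
    c = cov_beta_diff / var_diff
    estimate = beta_val - c * (gamma_val - gamma_full)
    # The variance can only shrink relative to var_beta, since the subtracted term is >= 0.
    variance = var_beta - cov_beta_diff ** 2 / var_diff
    return estimate, variance


est, var = augmented_estimate(0.5, 0.4, 0.3, 0.05, 0.02, 0.04)
print(round(est, 4), round(var, 4))  # 0.45 0.04
```

The efficiency gain grows with how strongly the gold-standard and algorithm-derived estimates co-vary on the validation set.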
Affiliation(s)
- Jiayi Tong
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, The University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Jing Huang
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, The University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Jessica Chubak
- Department of Epidemiology, Kaiser Permanente Washington Health Research Institute, Seattle, Washington, USA
- Xuan Wang
- Department of Statistics, School of Mathematical Sciences, Zhejiang University, Hangzhou, Zhejiang, China
- Jason H Moore
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, The University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Rebecca A Hubbard
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, The University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Yong Chen
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, The University of Pennsylvania, Philadelphia, Pennsylvania, USA
47
Unlu G, Qi X, Gamazon ER, Melville DB, Patel N, Rushing AR, Hashem M, Al-Faifi A, Chen R, Li B, Cox NJ, Alkuraya FS, Knapik EW. Phenome-based approach identifies RIC1-linked Mendelian syndrome through zebrafish models, biobank associations and clinical studies. Nat Med 2020; 26:98-109. [PMID: 31932796 PMCID: PMC7147997 DOI: 10.1038/s41591-019-0705-y] [Citation(s) in RCA: 31] [Impact Index Per Article: 6.2] [Received: 11/06/2018] [Accepted: 11/15/2019] [Indexed: 12/17/2022]
Abstract
Discovery of genotype-phenotype relationships remains a major challenge in clinical medicine. Here, we combined three sources of phenotypic data to uncover a novel mechanism for rare and common diseases resulting from collagen secretion deficits. Using a zebrafish genetic screen, we identified the ric1 gene as essential for skeletal biology. Using a gene-based phenome-wide association study (PheWAS) in the EHR-linked BioVU biobank, we show that reduced genetically determined expression of RIC1 is associated with musculoskeletal and dental conditions. Whole exome sequencing (WES) identified individuals homozygous-by-descent for a rare variant in RIC1, and a guided clinical re-evaluation showed that they share signs with the BioVU-associated phenome. We named this novel Mendelian syndrome CATIFA (Cleft lip, cAtaract, Tooth abnormality, Intellectual disability, Facial dysmorphism, ADHD) and revealed further disease mechanisms. This gene-based, PheWAS-guided approach can accelerate the discovery of clinically relevant disease phenomes and their associated biological mechanisms.
Affiliation(s)
- Gokhan Unlu
  - Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
  - Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
  - Department of Cell and Developmental Biology, Vanderbilt University, Nashville, TN, USA
  - Laboratory of Metabolic Regulation and Genetics, The Rockefeller University, New York, NY, USA
- Xinzi Qi
  - Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
  - Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
- Eric R Gamazon
  - Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
  - Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
  - Clare Hall, University of Cambridge, Cambridge, UK
- David B Melville
  - Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
  - Department of Molecular and Cellular Biology, Howard Hughes Medical Institute, University of California, Berkeley, CA, USA
- Nisha Patel
  - Department of Genetics, King Faisal Specialist Hospital and Research Center, Riyadh, Saudi Arabia
- Amy R Rushing
  - Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
  - Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
- Mais Hashem
  - Department of Genetics, King Faisal Specialist Hospital and Research Center, Riyadh, Saudi Arabia
- Abdullah Al-Faifi
  - Department of Pediatrics, Security Forces Hospital, Riyadh, Saudi Arabia
- Rui Chen
  - Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
  - Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, TN, USA
- Bingshan Li
  - Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
  - Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, TN, USA
- Nancy J Cox
  - Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
  - Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
- Fowzan S Alkuraya
  - Department of Genetics, King Faisal Specialist Hospital and Research Center, Riyadh, Saudi Arabia
- Ela W Knapik
  - Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
  - Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
  - Department of Cell and Developmental Biology, Vanderbilt University, Nashville, TN, USA
48
Robinson JR, Carroll RJ, Bastarache L, Chen Q, Mou Z, Wei WQ, Connolly JJ, Mentch F, Sleiman P, Crane PK, Hebbring SJ, Stanaway IB, Crosslin DR, Gordon AS, Rosenthal EA, Carrell D, Hayes MG, Wei W, Petukhova L, Namjou B, Zhang G, Safarova MS, Walton NA, Still C, Bottinger EP, Loos RJF, Murphy SN, Jackson GP, Kullo IJ, Hakonarson H, Jarvik GP, Larson EB, Weng C, Roden DM, Denny JC. Association of Genetic Risk of Obesity with Postoperative Complications Using Mendelian Randomization. World J Surg 2020; 44:84-94. [PMID: 31605180 PMCID: PMC6925615 DOI: 10.1007/s00268-019-05202-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 12/19/2022]
Abstract
BACKGROUND The extent to which obesity and genetics determine postoperative complications is incompletely understood. METHODS We performed a retrospective study using two population cohorts with electronic health record (EHR) data. The first included 736,726 adults with body mass index (BMI) recorded between 1990 and 2017 at Vanderbilt University Medical Center. The second cohort consisted of 65,174 individuals from 12 institutions contributing EHR and genome-wide genotyping data to the Electronic Medical Records and Genomics (eMERGE) Network. Pairwise logistic regression analyses were used to measure the association of BMI categories with postoperative complications derived from International Classification of Diseases, Ninth Revision (ICD-9) codes, including postoperative infection, incisional hernia, and intestinal obstruction. A genetic risk score was constructed from 97 obesity-risk single-nucleotide polymorphisms for a Mendelian randomization study to determine the association of genetic risk of obesity with postoperative complications. Logistic regression analyses were adjusted for sex, age, site, and race/principal components. RESULTS Individuals with overweight or obese BMI (≥25 kg/m²) had increased risk of incisional hernia (odds ratio [OR] 1.7-5.5, p < 3.1 × 10⁻²⁰), and people with obesity (BMI ≥ 30 kg/m²) had increased risk of postoperative infection (OR 1.2-2.3, p < 2.5 × 10⁻⁵). In the eMERGE cohort, genetically predicted BMI was associated with incisional hernia (OR 2.1 [95% CI 1.8-2.5], p = 1.4 × 10⁻⁶) and postoperative infection (OR 1.6 [95% CI 1.4-1.9], p = 3.1 × 10⁻⁶). Association findings were similar after restricting the cohorts to individuals who underwent abdominal procedures. CONCLUSIONS Clinical and Mendelian randomization studies suggest that obesity, as measured by BMI, is associated with the development of postoperative incisional hernia and infection.
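A weighted genetic risk score is commonly computed as the sum, over SNPs, of each person's risk-allele count multiplied by a per-allele weight. The abstract does not state the weighting used for its 97-SNP score, so the sketch below assumes the weights are supplied by the user (for example, published per-allele BMI effect sizes); the function names and the toy weights and dosages are hypothetical.

```python
from statistics import fmean, pstdev
from typing import List, Sequence


def weighted_grs(dosages: Sequence[Sequence[float]],
                 weights: Sequence[float]) -> List[float]:
    """Return sum_j weights[j] * dosages[i][j] for each person i.

    dosages[i][j]: risk-allele count (0, 1, 2) or imputed dosage at SNP j.
    weights[j]: per-allele weight for SNP j (assumed to be given).
    """
    scores = []
    for row in dosages:
        if len(row) != len(weights):
            raise ValueError("each person needs exactly one dosage per SNP")
        scores.append(sum(w * d for w, d in zip(weights, row)))
    return scores


def standardize(scores: Sequence[float]) -> List[float]:
    """Center and scale scores so effects can be reported per SD of the score."""
    mu, sd = fmean(scores), pstdev(scores)
    return [(s - mu) / sd for s in scores]


# Hypothetical toy data: 3 people, 3 SNPs, made-up per-allele weights.
weights = [0.08, 0.05, 0.03]
dosages = [[0, 1, 2], [2, 2, 1], [1, 0, 0]]
grs = weighted_grs(dosages, weights)
z = standardize(grs)
```

The resulting score, or its standardized version, could then enter a logistic regression for each complication alongside the covariates the abstract lists (sex, age, site, and race/principal components).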
Affiliation(s)
- Jamie R Robinson
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, 1161 21st Ave S, CCC-4312 MCN, Nashville, TN, 37232-2730, USA
  - Department of Surgery, Vanderbilt University Medical Center, Nashville, TN, USA
- Robert J Carroll
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, 1161 21st Ave S, CCC-4312 MCN, Nashville, TN, 37232-2730, USA
- Lisa Bastarache
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, 1161 21st Ave S, CCC-4312 MCN, Nashville, TN, 37232-2730, USA
- Qingxia Chen
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, USA
- Zongyang Mou
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, 1161 21st Ave S, CCC-4312 MCN, Nashville, TN, 37232-2730, USA
- Wei-Qi Wei
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, 1161 21st Ave S, CCC-4312 MCN, Nashville, TN, 37232-2730, USA
- John J Connolly
  - The Center for Applied Genomics, The Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Frank Mentch
  - The Center for Applied Genomics, The Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Patrick Sleiman
  - The Center for Applied Genomics, The Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Paul K Crane
  - Department of Medicine, University of Washington, Seattle, WA, USA
- Scott J Hebbring
  - Center for Human Genetics, Marshfield Clinic Research Institute, Marshfield, WI, USA
- Ian B Stanaway
  - Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, USA
- David R Crosslin
  - Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, USA
- Adam S Gordon
  - Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA, USA
- Elisabeth A Rosenthal
  - Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA, USA
- David Carrell
  - Kaiser Permanente Washington Health Research Institute, Seattle, WA, USA
- M Geoffrey Hayes
  - Division of Endocrinology, Metabolism, and Molecular Medicine, Department of Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
- Wei Wei
  - University of Pittsburgh Medical Center, Pittsburgh, PA, USA
- Lynn Petukhova
  - Departments of Dermatology and Epidemiology, Columbia University, New York, NY, USA
- Bahram Namjou
  - Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
- Ge Zhang
  - Division of Human Genetics, Cincinnati Children's Hospital Medical Center, University of Cincinnati College of Medicine, Cincinnati, OH, USA
  - Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, USA
- Maya S Safarova
  - Department of Cardiovascular Diseases, Mayo Clinic, Rochester, MN, USA
- Nephi A Walton
  - Department of Biomedical and Translational Informatics, Geisinger Health System, Danville, PA, USA
- Christopher Still
  - Department of Biomedical and Translational Informatics, Geisinger Health System, Danville, PA, USA
- Erwin P Bottinger
  - The Charles Bronfman Institute for Personalized Medicine at Mount Sinai, The Mindich Child Health and Development Institute, New York, NY, USA
- Ruth J F Loos
  - The Charles Bronfman Institute for Personalized Medicine at Mount Sinai, The Mindich Child Health and Development Institute, New York, NY, USA
- Shawn N Murphy
  - Department of Neurology, Partners Healthcare, Boston, MA, USA
- Gretchen P Jackson
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, 1161 21st Ave S, CCC-4312 MCN, Nashville, TN, 37232-2730, USA
  - Department of Pediatric Surgery, Vanderbilt University Medical Center, Nashville, TN, USA
- Iftikhar J Kullo
  - Department of Cardiovascular Diseases, Mayo Clinic, Rochester, MN, USA
- Hakon Hakonarson
  - The Center for Applied Genomics, The Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Gail P Jarvik
  - Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA, USA
- Eric B Larson
  - Kaiser Permanente Washington Health Research Institute, Seattle, WA, USA
- Chunhua Weng
  - Department of Biomedical Informatics, Columbia University, New York, NY, USA
- Dan M Roden
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, 1161 21st Ave S, CCC-4312 MCN, Nashville, TN, 37232-2730, USA
  - Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
  - Department of Pharmacology, Vanderbilt University Medical Center, Nashville, TN, USA
- Joshua C Denny
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, 1161 21st Ave S, CCC-4312 MCN, Nashville, TN, 37232-2730, USA
  - Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
49
Sinnott JA, Cai F, Yu S, Hejblum BP, Hong C, Kohane IS, Liao KP. PheProb: probabilistic phenotyping using diagnosis codes to improve power for genetic association studies. J Am Med Inform Assoc 2018; 25:1359-1365. [PMID: 29788308 DOI: 10.1093/jamia/ocy056] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 12/09/2017] [Accepted: 04/23/2018] [Indexed: 12/24/2022]
Abstract
Objective Standard approaches for large-scale phenotypic screens using electronic health record (EHR) data apply thresholds, such as ≥2 diagnosis codes, to define subjects as having a phenotype. However, variation in the accuracy of diagnosis codes can impair the power of such screens. Our objective was to develop and evaluate an approach that converts diagnosis codes into a probability of having a phenotype (PheProb). We hypothesized that this alternative approach to defining phenotypes would improve power for genetic association studies. Methods The PheProb approach employs unsupervised clustering to separate patients into 2 groups based on diagnosis codes. Subjects are assigned a probability of having the phenotype based on their number of diagnosis codes. This approach was developed using simulated EHR data and tested in a real-world EHR cohort. In the latter, we tested the association between genetic risk alleles for low-density lipoprotein cholesterol (LDL-C), which are known to be associated with hyperlipidemia, and hyperlipidemia codes (ICD-9 272.x). PheProb and thresholding approaches were compared. Results Among n = 1462 subjects in the real-world EHR cohort, the threshold-based p-values for association between the genetic risk score (GRS) and hyperlipidemia were 0.126 (≥1 code), 0.123 (≥2 codes), and 0.142 (≥3 codes). The PheProb approach produced the expected significant association between the GRS and hyperlipidemia: p = 0.001. Conclusions PheProb improves statistical power for association studies relative to standard thresholding approaches by leveraging information about the phenotype in billing code counts. The PheProb approach has direct applications where efficient approaches are required, such as in phenome-wide association studies.
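The idea of turning code counts into a phenotype probability through two-group unsupervised clustering can be illustrated with a two-component Poisson mixture fitted by expectation-maximization (EM), where each subject's probability is the posterior membership in the high-count group. This is a simplified stand-in chosen for illustration, not the published PheProb model, whose exact form the abstract does not give; the function names, the initialization, and the toy counts are assumptions.

```python
import math
from statistics import fmean
from typing import List, Sequence, Tuple


def _log_poisson(k: int, lam: float) -> float:
    """Log of the Poisson probability mass function P(K = k | lam)."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def _posterior_high(k: int, w: float, lam0: float, lam1: float) -> float:
    """Posterior probability that count k came from the high-count component."""
    a = math.log(w) + _log_poisson(k, lam1)
    b = math.log(1.0 - w) + _log_poisson(k, lam0)
    m = max(a, b)  # subtract the max before exponentiating for stability
    return math.exp(a - m) / (math.exp(a - m) + math.exp(b - m))


def fit_two_poisson(counts: Sequence[int], iters: int = 200) -> Tuple[float, float, float]:
    """Fit a 2-component Poisson mixture by EM; returns (w_high, lam_low, lam_high)."""
    eps = 1e-6
    mean = fmean(counts)
    lam0, lam1 = max(0.5 * mean, eps), 1.5 * mean + eps  # start low/high apart
    w = 0.5
    for _ in range(iters):
        # E-step: responsibility of the high-count component for each subject.
        r = [_posterior_high(k, w, lam0, lam1) for k in counts]
        # M-step: update the mixing weight and the two component means.
        s1 = max(sum(r), eps)
        s0 = max(len(counts) - sum(r), eps)
        w = min(max(s1 / len(counts), eps), 1.0 - eps)
        lam1 = max(sum(ri * k for ri, k in zip(r, counts)) / s1, eps)
        lam0 = max(sum((1 - ri) * k for ri, k in zip(r, counts)) / s0, eps)
    return w, lam0, lam1


def phenotype_probabilities(counts: Sequence[int]) -> List[float]:
    """Probability of belonging to the high-count ("has phenotype") group."""
    w, lam0, lam1 = fit_two_poisson(counts)
    return [_posterior_high(k, w, lam0, lam1) for k in counts]


# Hypothetical code counts: many subjects with few codes, a minority with many.
counts = [0] * 50 + [1] * 30 + [2] * 10 + [8] * 5 + [10] * 10 + [12] * 5
probs = phenotype_probabilities(counts)
```

Unlike a fixed threshold such as ≥2 codes, the output is a graded probability that can be carried into a downstream association model.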
Affiliation(s)
- Fiona Cai
  - Stuyvesant High School, New York City, NY, USA
- Sheng Yu
  - Center for Statistical Science, Tsinghua University, Beijing, China
  - Department of Industrial Engineering, Tsinghua University, Beijing, China
- Boris P Hejblum
  - Univ. Bordeaux, ISPED, Inserm BPH 1219, Inria SISTM, Bordeaux, France
- Chuan Hong
  - Department of Biostatistics, Harvard University, Boston, MA, USA
- Isaac S Kohane
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
  - Children's Hospital Boston, Boston, MA, USA
- Katherine P Liao
  - Department of Medicine, Division of Rheumatology, Immunology and Allergy, Brigham and Women's Hospital, Boston, MA, USA
50
Racial and Sex Differences in Stroke Risk in Patients With Atrial Fibrillation. J Am Coll Cardiol 2019; 74:3069-3070. [DOI: 10.1016/j.jacc.2019.10.018] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 05/07/2019] [Revised: 10/10/2019] [Accepted: 10/10/2019] [Indexed: 11/20/2022]