1
Guide A, Sulieman L, Garbett S, Cronin RM, Spotnitz M, Natarajan K, Carroll RJ, Harris P, Chen Q. Identifying erroneous height and weight values from adult electronic health records in the All of Us research program. J Biomed Inform 2024; 155:104660. [PMID: 38788889 DOI: 10.1016/j.jbi.2024.104660] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2023] [Revised: 04/29/2024] [Accepted: 05/21/2024] [Indexed: 05/26/2024]
Abstract
INTRODUCTION Electronic Health Records (EHR) are a useful data source for research, but their usability is hindered by measurement errors. This study investigated an automatic error detection algorithm for adult height and weight measurements in EHR for the All of Us Research Program (All of Us). METHODS We developed reference charts for adult heights and weights that were stratified on participant sex. Our analysis included 4,076,534 height and 5,207,328 weight measurements from ∼150,000 participants. Errors were identified using modified standard deviation scores, differences from their expected values, and significant changes between consecutive measurements. We evaluated our method with chart-reviewed heights (8,092) and weights (9,039) from 250 randomly selected participants and compared it with the current cleaning algorithm in All of Us. RESULTS The proposed algorithm classified 1.4 % of height and 1.5 % of weight measurements in the full cohort as errors. Sensitivity was 90.4 % (95 % CI: 79.0-96.8 %) for heights and 65.9 % (95 % CI: 56.9-74.1 %) for weights. Precision was 73.4 % (95 % CI: 60.9-83.7 %) for heights and 62.9 % (95 % CI: 54.0-71.1 %) for weights. In comparison, the current cleaning algorithm had lower sensitivity (55.8 %) and precision (16.5 %) for height errors, and higher precision (94.0 %) but lower sensitivity (61.9 %) for weight errors. DISCUSSION Our proposed algorithm performed better at detecting height errors than weight errors. It can serve as a valuable addition to the current All of Us cleaning algorithm for identifying erroneous height values.
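The abstract names two of its detection signals, a modified standard deviation score and a large change between consecutive measurements, without giving formulas. The following is a minimal sketch of that general idea and not the authors' algorithm: it substitutes the common median/MAD-based modified z-score for the paper's reference-chart scores, and the function names, the 3.5 cutoff and the 10 % jump threshold are all illustrative assumptions.

```python
from statistics import median


def modified_z_scores(values):
    """Median/MAD-based modified z-scores (a stand-in, not the paper's score)."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return [0.0 for _ in values]
    # 0.6745 rescales the MAD so scores are comparable to ordinary z-scores
    return [0.6745 * (v - med) / mad for v in values]


def flag_suspect(values, z_cutoff=3.5, max_jump=0.10):
    """Return indices flagged by either rule; both thresholds are assumptions."""
    scores = modified_z_scores(values)
    flagged = {i for i, z in enumerate(scores) if abs(z) > z_cutoff}
    last_good = None  # most recent value that was not flagged
    for i, value in enumerate(values):
        if i in flagged:
            continue
        if last_good is not None and last_good > 0:
            if abs(value - last_good) / last_good > max_jump:
                flagged.add(i)
                continue
        last_good = value
    return sorted(flagged)


# One participant's heights in cm; the fourth looks like inches entered as cm
heights_cm = [170.0, 170.5, 169.8, 67.0, 170.2]
print(flag_suspect(heights_cm))
```

Comparing each value with the last unflagged one, rather than with its immediate predecessor, keeps the valid measurement that follows an error from being flagged as well.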
Affiliation(s)
- Andrew Guide
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, United States
- Lina Sulieman
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, United States
- Shawn Garbett
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, United States
- Robert M Cronin
  - Department of Internal Medicine, The Ohio State University, Columbus, OH, United States
- Matthew Spotnitz
  - Department of Biomedical Informatics, Columbia University, New York, NY, United States
- Karthik Natarajan
  - Department of Biomedical Informatics, Columbia University, New York, NY, United States
- Robert J Carroll
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, United States
- Paul Harris
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, United States
- Qingxia Chen
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, United States
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, United States
2
Cruz TH, Boursaw B, Barqawi YK, FitzGerald CA, Enoah N, Hayes A, Caswell L. Community–Clinical Linkages: The Effects of the Healthy Here Wellness Referral Center on Chronic Disease Indicators Among Underserved Populations in New Mexico. Health Promot Pract 2022; 23:164S-173S. [DOI: 10.1177/15248399221111191] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/16/2022]
Abstract
The majority of U.S. adults are living with at least one chronic condition, and people of color bear a disproportionate burden of chronic disease. Prior research identifies community–clinical linkages (CCLs) as a strategy for improving health. CCLs traditionally use health care providers to connect patients to community-based self-management programs. The purpose of this study was to examine the effectiveness of a centralized CCL system on health indicators and health disparities. Administrative health data were merged with referral system data to conduct a quasi-experimental comparative time series study with a comparison group of nonreferred patients. Interrupted time-series comparisons within referred patients were also conducted. Of the 2,920 patients meeting inclusion criteria, 972 (33.3%) received a referral during the study period (January 2019—September 2021). Hemoglobin A1c levels, used to diagnose diabetes, declined significantly among referred patients, as did disparities among Hispanic/Latinx participants compared with non-Hispanic White participants. No changes were observed in body mass index (BMI). Blood pressure increased among both referred and nonreferred patients. CCLs with a centralized referral system can effectively reduce markers of diabetes and may contribute to the maintenance of BMI. The observed increase in blood pressure may have been affected by the COVID-19 pandemic and warrants further study. Practitioners can work with community partners to implement a centralized CCL model, either on its own or to enhance existing clinician or community health worker-based models.
Affiliation(s)
- Theresa H. Cruz
  - University of New Mexico Health Sciences Center, Albuquerque, NM, USA
- Blake Boursaw
  - University of New Mexico Health Sciences Center, Albuquerque, NM, USA
- Yazan K. Barqawi
  - University of New Mexico Health Sciences Center, Albuquerque, NM, USA
- Natahlia Enoah
  - Presbyterian Healthcare Services Community Health, Albuquerque, NM, USA
- Amos Hayes
  - Presbyterian Healthcare Services Community Health, Albuquerque, NM, USA
- Leigh Caswell
  - Presbyterian Healthcare Services Community Health, Albuquerque, NM, USA
3
Khan MS, Carroll RJ. Inference-based correction of multi-site height and weight measurement data in the All of Us research program. J Am Med Inform Assoc 2021; 29:626-630. [PMID: 34864995 DOI: 10.1093/jamia/ocab251] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/16/2021] [Revised: 09/29/2021] [Accepted: 11/08/2021] [Indexed: 11/14/2022] [Open Access]
Abstract
OBJECTIVE Measurement and data entry of height and weight values are error prone. Aggregation of medical record data from multiple sites creates new challenges prompting the need to identify and correct errant values. We sought to characterize and correct issues with height and weight measurement values within the All of Us (AoU) Research Program. MATERIALS AND METHODS Using the AoU Researcher Workbench, we assessed site-level measurement value distributions to infer unit types. We also used plausibility checks with exceptions for conditions with possible outlier values, eg obesity, and assessed for excess deviation within individual participant's records. RESULTS 15.8% of height and 22.4% of weight values had missing unit type information. DISCUSSION We identified several measurement unit related issues: the use of different units of measure within and between sites, missing units, and incorrect labeling of units. Failure to account for these in patient data repositories may lead to erroneous study results and conclusions. CONCLUSION Discrepancies in height and weight measurement data may arise from missing or mislabeled units. Using site- and participant-level analyses while accounting for outlier value-associated clinical conditions, we can infer measurement units and apply corrections. These methods are adaptable and expandable within AoU and other data repositories.
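The abstract describes inferring unit types from site-level value distributions but does not give the rules. As a rough illustration only, the sketch below guesses a height unit from the median of a site's values. The plausible ranges, the names `HEIGHT_UNIT_RANGES`, `infer_height_unit` and `to_cm`, and the example data are assumptions, not the published method.

```python
from statistics import median

# Hypothetical ranges for a plausible median adult height in each unit
HEIGHT_UNIT_RANGES = {"cm": (140.0, 200.0), "in": (55.0, 79.0), "m": (1.4, 2.0)}
TO_CM = {"cm": 1.0, "in": 2.54, "m": 100.0}


def infer_height_unit(values):
    """Guess the unit from the site-level median; None if ambiguous or implausible."""
    med = median(values)
    matches = [u for u, (lo, hi) in HEIGHT_UNIT_RANGES.items() if lo <= med <= hi]
    return matches[0] if len(matches) == 1 else None


def to_cm(values, unit):
    """Convert values recorded in the inferred unit to centimetres."""
    return [v * TO_CM[unit] for v in values]


# Unlabeled heights from one hypothetical site
site_values = [66.0, 70.0, 64.5, 72.0, 68.0]
unit = infer_height_unit(site_values)
print(unit, to_cm(site_values, unit) if unit else "unit could not be inferred")
```

Working from the site median rather than individual values makes the guess robust to a few erroneous entries, which is one reason a distribution-level check is attractive for missing or mislabeled units.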
Affiliation(s)
- Mirza S Khan
  - Geriatric Research Education and Clinical Center (GRECC), Tennessee Valley Health System Veterans Administration Medical Center, Nashville, Tennessee, USA
  - Division of General Internal Medicine and Public Health, Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Robert J Carroll
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
4
Bowe B, Gibson AK, Xie Y, Yan Y, van Donkelaar A, Martin RV, Al-Aly Z. Ambient Fine Particulate Matter Air Pollution and Risk of Weight Gain and Obesity in United States Veterans: An Observational Cohort Study. Environmental Health Perspectives 2021; 129:47003. [PMID: 33793302 PMCID: PMC8016176 DOI: 10.1289/ehp7944] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Indexed: 05/12/2023]
Abstract
BACKGROUND Experimental evidence and studies of children and adolescents suggest that ambient fine particulate matter [particulate matter ≤2.5μm in aerodynamic diameter (PM2.5)] air pollution may be obesogenic, but the relationship between PM2.5 and the risk of body weight gain and obesity in adults is uncertain. OBJECTIVES Our goal was to characterize the association between PM2.5 and the risks of weight gain and obesity. METHODS We followed 3,902,440 U.S. Veterans from 2010 to 2018 (median 8.1 y, interquartile range: 7.3-8.4) and assigned time-updated PM2.5 exposures by linking geocoded residential street addresses with satellite-based estimates of surface-level PM2.5 mass (at ∼1-km2 resolution). Associations with PM2.5 were estimated using Cox proportional hazards models for incident obesity [body mass index (BMI)≥30 kg/m2] and a 10-lb increase in weight relative to baseline and linear mixed models for associations with intra-individual changes in BMI and weight. RESULTS A 10-μg/m3 higher average annual PM2.5 concentration was associated with risk of incident obesity [n=2,325,769; hazard ratio (HR)=1.08 (95% CI: 1.06, 1.11)] and the risk of a 10-lb (4.54 kg) increase in weight [HR=1.07 (95% CI: 1.06, 1.08)] and with higher intra-individual changes in BMI [0.140 kg/m2 per year (95% CI: 0.139, 0.142)] and weight [0.968 lb/y (95% CI: 0.955, 0.981)]. Nonlinear exposure-response models indicated associations at PM2.5 concentrations below the national standard of 12 μg/m3. As expected, a negative exposure control (ambient air sodium) was not associated with obesity or weight gain. Associations were consistent in direction and magnitude across sensitivity analyses that included alternative outcomes and exposures assigned at different spatial resolutions. DISCUSSION PM2.5 air pollution was associated with the risk of obesity and weight gain in a large predominantly male cohort of U.S. Veterans. 
Discussions about health effects of PM2.5 should include its association with obesity, and deliberations about the epidemiology of obesity should consider its association with PM2.5. Investigation in other cohorts will deepen our understanding of the relationship between PM2.5 and weight gain and obesity. https://doi.org/10.1289/EHP7944.
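The outcome definitions in this abstract rest on two pieces of arithmetic: the pound-to-kilogram conversion behind "10-lb (4.54 kg)" and the BMI threshold of 30 kg/m2. A small sketch of both follows; the function names and the example weights and heights are illustrative assumptions.

```python
LB_TO_KG = 0.45359237  # definition of the international avoirdupois pound


def lb_to_kg(pounds):
    """Convert pounds to kilograms."""
    return pounds * LB_TO_KG


def bmi(weight_kg, height_m):
    """Body mass index in kg/m^2."""
    return weight_kg / height_m ** 2


def is_obese(weight_kg, height_m, threshold=30.0):
    """Apply the BMI >= 30 kg/m^2 definition of obesity used in the abstract."""
    return bmi(weight_kg, height_m) >= threshold


print(round(lb_to_kg(10), 2))          # the 10-lb weight-gain threshold in kg
print(round(bmi(95.0, 1.75), 1), is_obese(95.0, 1.75))
```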
Affiliation(s)
- Benjamin Bowe
  - Clinical Epidemiology Center, Research and Development Service, Veterans Affairs Saint Louis Health Care System, Saint Louis, Missouri, USA
  - Department of Epidemiology and Biostatistics, College for Public Health and Social Justice, Saint Louis University, Saint Louis, Missouri, USA
  - Veterans Research and Education Foundation of Saint Louis, Saint Louis, Missouri, USA
- Andrew K. Gibson
  - Clinical Epidemiology Center, Research and Development Service, Veterans Affairs Saint Louis Health Care System, Saint Louis, Missouri, USA
  - Veterans Research and Education Foundation of Saint Louis, Saint Louis, Missouri, USA
- Yan Xie
  - Clinical Epidemiology Center, Research and Development Service, Veterans Affairs Saint Louis Health Care System, Saint Louis, Missouri, USA
  - Department of Epidemiology and Biostatistics, College for Public Health and Social Justice, Saint Louis University, Saint Louis, Missouri, USA
  - Veterans Research and Education Foundation of Saint Louis, Saint Louis, Missouri, USA
- Yan Yan
  - Clinical Epidemiology Center, Research and Development Service, Veterans Affairs Saint Louis Health Care System, Saint Louis, Missouri, USA
  - Division of Public Health Sciences, Department of Surgery, Washington University School of Medicine, Saint Louis, Missouri, USA
- Aaron van Donkelaar
  - Department of Physics and Atmospheric Science, Dalhousie University, Halifax, Nova Scotia, Canada
  - Department of Energy, Environmental & Chemical Engineering, Washington University in Saint Louis, Saint Louis, Missouri, USA
- Randall V. Martin
  - Department of Physics and Atmospheric Science, Dalhousie University, Halifax, Nova Scotia, Canada
  - Department of Energy, Environmental & Chemical Engineering, Washington University in Saint Louis, Saint Louis, Missouri, USA
- Ziyad Al-Aly
  - Clinical Epidemiology Center, Research and Development Service, Veterans Affairs Saint Louis Health Care System, Saint Louis, Missouri, USA
  - Veterans Research and Education Foundation of Saint Louis, Saint Louis, Missouri, USA
  - Department of Medicine, Washington University School of Medicine, Saint Louis, Missouri, USA
  - Nephrology Section, Medicine Service, Veterans Affairs Saint Louis Health Care System, Saint Louis, Missouri, USA
  - Institute for Public Health, Washington University in Saint Louis, Saint Louis, Missouri, USA
5
Is it time to stop sweeping data cleaning under the carpet? A novel algorithm for outlier management in growth data. PLoS One 2020; 15:e0228154. [PMID: 31978151 PMCID: PMC6980495 DOI: 10.1371/journal.pone.0228154] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 06/12/2019] [Accepted: 01/09/2020] [Indexed: 12/21/2022] [Open Access]
Abstract
All data are prone to error and require data cleaning prior to analysis. An important example is longitudinal growth data, for which there are no universally agreed standard methods for identifying and removing implausible values and many existing methods have limitations that restrict their usage across different domains. A decision-making algorithm that modified or deleted growth measurements based on a combination of pre-defined cut-offs and logic rules was designed. Five data cleaning methods for growth were tested with and without the addition of the algorithm and applied to five different longitudinal growth datasets: four uncleaned canine weight or height datasets and one pre-cleaned human weight dataset with randomly simulated errors. Prior to the addition of the algorithm, data cleaning based on non-linear mixed effects models was the most effective in all datasets and had on average a minimum of 26.00% higher sensitivity and 0.12% higher specificity than other methods. Data cleaning methods using the algorithm had improved data preservation and were capable of correcting simulated errors according to the gold standard; returning a value to its original state prior to error simulation. The algorithm improved the performance of all data cleaning methods and increased the average sensitivity and specificity of the non-linear mixed effects model method by 7.68% and 0.42% respectively. Using non-linear mixed effects models combined with the algorithm to clean data allows individual growth trajectories to vary from the population by using repeated longitudinal measurements, identifies consecutive errors or those within the first data entry, avoids the requirement for a minimum number of data entries, preserves data where possible by correcting errors rather than deleting them and removes duplications intelligently. 
This algorithm is broadly applicable to data cleaning anthropometric data in different mammalian species and could be adapted for use in a range of other domains.
6
Chen JA, Stevens C, Wong SHM, Liu CH. Psychiatric Symptoms and Diagnoses Among U.S. College Students: A Comparison by Race and Ethnicity. Psychiatr Serv 2019; 70:442-449. [PMID: 30914001 PMCID: PMC6628693 DOI: 10.1176/appi.ps.201800388] [Citation(s) in RCA: 52] [Impact Index Per Article: 10.4] [Indexed: 11/30/2022]
Abstract
OBJECTIVE The mental health challenges of college students are a critical public health concern, and they may be exacerbated among racial and ethnic minority groups. Unfortunately, there is a lack of recent large-scale research on this topic. This study provides an update on the mental health experiences of U.S. college students from minority backgrounds. METHODS This is a retrospective analysis of cross-sectional data from the spring 2015 administration of the American College Health Association-National College Health Assessment (ACHA-NCHA IIB). Survey results from 67,308 undergraduates at 108 colleges were analyzed. RESULTS Past-year rates of self-reported psychiatric symptoms and diagnoses were high, regardless of race or ethnicity. Students from minority groups generally reported lower rates of both symptoms and diagnoses compared with whites, with notable exceptions. Despite reporting generally lower rates of psychiatric diagnoses compared with whites, students who identified as multiracial (N=7,473) or Asian/Pacific Islander (N=7,166) were more likely to endorse having felt hopeless, so depressed that it was difficult to function, or overwhelmed by anger and were more likely to have considered or attempted suicide. Compared with whites, blacks and Hispanics endorsed lower rates of psychiatric diagnoses but had similar rates of past-year suicide attempts. CONCLUSIONS Lower rates of formally diagnosed psychiatric illnesses may obscure significant mental health burden among minority students, especially with regard to suicidal thoughts and attempts among Asian/Pacific Islander and multiracial students. Students from racial and ethnic minority backgrounds may have undetected psychiatric problems and, therefore, represent a particularly at-risk group on campus.
Affiliation(s)
- Justin A Chen
  - Massachusetts General Hospital, Boston (Chen); Harvard Medical School, Boston (Chen, Liu); Brigham & Women's Hospital, Boston (Liu); Department of Psychology, Willamette University, Salem, Oregon (Stevens); Beth Israel Deaconess Medical Center, Boston (Wong)
- Courtney Stevens
  - Massachusetts General Hospital, Boston (Chen); Harvard Medical School, Boston (Chen, Liu); Brigham & Women's Hospital, Boston (Liu); Department of Psychology, Willamette University, Salem, Oregon (Stevens); Beth Israel Deaconess Medical Center, Boston (Wong)
- Sylvia H M Wong
  - Massachusetts General Hospital, Boston (Chen); Harvard Medical School, Boston (Chen, Liu); Brigham & Women's Hospital, Boston (Liu); Department of Psychology, Willamette University, Salem, Oregon (Stevens); Beth Israel Deaconess Medical Center, Boston (Wong)
- Cindy H Liu
  - Massachusetts General Hospital, Boston (Chen); Harvard Medical School, Boston (Chen, Liu); Brigham & Women's Hospital, Boston (Liu); Department of Psychology, Willamette University, Salem, Oregon (Stevens); Beth Israel Deaconess Medical Center, Boston (Wong)
7
Stevens C, Liu CH, Chen JA. Racial/ethnic disparities in US college students' experience: Discrimination as an impediment to academic performance. Journal of American College Health 2018; 66:665-673. [PMID: 29565755 DOI: 10.1080/07448481.2018.1452745] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Received: 01/22/2018] [Revised: 03/03/2018] [Accepted: 03/11/2018] [Indexed: 06/08/2023]
Abstract
OBJECTIVE AND PARTICIPANTS Using data from 69,722 US undergraduates participating in the spring 2015 National College Health Assessment, we examine racial/ethnic differences in students' experience of discrimination. METHOD Logistic regression predicted the experience of discrimination and its reported negative effect on academics. Additional models examined the effect of attending a Minority Serving Institution (MSI). RESULTS Discrimination was experienced by 5-15% of students, with all racial/ethnic minority groups examined (including Black, Hispanic, Asian, AI/NA/NA, and Multiracial students) more likely to report discrimination relative to White students. Of students who experienced discrimination, 15-25% reported it had negatively impacted their academic performance, with Hispanic and Asian students more likely to report negative impacts relative to White students. Attending an MSI was associated with decreased experiences of discrimination. CONCLUSION Students from racial/ethnic minority backgrounds are disproportionately affected by discrimination, with negative impacts for academic performance that are particularly marked for Hispanic and Asian students.
Affiliation(s)
- Cindy H Liu
  - Beth Israel Deaconess Medical School, Harvard Medical School, Boston, Massachusetts, USA
- Justin A Chen
  - Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts, USA
8
Jackson KL, Mbagwu M, Pacheco JA, Baldridge AS, Viox DJ, Linneman JG, Shukla SK, Peissig PL, Borthwick KM, Carrell DA, Bielinski SJ, Kirby JC, Denny JC, Mentch FD, Vazquez LM, Rasmussen-Torvik LJ, Kho AN. Performance of an electronic health record-based phenotype algorithm to identify community associated methicillin-resistant Staphylococcus aureus cases and controls for genetic association studies. BMC Infect Dis 2016; 16:684. [PMID: 27855652 PMCID: PMC5114817 DOI: 10.1186/s12879-016-2020-2] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 07/09/2016] [Accepted: 11/11/2016] [Indexed: 12/25/2022] [Open Access]
Abstract
Background Community associated methicillin-resistant Staphylococcus aureus (CA-MRSA) is one of the most common causes of skin and soft tissue infections in the United States, and a variety of genetic host factors are suspected to be risk factors for recurrent infection. Based on the CDC definition, we have developed and validated an electronic health record (EHR) based CA-MRSA phenotype algorithm utilizing both structured and unstructured data. Methods The algorithm was validated at three eMERGE consortium sites, and positive predictive value, negative predictive value and sensitivity, were calculated. The algorithm was then run and data collected across seven total sites. The resulting data was used in GWAS analysis. Results Across seven sites, the CA-MRSA phenotype algorithm identified a total of 349 cases and 7761 controls among the genotyped European and African American biobank populations. PPV ranged from 68 to 100% for cases and 96 to 100% for controls; sensitivity ranged from 94 to 100% for cases and 75 to 100% for controls. Frequency of cases in the populations varied widely by site. There were no plausible GWAS-significant (p < 5 E −8) findings. Conclusions Differences in EHR data representation and screening patterns across sites may have affected identification of cases and controls and accounted for varying frequencies across sites. Future work identifying these patterns is necessary. Electronic supplementary material The online version of this article (doi:10.1186/s12879-016-2020-2) contains supplementary material, which is available to authorized users.
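The validation metrics named in this abstract (positive predictive value, negative predictive value and sensitivity) all derive from the counts obtained when algorithm output is compared with chart review. The sketch below shows the standard formulas; the function name and the counts are hypothetical and are not taken from the study.

```python
def validation_metrics(tp, fp, fn, tn):
    """Compute PPV, NPV and sensitivity from chart-review confusion counts."""
    def ratio(numerator, denominator):
        # Return NaN rather than raising when a denominator is zero
        return numerator / denominator if denominator else float("nan")

    return {
        "ppv": ratio(tp, tp + fp),          # flagged cases that are true cases
        "npv": ratio(tn, tn + fn),          # flagged controls that are true controls
        "sensitivity": ratio(tp, tp + fn),  # true cases the algorithm found
    }


# Hypothetical review of 50 algorithm-identified cases and 50 controls
metrics = validation_metrics(tp=45, fp=5, fn=3, tn=47)
print(metrics)
```

PPV and NPV depend on how common cases are in the reviewed sample, which is one reason these values can vary across sites with different case frequencies.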
Affiliation(s)
- Kathryn L Jackson
  - Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
- Michael Mbagwu
  - Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
- Daniel J Viox
  - Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
  - Emory University School of Medicine, Atlanta, GA, USA
- James G Linneman
  - Biomedical Informatics Research Center, Marshfield Clinic Research Foundation, Marshfield, WI, USA
- Peggy L Peissig
  - Biomedical Informatics Research Center, Marshfield Clinic Research Foundation, Marshfield, WI, USA
- David A Carrell
  - Group Health Research Institute, Group Health Cooperative, Seattle, WA, USA
- Jacqueline C Kirby
  - Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA
- Joshua C Denny
  - Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA
- Frank D Mentch
  - The Center for Applied Genomics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Lyam M Vazquez
  - The Center for Applied Genomics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Abel N Kho
  - Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
9
Kirby JC, Speltz P, Rasmussen LV, Basford M, Gottesman O, Peissig PL, Pacheco JA, Tromp G, Pathak J, Carrell DS, Ellis SB, Lingren T, Thompson WK, Savova G, Haines J, Roden DM, Harris PA, Denny JC. PheKB: a catalog and workflow for creating electronic phenotype algorithms for transportability. J Am Med Inform Assoc 2016; 23:1046-1052. [PMID: 27026615 PMCID: PMC5070514 DOI: 10.1093/jamia/ocv202] [Citation(s) in RCA: 213] [Impact Index Per Article: 26.6] [Received: 08/19/2015] [Revised: 10/27/2015] [Accepted: 11/25/2015] [Indexed: 01/29/2023] [Open Access]
Abstract
OBJECTIVE Health care generated data have become an important source for clinical and genomic research. Often, investigators create and iteratively refine phenotype algorithms to achieve high positive predictive values (PPVs) or sensitivity, thereby identifying valid cases and controls. These algorithms achieve the greatest utility when validated and shared by multiple health care systems. MATERIALS AND METHODS We report the current status and impact of the Phenotype KnowledgeBase (PheKB, http://phekb.org), an online environment supporting the workflow of building, sharing, and validating electronic phenotype algorithms. We analyze the most frequent components used in algorithms and their performance at authoring institutions and secondary implementation sites. RESULTS As of June 2015, PheKB contained 30 finalized phenotype algorithms and 62 algorithms in development spanning a range of traits and diseases. Phenotypes have had over 3500 unique views in a 6-month period and have been reused by other institutions. International Classification of Disease codes were the most frequently used component, followed by medications and natural language processing. Among algorithms with published performance data, the median PPV was nearly identical when evaluated at the authoring institutions (n = 44; case 96.0%, control 100%) compared to implementation sites (n = 40; case 97.5%, control 100%). DISCUSSION These results demonstrate that a broad range of algorithms to mine electronic health record data from different health systems can be developed with high PPV, and algorithms developed at one site are generally transportable to others. CONCLUSION By providing a central repository, PheKB enables improved development, transportability, and validity of algorithms for research-grade phenotypes using health care generated data.
Affiliation(s)
- Peter Speltz
  - Vanderbilt University Medical Center, Nashville, TN, USA
- Luke V Rasmussen
  - Northwestern University, Feinberg School of Medicine, Chicago, IL, USA
- Omri Gottesman
  - Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Todd Lingren
  - Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
- Will K Thompson
  - Northwestern University, Feinberg School of Medicine, Chicago, IL, USA
- Guergana Savova
  - Boston Children's Hospital and Harvard Medical School, Boston, MA, USA
- Dan M Roden
  - Vanderbilt University Medical Center, Nashville, TN, USA
- Paul A Harris
  - Vanderbilt University Medical Center, Nashville, TN, USA
- Joshua C Denny
  - Vanderbilt University Medical Center, Nashville, TN, USA
10
Mbagwu M, French DD, Gill M, Mitchell C, Jackson K, Kho A, Bryar PJ. Creation of an Accurate Algorithm to Detect Snellen Best Documented Visual Acuity from Ophthalmology Electronic Health Record Notes. JMIR Med Inform 2016; 4:e14. [PMID: 27146002 PMCID: PMC4871992 DOI: 10.2196/medinform.4732] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 05/20/2015] [Revised: 01/28/2016] [Accepted: 02/20/2016] [Indexed: 11/21/2022] [Open Access]
Abstract
Background Visual acuity is the primary measure used in ophthalmology to determine how well a patient can see. Visual acuity for a single eye may be recorded in multiple ways for a single patient visit (eg, Snellen vs. Jäger units vs. font print size), and be recorded for either distance or near vision. Capturing the best documented visual acuity (BDVA) of each eye in an individual patient visit is an important step for making electronic ophthalmology clinical notes useful in research. Objective Currently, there is limited methodology for capturing BDVA in an efficient and accurate manner from electronic health record (EHR) notes. We developed an algorithm to detect BDVA for right and left eyes from defined fields within electronic ophthalmology clinical notes. Methods We designed an algorithm to detect the BDVA from defined fields within 295,218 ophthalmology clinical notes with visual acuity data present. About 5668 unique responses were identified and an algorithm was developed to map all of the unique responses to a structured list of Snellen visual acuities. Results Visual acuity was captured from a total of 295,218 ophthalmology clinical notes during the study dates. The algorithm identified all visual acuities in the defined visual acuity section for each eye and returned a single BDVA for each eye. A clinician chart review of 100 random patient notes showed a 99% accuracy detecting BDVA from these records and 1% observed error. Conclusions Our algorithm successfully captures best documented Snellen distance visual acuity from ophthalmology clinical notes and transforms a variety of inputs into a structured Snellen equivalent list. Our work, to the best of our knowledge, represents the first attempt at capturing visual acuity accurately from large numbers of electronic ophthalmology notes. Use of this algorithm can benefit research groups interested in assessing visual acuity for patient centered outcome. 
All codes used for this study are currently available, and will be made available online at https://phekb.org.
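The abstract describes mapping free-text visual acuity responses to a structured Snellen list and returning one best documented value per eye. The authors' mapping table is not reproduced here, so the following is only a toy sketch of the idea: it handles entries written as 20/x, ignores everything else, and the pattern, function name and example strings are assumptions.

```python
import re

# Matches Snellen fractions such as "20/40" or "20 / 25"; other notations are ignored
SNELLEN = re.compile(r"20\s*/\s*(\d+)")


def best_documented_va(entries):
    """Return the best (smallest-denominator) Snellen acuity among the entries."""
    denominators = []
    for text in entries:
        match = SNELLEN.search(text)
        if match:
            denominators.append(int(match.group(1)))
    if not denominators:
        return None  # nothing in a recognisable Snellen form
    return f"20/{min(denominators)}"


# Hypothetical acuity fields recorded for one eye at one visit
print(best_documented_va(["20/40", "20 / 25 -1", "CF at 3 ft"]))
```

A real mapping would also need to handle the non-Snellen notations the abstract mentions, such as Jäger units and font print size.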
Affiliation(s)
- Michael Mbagwu
  - Department of Ophthalmology, Northwestern University Feinberg School of Medicine, Chicago, IL, United States
11
Unravelling the human genome-phenome relationship using phenome-wide association studies. Nat Rev Genet 2016; 17:129-45. [PMID: 26875678 DOI: 10.1038/nrg.2015.36] [Citation(s) in RCA: 166] [Impact Index Per Article: 20.8] [Indexed: 12/24/2022]
Abstract
Advances in genotyping technology have, over the past decade, enabled the focused search for common genetic variation associated with human diseases and traits. With the recently increased availability of detailed phenotypic data from electronic health records and epidemiological studies, the impact of one or more genetic variants on the phenome is starting to be characterized both in clinical and population-based settings using phenome-wide association studies (PheWAS). These studies reveal a number of challenges that will need to be overcome to unlock the full potential of PheWAS for the characterization of the complex human genome-phenome relationship.
|
12
|
Borthwick KM, Smelser DT, Bock JA, Elmore JR, Ryer EJ, Ye Z, Pacheco JA, Carrell DS, Michalkiewicz M, Thompson WK, Pathak J, Bielinski SJ, Denny JC, Linneman JG, Peissig PL, Kho AN, Gottesman O, Parmar H, Kullo IJ, McCarty CA, Böttinger EP, Larson EB, Jarvik GP, Harley JB, Bajwa T, Franklin DP, Carey DJ, Kuivaniemi H, Tromp G. ePhenotyping for Abdominal Aortic Aneurysm in the Electronic Medical Records and Genomics (eMERGE) Network: Algorithm Development and Konstanz Information Miner Workflow. International Journal of Biomedical Data Mining 2015; 4:113. [PMID: 27054044 PMCID: PMC4820287] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Abstract
BACKGROUND AND OBJECTIVE We designed an algorithm to identify abdominal aortic aneurysm cases and controls from electronic health records to be shared and executed within the "electronic Medical Records and Genomics" (eMERGE) Network. MATERIALS AND METHODS Structured Query Language was used to script the algorithm utilizing "Current Procedural Terminology" and "International Classification of Diseases" codes, together with demographic and encounter data, to classify individuals as case, control, or excluded. The algorithm was validated using blinded manual chart review at three eMERGE Network sites and one non-eMERGE Network site. Validation comprised evaluation of an equal number of predicted cases and controls selected at random from the algorithm predictions. After validation at the three eMERGE Network sites, the remaining eMERGE Network sites performed verification only. Finally, the algorithm was implemented as a workflow in the Konstanz Information Miner, which represented the logic graphically while retaining intermediate data for inspection at each node. The algorithm was configured to be independent of specific access to data and was exportable (without data) to other sites. RESULTS The algorithm demonstrated positive predictive values (PPV) of 92.8% (CI: 86.8-96.7) and 100% (CI: 97.0-100) for cases and controls, respectively. It also performed well outside the eMERGE Network. Implementation of the transportable executable algorithm as a Konstanz Information Miner workflow required much less effort than implementation from pseudo code and ensured that the logic was as intended. DISCUSSION AND CONCLUSION This ePhenotyping algorithm identifies abdominal aortic aneurysm cases and controls from the electronic health record with the high case and control PPV necessary for research purposes, can be disseminated easily, and can be applied to high-throughput genetic and other studies.
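The kind of rule-based classification this abstract describes (procedure and diagnosis codes combined with demographic and encounter data to label each individual as case, control, or excluded) can be sketched in Python as follows. The published algorithm was written in Structured Query Language, and its actual rules and code lists are not given here, so everything below is hypothetical: the `Patient` structure, the thresholds, the age cutoff, and the two small code sets are illustrative assumptions only.

```python
from dataclasses import dataclass, field
from typing import List, Set

# Illustrative code sets (assumptions for this sketch, not the published lists).
AAA_DIAGNOSIS_ICD9 = {"441.3", "441.4"}  # ICD-9-CM abdominal aortic aneurysm
AAA_REPAIR_CPT = {"35081"}               # example open-repair procedure code


@dataclass
class Patient:
    age: int
    icd_codes_by_encounter: List[Set[str]]  # one set of ICD codes per encounter
    cpt_codes: Set[str] = field(default_factory=set)


def classify(patient: Patient, min_control_age: int = 40) -> str:
    """Label a patient as 'case', 'control', or 'excluded' (hypothetical rules)."""
    diagnosis_encounters = sum(
        1 for codes in patient.icd_codes_by_encounter if codes & AAA_DIAGNOSIS_ICD9
    )
    # Case: a repair procedure, or the diagnosis recorded on two or more encounters.
    if patient.cpt_codes & AAA_REPAIR_CPT or diagnosis_encounters >= 2:
        return "case"
    # A single diagnosis mention is treated as ambiguous in this sketch.
    if diagnosis_encounters == 1:
        return "excluded"
    # Control: old enough, at least one encounter, and no aneurysm codes.
    if patient.age >= min_control_age and patient.icd_codes_by_encounter:
        return "control"
    return "excluded"


def positive_predictive_value(true_positives: int, false_positives: int) -> float:
    """PPV = confirmed predictions / all predictions reviewed."""
    return true_positives / (true_positives + false_positives)


if __name__ == "__main__":
    print(classify(Patient(70, [{"441.4"}, {"441.4", "401.9"}])))  # two dx encounters
    print(classify(Patient(65, [{"401.9"}])))                      # no aneurysm codes
```

Keeping an explicit "excluded" label, rather than forcing every patient into case or control, is what allows the reviewed sample to contain only confidently predicted individuals, which is the setting in which a chart-review PPV is computed.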
Affiliation(s)
- Kenneth M Borthwick
- The Sigfried and Janet Weis Center for Research, Geisinger Health System, Danville, PA, USA
- Diane T Smelser
- The Sigfried and Janet Weis Center for Research, Geisinger Health System, Danville, PA, USA
- Jonathan A Bock
- The Sigfried and Janet Weis Center for Research, Geisinger Health System, Danville, PA, USA
- James R Elmore
- Department of Vascular and Endovascular Surgery, Geisinger Health System, Danville, PA, USA
- Evan J Ryer
- Department of Vascular and Endovascular Surgery, Geisinger Health System, Danville, PA, USA
- Zi Ye
- Division of Cardiovascular Diseases, Mayo Clinic, Rochester, MN, USA
- Jennifer A. Pacheco
- Divisions of General Internal Medicine and Preventive Medicine, and the Center for Genetic Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
- David S. Carrell
- Group Health Research Institute, Group Health Cooperative, Seattle, WA, USA
- Michael Michalkiewicz
- Patient-Centered Research, Aurora Research Institute™, Aurora Sinai Medical Center, Milwaukee, WI, USA
- William K Thompson
- Divisions of General Internal Medicine and Preventive Medicine, and the Center for Genetic Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
- Jyotishman Pathak
- Division of Cardiovascular Diseases, Mayo Clinic, Rochester, MN, USA
- Joshua C Denny
- Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA
- James G Linneman
- Biomedical Informatics Research Center, Marshfield Clinic Research Foundation, Marshfield, WI, USA
- Peggy L Peissig
- Biomedical Informatics Research Center, Marshfield Clinic Research Foundation, Marshfield, WI, USA
- Abel N Kho
- Divisions of General Internal Medicine and Preventive Medicine, and the Center for Genetic Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
- Omri Gottesman
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Harpreet Parmar
- Patient-Centered Research, Aurora Research Institute™, Aurora Sinai Medical Center, Milwaukee, WI, USA
- Iftikhar J Kullo
- Division of Cardiovascular Diseases, Mayo Clinic, Rochester, MN, USA
- Erwin P Böttinger
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Eric B Larson
- Group Health Research Institute, Group Health Cooperative, Seattle, WA, USA
- Gail P Jarvik
- Departments of Medicine (Medical Genetics) and Genome Sciences, University of Washington, Seattle, WA, USA
- John B Harley
- Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
- Tanvir Bajwa
- Patient-Centered Research, Aurora Research Institute™, Aurora Sinai Medical Center, Milwaukee, WI, USA
- David P Franklin
- Department of Vascular and Endovascular Surgery, Geisinger Health System, Danville, PA, USA
- David J Carey
- The Sigfried and Janet Weis Center for Research, Geisinger Health System, Danville, PA, USA
- Helena Kuivaniemi
- The Sigfried and Janet Weis Center for Research, Geisinger Health System, Danville, PA, USA; Department of Surgery, Temple University School of Medicine, Philadelphia, PA, USA
- Gerard Tromp
- The Sigfried and Janet Weis Center for Research, Geisinger Health System, Danville, PA, USA. Corresponding author: Gerard Tromp, The Sigfried and Janet Weis Center for Research, Geisinger Health, Danville, PA, USA, Tel: (570) 271-5592
|
13
|
Lehmann CU, Gundlapalli AV. Improving Bridging from Informatics Practice to Theory. Methods Inf Med 2015; 54:540-5. [PMID: 26577504 DOI: 10.3414/me15-01-0138] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 10/20/2015] [Accepted: 10/22/2015] [Indexed: 11/09/2022]
Abstract
BACKGROUND In 1962, Methods of Information in Medicine (MIM) began to publish papers on the methodology and scientific fundamentals of organizing, representing, and analyzing data, information, and knowledge in biomedicine and health care. Considered a companion journal, Applied Clinical Informatics (ACI) was launched in 2009 with a mission to establish a platform that allows sharing of knowledge between clinical medicine and health IT specialists as well as to bridge gaps between visionary design and successful and pragmatic deployment of clinical information systems. Both journals are official journals of the International Medical Informatics Association. OBJECTIVES As a follow-up to prior work, we set out to explore congruencies and interdependencies in publications of ACI and MIM. The objectives were to describe the major topics discussed in articles published in ACI in 2014 and to determine if there was evidence that theory in 2014 MIM publications was informed by practice described in ACI publications in any year. We also set out to describe lessons learned in the context of bridging informatics practice and theory and offer opinions on how ACI editorial policies could evolve to foster and improve such bridging. METHODS We conducted a retrospective observational study and reviewed all articles published in ACI during the calendar year 2014 (Volume 5) for their main theme, conclusions, and key words. We then reviewed the citations of all MIM papers from 2014 to determine if there were references to ACI articles from any year. Lessons learned in the context of bridging informatics practice and theory and opinions on ACI editorial policies were developed by consensus among the two authors. RESULTS A total of 70 articles were published in ACI in 2014. Clinical decision support, clinical documentation, usability, Meaningful Use, health information exchange, patient portals, and clinical research informatics emerged as major themes. Only one MIM article from 2014 cited an ACI article. There are several lessons learned including the possibility that there may not be direct links between MIM theory and ACI practice articles. ACI editorial policies will continue to evolve to reflect the breadth and depth of the practice of clinical informatics and articles received for publication. Efforts to encourage bridging of informatics practice and theory may be considered by the ACI editors. CONCLUSIONS The lack of direct links from informatics theory-based papers published in MIM in 2014 to papers published in ACI continues as was described for papers published during 2012 to 2013 in the two companion journals. Thus, there is little evidence that theory in MIM has been informed by practice in ACI.
Affiliation(s)
- A V Gundlapalli
- Adi V. Gundlapalli, MD, PhD, MS, Chief Health Informatics Officer, VA Salt Lake City Health Care System, Salt Lake City, UT 84148, USA, E-mail:
|